Parsing System (v0.9)
class Token
This is the main interface for processing parse trees. See the explanation of Tokens vs Symbols.

module goldie.token
enum CommentType
None
Line
Block

module goldie.token
enum TokenToStringMode
Compact
Omit all whitespace, error and comment tokens.
CompactWithSpaces
Like Compact, but adds a space between each token.
default Smart
Like Compact, but adds a space between two
tokens whenever the last character of the first
token and the first character of the second token
are both either alphanumeric or an underscore.
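The Smart spacing rule above can be sketched in a few lines of D. This is only an illustration of the rule as described, not GoldieLib's implementation: isWordChar and smartJoin are hypothetical helpers, the input is assumed to be token texts that have already had whitespace, error and comment tokens removed, and only ASCII characters are considered.

```d
import std.ascii : isAlphaNum;

/// True for the characters the Smart rule looks at: [A-Za-z0-9_].
bool isWordChar(char c)
{
    return isAlphaNum(c) || c == '_';
}

/// Hypothetical helper (not part of GoldieLib): concatenates token texts,
/// inserting a space only where two word characters would otherwise touch.
string smartJoin(string[] tokens)
{
    string result;
    foreach(tok; tokens)
    {
        if(tok.length == 0)
            continue; // nothing to append, and nothing to compare against
        if(result.length > 0 && isWordChar(result[$-1]) && isWordChar(tok[0]))
            result ~= ' ';
        result ~= tok;
    }
    return result;
}

unittest
{
    // A space appears only between "int" and "x".
    assert(smartJoin(["int", "x", "=", "y_1", "+", "2", ";"]) == "int x=y_1+2;");
    // Two separate '-' tokens end up adjacent.
    assert(smartJoin(["a", "-", "-", "b"]) == "a--b");
}
```

The second assertion shows why the output of a compacting mode is not guaranteed to mean the same thing as the original source: in many languages "a--b" would lex differently from "a - -b".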
Full
Includes all whitespace, error and comment tokens.
Note: This doesn't currently work after the parse phase, because the parser does not currently preserve whitespace, error and comment tokens.

module goldie.token
traverse
Returns a forward range that performs a preorder traversal of the tree beginning at token start. It takes an optional predicate as a template parameter, which can be used to selectively exclude any subtree. The predicate can be either a bool delegate(Token t) or a string, à la std.algorithm. Either way, it takes in a Token (for example, Token t) and returns a bool. If the predicate returns false, the entire subtree starting with Token t is skipped.
If you wish to filter out individual tokens rather than (or in addition to) entire subtrees, pass the result of traverse to std.algorithm.filter.
The range also has a function void skip(), which tells the range to skip the entire subtree of the current front the next time popFront is called.

Example - Visit all descendants of token:
    foreach(tok; traverse(token))
        writeln("Found: ", tok.name);

Example - Skip all <Declaration> subtrees (these are all equivalent):
    alias Token_foo TokFoo; // For convenience

    // Dynamic-style
    foreach(tok; traverse!`a.name != "<Declaration>"`(token))
        writeln("Found: ", tok.name);

    // Dynamic-style
    foreach(tok; traverse!( (Token t) { return t.name != "<Declaration>"; } )(token))
        writeln("Found: ", tok.name);

    // Static-style
    foreach(tok; traverse!( (Token t) { return cast(TokFoo!"<Declaration>")t is null; } )(token))
        writeln("Found: ", tok.name);

    // Static-style, using isAnyType from semitwist.util.reflect
    foreach(tok; traverse!( !isAnyType!(TokFoo!"<Declaration>") )(token))
        writeln("Found: ", tok.name);

Example - Only traverse <Foo> and <Bar> tokens (these are both equivalent):
    alias Token_foo TokFoo; // For convenience

    // Dynamic-style
    foreach(tok; traverse!`a.name == "<Foo>" || a.name == "<Bar>"`(token))
        writeln("Found: ", tok.name);

    // Static-style, using isAnyType from semitwist.util.reflect
    foreach(tok; traverse!( isAnyType!(TokFoo!"<Foo>", TokFoo!"<Bar>") )(token))
        writeln("Found: ", tok.name);

Example - Using skip():
    alias Token_foo TokFoo; // For convenience

    // Skip all <Declaration> subtrees
    auto r = traverse(token);
    foreach(tok; r)
    {
        if(tok.name == "<Declaration>") // Dynamic-style
            r.skip();
        writeln("Found: ", tok.name);
    }

    // Only traverse <Foo> and <Bar> tokens
    auto r = traverse(token);
    foreach(tok; r)
    {
        if( !isAnyType!(TokFoo!"<Foo>", TokFoo!"<Bar>")(tok) ) // Static-style
            r.skip();
        writeln("Found: ", tok.name);
    }
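To make the difference between the predicate and skip() concrete, here is a minimal, self-contained sketch of a preorder range with the same two pruning mechanisms. It does not use GoldieLib: Node, acceptAll, Preorder and preorder are hypothetical names, and the sketch assumes the predicate is also applied to the start node, which the description above does not specify.

```d
/// Hypothetical stand-in for a parse-tree node (not Goldie's Token).
class Node
{
    string name;
    Node[] children;

    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }
}

/// Default predicate: accept every node.
bool acceptAll(Node n) { return true; }

/// Forward range visiting nodes in preorder.
/// A node that fails `pred` is never yielded and its subtree is never entered.
struct Preorder(alias pred)
{
    private Node[] stack;        // pending nodes; the back is the current front
    private bool skipRequested;  // set by skip(), consumed by popFront()

    this(Node start)
    {
        // Assumption of this sketch: the start node is tested too.
        if(start !is null && pred(start))
            stack = [start];
    }

    @property bool empty() const { return stack.length == 0; }
    @property Node front() { return stack[$-1]; }
    @property Preorder save()
    {
        auto copy = this;
        copy.stack = stack.dup;
        return copy;
    }

    /// Don't descend into the current front on the next popFront.
    void skip() { skipRequested = true; }

    void popFront()
    {
        auto current = stack[$-1];
        stack = stack[0..$-1];
        if(!skipRequested)
        {
            // Push in reverse so the leftmost child is visited first.
            foreach_reverse(child; current.children)
                if(pred(child))
                    stack ~= child;
        }
        skipRequested = false;
    }
}

auto preorder(alias pred = acceptAll)(Node start)
{
    return Preorder!pred(start);
}

unittest
{
    auto root = new Node("<Program>", [
        new Node("<Declaration>", [ new Node("Ident") ]),
        new Node("<Statement>",   [ new Node("Number") ]),
    ]);

    // The predicate removes the <Declaration> node and everything below it.
    string[] pruned;
    foreach(n; preorder!((Node t) => t.name != "<Declaration>")(root))
        pruned ~= n.name;
    assert(pruned == ["<Program>", "<Statement>", "Number"]);

    // skip() still yields the <Declaration> node itself, but not its children.
    string[] skipped;
    for(auto r = preorder(root); !r.empty; r.popFront())
    {
        skipped ~= r.front.name;
        if(r.front.name == "<Declaration>")
            r.skip();
    }
    assert(skipped == ["<Program>", "<Declaration>", "<Statement>", "Number"]);
}
```

The sketch drives skip() with an explicit for loop because Preorder is a struct, and D's foreach iterates over a copy of a struct range. Whether the same applies to the range returned by traverse depends on how GoldieLib implements it.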
module goldie.token
class Token
The dynamic-style token interface. If you are using static-style, all the static-style token types are derived from this class. Note, however, that the static-style versions of get and getRequired actually live directly in this class.

Constructor for nonterminals. Normally, only GoldieLib itself needs to instantiate tokens, unless you want to create or modify a parse tree or Token array manually.

this(Symbol symbol, Language lang, string content, string file="{unknown}", ptrdiff_t line=0, ptrdiff_t srcIndexStart=0, ptrdiff_t srcIndexEnd=0, CommentType commentMode=CommentType.None, string debugInfo="")
Constructor for terminals. Normally, only GoldieLib itself needs to instantiate tokens, unless you want to create or modify a parse tree or Token array manually.

@property size_t length
size_t opDollar()
Token opIndex(size_t index)
Token opIndexAssign(Token tok, size_t index)
Token[] opSlice()
Token[] opSlice(size_t a, size_t b)
Token[] opSliceAssign(Token tok)
Token[] opSliceAssign(Token tok, size_t a, size_t b)
Token[] opSliceAssign(Token[] toks)
Token[] opSliceAssign(Token[] toks, size_t a, size_t b)
int opApply(int delegate(ref Token) dg)
int opApply(int delegate(ref size_t, ref Token) dg)
int opApplyReverse(int delegate(ref Token) dg)
int opApplyReverse(int delegate(ref size_t, ref Token) dg)
Access the child tokens (ie, subtokens) of this token (if this is a nonterminal). Note that opDollar only works in DMD 2.057 and up.
Example:
    Token firstChild = myToken[0];
    Token lastChild = myToken[$-1]; // The '$' requires at least DMD 2.057
    Token[] firstThree = myToken[0..3];

    // Assignments are possible, but NOT recommended if you are
    // using static-style, because it can easily mess up the
    // static-style sub, get and getRequired.
    myToken[1] = anotherToken;

    // Iterate
    foreach(i, subToken; myToken) {}

Direct access to the child tokens (ie, subtokens) of this token (if this is a nonterminal). This will be removed from a later version of Goldie in favor of the operator overloads above.

T get(T)(int index=0)
Static-style function to retrieve the first subtoken matching the given token type T. The type T must be a Token_{languageName}!{symbol} or Token_{languageName}!{rule}. If there are multiple subtokens matching the desired type, the index parameter can be used to select which one is desired. If the desired subtoken isn't found, null is returned.
Example:
    // The type of 'token' is: Token_myLang!("<A>", "<Foo>", "#", "<Bar>", "*", "<Foo>", ";")
    // ie, 'token' is a nonterminal matching the rule:
    // <A> ::= <Foo> '#' <Bar> '*' <Foo> ';'
    assert( token.get!( Token_myLang!"<Foo>" )() is token.sub!0 );
    assert( token.get!( Token_myLang!"<Bar>" )() is token.sub!2 );
    assert( token.get!( Token_myLang!"<Foo>" )(1) is token.sub!4 );
    assert( token.get!( Token_myLang!"<Foo>" )(2) is null ); // Doesn't exist!

Ts[$-1] get(Ts...)() if(Ts.length > 1)
Static-style pattern matching to find a descendant token. Returns null if the given path doesn't exist. This is like chaining calls to get!T(0) above, but safely checks for null at each step. The types passed in must be Token_{languageName}!{symbol} or Token_{languageName}!{rule}.
Example:
    alias Token_myLang Tok; // For convenience
    auto x = token.get!( Tok!"<A>", Tok!"<B>", Tok!"<C>" )(); // Never dereferences null
    auto y = token.get!(Tok!"<A>")().get!(Tok!"<B>")().get!(Tok!"<C>")(); // Could dereference null
    assert( x is y );

Dynamic-style versions of get. The symbol names given must be valid symbol names in this Token's language or an Exception will be thrown.
Example:
    // 'token' is a nonterminal matching the rule:
    // <A> ::= <Foo> '#' <Bar> '*' <Foo> ';'
    assert( token.matches("<A>", "<Foo>", "#", "<Bar>", "*", "<Foo>", ";") );
    assert( token.get("<Foo>") is token[0] );
    assert( token.get("<Bar>") is token[2] );
    assert( token.get("<Foo>", 1) is token[4] );
    assert( token.get("<Foo>", 2) is null ); // Doesn't exist!
    Token x = token.get( ["<Bar>", "<B>", "<Q>"] ); // If '<Bar> ::= <B>' and '<B> ::= <Q>'
    assert( token.get([]) is token ); // Empty array results in the original token
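The lookup semantics of the dynamic-style get (the n-th matching child, a null-safe path, and an empty path returning the original token) can be sketched without GoldieLib as follows. Node and its two get overloads are hypothetical stand-ins written for this illustration. Unlike the real functions, this sketch does not validate symbol names against a language, so it never throws.

```d
/// Hypothetical stand-in for a parse-tree node (not Goldie's Token).
class Node
{
    string name;
    Node[] children;

    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }

    /// Returns the index-th direct child named symbolName, or null if absent.
    Node get(string symbolName, size_t index = 0)
    {
        foreach(child; children)
        {
            if(child.name != symbolName)
                continue;
            if(index == 0)
                return child;
            index--; // skip this match and keep looking for a later one
        }
        return null;
    }

    /// Follows a path of symbol names, returning null at the first miss.
    /// An empty path returns this node.
    Node get(string[] symbolNames)
    {
        Node current = this;
        foreach(symbolName; symbolNames)
        {
            current = current.get(symbolName);
            if(current is null)
                return null;
        }
        return current;
    }
}

unittest
{
    // Mirrors the rule  <A> ::= <Foo> '#' <Bar> '*' <Foo> ';'
    // with  <Bar> ::= <B>  and  <B> ::= <Q>
    auto q    = new Node("<Q>");
    auto bar  = new Node("<Bar>", [ new Node("<B>", [q]) ]);
    auto foo0 = new Node("<Foo>");
    auto foo1 = new Node("<Foo>");
    auto a = new Node("<A>",
        [foo0, new Node("#"), bar, new Node("*"), foo1, new Node(";")]);

    assert(a.get("<Foo>") is foo0);
    assert(a.get("<Foo>", 1) is foo1);
    assert(a.get("<Foo>", 2) is null);
    assert(a.get(["<Bar>", "<B>", "<Q>"]) is q);
    assert(a.get(["<Bar>", "<Missing>", "<Q>"]) is null); // no null dereference
}
```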
T getRequired(T)(int index=0)
Ts[$-1] getRequired(Ts...)() if(Ts.length > 1)
Token getRequired(string symbolName, int index=0)
Token getRequired(string[] symbolNames)
Just like get, but throws an Exception if the desired subtoken isn't found. The templated versions of getRequired are static-style. The types passed in to these must be Token_{languageName}!{symbol} or Token_{languageName}!{rule}.

readonly @property int ruleId
If this Token is a nonterminal, then this is the ID of the reduction rule that was used to create the token. This ID is an index into Language.ruleTable.

The SymbolType of this Token. See the explanation of Tokens, Symbols, and Symbol Types for more information.

The Symbol of this Token. See the explanation of Tokens, Symbols, and Symbol Types for more information.

readonly @property string typeName
The name of this Token's SymbolType.
readonly @property string name
readonly @property string fullName
This is just typeName ~ "." ~ name. For instance, "NonTerminal.<Foo>".
REMOVED
readonly @property int id
This has been removed. Use symbol.id instead.
readonly @property ptrdiff_t line
readonly @property string file
The file and line number of the original source where this Token starts. See Goldie's conventions relating to Line and Column Numbers. For the line number where this Token ends, or the column number where this Token starts or ends, use srcIndexStart or srcIndexEnd together with Lexer.lineIndicies and Lexer.lineAtIndex.

readonly @property ptrdiff_t srcIndexStart
readonly @property ptrdiff_t srcIndexEnd
readonly @property ptrdiff_t srcLength
The locations (zero-indexed) in the original source where this Token starts and ends, and the difference between them.
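As a rough illustration of how a source index can be turned into a line and column, the sketch below builds a table of line-start indices and binary-searches it. It is self-contained and does not call Lexer.lineIndicies or Lexer.lineAtIndex. The helpers lineStarts, lineOf and columnOf are hypothetical, treat only '\n' as a line break, and return zero-based values, which may differ from Goldie's own line and column conventions.

```d
import std.range : assumeSorted;

/// Hypothetical helper: the start index of every line in `source`.
size_t[] lineStarts(string source)
{
    size_t[] starts = [0];
    foreach(i, char c; source)
        if(c == '\n')
            starts ~= i + 1; // the next line begins right after the newline
    return starts;
}

/// Zero-based line that contains source index `index`.
size_t lineOf(const(size_t)[] starts, size_t index)
{
    // lowerBound(index + 1) holds every line start that is <= index;
    // the last of those is the line we are on.
    return starts.assumeSorted.lowerBound(index + 1).length - 1;
}

/// Zero-based column (in code units) of `index` within its line.
size_t columnOf(const(size_t)[] starts, size_t index)
{
    return index - starts[lineOf(starts, index)];
}

unittest
{
    //            index: 0 1 2  3 4 5  6  7 8 9
    auto starts = lineStarts("ab\ncd\n\nxyz");
    assert(lineOf(starts, 4) == 1 && columnOf(starts, 4) == 1); // 'd'
    assert(lineOf(starts, 8) == 3 && columnOf(starts, 8) == 1); // 'y'
}
```

With a table like this, applying the same lookup to a token's start and end indices gives the start and end positions of that token.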

Indicates whether or not this token exists inside a comment and, if so, what type of comment.

The first and last terminals in this Token. If this Token isn't a nonterminal, then these both just return this.

readonly @property string debugInfo
A place for extra debugging information to be stored.
bool matches(string parentSymbol, string[] subSymbols...)
Determine if the Token matches (ie, was created from) a particular reduction rule.
Example:
    // Did this token come from this reduction rule?
    // <Add Exp> ::= <Add Exp> '+' <Mult Exp>
    bool checkToken(Token tok)
    {
        return tok.matches("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>");
    }

string toString()
string toString(TokenToStringMode mode)
string toStringCompact()
string toStringCompactWithSpaces()
string toStringSmart()
string toStringFull()
Converts the Token to a string that resembles the original source. See TokenToStringMode for descriptions of the different modes of conversion.
Note: Depending on the language and the chosen mode of conversion, the result might not be valid code in the original language, or may have subtly changed meaning. Not all modes of conversion are suitable for all purposes or all languages. Depending on the language or purpose, it may be that none of these are appropriate and you'll have to create a string by walking the Token tree manually. These functions are merely provided as a convenience.

semitwist.treeout.TreeNode toTreeNode()
TreeNode is a type from SemiTwist D Tools that provides an easy way to convert a tree to a text format such as JSON or XML.
Example: To convert a Token to JSON:
    import semitwist.treeout;

    string tokenToJSON(Token tok, bool prettyPrint)
    {
        if(prettyPrint)
            return tok.toTreeNode().format(formatterPrettyJSON);
        else
            return tok.toTreeNode().format(formatterTrimmedJSON);
    }
Note, however, that if you wish to use the resulting JSON in JsonViewer and get JsonViewer's enhanced source-viewing features, you'll need to add a few things to the returned TreeNode before formatting it to a string. See the source of the Parse tool for an example.

module {user-specified package}.token
{languageName} = Name of static-style language
This is the common base class for all tokens in a given static-style language.

static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}. For example, if the language is named foo, then this evaluates to Token_foo.

module {user-specified package}.token
{languageName} = Name of static-style language
{symbol} = [ SymbolType staticSymbolType, ] string staticName=null
This type is for tokens representing a specific Symbol in a static-style language. This is a templated type.
Instantiation example:
    // Assume the language is named "calc"

    // For a SymbolType.Terminal symbol named "Ident":
    // These are the SAME type:
    Token_calc!"Ident"
    Token_calc!(SymbolType.Terminal, "Ident")

    // For a SymbolType.NonTerminal symbol named "<Add Exp>"
    // These are the SAME type (but different from the above types):
    Token_calc!"<Add Exp>"
    Token_calc!(SymbolType.NonTerminal, "<Add Exp>")

    // All the above share common base-types: Token_calc and Token.

    // This only shares a common base-type of Token
    // (since it's from a different language).
    Token_anotherCalc!"Ident"
The two-parameter form is needed if there are two Symbols with the same name. Attempting to instantiate a Token_{languageName}!{symbol} with a symbol that doesn't exist in the language will result in a compile-time error.

static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}!(SymbolType.{symbolType}, "{symbolName}").
Example:
    void showStringOf(Token_foo!"Ident" tok)
    {
        // Output: Token_foo!(SymbolType.Terminal, "Ident")
        writeln(typeof(tok).StringOf);
    }

static enum string staticName
Compile-time equivalent to Token.name.
this(Language lang, string content, string file="{unknown}", ptrdiff_t line=0, ptrdiff_t srcIndexStart=0, ptrdiff_t srcIndexEnd=0, CommentType commentMode=CommentType.None, string debugInfo="")
Constructor for terminals.
This member doesn't exist if {symbol} is a nonterminal.

module {user-specified package}.token
{languageName} = Name of static-style language
{rule} = string staticName, ( int staticRuleId | subTokenTypes... )
Nonterminals have one Token_{languageName}!{rule} for each rule that can create them. This is a templated type. See Static And Dynamic Styles: Types and Inheritance for an explanation of how it works.
Instantiation example:
    // Assume the language is named "calc"

    // These three are all the SAME type, and
    // are for a nonterminal Token created from
    // this reduction rule:
    // <Add Exp> ::= <Add Exp> '+' <Mult Exp>
    Token_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>")
    Token_calc!("<Add Exp>", "<Add Exp>", Token_calc!(SymbolType.Terminal, "+"), "<Mult Exp>")
    Token_calc!("<Add Exp>", ruleIdOf_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>"))

    // This is a different type, but shares the common
    // base-class of Token_calc!"<Add Exp>" with the above:
    Token_calc!("<Add Exp>", "<Mult Exp>")

    // This is another different type, but the only base-types this
    // one shares with the above are Token_calc and Token (because
    // it has a different reduction symbol, ie the first argument).
    Token_calc!("<Mult Exp>", "<Negate Exp>")

    // The only base-type this shares with any of the above is Token,
    // since it's from a different language:
    Token_anotherCalc!("<Add Exp>", "<Mult Exp>")

    // Use null to refer to a rule that has no sub-tokens, such as in this:
    // <OptionalHello> ::= 'Hello'
    //                   |
    Token_foo!("<OptionalHello>", "Hello") // First rule
    Token_foo!("<OptionalHello>", null)    // Second rule
See also the documentation on Ambiguous Symbols. Attempting to instantiate a Token_{languageName}!{rule} with a rule that doesn't exist in the language will result in a compile-time error.

Constructor. Normally, only GoldieLib itself needs to
instantiate tokens, unless you want to create/modify a parse tree or
Token array manually.

Type-safe static-style counterpart to the opIndex and opSlice operator overloads.
Sample usage: myToken.sub!2
Example:
    // Assume the language "calc":
    // <Mult Exp> ::= <Mult Exp> '*' <Negate Exp>
    //              | <Mult Exp> '/' <Negate Exp>
    //              | <Negate Exp>
    // <Negate Exp> ::= '-' <Value>
    //                | <Value>
    void foo(Token_calc!("<Mult Exp>", "<Mult Exp>", "*", "<Negate Exp>") tok)
    {
        // The first subtoken is known (even at compile-time) to be a <Mult Exp>.
        // The others are also known. These are actually checked at compile-time:
        // If you get them mixed up, you'll get a type-mismatch error when compiling.
        Token_calc!"<Mult Exp>" multTok = tok.sub!0;
        Token_calc!"*" operatorTok = tok.sub!1;
        Token_calc!"<Negate Exp>" negateTok = tok.sub!2;

        // Determine exact rule used for the <Negate Exp> subtoken:
        // Can't know this at compile-time because it depends on
        // the actual source that was parsed.
        if( cast(Token_calc!("<Negate Exp>", "-", "<Value>")) negateTok )
        {
            writeln("negateTok came from: <Negate Exp> ::= '-' <Value>");
        }
        else if( cast(Token_calc!("<Negate Exp>", "<Value>")) negateTok )
        {
            writeln("negateTok came from: <Negate Exp> ::= <Value>");
        }
        else
            writeln("Forgot to handle some other rule!");
    }
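
The run-time rule check in the example above relies on D's class downcast, which yields null when the object is not an instance of the target type. The following self-contained sketch shows the same dispatch pattern on a hand-written hierarchy. The class names are hypothetical stand-ins for the generated static-style types and only mirror the inheritance described in this document (rule type, then symbol type, then language base, then Token).

```d
// Hand-written stand-ins for generated static-style types (hypothetical names).
class Token {}
class Token_calc : Token {}               // base of every token in language "calc"
class NegateExp : Token_calc {}           // plays the role of Token_calc!"<Negate Exp>"
class NegateExp_MinusValue : NegateExp {} // <Negate Exp> ::= '-' <Value>
class NegateExp_Value : NegateExp {}      // <Negate Exp> ::= <Value>

/// Reports which rule produced `tok`, using downcasts that yield null on mismatch.
string describeRule(NegateExp tok)
{
    if(cast(NegateExp_MinusValue) tok)
        return "<Negate Exp> ::= '-' <Value>";
    else if(cast(NegateExp_Value) tok)
        return "<Negate Exp> ::= <Value>";
    else
        return "unhandled rule";
}

unittest
{
    assert(describeRule(new NegateExp_MinusValue()) == "<Negate Exp> ::= '-' <Value>");
    assert(describeRule(new NegateExp_Value()) == "<Negate Exp> ::= <Value>");
}
```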
static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}!(SymbolType.NonTerminal, "{symbolName}", ...).
Example:
    void showStringOf(Token_calc!("<Negate Exp>", "-", "<Value>") tok)
    {
        // Output: Token_calc!(SymbolType.NonTerminal, "<Negate Exp>", ...)
        writeln(typeof(tok).StringOf);
    }