Goldie: class Token
Parsing System (v0.9)

class Token

This is the main interface for processing parse trees.

See the explanation of Tokens vs Symbols.

module goldie.token

enum CommentType

None
Line
Block

module goldie.token

enum TokenToStringMode

Compact
Omits all whitespace, error and comment tokens.
CompactWithSpaces
Like Compact, but adds a space between every pair of adjacent tokens.
Smart (default)
Like Compact, but adds a space between two tokens whenever the last character of the first token and the first character of the second token are both either alphanumeric or an underscore.
Full

Includes all whitespace, error and comment tokens.

Note: This mode doesn't currently work after the parse phase, because the parser does not yet preserve whitespace, error and comment tokens.
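The Smart spacing rule can be sketched as a standalone function over a plain array of token strings. This is not GoldieLib's implementation: smartJoin and isWordChar are hypothetical names, and the character test is ASCII-only (an assumption).

```d
import std.ascii : isAlphaNum;

// Hypothetical helper (not part of GoldieLib): true for characters that
// would fuse two adjacent tokens into one word. ASCII only.
bool isWordChar(char c)
{
    return c == '_' || isAlphaNum(c);
}

// Joins token strings following the rule described for Smart: a space is
// inserted only when the previous token ends with a word character AND
// the next token begins with one.
string smartJoin(string[] tokens)
{
    string result;
    foreach (i, tok; tokens)
    {
        if (i > 0 && tokens[i - 1].length > 0 && tok.length > 0
            && isWordChar(tokens[i - 1][$ - 1]) && isWordChar(tok[0]))
        {
            result ~= ' ';
        }
        result ~= tok;
    }
    return result;
}
```

For example, the tokens "int", "x", "=", "a", "+", "b", ";" come out as "int x=a+b;": only "int" and "x" need a separating space.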

module goldie.token

{ForwardRange_of_Token} traverse(pred)(Token start)

Returns a forward range that performs a preorder traversal of the tree beginning at token start. Takes an optional predicate as a template parameter which can be used to selectively exclude any subtree.

The predicate can be either a bool delegate(Token t) or a string, à la std.algorithm. Either way, it takes in a Token (for example, Token t) and returns a bool. If the predicate returns false, then the entire subtree starting with Token t will be skipped.

If you wish to filter out individual tokens rather than (or in addition to) entire subtrees, you can simply pass the result of traverse to std.algorithm.filter.

The range also has a function void skip() which tells the range to skip the entire subtree of the current front the next time popFront is called.
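To make these traversal semantics concrete, here is a minimal sketch of a preorder forward range with a subtree-excluding predicate and a skip() method. It is written against a hypothetical Node class rather than Goldie's Token; Node, Preorder, preorder and keepAll are illustrative names, not GoldieLib API.

```d
import std.range.primitives : isForwardRange;

// Hypothetical stand-in for Token: a name plus child nodes.
class Node
{
    string name;
    Node[] children;
    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }
}

bool keepAll(Node) { return true; }

// Preorder range. When `keep` returns false for a node, that node and its
// whole subtree are omitted. skip() drops the subtree of the current front
// on the next popFront().
struct Preorder(alias keep)
{
    private Node[] stack;   // nodes still to visit; the top is the last element
    private bool skipNext;

    this(Node start)
    {
        if (start !is null && keep(start))
            stack = [start];
    }

    @property bool empty() const { return stack.length == 0; }
    @property Node front() { return stack[$ - 1]; }
    void skip() { skipNext = true; }

    void popFront()
    {
        Node cur = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        if (!skipNext)
        {
            // Push children in reverse so the leftmost child is visited first.
            foreach_reverse (child; cur.children)
                if (keep(child))
                    stack ~= child;
        }
        skipNext = false;
    }

    @property Preorder save()
    {
        auto copy = this;
        copy.stack = stack.dup;   // independent traversal state
        return copy;
    }
}

auto preorder(alias keep = keepAll)(Node start)
{
    return Preorder!keep(start);
}

static assert(isForwardRange!(Preorder!keepAll));
```

Because foreach over a struct range iterates a copy, a range like this one needs an explicit loop such as for (auto r = preorder(root); !r.empty; r.popFront()) for skip() to take effect.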

Example - Visit all descendants of token:

foreach(tok; traverse(token)) writeln("Found: ", tok.name);

Example - Skip all <Declaration> subtrees (these are all equivalent):

alias Token_foo TokFoo; // For convenience

// Dynamic-style
foreach(tok; traverse!`a.name != "<Declaration>"`(token))
    writeln("Found: ", tok.name);

// Dynamic-style
foreach(tok; traverse!( (Token t) { return t.name != "<Declaration>"; } )(token))
    writeln("Found: ", tok.name);

// Static-style
foreach(tok; traverse!( (Token t) { return cast(TokFoo!"<Declaration>")t is null; } )(token))
    writeln("Found: ", tok.name);

// Static-style, using isAnyType from semitwist.util.reflect
foreach(tok; traverse!( !isAnyType!(TokFoo!"<Declaration>") )(token))
    writeln("Found: ", tok.name);

Example - Only traverse <Foo> and <Bar> tokens (these are both equivalent):

alias Token_foo TokFoo; // For convenience

// Dynamic-style
foreach(tok; traverse!`a.name == "<Foo>" || a.name == "<Bar>"`(token))
    writeln("Found: ", tok.name);

// Static-style, using isAnyType from semitwist.util.reflect
foreach(tok; traverse!( isAnyType!(TokFoo!"<Foo>", TokFoo!"<Bar>") )(token))
    writeln("Found: ", tok.name);

Example - Using skip():

alias Token_foo TokFoo; // For convenience

// Skip all <Declaration> subtrees (dynamic-style)
for(auto r = traverse(token); !r.empty; r.popFront())
{
    auto tok = r.front;
    if(tok.name == "<Declaration>")
        r.skip();
    writeln("Found: ", tok.name);
}

// Only traverse <Foo> and <Bar> tokens (static-style)
for(auto r = traverse(token); !r.empty; r.popFront())
{
    auto tok = r.front;
    if( !isAnyType!(TokFoo!"<Foo>", TokFoo!"<Bar>")(tok) )
        r.skip();
    writeln("Found: ", tok.name);
}

module goldie.token

The dynamic-style token interface.

If you are using static-style, then all the static-style token types are derived from this class. Note, however, the static-style versions of get and getRequired actually live directly in this class.

this(Symbol symbol, Token[] sub, Language lang, int ruleId)
Constructor for nonterminals. Normally, only GoldieLib itself needs to instantiate tokens, unless you want to create/modify a parse tree or Token array manually.
this(Symbol symbol, Language lang, string content, string file="{unknown}", ptrdiff_t line=0, ptrdiff_t srcIndexStart=0, ptrdiff_t srcIndexEnd=0, CommentType commentMode=CommentType.None, string debugInfo="")
Constructor for terminals. Normally, only GoldieLib itself needs to instantiate tokens, unless you want to create/modify a parse tree or Token array manually.
readonly @property Language lang
The Language this Token is associated with.
@property size_t length
size_t opDollar()
Token opIndex(size_t index)
Token opIndexAssign(Token tok, size_t index)
Token[] opSlice()
Token[] opSlice(size_t a, size_t b)
Token[] opSliceAssign(Token tok)
Token[] opSliceAssign(Token tok, size_t a, size_t b)
Token[] opSliceAssign(Token[] toks)
Token[] opSliceAssign(Token[] toks, size_t a, size_t b)
int opApply(int delegate(ref Token) dg)
int opApply(int delegate(ref size_t, ref Token) dg)
int opApplyReverse(int delegate(ref Token) dg)
int opApplyReverse(int delegate(ref size_t, ref Token) dg)

Access the child tokens (ie, subtokens) of this token (if this is a nonterminal).

Note that opDollar only works in DMD 2.057 and up.

Example:

Token firstChild = myToken[0];
Token lastChild = myToken[$-1]; // The '$' requires at least DMD 2.057
Token[] firstThree = myToken[0..3];

// Assignments are possible, but NOT recommended if you are
// using static-style, because it can easily mess up the
// static-style sub, get and getRequired.
myToken[1] = anotherToken;

// Iterate
foreach(i, subToken; myToken) {}
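For readers less familiar with D operator overloading, this sketch shows how a class can forward indexing, $, slicing and foreach to a child array, in the manner of the members listed above. Tree is a hypothetical class, and the two-argument opSlice mirrors the signatures documented here rather than newer D idioms.

```d
// Hypothetical node that forwards D's indexing, slicing and foreach
// operators to its child array.
class Tree
{
    string name;
    private Tree[] kids;

    this(string name, Tree[] kids = null)
    {
        this.name = name;
        this.kids = kids;
    }

    @property size_t length() { return kids.length; }
    size_t opDollar() { return kids.length; }                 // enables t[$-1]
    Tree opIndex(size_t i) { return kids[i]; }                // t[i]
    Tree opIndexAssign(Tree t, size_t i) { return kids[i] = t; } // t[i] = x
    Tree[] opSlice() { return kids; }                         // t[]
    Tree[] opSlice(size_t a, size_t b) { return kids[a .. b]; } // t[a..b]

    // foreach (child; t)
    int opApply(scope int delegate(ref Tree) dg)
    {
        foreach (ref k; kids)
            if (auto r = dg(k))
                return r;
        return 0;
    }

    // foreach (i, child; t)
    int opApply(scope int delegate(ref size_t, ref Tree) dg)
    {
        foreach (i, ref k; kids)
        {
            size_t idx = i;
            if (auto r = dg(idx, k))
                return r;
        }
        return 0;
    }
}
```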
Token[] subX

Direct access to the child tokens (ie, subtokens) of this token (if this is a nonterminal).

This will be removed in a later version of Goldie in favor of the operator overloads above.

T get(T)(int index=0)

Static-style function to retrieve the first subtoken matching the given token type T. The type T must be a Token_{languageName}!{symbol} or Token_{languageName}!{rule}.

If there are multiple subtokens matching the desired type, the index parameter can be used to select which one is desired. If the desired subtoken isn't found, null is returned.

Example:

// The type of 'token' is:
//   Token_myLang!("<A>", "<Foo>", "#", "<Bar>", "*", "<Foo>", ";")
// ie, 'token' is a nonterminal matching the rule:
//   <A> ::= <Foo> '#' <Bar> '*' <Foo> ';'
assert( token.get!( Token_myLang!"<Foo>" )()  is token.sub!0 );
assert( token.get!( Token_myLang!"<Bar>" )()  is token.sub!2 );
assert( token.get!( Token_myLang!"<Foo>" )(1) is token.sub!4 );
assert( token.get!( Token_myLang!"<Foo>" )(2) is null ); // Doesn't exist!
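A minimal sketch of how a type-driven lookup like get!T(index) can work: walk the direct children and count those whose runtime type converts to T via cast. Tok, FooTok and BarTok are hypothetical stand-ins for Token and static-style types such as Token_myLang!"<Foo>"; this is not Goldie's source.

```d
// Hypothetical base class standing in for Token.
class Tok
{
    Tok[] sub;
    this() {}
    this(Tok[] sub) { this.sub = sub; }

    // Returns the index-th direct child whose runtime type is (or derives
    // from) T, or null if there are not that many matches.
    T get(T : Tok)(int index = 0)
    {
        foreach (child; sub)
        {
            if (auto hit = cast(T) child)
            {
                if (index == 0)
                    return hit;
                --index;
            }
        }
        return null;
    }
}

// Hypothetical stand-ins for Token_myLang!"<Foo>" and Token_myLang!"<Bar>".
class FooTok : Tok {}
class BarTok : Tok {}
```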
Ts[$-1] get(Ts...)() if(Ts.length > 1)

Static-style pattern matching to find a descendant token. Returns null if the given path doesn't exist. This is like chaining calls to get!T(0) above, but safely checks for null at each step.

The types passed in must be Token_{languageName}!{symbol} or Token_{languageName}!{rule}.

Example:

alias Token_myLang Tok; // For convenience

auto x = token.get!( Tok!"<A>", Tok!"<B>", Tok!"<C>" )(); // Never dereferences null
auto y = token.get!(Tok!"<A>")().get!(Tok!"<B>")().get!(Tok!"<C>")(); // Could dereference null
assert( x is y );
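The null-safe path lookup can be sketched with a recursive variadic template that stops as soon as a step yields null. getPath, firstChild and the classes Tok, A, B and C are hypothetical; only the Ts[$ - 1] return-type shape is taken from the signature above.

```d
class Tok
{
    Tok[] sub;
    this() {}
    this(Tok[] sub) { this.sub = sub; }
}

// Hypothetical node types for a path <A> -> <B> -> <C>.
class A : Tok { this(Tok[] s) { super(s); } }
class B : Tok { this(Tok[] s) { super(s); } }
class C : Tok {}

// First direct child of t whose runtime type is T; null if none or if t is null.
T firstChild(T : Tok)(Tok t)
{
    if (t is null)
        return null;
    foreach (child; t.sub)
        if (auto hit = cast(T) child)
            return hit;
    return null;
}

// Follows the path one level per type. A missing step yields null, and
// firstChild's null check means no later step ever dereferences it.
Ts[$ - 1] getPath(Ts...)(Tok t) if (Ts.length > 0)
{
    auto next = firstChild!(Ts[0])(t);
    static if (Ts.length == 1)
        return next;
    else
        return getPath!(Ts[1 .. $])(next);
}
```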
Token get(string symbolName, int index=0)
Token get(string[] symbolNames)

Dynamic-style versions of get.

The symbol names given must be valid symbol names in this Token's language or an Exception will be thrown.

Example:

// 'token' is a nonterminal matching the rule:
//   <A> ::= <Foo> '#' <Bar> '*' <Foo> ';'
assert( token.matches("<A>", "<Foo>", "#", "<Bar>", "*", "<Foo>", ";") );
assert( token.get("<Foo>")    is token[0] );
assert( token.get("<Bar>")    is token[2] );
assert( token.get("<Foo>", 1) is token[4] );
assert( token.get("<Foo>", 2) is null ); // Doesn't exist!

Token x = token.get( ["<Bar>", "<B>", "<Q>"] ); // If '<Bar> ::= <B>' and '<B> ::= <Q>'
assert( token.get([]) is token ); // Empty array results in the original token
T getRequired(T)(int index=0)
Ts[$-1] getRequired(Ts...)() if(Ts.length > 1)
Token getRequired(string symbolName, int index=0)
Token getRequired(string[] symbolNames)

Just like get, but throws an Exception if the desired subtoken isn't found.

The templated versions of getRequired are static-style. The types passed in to these must be Token_{languageName}!{symbol} or Token_{languageName}!{rule}.
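The dynamic-style get and getRequired can be sketched in terms of symbol names. This hypothetical Tok class compares names only. Unlike the documented functions, it does not validate names against a Language, so an unknown name simply yields null from get, or an exception from getRequired.

```d
import std.exception : enforce;

class Tok
{
    string name;
    Tok[] sub;
    this(string name, Tok[] sub = null) { this.name = name; this.sub = sub; }

    // The index-th direct child named symbolName, or null.
    Tok get(string symbolName, int index = 0)
    {
        foreach (child; sub)
        {
            if (child.name == symbolName)
            {
                if (index == 0)
                    return child;
                --index;
            }
        }
        return null;
    }

    // Follows a path of names, one level per name; an empty path yields this.
    Tok get(string[] symbolNames)
    {
        Tok cur = this;
        foreach (n; symbolNames)
        {
            if (cur is null)
                break;
            cur = cur.get(n);
        }
        return cur;
    }

    // Like get, but throws an Exception instead of returning null.
    Tok getRequired(string symbolName, int index = 0)
    {
        auto t = get(symbolName, index);
        enforce(t !is null, "No subtoken named " ~ symbolName);
        return t;
    }
}
```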

readonly @property int ruleId
If this Token is a nonterminal, then this is the ID of the reduction rule that was used to create the token. This ID is an index into Language.ruleTable.
The SymbolType of this Token. See the explanation of Tokens, Symbols, and Symbol Types for more information.
Symbol symbol
The Symbol of this Token. See the explanation of Tokens, Symbols, and Symbol Types for more information.
readonly @property string typeName
The name of this Token's SymbolType.
readonly @property string name
readonly @property string fullName
This is just typeName ~ "." ~ name. For instance, "NonTerminal.<Foo>"
REMOVED readonly @property int id
This has been removed. Use symbol.id instead.
readonly @property ptrdiff_t line
readonly @property string file

The file and line number of the original source where this Token starts. See Goldie's conventions relating to Line and Column Numbers.

For the line number where this Token ends, or the column number where this Token starts or ends, use srcIndexStart or srcIndexEnd together with Lexer.lineIndicies and Lexer.lineAtIndex.
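Deriving a column from a source index generally means locating the last line start at or before that index. The sketch below assumes a sorted array of line-start offsets beginning with 0; that is an assumption about what Lexer.lineIndicies holds, not a description of it. lineColAt, computeLineStarts and LineCol are hypothetical, and both results here are zero-based, which may differ from Goldie's line and column conventions.

```d
import std.range : assumeSorted;

// Hypothetical result type: zero-based line and column.
struct LineCol { size_t line; size_t col; }

// lineStarts[i] is ASSUMED to be the source index where line i begins,
// sorted ascending, with lineStarts[0] == 0.
LineCol lineColAt(const size_t[] lineStarts, size_t srcIndex)
{
    assert(lineStarts.length > 0 && lineStarts[0] == 0);
    // Binary search: all line starts <= srcIndex; the last one is our line.
    auto before = assumeSorted(lineStarts).lowerBound(srcIndex + 1);
    size_t line = before.length - 1;
    return LineCol(line, srcIndex - lineStarts[line]);
}

// Builds a line-start table for a source string ("\n" line endings only).
size_t[] computeLineStarts(string src)
{
    size_t[] starts = [0];
    foreach (i, ch; src)
        if (ch == '\n')
            starts ~= i + 1;
    return starts;
}
```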

readonly @property ptrdiff_t srcIndexStart
readonly @property ptrdiff_t srcIndexEnd
readonly @property ptrdiff_t srcLength
The locations (zero-indexed) in the original source where this Token starts and ends, and the difference between them.
readonly @property CommentType commentMode
Indicates whether or not this token exists inside a comment and, if so, what type of comment.
readonly @property Token firstLeaf
readonly @property Token lastLeaf
The first and last terminals in this Token. If this Token isn't a nonterminal, then these both just return this.
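firstLeaf and lastLeaf amount to following the first or last child until reaching a node without children. This is a sketch on a hypothetical Node class; the text above does not say how Goldie treats a nonterminal with no children, and in this sketch such a node simply returns itself.

```d
class Node
{
    string name;
    Node[] sub;
    this(string name, Node[] sub = null) { this.name = name; this.sub = sub; }

    // Descends through leftmost children until reaching a childless node.
    @property Node firstLeaf()
    {
        Node n = this;
        while (n.sub.length > 0)
            n = n.sub[0];
        return n;
    }

    // Descends through rightmost children until reaching a childless node.
    @property Node lastLeaf()
    {
        Node n = this;
        while (n.sub.length > 0)
            n = n.sub[$ - 1];
        return n;
    }
}
```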
readonly @property string debugInfo
A place for extra debugging information to be stored.
bool matches(string parentSymbol, string[] subSymbols...)

Determine if the Token matches (ie, was created from) a particular reduction rule.

Example:

// Did this token come from this reduction rule?
//   <Add Exp> ::= <Add Exp> '+' <Mult Exp>
bool checkToken(Token tok)
{
    return tok.matches("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>");
}
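As a rough approximation of matches, a name-based check compares the token's own symbol name and the names of its direct children against the rule. The hypothetical Node class below does only that; Goldie's version is described as testing which reduction rule created the token, which this sketch approximates through names.

```d
class Node
{
    string name;
    Node[] sub;
    this(string name, Node[] sub = null) { this.name = name; this.sub = sub; }

    // True if this node is named parentSymbol and its direct children are
    // named exactly subSymbols, in order.
    bool matches(string parentSymbol, string[] subSymbols...)
    {
        if (name != parentSymbol || sub.length != subSymbols.length)
            return false;
        foreach (i, s; subSymbols)
            if (sub[i].name != s)
                return false;
        return true;
    }
}
```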
string toString()
string toString(TokenToStringMode mode)
string toStringCompact()
string toStringCompactWithSpaces()
string toStringSmart()
string toStringFull()

Converts the Token to a string that resembles the original source. See TokenToStringMode for descriptions of the different modes of conversion.

Note: Depending on the language and the chosen mode of conversion, the result might not be valid code in the original language, or may have subtly changed meaning. Not all modes of conversion are suitable for all purposes or all languages. Depending on the language or purpose, it may be that none of these are appropriate and you'll have to create a string by walking the Token tree manually. These functions are merely provided as a convenience.

semitwist.treeout.TreeNode toTreeNode()

TreeNode is a type from SemiTwist D Tools that provides an easy way to convert a tree to a text format such as JSON or XML.

Example: To convert a Token to JSON:

import semitwist.treeout;

string tokenToJSON(Token tok, bool prettyPrint)
{
    if(prettyPrint)
        return tok.toTreeNode().format(formatterPrettyJSON);
    else
        return tok.toTreeNode().format(formatterTrimmedJSON);
}

Note, however, if you wish to use the resulting JSON in JsonViewer, and get JsonViewer's enhanced source-viewing features, then you'll need to add a few things to the returned TreeNode before formatting it to a string. See the source of the Parse tool for an example.
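The same tree-to-JSON idea can be illustrated without SemiTwist D Tools by using Phobos's std.json instead. This is a swapped-in technique, not toTreeNode: Node and nodeToJSON are hypothetical, and the "name"/"sub" field names are arbitrary.

```d
import std.json : JSONValue;

class Node
{
    string name;
    Node[] sub;
    this(string name, Node[] sub = null) { this.name = name; this.sub = sub; }
}

// Recursively converts a node and its children into a JSON object of the
// form {"name": ..., "sub": [...]}.
JSONValue nodeToJSON(Node n)
{
    JSONValue[] kids;
    foreach (child; n.sub)
        kids ~= nodeToJSON(child);
    JSONValue obj = ["name": JSONValue(n.name)];
    obj["sub"] = JSONValue(kids);
    return obj;
}
```

Calling toString() or toPrettyString() on the result produces the compact or indented text form.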

module {user-specified package}.token

{languageName} = Name of static-style language
{symbol} = [ SymbolType staticSymbolType, ] string staticName=null

This type is for tokens representing a specific Symbol in a static-style language.

This is a templated type. Instantiation example:

// Assume the language is named "calc"

// For a SymbolType.Terminal symbol named "Ident":
// These are the SAME type:
Token_calc!"Ident"
Token_calc!(SymbolType.Terminal, "Ident")

// For a SymbolType.NonTerminal symbol named "<Add Exp>":
// These are the SAME type (but different from the above types):
Token_calc!"<Add Exp>"
Token_calc!(SymbolType.NonTerminal, "<Add Exp>")

// All the above share common base-types: Token_calc and Token.

// This only shares a common base-type of Token
// (since it's from a different language).
Token_anotherCalc!"Ident"

The two-parameter form is needed when two Symbols of different SymbolTypes have the same name.

Attempting to instantiate a Token_{languageName}!{symbol} with a symbol that doesn't exist in the language will result in a compile-time error.
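One way to get the property that the one- and two-argument forms name the SAME type is to make the one-argument form an alias to the two-argument class and to reject unknown symbols with a template constraint. Everything here is hypothetical, not Goldie's generated code: Calc, CalcSym and CalcToken, the local SymbolType enum, the symbol list, and the rule that names starting with '<' are nonterminals.

```d
// Hypothetical local version of SymbolType (only two members).
enum SymbolType { NonTerminal, Terminal }

// Hypothetical symbol table for the "calc" language.
enum string[] calcSymbols = ["Ident", "+", "<Add Exp>", "<Mult Exp>"];

bool isCalcSymbol(string name)
{
    foreach (s; calcSymbols)
        if (s == name)
            return true;
    return false;
}

// Assumption for this sketch: nonterminal names are written "<...>".
SymbolType symbolTypeOf(string name)
{
    return (name.length > 0 && name[0] == '<') ? SymbolType.NonTerminal
                                               : SymbolType.Terminal;
}

// Common base for all token types of this language (like Token_calc).
class CalcToken {}

// One class per (SymbolType, name) pair. The constraint turns an unknown
// symbol into a compile-time error.
class CalcSym(SymbolType type, string name) if (isCalcSymbol(name)) : CalcToken
{
    enum string staticName = name;
}

// The one-argument form is an alias, so both spellings name one type.
template Calc(Args...)
{
    static if (Args.length == 1)
        alias Calc = CalcSym!(symbolTypeOf(Args[0]), Args[0]);
    else
        alias Calc = CalcSym!(Args[0], Args[1]);
}
```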

static enum string StringOf

This is a workaround for DMD Bug #1748.

Evaluates to Token_{languageName}!(SymbolType.{symbolType}, "{symbolName}").

Example:

void showStringOf(Token_foo!"Ident" tok)
{
    // Output: Token_foo!(SymbolType.Terminal, "Ident")
    writeln(typeof(tok).StringOf);
}
static enum string staticName
Compile-time equivalent to Token.name.
this(Language lang, string content, string file="{unknown}", ptrdiff_t line=0, ptrdiff_t srcIndexStart=0, ptrdiff_t srcIndexEnd=0, CommentType commentMode=CommentType.None, string debugInfo="")
Constructor for terminals. This member doesn't exist if {symbol} is a nonterminal.

module {user-specified package}.token

{languageName} = Name of static-style language
{rule} = string staticName, ( int staticRuleId | subTokenTypes... )

Nonterminals have one Token_{languageName}!{rule} for each rule that can create them.

This is a templated type. See Static And Dynamic Styles: Types and Inheritance for an explanation of how it works.

Instantiation example:

// Assume the language is named "calc"

// These three are all the SAME type, and
// are for a nonterminal Token created from
// this reduction rule:
//   <Add Exp> ::= <Add Exp> '+' <Mult Exp>
Token_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>")
Token_calc!("<Add Exp>", "<Add Exp>", Token_calc!(SymbolType.Terminal, "+"), "<Mult Exp>")
Token_calc!("<Add Exp>", ruleIdOf_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>"))

// This is a different type, but shares the common
// base-class of Token_calc!"<Add Exp>" with the above:
Token_calc!("<Add Exp>", "<Mult Exp>")

// This is another different type, but the only base-types this
// one shares with the above are Token_calc and Token (because
// it has a different reduction symbol, ie the first argument).
Token_calc!("<Mult Exp>", "<Negate Exp>")

// The only base-type this shares with any of the above is Token,
// since it's from a different language:
Token_anotherCalc!("<Add Exp>", "<Mult Exp>")

// Use null to refer to a rule that has no sub-tokens, such as in this:
//   <OptionalHello> ::= 'Hello'
//                     |
Token_foo!("<OptionalHello>", "Hello") // First rule
Token_foo!("<OptionalHello>", null)    // Second rule

See also the documentation on Ambiguous Symbols.

Attempting to instantiate a Token_{languageName}!{rule} with a rule that doesn't exist in the language will result in a compile-time error.
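The rule-type hierarchy can be sketched by letting each rule class derive from the class of the symbol it reduces to, so a runtime cast distinguishes rules while the symbol type stays common. BaseToken, CalcToken, CalcSym, CalcRule and whichRule are hypothetical miniatures, and an empty rule would be written here with no sub-symbols rather than with Goldie's null convention.

```d
// Hypothetical miniature of: Token <- Token_calc <- symbol type <- rule type.
class BaseToken {}
class CalcToken : BaseToken {}
class CalcSym(string name) : CalcToken {}

// One class per reduction rule, derived from the class of the symbol the
// rule reduces to, so every <Add Exp> rule type is also a CalcSym!"<Add Exp>".
class CalcRule(string reduces, subSymbols...) : CalcSym!reduces {}

// Reports which of two <Add Exp> rules produced t, decided at runtime.
string whichRule(CalcSym!"<Add Exp>" t)
{
    if (cast(CalcRule!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>")) t)
        return "<Add Exp> ::= <Add Exp> '+' <Mult Exp>";
    if (cast(CalcRule!("<Add Exp>", "<Mult Exp>")) t)
        return "<Add Exp> ::= <Mult Exp>";
    return "unknown rule";
}
```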

this(Token[] sub, Language lang)
Constructor. Normally, only GoldieLib itself needs to instantiate tokens, unless you want to create/modify a parse tree or Token array manually.

Type-safe static-style counterpart to the opIndex and opSlice operator overloads.

Sample usage: myToken.sub!2

Example:

// Assume the language "calc":
//   <Mult Exp>   ::= <Mult Exp> '*' <Negate Exp>
//                  | <Mult Exp> '/' <Negate Exp>
//                  | <Negate Exp>
//   <Negate Exp> ::= '-' <Value>
//                  | <Value>
void foo(Token_calc!("<Mult Exp>", "<Mult Exp>", "*", "<Negate Exp>") tok)
{
    // The first subtoken is known (even at compile-time) to be a <Mult Exp>.
    // The others are also known. These are actually checked at compile-time:
    // If you get them mixed up, you'll get a type-mismatch error when compiling.
    Token_calc!"<Mult Exp>"   multTok     = tok.sub!0;
    Token_calc!"*"            operatorTok = tok.sub!1;
    Token_calc!"<Negate Exp>" negateTok   = tok.sub!2;

    // Determine exact rule used for the <Negate Exp> subtoken:
    // Can't know this at compile-time because it depends on
    // the actual source that was parsed.
    if( cast(Token_calc!("<Negate Exp>", "-", "<Value>")) negateTok )
    {
        writeln("negateTok came from: <Negate Exp> ::= '-' <Value>");
    }
    else if( cast(Token_calc!("<Negate Exp>", "<Value>")) negateTok )
    {
        writeln("negateTok came from: <Negate Exp> ::= <Value>");
    }
    else
        writeln("Forgot to handle some other rule!");
}
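Typed sub!N access can be sketched by carrying the child types as a template tuple and casting the Nth child to SubTypes[N], with a constraint that rejects out-of-range indices at compile time. Node, Num, Plus and Rule are hypothetical; Goldie's actual types are generated per language.

```d
class Node
{
    Node[] children;
    this() {}
    this(Node[] children) { this.children = children; }
}

// Hypothetical child types.
class Num : Node {}
class Plus : Node {}

// Hypothetical rule type whose children have statically known types.
class Rule(SubTypes...) : Node
{
    this(Node[] kids)
    {
        super(kids);
        assert(kids.length == SubTypes.length, "wrong number of children");
        foreach (j, T; SubTypes)   // unrolled at compile time
            assert(cast(T) kids[j] !is null, "child has the wrong type");
    }

    // Compile-time index in, statically typed child out.
    SubTypes[i] sub(size_t i)() if (i < SubTypes.length)
    {
        return cast(SubTypes[i]) children[i];
    }
}
```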
static enum string StringOf

This is a workaround for DMD Bug #1748.

Evaluates to Token_{languageName}!(SymbolType.NonTerminal, "{symbolName}", ...).

Example:

void showStringOf(Token_calc!("<Negate Exp>", "-", "<Value>") tok)
{
    // Output: Token_calc!(SymbolType.NonTerminal, "<Negate Exp>", ...)
    writeln(typeof(tok).StringOf);
}