Parsing System (v0.4)
class Token
This is the main interface for processing parse trees. See the explanation of Tokens vs Symbols.
module goldie.token
enum CommentType
None
Line
Block
module goldie.token
enum TokenToStringMode
Compact
Omits all whitespace, error and comment tokens.
CompactWithSpaces
Like Compact, but adds a space between each token.
default Smart
Like Compact, but adds a space between two
tokens whenever the last character of the first
token and the first character of the second token
are both either alphanumeric or an underscore.
Full
Includes all whitespace, error and comment tokens.
Note: This doesn't currently work after the parse phase, because the parser does not currently preserve whitespace, error and comment tokens.
module goldie.token
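The spacing rule described for the Smart mode above can be illustrated with a small standalone sketch. This is not GoldieLib's implementation: isWordChar and joinSmart are hypothetical helper names, the input is assumed to be a plain array of token texts with whitespace, error and comment tokens already removed, and only ASCII characters are considered.

```d
import std.ascii : isAlphaNum;

/// True if `c` is an ASCII letter, an ASCII digit, or an underscore.
/// (Hypothetical helper, not part of GoldieLib.)
bool isWordChar(char c)
{
    return isAlphaNum(c) || c == '_';
}

/// Concatenates token texts, inserting a single space only where the last
/// character of one token and the first character of the next are both
/// word characters. (Hypothetical helper, not part of GoldieLib.)
string joinSmart(string[] tokens)
{
    string result;
    foreach (tok; tokens)
    {
        if (tok.length == 0)
            continue; // nothing to append, and nothing to compare against

        if (result.length > 0 && isWordChar(result[$ - 1]) && isWordChar(tok[0]))
            result ~= ' ';

        result ~= tok;
    }
    return result;
}
```

Under this rule, "int" followed by "x" stays separated, while "x" followed by "=" is joined directly. As the documentation warns for all modes, output produced this way is not guaranteed to be valid code in every language.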
Constructor for nonterminals. Normally, only GoldieLib itself needs to
instantiate tokens, unless you want to create/modify a parse tree or
Token array manually.
this(Symbol symbol, Language lang, string content, string file="{unknown}", int line=0, int srcIndexStart=0, int srcIndexEnd=0, CommentType commentMode=CommentType.None, string debugInfo="")
Constructor for terminals. Normally, only GoldieLib itself needs to
instantiate tokens, unless you want to create/modify a parse tree or
Token array manually.
The sub-tokens of this Token (if this is a nonterminal).
readonly property int ruleId
If this Token is a nonterminal, then this is the ID of the
reduction rule
that was used to create the token. This ID is
an index into Language.ruleTable.
The SymbolType of this Token.
See the explanation of
Tokens, Symbols, and Symbol Types
for more information.
The Symbol of this Token.
See the explanation of
Tokens, Symbols, and Symbol Types
for more information.
readonly property string typeName
The name of this Token's SymbolType.
readonly property string name
readonly property string fullName
This is just typeName ~ "." ~ name.
readonly property int id
readonly property int line
readonly property string file
The file and line number of the original source where this Token starts. See Goldie's conventions relating to Line and Column Numbers. For the line number where this Token ends, or the column number where this Token starts or ends, use srcIndexStart or srcIndexEnd together with Lexer.lineIndicies and Lexer.lineAtIndex.
readonly property int srcIndexStart
readonly property int srcIndexEnd
readonly property int srcLength
The locations (zero-indexed) in the original source where this
Token starts and ends, and the difference between
them.
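To illustrate how a zero-indexed source position relates to a line and column, here is a standalone sketch. It does not use Lexer.lineIndicies or Lexer.lineAtIndex, whose signatures are not shown here. lineStarts, lineColAt and LineCol are hypothetical names, lines are assumed to be separated by '\n', and the result is zero-based, which may differ from Goldie's own line and column conventions.

```d
/// Zero-based line and column. (Hypothetical type, not part of GoldieLib.)
struct LineCol
{
    size_t line;
    size_t col;
}

/// Returns the zero-based index at which each line of `src` starts.
/// Assumes lines are separated by '\n'.
size_t[] lineStarts(string src)
{
    size_t[] starts = [0];
    foreach (i, char c; src)
    {
        if (c == '\n')
            starts ~= i + 1; // the next line begins right after the newline
    }
    return starts;
}

/// Maps a zero-based source index to a zero-based line and column,
/// given the table produced by lineStarts.
LineCol lineColAt(const size_t[] starts, size_t index)
{
    size_t line = 0;
    // Advance while the next line starts at or before the index.
    while (line + 1 < starts.length && starts[line + 1] <= index)
        ++line;
    return LineCol(line, index - starts[line]);
}
```

Passing a token's start index would give the position where it begins, and passing its end index would give the position where it ends, subject to the assumptions above.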
Indicates whether or not this token exists inside a comment and,
if so, what type of comment.
The first and last terminals in this Token.
If this Token isn't a nonterminal, then these
both just return this.
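The behaviour described above (descend to the first or last terminal, or return the token itself when it is already a terminal) can be sketched on a stand-in tree type. Node, firstLeaf and lastLeaf below are hypothetical names for illustration only and are not GoldieLib's Token class or its members.

```d
/// Hypothetical stand-in for a parse-tree node; not GoldieLib's Token.
class Node
{
    string text;
    Node[] children; // empty for terminals

    this(string text, Node[] children = null)
    {
        this.text = text;
        this.children = children;
    }

    /// Leftmost terminal beneath this node, or this node itself
    /// if it has no children.
    Node firstLeaf()
    {
        return children.length == 0 ? this : children[0].firstLeaf();
    }

    /// Rightmost terminal beneath this node, or this node itself
    /// if it has no children.
    Node lastLeaf()
    {
        return children.length == 0 ? this : children[$ - 1].lastLeaf();
    }
}
```

In this sketch a node without children is treated as its own first and last leaf, which also covers a nonterminal created from an empty rule. Whether GoldieLib handles that case the same way is not stated here.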
readonly property string debugInfo
A place for extra debugging information to be stored.
bool matches(string parentSymbol, string[] subSymbols...)
Determines whether the Token matches (i.e., was created from) a particular reduction rule.
Example:
    // Did this token come from this reduction rule?
    // <Add Exp> ::= <Add Exp> '+' <Mult Exp>
    bool checkToken(Token tok)
    {
        return tok.matches("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>");
    }
string toString()
string toString(TokenToStringMode mode)
string toStringCompact()
string toStringCompactWithSpaces()
string toStringSmart()
string toStringFull()
Converts the Token to a string that resembles the original source. See TokenToStringMode for descriptions of the different modes of conversion.
Note: Depending on the language and the chosen mode of conversion, the result might not be valid code in the original language, or may have subtly changed meaning. Not all modes of conversion are suitable for all purposes or all languages. Depending on the language or purpose, it may be that none of these are appropriate and you'll have to create a string by walking the Token tree manually. These functions are merely provided as a convenience.
semitwist.treeout.TreeNode toTreeNode()
TreeNode is a type from SemiTwist D Tools that provides an easy way to convert a tree to a text format such as JSON or XML.
Example: To convert a Token to JSON:
    import semitwist.treeout;

    string tokenToJSON(Token tok, bool prettyPrint)
    {
        if(prettyPrint)
            return tok.toTreeNode().format(formatterPrettyJSON);
        else
            return tok.toTreeNode().format(formatterTrimmedJSON);
    }
Note, however, that if you wish to use the resulting JSON in JsonViewer and get JsonViewer's enhanced source-viewing features, you'll need to add a few things to the returned TreeNode before formatting it to a string. See the source of the Parse tool for an example.
module {user-specified package}.token
{languageName} = Name of static-style language
This is the common base class for all tokens in a given static-style language.
static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}. For example, if the language is named foo, then this evaluates to Token_foo.
module {user-specified package}.token
{languageName} = Name of static-style language
{symbol} = [ SymbolType staticSymbolType, ] string staticName=null
This type is for tokens representing a specific Symbol in a static-style language.
This is a templated type. Instantiation example:
    // Assume the language is named "calc"

    // For a SymbolType.Terminal symbol named "Ident":
    // These are the SAME type:
    Token_calc!"Ident"
    Token_calc!(SymbolType.Terminal, "Ident")

    // For a SymbolType.NonTerminal symbol named "<Add Exp>"
    // These are the SAME type (but different from the above types):
    Token_calc!"<Add Exp>"
    Token_calc!(SymbolType.NonTerminal, "<Add Exp>")

    // All the above share common base-types: Token_calc and Token.

    // This only shares a common base-type of Token
    // (since it's from a different language).
    Token_anotherCalc!"Ident"
The two-parameter form is needed if there are two Symbols with the same name.
Attempting to instantiate a Token_ with a symbol that doesn't exist in the language will result in a compile-time error.
static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}!(SymbolType.{symbolType}, "{symbolName}").
Example:
    import std.stdio : writeln;

    void showStringOf(Token_foo!"Ident" tok)
    {
        // Output: Token_foo!(SymbolType.Terminal, "Ident")
        writeln(typeof(tok).StringOf);
    }
module {user-specified package}.token
{languageName} = Name of static-style language
{rule} = string staticName, ( int staticRuleId | subTokenTypes... )
Nonterminals have one Token_ for each rule that can create them. See also "Types and Inheritance" (StatVsDyn).
This is a templated type. Instantiation example:
    // Assume the language is named "calc"

    // These three are all the SAME type, and
    // are for a nonterminal Token created from
    // this reduction rule:
    // <Add Exp> ::= <Add Exp> '+' <Mult Exp>
    Token_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>")
    Token_calc!("<Add Exp>", "<Add Exp>", Token_calc!(SymbolType.Terminal, "+"), "<Mult Exp>")
    Token_calc!("<Add Exp>", ruleIdOf_calc!("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>"))

    // This is a different type, but shares the common
    // base-class of Token_calc!"<Add Exp>" with the above:
    Token_calc!("<Add Exp>", "<Mult Exp>")

    // This is another different type, but the only base-types this
    // one shares with the above are Token_calc and Token (because
    // it has a different reduction symbol, ie the first argument).
    Token_calc!("<Mult Exp>", "<Negate Exp>")

    // The only base-type this shares with any of the above is Token,
    // since it's from a different language:
    Token_anotherCalc!("<Add Exp>", "<Mult Exp>")

    // Use null to refer to a rule that has no sub-tokens, such as in this:
    // <OptionalHello> ::= 'Hello'
    //                  |
    Token_foo!("<OptionalHello>", "Hello") // First rule
    Token_foo!("<OptionalHello>", null)    // Second rule
See also the documentation on Ambiguous Symbols.
Attempting to instantiate a Token_ with a rule that doesn't exist in the language will result in a compile-time error.
Constructor. Normally, only GoldieLib itself needs to
instantiate tokens, unless you want to create/modify a parse tree or
Token array manually.
Type-safe static-style counterpart to Token.subX.
Sample usage: myToken.sub!2
Example:
    import std.stdio : writeln;

    // Assume the language "calc":
    // <Mult Exp> ::= <Mult Exp> '*' <Negate Exp>
    //             |  <Mult Exp> '/' <Negate Exp>
    //             |  <Negate Exp>
    // <Negate Exp> ::= '-' <Value>
    //               |  <Value>
    void foo(Token_calc!("<Mult Exp>", "<Mult Exp>", "*", "<Negate Exp>") tok)
    {
        // The third subtoken is known (even at compile-time) to be
        // a <Negate Exp>. The others are also known.
        // These are actually checked at compile-time.
        // If you get them mixed up, you'll get a type-mismatch error when compiling.
        Token_calc!"<Negate Exp>" negateTok = tok.sub!2;
        Token_calc!"<Mult Exp>" multTok = tok.sub!0;
        Token_calc!"*" operatorTok = tok.sub!1;

        // Determine exact type of the <Negate Exp> subtoken:
        // Can't know this at compile-time because it depends on
        // the actual code that was parsed.
        if( cast(Token_calc!("<Negate Exp>", "-", "<Value>")) negateTok )
        {
            writeln("negateTok came from: <Negate Exp> ::= '-' <Value>");
        }
        else if( cast(Token_calc!("<Negate Exp>", "<Value>")) negateTok )
        {
            writeln("negateTok came from: <Negate Exp> ::= <Value>");
        }
        else
            writeln("Forgot to handle some other rule!");
    }
static enum string StringOf
This is a workaround for DMD Bug #1748. Evaluates to Token_{languageName}!(SymbolType.NonTerminal, "{symbolName}", ...).
Example:
    import std.stdio : writeln;

    void showStringOf(Token_calc!("<Negate Exp>", "-", "<Value>") tok)
    {
        // Output: Token_calc!(SymbolType.NonTerminal, "<Negate Exp>", ...)
        writeln(typeof(tok).StringOf);
    }