Parsing System (v0.4)
module goldie.lang
class Language
This is the main interface to GoldieLib. This class corresponds to a loaded CGT file, ie. a compiled grammar. This is used for dynamic-style.
this()
Creates a new blank Language. This should only be used if you're going to manually create a Language, which is an advanced feature. Normally, you would create a Language via Language.loadCGT, StaticLang or possibly one of the Language.compileGrammar* functions.
Loads a CGT compiled grammar file.
static bool compileGrammarGoldCompatibility
Used by the Language.compileGrammar* functions. This is equivalent to the -gold command-line switch in GRMC: Grammar Compiler. Default value is false.
Compiles a grammar definition into a dynamic-style Language. This uses the exact same code as GRMC: Grammar Compiler. The filename from which the grammar definition originated can be provided so any errors during compilation can report the grammar definition's filename. If there's an error in the grammar, instead of an exception being thrown, an error message will be sent to stdout and null will be returned. This will be fixed in a future version of Goldie.
Just like compileGrammar, but loads the grammar definition from a .grm file.
Same as compileGrammar and compileGrammarFile, except it also stores the lexer's NFA and DFA (in Graphviz DOT format) into Language.nfaDot and Language.dfaDot.
void save(string cgtFilename)
Saves this Language to a CGT file.
string filename
The path and name of the CGT file this Language was loaded from (if any).
string name
string ver
string author
string about
bool caseSensitive
Metadata about the language. For more information, see GOLD's documentation for the grammar definition language and CGT files. Note that all of these, including caseSensitive, are informational-only and do not actually affect GoldieLib's behavior.
Symbol[] symbolTable
CharSet[] charSetTable
Rule[] ruleTable
DFAState[] dfaTable
LALRState[] lalrTable
int startSymbolIndex
int initialDFAState
int initialLALRState
int eofSymbolIndex
int errorSymbolIndex
The actual language-defining information in the CGT file. See GOLD's CGT documentation for more information. These are very low-level to the lexing/parsing process and most people will not need to access these directly. In particular, modifying any of these is an advanced feature that should only be done if you really know what you're doing.
string[] uniqueSymbolNames()
Returns an array of all valid symbol names. If multiple Symbols exist with the same name, the name is only included in the array once.
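The deduplication described above can be illustrated with a small standalone sketch. It does not use GoldieLib: DemoSymbol and demoUniqueNames are hypothetical names invented for this example, and sorting the result is an assumption made only to keep the output deterministic, not a statement about the order uniqueSymbolNames returns.

```d
import std.algorithm : map, sort, uniq;
import std.array : array;

// Hypothetical stand-in for a symbol table entry; NOT GoldieLib's Symbol type.
struct DemoSymbol
{
    string name;
    string kind; // stand-in for a SymbolType value
}

// Collects every distinct symbol name exactly once.
// The names are sorted first so that uniq can drop adjacent duplicates.
string[] demoUniqueNames(DemoSymbol[] table)
{
    auto names = table.map!(s => s.name).array;
    names.sort();
    return names.uniq.array;
}

void main()
{
    // Two symbols share the name "Comment" but differ in kind.
    auto table = [
        DemoSymbol("Comment", "terminal"),
        DemoSymbol("Comment", "nonterminal"),
        DemoSymbol("+", "terminal"),
    ];
    assert(demoUniqueNames(table) == ["+", "Comment"]);
}
```

With a real Language, the equivalent information would come from calling uniqueSymbolNames() on the loaded instance.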
Returns the Symbol for end-of-file.
Returns the Symbol for error.
string nfaDot
string dfaDot
The lexer's NFA and DFA in Graphviz DOT format. These are always empty unless the Language was created via Language.compileGrammarDebug or Language.compileGrammarFileDebug. Languages loaded from a CGT file or via StaticLang will never have these filled in.
int nfaNumStates
The number of NFA states created when generating the lexer. The number of DFA and LALR states can always be found with dfaTable.length and lalrTable.length. This is always 0 unless the Language was created via Language.compileGrammarDebug or Language.compileGrammarFileDebug. Languages loaded from a CGT file or via StaticLang will never have this filled in.
Loads a file, lexes and parses it with a new Lexer and a new Parser, and returns the Parser which can then be used to obtain the parsing (and lexing) results or can be reused to parse something else. Throws a ParseException if the source contains an error.
Creates a new Parser and a new Lexer, uses them to lex and parse "source", and returns the Parser which can then be used to obtain the parsing (and lexing) results or can be reused to parse something else. Throws a ParseException if the source contains an error. The filename from which the source originated can be provided so the error messages upon any parsing or lexing errors can report the filename.
Usually just called by the other parse functions. Creates a new Parser, uses it to parse an already-lexed array of Tokens, and returns the Parser which can then be used to obtain the parsing results or can be reused to parse something else. Throws a ParseException if the source contains an error. The filename from which the source originated can be provided so the error messages upon any parsing errors can report the filename. The Lexer that was used can be provided so that the Parser returned can provide access to the lexing results.
Usually just called by the parse functions. Loads a file, lexes it with a new Lexer, and returns the Lexer which can then be used to obtain the lexing results or can be reused to lex something else. Throws a LexException if the source contains an error.
Usually just called by the parse functions. Creates a new Lexer, uses it to lex "source", and returns the Lexer which can then be used to obtain the lexing results or can be reused to lex something else. Throws a LexException if the source contains an error. The filename from which the source originated can be provided so the error messages upon any lexing errors can report the filename.
Returns an array of all Symbols with the name "name". Note that GOLD, and therefore Goldie, allows multiple symbols with the same name as long as each symbol is of a different SymbolType.
Just like symbolsByName except this only returns the SymbolTypes of each symbol, rather than the Symbols themselves.
string symbolTypesStrByName(string name)
Like symbolTypesByName, but returns a human-readable list in string form.
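To make the relationship between the three by-name lookups concrete, here is a standalone sketch over a toy symbol table. It does not call GoldieLib: DemoType, DemoSymbol and the demo* functions are hypothetical stand-ins, and the comma-separated string format is an assumption, since the exact output format of symbolTypesStrByName is not specified here.

```d
import std.algorithm : filter, map;
import std.array : array, join;
import std.conv : to;

// Hypothetical stand-ins; NOT GoldieLib's SymbolType or Symbol.
enum DemoType { terminal, nonTerminal, whitespace }

struct DemoSymbol
{
    string name;
    DemoType type;
}

// All symbols carrying the given name (may be empty, or more than one).
DemoSymbol[] demoSymbolsByName(DemoSymbol[] table, string name)
{
    return table.filter!(s => s.name == name).array;
}

// Only the types of the matching symbols.
DemoType[] demoTypesByName(DemoSymbol[] table, string name)
{
    return demoSymbolsByName(table, name).map!(s => s.type).array;
}

// The same types as one human-readable string (format is an assumption).
string demoTypesStrByName(DemoSymbol[] table, string name)
{
    return demoTypesByName(table, name).map!(t => t.to!string).join(", ");
}

void main()
{
    auto table = [
        DemoSymbol("Comment", DemoType.terminal),
        DemoSymbol("Comment", DemoType.nonTerminal),
        DemoSymbol("+", DemoType.terminal),
    ];
    assert(demoSymbolsByName(table, "Comment").length == 2);
    assert(demoTypesByName(table, "+") == [DemoType.terminal]);
    assert(demoTypesStrByName(table, "Comment") == "terminal, nonTerminal");
    assert(demoSymbolsByName(table, "Missing").length == 0);
}
```

In this model a name is valid when the lookup returns at least one symbol, and ambiguous when it returns more than one.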
int ruleIdOf(string parentSymbol, string[] subSymbols...)
Returns an index into ruleTable given the name of the reduction symbol and the names of the symbols being reduced. For example, if your grammar has a rule like this:
<Add Exp> ::= <Add Exp> '+' <Mult Exp>
Then you can retrieve the corresponding Rule like this:
myLang.ruleTable[ myLang.ruleIdOf("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>") ]
Throws if such a rule doesn't exist or if any of the given symbol names are ambiguous (i.e., if more than one symbol exists with the given name). Note: This is just a quick-and-dirty implementation at the moment. It works, but it might run slowly.
string ruleToString(int ruleId)
Returns the given rule as a human-readable string. Note that this might not actually be valid code for the grammar description language. For example, if your grammar has a rule like this:
<Add Exp> ::= <Add Exp> '+' <Mult Exp>
Then:
auto ruleId = myLang.ruleIdOf("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>");
assert(myLang.ruleToString(ruleId) == "<Add Exp> ::= <Add Exp> + <Mult Exp>");
bool isSymbolNameValid(string name)
Returns true if AT LEAST one Symbol exists with the given name.
bool isSymbolNameAmbiguous(string name)
Returns true if MORE THAN one Symbol exists with the given name. Note that GOLD allows multiple symbols with the same name as long as each symbol is of a different SymbolType.
module {user-specified package}.lang
{languageName} = Name of static-style language
This is the static-style counterpart to Language, generated by the StaticLang tool. If the name of a language is foo (for example), then the name of this class will be Language_foo. All of the Language members are available, but alternate versions are added.
static enum string staticName
static enum string staticVer
static enum string staticAuthor
static enum string staticAbout
static enum bool staticCaseSensitive
static enum Symbol[] staticSymbolTable
static enum CharSet[] staticCharSetTable
static enum Rule[] staticRuleTable
static enum DFAState[] staticDFATable
static enum LALRState[] staticLALRTable
static enum int staticStartSymbolIndex
static enum int staticInitialDFAState
static enum int staticInitialLALRState
static enum int staticEofSymbolIndex
static enum int staticErrorSymbolIndex
static enum string[] staticUniqueSymbolNameArray
static string[] staticUniqueSymbolNames()
static enum Symbol staticEofSymbol
static enum Symbol staticErrorSymbol
static bool staticIsSymbolNameValid(string name)
static bool staticIsSymbolNameAmbiguous(string name)
Compile-time counterparts to the corresponding Language members.
Type-safe static-style counterparts to the respective "X"-suffixed lex and parse functions in Language.
module {user-specified package}.lang
A pre-instantiated instance of a Language_{languageName}, generated by StaticLang and only created for static-style languages. For example, if the name of a language is foo, then the declaration of this will be:
Language_foo language_foo;