API Overview
Import
Importing is simple:
import goldie.all;
Conventions Used By Goldie
Line and Column Numbers
All line numbers and column numbers are internally stored and treated
by the API as zero-indexed and displayed to the user as one-indexed.
When Goldie refers to a "column number", it really means "the number of
characters (i.e., Unicode code points) from the start of the line".
This behavior is more reliable and more useful than a true "column number"
because:
- Tab size is a matter of presentation (for better or worse, depending on one's perspective).
- Non-printing characters may exist in a source to be parsed.
- The increasingly popular concept of elastic tabstops makes proportional
  fonts practical for source code, which renders the traditional concept
  of "column number" meaningless.
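The convention above can be sketched in a few lines of D. This is only an illustration, not Goldie's own code: codePointColumn is a hypothetical helper, and the sample line and byte offset are made up. It counts code points (not bytes or display cells) from the start of the line, keeps the value zero-indexed, and adds one only when displaying it.

```d
import std.stdio : writefln;
import std.utf : count;

/// Hypothetical helper (not part of Goldie): the number of code points
/// between the start of `line` and the UTF-8 byte offset `byteOffset`.
size_t codePointColumn(string line, size_t byteOffset)
{
    // std.utf.count counts code points, not bytes and not display cells.
    return count(line[0 .. byteOffset]);
}

void main()
{
    // A tab, then "naïve" (the 'ï' takes two bytes in UTF-8), then " = 1".
    string line = "\tnaïve = 1";
    size_t byteOffset = 7; // byte offset of the space after "naïve"

    // Zero-indexed: six code points precede that space, however wide
    // the tab is drawn and however many bytes 'ï' occupies.
    size_t col = codePointColumn(line, byteOffset);
    assert(col == 6);

    // One-indexed only when shown to the user.
    writefln("column %s", col + 1);
}
```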
Lexing, Parsing and Semantic Analysis
What many people refer to as "parsing" is really three separate steps: Lexing
(or "Lexical Analysis"), Parsing (or "Grammatical/Syntactical Analysis")
and Semantic Analysis.
- Lexing:
  This separates the source into a series of tokens. For instance,
  int numApples = 10 gets converted into
  "Keyword 'int', Identifier 'numApples', Equals sign, Number 10".
  Goldie does this in the Lexer class by using a
  DFA (deterministic finite automaton).
  Lexers are also sometimes called tokenizers or scanners.
  You can view the result of this step using Parse and JsonViewer.
- Parsing:
  This arranges the lexed tokens into a tree. The structure of the tree is
  based on the rules in the language's grammar.
  Goldie does this in the Parser class by using an
  LALR algorithm.
  You can view the result of this step using Parse and JsonViewer.
- Semantic Analysis:
  This step is generally NOT performed by automatic parsing tools (such as
  Goldie, YACC, Bison, or ANTLR). Users of such tools have to perform this
  step themselves because it's not as easily formalized as lexing or parsing.
  In this step, the parse tree produced by the parsing step
  is analyzed and its actual meaning is interpreted. This often involves extra
  error checking. For instance, in statically-typed languages, type checking
  happens in the semantic analysis phase. This step is also where type-mismatch
  errors and "undefined function/variable" errors are generated.
  This step can, but doesn't have to, involve constructing an AST (Abstract Syntax Tree).
  An HTML or XML DOM is an example of an AST. For another example, see the output
  of GenDocs's /ast flag. See the
  GenDocs source
  for an example of lexing/parsing with Goldie and then constructing an AST
  and performing semantic analysis.
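To make the lexing step concrete, here is a minimal hand-written sketch in D that turns int numApples = 10 into the four tokens described above. It does not use Goldie: the Kind enum, the Tok struct and the lex function are hypothetical names invented for this illustration, and the hand-coded loop merely stands in for the grammar-driven DFA that Goldie's Lexer class uses.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writeln;

// Hypothetical token kinds, for this sketch only.
enum Kind { Keyword, Identifier, Equals, Number }

struct Tok
{
    Kind kind;
    string text;
}

// Splits `src` into tokens. Only handles what the example needs:
// the keyword "int", identifiers, '=', and decimal integers.
Tok[] lex(string src)
{
    Tok[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c)) { ++i; continue; } // whitespace separates tokens

        immutable start = i;
        if (isAlpha(c) || c == '_')
        {
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_'))
                ++i;
            auto text = src[start .. i];
            toks ~= Tok(text == "int" ? Kind.Keyword : Kind.Identifier, text);
        }
        else if (isDigit(c))
        {
            while (i < src.length && isDigit(src[i]))
                ++i;
            toks ~= Tok(Kind.Number, src[start .. i]);
        }
        else if (c == '=')
        {
            ++i;
            toks ~= Tok(Kind.Equals, src[start .. i]);
        }
        else
            throw new Exception("Unexpected character: " ~ c);
    }
    return toks;
}

void main()
{
    auto toks = lex("int numApples = 10");
    assert(toks == [Tok(Kind.Keyword, "int"), Tok(Kind.Identifier, "numApples"),
                    Tok(Kind.Equals, "="), Tok(Kind.Number, "10")]);
    foreach (t; toks)
        writeln(t.kind, " '", t.text, "'");
}
```

A parser would then arrange these tokens into a tree according to the grammar's rules, and semantic analysis would check, for example, that 10 is a valid value for an int.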
Tokens, Symbols, and Symbol Types
A Token can be thought of as an instance of a Symbol.
A Token is part of the parsed source, and a Symbol
is part of the grammar.
For example, consider this grammar:
Word = {Letter}+
<Sentence> ::= <Sentence> Word
             | Word
And this source:
Hello world
These are the Tokens and Symbols:
Text | Kind | Description |
Word | Symbol | This Symbol's type is SymbolType.Terminal. |
<Sentence> | Symbol | This Symbol's type is SymbolType.NonTerminal. |
Hello | Token | This Token's Symbol is Word. |
world | Token | This Token's Symbol is Word. |
Hello world | Token | This Token's Symbol is <Sentence>. |
Note that there are more symbol types
than just Terminal and NonTerminal (the
SymbolType documentation explains this).
So do NOT decide that a Symbol or Token is a SymbolType.NonTerminal
merely because its type is not SymbolType.Terminal. Something that
isn't a SymbolType.Terminal is NOT necessarily a
SymbolType.NonTerminal; compare against SymbolType.NonTerminal directly.
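The pitfall can be shown with a small stand-in enum. The SymbolType below is declared locally for illustration only: its members other than Terminal and NonTerminal are assumptions, so consult the SymbolType documentation for the real list. Both helper functions are hypothetical as well.

```d
// Stand-in for illustration only. Members other than Terminal and
// NonTerminal are assumed; see the SymbolType documentation.
enum SymbolType { NonTerminal, Terminal, Whitespace, EOF, Error }

// Wrong: treats every type other than Terminal as a NonTerminal.
bool isNonTerminalWrong(SymbolType t)
{
    return t != SymbolType.Terminal;
}

// Right: compares against NonTerminal directly.
bool isNonTerminal(SymbolType t)
{
    return t == SymbolType.NonTerminal;
}

void main()
{
    assert(isNonTerminal(SymbolType.NonTerminal));
    assert(!isNonTerminal(SymbolType.Terminal));

    // A third kind of symbol exposes the difference between the checks.
    assert(!isNonTerminal(SymbolType.Whitespace));
    assert(isNonTerminalWrong(SymbolType.Whitespace)); // misclassified
}
```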
Simple Examples
For simple examples of how to use Goldie, see the Included Sample Apps.