Language of your choice: library documentation
For performance reasons, a Token ought to be a structure rather than a class. But if Token is a struct, we have a conundrum: how do we support tokens from different languages? We can't use inheritance since structs do not support it. When EC# is ready, we could use a single struct plus an alias for each language, but of course this structure predates the implementation of EC#.
Luckily, tokens in most languages are very similar. A four-word structure generally suffices:
enum. All enums can be converted to an integer, so Token uses Int32 as the token type. In order to support DSLs via token literals (e.g. LLLPG is a DSL inside EC#), the TypeInt should be based on TokenKind.
Originally I planned to use Symbol as the common token type, because it is extensible and could nicely represent tokens in all languages. Unfortunately, Symbol may reduce parsing performance because it cannot be used with the switch opcode (i.e. the switch statement in C#), so I decided to use integers instead and to introduce the concept of TokenKind, which is derived from Type using TokenKind.KindMask. Each language should have, in the namespace of that language, an extension method
public static TokenType Type(this Token t) that converts the TypeInt to the enum type for that language.
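As a rough illustration of this convention, the sketch below uses a hypothetical stand-in struct, SketchToken, and a made-up CalcTokenType enum for an imaginary calculator language; neither is part of the library. It only shows how an extension method can expose the integer type as a language-specific enum so that a parser can still use switch.

```csharp
using System;

// Hypothetical stand-in for the real Token struct: only the integer type is modeled.
public readonly struct SketchToken
{
    public readonly int TypeInt;
    public SketchToken(int typeInt) { TypeInt = typeInt; }
}

// Made-up token types for an imaginary calculator language.
public enum CalcTokenType { EOF = 0, Number = 1, Plus = 2, Times = 3 }

public static class CalcTokenExt
{
    // Converts the language-neutral integer to this language's enum.
    public static CalcTokenType Type(this SketchToken t) => (CalcTokenType)t.TypeInt;

    // Because the enum is integer-based, the compiler can use a switch here.
    public static string Describe(SketchToken t)
    {
        switch (t.Type())
        {
            case CalcTokenType.Number: return "number";
            case CalcTokenType.Plus:   return "plus";
            case CalcTokenType.Times:  return "times";
            default:                   return "other";
        }
    }
}
```

In a real project the extension class would presumably live in the namespace of the language it serves, so that importing that namespace brings the right Type() method into scope.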
To save space (and because .NET doesn't handle large structures well), tokens do not know what source file they came from and cannot convert their location to a line number. For this reason, one should keep a reference to the ISourceFile and call IIndexToLine.IndexToLine(int) to get the source location.
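To make the idea concrete, here is a minimal sketch of how a character index can be turned into a line and column by whoever holds the source text. SketchLineIndex is a hypothetical helper written for this example; it is not the library's ISourceFile, and it assumes lines end with '\n' only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a character index to a 1-based line and column.
public sealed class SketchLineIndex
{
    // Character index at which each line starts; line 1 always starts at 0.
    private readonly List<int> _lineStarts = new List<int> { 0 };

    public SketchLineIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n')
                _lineStarts.Add(i + 1);
    }

    public (int Line, int Column) IndexToLine(int index)
    {
        int pos = _lineStarts.BinarySearch(index);
        // Not an exact line start: ~pos is the next line, so step back one.
        if (pos < 0) pos = ~pos - 1;
        return (pos + 1, index - _lineStarts[pos] + 1);
    }
}
```

A token then only needs to carry its start index; the line table is built once per file and shared by every token from that file.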
A generic token also cannot convert itself to a properly-formatted string. The ToString method does allow
|Token type.|
|Location in the original source file where the token starts, or -1 for a synthetic token.|
|const int||LengthMask = 0x00FFFFFF|
|const int||StyleMask = unchecked((int)0xFF000000)|
|const int||StyleShift = 24|
|The parsed value of the token.|
|const int||TokenKindShift = 8|
|const int||NumPuncSymbols = ((TokenKind.RBrace - TokenKind.LParen) >> TokenKindShift) + 1|
|static readonly ThreadLocalVariable< Func< Token, string > >||ToStringStrategyTLV = new ThreadLocalVariable<Func<Token,string>>(Loyc.Syntax.Les.TokenExt.ToString)|
|static readonly Symbol||Parens = GSymbol.Get("()")|
|static readonly Symbol||IndentDedent = GSymbol.Get("IndentDedent")|
|static readonly Symbol||LOtherROther = GSymbol.Get("LOtherROther")|
|static readonly Symbol||TokenKindPunctuationSymbols|
|static readonly InternalList< Symbol >||_kindAttrTable = KindAttrTable()|
|Token kind.|
|int ISimpleToken< int >.||StartIndex|
|Length of the token in the source file, or 0 for a synthetic or implied token.|
|8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.|
|Returns Value as TokenTree (null if not a TokenTree).|
|Returns StartIndex + Length.|
|Returns true if Value == WhitespaceTag.Value.|
|static Func< Token, string >||ToStringStrategy|
|Gets or sets the strategy used by ToString.|
|int ISimpleToken< int >.||Type|
|object IHasValue< object >.||Value|
|IListSource< IToken< int > > IToken< int >.|
|Properties inherited from Loyc.Syntax.Lexing.IToken< TT >|
|IListSource< IToken< TT > >||Children|
|Properties inherited from Loyc.Syntax.Lexing.ISimpleToken< TokenType >|
|The category of the token (integer, keyword, etc.) used as the primary value for identifying the token in a parser.|
|Character index where the token starts in the source file.|
|Properties inherited from Loyc.IHasValue< out T >|
|Token (int type, int startIndex, int length, NodeStyle style=0, object value=null)|
|Token (int type, int startIndex, int length, object value)|
|bool||Is (int type, object value)|
|Returns true if the specified type and value match this token.|
|SourceRange||Range (ISourceFile sf)|
|Gets the SourceRange of a token, under the assumption that the token came from the specified source file.|
|SourceRange||Range (ILexer< Token > l)|
|UString||SourceText (ICharSource file)|
|Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.|
|UString||SourceText (ILexer< Token > l)|
|override string||ToString ()|
|Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.|
|override bool||Equals (object obj)|
|bool||Equals (Token other)|
|Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).|
|override int||GetHashCode ()|
|Token||TryGet (int index, out bool fail)|
|IEnumerator< Token >||GetEnumerator ()|
|IRange< Token > IListSource< Token >.||Slice (int start, int count)|
|Slice_< Token >||Slice (int start, int count)|
|IToken< int > IToken< int >.||WithType (int type)|
|Token||WithType (int type)|
|IToken< int > IToken< int >.||WithValue (object value)|
|Token||WithValue (object value)|
|Token||WithRange (int startIndex, int endIndex)|
|Token||WithStartIndex (int startIndex)|
|IToken< int > ICloneable< IToken< int > >.|
|object||ToSourceRange (ISourceFile sourceFile)|
|LNode||ToLNode (ISourceFile file)|
|Converts a Token to a LNode.|
|static bool||IsOpener (TokenKind tt)|
|static bool||IsCloser (TokenKind tt)|
|static bool||IsOpenerOrCloser (TokenKind tt)|
|static Symbol||GetParenPairSymbol (TokenKind k, TokenKind k2)|
Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).
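The following sketch shows one way such position-independent equality could be written. SketchEqToken is a hypothetical stand-in whose field names mirror the documented members; the bodies are an illustration under that assumption, not the library's implementation.

```csharp
using System;

// Hypothetical stand-in struct used only to illustrate the equality rule.
public readonly struct SketchEqToken : IEquatable<SketchEqToken>
{
    public readonly int TypeInt;
    public readonly int StartIndex;
    public readonly int Length;
    public readonly object Value;

    public SketchEqToken(int typeInt, int startIndex, int length, object value)
    {
        TypeInt = typeInt; StartIndex = startIndex; Length = length; Value = value;
    }

    // Position is deliberately ignored: only the type and the value take part.
    public bool Equals(SketchEqToken other) =>
        TypeInt == other.TypeInt && object.Equals(Value, other.Value);

    public override bool Equals(object obj) => obj is SketchEqToken t && Equals(t);

    // Must agree with Equals, so it also ignores StartIndex and Length.
    public override int GetHashCode() =>
        TypeInt ^ (Value == null ? 0 : Value.GetHashCode());
}
```

With this rule, two occurrences of the same identifier at different places in a file compare equal, which is convenient when matching tokens against patterns.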
Returns true if the specified type and value match this token.
Gets the SourceRange of a token, under the assumption that the token came from the specified source file.
Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.
|file||This becomes the LNode.Source property.|
If you really need to store tokens as LNodes, use this. Only the Kind, not the TypeInt, is preserved. Identifiers (where Kind==TokenKind.Id and Value is Symbol) are translated as Id nodes; everything else is translated as a call, using the TokenKind as the LNode.Name and the value, if any, as parameters. For example, if it has been treeified with TokensToTree, the token list for
"Nodes".Substring(1, 3) as parsed by LES might translate to the LNode sequence
String("Nodes"), Dot(@.), Substring, LParen(Number(1), Separator(@,), Number(3)), RParen(). The LNode.Range will match the range of the token.
Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.
This does not return the original source text; it uses a language-specific stringizer (ToStringStrategy).
The returned string, in general, will not match the original token, since the ToStringStrategy does not have access to the original source file.
|readonly int Loyc.Syntax.Lexing.Token.StartIndex|
Location in the original source file where the token starts, or -1 for a synthetic token.
|readonly int Loyc.Syntax.Lexing.Token.TypeInt|
The parsed value of the token.
The value is
For performance reasons, the text of whitespace is not extracted from the source file; Value is WhitespaceTag.Value for whitespace. Value must be assigned for other types such as identifiers and literals.
Since the same identifiers and literals are often used more than once in a given source file, an optimized lexer could use a data structure such as a trie or hashtable to cache boxed literals and identifier symbols, and re-use the same values when the same identifiers and literals are encountered multiple times. Done carefully, this avoids the overhead of repeatedly extracting string objects from the source file. If strings must be extracted for some reason (e.g.
double.TryParse requires an extracted string), at least memory can be saved.
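A minimal sketch of the memory-saving variant described above, assuming the lexer has already extracted the literal's text. SketchLiteralCache is a hypothetical class for this example; it uses a plain Dictionary keyed by string rather than a trie, so it saves boxed objects but not the string extraction itself.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical cache: repeated literals share one boxed value.
public sealed class SketchLiteralCache
{
    private readonly Dictionary<string, object> _boxed = new Dictionary<string, object>();

    // Returns the same boxed double for every occurrence of the same text.
    // Throws FormatException if the text is not a valid number.
    public object GetOrParseDouble(string text)
    {
        if (_boxed.TryGetValue(text, out object cached))
            return cached;
        object boxed = double.Parse(text, CultureInfo.InvariantCulture);
        _boxed[text] = boxed;
        return boxed;
    }
}
```

A lexer that keys the cache on a slice of the source text instead of an extracted string could skip the string allocation as well, which is the more aggressive optimization the paragraph has in mind.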
Returns StartIndex + Length.
Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.MakeIndentToken().
Returns true if Value == WhitespaceTag.Value.
Length of the token in the source file, or 0 for a synthetic or implied token.
8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.
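The LengthMask, StyleMask and StyleShift constants listed in the member table suggest that the 24-bit length and the 8-bit style share a single 32-bit field. The sketch below illustrates that kind of packing under this assumption; SketchStylePacking and its methods are hypothetical, and only the three constant values are taken from this page.

```csharp
using System;

// Hypothetical helper that reuses the documented constant values.
public static class SketchStylePacking
{
    public const int LengthMask = 0x00FFFFFF;
    public const int StyleMask = unchecked((int)0xFF000000);
    public const int StyleShift = 24;

    // Low 24 bits hold the length, high 8 bits hold the style.
    public static int Pack(int length, byte style) =>
        (length & LengthMask) | (style << StyleShift);

    public static int Length(int packed) => packed & LengthMask;

    // Go through uint so the shift does not sign-extend when the top bit is set.
    public static byte Style(int packed) =>
        unchecked((byte)((uint)(packed & StyleMask) >> StyleShift));
}
```

Packing both values into one word is one way to keep the structure at four words, at the cost of limiting a token's length to what fits in 24 bits.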