Enhanced C#
Language of your choice: library documentation
A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.
For performance reasons, a Token ought to be a structure rather than a class. But if Token is a struct, we have a conundrum: how do we support tokens from different languages? We can't use inheritance since structs do not support it. When EC# is ready, we could use a single struct plus an alias for each language, but of course this structure predates the implementation of EC#.
Luckily, tokens in most languages are very similar. A four-word structure generally suffices: TypeInt, StartIndex, a length word (which also holds the Style bits), and Value.
TypeInt: each language typically defines its token types in its own enum. All enums can be converted to an integer, so Token uses Int32 as the token type. In order to support DSLs via token literals (e.g. LLLPG is a DSL inside EC#), the TypeInt should be based on TokenKind. Originally I planned to use Symbol as the common token type, because it is extensible and could nicely represent tokens in all languages; unfortunately, Symbol may reduce parsing performance because it cannot be used with the switch opcode (i.e. the switch statement in C#), so I decided to switch to integers instead and to introduce the concept of TokenKind, which is derived from Type using TokenKind.KindMask. Each language should have, in the namespace of that language, an extension method public static TokenType Type(this Token t) that converts the TypeInt to the enum type for that language.
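As a rough illustration of this scheme, the sketch below uses stand-in types rather than the real Loyc ones. SketchKind, SketchToken, CalcTokenType and CalcTokenExt are hypothetical names, and the mask value 0x7F00 is an assumption made for the example; only the shift of 8 bits mirrors TokenKindShift. It shows a language-specific enum whose values embed a kind in the upper bits, and an extension method that converts the integer type back to that enum so it can be used in a switch statement.

```csharp
using System;

// Hypothetical stand-in for TokenKind: each kind occupies bits 8 and up.
public enum SketchKind
{
    Id       = 1 << 8,
    Literal  = 2 << 8,
    Operator = 3 << 8,
    KindMask = 0x7F00   // assumed mask value, for illustration only
}

// Hypothetical token types for one language: a kind plus a small ordinal.
public enum CalcTokenType
{
    Identifier = (int)SketchKind.Id,
    Number     = (int)SketchKind.Literal,
    String     = (int)SketchKind.Literal + 1,
    Plus       = (int)SketchKind.Operator,
    Minus      = (int)SketchKind.Operator + 1
}

// Hypothetical stand-in for Token, reduced to the type word.
public readonly struct SketchToken
{
    public readonly int TypeInt;
    public SketchToken(int typeInt) { TypeInt = typeInt; }

    // The kind is recovered by masking off the language-specific low bits.
    public SketchKind Kind => (SketchKind)(TypeInt & (int)SketchKind.KindMask);
}

public static class CalcTokenExt
{
    // Per-language extension method: converts the integer to this language's enum.
    public static CalcTokenType Type(this SketchToken t) => (CalcTokenType)t.TypeInt;

    public static string Describe(SketchToken t)
    {
        // Integers and enums compile to an efficient switch, unlike Symbol.
        switch (t.Type())
        {
            case CalcTokenType.Plus:  return "plus";
            case CalcTokenType.Minus: return "minus";
            default:                  return t.Kind.ToString();
        }
    }
}
```

In this sketch a language-neutral component can inspect Kind without knowing CalcTokenType, while the language's own parser switches on Type().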
To save space (and because .NET doesn't handle large structures well), tokens do not know what source file they came from and cannot convert their location to a line number. For this reason, one should keep a reference to the ISourceFile and call IIndexToLine.IndexToLine(int) to get the source location.
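The idea of keeping the index-to-line mapping outside the token can be sketched as follows. This is not the real ISourceFile or IIndexToLine implementation; SketchLineIndex is a hypothetical class that records where each line starts and uses a binary search to turn a character index into a 1-based line and column.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, standing in for an object that maps indexes to lines.
public sealed class SketchLineIndex
{
    // Character index at which each line begins; line 1 starts at index 0.
    private readonly List<int> _lineStarts = new List<int> { 0 };

    public SketchLineIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n')
                _lineStarts.Add(i + 1);
    }

    // Returns a 1-based (line, column) pair for a 0-based character index.
    public (int Line, int Column) IndexToLine(int index)
    {
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index), "synthetic tokens have no location");

        int i = _lineStarts.BinarySearch(index);
        if (i < 0)
            i = ~i - 1;   // not an exact line start: step back to the containing line
        return (i + 1, index - _lineStarts[i] + 1);
    }
}
```

A token then needs to store only its StartIndex; the line table is built once per file and shared by every token in it.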
A generic token also cannot convert itself to a properly formatted string. The ToString method relies on a language-specific strategy instead (see ToStringStrategy).
Public fields | |
readonly int | TypeInt |
Token type. | |
readonly int | StartIndex |
Location in the original source file where the token starts, or -1 for a synthetic token. | |
int | _length |
const int | LengthMask = 0x00FFFFFF |
const int | StyleMask = unchecked((int)0xFF000000) |
const int | StyleShift = 24 |
object | Value |
The parsed value of the token. | |
const int | TokenKindShift = 8 |
const int | NumPuncSymbols = ((TokenKind.RBrace - TokenKind.LParen) >> TokenKindShift) + 1 |
Public static fields | |
static readonly ThreadLocalVariable< Func < Token, string > > | ToStringStrategyTLV = new ThreadLocalVariable<Func<Token,string>>(Loyc.Syntax.Les.TokenExt.ToString) |
static readonly Symbol | Parens = GSymbol.Get("()") |
static readonly Symbol | IndentDedent = GSymbol.Get("IndentDedent") |
static readonly Symbol | LOtherROther = GSymbol.Get("LOtherROther") |
static readonly Symbol[] | TokenKindPunctuationSymbols |
static readonly InternalList < Symbol > | _kindAttrTable = KindAttrTable() |
Properties | |
TokenKind | Kind [get] |
Token kind. | |
int ISimpleToken< int >. | StartIndex [get] |
int | Length [get] |
Length of the token in the source file, or 0 for a synthetic or implied token. | |
NodeStyle | Style [get] |
8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings. | |
TokenTree | Children [get] |
Returns Value as TokenTree (null if not a TokenTree). | |
int | EndIndex [get] |
Returns StartIndex + Length. | |
bool | IsWhitespace [get] |
Returns true if Value == WhitespaceTag.Value. | |
static Func< Token, string > | ToStringStrategy [get, set] |
Gets or sets the strategy used by ToString. | |
Token | this[int index] [get] |
int | Count [get] |
int ISimpleToken< int >. | Type [get] |
object IHasValue< object >. | Value [get] |
IListSource< IToken< int > > IToken< int >. | Children [get] |
Properties inherited from Loyc.Syntax.Lexing.IToken< TT > | |
int | Length [get] |
TokenKind | Kind [get] |
IListSource< IToken< TT > > | Children [get] |
Properties inherited from Loyc.Syntax.Lexing.ISimpleToken< TokenType > | |
TokenType | Type [get] |
The category of the token (integer, keyword, etc.) used as the primary value for identifying the token in a parser. | |
int | StartIndex [get] |
Character index where the token starts in the source file. | |
Properties inherited from Loyc.IHasValue< out T > | |
T | Value [get] |
Public Member Functions | |
Token (int type, int startIndex, int length, NodeStyle style=0, object value=null) | |
Token (int type, int startIndex, int length, object value) | |
bool | Is (int type, object value) |
Returns true if the specified type and value match this token. | |
SourceRange | Range (ISourceFile sf) |
Gets the SourceRange of a token, under the assumption that the token came from the specified source file. | |
SourceRange | Range (ILexer< Token > l) |
UString | SourceText (ICharSource file) |
Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null. | |
UString | SourceText (ILexer< Token > l) |
override string | ToString () |
Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token. | |
override bool | Equals (object obj) |
bool | Equals (Token other) |
Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode). | |
override int | GetHashCode () |
Token | TryGet (int index, out bool fail) |
IEnumerator< Token > | GetEnumerator () |
System.Collections.IEnumerator System.Collections.IEnumerable. | GetEnumerator () |
IRange< Token > IListSource < Token >. | Slice (int start, int count) |
Slice_< Token > | Slice (int start, int count) |
IToken< int > IToken< int >. | WithType (int type) |
Token | WithType (int type) |
IToken< int > IToken< int >. | WithValue (object value) |
Token | WithValue (object value) |
Token | WithRange (int startIndex, int endIndex) |
Token | WithStartIndex (int startIndex) |
IToken< int > ICloneable < IToken< int > >. | Clone () |
object | ToSourceRange (ISourceFile sourceFile) |
LNode | ToLNode (ISourceFile file) |
Converts a Token to a LNode. | |
Static Public Member Functions | |
static bool | IsOpener (TokenKind tt) |
static bool | IsCloser (TokenKind tt) |
static bool | IsOpenerOrCloser (TokenKind tt) |
static Symbol | GetParenPairSymbol (TokenKind k, TokenKind k2) |
bool Loyc.Syntax.Lexing.Token.Equals (Token other) | inline |
Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).
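A minimal sketch of this equality rule, assuming a stand-in struct rather than the real Token: SketchEqToken is a hypothetical type, and the hash function is an arbitrary choice for the example. Two tokens compare equal when their type and value match, regardless of where they appear in the source.

```csharp
using System;

// Hypothetical stand-in for Token, used only to illustrate the equality rule.
public readonly struct SketchEqToken : IEquatable<SketchEqToken>
{
    public readonly int TypeInt, StartIndex, Length;
    public readonly object Value;

    public SketchEqToken(int typeInt, int startIndex, int length, object value)
    {
        TypeInt = typeInt;
        StartIndex = startIndex;
        Length = length;
        Value = value;
    }

    // StartIndex and Length are deliberately ignored.
    public bool Equals(SketchEqToken other) =>
        TypeInt == other.TypeInt && object.Equals(Value, other.Value);

    public override bool Equals(object obj) =>
        obj is SketchEqToken t && Equals(t);

    // Must use only the fields that take part in Equals.
    public override int GetHashCode() =>
        unchecked(TypeInt * 397 ^ (Value?.GetHashCode() ?? 0));
}
```

With this definition, the same identifier appearing at two different positions in a file produces equal tokens, which is convenient when comparing token sequences structurally.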
bool Loyc.Syntax.Lexing.Token.Is (int type, object value) | inline |
Returns true if the specified type and value match this token.
SourceRange Loyc.Syntax.Lexing.Token.Range (ISourceFile sf) | inline |
Gets the SourceRange of a token, under the assumption that the token came from the specified source file.
UString Loyc.Syntax.Lexing.Token.SourceText (ICharSource file) | inline |
Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.
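The behaviour described here can be approximated with plain strings. The sketch below is an assumption-laden stand-in: SketchSourceText is a hypothetical helper, string replaces ICharSource and UString, and null plays the role of UString.Null. Returning null for an out-of-range token is a choice made for this example, not documented behaviour.

```csharp
using System;

// Hypothetical helper: recovers a token's text from the file it came from.
public static class SketchSourceText
{
    public static string SourceText(string source, int startIndex, int length)
    {
        // A negative start index marks a synthetic token, which has no source text.
        if (startIndex < 0)
            return null;

        // Assumption of this sketch: a range outside the file also yields null.
        if (length < 0 || startIndex + length > source.Length)
            return null;

        return source.Substring(startIndex, length);
    }
}
```

The result is only meaningful if the caller passes the same source text that the lexer originally read, which is the assumption stated above.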
LNode Loyc.Syntax.Lexing.Token.ToLNode (ISourceFile file) | inline |
Converts a Token to a LNode.
Parameters
file | This becomes the LNode.Source property. |
If you really need to store tokens as LNodes, use this. Only the Kind, not the TypeInt, is preserved. Identifiers (where Kind==TokenKind.Id and Value is Symbol) are translated as Id nodes; everything else is translated as a call, using the TokenKind as the LNode.Name and the value, if any, as parameters. For example, if it has been treeified with TokensToTree, the token list for "Nodes".Substring(1, 3) as parsed by LES might translate to the LNode sequence String("Nodes"), Dot(@.), Substring, LParen(Number(1), Separator(@,), Number(3)), RParen(). The LNode.Range will match the range of the token.
override string Loyc.Syntax.Lexing.Token.ToString () | inline |
Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.
This does not return the original source text; it uses a language-specific stringizer (ToStringStrategy).
The returned string, in general, will not match the original token, since the ToStringStrategy does not have access to the original source file.
readonly int Loyc.Syntax.Lexing.Token.StartIndex |
Location in the original source file where the token starts, or -1 for a synthetic token.
readonly int Loyc.Syntax.Lexing.Token.TypeInt |
Token type.
object Loyc.Syntax.Lexing.Token.Value |
The parsed value of the token.
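The value depends on the kind of token. For example, an identifier stores its name as a Symbol, and a token that has children stores a TokenTree (see Children). For performance reasons, the text of whitespace is not extracted from the source file; Value is WhitespaceTag.Value for whitespace. Value must be assigned for other types such as identifiers and literals.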
Since the same identifiers and literals are often used more than once in a given source file, an optimized lexer could use a data structure such as a trie or hashtable to cache boxed literals and identifier symbols, and re-use the same values when the same identifiers and literals are encountered multiple times. Done carefully, this avoids the overhead of repeatedly extracting string objects from the source file. If strings must be extracted for some reason (e.g. double.TryParse requires an extracted string), at least memory can be saved.
Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.GetTokenCategory(), and Loyc.Ecs.Parser.TokenExt.ToString().
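The memory-saving variant of this caching idea might look like the sketch below. SketchLiteralCache is a hypothetical class, not part of Loyc, and it keys the cache on an already-extracted string, so it illustrates only the case where strings must be extracted anyway. A lexer that wants to avoid extraction altogether would need a structure, such as a trie, that looks up characters directly in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical cache that hands out one boxed value per distinct literal text.
public sealed class SketchLiteralCache
{
    private readonly Dictionary<string, object> _boxed = new Dictionary<string, object>();

    public object GetNumber(string text)
    {
        // Re-use the existing boxed value if this literal was seen before.
        if (_boxed.TryGetValue(text, out object cached))
            return cached;

        // Parse and box once; later lookups return the same object.
        object value = double.Parse(text, CultureInfo.InvariantCulture);
        _boxed[text] = value;
        return value;
    }

    public int Count => _boxed.Count;
}
```

Because each token's Value field is an object reference, tokens for repeated literals can then share a single heap allocation instead of boxing the number each time.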
int Loyc.Syntax.Lexing.Token.EndIndex | get |
Returns StartIndex + Length.
Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.MakeIndentToken().
bool Loyc.Syntax.Lexing.Token.IsWhitespace | get |
Returns true if Value == WhitespaceTag.Value.
TokenKind Loyc.Syntax.Lexing.Token.Kind | get |
Token kind.
Referenced by Loyc.Ecs.Parser.TokenExt.ToString().
int Loyc.Syntax.Lexing.Token.Length | get |
Length of the token in the source file, or 0 for a synthetic or implied token.
NodeStyle Loyc.Syntax.Lexing.Token.Style | get |
8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.
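The LengthMask, StyleMask and StyleShift constants listed among the public fields suggest that Length and Style share a single 32-bit word. The sketch below shows one way such packing could work; it reuses the constant values from this page, but SketchPacking and its methods are hypothetical, a plain byte stands in for NodeStyle, and the real Token may implement this differently.

```csharp
using System;

// Hypothetical helper showing a 24-bit length and an 8-bit style in one int.
public static class SketchPacking
{
    public const int LengthMask = 0x00FFFFFF;
    public const int StyleMask  = unchecked((int)0xFF000000);
    public const int StyleShift = 24;

    // Low 24 bits hold the length; high 8 bits hold the style.
    public static int Pack(int length, byte style) =>
        (length & LengthMask) | (style << StyleShift);

    public static int Length(int packed) => packed & LengthMask;

    // The shift sign-extends when the top bit is set, so the cast keeps only the low 8 bits.
    public static byte Style(int packed) =>
        unchecked((byte)((packed & StyleMask) >> StyleShift));
}
```

One consequence of this layout is that a length is limited to 24 bits (about 16.7 million characters); in this sketch, larger values are silently truncated by the mask.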