Enhanced C#
Language of your choice: library documentation

Documentation moved to ecsharp.net

Loyc.Syntax.Lexing.Token Struct Reference

A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.


Inheritance diagram for Loyc.Syntax.Lexing.Token (implemented interfaces): Loyc.Syntax.Lexing.IToken<TT>, Loyc.Syntax.Lexing.ISimpleToken<TokenType>, Loyc.ICloneable<out T>, Loyc.IHasValue<out T>

Remarks

A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.

For performance reasons, a Token ought to be a structure rather than a class. But if Token is a struct, we have a conundrum: how do we support tokens from different languages? We can't use inheritance since structs do not support it. When EC# is ready, we could use a single struct plus an alias for each language, but of course this structure predates the implementation of EC#.

Luckily, tokens in most languages are very similar. A four-word structure generally suffices, because Length and Style share a single 32-bit word:

  1. TypeInt: each language can use a different set of token types represented by a different enum. All enums can be converted to an integer, so Token uses Int32 as the token type. In order to support DSLs via token literals (e.g. LLLPG is a DSL inside EC#), the TypeInt should be based on TokenKind.
  2. Value: this can be any object. For literals, this should be the actual value of the literal, for whitespace it should be WhitespaceTag.Value, etc. See Value for the complete list.
  3. StartIndex: location in the original source file where the token starts.
  4. Length: length of the token in the source file (the lower 24 bits of the shared word).
  5. Style: 8 bits for other information (the upper 8 bits of the shared word).
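The following is a minimal C# sketch of that layout, not Loyc's actual source. PackedToken is a hypothetical stand-in; the mask and shift values are the ones listed under Public fields (LengthMask, StyleMask, StyleShift), and the style is modeled as a plain byte rather than a NodeStyle.

```csharp
using System;

// Hypothetical stand-in for Token showing only the field layout:
// TypeInt, StartIndex, Value, and one int packing Length and Style.
public readonly struct PackedToken
{
    // Values copied from the constants listed for Token.
    public const int LengthMask = 0x00FFFFFF;
    public const int StyleMask = unchecked((int)0xFF000000);
    public const int StyleShift = 24;

    public readonly int TypeInt;
    public readonly int StartIndex;
    private readonly int _length;   // lower 24 bits: Length, upper 8 bits: Style
    public readonly object Value;

    public PackedToken(int type, int startIndex, int length, byte style = 0, object value = null)
    {
        // A negative length becomes a huge uint, so this also rejects negatives.
        if ((uint)length > LengthMask)
            throw new ArgumentOutOfRangeException(nameof(length), "Length must fit in 24 bits.");
        TypeInt = type;
        StartIndex = startIndex;
        _length = length | (style << StyleShift);
        Value = value;
    }

    // Lower 24 bits: the token's length in characters.
    public int Length => _length & LengthMask;

    // Upper 8 bits; shifting as uint avoids sign extension when bit 31 is set.
    public byte Style => (byte)((uint)(_length & StyleMask) >> StyleShift);

    public int EndIndex => StartIndex + Length;
}
```

Packing the two values this way keeps the struct to three ints plus one object reference, at the cost of limiting a single token to 16,777,215 characters.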

Originally I planned to use Symbol as the common token type, because it is extensible and could nicely represent tokens in all languages. Unfortunately, Symbol may reduce parsing performance because it cannot be used with the switch opcode (i.e. the switch statement in C#). So I decided to switch to integers instead and to introduce the concept of TokenKind, which is derived from TypeInt using TokenKind.KindMask. Each language should have, in the namespace of that language, an extension method public static TokenType Type(this Token t) that converts the TypeInt to the enum type for that language.
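A minimal sketch of the per-language extension method, assuming a hypothetical language whose token enum is MyLangTokenType and a stand-in LangToken struct. For brevity it ignores TokenKind; a real enum would assign its values based on TokenKind.

```csharp
using System;

// Hypothetical minimal token: only the integer type matters here.
public readonly struct LangToken
{
    public readonly int TypeInt;
    public LangToken(int typeInt) { TypeInt = typeInt; }
}

// Hypothetical token-type enum for one language.
public enum MyLangTokenType { EOF = 0, Id = 1, Number = 2, LParen = 3, RParen = 4 }

public static class MyLangTokenExt
{
    // Reinterpret the shared integer as this language's enum.
    public static MyLangTokenType Type(this LangToken t) => (MyLangTokenType)t.TypeInt;

    // Enum members are compile-time constants, so they can be case labels.
    // Symbol values are objects created at run time, so they cannot.
    public static string Describe(LangToken t)
    {
        switch (t.Type())
        {
            case MyLangTokenType.Id: return "identifier";
            case MyLangTokenType.Number: return "number";
            case MyLangTokenType.LParen:
            case MyLangTokenType.RParen: return "parenthesis";
            default: return "other";
        }
    }
}
```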

To save space (and because .NET doesn't handle large structures well), tokens do not know what source file they came from and cannot convert their location to a line number. For this reason, one should keep a reference to the ISourceFile and call IIndexToLine.IndexToLine(int) to get the source location.
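Because a token stores only StartIndex, line numbers have to come from the source file. The sketch below shows one common way to do such a lookup (a sorted table of line-start offsets plus binary search). LineMap is hypothetical and is not necessarily how ISourceFile or IIndexToLine is implemented; 1-based lines and columns are an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: records where each line starts so that a
// character index can be mapped to a (line, column) pair.
public sealed class LineMap
{
    private readonly List<int> _lineStarts = new List<int> { 0 };

    public LineMap(string text)
    {
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n')
                _lineStarts.Add(i + 1);
    }

    public (int Line, int Column) IndexToLine(int index)
    {
        // Synthetic tokens (StartIndex == -1) have no location.
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index));
        // BinarySearch returns ~(first element > index) when there is no exact match.
        int pos = _lineStarts.BinarySearch(index);
        int line = pos >= 0 ? pos : ~pos - 1;
        return (line + 1, index - _lineStarts[line] + 1);
    }
}
```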

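A generic token also cannot convert itself to a properly-formatted string. The ToString method does allow a language-specific stringizer to be plugged in through the static ToStringStrategy property, but the result generally will not match the original source text.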

Public fields

readonly int TypeInt
 Token type.

readonly int StartIndex
 Location in the original source file where the token starts, or -1 for a synthetic token.

int _length

const int LengthMask = 0x00FFFFFF

const int StyleMask = unchecked((int)0xFF000000)

const int StyleShift = 24

object Value
 The parsed value of the token.

const int TokenKindShift = 8

const int NumPuncSymbols = ((TokenKind.RBrace - TokenKind.LParen) >> TokenKindShift) + 1

Public static fields

static readonly ThreadLocalVariable<Func<Token, string>> ToStringStrategyTLV = new ThreadLocalVariable<Func<Token,string>>(Loyc.Syntax.Les.TokenExt.ToString)

static readonly Symbol Parens = GSymbol.Get("()")

static readonly Symbol IndentDedent = GSymbol.Get("IndentDedent")

static readonly Symbol LOtherROther = GSymbol.Get("LOtherROther")

static readonly Symbol[] TokenKindPunctuationSymbols

static readonly InternalList<Symbol> _kindAttrTable = KindAttrTable()

Properties

TokenKind Kind [get]
 Token kind.

int ISimpleToken<int>.StartIndex [get]

int Length [get]
 Length of the token in the source file, or 0 for a synthetic or implied token.

NodeStyle Style [get]
 8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.

TokenTree Children [get]
 Returns Value as TokenTree (null if not a TokenTree).

int EndIndex [get]
 Returns StartIndex + Length.

bool IsWhitespace [get]
 Returns true if Value == WhitespaceTag.Value.

static Func<Token, string> ToStringStrategy [get, set]
 Gets or sets the strategy used by ToString.

Token this[int index] [get]

int Count [get]

int ISimpleToken<int>.Type [get]

object IHasValue<object>.Value [get]

IListSource<IToken<int>> IToken<int>.Children [get]

- Properties inherited from Loyc.Syntax.Lexing.IToken<TT>
int Length [get]

TokenKind Kind [get]

IListSource<IToken<TT>> Children [get]

- Properties inherited from Loyc.Syntax.Lexing.ISimpleToken<TokenType>
TokenType Type [get]
 The category of the token (integer, keyword, etc.) used as the primary value for identifying the token in a parser.

int StartIndex [get]
 Character index where the token starts in the source file.

- Properties inherited from Loyc.IHasValue<out T>
T Value [get]

Public Member Functions

Token (int type, int startIndex, int length, NodeStyle style = 0, object value = null)

Token (int type, int startIndex, int length, object value)

bool Is (int type, object value)
 Returns true if the specified type and value match this token.

SourceRange Range (ISourceFile sf)
 Gets the SourceRange of a token, under the assumption that the token came from the specified source file.

SourceRange Range (ILexer<Token> l)

UString SourceText (ICharSource file)
 Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.

UString SourceText (ILexer<Token> l)

override string ToString ()
 Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.

override bool Equals (object obj)

bool Equals (Token other)
 Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).

override int GetHashCode ()

Token TryGet (int index, out bool fail)

IEnumerator<Token> GetEnumerator ()

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator ()

IRange<Token> IListSource<Token>.Slice (int start, int count)

Slice_<Token> Slice (int start, int count)

IToken<int> IToken<int>.WithType (int type)

Token WithType (int type)

IToken<int> IToken<int>.WithValue (object value)

Token WithValue (object value)

Token WithRange (int startIndex, int endIndex)

Token WithStartIndex (int startIndex)

IToken<int> ICloneable<IToken<int>>.Clone ()

object ToSourceRange (ISourceFile sourceFile)

LNode ToLNode (ISourceFile file)
 Converts a Token to a LNode.
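The With... methods are listed without documentation. Because TypeInt and StartIndex are readonly fields, a plausible reading (assumed here, not confirmed by this page) is that each method returns a modified copy. The hypothetical ImmToken struct sketches that pattern; treating endIndex in WithRange as exclusive is also an assumption.

```csharp
using System;

// Hypothetical immutable token: every "With" method returns a new copy
// and leaves the original untouched.
public readonly struct ImmToken
{
    public readonly int TypeInt;
    public readonly int StartIndex;
    public readonly int Length;
    public readonly object Value;

    public ImmToken(int type, int startIndex, int length, object value)
    { TypeInt = type; StartIndex = startIndex; Length = length; Value = value; }

    public int EndIndex => StartIndex + Length;

    public ImmToken WithType(int type) => new ImmToken(type, StartIndex, Length, Value);

    public ImmToken WithValue(object value) => new ImmToken(TypeInt, StartIndex, Length, value);

    // Assumed: endIndex is exclusive, so the new length is endIndex - startIndex.
    public ImmToken WithRange(int startIndex, int endIndex) =>
        new ImmToken(TypeInt, startIndex, endIndex - startIndex, Value);

    // Assumed: keep the length and move the start.
    public ImmToken WithStartIndex(int startIndex) =>
        new ImmToken(TypeInt, startIndex, Length, Value);
}
```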
 

Static Public Member Functions

static bool IsOpener (TokenKind tt)
 
static bool IsCloser (TokenKind tt)
 
static bool IsOpenerOrCloser (TokenKind tt)
 
static Symbol GetParenPairSymbol (TokenKind k, TokenKind k2)
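IsOpener, IsCloser and the TokenKindPunctuationSymbols table can be understood through the NumPuncSymbols formula: the six bracket kinds from LParen to RBrace, spaced 1 << TokenKindShift apart, map to the six entries of the table. The sketch below uses made-up TokenKind numbers with that spacing (Loyc's actual values may differ) and strings in place of Symbol; it illustrates the indexing arithmetic and is not the library's implementation.

```csharp
using System;

public static class PuncDemo
{
    public const int TokenKindShift = 8;

    // Hypothetical TokenKind values; only their spacing matters here.
    public const int LParen = 20 << TokenKindShift, RParen = 21 << TokenKindShift,
                     LBrack = 22 << TokenKindShift, RBrack = 23 << TokenKindShift,
                     LBrace = 24 << TokenKindShift, RBrace = 25 << TokenKindShift;

    // Same formula as Token.NumPuncSymbols; evaluates to 6 here.
    public const int NumPuncSymbols = ((RBrace - LParen) >> TokenKindShift) + 1;

    // Same order as TokenKindPunctuationSymbols: opener, closer, opener, ...
    private static readonly string[] Symbols = { "(", ")", "[", "]", "{", "}" };

    // Table index for a bracket kind, or -1 for any other kind.
    private static int Index(int kind)
    {
        if (kind < LParen || kind > RBrace) return -1;
        return (kind - LParen) >> TokenKindShift;
    }

    public static string PunctuationSymbol(int kind)
    {
        int i = Index(kind);
        return i < 0 ? null : Symbols[i];
    }

    // With this table order, openers sit at even indexes and closers at odd ones.
    public static bool IsOpener(int kind) { int i = Index(kind); return i >= 0 && i % 2 == 0; }
    public static bool IsCloser(int kind) { int i = Index(kind); return i >= 0 && i % 2 == 1; }
}
```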
 

Member Function Documentation

bool Loyc.Syntax.Lexing.Token.Equals (Token other) [inline]

Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).

bool Loyc.Syntax.Lexing.Token.Is (int type, object value) [inline]

Returns true if the specified type and value match this token.
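A minimal sketch of the documented equality rule, using a hypothetical EqToken struct: Equals and Is compare TypeInt and Value only, and GetHashCode must likewise ignore the position so that equal tokens hash alike. object.Equals is used so that boxed literals compare by value.

```csharp
using System;

// Hypothetical token: position fields exist but do not affect equality.
public readonly struct EqToken : IEquatable<EqToken>
{
    public readonly int TypeInt;
    public readonly int StartIndex;
    public readonly int Length;
    public readonly object Value;

    public EqToken(int type, int startIndex, int length, object value)
    { TypeInt = type; StartIndex = startIndex; Length = length; Value = value; }

    // Compare type and value; object.Equals handles nulls and boxed values.
    public bool Is(int type, object value) => type == TypeInt && object.Equals(value, Value);

    public bool Equals(EqToken other) => Is(other.TypeInt, other.Value);

    public override bool Equals(object obj) => obj is EqToken t && Equals(t);

    // Hash only the fields that participate in equality.
    public override int GetHashCode() => HashCode.Combine(TypeInt, Value);
}
```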

SourceRange Loyc.Syntax.Lexing.Token.Range (ISourceFile sf) [inline]

Gets the SourceRange of a token, under the assumption that the token came from the specified source file.

UString Loyc.Syntax.Lexing.Token.SourceText (ICharSource file) [inline]

Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.
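The contract of SourceText can be sketched with a plain string standing in for ICharSource and null standing in for UString.Null. SourceTextDemo is hypothetical. Note that nothing in the token can detect a wrong source file, which is why the caller must supply the file the token actually came from.

```csharp
using System;

public static class SourceTextDemo
{
    // Hypothetical equivalent of SourceText(ICharSource) over a string.
    public static string SourceText(string file, int startIndex, int length)
    {
        // Synthetic tokens have StartIndex == -1 and no source text.
        if (startIndex < 0)
            return null;
        // A mismatched file silently yields the wrong text (or throws if out of range).
        return file.Substring(startIndex, length);
    }
}
```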

LNode Loyc.Syntax.Lexing.Token.ToLNode (ISourceFile file) [inline]

Converts a Token to a LNode.

Parameters
 file: This becomes the LNode.Source property.

If you really need to store tokens as LNodes, use this method. Only the Kind, not the TypeInt, is preserved. Identifiers (where Kind == TokenKind.Id and Value is a Symbol) are translated as Id nodes; everything else is translated as a call, using the TokenKind as the LNode.Name and the value, if any, as parameters. The LNode.Range will match the range of the token.

For example, if the token list for "Nodes".Substring(1, 3), as parsed by LES, has been treeified with TokensToTree, it might translate to the LNode sequence String("Nodes"), Dot(@.), Substring, LParen(Number(1), Separator(@,), Number(3)), RParen().
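The translation rule can be sketched as follows. TokKind, TreeToken and ToNodeDemo are hypothetical simplifications: nodes are rendered as text instead of being built as LNode objects, strings stand in for Symbols, and a child list stands in for TokenTree. The @ prefix that LES prints on punctuation symbols is omitted.

```csharp
using System;
using System.Collections.Generic;

public enum TokKind { Id, Number, String, Dot, LParen, RParen, Separator }

// Hypothetical token: a kind, a value, and optional children (like a TokenTree).
public sealed class TreeToken
{
    public readonly TokKind Kind;
    public readonly object Value;
    public readonly List<TreeToken> Children;

    public TreeToken(TokKind kind, object value, List<TreeToken> children = null)
    { Kind = kind; Value = value; Children = children; }
}

public static class ToNodeDemo
{
    public static string ToNode(TreeToken t)
    {
        // Identifiers become plain Id nodes, rendered as their name.
        if (t.Kind == TokKind.Id && t.Value is string name)
            return name;
        // Everything else becomes a call named after the kind. Its arguments are
        // the children of a treeified opener, or else the value, if any.
        var args = new List<string>();
        if (t.Children != null)
            foreach (var child in t.Children)
                args.Add(ToNode(child));
        else if (t.Value != null)
            args.Add(t.Kind == TokKind.String ? "\"" + t.Value + "\"" : t.Value.ToString());
        return t.Kind + "(" + string.Join(", ", args) + ")";
    }
}
```

Applied to tokens for "Nodes".Substring(1, 3), this sketch yields String("Nodes"), Dot(.), Substring, LParen(Number(1), Separator(,), Number(3)), RParen().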

override string Loyc.Syntax.Lexing.Token.ToString () [inline]

Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.

This does not return the original source text; it uses a language-specific stringizer (ToStringStrategy).

The returned string, in general, will not match the original token, since the ToStringStrategy does not have access to the original source file.

Member Data Documentation

readonly int Loyc.Syntax.Lexing.Token.StartIndex

Location in the original source file where the token starts, or -1 for a synthetic token.

static readonly Symbol [] Loyc.Syntax.Lexing.Token.TokenKindPunctuationSymbols

Initial value:
    new Symbol[NumPuncSymbols] {
        (Symbol)"(", (Symbol)")",
        (Symbol)"[", (Symbol)"]",
        (Symbol)"{", (Symbol)"}"
    }

readonly int Loyc.Syntax.Lexing.Token.TypeInt

Token type. Each language maps this integer to its own token-type enum; Kind is derived from it using TokenKind.KindMask.

object Loyc.Syntax.Lexing.Token.Value

The parsed value of the token.

The value is

  • For strings: the parsed value of the string (no quotes, escape sequences removed), i.e. a boxed char or string. A backquoted string is converted to a Symbol because it is a kind of operator.
  • For numbers: the parsed value of the number (e.g. 4 => int, 4L => long, 4.0f => float)
  • For identifiers: the parsed name of the identifier, as a Symbol (e.g. x => x, @for => for, @`1+1` => 1+1)
  • For any keyword including AttrKeyword and TypeKeyword tokens: a Symbol containing the name of the keyword, with "#" prefix
  • For punctuation and operators: the text of the punctuation as a symbol (with '#' in front, if the language conventionally uses this prefix)
  • For openers (open paren, open brace, etc.) after the tokens have been processed by TokensToTree: a TokenTree object.
  • For spaces and comments: WhitespaceTag.Value
  • When no value is needed (because the Type() is enough): null
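A consumer that needs to react to these cases can dispatch on the runtime type of Value. In this sketch WhitespaceSentinel, Sym and Tree are hypothetical stand-ins for WhitespaceTag, Symbol and TokenTree; the whitespace check uses reference equality, mirroring IsWhitespace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for WhitespaceTag, Symbol and TokenTree.
public sealed class WhitespaceSentinel
{
    public static readonly WhitespaceSentinel Value = new WhitespaceSentinel();
    private WhitespaceSentinel() { }
}
public sealed class Sym
{
    public readonly string Name;
    public Sym(string name) { Name = name; }
}
public sealed class Tree : List<object> { }

public static class ValueDemo
{
    // Classifies a token Value following the list above. The sentinel and
    // tree checks come before the general literal case.
    public static string Describe(object value)
    {
        switch (value)
        {
            case null:
                return "no value";
            case object ws when object.ReferenceEquals(ws, WhitespaceSentinel.Value):
                return "whitespace or comment";
            case Tree tree:
                return "token tree with " + tree.Count + " children";
            case Sym sym:
                return "symbol " + sym.Name;
            case string s:
                return "string literal \"" + s + "\"";
            default:
                return "literal of type " + value.GetType().Name;
        }
    }
}
```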

For performance reasons, the text of whitespace is not extracted from the source file; Value is WhitespaceTag.Value for whitespace. Value must be assigned for other types such as identifiers and literals.

Since the same identifiers and literals are often used more than once in a given source file, an optimized lexer could use a data structure such as a trie or hashtable to cache boxed literals and identifier symbols, and re-use the same values when the same identifiers and literals are encountered multiple times. Done carefully, this avoids the overhead of repeatedly extracting string objects from the source file. If strings must be extracted for some reason (e.g. double.TryParse requires an extracted string), at least memory can be saved.
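The hashtable variant can be sketched as a small per-lexer cache. LiteralCache is hypothetical and uses strings in place of Symbols. Because it is keyed by extracted strings, it demonstrates the memory saving rather than avoiding extraction; skipping extraction would require something like a trie walked directly over the source characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lexer-side cache: identical identifier text maps to one shared
// string, and identical integer literals share one boxed object.
public sealed class LiteralCache
{
    private readonly Dictionary<string, string> _names = new Dictionary<string, string>();
    private readonly Dictionary<int, object> _boxedInts = new Dictionary<int, object>();

    public string Identifier(string text)
    {
        if (!_names.TryGetValue(text, out var shared))
            _names[text] = shared = text;   // first occurrence becomes the shared copy
        return shared;
    }

    public object BoxedInt(int value)
    {
        if (!_boxedInts.TryGetValue(value, out var boxed))
            _boxedInts[value] = boxed = value;   // box once, reuse afterwards
        return boxed;
    }
}
```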

Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.GetTokenCategory(), and Loyc.Ecs.Parser.TokenExt.ToString().

Property Documentation

TokenTree Loyc.Syntax.Lexing.Token.Children [get]

Returns Value as TokenTree (null if not a TokenTree).

int Loyc.Syntax.Lexing.Token.EndIndex [get]

Returns StartIndex + Length.

Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.MakeIndentToken().

bool Loyc.Syntax.Lexing.Token.IsWhitespace [get]

Returns true if Value == WhitespaceTag.Value.

TokenKind Loyc.Syntax.Lexing.Token.Kind [get]

Token kind, derived from TypeInt using TokenKind.KindMask.

int Loyc.Syntax.Lexing.Token.Length [get]

Length of the token in the source file, or 0 for a synthetic or implied token.

NodeStyle Loyc.Syntax.Lexing.Token.Style [get]

8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.

Func<Token, string> Loyc.Syntax.Lexing.Token.ToStringStrategy [static, get, set]

Gets or sets the strategy used by ToString.
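Since ToStringStrategy is backed by a thread-local variable (ToStringStrategyTLV, whose initial value is Loyc.Syntax.Les.TokenExt.ToString), each thread can install its own stringizer. The sketch below models this with System.Threading.ThreadLocal<T> in place of Loyc's ThreadLocalVariable<T>; StrategyToken and its "type:value" default format are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical token with a per-thread, replaceable ToString strategy.
public readonly struct StrategyToken
{
    public readonly int TypeInt;
    public readonly object Value;

    public StrategyToken(int type, object value) { TypeInt = type; Value = value; }

    // Hypothetical default format: "type:value".
    private static string DefaultToString(StrategyToken t) => t.TypeInt + ":" + t.Value;

    // One strategy slot per thread, lazily initialized to the default.
    private static readonly ThreadLocal<Func<StrategyToken, string>> _strategy =
        new ThreadLocal<Func<StrategyToken, string>>(() => DefaultToString);

    // Setting the strategy affects only the calling thread.
    public static Func<StrategyToken, string> ToStringStrategy
    {
        get => _strategy.Value;
        set => _strategy.Value = value;
    }

    public override string ToString() => ToStringStrategy(this);
}
```

A thread-local strategy lets, for example, an EC# printer and an LES printer run on different threads without overwriting each other's setting.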