A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.

Source file:

/Core/Loyc.Syntax/Lexing/Token.cs

Remarks

For performance reasons, a Token ought to be a structure rather than a class. But if Token is a struct, we have a conundrum: how do we support tokens from different languages? We can't use inheritance since structs do not support it. When EC# is ready, we could use a single struct plus an alias for each language, but of course this structure predates the implementation of EC#.

Luckily, tokens in most languages are very similar. A four-word structure generally suffices (Length and Style are packed together into a single Int32):

TypeInt: each language can use a different set of token types represented by a different enum. All enums can be converted to an integer, so Token uses Int32 as the token type. In order to support DSLs via token literals (e.g. LLLPG is a DSL inside EC#), the TypeInt should be based on TokenKind.
Value: this can be any object. For literals, this should be the actual value of the literal; for whitespace, it should be WhitespaceTag.Value; and so on. See Value for the complete list.
StartIndex: location in the original source file where the token starts.
Length: length of the token in the source file (24 bits).
Style: 8 bits for other information.
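
The packing of Length and Style into one integer can be sketched as follows. This is only an illustration: PackedLengthStyle is a hypothetical helper, not part of Loyc, and it simply reuses the LengthMask, StyleMask and StyleShift values listed under "Public fields".

```csharp
using System;

// Hypothetical helper (not part of Loyc) showing how a 24-bit length and
// an 8-bit style can share one Int32, using the documented mask constants.
public static class PackedLengthStyle
{
    public const int LengthMask = 0x00FFFFFF;
    public const int StyleMask = unchecked((int)0xFF000000);
    public const int StyleShift = 24;

    // Store the length in the low 24 bits and the style in the high 8 bits.
    public static int Pack(int length, byte style)
    {
        return (length & LengthMask) | (style << StyleShift);
    }

    // Recover the length by masking off the style bits.
    public static int Length(int packed)
    {
        return packed & LengthMask;
    }

    // Recover the style; the final "& 0xFF" discards the sign extension
    // introduced by shifting a negative int to the right.
    public static byte Style(int packed)
    {
        return (byte)(((packed & StyleMask) >> StyleShift) & 0xFF);
    }
}
```

With this layout a token can be at most 0xFFFFFF (16,777,215) characters long, which is the cost of keeping the structure at four words.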

Originally I planned to use Symbol as the common token type, because it is extensible and could nicely represent tokens in all languages. Unfortunately, Symbol may reduce parsing performance because it cannot be used with the switch opcode (i.e. the switch statement in C#), so I switched to integers instead and introduced the concept of TokenKind, which is derived from Type using TokenKind.KindMask. Each language should provide, in its own namespace, an extension method public static TokenType Type(this Token t) that converts the TypeInt to the enum type for that language.
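
A minimal sketch of this arrangement is shown below. All of the types are stand-ins invented for the example (DemoKind, KindDemoToken, MyTokenType, MyTokenExt), and the enum values and the mask value are assumptions, not the values Loyc actually uses. It only illustrates the idea: a language-specific enum whose values embed a kind, a Kind property obtained by masking, and a Type() extension method that makes the integer usable in a switch statement.

```csharp
using System;

// Stand-in for TokenKind; member values and KindMask are assumed for the demo.
public enum DemoKind
{
    Id       = 1 << 8,
    Literal  = 2 << 8,
    KindMask = ~0xFF   // assumption: every bit above the low 8 bits
}

// A language-specific token type: a kind plus a small offset in the low bits.
public enum MyTokenType
{
    Id     = (int)DemoKind.Id,
    Number = (int)DemoKind.Literal,
    String = (int)DemoKind.Literal + 1
}

// Stand-in for Token that stores only the integer type.
public struct KindDemoToken
{
    public readonly int TypeInt;
    public KindDemoToken(int typeInt) { TypeInt = typeInt; }

    // The kind is whatever remains after masking off the language-specific bits.
    public DemoKind Kind
    {
        get { return (DemoKind)(TypeInt & (int)DemoKind.KindMask); }
    }
}

public static class MyTokenExt
{
    // The per-language extension method described above.
    public static MyTokenType Type(this KindDemoToken t)
    {
        return (MyTokenType)t.TypeInt;
    }

    // Enum values (unlike Symbols) can be used as switch cases.
    public static string Describe(KindDemoToken t)
    {
        switch (t.Type())
        {
            case MyTokenType.Id:     return "identifier";
            case MyTokenType.Number: return "number";
            case MyTokenType.String: return "string";
            default:                 return "other";
        }
    }
}
```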

To save space (and because .NET doesn't handle large structures well), tokens do not know what source file they came from and cannot convert their location to a line number. For this reason, one should keep a reference to the ISourceFile and call IIndexToLine.IndexToLine(int) to get the source location.

A generic token also cannot convert itself to a properly-formatted string. The ToString method does allow

Public fields
readonly int	TypeInt
	Token type.

readonly int	StartIndex
	Location in the original source file where the token starts, or -1 for a synthetic token.

int	_length

const int	LengthMask = 0x00FFFFFF

const int	StyleMask = unchecked((int)0xFF000000)

const int	StyleShift = 24

object	Value
	The parsed value of the token.

const int	TokenKindShift = 8

const int	NumPuncSymbols = ((TokenKind.RBrace - TokenKind.LParen) >> TokenKindShift) + 1

Public static fields
static readonly ThreadLocalVariable< Func < Token, string > >	ToStringStrategyTLV = new ThreadLocalVariable<Func<Token,string>>(Loyc.Syntax.Les.TokenExt.ToString)

static readonly Symbol	Parens = GSymbol.Get("()")

static readonly Symbol	IndentDedent = GSymbol.Get("IndentDedent")

static readonly Symbol	LOtherROther = GSymbol.Get("LOtherROther")

static readonly Symbol[]	TokenKindPunctuationSymbols

static readonly InternalList < Symbol >	_kindAttrTable = KindAttrTable()

Properties
TokenKind	Kind `[get]`
	Token kind.

int ISimpleToken< int >.	StartIndex `[get]`

int	Length `[get]`
	Length of the token in the source file, or 0 for a synthetic or implied token.

NodeStyle	Style `[get]`
	8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.

TokenTree	Children `[get]`
	Returns Value as TokenTree (null if not a TokenTree).

int	EndIndex `[get]`
	Returns StartIndex + Length.

bool	IsWhitespace `[get]`
	Returns true if Value == WhitespaceTag.Value.

static Func< Token, string >	ToStringStrategy `[get, set]`
	Gets or sets the strategy used by ToString.

Token	this[int index] `[get]`

int	Count `[get]`

int ISimpleToken< int >.	Type `[get]`

object IHasValue< object >.	Value `[get]`

IListSource< IToken< int > > IToken< int >.	Children `[get]`

Properties inherited from Loyc.Syntax.Lexing.IToken< TT >
int	Length `[get]`

TokenKind	Kind `[get]`

IListSource< IToken< TT > >	Children `[get]`

Properties inherited from Loyc.Syntax.Lexing.ISimpleToken< TokenType >
TokenType	Type `[get]`
	The category of the token (integer, keyword, etc.) used as the primary value for identifying the token in a parser.

int	StartIndex `[get]`
	Character index where the token starts in the source file.

Properties inherited from Loyc.IHasValue< out T >
T	Value `[get]`

Public Member Functions
	Token (int type, int startIndex, int length, NodeStyle style=0, object value=null)

	Token (int type, int startIndex, int length, object value)

bool	Is (int type, object value)
	Returns true if the specified type and value match this token.

SourceRange	Range (ISourceFile sf)
	Gets the SourceRange of a token, under the assumption that the token came from the specified source file.

SourceRange	Range (ILexer< Token > l)

UString	SourceText (ICharSource file)
	Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.

UString	SourceText (ILexer< Token > l)

override string	ToString ()
	Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.

override bool	Equals (object obj)

bool	Equals (Token other)
	Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).

override int	GetHashCode ()

Token	TryGet (int index, out bool fail)

IEnumerator< Token >	GetEnumerator ()

System.Collections.IEnumerator System.Collections.IEnumerable.	GetEnumerator ()

IRange< Token > IListSource < Token >.	Slice (int start, int count)

Slice_< Token >	Slice (int start, int count)

IToken< int > IToken< int >.	WithType (int type)

Token	WithType (int type)

IToken< int > IToken< int >.	WithValue (object value)

Token	WithValue (object value)

Token	WithRange (int startIndex, int endIndex)

Token	WithStartIndex (int startIndex)

IToken< int > ICloneable < IToken< int > >.	Clone ()

object	ToSourceRange (ISourceFile sourceFile)

LNode	ToLNode (ISourceFile file)
	Converts a Token to an LNode.

Static Public Member Functions
static bool	IsOpener (TokenKind tt)

static bool	IsCloser (TokenKind tt)

static bool	IsOpenerOrCloser (TokenKind tt)

static Symbol	GetParenPairSymbol (TokenKind k, TokenKind k2)

Member Function Documentation

bool Loyc.Syntax.Lexing.Token.Equals ( Token other )

inline

Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).
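
To make the rule concrete, here is a small sketch using a stand-in struct (EqDemoToken is hypothetical and is not Loyc's Token). It assumes that Value is compared with object.Equals, which is one plausible reading of the sentence above rather than a statement about the actual implementation.

```csharp
using System;

// Stand-in struct mirroring the documented rule: compare TypeInt and Value,
// ignore StartIndex and Length.
public struct EqDemoToken : IEquatable<EqDemoToken>
{
    public readonly int TypeInt;
    public readonly int StartIndex;
    public readonly int Length;
    public readonly object Value;

    public EqDemoToken(int typeInt, int startIndex, int length, object value)
    {
        TypeInt = typeInt;
        StartIndex = startIndex;
        Length = length;
        Value = value;
    }

    // Position fields are deliberately left out of the comparison.
    public bool Equals(EqDemoToken other)
    {
        return TypeInt == other.TypeInt && object.Equals(Value, other.Value);
    }

    public override bool Equals(object obj)
    {
        return obj is EqDemoToken && Equals((EqDemoToken)obj);
    }

    // The hash code uses the same fields as Equals so the two stay consistent.
    public override int GetHashCode()
    {
        int hash = TypeInt;
        if (Value != null)
            hash ^= Value.GetHashCode();
        return hash;
    }
}
```

Under this rule, two occurrences of the same identifier at different positions in a file compare equal, while tokens with different types or values do not.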

bool Loyc.Syntax.Lexing.Token.Is	(	int	type,
		object	value
	)

inline

Returns true if the specified type and value match this token.

SourceRange Loyc.Syntax.Lexing.Token.Range ( ISourceFile sf )

inline

Gets the SourceRange of a token, under the assumption that the token came from the specified source file.

UString Loyc.Syntax.Lexing.Token.SourceText ( ICharSource file )

inline

Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.

LNode Loyc.Syntax.Lexing.Token.ToLNode ( ISourceFile file )

inline

Converts a Token to an LNode.

Parameters

file	This becomes the LNode.Source property.

If you really need to store tokens as LNodes, use this. Only the Kind, not the TypeInt, is preserved. Identifiers (where Kind==TokenKind.Id and Value is Symbol) are translated as Id nodes; everything else is translated as a call, using the TokenKind as the LNode.Name and the value, if any, as parameters. For example, if it has been treeified with TokensToTree, the token list for "Nodes".Substring(1, 3) as parsed by LES might translate to the LNode sequence String("Nodes"), Dot(@.), Substring, LParen(Number(1), Separator(@,), Number(3)), RParen(). The LNode.Range will match the range of the token.

override string Loyc.Syntax.Lexing.Token.ToString ( )

inline

Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.

This does not return the original source text; it uses a language-specific stringizer (ToStringStrategy).

The returned string, in general, will not match the original token, since the ToStringStrategy does not have access to the original source file.

Member Data Documentation

readonly int Loyc.Syntax.Lexing.Token.StartIndex

Location in the original source file where the token starts, or -1 for a synthetic token.

readonly Symbol [] Loyc.Syntax.Lexing.Token.TokenKindPunctuationSymbols

static

Initial value:

= new Symbol[NumPuncSymbols] {
            (Symbol)"(", (Symbol)")", 
            (Symbol)"[", (Symbol)"]",
            (Symbol)"{", (Symbol)"}"
        }

readonly int Loyc.Syntax.Lexing.Token.TypeInt

Token type.

object Loyc.Syntax.Lexing.Token.Value

The parsed value of the token.

The value is:

For strings: the parsed value of the string (no quotes, escape sequences removed), i.e. a boxed char or string. A backquoted string is converted to a Symbol because it is a kind of operator.
For numbers: the parsed value of the number (e.g. 4 => int, 4L => long, 4.0f => float)
For identifiers: the parsed name of the identifier, as a Symbol (e.g. x => x, @for => for, @`1+1` => 1+1)
For any keyword including AttrKeyword and TypeKeyword tokens: a Symbol containing the name of the keyword, with "#" prefix
For punctuation and operators: the text of the punctuation as a symbol (with '#' in front, if the language conventionally uses this prefix)
For openers (open paren, open brace, etc.) after the tokens have been processed by TokensToTree: a TokenTree object.
For spaces and comments: WhitespaceTag.Value
When no value is needed (because the Type() is enough): null

For performance reasons, the text of whitespace is not extracted from the source file; Value is WhitespaceTag.Value for whitespace. Value must be assigned for other types such as identifiers and literals.

Since the same identifiers and literals are often used more than once in a given source file, an optimized lexer could use a data structure such as a trie or hashtable to cache boxed literals and identifier symbols, and re-use the same values when the same identifiers and literals are encountered multiple times. Done carefully, this avoids the overhead of repeatedly extracting string objects from the source file. If strings must be extracted for some reason (e.g. double.TryParse requires an extracted string), at least memory can be saved.
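
As a rough illustration of the memory-saving case in the last sentence, the sketch below caches boxed integers by their literal text. BoxedLiteralCache is a hypothetical class, not part of Loyc, and it still extracts a string for every lookup, so it saves memory only; a lexer that wanted to avoid the extraction as well would need a structure keyed directly on the source characters, such as the trie mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical cache: maps literal text to a boxed value so that repeated
// literals in one source file share a single heap object.
public sealed class BoxedLiteralCache
{
    private readonly Dictionary<string, object> _cache =
        new Dictionary<string, object>();

    // Returns the boxed Int32 for the given literal text, parsing and
    // boxing it only the first time the text is seen.
    public object GetInt32(string text)
    {
        object boxed;
        if (!_cache.TryGetValue(text, out boxed))
        {
            boxed = int.Parse(text, CultureInfo.InvariantCulture);
            _cache[text] = boxed;
        }
        return boxed;
    }

    // Number of distinct literals seen so far.
    public int Count
    {
        get { return _cache.Count; }
    }
}
```

A lexer could then assign the returned object to the Value of each token, so that every occurrence of the same literal refers to one boxed instance.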

Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.GetTokenCategory(), and Loyc.Ecs.Parser.TokenExt.ToString().

Property Documentation

TokenTree Loyc.Syntax.Lexing.Token.Children

get

Returns Value as TokenTree (null if not a TokenTree).

int Loyc.Syntax.Lexing.Token.EndIndex

get

Returns StartIndex + Length.

Referenced by Loyc.Syntax.Les.LesIndentTokenGenerator.MakeIndentToken().

bool Loyc.Syntax.Lexing.Token.IsWhitespace

get

Returns true if Value == WhitespaceTag.Value.

TokenKind Loyc.Syntax.Lexing.Token.Kind

get

Token kind.

Referenced by Loyc.Ecs.Parser.TokenExt.ToString().

int Loyc.Syntax.Lexing.Token.Length

get

Length of the token in the source file, or 0 for a synthetic or implied token.

NodeStyle Loyc.Syntax.Lexing.Token.Style

get

8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.

Func<Token, string> Loyc.Syntax.Lexing.Token.ToStringStrategy

static get set

Gets or sets the strategy used by ToString.

Documentation moved to ecsharp.net
