Enhanced C#
Language of your choice: library documentation

Documentation moved to ecsharp.net

Package Loyc.Syntax.Lexing

Contains classes related to lexical analysis, such as the universal token type (Loyc.Syntax.Lexing.Token) and Loyc.Syntax.Lexing.TokensToTree.

Classes

class  BaseILexer< CharSrc, Token >
 A version of BaseLexer{CharSrc} that implements ILexer{Token}. You should use this base class if you want to wrap your lexer in a postprocessor such as IndentTokenGenerator or TokensToTree.
 
class  BaseLexer
 Alias for BaseLexer{C} where C is ICharSource.
 
class  BaseLexer< CharSrc >
 The recommended base class for lexers generated by LLLPG, when not using the inputSource option.
 
interface  ILexer< Token >
 A standard interface for lexers.
 
interface  ILllpgApi< Token, MatchType, LaType >
 For reference purposes, this interface is a list of the non-static methods that LLLPG expects to be able to call when it is generating code. LLLPG does not actually need lexers and parsers to implement this interface; they simply need to implement the same set of methods as this interface contains.
 
interface  ILllpgLexerApi< Token >
 For reference purposes, this interface contains the non-static methods that LLLPG expects lexers to implement. LLLPG does not actually expect lexers to implement this interface; they simply need to implement the same set of methods as this interface contains.
 
class  IndentTokenGenerator
 A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.
 
class  IndentTokenGenerator< Token >
 A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.
 
interface  ISimpleToken
 Alias for ISimpleToken{int}.
 
interface  ISimpleToken< TokenType >
 Basic information about a token as expected by BaseParser{Token}: a token Type, which is the type of a "word" in the program (string, identifier, plus sign, etc.), a value (e.g. the name of an identifier), and an index where the token starts in the source file.
 
interface  IToken< TT >
 The methods of Token in the form of an interface.
 
class  LexerSource
 A synonym for LexerSource{C} where C is ICharSource.
 
class  LexerSource< CharSrc >
 An implementation of the LLLPG Lexer API, used with the LLLPG options inputSource and inputClass.
 
class  LexerSourceFile< CharSource >
 Adds the AfterNewline method to SourceFile.
 
class  LexerSourceWorkaround< CharSrc >
 This class only exists to work around a limitation of the C# language: "cannot change access modifiers when overriding 'protected' inherited member Error(...)".
 
class  LexerWrapper< Token >
 A base class for wrappers that modify lexer behavior. Implements the ILexer interface, except for the NextToken() method.
 
struct  Token
 A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.
 
class  TokenListAsLexer
 Adapter: converts IEnumerable{Token} to the ILexer{Token} interface.
 
class  TokensToTree
 A preprocessor usually inserted between the lexer and parser that converts a token list into a token tree. Everything inside brackets, parens or braces is made a child of the open bracket.
 
class  TokenTree
 A list of Token structures along with the ISourceFile object that represents the source file that the tokens came from.
 
class  WhitespaceFilter
 Alias for WhitespaceFilter{Token}.
 
class  WhitespaceFilter< Token >
 Filters out tokens whose Value is WhitespaceTag.Value.
 
class  WhitespaceTag
 WhitespaceTag.Value can be used as the Token.Value of whitespace tokens, to make whitespace easy to filter out.
 

Enumerations

enum  TokenKind {
  Spaces = 0x0000, Comment = 0x0100, Id = 0x0200,
  Literal = 0x0300, Dot = 0x0600, Assignment = 0x0700,
  Operator = 0x0800, Separator = 0x0900, AttrKeyword = 0x0A00,
  TypeKeyword = 0x0B00, OtherKeyword = 0x0C00, Other = 0x0F00,
  LParen = 0x1000, RParen = 0x1100, LBrack = 0x1200,
  RBrack = 0x1300, LBrace = 0x1400, RBrace = 0x1500,
  Indent = 0x1600, Dedent = 0x1700, LOther = 0x1800,
  ROther = 0x1900, KindMask = 0x1F00
}
 A list of token categories that most programming languages have.
 

Enumeration Type Documentation

enum TokenKind

A list of token categories that most programming languages have.

Some Loyc languages will support the concept of a "token literal", which is a TokenTree, and some DSLs will rely on these token literals for input. However, tokens differ between languages; for instance, the set of operators varies from one language to another. On the other hand, most languages do have some concept of "an operator" and "an identifier", and TokenKind reflects this fact.

When you are using Token to represent tokens in your language, it is recommended to define every value of your "TokenType" enumeration in terms of TokenKind using integer offsets, like this:

enum MyTokenType {
    // C# has no implicit conversion between enum types, hence the (int) casts.
    EOF         = (int)TokenKind.Spaces,
    Id          = (int)TokenKind.Id,
    IfKeyword   = (int)TokenKind.OtherKeyword,
    ForKeyword  = (int)TokenKind.OtherKeyword + 1,
    LoopKeyword = (int)TokenKind.OtherKeyword + 2,
    ...
    MulOp   = (int)TokenKind.Operator,
    AddOp   = (int)TokenKind.Operator + 1,
    DivOp   = (int)TokenKind.Operator + 2,
    DotOp   = (int)TokenKind.Dot,
    ...
}

Using TokenKind is only important if you intend to support DSLs via token literals (e.g. LLLPG) in your language.
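
To illustrate why the offset scheme is useful, here is a minimal sketch of how a token's category could be recovered from a language-specific token type. It assumes that KindMask is meant to be applied with a bitwise AND, so that the low bits (the per-language offset) are discarded. The TokenKind enum below is a local copy of a few values from the enumeration listed on this page, so that the sketch compiles without the Loyc libraries; the names KindMaskSketch, TokenKindHelper and KindOf are hypothetical and not part of the library.

```csharp
namespace KindMaskSketch
{
    // Local copy of a few TokenKind values (taken from the list on this page).
    public enum TokenKind
    {
        Spaces       = 0x0000,
        Id           = 0x0200,
        Dot          = 0x0600,
        Operator     = 0x0800,
        OtherKeyword = 0x0C00,
        KindMask     = 0x1F00
    }

    // A token type enumeration defined as offsets from TokenKind values.
    public enum MyTokenType
    {
        EOF         = (int)TokenKind.Spaces,
        Id          = (int)TokenKind.Id,
        IfKeyword   = (int)TokenKind.OtherKeyword,
        ForKeyword  = (int)TokenKind.OtherKeyword + 1,
        LoopKeyword = (int)TokenKind.OtherKeyword + 2,
        MulOp       = (int)TokenKind.Operator,
        AddOp       = (int)TokenKind.Operator + 1,
        DivOp       = (int)TokenKind.Operator + 2,
        DotOp       = (int)TokenKind.Dot
    }

    public static class TokenKindHelper
    {
        // Hypothetical helper (an assumption, not a Loyc API): keeps only the
        // bits selected by KindMask, which drops the per-language offset.
        public static TokenKind KindOf(MyTokenType type)
        {
            return (TokenKind)((int)type & (int)TokenKind.KindMask);
        }
    }
}
```

With these definitions, KindOf(MyTokenType.LoopKeyword) yields TokenKind.OtherKeyword and KindOf(MyTokenType.DivOp) yields TokenKind.Operator, so code that only understands TokenKind can still classify tokens from any language that follows the scheme.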

A DSL that just needs simple tokens like "strings", "identifiers" and "dots" can write a parser based on values of Token.Kind alone. If it needs specific operators or "keywords" that do not have a dedicated TokenKind, such as + and %, it can additionally check the Value of the token. For this to work, the host language should put a global Symbol in Token.Value to represent operators, keywords and identifiers.
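
The two levels of checking described above might look roughly like the following sketch. It does not use the real Token struct: SimpleToken is a hypothetical stand-in with only a Kind and a Value, a plain string plays the role of the Symbol, and the names KindBasedParsingSketch, DslChecks, IsDottedName and IsOperator are invented for illustration.

```csharp
using System.Collections.Generic;

namespace KindBasedParsingSketch
{
    // Local copy of a few TokenKind values (taken from the list on this page).
    public enum TokenKind { Id = 0x0200, Literal = 0x0300, Dot = 0x0600, Operator = 0x0800 }

    // Hypothetical stand-in for a token: a category plus a value.
    public struct SimpleToken
    {
        public TokenKind Kind;
        public object Value;
        public SimpleToken(TokenKind kind, object value) { Kind = kind; Value = value; }
    }

    public static class DslChecks
    {
        // Uses Kind alone: accepts the sequence Id (Dot Id)*, e.g. "a.b.c".
        public static bool IsDottedName(IReadOnlyList<SimpleToken> tokens)
        {
            // A valid sequence has an odd, nonzero number of tokens.
            if (tokens.Count == 0 || tokens.Count % 2 == 0)
                return false;
            for (int i = 0; i < tokens.Count; i++)
            {
                TokenKind expected = (i % 2 == 0) ? TokenKind.Id : TokenKind.Dot;
                if (tokens[i].Kind != expected)
                    return false;
            }
            return true;
        }

        // Uses Kind plus Value: recognizes one specific operator such as "+".
        public static bool IsOperator(SimpleToken token, string name)
        {
            return token.Kind == TokenKind.Operator && name.Equals(token.Value);
        }
    }
}
```

IsDottedName never looks at Value, so it would work for any host language that classifies its tokens with TokenKind. IsOperator depends on the host language storing a recognizable value in the token, which is why the convention for Token.Value matters.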

Enumerator
Spaces 

Spaces, tabs, non-semantic newlines, and EOF

Spaces and comments are typically filtered out before parsing and will not appear in token literals.

Comment 

Single- and multi-line comments

Spaces and comments are typically filtered out before parsing and will not appear in token literals.

Id 

Simple identifiers

Literal 

Literals, such as numbers and strings.

Dot 

Scope operator (dot and dot-like ops such as :: in C++)

Assignment 

Simple or compound assignment

Operator 

All operators except assignment, dot, or separators

Separator 

e.g. semicolon, comma (if not considered an operator)

AttrKeyword 

e.g. public, private, static, virtual

TypeKeyword 

e.g. int, bool, double, void

OtherKeyword 

e.g. sizeof, struct

Other 

For token types not covered by other token kinds.
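
Since Spaces and Comment tokens are typically filtered out before parsing, a filter based on token kinds could be sketched as follows. This is not the library's WhitespaceFilter (which, according to the class list, filters on WhitespaceTag.Value); it is an illustrative alternative that works on raw integer token types and assumes that KindMask is applied with a bitwise AND. The names KindFilterSketch, KindFilter, KindOf and WithoutTrivia are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

namespace KindFilterSketch
{
    // Local copy of a few TokenKind values (taken from the list on this page).
    public enum TokenKind
    {
        Spaces   = 0x0000,
        Comment  = 0x0100,
        Id       = 0x0200,
        Literal  = 0x0300,
        KindMask = 0x1F00
    }

    public static class KindFilter
    {
        // Assumption: the category is obtained by masking with KindMask.
        public static TokenKind KindOf(int tokenType)
        {
            return (TokenKind)(tokenType & (int)TokenKind.KindMask);
        }

        // Returns the token types that are neither Spaces nor Comment,
        // preserving their original order.
        public static List<int> WithoutTrivia(IEnumerable<int> tokenTypes)
        {
            return tokenTypes
                .Where(t => KindOf(t) != TokenKind.Spaces && KindOf(t) != TokenKind.Comment)
                .ToList();
        }
    }
}
```

Because the check is made on the masked kind, language-specific variants such as a "newline" type defined as Spaces + 1 or a "multi-line comment" type defined as Comment + 1 are removed as well.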