Enhanced C#
Language of your choice: library documentation

Documentation moved to ecsharp.net

Loyc.Syntax.Lexing.IndentTokenGenerator Class Reference

A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.

Remarks

Suppose you use an IndentToken and DedentToken that are equal to the token types you've chosen for { braces } (e.g. TokenKind.LBrace and TokenKind.RBrace), the only indent trigger is a colon (:), and you set EolToken to the token type you're using for semicolons. Then the token stream from input such as

def Sqrt(value):
  if value == 0: return 0
  g = 0; bshft = Log2Floor(value) >> 1;
  b = 1 << bshft
  do:
    temp = (g + g + b) << bshft
    if value >= temp: g += b
      value -= temp
    b >>= 1
  while (bshft-- > 0)
  return g

will be converted to a token stream equivalent to

def Sqrt(value): {
  if value == 0: { return 0;
  } g = 0; bshft = Log2Floor(value) >> 1;
  b = 1 << bshft;
  do: {
    temp = (g + g + b) << bshft;
    if value >= temp: { g += b;
      value -= temp;
    } b >>= 1;
  } while (bshft-- > 0);
  return g;
}

That is, a semicolon is added to lines that don't already have one, open braces are inserted right after colons, and semicolons are not added right after opening braces.
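The rules above can be sketched in a few lines of Python. This is a simplified, line-based model of the behavior described here, not the library's implementation: the function name add_indent_tokens, its parameters, and the crude regex tokenizer (which splits every punctuation character into its own token) are assumptions made for illustration only.

```python
import re

def add_indent_tokens(source, trigger=":", indent="{", dedent="}", eol=";"):
    """Hypothetical model: insert indent, dedent and end-of-line tokens."""
    out = []
    open_blocks = []  # indentation of the line on which each open block started
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines produce no tokens
        width = len(line) - len(line.lstrip(" "))
        # End the previous line, unless it already ends with an EOL token
        # or has just opened a block.
        if out and out[-1] not in (eol, indent):
            out.append(eol)
        # Close every block opened on a line indented at least this much.
        while open_blocks and open_blocks[-1] >= width:
            open_blocks.pop()
            out.append(dedent)
        # Crude tokenizer: words, or single punctuation characters.
        for tok in re.findall(r"\w+|[^\w\s]", line):
            out.append(tok)
            if tok == trigger:
                out.append(indent)
                open_blocks.append(width)
    # At end of input, end the last line and close all remaining blocks.
    if out and out[-1] not in (eol, indent):
        out.append(eol)
    out.extend(dedent for _ in open_blocks)
    return out

example = "def f(v):\n  if v: return 0\n  g = v; h = g;\n  return h"
print(" ".join(add_indent_tokens(example)))
```

In this model a block stays open until a later line is indented no further than the line that opened it, which is why the closing brace of a one-line "if" appears at the start of the following line.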

If multiple indents occur on a single line, as in

if x: if y:
  Foo(x, y)

the output will be like this:

if x: { if y: {
  Foo(x, y);
}}

Configuration for Python

Newlines generally represent the end of a statement, while colons mark places where a "child" block is expected. Inside parentheses, square brackets, or braces, newlines are ignored:

s = ("this is a pretty long string that I'd like "
  + " to continue writing on the next line")

And, inside brackets, indentation is ignored, so this is allowed:

if foo:
    s = ("this is a pretty long string that I'd like "
  + " to continue writing on the next line")
    print(s)

Note that if you don't use brackets, Python 3 doesn't try to figure out if you "really" meant to continue a statement on the next line:

# SyntaxError after '+': invalid syntax
s = "this is a pretty long string that I'd like " +
    " to continue writing on the next line"

Thus OpenBrackets and CloseBrackets should be ( [ { and ) ] }, respectively. IndentToken and DedentToken should be synthetic Indent and Dedent tokens, since curly braces have a different meaning in Python (they define a dictionary).
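For comparison, CPython's own tokenizer behaves this way, and the standard tokenize module makes it visible. The sketch below lists only the layout-related tokens for a small input; the helper name layout_tokens and the sample source are illustrative assumptions, while the token types come from the standard library.

```python
import io
import token
import tokenize

SOURCE = (
    "if foo:\n"
    "    s = ('a'\n"
    "  + 'b')\n"       # less indented than the line above, but inside brackets
    "    print(s)\n"
)

def layout_tokens(source):
    """Return the names of the newline and indentation tokens, in order."""
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [token.tok_name[t.type] for t in tokens if t.type in layout]

print(layout_tokens(SOURCE))
```

The newline inside the parentheses is reported as NL (a non-significant line break) rather than NEWLINE, and the oddly indented continuation line produces no INDENT or DEDENT. Python uses dedicated synthetic INDENT and DEDENT token types rather than reusing braces, which is the same arrangement suggested above.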

In Python, it appears you can't write two "block" statements on one line, as in this example:

if True: if True: print() # SyntaxError: invalid syntax

You're also not allowed to indent the next line if the block statement on the current line is followed by another statement:

if True: print('a')
    print('b') # IndentationError: unexpected indent

But you can switch style in different branches:

if True:
    print("t")
else: print("f")
try: print("t")
except:
    print("e")

Also, although you can normally separate statements with semicolons:

print("hell", end=""); print("o")

You are not allowed to write this:

print("?"); if True: # SyntaxError: invalid syntax
    print("t")

Considering these three facts, I would say that the colon should be classified as an EOL indent trigger (EolIndentTriggers), and the parser should

  1. recognize non-block statements separately from block statements,
  2. expect a colon to be followed by either an indented block or a non-block statement, and
  3. recognize a non-block "statement" as a list of statements separated by semicolons, with an optional semicolon at the end.
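The observations behind these three rules can be checked against CPython with the built-in compile function. This is only a small verification sketch; the helper name compiles and the variable names are assumptions for illustration.

```python
def compiles(source):
    """Return True if CPython accepts `source` as a module, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

# Rejected: two block statements on one line.
two_blocks_on_one_line = "if True: if True: print()\n"
# Rejected: an indented line after a block statement with an inline body.
indent_after_inline_body = "if True: print('a')\n    print('b')\n"
# Rejected: a block statement after a semicolon.
block_after_semicolon = "print('?'); if True:\n    print('t')\n"
# Accepted: different styles in different branches.
mixed_styles = "if True:\n    print('t')\nelse: print('f')\n"
# Accepted: a colon followed by a semicolon-separated list with a trailing semicolon.
semicolon_list = "if True: print('a'); print('b');\n"

for name in ("two_blocks_on_one_line", "indent_after_inline_body",
             "block_after_semicolon", "mixed_styles", "semicolon_list"):
    print(name, compiles(globals()[name]))
```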

Now, Python doesn't allow a block statement with an empty body (you must write pass instead), e.g.:

if cond: # "do nothing"
return # IndentationError: expected an indented block

I'm inclined to treat this as a special case to be detected in the parser. And although you can write a semicolon on a line by itself, you can't write any of these lines:

if cond: ; # SyntaxError: invalid syntax
print(); ; print() # SyntaxError: invalid syntax
; ; # SyntaxError: invalid syntax

My interpretation is that a semicolon by itself is treated as a block statement (i.e. illegal in a non-block statement context). Since a semicolon is not treated the same way as a newline, the EolToken should be a special token, not a semicolon.

Configuration for LES

For more information about LES's indent processing, see LesIndentTokenGenerator.

See also
IndentTokenGenerator{Token}

Properties

int[] AllIndentTriggers [get, set]

int[] EolIndentTriggers [get, set]

Token EolToken [get, set]
 Gets or sets the prototype token for end-statement (a.k.a. end-of-line) markers, cast to an integer as required by Token. Use null to avoid generating such markers.

Token IndentToken [get, set]
 Gets or sets the prototype token for indentation markers.

Token DedentToken [get, set]
 Gets or sets the prototype token for unindentation markers.

Public Member Functions

 IndentTokenGenerator (ILexer< Token > lexer, int[] allIndentTriggers, Token? eolToken, Token indentToken, Token dedentToken)
 Initializes the indent detector.

 IndentTokenGenerator (ILexer< Token > lexer, int[] allIndentTriggers, Token? eolToken)

override TokenCategory GetTokenCategory (Token token)

Protected Member Functions

bool Contains (int[] list, int item)

override Maybe< Token > MakeIndentToken (Token indentTrigger, ref Maybe< Token > tokenAfterward, bool newlineAfter)

override IEnumerator< Token > MakeDedentToken (Token tokenBeforeDedent, ref Maybe< Token > tokenAfterDedent)

override Maybe< Token > MakeEndOfLineToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, int? deltaIndent)

Constructor & Destructor Documentation

Loyc.Syntax.Lexing.IndentTokenGenerator.IndentTokenGenerator ( ILexer< Token > lexer,
int[] allIndentTriggers,
Token? eolToken,
Token indentToken,
Token dedentToken
)
inline

Initializes the indent detector.

Parameters
lexer: Original lexer.
allIndentTriggers: A list of all token types that could trigger the insertion of an indentation token.
eolToken: Prototype token for end-statement markers inserted when newlines are encountered, or null to avoid generating such markers.
indentToken: Prototype token for indentation markers.
dedentToken: Prototype token for un-indent markers.

Property Documentation

Token Loyc.Syntax.Lexing.IndentTokenGenerator.DedentToken
[get, set]

Gets or sets the prototype token for unindentation markers.

The StartIndex is updated for each actual token emitted.

Token Loyc.Syntax.Lexing.IndentTokenGenerator.EolToken
[get, set]

Gets or sets the prototype token for end-statement (a.k.a. end-of-line) markers, cast to an integer as required by Token. Use null to avoid generating such markers.

Note: if the last token on a line has this same type, this class will not generate an extra newline token.

The StartIndex is updated for each actual token emitted.

Token Loyc.Syntax.Lexing.IndentTokenGenerator.IndentToken
[get, set]

Gets or sets the prototype token for indentation markers.

The StartIndex is updated for each actual token emitted.