RE2C(1) General Commands Manual RE2C(1)
NAME
re2c - convert regular-expressions to C/C++
SYNOPSIS
re2c [-bdDefFghisuvVw1] [-o output] [-c [-t header]] file
DESCRIPTION
re2c is a preprocessor that generates C-based recognizers from regular expressions. The
input to re2c consists of C/C++ source interleaved with comments of the form /*!re2c ...
*/ which contain scanner specifications. In the output these comments are replaced with
code that, when executed, will find the next input token and then execute some user-sup‐
plied token-specific code.
For example, given the following code
char *scan(char *p)
{
/*!re2c
re2c:define:YYCTYPE = "unsigned char";
re2c:define:YYCURSOR = p;
re2c:yyfill:enable = 0;
re2c:yych:conversion = 1;
re2c:indent:top = 1;
[0-9]+ {return p;}
[^] {return (char*)0;}
*/
}
re2c -is will generate
/* Generated by re2c on Sat Apr 16 11:40:58 1994 */
char *scan(char *p)
{
{
unsigned char yych;
yych = (unsigned char)*p;
if(yych <= '/') goto yy4;
if(yych >= ':') goto yy4;
++p;
yych = (unsigned char)*p;
goto yy7;
yy3:
{return p;}
yy4:
++p;
yych = (unsigned char)*p;
{return (char*)0;}
yy6:
++p;
yych = (unsigned char)*p;
yy7:
if(yych <= '/') goto yy3;
if(yych <= '9') goto yy6;
goto yy3;
}
}
You can place one /*!max:re2c */ comment that will output a "#define YYMAXFILL <n>" line
that holds the maximum number of characters required to parse the input. That is the maxi‐
mum value YYFILL(n) will receive. If -1 is in effect then YYMAXFILL can only be triggered
once after the last /*!re2c */.
You can also use /*!ignore:re2c */ blocks, which allow you to document the scanner code and
are not part of the output.
OPTIONS
re2c provides the following options:
-? -h Invoke a short help.
-b Implies -s. Use bit vectors as well in the attempt to coax better code out of the
compiler. Most useful for specifications with more than a few keywords (e.g. for
most programming languages).
-c Enable (f)lex-like condition support.
-d Creates a parser that dumps information about the current position and in which
state the parser is while parsing the input. This is useful to debug parser issues
and states. If you use this switch you need to define a macro YYDEBUG that is
called like a function with two parameters: void YYDEBUG(int state, char current).
The first parameter receives the state or -1 and the second parameter receives the
input at the current cursor.
-D Emit Graphviz dot data. It can then be processed with e.g. "dot -Tpng input.dot >
output.png". Please note that scanners with many states may crash dot.
-e Cross-compile from an ASCII platform to an EBCDIC one.
-f Generate a scanner with support for storable state. For details see below at SCAN‐
NER WITH STORABLE STATES.
-F Partial support for flex syntax. When this flag is active then named definitions
must be surrounded by curly braces and can be defined without an equal sign and the
terminating semi colon. Instead names are treated as direct double quoted strings.
-g Generate a scanner that utilizes GCC's computed goto feature. That is re2c gener‐
ates jump tables whenever a decision is of a certain complexity (e.g. a lot of if
conditions are otherwise necessary). This is only useable with GCC and produces
output that cannot be compiled with any other compiler. Note that this implies -b
and that the complexity threshold can be configured using the inplace configuration
"cgoto:threshold".
-i Do not output #line information. This is useful when you want to use a CMS tool with
the re2c output, which you might want if you do not require your users to have re2c
themselves when building from your source.
-o output
Specify the output file.
-r Allows reuse of scanner definitions with '/*!use:re2c' after '/*!rules:re2c'. The saved
rules are used by every '/*!use:re2c' block that follows. These blocks can contain inplace
configurations, especially
're2c:flags:w' and 're2c:flags:u'. That way it is possible to create the same
scanner multiple times for different character types, different input mechanisms or
different output mechanisms. The '/*!use:re2c' blocks can also contain additional
rules that will be appended to the set of rules in '/*!rules:re2c'.
-s Generate nested ifs for some switches. Many compilers need this assist to generate
better code.
-t Create a header file that contains types for the (f)lex-like condition support.
This can only be activated when -c is in use.
-u Generate a parser that supports Unicode chars (UTF-32). This means the generated
code can deal with any valid Unicode character up to 0x10FFFF. When UTF-8 or UTF-16
needs to be supported you need to convert the incoming stream to UTF-32 upon input
yourself.
-v Show version information.
-V Show the version as a number XXYYZZ.
-w Create a parser that supports wide chars (UCS-2). This implies -s and cannot be
used together with -e switch.
-1 Force single pass generation. This cannot be combined with -f and disables YYMAXFILL
generation prior to the last re2c block.
--no-generation-date
Suppress date output in the generated output so that it only shows the re2c ver‐
sion.
--case-insensitive
All strings are case insensitive, so all "-expressions are treated in the same way
'-expressions are.
--case-inverted
Invert the meaning of single and double quoted strings. With this switch single
quotes are case sensitive and double quotes are case insensitive.
INTERFACE CODE
Unlike other scanner generators, re2c does not generate complete scanners: the user must
supply some interface code. In particular, the user must define the following macros or
use the corresponding inplace configurations:
YYCONDTYPE
In -c mode you can use -t to generate a file that contains the enumeration used as
conditions. Each of the values refers to a condition of a rule set.
YYCTXMARKER
l-expression of type *YYCTYPE. The generated code saves trailing context back‐
tracking information in YYCTXMARKER. The user only needs to define this macro if a
scanner specification uses trailing context in one or more of its regular-expres‐
sions.
YYCTYPE
Type used to hold an input symbol. Usually char or unsigned char.
YYCURSOR
l-expression of type *YYCTYPE that points to the current input symbol. The gener‐
ated code advances YYCURSOR as symbols are matched. On entry, YYCURSOR is assumed
to point to the first character of the current token. On exit, YYCURSOR will point
to the first character of the following token.
YYDEBUG(state,current)
This is only needed if the -d flag was specified. It allows to easily debug the
generated parser by calling a user defined function for every state. The function
should have the following signature: void YYDEBUG(int state, char current). The
first parameter receives the state or -1 and the second parameter receives the
input at the current cursor.
YYFILL(n)
The generated code "calls" YYFILL(n) when the buffer needs (re)filling: at least n
additional characters should be provided. YYFILL(n) should adjust YYCURSOR,
YYLIMIT, YYMARKER and YYCTXMARKER as needed. Note that for typical programming
languages n will be the length of the longest keyword plus one. The user can place
a comment of the form /*!max:re2c */ once to insert a YYMAXFILL definition that
is set to the maximum length value. If the -1 switch is used then YYMAXFILL can be
triggered only once after the last /*!re2c */ block.
YYGETCONDITION()
This define is used to get the condition prior to entering the scanner code when
using -c switch. The value must be initialized with a value from the enumeration
YYCONDTYPE type.
YYGETSTATE()
The user only needs to define this macro if the -f flag was specified. In that
case, the generated code "calls" YYGETSTATE() at the very beginning of the scanner
in order to obtain the saved state. YYGETSTATE() must return a signed integer. The
value must be either -1, indicating that the scanner is entered for the first time,
or a value previously saved by YYSETSTATE(s). In the second case, the scanner will
resume operations right after where the last YYFILL(n) was called.
YYLIMIT
Expression of type *YYCTYPE that marks the end of the buffer (YYLIMIT[-1] is the
last character in the buffer). The generated code repeatedly compares YYCURSOR to
YYLIMIT to determine when the buffer needs (re)filling.
YYMARKER
l-expression of type *YYCTYPE. The generated code saves backtracking information
in YYMARKER. Some easy scanners might not use this.
YYMAXFILL
This will be automatically defined by /*!max:re2c */ blocks as explained above.
YYSETCONDITION(c)
This define is used to set the condition in transition rules. This is only being
used when -c is active and transition rules are being used.
YYSETSTATE(s)
The user only needs to define this macro if the -f flag was specified. In that
case, the generated code "calls" YYSETSTATE just before calling YYFILL(n). The
parameter to YYSETSTATE is a signed integer that uniquely identifies the specific
instance of YYFILL(n) that is about to be called. Should the user wish to save the
state of the scanner and have YYFILL(n) return to the caller, all they have to do is
store that unique identifier in a variable. Later, when the scanner is called
again, it will call YYGETSTATE() and resume execution right where it left off. The
generated code will contain both YYSETSTATE(s) and YYGETSTATE even if YYFILL(n) is
being disabled.
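The buffer handling expected from YYFILL(n) above can be sketched in plain C. This is one possible arrangement under stated assumptions, not code shipped with re2c: the Input structure, the in-memory source, the tiny BUF_SIZE and the function names are all hypothetical. The fields cursor, marker and limit stand in for YYCURSOR, YYMARKER and YYLIMIT.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8 };      /* deliberately tiny so refills are easy to observe */

typedef struct {
    const char *src;        /* hypothetical in-memory data source */
    size_t src_left;        /* bytes of src not yet copied into buf */
    char buf[BUF_SIZE];
    char *token;            /* start of the token currently being scanned */
    char *cursor;           /* plays the role of YYCURSOR */
    char *marker;           /* plays the role of YYMARKER */
    char *limit;            /* plays the role of YYLIMIT */
} Input;

void input_init(Input *in, const char *src, size_t len)
{
    in->src = src;
    in->src_left = len;
    in->token = in->cursor = in->marker = in->limit = in->buf;
}

/* Tries to make at least n characters available at cursor.
 * Returns 1 on success and 0 when the source cannot supply them. */
int input_fill(Input *in, size_t n)
{
    size_t shift = (size_t)(in->token - in->buf);
    size_t kept = (size_t)(in->limit - in->token);
    size_t room, take;

    /* Discard consumed input: slide the unfinished token to the buffer start. */
    memmove(in->buf, in->token, kept);
    in->marker = in->marker < in->token ? in->buf : in->marker - shift;
    in->cursor -= shift;
    in->limit -= shift;
    in->token = in->buf;

    /* Top the buffer up from the source. */
    room = BUF_SIZE - kept;
    take = in->src_left < room ? in->src_left : room;
    memcpy(in->limit, in->src, take);
    in->limit += take;
    in->src += take;
    in->src_left -= take;

    return (size_t)(in->limit - in->cursor) >= n;
}
```

A scanner function could then define YYFILL(n) as a call to input_fill that returns an end-of-input token when the call yields 0.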
SCANNER WITH STORABLE STATES
When the -f flag is specified, re2c generates a scanner that can store its current state,
return to the caller, and later resume operations exactly where it left off.
The default operation of re2c is a "pull" model, where the scanner asks for extra input
whenever it needs it. However, this mode of operation assumes that the scanner is the
"owner" of the parsing loop, and that may not always be convenient.
Typically, if there is a preprocessor ahead of the scanner in the stream, or for that mat‐
ter any other procedural source of data, the scanner cannot "ask" for more data unless
both scanner and source live in separate threads.
The -f flag is useful for just this situation: it lets users design scanners that work in
a "push" model, i.e. where data is fed to the scanner chunk by chunk. When the scanner
runs out of data to consume, it just stores its state and returns to the caller. When more
input data is fed to the scanner, it resumes operations exactly where it left off.
When using the -f option re2c does not accept stdin because it has to do the full genera‐
tion process twice which means it has to read the input twice. That means re2c would fail
in case it cannot open the input twice or reading the input for the first time influences
the second read attempt.
Changes needed compared to the "pull" model.
1. User has to supply macros YYSETSTATE(state) and YYGETSTATE()
2. The -f option inhibits declaration of yych and yyaccept. So the user has to declare
these. Also the user has to save and restore these. In the example examples/push.re these
are declared as fields of the (C++) class of which the scanner is a method, so they do not
need to be saved/restored explicitly. For C they could e.g. be made macros that select
fields from a structure passed in as parameter. Alternatively, they could be declared as
local variables, saved with YYFILL(n) when it decides to return and restored at entry to
the function. Also, it could be more efficient to save the state from YYFILL(n) because
YYSETSTATE(state) is called unconditionally. YYFILL(n) however does not get state as
parameter, so we would have to store state in a local variable by YYSETSTATE(state).
3. Modify YYFILL(n) to return (from the function calling it) if more input is needed.
4. Modify caller to recognise "more input is needed" and respond appropriately.
5. The generated code will contain a switch block that is used to restore the last state
by jumping behind the corresponding YYFILL(n) call. This code is automatically generated in
the epilog of the first "/*!re2c */" block. It is possible to trigger generation of the
YYGETSTATE() block earlier by placing a "/*!getstate:re2c */" comment. This is especially
useful when the scanner code should be wrapped inside a loop.
Please see examples/push.re for a push-model scanner. The generated code can be tweaked
using inplace configurations "state:abort" and "state:nextlabel".
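The save-and-resume mechanism described in this section can be imitated by hand in plain C. The sketch below is not re2c output; it only mirrors the pattern of a switch on YYGETSTATE() that jumps back to the point where input ran out. The Scanner structure, the function names and the NEED_MORE and GOT_NUMBER return codes are assumptions made for illustration.

```c
#include <stddef.h>

enum { NEED_MORE = 0, GOT_NUMBER = 1 };

typedef struct {
    int state;              /* -1 on first entry, else the id saved by YYSETSTATE */
    const char *cursor;     /* next unread character of the current chunk */
    const char *limit;      /* one past the last character of the chunk */
    unsigned long value;    /* partial result that must survive a return */
} Scanner;

#define YYGETSTATE()   (s->state)
#define YYSETSTATE(x)  (s->state = (x))

void scanner_init(Scanner *s)
{
    s->state = -1;
    s->cursor = s->limit = NULL;
    s->value = 0;
}

/* Push model: the caller hands over one chunk at a time. */
void scanner_feed(Scanner *s, const char *chunk, size_t len)
{
    s->cursor = chunk;
    s->limit = chunk + len;
}

/* Accumulates a decimal number that may be split across chunks. */
int scan_number(Scanner *s)
{
    switch (YYGETSTATE()) {
    case 0:  goto fill0;     /* resume at the fill point that suspended us */
    default: break;          /* -1: entered for the first time */
    }
    s->value = 0;
fill0:
    if (s->cursor == s->limit) {
        YYSETSTATE(0);       /* remember which fill point we stopped at */
        return NEED_MORE;    /* the role YYFILL(n) plays when it returns */
    }
    if (*s->cursor >= '0' && *s->cursor <= '9') {
        s->value = s->value * 10 + (unsigned long)(*s->cursor - '0');
        ++s->cursor;
        goto fill0;
    }
    YYSETSTATE(-1);          /* token complete: next call starts afresh */
    return GOT_NUMBER;
}
```

Because value lives in the Scanner structure rather than in a local variable, nothing has to be saved or restored explicitly when the function returns early, which is the arrangement item 2 above suggests for C.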
SCANNER WITH CONDITION SUPPORT
You can precede regular-expressions with a list of condition names when using the -c
switch. In this case re2c generates a scanner block for each condition, and each of the
generated blocks has its own precondition. The precondition is given by the interface
define YYGETCONDITION and must be of type YYCONDTYPE.
There are two special rule types. First, the rules of the condition '*' are merged into all
conditions. Second, the empty condition list allows you to provide a code block that does
not have a scanner part, meaning it does not allow any regular expression. The condition
value referring to this special block is always the one with the enumeration value 0. This
way the code of this special rule can be used to initialize a scanner. It is in no way
necessary to have these rules: but sometimes it is helpful to have a dedicated uninitial‐
ized condition state.
Non-empty rules allow you to specify the new condition, which makes them transition rules.
Besides generating calls for the define YYSETCONDITION no other special code is generated.
There is another kind of special rule that allows you to prepend code to any code block of all
rules of a certain set of conditions or to all code blocks of all rules. This can be helpful
when some operation is common among rules. For instance this can be used to store the
length of the scanned string. These special setup rules start with an exclamation mark
followed by either a list of conditions <! condition, ... > or a star <!*>. When re2c
generates the code for a rule whose state does not have a setup rule and a starred setup
rule is present, then that code will be used as setup code.
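The effect of conditions and transition rules can be illustrated with a hand-written C function. This is a sketch of the dispatch idea only, not what re2c emits: the Lexer structure, the condition names normal and comment, and the function count_code_chars are hypothetical. The enumeration values use the default yyc prefix for condition values.

```c
#include <stddef.h>

enum YYCONDTYPE { yycnormal, yyccomment };

typedef struct {
    enum YYCONDTYPE cond;   /* current condition */
    const char *cursor;     /* NUL-terminated input */
} Lexer;

#define YYGETCONDITION()   (lx->cond)
#define YYSETCONDITION(c)  (lx->cond = (c))

/* Counts characters that lie outside '#' ... end-of-line comments. */
size_t count_code_chars(Lexer *lx)
{
    size_t n = 0;

    while (*lx->cursor != '\0') {
        char c = *lx->cursor++;

        /* Each condition has its own set of rules. */
        switch (YYGETCONDITION()) {
        case yycnormal:
            if (c == '#')
                YYSETCONDITION(yyccomment);   /* like: <normal> "#" => comment */
            else
                ++n;                          /* like: <normal> [^] */
            break;
        case yyccomment:
            if (c == '\n')
                YYSETCONDITION(yycnormal);    /* like: <comment> "\n" => normal */
            break;                            /* like: <comment> [^] (ignored) */
        }
    }
    return n;
}
```

The condition is kept in the Lexer structure so that it persists between calls, which is what the YYGETCONDITION and YYSETCONDITION defines are expected to arrange.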
SCANNER SPECIFICATIONS
Each scanner specification consists of a set of rules, named definitions and configura‐
tions.
Rules consist of a regular-expression along with a block of C/C++ code that is to be exe‐
cuted when the associated regular-expression is matched. You can either start the code
with an opening curly brace or the sequence ':='. When the code starts with a curly brace then
re2c counts the brace depth and stops looking for code automatically. Otherwise curly
braces are not allowed and re2c stops looking for code at the first line that does not
begin with whitespace.
regular-expression { C/C++ code }
regular-expression := C/C++ code
If -c is active then each regular-expression is preceded by a list of comma separated
condition names. Besides normal naming rules there are two special cases. A rule may contain
the single condition name '*' or no condition name at all. In the latter case the
rule cannot have a regular-expression. Non-empty rules may furthermore specify the new
condition. In that case re2c will generate the necessary code to change the condition
automatically. Just as above, code can be started with a curly brace or the sequence ':='.
Furthermore, rules can use ':=>' as a shortcut to automatically generate code that not
only sets the new condition state but also continues execution with the new state. A
shortcut rule should not be used in a loop where there is code between the start of the
loop and the re2c block unless re2c:cond:goto is changed to 'continue;'. If code is necessary
before all rules (though not simple jumps) you can do so by using <! pseudo-rules.
<condition-list> regular-expression { C/C++ code }
<condition-list> regular-expression := C/C++ code
<condition-list> regular-expression => condition { C/C++ code }
<condition-list> regular-expression => condition := C/C++ code
<condition-list> regular-expression :=> condition
<*> regular-expression { C/C++ code }
<*> regular-expression := C/C++ code
<*> regular-expression => condition { C/C++ code }
<*> regular-expression => condition := C/C++ code
<*> regular-expression :=> condition
<> { C/C++ code }
<> := C/C++ code
<> => condition { C/C++ code }
<> => condition := C/C++ code
<> :=> condition
<!condition-list> { C/C++ code }
<!condition-list> := C/C++ code
<!*> { C/C++ code }
<!*> := C/C++ code
Named definitions are of the form:
name = regular-expression;
If -F is active, then named definitions are also of the form:
name regular-expression
Configurations look like named definitions whose names start with "re2c:":
re2c:name = value;
re2c:name = "value";
SUMMARY OF RE2C REGULAR-EXPRESSIONS
"foo" the literal string foo. ANSI-C escape sequences can be used.
'foo' the literal string foo (characters [a-zA-Z] treated case-insensitive). ANSI-C
escape sequences can be used.
[xyz] a "character class"; in this case, the regular-expression matches either an 'x', a
'y', or a 'z'.
[abj-oZ]
a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j'
through 'o', or a 'Z'.
[^class]
an inverted "character class".
r\s match any r which isn't an s. r and s must be regular-expressions which can be
expressed as character classes.
r* zero or more r's, where r is any regular-expression
r+ one or more r's
r? zero or one r's (that is, "an optional r")
name the expansion of the "named definition" (see above)
(r) an r; parentheses are used to override precedence (see below)
rs an r followed by an s ("concatenation")
r|s either an r or an s
r/s an r but only if it is followed by an s. The s is not part of the matched text.
This type of regular-expression is called "trailing context". A trailing context
can only be the end of a rule and not part of a named definition.
r{n} matches r exactly n times.
r{n,} matches r at least n times.
r{n,m} matches r at least n but not more than m times.
. match any character except newline (\n).
def matches named definition as specified by def only if -F is off. If the switch -F is
active then this behaves like it was enclosed in double quotes and matches the
string def.
Character classes and string literals may contain octal or hexadecimal character definitions
and the following set of escape sequences (\n, \t, \v, \b, \r, \f, \a, \\). An octal
character is defined by a backslash followed by its three octal digits and a hexadecimal
character is defined by a backslash, a lower cased 'x' and its two hexadecimal digits or a
backslash, an upper cased X and its four hexadecimal digits.
re2c furthermore supports the C/C++ Unicode notation. That is a backslash followed by
either a lowercased u and its four hexadecimal digits or an uppercased U and its eight
hexadecimal digits. However only in -u mode the generated code can deal with any valid
Unicode character up to 0x10FFFF.
Since characters greater than \X00FF are not allowed in non-Unicode mode, the only portable
"any" rules are (.|"\n") and [^].
The regular-expressions listed above are grouped according to precedence, from highest
precedence at the top to lowest at the bottom. Those grouped together have equal prece‐
dence.
INPLACE CONFIGURATION
It is possible to configure code generation inside re2c blocks. The following lists the
available configurations:
re2c:condprefix = yyc_ ;
Allows to specify the prefix used for condition labels. That is this text is
prepended to any condition label in the generated output file.
re2c:condenumprefix = yyc ;
Allows to specify the prefix used for condition values. That is this text is
prepended to any condition enum value in the generated output file.
re2c:cond:divider = "/* *********************************** */" ;
Allows you to customize the divider for condition blocks. You can use '@@' to put the
name of the condition or customize the placeholder using re2c:cond:divider@cond.
re2c:cond:divider@cond = @@ ;
Specifies the placeholder that will be replaced with the condition name in
re2c:cond:divider.
re2c:cond:goto = "goto @@;" ;
Allows to customize the condition goto statements used with ':=>' style rules. You
can use '@@' to put the name of the condition or customize the placeholder using
re2c:cond:goto@cond. You can also change this to 'continue;', which would allow you
to continue with the next loop cycle including any code between loop start and re2c
block.
re2c:cond:goto@cond = @@ ;
Specifies the placeholder that will be replaced with the condition label in
re2c:cond:goto.
re2c:indent:top = 0 ;
Specifies the minimum indentation level to use. Requires a numeric value
greater than or equal to zero.
re2c:indent:string = "\t" ;
Specifies the string to use for indentation. Requires a string that should contain
only whitespace unless you need this for external tools. The easiest way to specify
spaces is to enclose them in single or double quotes. If you do not want any indentation
at all you can simply set this to "".
re2c:yych:conversion = 0 ;
When this setting is non zero, then re2c automatically generates conversion code
whenever yych gets read. In this case the type must be defined using
re2c:define:YYCTYPE.
re2c:yych:emit = 1 ;
Generation of yych can be suppressed by setting this to 0.
re2c:yybm:hex = 0 ;
If set to zero then a decimal table is being used else a hexadecimal table will be
generated.
re2c:yyfill:enable = 1 ;
Set this to zero to suppress generation of YYFILL(n). When using this be sure to
verify that the generated scanner does not read beyond the end of the input. Allowing this
behavior might introduce severe security issues into your programs.
re2c:yyfill:check = 1 ;
This can be set to 0 to suppress output of the precondition using YYCURSOR and
YYLIMIT, which becomes useful when YYLIMIT + max(YYFILL) is always accessible.
re2c:yyfill:parameter = 1 ;
Allows you to suppress parameter passing to YYFILL calls. If set to zero then no parameter
is passed to YYFILL. However re2c:define:YYFILL@len allows you to specify a replacement
string for the actual length value. If set to a non zero value then YYFILL usage
will be followed by the number of requested characters in braces unless
re2c:define:YYFILL:naked is set. Also look at re2c:define:YYFILL:naked and
re2c:define:YYFILL@len.
re2c:startlabel = 0 ;
If set to a non zero integer then the start label of the next scanner blocks will
be generated even if not used by the scanner itself. Otherwise the normal yy0 like
start label is only being generated if needed. If set to a text value then a label
with that text will be generated regardless of whether the normal start label is
being used or not. This setting is being reset to 0 after a start label has been
generated.
re2c:labelprefix = yy ;
Allows you to change the prefix of numbered labels. The default is yy and can be set
to any string that is a valid label.
re2c:state:abort = 0 ;
When not zero and switch -f is active then the YYGETSTATE block will contain a
default case that aborts and a -1 case is used for initialization.
re2c:state:nextlabel = 0 ;
Used when -f is active to control whether the YYGETSTATE block is followed by a
yyNext: label line. Instead of using yyNext you can usually also use configuration
startlabel to force a specific start label or default to yy0 as start label.
Instead of using a dedicated label it is often better to separate the YYGETSTATE
code from the actual scanner code by placing a "/*!getstate:re2c */" comment.
re2c:cgoto:threshold = 9 ;
When -g is active this value specifies the complexity threshold that triggers gen‐
eration of jump tables rather than using nested if's and decision bitfields. The
threshold is compared against a calculated estimation of if-s needed where every
used bitmap divides the threshold by 2.
re2c:yych:conversion = 0 ;
When the input uses signed characters and -s or -b switches are in effect re2c
allows to automatically convert to the unsigned character type that is then neces‐
sary for its internal single character. When this setting is zero or an empty
string the conversion is disabled. Using a non zero number the conversion is taken
from YYCTYPE. If that is given by an inplace configuration that value is being
used. Otherwise it will be (YYCTYPE) and changes to that configuration are no
longer possible. When this setting is a string the braces must be specified. Now
assuming your input is a char* buffer and you are using above mentioned switches
you can set YYCTYPE to unsigned char and this setting to either 1 or "(unsigned
char)".
re2c:define:YYCONDTYPE = YYCONDTYPE ;
Enumeration used for condition support with -c mode.
re2c:define:YYCTXMARKER = YYCTXMARKER ;
Allows to overwrite the define YYCTXMARKER and thus avoiding it by setting the
value to the actual code needed.
re2c:define:YYCTYPE = YYCTYPE ;
Allows to overwrite the define YYCTYPE and thus avoiding it by setting the value to
the actual code needed.
re2c:define:YYCURSOR = YYCURSOR ;
Allows to overwrite the define YYCURSOR and thus avoiding it by setting the value
to the actual code needed.
re2c:define:YYDEBUG = YYDEBUG ;
Allows to overwrite the define YYDEBUG and thus avoiding it by setting the value to
the actual code needed.
re2c:define:YYFILL = YYFILL ;
Allows to overwrite the define YYFILL and thus avoiding it by setting the value to
the actual code needed.
re2c:define:YYFILL:naked = 0 ;
When set to 1 neither braces, parameter nor semicolon gets emitted.
re2c:define:YYFILL@len = @@ ;
When using re2c:define:YYFILL and re2c:yyfill:parameter is 0 then any occurrence of
this text inside YYFILL will be replaced with the actual length value.
re2c:define:YYGETCONDITION = YYGETCONDITION ;
Allows to overwrite the define YYGETCONDITION.
re2c:define:YYGETCONDITION:naked = 0 ;
When set to 1 neither braces, parameter nor semicolon gets emitted.
re2c:define:YYGETSTATE = YYGETSTATE ;
Allows to overwrite the define YYGETSTATE and thus avoiding it by setting the value
to the actual code needed.
re2c:define:YYGETSTATE:naked = 0 ;
When set to 1 neither braces, parameter nor semicolon gets emitted.
re2c:define:YYLIMIT = YYLIMIT ;
Allows to overwrite the define YYLIMIT and thus avoiding it by setting the value to
the actual code needed.
re2c:define:YYMARKER = YYMARKER ;
Allows to overwrite the define YYMARKER and thus avoiding it by setting the value
to the actual code needed.
re2c:define:YYSETCONDITION = YYSETCONDITION ;
Allows to overwrite the define YYSETCONDITION.
re2c:define:YYSETCONDITION@cond = @@ ;
When using re2c:define:YYSETCONDITION then any occurrence of this text inside YYSETCONDITION
will be replaced with the actual new condition value.
re2c:define:YYSETSTATE = YYSETSTATE ;
Allows to overwrite the define YYSETSTATE and thus avoiding it by setting the value
to the actual code needed.
re2c:define:YYSETSTATE:naked = 0 ;
When set to 1 neither braces, parameter nor semicolon gets emitted.
re2c:define:YYSETSTATE@state = @@ ;
When using re2c:define:YYSETSTATE then any occurrence of this text inside YYSETSTATE
will be replaced with the actual new state value.
re2c:label:yyFillLabel = yyFillLabel ;
Allows to overwrite the name of the label yyFillLabel.
re2c:label:yyNext = yyNext ;
Allows to overwrite the name of the label yyNext.
re2c:variable:yyaccept = yyaccept ;
Allows to overwrite the name of the variable yyaccept.
re2c:variable:yybm = yybm ;
Allows to overwrite the name of the variable yybm.
re2c:variable:yych = yych ;
Allows to overwrite the name of the variable yych.
re2c:variable:yyctable = yyctable ;
When both -c and -g are active then re2c uses this variable to generate a static
jump table for YYGETCONDITION.
re2c:variable:yystable = yystable ;
When both -f and -g are active then re2c uses this variable to generate a static
jump table for YYGETSTATE.
re2c:variable:yytarget = yytarget ;
Allows to overwrite the name of the variable yytarget.
UNDERSTANDING RE2C
The subdirectory lessons of the re2c distribution contains a few step by step lessons to
get you started with re2c. All examples in the lessons subdirectory can be compiled and
actually work.
FEATURES
re2c does not provide a default action: the generated code assumes that the input will
consist of a sequence of tokens. Typically this can be dealt with by adding a rule such
as the one for unexpected characters in the example above.
The user must arrange for a sentinel token to appear at the end of input (and provide a
rule for matching it): re2c does not provide an <<EOF>> expression. If the source is from
a null-byte terminated string, a rule matching a null character will suffice. If the
source is from a file then you could pad the input with a newline (or some other character
that cannot appear within another token); upon recognizing such a character check to see
if it is the sentinel and act accordingly. You can also use YYFILL(n) to end the scanner
when not enough characters are available, which is nothing other than the detection of
end of data/file.
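The sentinel approach for a null-byte terminated string can be sketched by hand in C. The function below is an illustration, not generated code, and its name count_numbers is hypothetical. Each branch corresponds to a rule that a scanner specification would contain, including the explicit rule for the sentinel and a catch-all rule for unexpected characters.

```c
#include <stddef.h>

/* Counts runs of decimal digits in a NUL-terminated string. */
size_t count_numbers(const char *p)
{
    size_t count = 0;

    for (;;) {
        if (*p == '\0') {
            /* Sentinel rule: the terminating NUL ends the scan. */
            return count;
        }
        if (*p >= '0' && *p <= '9') {
            /* Rule [0-9]+ : the NUL is not a digit, so this loop
             * cannot run past the end of the string. */
            while (*p >= '0' && *p <= '9')
                ++p;
            ++count;
        } else {
            /* Catch-all rule for any other character. */
            ++p;
        }
    }
}
```

Because the sentinel can never be part of another token, no bounds check against a limit pointer is needed, which is the situation in which disabling YYFILL is safe.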
BUGS
Difference only works for character sets.
The re2c internal algorithms need documentation.
SEE ALSO
flex(1), lex(1).
More information on re2c can be found here:
http://re2c.org/
AUTHORS
Peter Bumbulis <peter AT csg.ca>
Brian Young <bayoung AT acm.org>
Dan Nuffer <nuffer AT users.net>
Marcus Boerger <helly AT users.net>
Hartmut Kaiser <hkaiser AT users.net>
Emmanuel Mogenet <mgix AT mgix.com> added storable state
VERSION INFORMATION
This manpage describes re2c, version 0.13.5.
Version 0.13.5 17 Jul 2008 RE2C(1)