Version 2.3
------------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help avoid
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.
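
          For comparison, the declaration presumably intended here is a plain
          (non-raw) string in which '\t' denotes an actual tab character:

              t_ignore = ' \t'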

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse().  For example:

              yacc.parse(data,tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).  Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.  For example:

              def p_foo(p):
                  '''foo : a b c d e'''
                  p[0] = p[1:3]

          This change is a performance improvement to the parser--it streamlines
          normal access to the grammar values since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the functionality of lineno() and linespan().  For
          example:

              def p_expr(p):
                  'expr : expr PLUS expr'
                  p.lexpos(1)               # Lexing position of left-hand expression
                  p.lexpos(2)               # Lexing position of PLUS
                  start,end = p.lexspan(3)  # Lexing range of right-hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

              def t_error(t):
                  print "Illegal character '%s'" % t.value[0]
                  t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.  Refer to
          the PLY documentation for a PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

              states = (
                  ('foo','exclusive'),
                  ('bar','inclusive')
              )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

              t_PLUS = r'\+'          # Rule defined in INITIAL state
              t_foo_NUM = r'\d+'      # Rule defined in foo state
              t_bar_NUM = r'\d+'      # Rule defined in bar state

              t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
              t_ANY_NUM = r'\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states.  For example:

              t_foo_ignore = " "      # Ignored characters for foo state

              def t_bar_error(t):
                  # Handle errors in bar state

          Within token rules, the following methods can be used to change states:

              def t_TOKNAME(t):
                  t.lexer.begin('foo')       # Begin state 'foo'
                  t.lexer.push_state('foo')  # Begin state 'foo', push old state
                                             # onto a stack
                  t.lexer.pop_state()        # Restore previous state
                  t.lexer.current_state()    # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
          yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.
          For example, the lexer or parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          example/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect.  This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

              def p_expr_list(p):
                  'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

              def t_FOO(t):
                  r'some regex'
                  m = t.lexer.lexmatch
                  # Do something with m

          This may be useful if you want to access named groups specified within
          the regex for a specific token.  Suggested by Oldrich Jedlicka.
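
          As an illustration, a hypothetical rule (the token name and group
          names here are invented for the example) might pull named groups
          out of the match object like this:

              def t_DATE(t):
                  r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})'
                  m = t.lexer.lexmatch
                  # Reorder the named groups into a (year, month, day) tuple
                  t.value = (m.group('year'), m.group('month'), m.group('day'))
                  return t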

10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.
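
          For example, a short sketch of retrieving these objects (assuming a
          module that has already defined its tokens and grammar rules):

              import ply.lex as lex
              import ply.yacc as yacc

              lex.lex()              # Build the lexer
              yacc.yacc()            # Build the parser

              lexer  = lex.lexer     # The last Lexer object built
              parser = yacc.parser   # The last Parser object built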

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.  This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

              start = 'name'

          where 'name' matches some grammar rule.  For example:

              def p_name(p):
                  'name : A B C'
                  ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

          examples/GardenSnake : A simple indentation-based language similar
                                 to Python.  Shows how you might handle
                                 whitespace.  Contributed by Andrew Dalke.

          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.  To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR.  However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for the vast
          majority of tokens.  Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos.  This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
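
          For example, a minimal sketch of the column computation just
          described (find_column is not part of PLY; it is shown here only
          for illustration):

              def find_column(input, token):
                  # Scan backwards from the token to the most recent newline.
                  # If no newline is found, rfind() returns -1 and the
                  # arithmetic still yields a 1-based column number.
                  last_cr = input.rfind('\n', 0, token.lexpos)
                  return token.lexpos - last_cr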

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions exceeding
          more than 100 named groups.  To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.  This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is
          interpreted as a regex comment; it is critical to use '\#' instead.
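
          For example, a rule that matches a '#'-to-end-of-line comment (the
          token name here is hypothetical):

              t_HASHCOMMENT = r'\#.*'   # '\#' matches a literal '#'; an unescaped
                                        # '#' would begin a verbose-mode comment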

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit      = r'([0-9])'
              nondigit   = r'([_A-Za-z])'
              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.  This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.  For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+-*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.  Just do this to import the
          modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already-parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules.  For example, in C-yacc, it is possible to write code like
          this:

              bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform the equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The
          negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                  elif p[-1] == "LOCAL":
                      # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if not non-existent,
          in most cases.  The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.  If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.  This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).  The need for this was suggested
          by Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.  Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input.  Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.  This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control the output directory.  Contributed
          by Christopher Stawarz.
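
          For example, a minimal sketch (the directory name here is arbitrary):

              yacc.yacc(outputdir="output")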

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement.  LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley.  To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                     | E F"""
                  if len(p) == 4:
                      # Must have been first rule
                  elif len(p) == 3:
                      # Must be second rule

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0.  Cleans up some comments so they don't wrap on an 80-column display.
          1.  Directs compiler errors to stderr where they belong.
          2.  Implements and documents automatic line counting when \n is ignored.
          3.  Changes the way progress messages are dumped when debugging is on.
              The new format is both less verbose and conveys more information than
              the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -  Some output is now only shown when debugging is enabled.  This
             means that PLY will be completely silent when not in debugging mode.

          -  An optional parameter "write_tables" can be passed to yacc() to
             control whether or not parsing tables are written.  By default,
             it is true, but it can be turned off if you don't want the yacc
             table file.  Note: disabling this will cause yacc() to regenerate
             the parsing table each time (see the sketch below).
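
          A minimal sketch of turning table writing off:

              yacc.yacc(write_tables=0)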

04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          -  The parser can be supplied as a class instead of a module.
             For an example of this, see the example/classcalc directory.

          -  Debugging output can be directed to a filename of the user's
             choice.  Use

                 yacc(debugfile="somefile.out")