CHANGES revision 6498

Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD-license.

03/23/09: beazley
          Performance optimization.  Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message.  PLY now warns about rules never
          reduced due to reduce/reduce conflicts.  Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module.   Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large.  For example, on jython, it is impossible
          to read parsing tables if the parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter.  If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed broken start argument to yacc().  PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output. yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off.  This
          restores similar behavior in PLY-2.5.   Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed missing lexer attribute on certain tokens when
          invoking the parser p_error() function.  Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error-reporting and diagnostics
          using the logging module interface.   Pass in a Logger object
          using the errorlog parameter to specify a different logger.

02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex().  All output is controlled
          by passing in a logger object.   Just pass in a logger with a high
          level setting to suppress output.   This argument was never
          documented to begin with so hopefully no one was relying upon it.

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer.
          This resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule.  A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.

01/13/09: beazley
          Added a new method p.set_lineno(n,lineno) that can be used to set the
          line number of symbol n in grammar rules.   This simplifies manual
          tracking of line numbers.

01/11/09: beazley
          Vastly improved debugging support for yacc.parse().   Instead of passing
          debug as an integer, you can supply a Logger object (see the logging
          module). Messages will be generated at the ERROR, INFO, and DEBUG
          logging levels, each level providing progressively more information.
          The debugging trace also shows states, grammar rules, values passed
          into grammar rules, and the result of each reduction.

01/09/09: beazley
          The yacc() command now does all error-reporting and diagnostics using
          the interface of the logging module.  Use the errorlog parameter to
          specify a logging object for error messages.  Use the debuglog parameter
          to specify a logging object for the 'parser.out' output.

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation.   The high-level
          user interface is backwards compatible, but the internals are completely
          reorganized into classes.  No more global variables.    The internals
          are also more extensible.  For example, you can use the classes to
          construct a LALR(1) parser in an entirely different manner than
          what is currently the case.  Documentation is forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment.  yacc was assigning the precedence
          of each rule based on the left-most token, when in fact, it should have been
          using the right-most token.  Reported by Bruce Frederiksen.
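
          The right-most-token rule can be illustrated with a small sketch
          written against plain Python data rather than PLY's internals.
          The function name rule_precedence and the layout of the
          precedence table are assumptions made only for this example:

```python
def rule_precedence(symbols, terminals, precedence):
    """Return the (assoc, level) entry of the right-most terminal in a rule.

    Hypothetical helper for illustration; not part of PLY.
    """
    # Scan the right-hand side from the end and stop at the first terminal.
    for sym in reversed(symbols):
        if sym in terminals:
            # A terminal with no declared precedence yields None.
            return precedence.get(sym)
    # Rules without any terminal carry no precedence of their own.
    return None


terminals = {"PLUS", "TIMES"}
precedence = {"PLUS": ("left", 1), "TIMES": ("left", 2)}

# The right-most terminal is TIMES, so the rule takes TIMES's precedence
# rather than that of the left-most terminal PLUS.
rhs = ["expr", "PLUS", "term", "TIMES", "factor"]
print(rule_precedence(rhs, terminals, precedence))
```

          Scanning from the right and stopping at the first terminal matches
          the usual yacc convention for a rule's default precedence.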

11/27/08: beazley
          Numerous changes to support Python 3.0 including removal of deprecated
          constructs (e.g., has_key) and the addition of compatibility code
          to emulate features from Python 2 that have been removed, but which
          are needed.   Fixed the unit testing suite to work with Python 3.0.
          The code should be backwards compatible with Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc().  Previously, you could only use
          a module or an instance.  Now, PLY just uses dir() to get a list of
          symbols on whatever the object is without regard for its type.

11/26/08: beazley
          Changed all except: statements to be compatible with Python2.x/3.x syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise Exception(value) for
          forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
          directly.  Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using negative
          indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module for everything.
          Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex-tables in optimized mode and start states.
          Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed undocumented .pushback() method on grammar productions.  I'm
          not sure this ever worked and can't recall ever using it.
          Might have been an abandoned idea that never really got fleshed out.  This
          feature was never described or tested so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules defined
          for undefined terminal symbols.   This allows yacc() to detect a potential
          problem that can be really tricky to debug if no warning message or error
          message is generated about it.

05/04/08: beazley
          lex() now has an outputdir option that can specify the output directory for
          tables when running in optimize mode.  For example:

             lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is
          now more aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed filename bug when specifying the modulename in lex() and yacc().
          If you specified options such as the following:

             parser = yacc.yacc(tabmodule="foo.bar.parsetab",outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given directory.
          Now, it simply generates a file "parsetab.py" in that directory.
          Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module.   This might make
          it easier to load the parsing tables from a complicated package
          structure.  For example:

               import foo.bar.spam.parsetab as parsetab
               parser = yacc.yacc(tabmodule=parsetab)

          Note:  lex and yacc will never regenerate the table file if used
          in this form---you will get a warning message instead.
          This idea suggested by Brian Clapper.


04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode.  Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules.   Basically the
          non-associativity was being ignored and not producing the correct
          run-time behavior in the parser.

02/16/08: beazley
          Slight relaxation of what the input() method to a lexer will
          accept as a string.   Instead of testing the input to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input.  For example, the object returned by mmap.

              import mmap, os
              data = mmap.mmap(os.open(filename,os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)


11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications.  For example, suppose you defined
          a function like this:

               def number(t):
                    r'\d+'
                    t.value = int(t.value)
                    return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and use
          the associated number() function to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller.   This
          modification allows lexers and parsers to be defined using
          inner functions and closures.

11/28/07: beazley
          Performance optimization:  The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions.   The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing.  Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY ***  This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production.   To do this, simply raise
          a yacc.SyntaxError exception like this:

          def p_some_production(p):
              'some_production : prod1 prod2'
              ...
              raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and parser state backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
          Fixed structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules.  If a rule was defined that changed state, but
          returned no token, the lexer could be left in an inconsistent
          state.  Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs.   Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc.  If a syntax error occurred and the
          parser rolled the entire parse stack back, the parser would be left in an
          inconsistent state that would cause it to trigger incorrect actions on
          subsequent input.  Reported by Ton Biegstraaten, Justin King, and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse().
          This would result in an error message about "No input given".  Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help catch
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse().  For example:

              yacc.parse(data,tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).    Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.
          For example:

          def p_foo(p):
              '''foo : a b c d e'''
              p[0] = p[1:3]

          This change is a performance improvement to the parser--it streamlines
          normal access to the grammar values since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.   Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the functionality of lineno() and linespan().  For
          example:

          def p_expr(p):
              'expr : expr PLUS expr'
              p.lexpos(1)     # Lexing position of left-hand expression
              p.lexpos(2)     # Lexing position of PLUS
              start,end = p.lexspan(3)  # Lexing range of right-hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

             def t_error(t):
                 print("Illegal character '%s'" % t.value[0])
                 t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The Lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.
          Refer to
          the PLY documentation for PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

          states = (
                    ('foo','exclusive'),
                    ('bar','inclusive')
          )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

            t_PLUS = r'\+'          # Rule defined in INITIAL state
            t_foo_NUM = r'\d+'      # Rule defined in foo state
            t_bar_NUM = r'\d+'      # Rule defined in bar state

            t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
            t_ANY_NUM = r'\d+'      # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states.  For example:

            t_foo_ignore = " "     # Ignored characters for foo state
            def t_bar_error(t):
                # Handle errors in bar state
                pass

          Within token rules, the following methods can be used to change states:

            def t_TOKNAME(t):
                t.lexer.begin('foo')        # Begin state 'foo'
                t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                            # onto a stack
                t.lexer.pop_state()         # Restore previous state
                t.lexer.current_state()     # Returns name of current state

          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
          yy_top_state() functions in flex.

          Start states can be used as one way to write sub-lexers.
          For example, the lexer or parser might instruct the lexer to start
          generating a different set of tokens depending on the context.

          example/yply/ylex.py shows the use of start states to grab C/C++
          code fragments out of traditional yacc specification files.

          *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.
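
          How the four state methods interact can be sketched with a toy
          model that uses no PLY code at all.  The StateStack class below is
          hypothetical and only imitates the behavior described in this
          note; it is not the lexer's actual implementation:

```python
class StateStack:
    """Toy model of begin(), push_state(), pop_state() and current_state().

    Hypothetical class for illustration; not part of PLY.
    """

    def __init__(self):
        self.state = "INITIAL"   # Lexing starts in the INITIAL state
        self.stack = []          # Previously active states

    def begin(self, state):
        # Switch states without remembering the old one.
        self.state = state

    def push_state(self, state):
        # Remember the old state, then switch to the new one.
        self.stack.append(self.state)
        self.begin(state)

    def pop_state(self):
        # Return to the most recently remembered state.
        self.begin(self.stack.pop())

    def current_state(self):
        return self.state


lexer = StateStack()
lexer.push_state("foo")        # INITIAL is saved, foo becomes active
lexer.push_state("bar")        # foo is saved, bar becomes active
lexer.pop_state()              # back to foo
print(lexer.current_state())
lexer.pop_state()              # back to INITIAL
print(lexer.current_state())
```

          The point of the sketch is the difference between begin(), which
          forgets the old state, and push_state(), which lets a rule return
          to wherever it came from.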

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect.  This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

            def p_expr_list(p):
                'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

          def t_FOO(t):
              r'some regex'
              m = t.lexer.lexmatch
              # Do something with m


          This may be useful if you want to access named groups specified within
          the regex for a specific token. Suggested by Oldrich Jedlicka.

10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.   This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

               start = 'name'

          where 'name' matches some grammar rule.  For example:

               def p_name(p):
                   'name : A B C'
                   ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

          examples/GardenSnake : A simple indentation based language similar
                                 to Python.  Shows how you might handle
                                 whitespace.  Contributed by Andrew Dalke.

          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.   To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR. However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

          def t_NEWLINE(t):
              r'\n'
              t.lexer.lineno += 1

          This modification is both cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for a vast
          majority of tokens. Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos. This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
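
          The backwards scan mentioned in the lexpos note can be sketched as
          follows.  The helper name find_column and the choice of 1-based
          columns are assumptions for this example; the function works on a
          plain string and an integer index, so it does not depend on PLY:

```python
def find_column(text, lexpos):
    """Return the 1-based column of the character at index lexpos.

    Hypothetical helper for illustration; text is the full lexer input.
    """
    # rfind() returns -1 when no newline precedes lexpos, so adding 1
    # gives index 0 for tokens on the first line.
    line_start = text.rfind("\n", 0, lexpos) + 1
    return lexpos - line_start + 1


data = "x = 1\ny = 22\n"
pos = data.index("22")           # Stand-in for a token's lexpos value
print(find_column(data, pos))
```

          In a real lexer the second argument would be t.lexpos from the
          token of interest.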

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions exceeding
          100 named groups. To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.   This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is interpreted
          as a regex comment, so it is critical to use '\#' instead.

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

          digit            = r'([0-9])'
          nondigit         = r'([_A-Za-z])'
          identifier       = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

          from ply.lex import TOKEN

          @TOKEN(identifier)
          def t_ID(t):
              # Do whatever
              return t

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

          def t_ID(t):
              # Do whatever
              return t

          t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.
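
          Since the decorator is described as merely setting the
          documentation string, its effect can be imitated in a few lines of
          plain Python.  The name token_doc below is a hypothetical stand-in
          written for this note and is not the TOKEN function shipped in
          lex.py:

```python
import re


def token_doc(pattern):
    """Return a decorator that stores pattern as the function's docstring.

    Hypothetical stand-in for a TOKEN-style decorator; not PLY code.
    """
    def set_doc(func):
        func.__doc__ = pattern
        return func
    return set_doc


digit = r'([0-9])'
nondigit = r'([_A-Za-z])'
identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'


@token_doc(identifier)
def t_ID(t):
    return t


# The computed regular expression is now the function's docstring.
print(t_ID.__doc__ == identifier)
print(bool(re.fullmatch(t_ID.__doc__, "spam_42")))
```

          The decorated function is returned unchanged apart from its
          docstring, which is the same result as assigning t_ID.__doc__ by
          hand.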

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

          def p_expr_plus(p):
               "expr : expr '+' expr"
               ...

          def p_expr_minus(p):
               'expr : expr "-" expr'
               ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.   This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.   For example:

             literals = ['+','-','*','/','(',')','=']

          or simply

             literals = '+-*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.


09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.  Just do this to import the
          modules:

                import ply.lex as lex
                import ply.yacc as yacc

          Alternatively, you can do this:

                from ply import *

          This imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print(p[1])      # Value of 'A' symbol
                  print(p[2])      # Value of 'B' symbol
                  print(p[-1])     # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules. For example, in C-yacc, it is possible to write code like
          this:

               bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform this equivalent action in PLY, you need to write a pair
          of rules like this:

               def p_bar(p):
                   "bar : A seen_A B"
                   do_stuff

               def p_seen_A(p):
                   "seen_A :"
                   print("seen an A = %s" % p[-1])

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The
          negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

               def p_decl(p):
                   "decl : scope name"

               def p_scope(p):
                   """scope : GLOBAL
                            | LOCAL"""
                   p[0] = p[1]

               def p_name(p):
                   "name : ID"
                   if p[-1] == "GLOBAL":
                       # ...
                       pass
                   elif p[-1] == "LOCAL":
                       # ...
                       pass

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if not non-existent, in
          most cases.  The argument to various grammar rules is usually not
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.  If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed infinite recursion problem in the ReduceToTerminals() function that
          would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

               lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.   This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).   The need for this was suggested by
          Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.   Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input.  Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.   This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory. Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement. LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley. To use LALR(1) parsing do
          the following:

               yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.
          Can be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                       | E F"""
                  if len(p) == 4:
                      # Must have been first rule (len(p) also counts p[0])
                      pass
                  elif len(p) == 3:
                      # Must be second rule
                      pass

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

           0. Cleans up some comments so they don't wrap on an 80-column display.
           1. Directs compiler errors to stderr where they belong.
           2. Implements and documents automatic line counting when \n is ignored.
           3. Changes the way progress messages are dumped when debugging is on.
              The new format is both less verbose and conveys more information than
              the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation. Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -   Some output is now only shown when debugging is enabled. This
              means that PLY will be completely silent when not in debugging mode.

          -   An optional parameter "write_tables" can be passed to yacc() to
              control whether or not parsing tables are written. By default,
              it is true, but it can be turned off if you don't want the yacc
              table file. Note: disabling this will cause yacc() to regenerate
              the parsing table each time.

04/23/04: beazley
          Added patches contributed by David McNab. These patches add two
          features:

          -   The parser can be supplied as a class instead of a module.
              For an example of this, see the example/classcalc directory.

          -   Debugging output can be directed to a filename of the user's
              choice. Use

                  yacc(debugfile="somefile.out")


Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking. Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively. For example:

              def t_NUMBER(t):
                  r'\d+'
                  print(t.lexer)

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print(t.lexer)
                  print(t.parser)

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode. To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files. After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects. For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts, or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message". This makes error reporting more
          friendly to emacs. Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file.
          Terminals and nonterminals are sorted instead of being printed
          in random order. Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output. Rules are now printed
          in a way that's easier to understand. Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support. This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes. To make this work, Python *first* needs to be run in
          unoptimized mode. This reads the lexing information and creates a
          file "lextab.py". Then, run lex like this:

              # module foo.py
              ...
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection. In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py' use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)


Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a bug in the cycle detector that reports errors
          about infinite recursion.
          I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

              yacc.yacc(check_recursion = 0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly. Reported by Sverre Jørgensen.

7/8/01  : beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions. The lexer is available as the 'lexer' attribute. This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking). Suggested by Pearu Peterson.

6/20/01 : beazley
          Modified yacc() function so that an optional starting symbol can be specified.
          For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the starting symbol.
          However, if you are debugging your grammar it may be useful to specify
          an alternative starting symbol. Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering