CHANGES
Version 3.2
-----------------------------
03/24/09: beazley
    Added an extra check to not print duplicated warning messages
    about reduce/reduce conflicts.

03/24/09: beazley
    Switched PLY over to a BSD license.

03/23/09: beazley
    Performance optimization.  Discovered a few places to make
    speedups in LR table generation.

03/23/09: beazley
    New warning message.  PLY now warns about rules never
    reduced due to reduce/reduce conflicts.  Suggested by
    Bruce Frederiksen.

03/23/09: beazley
    Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
    Added a new picklefile option to yacc() to write the parsing
    tables to a filename using the pickle module.  Here is how
    it works:

        yacc(picklefile="parsetab.p")

    This option can be used if the normal parsetab.py file is
    extremely large.  For example, on jython, it is impossible
    to read parsing tables if the parsetab.py exceeds a certain
    threshold.

    The filename supplied to the picklefile option is opened
    relative to the current working directory of the Python
    interpreter.
    If you need to refer to the file elsewhere,
    you will need to supply an absolute or relative path.

    For maximum portability, the pickle file is written
    using protocol 0.

03/13/09: beazley
    Fixed a bug in parser.out generation where the rule numbers
    were off by one.

03/13/09: beazley
    Fixed a string formatting bug with one of the error messages.
    Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
    Fixed broken start argument to yacc().  PLY-3.0 broke this
    feature by accident.

02/28/09: beazley
    Fixed debugging output.  yacc() no longer reports shift/reduce
    or reduce/reduce conflicts if debugging is turned off.  This
    restores similar behavior to PLY-2.5.  Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
    Fixed missing lexer attribute on certain tokens when
    invoking the parser p_error() function.  Reported by
    Bart Whiteley.

02/02/09: beazley
    The lex() command now does all error-reporting and diagnostics
    using the logging module interface.  Pass in a Logger object
    using the errorlog parameter to specify a different logger.
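The errorlog handoff described in the 02/02/09 entry can be sketched as follows. This is a minimal sketch using only the standard library logging module; the logger name and format string are arbitrary choices, and the lex.lex(errorlog=...) call at the end is shown in a comment since it assumes PLY is installed.

```python
import logging

# Build a logger that only emits warnings and errors; PLY's lex()
# would route its diagnostics through whatever logger it is given.
log = logging.getLogger("ply.diagnostics")
log.setLevel(logging.WARNING)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s: %(message)s"))
log.addHandler(handler)

# With PLY available, the logger would be passed like this:
#   import ply.lex as lex
#   lexer = lex.lex(errorlog=log)
```

Setting a high level on the logger (e.g., logging.CRITICAL) is the corresponding way to suppress most output.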
02/02/09: beazley
    Refactored ply.lex to use a more object-oriented and organized
    approach to collecting lexer information.

02/01/09: beazley
    Removed the nowarn option from lex().  All output is controlled
    by passing in a logger object.  Just pass in a logger with a high
    level setting to suppress output.  This argument was never
    documented to begin with, so hopefully no one was relying upon it.

02/01/09: beazley
    Discovered and removed a dead if-statement in the lexer.  This
    resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
    Minor change to the procedure for signalling a syntax error in a
    production rule.  A normal SyntaxError exception should be raised
    instead of yacc.SyntaxError.

01/13/09: beazley
    Added a new method p.set_lineno(n, lineno) that can be used to set the
    line number of symbol n in grammar rules.  This simplifies manual
    tracking of line numbers.

01/11/09: beazley
    Vastly improved debugging support for yacc.parse().  Instead of passing
    debug as an integer, you can supply a Logging object (see the logging
    module).  Messages will be generated at the ERROR, INFO, and DEBUG
    logging levels, each level providing progressively more information.
    The debugging trace also shows states, grammar rules, values passed
    into grammar rules, and the result of each reduction.

01/09/09: beazley
    The yacc() command now does all error-reporting and diagnostics using
    the interface of the logging module.  Use the errorlog parameter to
    specify a logging object for error messages.  Use the debuglog parameter
    to specify a logging object for the 'parser.out' output.

01/09/09: beazley
    *HUGE* refactoring of the ply.yacc() implementation.  The high-level
    user interface is backwards compatible, but the internals are completely
    reorganized into classes.  No more global variables.  The internals
    are also more extensible.  For example, you can use the classes to
    construct a LALR(1) parser in an entirely different manner than
    what is currently the case.  Documentation is forthcoming.

01/07/09: beazley
    Various cleanup and refactoring of yacc internals.

01/06/09: beazley
    Fixed a bug with precedence assignment.  yacc was assigning the
    precedence of each rule based on the left-most token, when in fact
    it should have been using the right-most token.  Reported by
    Bruce Frederiksen.
11/27/08: beazley
    Numerous changes to support Python 3.0, including removal of deprecated
    statements (e.g., has_key) and the addition of compatibility code
    to emulate features from Python 2 that have been removed, but which
    are needed.  Fixed the unit testing suite to work with Python 3.0.
    The code should be backwards compatible with Python 2.

11/26/08: beazley
    Loosened the rules on what kind of objects can be passed in as the
    "module" parameter to lex() and yacc().  Previously, you could only use
    a module or an instance.  Now, PLY just uses dir() to get a list of
    symbols on whatever the object is, without regard for its type.

11/26/08: beazley
    Changed all except: statements to be compatible with Python 2.x/3.x syntax.

11/26/08: beazley
    Changed all "raise Exception, value" statements to raise Exception(value)
    for forward compatibility.

11/26/08: beazley
    Removed all print statements from lex and yacc, using sys.stdout and
    sys.stderr directly.  Preparation for Python 3.0 support.

11/04/08: beazley
    Fixed a bug with referring to symbols on the parsing stack using negative
    indices.

05/29/08: beazley
    Completely revamped the testing system to use the unittest module for
    everything.
    Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
    Fixed a bug with writing lex-tables in optimized mode and start states.
    Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
    A version number is now embedded in the table file signature so that
    yacc can more gracefully accommodate changes to the output format
    in the future.

05/04/08: beazley
    Removed undocumented .pushback() method on grammar productions.  I'm
    not sure this ever worked and can't recall ever using it.  Might have
    been an abandoned idea that never really got fleshed out.  This
    feature was never described or tested, so removing it is hopefully
    harmless.

05/04/08: beazley
    Added extra error checking to yacc() to detect precedence rules defined
    for undefined terminal symbols.  This allows yacc() to detect a potential
    problem that can be really tricky to debug if no warning or error
    message is generated about it.

05/04/08: beazley
    lex() now has an outputdir option that can specify the output directory
    for tables when running in optimize mode.
    For example:

        lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

    The behavior of specifying a table module and output directory is now
    more closely aligned with the behavior of yacc().

05/04/08: beazley
    [Issue 9]
    Fixed a filename bug when specifying the modulename in lex() and yacc().
    If you specified options such as the following:

        parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

    yacc would create a file "foo.bar.parsetab.py" in the given directory.
    Now, it simply generates a file "parsetab.py" in that directory.
    Bug reported by cptbinho.

05/04/08: beazley
    Slight modification to lex() and yacc() to allow their table files
    to be loaded from a previously loaded module.  This might make
    it easier to load the parsing tables from a complicated package
    structure.  For example:

        import foo.bar.spam.parsetab as parsetab
        parser = yacc.yacc(tabmodule=parsetab)

    Note: lex and yacc will never regenerate the table file if used
    in this form---you will get a warning message instead.
    This idea suggested by Brian Clapper.
04/28/08: beazley
    Fixed a bug with p_error() functions not being picked up correctly
    when running in yacc(optimize=1) mode.  Patch contributed by
    Bart Whiteley.

02/28/08: beazley
    Fixed a bug with 'nonassoc' precedence rules.  Basically the
    nonassoc precedence was being ignored and not producing the correct
    run-time behavior in the parser.

02/16/08: beazley
    Slight relaxation of what the input() method to a lexer will
    accept as a string.  Instead of testing the input to see
    if it is a string or unicode string, it checks to see
    if the input object looks like it contains string data.
    This change makes it possible to pass string-like objects
    in as input---for example, the object returned by mmap:

        import mmap, os
        data = mmap.mmap(os.open(filename, os.O_RDONLY),
                         os.path.getsize(filename),
                         access=mmap.ACCESS_READ)
        lexer.input(data)

11/29/07: beazley
    Modification of ply.lex to allow token functions to be aliased.
    This is subtle, but it makes it easier to create libraries and
    to reuse token specifications.
    For example, suppose you defined
    a function like this:

        def number(t):
            r'\d+'
            t.value = int(t.value)
            return t

    This change would allow you to define a token rule as follows:

        t_NUMBER = number

    In this case, the token type will be set to 'NUMBER' and use
    the associated number() function to process tokens.

11/28/07: beazley
    Slight modification to lex and yacc to grab symbols from both
    the local and global dictionaries of the caller.  This
    modification allows lexers and parsers to be defined using
    inner functions and closures.

11/28/07: beazley
    Performance optimization: The lexer.lexmatch and t.lexer
    attributes are no longer set for lexer tokens that are not
    defined by functions.  The only normal use of these attributes
    would be in lexer rules that need to perform some kind of
    special processing.  Thus, it doesn't make any sense to set
    them on every token.

    *** POTENTIAL INCOMPATIBILITY *** This might break code
    that is mucking around with internal lexer state in some
    sort of magical way.

11/27/07: beazley
    Added the ability to put the parser into error-handling mode
    from within a normal production.
    To do this, simply raise
    a yacc.SyntaxError exception like this:

        def p_some_production(p):
            'some_production : prod1 prod2'
            ...
            raise yacc.SyntaxError      # Signal an error

    A number of things happen after this occurs:

    - The last symbol shifted onto the symbol stack is discarded
      and the parser state is backed up to what it was before the
      rule reduction.

    - The current lookahead symbol is saved and replaced by
      the 'error' symbol.

    - The parser enters error recovery mode where it tries
      to either reduce the 'error' rule or it starts
      discarding items off of the stack until the parser
      resets.

    When an error is manually set, the parser does *not* call
    the p_error() function (if any is defined).

    *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
    Fixed a structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
    Fixed a bug in the lexer related to start conditions and ignored
    token rules.  If a rule was defined that changed state, but
    returned no token, the lexer could be left in an inconsistent
    state.  Reported by

11/27/07: beazley
    Modified setup.py to support Python Eggs.
    Patch contributed by
    Simon Cross.

11/09/07: beazley
    Fixed a bug in error handling in yacc.  If a syntax error occurred and
    the parser rolled the entire parse stack back, the parser would be left
    in an inconsistent state that would cause it to trigger incorrect actions
    on subsequent input.  Reported by Ton Biegstraaten, Justin King, and
    others.

11/09/07: beazley
    Fixed a bug when passing empty input strings to yacc.parse().  This
    would result in an error message about "No input given".  Reported
    by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
    Fixed a bug with character literals if the literal '.' appeared as the
    last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
    Warning messages are now redirected to stderr instead of being printed
    to standard output.

02/19/07: beazley
    Added a warning message to lex.py if it detects a literal backslash
    character inside the t_ignore declaration.  This is to help catch
    problems that might occur if someone accidentally defines t_ignore
    as a Python raw string.
    For example:

        t_ignore = r' \t'

    The idea for this is from an email I received from David Cimimi, who
    reported bizarre behavior in lexing as a result of defining t_ignore
    as a raw string by accident.

02/18/07: beazley
    Performance improvements.  Made some changes to the internal
    table organization and LR parser to improve parsing performance.

02/18/07: beazley
    Automatic tracking of line number and position information must now be
    enabled by a special flag to parse().  For example:

        yacc.parse(data, tracking=True)

    In many applications, it's just not that important to have the
    parser automatically track all line numbers.  By making this an
    optional feature, it allows the parser to run significantly faster
    (more than a 20% speed increase in many cases).  Note: positional
    information is always available for raw tokens---this change only
    applies to positional information associated with nonterminal
    grammar symbols.

    *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
    Yacc no longer supports extended slices of grammar productions.
    However, it does support regular slices.
    For example:

        def p_foo(p):
            '''foo : a b c d e'''
            p[0] = p[1:3]

    This change is a performance improvement to the parser---it streamlines
    normal access to the grammar values since slices are now handled in
    a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
    Fixed a bug in the handling of token names when combined with
    start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
    Added lexpos() and lexspan() methods to grammar symbols.  These
    mirror the same functionality of lineno() and linespan().  For
    example:

        def p_expr(p):
            'expr : expr PLUS expr'
            p.lexpos(1)                 # Lexing position of left-hand expression
            p.lexpos(2)                 # Lexing position of PLUS
            start, end = p.lexspan(3)   # Lexing range of right-hand expression

11/01/06: beazley
    Minor change to error handling.
    The recommended way to skip characters
    in the input is to use t.lexer.skip() as shown here:

        def t_error(t):
            print "Illegal character '%s'" % t.value[0]
            t.lexer.skip(1)

    The old approach of just using t.skip(1) will still work, but won't
    be documented.

10/31/06: beazley
    Discarded tokens can now be specified as simple strings instead of
    functions.  To do this, simply include the text "ignore_" in the
    token declaration.  For example:

        t_ignore_cppcomment = r'//.*'

    Previously, this had to be done with a function.  For example:

        def t_ignore_cppcomment(t):
            r'//.*'
            pass

    If start conditions/states are being used, state names should appear
    before the "ignore_" text.

10/19/06: beazley
    The lex module now provides support for flex-style start conditions
    as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
    Please refer to this document to understand this change note.  Refer to
    the PLY documentation for a PLY-specific explanation of how this works.
    To use start conditions, you first need to declare a set of states in
    your lexer file:

        states = (
            ('foo', 'exclusive'),
            ('bar', 'inclusive')
        )

    This serves the same role as the %s and %x specifiers in flex.

    Once a state has been declared, tokens for that state can be
    declared by defining rules of the form t_state_TOK.  For example:

        t_PLUS        = r'\+'   # Rule defined in INITIAL state
        t_foo_NUM     = r'\d+'  # Rule defined in foo state
        t_bar_NUM     = r'\d+'  # Rule defined in bar state

        t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
        t_ANY_NUM     = r'\d+'  # Rule defined in all states

    In addition to defining tokens for each state, the t_ignore and t_error
    specifications can be customized for specific states.
    For example:

        t_foo_ignore = " "      # Ignored characters for foo state

        def t_bar_error(t):
            # Handle errors in bar state
            pass

    With token rules, the following methods can be used to change states:

        def t_TOKNAME(t):
            t.lexer.begin('foo')        # Begin state 'foo'
            t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                        # onto a stack
            t.lexer.pop_state()         # Restore previous state
            t.lexer.current_state()     # Returns name of current state

    These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
    yy_top_state() functions in flex.

    Start states are one way to write sub-lexers.
    For example, the lexer or parser might instruct the lexer to start
    generating a different set of tokens depending on the context.

    example/yply/ylex.py shows the use of start states to grab C/C++
    code fragments out of traditional yacc specification files.

    *** NEW FEATURE *** Suggested by Daniel Larraz, with whom I also
    discussed various aspects of the design.

10/19/06: beazley
    Minor change to the way in which yacc.py was reporting shift/reduce
    conflicts.
    Although the underlying LALR(1) algorithm was correct,
    PLY was under-reporting the number of conflicts compared to yacc/bison
    when precedence rules were in effect.  This change should make PLY
    report the same number of conflicts as yacc.

10/19/06: beazley
    Modified yacc so that grammar rules could also include the '-'
    character.  For example:

        def p_expr_list(p):
            'expression-list : expression-list expression'

    Suggested by Oldrich Jedlicka.

10/18/06: beazley
    Attribute lexer.lexmatch added so that token rules can access the re
    match object that was generated.  For example:

        def t_FOO(t):
            r'some regex'
            m = t.lexer.lexmatch
            # Do something with m

    This may be useful if you want to access named groups specified within
    the regex for a specific token.  Suggested by Oldrich Jedlicka.

10/16/06: beazley
    Changed the error message that results if an illegal character
    is encountered and no default error function is defined in lex.
    The exception is now more informative about the actual cause of
    the error.
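The named-group access mentioned in the 10/18/06 lexmatch entry above can be sketched without a full lexer by using re directly; inside a real PLY token function the match object would come from t.lexer.lexmatch instead. The QSTRING-style pattern here is a hypothetical example, not a rule from PLY itself.

```python
import re

# A quoted-string pattern with a named group for the opening quote,
# so the closing quote can be matched with a backreference.
qstring = re.compile(r'''(?P<quote>['"]).*?(?P=quote)''')

m = qstring.match('"hello"')
quote_char = m.group('quote')   # the character that opened the string
```

In a token rule, t.lexer.lexmatch would play the role of m here, giving access to the same group() interface.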
Version 2.1
------------------------------
10/02/06: beazley
    The last Lexer object built by lex() can be found in lex.lexer.
    The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
    New example added: examples/yply

    This example uses PLY to convert Unix-yacc specification files to
    PLY programs with the same grammar.  This may be useful if you
    want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
    Added support for a start symbol to be specified in the yacc
    input file itself.  Just do this:

        start = 'name'

    where 'name' matches some grammar rule.  For example:

        def p_name(p):
            'name : A B C'
            ...

    This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
    Some new examples added:

        examples/GardenSnake : A simple indentation-based language similar
                               to Python.  Shows how you might handle
                               whitespace.  Contributed by Andrew Dalke.

        examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.  To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR.  However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

              def t_NEWLINE(t):
                  r'\n'
                  t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for the vast
          majority of tokens.  Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number.
          For example, "t.lineno += 1" becomes "t.lexer.lineno += 1".

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos.  This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions exceeding
          100 named groups.  To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.  This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, where '#' is
          interpreted as a regex comment, it is critical to use '\#' instead.

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

              digit            = r'([0-9])'
              nondigit         = r'([_A-Za-z])'
              identifier       = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

              from ply.lex import TOKEN

              @TOKEN(identifier)
              def t_ID(t):
                  # Do whatever
                  return t

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

              def t_ID(t):
                  # Do whatever
                  return t

              t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.
          For example:

              def p_expr(p):
                  "expr : expr '+' expr"
                  ...

              def p_expr(p):
                  'expr : expr "-" expr'
                  ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.  This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.  For example:

              literals = ['+','-','*','/','(',')','=']

          or simply

              literals = '+-*/()='

          It is important to note that literals can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.
          Just do this to import the modules:

              import ply.lex as lex
              import ply.yacc as yacc

          Alternatively, you can do this:

              from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                  "foo : A B C D"
                  print p[1]       # Value of 'A' symbol
                  print p[2]       # Value of 'B' symbol
                  print p[-1]      # Value of whatever symbol appears before A
                                   # on the parsing stack.

                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules.  For example, in C-yacc, it is possible to write code like
          this:

              bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.
          To perform the equivalent action in PLY, you need to write a pair
          of rules like this:

              def p_bar(p):
                  "bar : A seen_A B"
                  do_stuff

              def p_seen_A(p):
                  "seen_A :"
                  print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The
          negative index p[-1] is used to access whatever symbol appeared
          before the seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

              def p_decl(p):
                  "decl : scope name"

              def p_scope(p):
                  """scope : GLOBAL
                           | LOCAL"""
                  p[0] = p[1]

              def p_name(p):
                  "name : ID"
                  if p[-1] == "GLOBAL":
                      # ...
                  elif p[-1] == "LOCAL":
                      # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if not non-existent,
          in most cases.  The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.
          If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

              lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.  This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).  The need for this was suggested
          by Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.  Reported by Michael Brown.
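          A minimal illustration of what the re.UNICODE flag in the reflags
          entry above changes, using the re module directly rather than a
          full PLY lexer.  (In modern Python 3, str patterns are
          Unicode-aware by default, so re.ASCII provides the contrast.)

```python
import re

# With Unicode matching, \w consults the Unicode character property
# database, so accented letters count as word characters.
unicode_match = re.match(r"\w+", "naïve", re.UNICODE)
print(unicode_match.group())   # -> naïve

# With ASCII-only matching, \w stops at the first non-ASCII character.
ascii_match = re.match(r"\w+", "naïve", re.ASCII)
print(ascii_match.group())     # -> na
```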

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input.  Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.  This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory.  Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement.  LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley.  To use LALR(1) parsing do
          the following:

              yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

              def p_somerule(p):
                  """a : B C D
                     | E F"""
                  if len(p) == 4:
                      # Must have been the first rule
                      ...
                  elif len(p) == 3:
                      # Must be the second rule
                      ...

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

          0.  Cleans up some comments so they don't wrap on an 80-column display.
          1.  Directs compiler errors to stderr where they belong.
          2.  Implements and documents automatic line counting when \n is ignored.
          3.  Changes the way progress messages are dumped when debugging is on.
              The new format is both less verbose and conveys more information than
              the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -  Some output is now only shown when debugging is enabled.  This
             means that PLY will be completely silent when not in debugging mode.

          -  An optional parameter "write_tables" can be passed to yacc() to
             control whether or not parsing tables are written.  By default,
             it is true, but it can be turned off if you don't want the yacc
             table file.  Note: disabling this will cause yacc() to regenerate
             the parsing table each time.

04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          -  The parser can be supplied as a class instead of a module.
             For an example of this, see the example/classcalc directory.

          -  Debugging output can be directed to a filename of the user's
             choice.  Use

                 yacc(debugfile="somefile.out")


Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking.  Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart().

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively.
          For example:

              def t_NUMBER(t):
                  r'\d+'
                  print t.lexer

              def p_expr_plus(t):
                  'expr : expr PLUS expr'
                  print t.lexer
                  print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode.  To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files.  After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects.  For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

10/30/02: beazley
          Fixed subtle bug with first set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message".  This makes error reporting more
          friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file.  Terminals and nonterminals
          are sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output.  Rules are now printed
          in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support.  This can be used
          to disable the chaining of operators like a < b < c.
          To use, simply specify 'nonassoc' in the precedence table:

              precedence = (
                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
                  ('left', 'PLUS', 'MINUS'),
                  ('left', 'TIMES', 'DIVIDE'),
                  ('right', 'UMINUS'),                      # Unary minus operator
              )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes.  To make this work, Python *first* needs to be run in
          unoptimized mode.  This reads the lexing information and creates a
          file "lextab.py".  Then, run lex like this:

              # module foo.py
              ...
              ...
              lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection.  In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py', use the following:

              lex.lex(lextab="footab")

          (this creates a file footab.py)


Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed so a little more
          work is done in parsetab on import.

10/25/01: beazley
          There may be a possible bug in the cycle detector that reports errors
          about infinite recursion.  I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

              yacc.yacc(check_recursion=0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01  : beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions.  The lexer is available as the 'lexer' attribute.
          This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking).  Suggested by Pearu Peterson.

6/20/01 : beazley
          Modified yacc() function so that an optional starting symbol can be specified.
          For example:

              yacc.yacc(start="statement")

          Normally yacc always treats the first production rule as the starting symbol.
          However, if you are debugging your grammar it may be useful to specify
          an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering