Version 3.2
-----------------------------
03/24/09: beazley
          Added an extra check to not print duplicated warning messages
          about reduce/reduce conflicts.

03/24/09: beazley
          Switched PLY over to a BSD license.

03/23/09: beazley
          Performance optimization.  Discovered a few places to make
          speedups in LR table generation.

03/23/09: beazley
          New warning message.  PLY now warns about rules never
          reduced due to reduce/reduce conflicts.  Suggested by
          Bruce Frederiksen.

03/23/09: beazley
          Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
          Added a new picklefile option to yacc() to write the parsing
          tables to a filename using the pickle module.  Here is how
          it works:

              yacc(picklefile="parsetab.p")

          This option can be used if the normal parsetab.py file is
          extremely large.  For example, on Jython, it is impossible
          to read the parsing tables if parsetab.py exceeds a certain
          threshold.

          The filename supplied to the picklefile option is opened
          relative to the current working directory of the Python
          interpreter.  If you need to refer to the file elsewhere,
          you will need to supply an absolute or relative path.

          For maximum portability, the pickle file is written
          using protocol 0.

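          The portability point can be illustrated with the pickle module
          alone.  This is a hand-rolled sketch, not PLY's actual table
          layout; the table dictionary below is invented for illustration:

```python
import pickle

# Hypothetical stand-in for LR parsing tables; PLY's real pickled
# layout is internal and not reproduced here.
tables = {"lr_action": {(0, "NUMBER"): 3}, "lr_goto": {(0, "expr"): 1}}

# Protocol 0 is the ASCII text protocol, readable by every Python
# implementation that supports pickling at all (hence "maximum
# portability"), at the cost of larger output than later protocols.
data = pickle.dumps(tables, protocol=0)
assert pickle.loads(data) == tables
```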
03/13/09: beazley
          Fixed a bug in parser.out generation where the rule numbers
          were off by one.

03/13/09: beazley
          Fixed a string formatting bug with one of the error messages.
          Reported by Richard Reitmeyer.

Version 3.1
-----------------------------
02/28/09: beazley
          Fixed the broken start argument to yacc().  PLY-3.0 broke this
          feature by accident.

02/28/09: beazley
          Fixed debugging output.  yacc() no longer reports shift/reduce
          or reduce/reduce conflicts if debugging is turned off.  This
          restores the behavior of PLY-2.5.  Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
          Fixed a missing lexer attribute on certain tokens when
          invoking the parser p_error() function.  Reported by
          Bart Whiteley.

02/02/09: beazley
          The lex() command now does all error reporting and diagnostics
          using the logging module interface.  Pass in a Logger object
          using the errorlog parameter to specify a different logger.

02/02/09: beazley
          Refactored ply.lex to use a more object-oriented and organized
          approach to collecting lexer information.

02/01/09: beazley
          Removed the nowarn option from lex().  All output is controlled
          by passing in a logger object.  Just pass in a logger with a high
          level setting to suppress output.  This argument was never
          documented to begin with, so hopefully no one was relying upon it.
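
          The suppression trick is ordinary logging module behavior and can
          be shown without PLY itself (the logger name below is arbitrary):

```python
import io
import logging

# A logger whose level is above CRITICAL emits nothing.  Passing such
# a logger as the errorlog argument is how output would be silenced;
# here we only demonstrate the logging side of that idea.
buf = io.StringIO()
quiet = logging.getLogger("ply.example.quiet")
quiet.addHandler(logging.StreamHandler(buf))
quiet.setLevel(logging.CRITICAL + 1)

quiet.warning("some lex warning")   # filtered out, never reaches the handler
assert buf.getvalue() == ""
```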

02/01/09: beazley
          Discovered and removed a dead if-statement in the lexer.  This
          resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
          Minor change to the procedure for signalling a syntax error in a
          production rule.  A normal SyntaxError exception should be raised
          instead of yacc.SyntaxError.

01/13/09: beazley
          Added a new method p.set_lineno(n,lineno) that can be used to set the
          line number of symbol n in grammar rules.  This simplifies manual
          tracking of line numbers.

01/11/09: beazley
          Vastly improved debugging support for yacc.parse().  Instead of passing
          debug as an integer, you can supply a Logger object (see the logging
          module).  Messages will be generated at the ERROR, INFO, and DEBUG
          logging levels, each level providing progressively more information.
          The debugging trace also shows states, grammar rules, values passed
          into grammar rules, and the result of each reduction.

01/09/09: beazley
          The yacc() command now does all error reporting and diagnostics using
          the interface of the logging module.  Use the errorlog parameter to
          specify a logging object for error messages.  Use the debuglog parameter
          to specify a logging object for the 'parser.out' output.

01/09/09: beazley
          *HUGE* refactoring of the ply.yacc() implementation.  The high-level
          user interface is backwards compatible, but the internals are completely
          reorganized into classes.  No more global variables.  The internals
          are also more extensible.  For example, you can use the classes to
          construct an LALR(1) parser in an entirely different manner than
          is currently the case.  Documentation is forthcoming.

01/07/09: beazley
          Various cleanup and refactoring of yacc internals.

01/06/09: beazley
          Fixed a bug with precedence assignment.  yacc was assigning the precedence
          of each rule based on the left-most token when, in fact, it should have
          been using the right-most token.  Reported by Bruce Frederiksen.

11/27/08: beazley
          Numerous changes to support Python 3.0, including removal of deprecated
          statements (e.g., has_key) and the addition of compatibility code
          to emulate features from Python 2 that have been removed, but which
          are needed.  Fixed the unit testing suite to work with Python 3.0.
          The code should be backwards compatible with Python 2.

11/26/08: beazley
          Loosened the rules on what kind of objects can be passed in as the
          "module" parameter to lex() and yacc().  Previously, you could only use
          a module or an instance.  Now, PLY just uses dir() to get a list of
          symbols on whatever the object is, without regard for its type.
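
          The dir()-based approach is easy to illustrate without PLY: any
          object whose dir() exposes the right names will do.  A minimal
          sketch, with invented t_-prefixed rule names and a simplified
          collect() helper standing in for PLY's internal scan:

```python
# dir() works uniformly on modules, classes, and instances, so the
# "module" argument no longer needs to be any particular type.
class LexerRules:
    tokens = ("NUMBER",)
    t_NUMBER = r"\d+"
    t_ignore = " \t"

def collect(obj):
    # Same idea PLY uses: enumerate names via dir(), then look each
    # one up with getattr(), keeping only lexer-rule names.
    return {name: getattr(obj, name)
            for name in dir(obj) if name.startswith("t_")}

rules = collect(LexerRules())        # an instance works...
assert rules == collect(LexerRules)  # ...and so does the class itself
assert set(rules) == {"t_NUMBER", "t_ignore"}
```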

11/26/08: beazley
          Changed all except: statements to be compatible with Python 2.x/3.x syntax.

11/26/08: beazley
          Changed all raise Exception, value statements to raise Exception(value) for
          forward compatibility.

11/26/08: beazley
          Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
          directly.  Preparation for Python 3.0 support.

11/04/08: beazley
          Fixed a bug with referring to symbols on the parsing stack using negative
          indices.

05/29/08: beazley
          Completely revamped the testing system to use the unittest module for everything.
          Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
          Fixed a bug with writing lex tables in optimized mode with start states.
          Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
          A version number is now embedded in the table file signature so that
          yacc can more gracefully accommodate changes to the output format
          in the future.

05/04/08: beazley
          Removed the undocumented .pushback() method on grammar productions.  I'm
          not sure this ever worked and can't recall ever using it.  Might have
          been an abandoned idea that never really got fleshed out.  This
          feature was never described or tested, so removing it is hopefully
          harmless.

05/04/08: beazley
          Added extra error checking to yacc() to detect precedence rules defined
          for undefined terminal symbols.  This allows yacc() to detect a potential
          problem that can be really tricky to debug if no warning or error
          message is generated about it.

05/04/08: beazley
          lex() now has an outputdir option that can specify the output directory
          for tables when running in optimized mode.  For example:

             lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

          The behavior of specifying a table module and output directory is now
          more aligned with the behavior of yacc().

05/04/08: beazley
          [Issue 9]
          Fixed a filename bug when specifying the modulename in lex() and yacc().
          If you specified options such as the following:

             parser = yacc.yacc(tabmodule="foo.bar.parsetab",outputdir="foo/bar")

          yacc would create a file "foo.bar.parsetab.py" in the given directory.
          Now, it simply generates a file "parsetab.py" in that directory.
          Bug reported by cptbinho.

05/04/08: beazley
          Slight modification to lex() and yacc() to allow their table files
          to be loaded from a previously loaded module.  This might make
          it easier to load the parsing tables from a complicated package
          structure.  For example:

               import foo.bar.spam.parsetab as parsetab
               parser = yacc.yacc(tabmodule=parsetab)

          Note: lex and yacc will never regenerate the table file if used
          in this form---you will get a warning message instead.
          This idea was suggested by Brian Clapper.

04/28/08: beazley
          Fixed a bug with p_error() functions not being picked up correctly
          when running in yacc(optimize=1) mode.  Patch contributed by
          Bart Whiteley.

02/28/08: beazley
          Fixed a bug with 'nonassoc' precedence rules.  Basically, the
          nonassoc precedence was being ignored and not producing the correct
          run-time behavior in the parser.

02/16/08: beazley
          Slight relaxation of what the input() method of a lexer will
          accept as a string.  Instead of testing to see
          if the input is a string or unicode string, it checks to see
          if the input object looks like it contains string data.
          This change makes it possible to pass string-like objects
          in as input.  For example, the object returned by mmap:

              import mmap, os
              data = mmap.mmap(os.open(filename,os.O_RDONLY),
                               os.path.getsize(filename),
                               access=mmap.ACCESS_READ)
              lexer.input(data)

11/29/07: beazley
          Modification of ply.lex to allow token functions to be aliased.
          This is subtle, but it makes it easier to create libraries and
          to reuse token specifications.  For example, suppose you defined
          a function like this:

               def number(t):
                    r'\d+'
                    t.value = int(t.value)
                    return t

          This change would allow you to define a token rule as follows:

              t_NUMBER = number

          In this case, the token type will be set to 'NUMBER' and use
          the associated number() function to process tokens.

11/28/07: beazley
          Slight modification to lex and yacc to grab symbols from both
          the local and global dictionaries of the caller.  This
          modification allows lexers and parsers to be defined using
          inner functions and closures.

11/28/07: beazley
          Performance optimization:  The lexer.lexmatch and t.lexer
          attributes are no longer set for lexer tokens that are not
          defined by functions.  The only normal use of these attributes
          would be in lexer rules that need to perform some kind of
          special processing.  Thus, it doesn't make any sense to set
          them on every token.

          *** POTENTIAL INCOMPATIBILITY ***  This might break code
          that is mucking around with internal lexer state in some
          sort of magical way.

11/27/07: beazley
          Added the ability to put the parser into error-handling mode
          from within a normal production.  To do this, simply raise
          a yacc.SyntaxError exception like this:

          def p_some_production(p):
              'some_production : prod1 prod2'
              ...
              raise yacc.SyntaxError      # Signal an error

          A number of things happen after this occurs:

          - The last symbol shifted onto the symbol stack is discarded
            and the parser state is backed up to what it was before the
            rule reduction.

          - The current lookahead symbol is saved and replaced by
            the 'error' symbol.

          - The parser enters error recovery mode where it tries
            to either reduce the 'error' rule or it starts
            discarding items off of the stack until the parser
            resets.

          When an error is manually set, the parser does *not* call
          the p_error() function (if any is defined).
          *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
          Fixed a structure bug in examples/ansic.  Reported by Dion Blazakis.

11/27/07: beazley
          Fixed a bug in the lexer related to start conditions and ignored
          token rules.  If a rule was defined that changed state but
          returned no token, the lexer could be left in an inconsistent
          state.  Reported by

11/27/07: beazley
          Modified setup.py to support Python Eggs.  Patch contributed by
          Simon Cross.

11/09/07: beazley
          Fixed a bug in error handling in yacc.  If a syntax error occurred and the
          parser rolled the entire parse stack back, the parser would be left in an
          inconsistent state that would cause it to trigger incorrect actions on
          subsequent input.  Reported by Ton Biegstraaten, Justin King, and others.

11/09/07: beazley
          Fixed a bug when passing empty input strings to yacc.parse().  This
          would result in an error message about "No input given".  Reported
          by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
          Fixed a bug with character literals if the literal '.' appeared as the
          last symbol of a grammar rule.  Reported by Ales Smrcka.

02/19/07: beazley
          Warning messages are now redirected to stderr instead of being printed
          to standard output.

02/19/07: beazley
          Added a warning message to lex.py if it detects a literal backslash
          character inside the t_ignore declaration.  This is to help catch
          problems that might occur if someone accidentally defines t_ignore
          as a Python raw string.  For example:

              t_ignore = r' \t'

          The idea for this is from an email I received from David Cimimi, who
          reported bizarre behavior in lexing as a result of defining t_ignore
          as a raw string by accident.
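
          The pitfall is easy to demonstrate with plain strings, independent
          of PLY: in a raw string, \t stays as two literal characters rather
          than becoming a tab:

```python
# t_ignore is treated as a set of characters to skip, so the raw-string
# form would accidentally ignore backslashes and the letter 't' instead
# of tabs -- which is exactly why lex.py now warns about it.
normal = ' \t'    # space plus a real tab character
raw = r' \t'      # space, backslash, letter 't'

assert len(normal) == 2 and '\t' in normal
assert len(raw) == 3 and '\\' in raw and '\t' not in raw
```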

02/18/07: beazley
          Performance improvements.  Made some changes to the internal
          table organization and LR parser to improve parsing performance.

02/18/07: beazley
          Automatic tracking of line number and position information must now be
          enabled by a special flag to parse().  For example:

              yacc.parse(data,tracking=True)

          In many applications, it's just not that important to have the
          parser automatically track all line numbers.  By making this an
          optional feature, it allows the parser to run significantly faster
          (more than a 20% speed increase in many cases).  Note: positional
          information is always available for raw tokens---this change only
          applies to positional information associated with nonterminal
          grammar symbols.
          *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
          Yacc no longer supports extended slices of grammar productions.
          However, it does support regular slices.  For example:

          def p_foo(p):
              '''foo : a b c d e'''
              p[0] = p[1:3]

          This change is a performance improvement to the parser---it streamlines
          normal access to the grammar values, since slices are now handled in
          a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
          Fixed a bug in the handling of token names when combined with
          start conditions.  Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
          Added lexpos() and lexspan() methods to grammar symbols.  These
          mirror the same functionality of lineno() and linespan().  For
          example:

          def p_expr(p):
              'expr : expr PLUS expr'
               p.lexpos(1)               # Lexing position of left-hand expression
               p.lexpos(2)               # Lexing position of PLUS
               start,end = p.lexspan(3)  # Lexing range of right-hand expression

11/01/06: beazley
          Minor change to error handling.  The recommended way to skip characters
          in the input is to use t.lexer.skip() as shown here:

             def t_error(t):
                 print "Illegal character '%s'" % t.value[0]
                 t.lexer.skip(1)

          The old approach of just using t.skip(1) will still work, but won't
          be documented.

10/31/06: beazley
          Discarded tokens can now be specified as simple strings instead of
          functions.  To do this, simply include the text "ignore_" in the
          token declaration.  For example:

              t_ignore_cppcomment = r'//.*'

          Previously, this had to be done with a function.  For example:

              def t_ignore_cppcomment(t):
                  r'//.*'
                  pass

          If start conditions/states are being used, state names should appear
          before the "ignore_" text.

10/19/06: beazley
          The lex module now provides support for flex-style start conditions
          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
          Please refer to this document to understand this change note.  Refer to
          the PLY documentation for a PLY-specific explanation of how this works.

          To use start conditions, you first need to declare a set of states in
          your lexer file:

          states = (
                    ('foo','exclusive'),
                    ('bar','inclusive')
          )

          This serves the same role as the %s and %x specifiers in flex.

          Once a state has been declared, tokens for that state can be
          declared by defining rules of the form t_state_TOK.  For example:

            t_PLUS = r'\+'         # Rule defined in INITIAL state
            t_foo_NUM = r'\d+'     # Rule defined in foo state
            t_bar_NUM = r'\d+'     # Rule defined in bar state

            t_foo_bar_NUM = r'\d+' # Rule defined in both foo and bar
            t_ANY_NUM = r'\d+'     # Rule defined in all states

          In addition to defining tokens for each state, the t_ignore and t_error
          specifications can be customized for specific states.  For example:

            t_foo_ignore = " "     # Ignored characters for foo state
            def t_bar_error(t):
                # Handle errors in bar state

4634479Sbinkertn@umich.edu          With token rules, the following methods can be used to change states
4644479Sbinkertn@umich.edu          
4654479Sbinkertn@umich.edu            def t_TOKNAME(t):
4664479Sbinkertn@umich.edu                t.lexer.begin('foo')        # Begin state 'foo'
4674479Sbinkertn@umich.edu                t.lexer.push_state('foo')   # Begin state 'foo', push old state
4684479Sbinkertn@umich.edu                                            # onto a stack
4694479Sbinkertn@umich.edu                t.lexer.pop_state()         # Restore previous state
4704479Sbinkertn@umich.edu                t.lexer.current_state()     # Returns name of current state
4714479Sbinkertn@umich.edu
4724479Sbinkertn@umich.edu          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
4734479Sbinkertn@umich.edu          yy_top_state() functions in flex.
4744479Sbinkertn@umich.edu
4754479Sbinkertn@umich.edu          The use of start states can be used as one way to write sub-lexers.
4764479Sbinkertn@umich.edu          For example, the lexer or parser might instruct the lexer to start
4774479Sbinkertn@umich.edu          generating a different set of tokens depending on the context.
4784479Sbinkertn@umich.edu          
4794479Sbinkertn@umich.edu          example/yply/ylex.py shows the use of start states to grab C/C++ 
4804479Sbinkertn@umich.edu          code fragments out of traditional yacc specification files.
4814479Sbinkertn@umich.edu
          *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
          discussed various aspects of the design.

10/19/06: beazley
          Minor change to the way in which yacc.py was reporting shift/reduce
          conflicts.  Although the underlying LALR(1) algorithm was correct,
          PLY was under-reporting the number of conflicts compared to yacc/bison
          when precedence rules were in effect.  This change should make PLY
          report the same number of conflicts as yacc.

10/19/06: beazley
          Modified yacc so that grammar rules could also include the '-'
          character.  For example:

            def p_expr_list(p):
                'expression-list : expression-list expression'

          Suggested by Oldrich Jedlicka.

10/18/06: beazley
          Attribute lexer.lexmatch added so that token rules can access the re
          match object that was generated.  For example:

          def t_FOO(t):
              r'some regex'
              m = t.lexer.lexmatch
              # Do something with m

          This may be useful if you want to access named groups specified within
          the regex for a specific token. Suggested by Oldrich Jedlicka.
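
          Since lexmatch is a standard re match object, named groups can be
          read with its group()/groupdict() methods.  A minimal stand-in
          using plain re (the pattern and group names below are illustrative,
          not part of PLY):

```python
import re

# A token regex with named groups, as it might appear in a t_ASSIGN rule.
pattern = re.compile(r'(?P<name>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+)')

m = pattern.match("answer = 42")
# Inside a token rule, t.lexer.lexmatch is this same kind of match object.
name = m.group('name')
value = m.group('value')
print(name, value)
```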

10/16/06: beazley
          Changed the error message that results if an illegal character
          is encountered and no default error function is defined in lex.
          The exception is now more informative about the actual cause of
          the error.

Version 2.1
------------------------------
10/02/06: beazley
          The last Lexer object built by lex() can be found in lex.lexer.
          The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
          New example added:  examples/yply

          This example uses PLY to convert Unix-yacc specification files to
          PLY programs with the same grammar.   This may be useful if you
          want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
          Added support for a start symbol to be specified in the yacc
          input file itself.  Just do this:

               start = 'name'

          where 'name' matches some grammar rule.  For example:

               def p_name(p):
                   'name : A B C'
                   ...

          This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
          Some new examples added:

          examples/GardenSnake : A simple indentation-based language similar
                                 to Python.  Shows how you might handle
                                 whitespace.  Contributed by Andrew Dalke.

          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                                 Contributed by Dave against his better
                                 judgement.

09/28/06: beazley
          Minor patch to allow named groups to be used in lex regular
          expression rules.  For example:

              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

          Patch submitted by Adam Ring.

09/28/06: beazley
          LALR(1) is now the default parsing method.   To use SLR, use
          yacc.yacc(method="SLR").  Note: there is no performance impact
          on parsing when using LALR(1) instead of SLR. However, constructing
          the parsing tables will take a little longer.

09/26/06: beazley
          Change to line number tracking.  To modify line numbers, modify
          the line number of the lexer itself.  For example:

          def t_NEWLINE(t):
              r'\n'
              t.lexer.lineno += 1

          This modification is both a cleanup and a performance optimization.
          In past versions, lex was monitoring every token for changes in
          the line number.  This extra processing is unnecessary for the vast
          majority of tokens. Thus, this new approach cleans it up a bit.

          *** POTENTIAL INCOMPATIBILITY ***
          You will need to change code in your lexer that updates the line
          number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
          Added the lexing position to tokens as an attribute lexpos. This
          is the raw index into the input text at which a token appears.
          This information can be used to compute column numbers and other
          details (e.g., scan backwards from lexpos to the first newline
          to get a column position).
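
          The scan-backwards computation can be sketched as follows
          (find_column is our own helper name, not a PLY API):

```python
def find_column(text, lexpos):
    # Locate the start of the line containing lexpos, then measure the offset.
    line_start = text.rfind('\n', 0, lexpos) + 1
    return lexpos - line_start + 1   # 1-based column number

source = "x = 1\ny = 22"
print(find_column(source, 6))   # 'y' starts the second line, column 1
print(find_column(source, 4))   # '1' is at column 5 of the first line
```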

09/25/06: beazley
          Changed the name of the __copy__() method on the Lexer class
          to clone().  This is used to clone a Lexer object (e.g., if
          you're running different lexers at the same time).

09/21/06: beazley
          Limitations related to the use of the re module have been eliminated.
          Several users reported problems with regular expressions containing
          more than 100 named groups. To solve this, lex.py is now capable
          of automatically splitting its master regular expression into
          smaller expressions as needed.   This should, in theory, make it
          possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
          Improved error checking in lex.py.  Rules that match the empty string
          are now rejected (otherwise they cause the lexer to enter an infinite
          loop).  An extra check for rules containing '#' has also been added.
          Since lex compiles regular expressions in verbose mode, '#' is
          interpreted as a regex comment, so it is critical to use '\#' instead.
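
          The effect can be demonstrated with the re module directly, which
          is what lex uses underneath:

```python
import re

# Escaped '#' still matches a literal '#' under re.VERBOSE.
escaped = re.compile(r'\#', re.VERBOSE)
assert escaped.match('# comment') is not None

# An unescaped '#' starts a regex comment in verbose mode, so this
# pattern is effectively empty and matches zero characters.
unescaped = re.compile(r'# everything here is ignored', re.VERBOSE)
m = unescaped.match('anything')
assert m is not None and m.group(0) == ''
```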

09/18/06: beazley
          Added a @TOKEN decorator function to lex.py that can be used to
          define token rules where the documentation string might be computed
          in some way.

          digit            = r'([0-9])'
          nondigit         = r'([_A-Za-z])'
          identifier       = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

          from ply.lex import TOKEN

          @TOKEN(identifier)
          def t_ID(t):
               # Do whatever

          The @TOKEN decorator merely sets the documentation string of the
          associated token function as needed for lex to work.

          Note: An alternative solution is the following:

          def t_ID(t):
              # Do whatever

          t_ID.__doc__ = identifier

          Note: Decorators require the use of Python 2.4 or later.  If compatibility
          with old versions is needed, use the latter solution.

          The need for this feature was suggested by Cem Karan.

09/14/06: beazley
          Support for single-character literal tokens has been added to yacc.
          These literals must be enclosed in quotes.  For example:

          def p_expr(p):
               "expr : expr '+' expr"
               ...

          def p_expr(p):
               'expr : expr "-" expr'
               ...

          In addition to this, it is necessary to tell the lexer module about
          literal characters.   This is done by defining the variable 'literals'
          as a list of characters.  This should be defined in the module that
          invokes the lex.lex() function.  For example:

             literals = ['+','-','*','/','(',')','=']

          or simply

             literals = '+=*/()='

          It is important to note that each literal can only be a single character.
          When the lexer fails to match a token using its normal regular expression
          rules, it will check the current character against the literal list.
          If found, it will be returned with a token type set to match the literal
          character.  Otherwise, an illegal character will be signalled.
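
          The fallback can be sketched in plain Python (this mirrors the
          described logic; it is not lex's actual implementation):

```python
literals = '+-*/()='

def classify(ch):
    # If no regular-expression rule matched, check the literals list;
    # a hit becomes a token whose type is the character itself.
    if ch in literals:
        return ch
    raise SyntaxError("illegal character %r" % ch)

print(classify('+'))   # token type for '+' is simply '+'
```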

09/14/06: beazley
          Modified PLY to install itself as a proper Python package called 'ply'.
          This will make it a little more friendly to other modules.  This
          changes the usage of PLY only slightly.  Just do this to import the
          modules:

                import ply.lex as lex
                import ply.yacc as yacc

          Alternatively, you can do this:

                from ply import *

          which imports both the lex and yacc modules.
          Change suggested by Lee June.

09/13/06: beazley
          Changed the handling of negative indices when used in production rules.
          A negative production index now accesses already parsed symbols on the
          parsing stack.  For example,

              def p_foo(p):
                   "foo : A B C D"
                   print p[1]       # Value of 'A' symbol
                   print p[2]       # Value of 'B' symbol
                   print p[-1]      # Value of whatever symbol appears before A
                                    # on the parsing stack.

                   p[0] = some_val  # Sets the value of the 'foo' grammar symbol

          This behavior makes it easier to work with embedded actions within the
          parsing rules. For example, in C-yacc, it is possible to write code like
          this:

               bar:   A { printf("seen an A = %d\n", $1); } B { do_stuff; }

          In this example, the printf() code executes immediately after A has been
          parsed.  Within the embedded action code, $1 refers to the A symbol on
          the stack.

          To perform the equivalent action in PLY, you need to write a pair
          of rules like this:

               def p_bar(p):
                     "bar : A seen_A B"
                     do_stuff

               def p_seen_A(p):
                     "seen_A :"
                     print "seen an A =", p[-1]

          The second rule "seen_A" is merely an empty production which should be
          reduced as soon as A is parsed in the "bar" rule above.  The negative
          index p[-1] is used to access whatever symbol appeared before the
          seen_A symbol.

          This feature also makes it possible to support inherited attributes.
          For example:

               def p_decl(p):
                     "decl : scope name"

               def p_scope(p):
                     """scope : GLOBAL
                              | LOCAL"""
                     p[0] = p[1]

               def p_name(p):
                     "name : ID"
                     if p[-1] == "GLOBAL":
                          # ...
                     elif p[-1] == "LOCAL":
                          # ...

          In this case, the name rule is inheriting an attribute from the
          scope declaration that precedes it.

          *** POTENTIAL INCOMPATIBILITY ***
          If you are currently using negative indices within existing grammar rules,
          your code will break.  This should be extremely rare, if not non-existent,
          in practice.  The argument to various grammar rules is not usually
          processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
          Major cleanup and refactoring of the LR table generation code.  Both SLR
          and LALR(1) table generation is now performed by the same code base with
          only minor extensions for extra LALR(1) processing.

09/07/06: beazley
          Completely reimplemented the entire LALR(1) parsing engine to use the
          DeRemer and Pennello algorithm for calculating lookahead sets.  This
          significantly improves the performance of generating LALR(1) tables
          and has the added feature of actually working correctly!  If you
          experienced weird behavior with LALR(1) in prior releases, this should
          hopefully resolve all of those problems.  Many thanks to
          Andrew Waters and Markus Schoepflin for submitting bug reports
          and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
          Fixed a problem related to the handling of default actions in LALR(1)
          parsing.  If you experienced subtle and/or bizarre behavior when trying
          to use the LALR(1) engine, this may correct those problems.  Patch
          contributed by Russ Cox.  Note: This patch has been superseded by
          revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
          Added support for slicing of productions in yacc.
          Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
          Fixed an infinite recursion problem in the ReduceToTerminals() function
          that would sometimes come up in LALR(1) table generation.  Reported by
          Markus Schoepflin.

03/01/06: beazley
          Added "reflags" argument to lex().  For example:

               lex.lex(reflags=re.UNICODE)

          This can be used to specify optional flags to the re.compile() function
          used inside the lexer.   This may be necessary for special situations such
          as processing Unicode (e.g., if you want escapes like \w and \b to consult
          the Unicode character property database).   The need for this was suggested
          by Andreas Jung.

03/01/06: beazley
          Fixed a bug with an uninitialized variable on repeated instantiations of parser
          objects when the write_tables=0 argument was used.   Reported by Michael Brown.

03/01/06: beazley
          Modified lex.py to accept Unicode strings both as the regular expressions for
          tokens and as input. Hopefully this is the only change needed for Unicode support.
          Patch contributed by Johan Dahl.

03/01/06: beazley
          Modified the class-based interface to work with new-style or old-style classes.
          Patch contributed by Michael Brown (although I tweaked it slightly so it would work
          with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
          Incorporated patch contributed by Christopher Stawarz to fix an extremely
          devious bug in LALR(1) parser generation.   This patch should fix problems
          numerous people reported with LALR parsing.

05/27/05: beazley
          Fixed problem with lex.py copy constructor.  Reported by Dave Aitel, Aaron Lav,
          and Thad Austin.

05/27/05: beazley
          Added outputdir option to yacc() to control output directory. Contributed
          by Christopher Stawarz.

05/27/05: beazley
          Added rununit.py test script to run tests using the Python unittest module.
          Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
          Major enhancement. LALR(1) parsing support is now working.
          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
          and optimized by David Beazley. To use LALR(1) parsing do
          the following:

               yacc.yacc(method="LALR")

          Computing LALR(1) parsing tables takes about twice as long as
          the default SLR method.  However, LALR(1) allows you to handle
          more complex grammars.  For example, the ANSI C grammar
          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
          only has 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
          Added a __len__ method to parser production lists.  Can
          be used in parser rules like this:

             def p_somerule(p):
                 """a : B C D
                      | E F"""
                 if (len(p) == 3):
                     # Must have been first rule
                 elif (len(p) == 2):
                     # Must be second rule

          Suggested by Joshua Gerth and others.

Version 1.4
------------------------------
04/23/04: beazley
          Incorporated a variety of patches contributed by Eric Raymond.
          These include:

           0. Cleans up some comments so they don't wrap on an 80-column display.
           1. Directs compiler errors to stderr where they belong.
           2. Implements and documents automatic line counting when \n is ignored.
           3. Changes the way progress messages are dumped when debugging is on.
              The new format is both less verbose and conveys more information than
              the old, including shift and reduce actions.

04/23/04: beazley
          Added a Python setup.py file to simplify installation.  Contributed
          by Adam Kerrison.

04/23/04: beazley
          Added patches contributed by Adam Kerrison.

          -   Some output is now only shown when debugging is enabled.  This
              means that PLY will be completely silent when not in debugging mode.

          -   An optional parameter "write_tables" can be passed to yacc() to
              control whether or not parsing tables are written.   By default,
              it is true, but it can be turned off if you don't want the yacc
              table file. Note: disabling this will cause yacc() to regenerate
              the parsing table each time.

04/23/04: beazley
          Added patches contributed by David McNab.  This patch adds two
          features:

          -   The parser can be supplied as a class instead of a module.
              For an example of this, see the example/classcalc directory.

          -   Debugging output can be directed to a filename of the user's
              choice.  Use

                 yacc(debugfile="somefile.out")


Version 1.3
------------------------------
12/10/02: jmdyck
          Various minor adjustments to the code that Dave checked in today.
          Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
          Incorporated a variety of minor bug fixes to empty production
          handling and infinite recursion checking.  Contributed by
          Michael Dyck.

12/10/02: beazley
          Removed bogus recover() method call in yacc.restart()

Version 1.2
------------------------------
11/27/02: beazley
          Lexer and parser objects are now available as an attribute
          of tokens and slices respectively. For example:

             def t_NUMBER(t):
                 r'\d+'
                 print t.lexer

             def p_expr_plus(t):
                 'expr : expr PLUS expr'
                 print t.lexer
                 print t.parser

          This can be used for state management (if needed).

10/31/02: beazley
          Modified yacc.py to work with Python optimize mode.  To make
          this work, you need to use

              yacc.yacc(optimize=1)

          Furthermore, you need to first run Python in normal mode
          to generate the necessary parsetab.py files.  After that,
          you can use python -O or python -OO.

          Note: optimized mode turns off a lot of error checking.
          Only use when you are sure that your grammar is working.
          Make sure parsetab.py is up to date!

10/30/02: beazley
          Added cloning of Lexer objects.  For example:

              import copy
              l = lex.lex()
              lc = copy.copy(l)

              l.input("Some text")
              lc.input("Some other text")
              ...

          This might be useful if the same "lexer" is meant to
          be used in different contexts---or if multiple lexers
          are running concurrently.

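          Since each clone carries its own input and position state, the
          effect of cloning can be sketched without PLY itself using a
          minimal stand-in class (TinyLexer below is a hypothetical
          illustration, not part of PLY's API):

              import copy

              class TinyLexer:
                  """Stand-in lexer: splits its input on whitespace."""
                  def __init__(self):
                      self.data = []
                      self.pos = 0
                  def input(self, text):
                      self.data = text.split()
                      self.pos = 0
                  def token(self):
                      if self.pos >= len(self.data):
                          return None       # end of input
                      tok = self.data[self.pos]
                      self.pos += 1
                      return tok

              l = TinyLexer()
              lc = copy.copy(l)             # clone shares rules, not state

              l.input("Some text")
              lc.input("Some other text")
              print(l.token(), lc.token())  # each advances independently
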
10/30/02: beazley
          Fixed a subtle bug with first-set computation and empty productions.
          Patch submitted by Michael Dyck.

10/30/02: beazley
          Fixed error messages to use "filename:line: message" instead
          of "filename:line. message".  This makes error reporting more
          friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to the parser.out file.  Terminals and nonterminals
          are now sorted instead of being printed in random order.
          Patch submitted by François Pinard.

10/30/02: beazley
          Improvements to parser.out file output.  Rules are now printed
          in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
          Added 'nonassoc' associativity support.  This can be used
          to disable the chaining of comparison operators such as a < b < c.
          To use it, simply specify 'nonassoc' in the precedence table:

          precedence = (
            ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
            ('left', 'PLUS', 'MINUS'),
            ('left', 'TIMES', 'DIVIDE'),
            ('right', 'UMINUS'),            # Unary minus operator
          )

          Patch contributed by Russ Cox.

10/30/02: beazley
          Modified the lexer to provide optional support for Python -O and -OO
          modes.  To make this work, Python *first* needs to be run in
          unoptimized mode.  This reads the lexing information and creates a
          file "lextab.py".  Then, run lex like this:

                   # module foo.py
                   ...
                   ...
                   lex.lex(optimize=1)

          Once the lextab file has been created, subsequent calls to
          lex.lex() will read data from the lextab file instead of using
          introspection.  In optimized mode (-O, -OO) everything should
          work normally despite the loss of doc strings.

          To change the name of the file 'lextab.py', use the following:

                  lex.lex(lextab="footab")

          (this creates a file footab.py)


Version 1.1   October 25, 2001
------------------------------

10/25/01: beazley
          Modified the table generator to produce much more compact data.
          This should greatly reduce the size of the parsetab.py[c] file.
          Caveat: the tables still need to be constructed, so a little more
          work is done in parsetab.py on import.

10/25/01: beazley
          There may be a bug in the cycle detector that reports errors
          about infinite recursion.  I'm having a little trouble tracking it
          down, but if you get this problem, you can disable the cycle
          detector as follows:

                 yacc.yacc(check_recursion = 0)

10/25/01: beazley
          Fixed a bug in lex.py that sometimes caused illegal characters to be
          reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01  : beazley
          Added a reference to the underlying lexer object when tokens are handled by
          functions.  The lexer is available as the 'lexer' attribute.  This
          was added to provide better lexing support for languages such as Fortran
          where certain types of tokens can't be conveniently expressed as regular
          expressions (and where the tokenizing function may want to perform a
          little backtracking).  Suggested by Pearu Peterson.

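          The shape of such a token-handling function can be sketched as
          follows; the Tok class and the simulated call at the bottom are
          illustrative stand-ins for what lex.py does internally, not
          PLY's actual code:

              class Tok:
                  """Hypothetical stand-in for the token object lex.py
                  passes to a rule function."""
                  pass

              def t_NUMBER(t):
                  r'\d+'              # PLY reads the regex from the docstring
                  t.value = int(t.value)
                  # The underlying lexer is reachable as t.lexer; a rule for
                  # a language like Fortran could inspect state such as the
                  # remaining input here before deciding what to return.
                  return t

              # Simulate the rule firing on the matched text "42":
              tok = Tok()
              tok.value = "42"
              tok.lexer = None        # the real lexer instance would go here
              result = t_NUMBER(tok)
              print(result.value)     # 42
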
6/20/01 : beazley
          Modified the yacc() function so that an optional starting symbol can
          be specified.  For example:

                 yacc.yacc(start="statement")

          Normally, yacc treats the first production rule as the starting symbol.
          However, if you are debugging your grammar, it may be useful to specify
          an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering