internal.html revision 6498:e21e9ab5fad0
1<html> 2<head> 3<title>PLY Internals</title> 4</head> 5<body bgcolor="#ffffff"> 6 7<h1>PLY Internals</h1> 8 9<b> 10David M. Beazley <br> 11dave@dabeaz.com<br> 12</b> 13 14<p> 15<b>PLY Version: 3.0</b> 16<p> 17 18<!-- INDEX --> 19<div class="sectiontoc"> 20<ul> 21<li><a href="#internal_nn1">Introduction</a> 22<li><a href="#internal_nn2">Grammar Class</a> 23<li><a href="#internal_nn3">Productions</a> 24<li><a href="#internal_nn4">LRItems</a> 25<li><a href="#internal_nn5">LRTable</a> 26<li><a href="#internal_nn6">LRGeneratedTable</a> 27<li><a href="#internal_nn7">LRParser</a> 28<li><a href="#internal_nn8">ParserReflect</a> 29<li><a href="#internal_nn9">High-level operation</a> 30</ul> 31</div> 32<!-- INDEX --> 33 34 35<H2><a name="internal_nn1"></a>1. Introduction</H2> 36 37 38This document describes classes and functions that make up the internal 39operation of PLY. Using this programming interface, it is possible to 40manually build an parser using a different interface specification 41than what PLY normally uses. For example, you could build a gramar 42from information parsed in a completely different input format. Some of 43these objects may be useful for building more advanced parsing engines 44such as GLR. 45 46<p> 47It should be stressed that using PLY at this level is not for the 48faint of heart. Generally, it's assumed that you know a bit of 49the underlying compiler theory and how an LR parser is put together. 50 51<H2><a name="internal_nn2"></a>2. Grammar Class</H2> 52 53 54The file <tt>ply.yacc</tt> defines a class <tt>Grammar</tt> that 55is used to hold and manipulate information about a grammar 56specification. It encapsulates the same basic information 57about a grammar that is put into a YACC file including 58the list of tokens, precedence rules, and grammar rules. 59Various operations are provided to perform different validations 60on the grammar. In addition, there are operations to compute 61the first and follow sets that are needed by the various table 62generation algorithms. 63 64<p> 65<tt><b>Grammar(terminals)</b></tt> 66 67<blockquote> 68Creates a new grammar object. <tt>terminals</tt> is a list of strings 69specifying the terminals for the grammar. An instance <tt>g</tt> of 70<tt>Grammar</tt> has the following methods: 71</blockquote> 72 73<p> 74<b><tt>g.set_precedence(term,assoc,level)</tt></b> 75<blockquote> 76Sets the precedence level and associativity for a given terminal <tt>term</tt>. 77<tt>assoc</tt> is one of <tt>'right'</tt>, 78<tt>'left'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is a positive integer. The higher 79the value of <tt>level</tt>, the higher the precedence. Here is an example of typical 80precedence settings: 81 82<pre> 83g.set_precedence('PLUS', 'left',1) 84g.set_precedence('MINUS', 'left',1) 85g.set_precedence('TIMES', 'left',2) 86g.set_precedence('DIVIDE','left',2) 87g.set_precedence('UMINUS','left',3) 88</pre> 89 90This method must be called prior to adding any productions to the 91grammar with <tt>g.add_production()</tt>. The precedence of individual grammar 92rules is determined by the precedence of the right-most terminal. 93 94</blockquote> 95<p> 96<b><tt>g.add_production(name,syms,func=None,file='',line=0)</tt></b> 97<blockquote> 98Adds a new grammar rule. <tt>name</tt> is the name of the rule, 99<tt>syms</tt> is a list of symbols making up the right hand 100side of the rule, <tt>func</tt> is the function to call when 101reducing the rule. <tt>file</tt> and <tt>line</tt> specify 102the filename and line number of the rule and are used for 103generating error messages. 104 105<p> 106The list of symbols in <tt>syms</tt> may include character 107literals and <tt>%prec</tt> specifiers. Here are some 108examples: 109 110<pre> 111g.add_production('expr',['expr','PLUS','term'],func,file,line) 112g.add_production('expr',['expr','"+"','term'],func,file,line) 113g.add_production('expr',['MINUS','expr','%prec','UMINUS'],func,file,line) 114</pre> 115 116<p> 117If any kind of error is detected, a <tt>GrammarError</tt> exception 118is raised with a message indicating the reason for the failure. 119</blockquote> 120 121<p> 122<b><tt>g.set_start(start=None)</tt></b> 123<blockquote> 124Sets the starting rule for the grammar. <tt>start</tt> is a string 125specifying the name of the start rule. If <tt>start</tt> is omitted, 126the first grammar rule added with <tt>add_production()</tt> is taken to be 127the starting rule. This method must always be called after all 128productions have been added. 129</blockquote> 130 131<p> 132<b><tt>g.find_unreachable()</tt></b> 133<blockquote> 134Diagnostic function. Returns a list of all unreachable non-terminals 135defined in the grammar. This is used to identify inactive parts of 136the grammar specification. 137</blockquote> 138 139<p> 140<b><tt>g.infinite_cycle()</tt></b> 141<blockquote> 142Diagnostic function. Returns a list of all non-terminals in the 143grammar that result in an infinite cycle. This condition occurs if 144there is no way for a grammar rule to expand to a string containing 145only terminal symbols. 146</blockquote> 147 148<p> 149<b><tt>g.undefined_symbols()</tt></b> 150<blockquote> 151Diagnostic function. Returns a list of tuples <tt>(name, prod)</tt> 152corresponding to undefined symbols in the grammar. <tt>name</tt> is the 153name of the undefined symbol and <tt>prod</tt> is an instance of 154<tt>Production</tt> which has information about the production rule 155where the undefined symbol was used. 156</blockquote> 157 158<p> 159<b><tt>g.unused_terminals()</tt></b> 160<blockquote> 161Diagnostic function. Returns a list of terminals that were defined, 162but never used in the grammar. 163</blockquote> 164 165<p> 166<b><tt>g.unused_rules()</tt></b> 167<blockquote> 168Diagnostic function. Returns a list of <tt>Production</tt> instances 169corresponding to production rules that were defined in the grammar, 170but never used anywhere. This is slightly different 171than <tt>find_unreachable()</tt>. 172</blockquote> 173 174<p> 175<b><tt>g.unused_precedence()</tt></b> 176<blockquote> 177Diagnostic function. Returns a list of tuples <tt>(term, assoc)</tt> 178corresponding to precedence rules that were set, but never used the 179grammar. <tt>term</tt> is the terminal name and <tt>assoc</tt> is the 180precedence associativity (e.g., <tt>'left'</tt>, <tt>'right'</tt>, 181or <tt>'nonassoc'</tt>. 182</blockquote> 183 184<p> 185<b><tt>g.compute_first()</tt></b> 186<blockquote> 187Compute all of the first sets for all symbols in the grammar. Returns a dictionary 188mapping symbol names to a list of all first symbols. 189</blockquote> 190 191<p> 192<b><tt>g.compute_follow()</tt></b> 193<blockquote> 194Compute all of the follow sets for all non-terminals in the grammar. 195The follow set is the set of all possible symbols that might follow a 196given non-terminal. Returns a dictionary mapping non-terminal names 197to a list of symbols. 198</blockquote> 199 200<p> 201<b><tt>g.build_lritems()</tt></b> 202<blockquote> 203Calculates all of the LR items for all productions in the grammar. This 204step is required before using the grammar for any kind of table generation. 205See the section on LR items below. 206</blockquote> 207 208<p> 209The following attributes are set by the above methods and may be useful 210in code that works with the grammar. All of these attributes should be 211assumed to be read-only. Changing their values directly will likely 212break the grammar. 213 214<p> 215<b><tt>g.Productions</tt></b> 216<blockquote> 217A list of all productions added. The first entry is reserved for 218a production representing the starting rule. The objects in this list 219are instances of the <tt>Production</tt> class, described shortly. 220</blockquote> 221 222<p> 223<b><tt>g.Prodnames</tt></b> 224<blockquote> 225A dictionary mapping the names of nonterminals to a list of all 226productions of that nonterminal. 227</blockquote> 228 229<p> 230<b><tt>g.Terminals</tt></b> 231<blockquote> 232A dictionary mapping the names of terminals to a list of the 233production numbers where they are used. 234</blockquote> 235 236<p> 237<b><tt>g.Nonterminals</tt></b> 238<blockquote> 239A dictionary mapping the names of nonterminals to a list of the 240production numbers where they are used. 241</blockquote> 242 243<p> 244<b><tt>g.First</tt></b> 245<blockquote> 246A dictionary representing the first sets for all grammar symbols. This is 247computed and returned by the <tt>compute_first()</tt> method. 248</blockquote> 249 250<p> 251<b><tt>g.Follow</tt></b> 252<blockquote> 253A dictionary representing the follow sets for all grammar rules. This is 254computed and returned by the <tt>compute_follow()</tt> method. 255</blockquote> 256 257<p> 258<b><tt>g.Start</tt></b> 259<blockquote> 260Starting symbol for the grammar. Set by the <tt>set_start()</tt> method. 261</blockquote> 262 263For the purposes of debugging, a <tt>Grammar</tt> object supports the <tt>__len__()</tt> and 264<tt>__getitem__()</tt> special methods. Accessing <tt>g[n]</tt> returns the nth production 265from the grammar. 266 267 268<H2><a name="internal_nn3"></a>3. Productions</H2> 269 270 271<tt>Grammar</tt> objects store grammar rules as instances of a <tt>Production</tt> class. This 272class has no public constructor--you should only create productions by calling <tt>Grammar.add_production()</tt>. 273The following attributes are available on a <tt>Production</tt> instance <tt>p</tt>. 274 275<p> 276<b><tt>p.name</tt></b> 277<blockquote> 278The name of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>'A'</tt>. 279</blockquote> 280 281<p> 282<b><tt>p.prod</tt></b> 283<blockquote> 284A tuple of symbols making up the right-hand side of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>('B','C','D')</tt>. 285</blockquote> 286 287<p> 288<b><tt>p.number</tt></b> 289<blockquote> 290Production number. An integer containing the index of the production in the grammar's <tt>Productions</tt> list. 291</blockquote> 292 293<p> 294<b><tt>p.func</tt></b> 295<blockquote> 296The name of the reduction function associated with the production. 297This is the function that will execute when reducing the entire 298grammar rule during parsing. 299</blockquote> 300 301<p> 302<b><tt>p.callable</tt></b> 303<blockquote> 304The callable object associated with the name in <tt>p.func</tt>. This is <tt>None</tt> 305unless the production has been bound using <tt>bind()</tt>. 306</blockquote> 307 308<p> 309<b><tt>p.file</tt></b> 310<blockquote> 311Filename associated with the production. Typically this is the file where the production was defined. Used for error messages. 312</blockquote> 313 314<p> 315<b><tt>p.lineno</tt></b> 316<blockquote> 317Line number associated with the production. Typically this is the line number in <tt>p.file</tt> where the production was defined. Used for error messages. 318</blockquote> 319 320<p> 321<b><tt>p.prec</tt></b> 322<blockquote> 323Precedence and associativity associated with the production. This is a tuple <tt>(assoc,level)</tt> where 324<tt>assoc</tt> is one of <tt>'left'</tt>,<tt>'right'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is 325an integer. This value is determined by the precedence of the right-most terminal symbol in the production 326or by use of the <tt>%prec</tt> specifier when adding the production. 327</blockquote> 328 329<p> 330<b><tt>p.usyms</tt></b> 331<blockquote> 332A list of all unique symbols found in the production. 333</blockquote> 334 335<p> 336<b><tt>p.lr_items</tt></b> 337<blockquote> 338A list of all LR items for this production. This attribute only has a meaningful value if the 339<tt>Grammar.build_lritems()</tt> method has been called. The items in this list are 340instances of <tt>LRItem</tt> described below. 341</blockquote> 342 343<p> 344<b><tt>p.lr_next</tt></b> 345<blockquote> 346The head of a linked-list representation of the LR items in <tt>p.lr_items</tt>. 347This attribute only has a meaningful value if the <tt>Grammar.build_lritems()</tt> 348method has been called. Each <tt>LRItem</tt> instance has a <tt>lr_next</tt> attribute 349to move to the next item. The list is terminated by <tt>None</tt>. 350</blockquote> 351 352<p> 353<b><tt>p.bind(dict)</tt></b> 354<blockquote> 355Binds the production function name in <tt>p.func</tt> to a callable object in 356<tt>dict</tt>. This operation is typically carried out in the last step 357prior to running the parsing engine and is needed since parsing tables are typically 358read from files which only include the function names, not the functions themselves. 359</blockquote> 360 361<P> 362<tt>Production</tt> objects support 363the <tt>__len__()</tt>, <tt>__getitem__()</tt>, and <tt>__str__()</tt> 364special methods. 365<tt>len(p)</tt> returns the number of symbols in <tt>p.prod</tt> 366and <tt>p[n]</tt> is the same as <tt>p.prod[n]</tt>. 367 368<H2><a name="internal_nn4"></a>4. LRItems</H2> 369 370 371The construction of parsing tables in an LR-based parser generator is primarily 372done over a set of "LR Items". An LR item represents a stage of parsing one 373of the grammar rules. To compute the LR items, it is first necessary to 374call <tt>Grammar.build_lritems()</tt>. Once this step, all of the productions 375in the grammar will have their LR items attached to them. 376 377<p> 378Here is an interactive example that shows what LR items look like if you 379interactively experiment. In this example, <tt>g</tt> is a <tt>Grammar</tt> 380object. 381 382<blockquote> 383<pre> 384>>> <b>g.build_lritems()</b> 385>>> <b>p = g[1]</b> 386>>> <b>p</b> 387Production(statement -> ID = expr) 388>>> 389</pre> 390</blockquote> 391 392In the above code, <tt>p</tt> represents the first grammar rule. In 393this case, a rule <tt>'statement -> ID = expr'</tt>. 394 395<p> 396Now, let's look at the LR items for <tt>p</tt>. 397 398<blockquote> 399<pre> 400>>> <b>p.lr_items</b> 401[LRItem(statement -> . ID = expr), 402 LRItem(statement -> ID . = expr), 403 LRItem(statement -> ID = . expr), 404 LRItem(statement -> ID = expr .)] 405>>> 406</pre> 407</blockquote> 408 409In each LR item, the dot (.) represents a specific stage of parsing. In each LR item, the dot 410is advanced by one symbol. It is only when the dot reaches the very end that a production 411is successfully parsed. 412 413<p> 414An instance <tt>lr</tt> of <tt>LRItem</tt> has the following 415attributes that hold information related to that specific stage of 416parsing. 417 418<p> 419<b><tt>lr.name</tt></b> 420<blockquote> 421The name of the grammar rule. For example, <tt>'statement'</tt> in the above example. 422</blockquote> 423 424<p> 425<b><tt>lr.prod</tt></b> 426<blockquote> 427A tuple of symbols representing the right-hand side of the production, including the 428special <tt>'.'</tt> character. For example, <tt>('ID','.','=','expr')</tt>. 429</blockquote> 430 431<p> 432<b><tt>lr.number</tt></b> 433<blockquote> 434An integer representing the production number in the grammar. 435</blockquote> 436 437<p> 438<b><tt>lr.usyms</tt></b> 439<blockquote> 440A set of unique symbols in the production. Inherited from the original <tt>Production</tt> instance. 441</blockquote> 442 443<p> 444<b><tt>lr.lr_index</tt></b> 445<blockquote> 446An integer representing the position of the dot (.). You should never use <tt>lr.prod.index()</tt> 447to search for it--the result will be wrong if the grammar happens to also use (.) as a character 448literal. 449</blockquote> 450 451<p> 452<b><tt>lr.lr_after</tt></b> 453<blockquote> 454A list of all productions that can legally appear immediately to the right of the 455dot (.). This list contains <tt>Production</tt> instances. This attribute 456represents all of the possible branches a parse can take from the current position. 457For example, suppose that <tt>lr</tt> represents a stage immediately before 458an expression like this: 459 460<pre> 461>>> <b>lr</b> 462LRItem(statement -> ID = . expr) 463>>> 464</pre> 465 466Then, the value of <tt>lr.lr_after</tt> might look like this, showing all productions that 467can legally appear next: 468 469<pre> 470>>> <b>lr.lr_after</b> 471[Production(expr -> expr PLUS expr), 472 Production(expr -> expr MINUS expr), 473 Production(expr -> expr TIMES expr), 474 Production(expr -> expr DIVIDE expr), 475 Production(expr -> MINUS expr), 476 Production(expr -> LPAREN expr RPAREN), 477 Production(expr -> NUMBER), 478 Production(expr -> ID)] 479>>> 480</pre> 481 482</blockquote> 483 484<p> 485<b><tt>lr.lr_before</tt></b> 486<blockquote> 487The grammar symbol that appears immediately before the dot (.) or <tt>None</tt> if 488at the beginning of the parse. 489</blockquote> 490 491<p> 492<b><tt>lr.lr_next</tt></b> 493<blockquote> 494A link to the next LR item, representing the next stage of the parse. <tt>None</tt> if <tt>lr</tt> 495is the last LR item. 496</blockquote> 497 498<tt>LRItem</tt> instances also support the <tt>__len__()</tt> and <tt>__getitem__()</tt> special methods. 499<tt>len(lr)</tt> returns the number of items in <tt>lr.prod</tt> including the dot (.). <tt>lr[n]</tt> 500returns <tt>lr.prod[n]</tt>. 501 502<p> 503It goes without saying that all of the attributes associated with LR 504items should be assumed to be read-only. Modifications will very 505likely create a small black-hole that will consume you and your code. 506 507<H2><a name="internal_nn5"></a>5. LRTable</H2> 508 509 510The <tt>LRTable</tt> class is used to represent LR parsing table data. This 511minimally includes the production list, action table, and goto table. 512 513<p> 514<b><tt>LRTable()</tt></b> 515<blockquote> 516Create an empty LRTable object. This object contains only the information needed to 517run an LR parser. 518</blockquote> 519 520An instance <tt>lrtab</tt> of <tt>LRTable</tt> has the following methods: 521 522<p> 523<b><tt>lrtab.read_table(module)</tt></b> 524<blockquote> 525Populates the LR table with information from the module specified in <tt>module</tt>. 526<tt>module</tt> is either a module object already loaded with <tt>import</tt> or 527the name of a Python module. If it's a string containing a module name, it is 528loaded and parsing data is extracted. Returns the signature value that was used 529when initially writing the tables. Raises a <tt>VersionError</tt> exception if 530the module was created using an incompatible version of PLY. 531</blockquote> 532 533<p> 534<b><tt>lrtab.bind_callables(dict)</tt></b> 535<blockquote> 536This binds all of the function names used in productions to callable objects 537found in the dictionary <tt>dict</tt>. During table generation and when reading 538LR tables from files, PLY only uses the names of action functions such as <tt>'p_expr'</tt>, 539<tt>'p_statement'</tt>, etc. In order to actually run the parser, these names 540have to be bound to callable objects. This method is always called prior to 541running a parser. 542</blockquote> 543 544After <tt>lrtab</tt> has been populated, the following attributes are defined. 545 546<p> 547<b><tt>lrtab.lr_method</tt></b> 548<blockquote> 549The LR parsing method used (e.g., <tt>'LALR'</tt>) 550</blockquote> 551 552 553<p> 554<b><tt>lrtab.lr_productions</tt></b> 555<blockquote> 556The production list. If the parsing tables have been newly 557constructed, this will be a list of <tt>Production</tt> instances. If 558the parsing tables have been read from a file, it's a list 559of <tt>MiniProduction</tt> instances. This, together 560with <tt>lr_action</tt> and <tt>lr_goto</tt> contain all of the 561information needed by the LR parsing engine. 562</blockquote> 563 564<p> 565<b><tt>lrtab.lr_action</tt></b> 566<blockquote> 567The LR action dictionary that implements the underlying state machine. 568The keys of this dictionary are the LR states. 569</blockquote> 570 571<p> 572<b><tt>lrtab.lr_goto</tt></b> 573<blockquote> 574The LR goto table that contains information about grammar rule reductions. 575</blockquote> 576 577 578<H2><a name="internal_nn6"></a>6. LRGeneratedTable</H2> 579 580 581The <tt>LRGeneratedTable</tt> class represents constructed LR parsing tables on a 582grammar. It is a subclass of <tt>LRTable</tt>. 583 584<p> 585<b><tt>LRGeneratedTable(grammar, method='LALR',log=None)</tt></b> 586<blockquote> 587Create the LR parsing tables on a grammar. <tt>grammar</tt> is an instance of <tt>Grammar</tt>, 588<tt>method</tt> is a string with the parsing method (<tt>'SLR'</tt> or <tt>'LALR'</tt>), and 589<tt>log</tt> is a logger object used to write debugging information. The debugging information 590written to <tt>log</tt> is the same as what appears in the <tt>parser.out</tt> file created 591by yacc. By supplying a custom logger with a different message format, it is possible to get 592more information (e.g., the line number in <tt>yacc.py</tt> used for issuing each line of 593output in the log). The result is an instance of <tt>LRGeneratedTable</tt>. 594</blockquote> 595 596<p> 597An instance <tt>lr</tt> of <tt>LRGeneratedTable</tt> has the following attributes. 598 599<p> 600<b><tt>lr.grammar</tt></b> 601<blockquote> 602A link to the Grammar object used to construct the parsing tables. 603</blockquote> 604 605<p> 606<b><tt>lr.lr_method</tt></b> 607<blockquote> 608The LR parsing method used (e.g., <tt>'LALR'</tt>) 609</blockquote> 610 611 612<p> 613<b><tt>lr.lr_productions</tt></b> 614<blockquote> 615A reference to <tt>grammar.Productions</tt>. This, together with <tt>lr_action</tt> and <tt>lr_goto</tt> 616contain all of the information needed by the LR parsing engine. 617</blockquote> 618 619<p> 620<b><tt>lr.lr_action</tt></b> 621<blockquote> 622The LR action dictionary that implements the underlying state machine. The keys of this dictionary are 623the LR states. 624</blockquote> 625 626<p> 627<b><tt>lr.lr_goto</tt></b> 628<blockquote> 629The LR goto table that contains information about grammar rule reductions. 630</blockquote> 631 632<p> 633<b><tt>lr.sr_conflicts</tt></b> 634<blockquote> 635A list of tuples <tt>(state,token,resolution)</tt> identifying all shift/reduce conflicts. <tt>state</tt> is the LR state 636number where the conflict occurred, <tt>token</tt> is the token causing the conflict, and <tt>resolution</tt> is 637a string describing the resolution taken. <tt>resolution</tt> is either <tt>'shift'</tt> or <tt>'reduce'</tt>. 638</blockquote> 639 640<p> 641<b><tt>lr.rr_conflicts</tt></b> 642<blockquote> 643A list of tuples <tt>(state,rule,rejected)</tt> identifying all reduce/reduce conflicts. <tt>state</tt> is the 644LR state number where the conflict occurred, <tt>rule</tt> is the production rule that was selected 645and <tt>rejected</tt> is the production rule that was rejected. Both <tt>rule</tt> and </tt>rejected</tt> are 646instances of <tt>Production</tt>. They can be inspected to provide the user with more information. 647</blockquote> 648 649<p> 650There are two public methods of <tt>LRGeneratedTable</tt>. 651 652<p> 653<b><tt>lr.write_table(modulename,outputdir="",signature="")</tt></b> 654<blockquote> 655Writes the LR parsing table information to a Python module. <tt>modulename</tt> is a string 656specifying the name of a module such as <tt>"parsetab"</tt>. <tt>outputdir</tt> is the name of a 657directory where the module should be created. <tt>signature</tt> is a string representing a 658grammar signature that's written into the output file. This can be used to detect when 659the data stored in a module file is out-of-sync with the the grammar specification (and that 660the tables need to be regenerated). If <tt>modulename</tt> is a string <tt>"parsetab"</tt>, 661this function creates a file called <tt>parsetab.py</tt>. If the module name represents a 662package such as <tt>"foo.bar.parsetab"</tt>, then only the last component, <tt>"parsetab"</tt> is 663used. 664</blockquote> 665 666 667<H2><a name="internal_nn7"></a>7. LRParser</H2> 668 669 670The <tt>LRParser</tt> class implements the low-level LR parsing engine. 671 672 673<p> 674<b><tt>LRParser(lrtab, error_func)</tt></b> 675<blockquote> 676Create an LRParser. <tt>lrtab</tt> is an instance of <tt>LRTable</tt> 677containing the LR production and state tables. <tt>error_func</tt> is the 678error function to invoke in the event of a parsing error. 679</blockquote> 680 681An instance <tt>p</tt> of <tt>LRParser</tt> has the following methods: 682 683<p> 684<b><tt>p.parse(input=None,lexer=None,debug=0,tracking=0,tokenfunc=None)</tt></b> 685<blockquote> 686Run the parser. <tt>input</tt> is a string, which if supplied is fed into the 687lexer using its <tt>input()</tt> method. <tt>lexer</tt> is an instance of the 688<tt>Lexer</tt> class to use for tokenizing. If not supplied, the last lexer 689created with the <tt>lex</tt> module is used. <tt>debug</tt> is a boolean flag 690that enables debugging. <tt>tracking</tt> is a boolean flag that tells the 691parser to perform additional line number tracking. <tt>tokenfunc</tt> is a callable 692function that returns the next token. If supplied, the parser will use it to get 693all tokens. 694</blockquote> 695 696<p> 697<b><tt>p.restart()</tt></b> 698<blockquote> 699Resets the parser state for a parse already in progress. 700</blockquote> 701 702<H2><a name="internal_nn8"></a>8. ParserReflect</H2> 703 704 705<p> 706The <tt>ParserReflect</tt> class is used to collect parser specification data 707from a Python module or object. This class is what collects all of the 708<tt>p_rule()</tt> functions in a PLY file, performs basic error checking, 709and collects all of the needed information to build a grammar. Most of the 710high-level PLY interface as used by the <tt>yacc()</tt> function is actually 711implemented by this class. 712 713<p> 714<b><tt>ParserReflect(pdict, log=None)</tt></b> 715<blockquote> 716Creates a <tt>ParserReflect</tt> instance. <tt>pdict</tt> is a dictionary 717containing parser specification data. This dictionary typically corresponds 718to the module or class dictionary of code that implements a PLY parser. 719<tt>log</tt> is a logger instance that will be used to report error 720messages. 721</blockquote> 722 723An instance <tt>p</tt> of <tt>ParserReflect</tt> has the following methods: 724 725<p> 726<b><tt>p.get_all()</tt></b> 727<blockquote> 728Collect and store all required parsing information. 729</blockquote> 730 731<p> 732<b><tt>p.validate_all()</tt></b> 733<blockquote> 734Validate all of the collected parsing information. This is a seprate step 735from <tt>p.get_all()</tt> as a performance optimization. In order to 736increase parser start-up time, a parser can elect to only validate the 737parsing data when regenerating the parsing tables. The validation 738step tries to collect as much information as possible rather than 739raising an exception at the first sign of trouble. The attribute 740<tt>p.error</tt> is set if there are any validation errors. The 741value of this attribute is also returned. 742</blockquote> 743 744<p> 745<b><tt>p.signature()</tt></b> 746<blockquote> 747Compute a signature representing the contents of the collected parsing 748data. The signature value should change if anything in the parser 749specification has changed in a way that would justify parser table 750regeneration. This method can be called after <tt>p.get_all()</tt>, 751but before <tt>p.validate_all()</tt>. 752</blockquote> 753 754The following attributes are set in the process of collecting data: 755 756<p> 757<b><tt>p.start</tt></b> 758<blockquote> 759The grammar start symbol, if any. Taken from <tt>pdict['start']</tt>. 760</blockquote> 761 762<p> 763<b><tt>p.error_func</tt></b> 764<blockquote> 765The error handling function or <tt>None</tt>. Taken from <tt>pdict['p_error']</tt>. 766</blockquote> 767 768<p> 769<b><tt>p.tokens</tt></b> 770<blockquote> 771The token list. Taken from <tt>pdict['tokens']</tt>. 772</blockquote> 773 774<p> 775<b><tt>p.prec</tt></b> 776<blockquote> 777The precedence specifier. Taken from <tt>pdict['precedence']</tt>. 778</blockquote> 779 780<p> 781<b><tt>p.preclist</tt></b> 782<blockquote> 783A parsed version of the precedence specified. A list of tuples of the form 784<tt>(token,assoc,level)</tt> where <tt>token</tt> is the terminal symbol, 785<tt>assoc</tt> is the associativity (e.g., <tt>'left'</tt>) and <tt>level</tt> 786is a numeric precedence level. 787</blockquote> 788 789<p> 790<b><tt>p.grammar</tt></b> 791<blockquote> 792A list of tuples <tt>(name, rules)</tt> representing the grammar rules. <tt>name</tt> is the 793name of a Python function or method in <tt>pdict</tt> that starts with <tt>"p_"</tt>. 794<tt>rules</tt> is a list of tuples <tt>(filename,line,prodname,syms)</tt> representing 795the grammar rules found in the documentation string of that function. <tt>filename</tt> and <tt>line</tt> contain location 796information that can be used for debugging. <tt>prodname</tt> is the name of the 797production. <tt>syms</tt> is the right-hand side of the production. If you have a 798function like this 799 800<pre> 801def p_expr(p): 802 '''expr : expr PLUS expr 803 | expr MINUS expr 804 | expr TIMES expr 805 | expr DIVIDE expr''' 806</pre> 807 808then the corresponding entry in <tt>p.grammar</tt> might look like this: 809 810<pre> 811('p_expr', [ ('calc.py',10,'expr', ['expr','PLUS','expr']), 812 ('calc.py',11,'expr', ['expr','MINUS','expr']), 813 ('calc.py',12,'expr', ['expr','TIMES','expr']), 814 ('calc.py',13,'expr', ['expr','DIVIDE','expr']) 815 ]) 816</pre> 817</blockquote> 818 819<p> 820<b><tt>p.pfuncs</tt></b> 821<blockquote> 822A sorted list of tuples <tt>(line, file, name, doc)</tt> representing all of 823the <tt>p_</tt> functions found. <tt>line</tt> and <tt>file</tt> give location 824information. <tt>name</tt> is the name of the function. <tt>doc</tt> is the 825documentation string. This list is sorted in ascending order by line number. 826</blockquote> 827 828<p> 829<b><tt>p.files</tt></b> 830<blockquote> 831A dictionary holding all of the source filenames that were encountered 832while collecting parser information. Only the keys of this dictionary have 833any meaning. 834</blockquote> 835 836<p> 837<b><tt>p.error</tt></b> 838<blockquote> 839An attribute that indicates whether or not any critical errors 840occurred in validation. If this is set, it means that that some kind 841of problem was detected and that no further processing should be 842performed. 843</blockquote> 844 845 846<H2><a name="internal_nn9"></a>9. High-level operation</H2> 847 848 849Using all of the above classes requires some attention to detail. The <tt>yacc()</tt> 850function carries out a very specific sequence of operations to create a grammar. 851This same sequence should be emulated if you build an alternative PLY interface. 852 853<ol> 854<li>A <tt>ParserReflect</tt> object is created and raw grammar specification data is 855collected. 856<li>A <tt>Grammar</tt> object is created and populated with information 857from the specification data. 858<li>A <tt>LRGenerator</tt> object is created to run the LALR algorithm over 859the <tt>Grammar</tt> object. 860<li>Productions in the LRGenerator and bound to callables using the <tt>bind_callables()</tt> 861method. 862<li>A <tt>LRParser</tt> object is created from from the information in the 863<tt>LRGenerator</tt> object. 864</ol> 865 866</body> 867</html> 868 869 870 871 872 873 874 875