# Copyright (c) 2014 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder.  You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED.
# IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
# GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
# IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
# IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Andrew Bardsley

namespace Minor
{

/*!

\page minor Inside the Minor CPU model

\tableofcontents

This document describes the structure and function of the Minor gem5
in-order processor model.  It is recommended reading for anyone who wants
to understand Minor's internal organisation, design decisions, C++
implementation and Python configuration.  Familiarity with gem5 and some
of its internal structures is assumed.  This document is meant to be read
alongside the Minor source code and explains the model's general structure
without being too slavish about naming every function and data type.

\section whatis What is Minor?

Minor is an in-order processor model with a fixed pipeline but configurable
data structures and execute behaviour.  It is intended to be used to model
processors with strict in-order execution behaviour and allows
visualisation of an instruction's position in the pipeline through the
MinorTrace/minorview.py format/tool.  The intention is to provide a
framework for micro-architecturally correlating the model with a
particular, chosen processor with similar capabilities.

\section philo Design philosophy

\subsection mt Multithreading

The model isn't currently capable of multithreading, but there are THREAD
comments in key places where stage data needs to be arrayed to support
multithreading.

\subsection structs Data structures

Decorating data structures with large amounts of life-cycle information is
avoided.  Only instructions (MinorDynInst) contain a significant proportion
of data whose values are not set at construction.

All internal structures have fixed sizes on construction.  Data held in
queues and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface
to allow a distinct 'bubble'/no-data value for each type.

Inter-stage 'struct' data is packaged in structures which are passed by
value.  Only MinorDynInst, the line data in ForwardLineData and the
memory-interfacing objects Fetch1::FetchRequest and LSQ::LSQRequest are
'::new' allocated while running the model.

\section model Model structure

Objects of class MinorCPU are provided by the model to gem5.  MinorCPU
implements the interfaces of BaseCPU (cpu.hh) and can provide data and
instruction interfaces for connection to a cache system.  The model is
configured in a similar way to other gem5 models through Python.  That
configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
actually implements the processor pipeline.

The hierarchy of major unit ownership from MinorCPU down looks like this:

<ul>
<li>MinorCPU</li>
<ul>
  <li>Pipeline - container for the pipeline, owns the cyclic 'tick'
    event mechanism and the idling (cycle skipping) mechanism.</li>
  <ul>
    <li>Fetch1 - instruction fetch unit responsible for fetching cache
      lines (or parts of lines) from the I-cache interface</li>
    <ul>
      <li>Fetch1::IcachePort - interface to the I-cache from
        Fetch1</li>
    </ul>
    <li>Fetch2 - line to instruction decomposition</li>
    <li>Decode - instruction to micro-op decomposition</li>
    <li>Execute - instruction execution and data memory
      interface</li>
    <ul>
      <li>LSQ - load store queue for memory ref.
        instructions</li>
      <li>LSQ::DcachePort - interface to the D-cache from
        Execute</li>
    </ul>
  </ul>
</ul>
</ul>

\section keystruct Key data structures

\subsection ids Instruction and line identity: InstId (dyn_inst.hh)

An InstId contains the sequence numbers and thread numbers that describe
the life cycle and instruction stream affiliations of individual fetched
cache lines and instructions.

An InstId is printed in one of the following forms:

 - T/S.P/L - for fetched cache lines
 - T/S.P/L/F - for instructions before Decode
 - T/S.P/L/F.E - for instructions from Decode onwards

for example:

 - 0/10.12/5/6.7

InstId's fields are:

<table>
<tr>
  <td><b>Field</b></td>
  <td><b>Symbol</b></td>
  <td><b>Generated by</b></td>
  <td><b>Checked by</b></td>
  <td><b>Function</b></td>
</tr>

<tr>
  <td>InstId::threadId</td>
  <td>T</td>
  <td>Fetch1</td>
  <td>Everywhere the thread number is needed</td>
  <td>Thread number (currently always 0).</td>
</tr>

<tr>
  <td>InstId::streamSeqNum</td>
  <td>S</td>
  <td>Execute</td>
  <td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
  <td>Stream sequence number as chosen by Execute.  Stream sequence
    numbers change after changes of PC (branches, exceptions) in Execute
    and are used to separate pre- and post-branch instruction
    streams.</td>
</tr>

<tr>
  <td>InstId::predictionSeqNum</td>
  <td>P</td>
  <td>Fetch2</td>
  <td>Fetch2 (while discarding lines after prediction)</td>
  <td>Prediction sequence numbers represent branch prediction decisions.
    This is used by Fetch2 to mark lines/instructions according to the
    last followed branch prediction made by Fetch2.
    Fetch2 can signal to Fetch1 that it should change its fetch address
    and mark lines with a new prediction sequence number (which it will
    only do if the stream sequence number Fetch1 expects matches that of
    the request).</td>
</tr>

<tr>
  <td>InstId::lineSeqNum</td>
  <td>L</td>
  <td>Fetch1</td>
  <td>(Just for debugging)</td>
  <td>Line fetch sequence number of this cache line or the line this
    instruction was extracted from.</td>
</tr>

<tr>
  <td>InstId::fetchSeqNum</td>
  <td>F</td>
  <td>Fetch2</td>
  <td>Fetch2 (as the inst. sequence number for branches)</td>
  <td>Instruction fetch order assigned by Fetch2 when lines are
    decomposed into instructions.</td>
</tr>

<tr>
  <td>InstId::execSeqNum</td>
  <td>E</td>
  <td>Decode</td>
  <td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
  <td>Instruction order after micro-op decomposition.</td>
</tr>

</table>

The sequence number fields are all independent of each other and although,
for instance, InstId::execSeqNum for an instruction will always be >=
InstId::fetchSeqNum, the comparison is not useful.

The originating stage of each sequence number field keeps a counter for
that field which can be incremented in order to generate new, unique
numbers.

\subsection insts Instructions: MinorDynInst (dyn_inst.hh)

MinorDynInst represents an instruction's progression through the pipeline.
An instruction can be three things:

<table>
<tr>
  <td><b>Thing</b></td>
  <td><b>Predicate</b></td>
  <td><b>Explanation</b></td>
</tr>
<tr>
  <td>A bubble</td>
  <td>MinorDynInst::isBubble()</td>
  <td>no instruction at all, just a space-filler</td>
</tr>
<tr>
  <td>A fault</td>
  <td>MinorDynInst::isFault()</td>
  <td>a fault to pass down the pipeline in an instruction's clothing</td>
</tr>
<tr>
  <td>A decoded instruction</td>
  <td>MinorDynInst::isInst()</td>
  <td>instructions are actually passed to the gem5 decoder in Fetch2 and
    so are created fully decoded.  MinorDynInst::staticInst is the decoded
    instruction form.</td>
</tr>
</table>

Instructions are reference counted using the gem5 RefCountingPtr
(base/refcnt.hh) wrapper.  They therefore usually appear as
MinorDynInstPtr in code.  Note that as a RefCountingPtr initialises as
nullptr rather than as an object that supports BubbleIF::isBubble, passing
raw MinorDynInstPtrs to Queue%s and other similar structures from stage.hh
without boxing is dangerous.

\subsection fld ForwardLineData (pipe_data.hh)

ForwardLineData is used to pass cache lines from Fetch1 to Fetch2.  Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying, or can contain a line (or partial line) fetched by Fetch1.
The data carried by ForwardLineData is owned by a Packet object returned
from memory and is explicitly memory managed, and so must be deleted once
processed (by Fetch2 deleting the Packet).

\subsection fid ForwardInstData (pipe_data.hh)

ForwardInstData can contain up to ForwardInstData::width() instructions in
its ForwardInstData::insts vector.  This structure is used to carry
instructions between Fetch2, Decode and Execute and to store input buffer
vectors in Decode and Execute.
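
The bubble/fault/instruction trichotomy and the fixed-width instruction
vector can be sketched outside gem5.  The following Python mock-up is
purely illustrative (the class and method names are hypothetical stand-ins
echoing isBubble/width, not the C++ implementation in dyn_inst.hh and
pipe_data.hh):

```python
# Illustrative sketch of the BubbleIF idea: a fixed-width instruction
# vector whose unused slots remain bubbles.  Not gem5 code.

class Inst:
    """A MinorDynInst-like object: bubble, fault or decoded instruction."""
    def __init__(self, static_inst=None, fault=None):
        self.static_inst = static_inst
        self.fault = fault

    def is_bubble(self):
        return self.static_inst is None and self.fault is None

    def is_fault(self):
        return self.fault is not None

    def is_inst(self):
        return self.static_inst is not None


class ForwardInstVector:
    """Fixed-width vector of instructions carried between stages."""
    def __init__(self, width):
        self._insts = [Inst() for _ in range(width)]

    def width(self):
        return len(self._insts)

    def pack(self, index, inst):
        self._insts[index] = inst

    def is_bubble(self):
        # The whole vector is a bubble only if every slot is a bubble
        return all(i.is_bubble() for i in self._insts)
```

This also illustrates why boxing matters: a structure is only safely
queueable if an 'empty' value (the bubble) is representable within the
type itself.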

\subsection fr Fetch1::FetchRequest (fetch1.hh)

FetchRequests represent I-cache line fetch requests.  They are used in the
memory queues of Fetch1 and are pushed into/popped from
Packet::senderState while traversing the memory system.

FetchRequests contain a memory system Request (mem/request.hh) for that
fetch access, a packet (Packet, mem/packet.hh) if the request gets to
memory, and a fault field that can be populated with a TLB-sourced
prefetch fault (if any).

\subsection lsqr LSQ::LSQRequest (execute.hh)

LSQRequests are similar to FetchRequests but for D-cache accesses.  They
carry the instruction associated with a memory access.

\section pipeline The pipeline

\verbatim
------------------------------------------------------------------------------
 Key:

 [] : inter-stage MinorBuffer
 ,--.
 |  | : pipeline stage
 `--'
 ---> : forward communication
 <--- : backward communication

 rv : reservation information for input buffers

               ,------.     ,------.     ,------.     ,-------.
 (from --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
 Execute)   |  |      |<-[]-|      |<-rv-|      |<-rv-|       |    & Fetch2)
            |  `------'<-rv-|      |     |      |     |       |
            `-------------->|      |     |      |     |       |
                            `------'     `------'     `-------'
------------------------------------------------------------------------------
\endverbatim

The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled.  There is a MinorBuffer between
adjacent stages in the forward direction (for example: passing lines from
Fetch1 to Fetch2) and, between Fetch2 and Fetch1, a buffer in the
backwards direction carrying branch predictions.
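
The behaviour of such a delay buffer can be sketched as follows.  This is
a simplified Python stand-in (assuming a minimum delay of one cycle and
using None as the bubble value), not the TimeBuffer implementation:

```python
from collections import deque

class DelayBuffer:
    """Sketch of an inter-stage delay FIFO: an item pushed in cycle t is
    visible to the consumer in cycle t + delay.  Empty slots are bubbles
    (None here)."""
    def __init__(self, delay):
        assert delay >= 1
        self._slots = deque([None] * delay)

    def push(self, item):
        # Producer writes its output for this cycle (None for a bubble)
        self._slots.append(item)

    def front(self):
        # What the consumer sees this cycle
        return self._slots[0]

    def advance(self):
        # End of cycle: everything moves one slot towards the consumer
        self._slots.popleft()
```

With delay = 2, a line pushed in cycle 0 reaches the consumer's end of the
buffer at the start of cycle 2, modelling a two-cycle inter-stage latency.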

Stages Fetch2, Decode and Execute have input buffers which, each cycle,
can accept input data from the previous stage and can hold that data if
the stage is not ready to process it.  Input buffers store data in the
same form as it is received, and so Decode's and Execute's input buffers
contain the output instruction vector (ForwardInstData (pipe_data.hh))
from their previous stages with the instructions and bubbles in the same
positions as a single buffer entry.

Stage input buffers provide a Reservable (stage.hh) interface to their
previous stages to allow slots to be reserved in their input buffers, and
communicate their input buffer occupancy backwards to allow the previous
stage to plan whether it should make an output in a given cycle.

\subsection events Event handling: MinorActivityRecorder (activity.hh,
pipeline.hh)

Minor is essentially a cycle-callable model with some ability to skip
cycles based on pipeline activity.  External events are mostly received by
callbacks (e.g. Fetch1::IcachePort::recvTimingResp) and cause the pipeline
to be woken up to service advancing request queues.

Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject.  It provides a
Ticked::start/stop interface to start and pause clock events from being
periodically issued.  Pipeline is a derived class of Ticked.

During evaluate calls, stages can signal that they still have work to do
in the next cycle by calling either MinorCPU::activityRecorder->activity()
(for non-callable related activity) or MinorCPU::wakeupOnEvent(<stageId>)
(for stage callback-related 'wakeup' activity).

Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.
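
The idling scheme can be sketched as follows.  This is a hypothetical
Python stand-in for the ActivityRecorder/Ticked interplay, not the gem5
API:

```python
class IdlingPipeline:
    """Sketch of cycle skipping: stages run each cycle and may signal
    activity; if none does, the tick event is stopped until an external
    event (e.g. a memory response callback) wakes the pipeline."""
    def __init__(self, stages):
        self.stages = stages       # callables taking this pipeline
        self.ticking = True
        self._active_next = False

    def signal_activity(self):
        self._active_next = True

    def evaluate(self):
        self._active_next = False
        for stage in self.stages:
            stage(self)            # a busy stage calls signal_activity()
        if not self._active_next:
            self.ticking = False   # idle: deschedule the tick event

    def wakeup(self):
        self.ticking = True        # e.g. recvTimingResp re-starts ticking
```

The key property is that idling is purely an optimisation: waking up after
an external event must restore exactly the state the pipeline would have
reached by ticking through empty cycles.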

Within Pipeline (pipeline.hh), the stages are evaluated in reverse order
(and so will ::evaluate in reverse order) and their backwards data can be
read immediately after being written in each cycle, allowing output
decisions to be 'perfect' (allowing synchronous stalling of the whole
pipeline).  Branch predictions from Fetch2 to Fetch1 can also be
transported in 0 cycles, making fetch1ToFetch2BackwardDelay the only
configurable delay which can be set as low as 0 cycles.

The MinorCPU::activateContext and MinorCPU::suspendContext interface can
be called to start and pause threads (threads in the MT sense) and to
start and pause the pipeline.  Executing instructions can call this
interface (indirectly through the ThreadContext) to idle the CPU/their
threads.

\subsection stages Each pipeline stage

In general, the behaviour of a stage (each cycle) is:

\verbatim
 evaluate:
     push input to inputBuffer
     setup references to input/output data slots

     do 'every cycle' 'step' tasks

     if there is input and there is space in the next stage:
         process and generate a new output
         maybe re-activate the stage

     send backwards data

     if the stage generated output to the following FIFO:
         signal pipe activity

     if the stage has more processable input and space in the next stage:
         re-activate the stage for the next cycle

     commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

The Execute stage differs from this model as its forward output (branch)
data is unconditionally sent to Fetch1 and Fetch2.  To allow this
behaviour, Fetch1 and Fetch2 must be unconditionally receptive to that
data.

\subsection fetch1 Fetch1 stage

Fetch1 is responsible for fetching cache lines or partial cache lines from
the I-cache and passing them on to Fetch2 to be decomposed into
instructions.
It can receive 'change of stream' indications from both Execute and Fetch2
to signal that it should change its internal fetch address and tag newly
fetched lines with new stream or prediction sequence numbers.  When both
Execute and Fetch2 signal changes of stream at the same time, Fetch1 takes
Execute's change.

Every line issued by Fetch1 will bear a unique line sequence number which
can be used for debugging stream changes.

When fetching from the I-cache, Fetch1 will ask for data from the current
fetch address (Fetch1::pc) up to the end of the 'data snap' size set in
the parameter fetch1LineSnapWidth.  Subsequent autonomous line fetches
will fetch whole lines at a snap boundary and of size fetch1LineWidth.

Fetch1 will only initiate a memory fetch if it can reserve space in
Fetch2's input buffer.  That input buffer serves as the fetch queue/LFL
for the system.

Fetch1 contains two queues, requests and transfers, to handle the stages
of translating the address of a line fetch (via the TLB) and accommodating
the request/response of fetches to/from memory.

Fetch requests from Fetch1 are pushed into the requests queue as newly
allocated FetchRequest objects once they have been sent to the ITLB with a
call to itb->translateTiming.

A response from the TLB moves the request from the requests queue to the
transfers queue.  If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of
the requests queue.  In that case, the TLB response is marked up as a
state change to Translated in the request object, and advancing the
request to transfers (and the memory system) is left to calls to
Fetch1::stepQueues which is called in the cycle following the receipt of
any event.

Fetch1::tryToSendToTransfers is responsible for moving requests between
the two queues and issuing requests to memory.
Failed TLB lookups (prefetch 437aborts) continue to occupy space in the queues until they are recovered at the 438head of transfers. 439 440Responses from memory change the request object state to Complete and 441Fetch1::evaluate can pick up response data, package it in the ForwardLineData 442object, and forward it to Fetch2%'s input buffer. 443 444As space is always reserved in Fetch2::inputBuffer, setting the input buffer's 445size to 1 results in non-prefetching behaviour. 446 447When a change of stream occurs, translated requests queue members and 448completed transfers queue members can be unconditionally discarded to make way 449for new transfers. 450 451\subsection fetch2 Fetch2 stage 452 453Fetch2 receives a line from Fetch1 into its input buffer. The data in the 454head line in that buffer is iterated over and separated into individual 455instructions which are packed into a vector of instructions which can be 456passed to Decode. Packing instructions can be aborted early if a fault is 457found in either the input line as a whole or a decomposed instruction. 458 459\subsubsection bp Branch prediction 460 461Fetch2 contains the branch prediction mechanism. This is a wrapper around the 462branch predictor interface provided by gem5 (cpu/pred/...). 463 464Branches are predicted for any control instructions found. If prediction is 465attempted for an instruction, the MinorDynInst::triedToPredict flag is set on 466that instruction. 467 468When a branch is predicted to take, the MinorDynInst::predictedTaken flag is 469set and MinorDynInst::predictedTarget is set to the predicted target PC value. 470The predicted branch instruction is then packed into Fetch2%'s output vector, 471the prediction sequence number is incremented, and the branch is communicated 472to Fetch1. 
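
The stream/prediction sequence-number handshake between Fetch1, Fetch2 and
Execute can be sketched as below.  This is an illustrative Python
stand-in with hypothetical names; the real logic lives in fetch1.cc and
fetch2.cc:

```python
class Fetch1Stub:
    """Sketch of the sequence-number handshake: Fetch1 only honours a
    prediction made against its current instruction stream, while stream
    changes from Execute always win and invalidate old predictions."""
    def __init__(self):
        self.stream_seq = 1       # bumped by Execute stream changes
        self.prediction_seq = 1   # bumped by followed Fetch2 predictions
        self.fetch_addr = 0

    def execute_stream_change(self, new_stream_seq, target):
        self.stream_seq = new_stream_seq
        self.prediction_seq = 1
        self.fetch_addr = target

    def fetch2_prediction(self, stream_seq, new_prediction_seq, target):
        # A prediction carrying a stale stream sequence number is ignored
        if stream_seq != self.stream_seq:
            return False
        self.prediction_seq = new_prediction_seq
        self.fetch_addr = target
        return True
```

Lines fetched after a followed prediction carry the new prediction
sequence number, which is what lets Fetch2 later discard the sequential
lines that were already in flight.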

After signalling a prediction, Fetch2 will discard its input buffer
contents and will reject any new lines which have the same stream sequence
number as that branch but a different prediction sequence number.  This
allows following sequentially fetched lines to be rejected without
ignoring new lines generated by a change of stream indicated by a 'real'
branch from Execute (which will have a new stream sequence number).

The program counter value provided to Fetch2 by Fetch1 packets is only
updated when there is a change of stream.  Fetch2::havePC indicates
whether the PC will be picked up from the next processed input line.
Fetch2::havePC is necessary to allow line-wrapping instructions to be
tracked through decode.

Branches (and instructions predicted to branch) which are processed by
Execute will generate BranchData (pipe_data.hh) data explaining the
outcome of the branch, which is sent forwards to Fetch1 and Fetch2.
Fetch1 uses this data to change stream (and update its stream sequence
number and address for new lines).  Fetch2 uses it to update the branch
predictor.  Minor does not communicate branch data to the branch predictor
for instructions which are discarded on the way to commit.
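
How Execute classifies a branch outcome against its prediction can be
sketched as a simple decision function.  This is a simplified,
hypothetical rendering using the BranchData::BranchReason names
(SuspendThread and Interrupt arise from other mechanisms and are omitted):

```python
def branch_reason(predicted_taken, taken, predicted_target, target):
    """Sketch: classify a branch outcome into a BranchData::BranchReason
    value from prediction and actual behaviour.  Illustrative only."""
    if not predicted_taken and not taken:
        return "NoBranch"
    if not predicted_taken and taken:
        return "UnpredictedBranch"
    if predicted_taken and not taken:
        return "BadlyPredictedBranch"
    # Both predicted taken and actually taken: check the target
    if predicted_target == target:
        return "CorrectlyPredictedBranch"
    return "BadlyPredictedBranchTarget"
```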

BranchData::BranchReason (pipe_data.hh) encodes the possible branch
scenarios:

<table>
<tr>
  <td>Branch enum val.</td>
  <td>In Execute</td>
  <td>Fetch1 reaction</td>
  <td>Fetch2 reaction</td>
</tr>
<tr>
  <td>NoBranch</td>
  <td>(output bubble data)</td>
  <td>-</td>
  <td>-</td>
</tr>
<tr>
  <td>CorrectlyPredictedBranch</td>
  <td>Predicted, taken</td>
  <td>-</td>
  <td>Update BP as taken branch</td>
</tr>
<tr>
  <td>UnpredictedBranch</td>
  <td>Not predicted but was taken</td>
  <td>New stream</td>
  <td>Update BP as taken branch</td>
</tr>
<tr>
  <td>BadlyPredictedBranch</td>
  <td>Predicted, not taken</td>
  <td>New stream to restore to old inst. source</td>
  <td>Update BP as not taken branch</td>
</tr>
<tr>
  <td>BadlyPredictedBranchTarget</td>
  <td>Predicted, taken, but to a different target than the predicted
    one</td>
  <td>New stream</td>
  <td>Update BTB to new target</td>
</tr>
<tr>
  <td>SuspendThread</td>
  <td>Hint to suspend fetching</td>
  <td>Suspend fetch for this thread (branch to next inst. as wakeup
    fetch addr)</td>
  <td>-</td>
</tr>
<tr>
  <td>Interrupt</td>
  <td>Interrupt detected</td>
  <td>New stream</td>
  <td>-</td>
</tr>
</table>

The parameter decodeInputWidth sets the number of instructions which can
be packed into the output per cycle.  If the parameter fetch2CycleInput is
true, Fetch2 can try to take instructions from more than one entry in its
input buffer per cycle.

\subsection decode Decode stage

Decode takes a vector of instructions from Fetch2 (via its input buffer)
and decomposes those instructions into micro-ops (if necessary) and packs
them into its output instruction vector.

The parameter executeInputWidth sets the number of instructions which can
be packed into the output per cycle.
If the parameter decodeCycleInput is true, Decode can try to take
instructions from more than one entry in its input buffer per cycle.

\subsection execute Execute stage

Execute provides all the instruction execution and memory access
mechanisms.  An instruction's passage through Execute can take multiple
cycles, with its precise timing modelled by a functional unit pipeline
FIFO.

A vector of instructions (possibly including fault 'instructions') is
provided to Execute by Decode and can be queued in the Execute input
buffer before being issued.  Setting the parameter executeCycleInput
allows Execute to examine more than one input buffer entry (more than one
instruction vector).  The number of instructions in the input vector can
be set with executeInputWidth and the depth of the input buffer can be set
with the parameter executeInputBufferSize.

\subsubsection fus Functional units

The Execute stage contains pipelines for each functional unit comprising
the computational core of the CPU.  Functional units are configured via
the executeFuncUnits parameter.  Each functional unit has a number of
instruction classes it supports, a stated delay between instruction
issues, a delay from instruction issue to (possible) commit, and an
optional timing annotation capable of more complicated timing.
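
The per-unit issue constraints can be sketched as follows.  This is a
hypothetical Python stand-in in the spirit of an executeFuncUnits entry
(the real configuration and parameter names live in MinorCPU.py):

```python
class FunctionalUnit:
    """Sketch of a functional-unit description: supported instruction
    classes, a minimum interval between issues, and an issue-to-commit
    latency.  Illustrative names, not the gem5 configuration."""
    def __init__(self, op_classes, issue_interval, op_latency):
        self.op_classes = set(op_classes)     # instruction classes served
        self.issue_interval = issue_interval  # min cycles between issues
        self.op_latency = op_latency          # issue-to-commit delay
        self.last_issue_cycle = None

    def can_issue(self, op_class, cycle):
        if op_class not in self.op_classes:
            return False
        return (self.last_issue_cycle is None or
                cycle - self.last_issue_cycle >= self.issue_interval)

    def issue(self, op_class, cycle):
        assert self.can_issue(op_class, cycle)
        self.last_issue_cycle = cycle
        return cycle + self.op_latency        # earliest commit cycle
```

A fully pipelined unit has issue_interval 1; a non-pipelined multiplier or
divider would have an issue_interval close to its op_latency.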

Each active cycle, Execute::evaluate performs this action:

\verbatim
 Execute::evaluate:
     push input to inputBuffer
     setup references to input/output data slots and branch output slot

     step D-cache interface queues (similar to Fetch1)

     if interrupt posted:
         take interrupt (signalling branch to Fetch1/Fetch2)
     else
         commit instructions
         issue new instructions

     advance functional unit pipelines

     reactivate Execute if the unit is still active

     commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

\subsubsection fifos Functional unit FIFOs

Functional units are implemented as SelfStallingPipelines (stage.hh).
These are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires.  They
respond to SelfStallingPipeline::advance in the same way as TimeBuffers
<b>unless</b> there is data at the far, 'pop', end of the FIFO.  A
'stalled' flag is provided for signalling stalling and to allow a stall to
be cleared.  The intention is to provide a pipeline for each functional
unit which will never advance an instruction out of that pipeline until it
has been processed and the pipeline is explicitly unstalled.

The actions 'issue', 'commit', and 'advance' act on the functional units.

\subsubsection issue Issue

Issuing instructions involves iterating over both the input buffer
instructions and the heads of the functional units to try to issue
instructions in order.  The number of instructions which can be issued
each cycle is limited by the parameter executeIssueLimit, how
executeCycleInput is set, the availability of pipeline space, and the
policy used to choose a pipeline in which the instruction can be issued.

At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence.
For greater flexibility, better (and more specific) policies will need to
be possible.

Memory operation instructions traverse their functional units to perform
their EA calculations.  On 'commit', the ExecContext::initiateAcc
execution phase is performed and any memory access is issued to the LSQ
(via ExecContext::{read,write}Mem calling LSQ::pushRequest).

Note that faults are issued as if they are instructions and can
(currently) be issued to *any* functional unit.

Every issued instruction is also pushed into the Execute::inFlightInsts
queue.  Memory ref. instructions are pushed into the Execute::inFUMemInsts
queue.

\subsubsection commit Commit

Instructions are committed by examining the head of the
Execute::inFlightInsts queue (which is decorated with the functional unit
number to which the instruction was issued).  Instructions which can then
be found in their functional units are executed and popped from
Execute::inFlightInsts.

Memory operation instructions are committed into the memory queues (as
described above) and exit their functional unit pipeline but are not
popped from the Execute::inFlightInsts queue.  The Execute::inFUMemInsts
queue provides ordering to memory operations as they pass through the
functional units (maintaining issue order).  On entering the LSQ,
instructions are popped from Execute::inFUMemInsts.

If the parameter executeAllowEarlyMemoryIssue is set, memory operations
can be sent from their FU to the LSQ before reaching the head of
Execute::inFlightInsts but after their dependencies are met.
MinorDynInst::instToWaitFor is marked up with the latest dependent
instruction execSeqNum required to be committed for a memory operation to
progress to the LSQ.
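
The early-issue condition just described can be sketched as a single
predicate.  This is a hypothetical stand-in for logic in execute.cc,
assuming execSeqNums commit in order so 'committed up to seqnum N' is
well defined:

```python
def can_send_to_lsq(at_inflight_head, allow_early_mem_issue,
                    inst_to_wait_for, last_committed_exec_seq):
    """Sketch: a memory operation leaves its FU for the LSQ when it is at
    the head of inFlightInsts, or earlier if early issue is enabled and
    its youngest dependency (instToWaitFor, an execSeqNum) has already
    committed.  Illustrative only."""
    if at_inflight_head:
        return True
    return (allow_early_mem_issue and
            inst_to_wait_for <= last_committed_exec_seq)
```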

Once a memory response is available (by testing the head of
Execute::inFlightInsts against LSQ::findResponse), commit will process
that response (ExecContext::completeAcc) and pop the instruction from
Execute::inFlightInsts.

Any branch, fault or interrupt will cause a stream sequence number change
and signal a branch to Fetch1/Fetch2.  Only instructions with the current
stream sequence number will be issued and/or committed.

\subsubsection advance Advance

All non-stalled pipelines are advanced and may, thereafter, become
stalled.  Potential activity in the next cycle is signalled if there are
any instructions remaining in any pipeline.

\subsubsection sb Scoreboard

The scoreboard (Scoreboard) is used to control instruction issue.  It
contains a count of the number of in-flight instructions which will write
to each general purpose CPU integer or float register.  Instructions will
only be issued when the scoreboard contains a count of 0 instructions
which will write to any of the instruction's source registers.

Once an instruction is issued, the scoreboard count for each of its
destination registers will be incremented.

The estimated delivery time of the instruction's result is marked up in
the scoreboard by adding the length of the issued-to FU to the current
time.  The timings parameter on each FU provides a list of additional
rules for calculating the delivery time.  These are documented in the
parameter comments in MinorCPU.py.

On commit (for memory operations, on memory response commit), the
scoreboard counters for an instruction's destination registers are
decremented.

\subsubsection ifi Execute::inFlightInsts

The Execute::inFlightInsts queue will always contain all instructions in
flight in Execute in the correct issue order.  Execute::issue is the only
process which will push an instruction into the queue.
Execute::commit is the only process that can pop an instruction.

\subsubsection lsq LSQ

The LSQ can support multiple outstanding transactions to memory in a
number of conservative cases.

There are three queues to contain requests: requests, transfers and the
store buffer.  The requests and transfers queues operate in a similar
manner to the queues in Fetch1.  The store buffer is used to decouple the
delay of completing store operations from following loads.

Requests are issued to the DTLB as their instructions leave their
functional unit.  At the head of requests, cacheable load requests can be
sent to memory and on to the transfers queue.  Cacheable stores will be
passed to transfers unprocessed and progress through that queue,
maintaining order with other transactions.

The conditions in LSQ::tryToSendToTransfers dictate when requests can be
sent to memory.

All uncacheable transactions, split transactions and locked transactions
are processed in order at the head of requests.  Additionally, store
results residing in the store buffer can have their data forwarded to
cacheable loads (removing the need to perform a read from memory), but no
cacheable load can be issued to the transfers queue until that queue's
stores have drained into the store buffer.

At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute and committed
(ExecContext::completeAcc) and, for stores, sent to the store buffer.

Barrier instructions do not prevent cacheable loads from progressing to
memory but do cause a stream change which will discard that load.  Stores
will not be committed to the store buffer if they are in the shadow of the
barrier but before the new instruction stream has arrived at Execute.
As all other memory transactions are delayed at the end of the requests queue
until they are at the head of Execute::inFlightInsts, they will be discarded
by any barrier stream change.

After commit, LSQ::BarrierDataRequest requests are inserted into the
store buffer to track each barrier until all preceding memory transactions
have drained from the store buffer. No further memory transactions will be
issued from the ends of FUs until after the barrier has drained.

\subsubsection drain Draining

Draining is mostly handled by the Execute stage. When initiated by calling
MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
each cycle and keeps the pipeline active until draining is complete. It is
Pipeline that signals the completion of draining. Execute is triggered by
MinorCPU::drain and starts stepping through its Execute::DrainState state
machine, starting from state Execute::NotDraining, in this order:

<table>
<tr>
  <td><b>State</b></td>
  <td><b>Meaning</b></td>
</tr>
<tr>
  <td>Execute::NotDraining</td>
  <td>Not trying to drain, normal execution</td>
</tr>
<tr>
  <td>Execute::DrainCurrentInst</td>
  <td>Draining micro-ops to complete inst.</td>
</tr>
<tr>
  <td>Execute::DrainHaltFetch</td>
  <td>Halt fetching instructions</td>
</tr>
<tr>
  <td>Execute::DrainAllInsts</td>
  <td>Discarding all instructions presented</td>
</tr>
</table>

When complete, a drained Execute unit will be in the Execute::DrainAllInsts
state where it will continue to discard instructions but has no knowledge of
the drained state of the rest of the model.

\section debug Debug options

The model provides a number of debug flags which can be passed to gem5 with
the --debug-flags option.
The available flags are:

<table>
<tr>
  <td><b>Debug flag</b></td>
  <td><b>Unit which will generate debugging output</b></td>
</tr>
<tr>
  <td>Activity</td>
  <td>Debug ActivityMonitor actions</td>
</tr>
<tr>
  <td>Branch</td>
  <td>Fetch2 and Execute branch prediction decisions</td>
</tr>
<tr>
  <td>MinorCPU</td>
  <td>CPU global actions such as wakeup/thread suspension</td>
</tr>
<tr>
  <td>Decode</td>
  <td>Decode</td>
</tr>
<tr>
  <td>MinorExec</td>
  <td>Execute behaviour</td>
</tr>
<tr>
  <td>Fetch</td>
  <td>Fetch1 and Fetch2</td>
</tr>
<tr>
  <td>MinorInterrupt</td>
  <td>Execute interrupt handling</td>
</tr>
<tr>
  <td>MinorMem</td>
  <td>Execute memory interactions</td>
</tr>
<tr>
  <td>MinorScoreboard</td>
  <td>Execute scoreboard activity</td>
</tr>
<tr>
  <td>MinorTrace</td>
  <td>Generate MinorTrace cyclic state trace output (see below)</td>
</tr>
<tr>
  <td>MinorTiming</td>
  <td>MinorTiming instruction timing modification operations</td>
</tr>
</table>

The group flag Minor enables all of the flags beginning with Minor.

\section trace MinorTrace and minorview.py

The debug flag MinorTrace causes cycle-by-cycle state data to be printed which
can then be processed and viewed by the minorview.py tool. This output is
very verbose and so it is recommended that it only be used for small examples.

\subsection traceformat MinorTrace format

There are three types of line output by MinorTrace:

\subsubsection state MinorTrace - Ticked unit cycle state

For example:

\verbatim
 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
\endverbatim

For each time step, the MinorTrace flag will cause one MinorTrace line to be
printed for every named element in the model.
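For scripted analysis, a state line of this shape can be picked apart with a
few lines of Python. This is only a sketch based on the example format shown
above (the function name is invented here; minorview.py contains its own,
more complete parser):

```python
import re

def parse_minortrace_state(line):
    """Split a 'MinorTrace:' ticked-state line into tick, unit name and
    a dict of key=value fields (sketch based on the format shown above)."""
    m = re.match(r"\s*(\d+):\s*(\S+):\s*MinorTrace:\s*(.*)", line)
    if m is None:
        return None  # not a MinorTrace state line
    tick, unit, rest = m.groups()
    # The remaining fields are space-separated key=value pairs
    fields = dict(kv.split("=", 1) for kv in rest.split())
    return int(tick), unit, fields

line = " 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0"
print(parse_minortrace_state(line))
# (110000, 'system.cpu.dcachePort', {'state': 'MemoryRunning', 'in_tlb_mem': '0/0'})
```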
\subsubsection traceunit MinorInst - summaries of instructions issued by Decode

For example:

\verbatim
 140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \
   inst=" mov r0, #0" class=IntAlu
\endverbatim

MinorInst lines are currently only generated for instructions which are
committed.

\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by Fetch1

For example:

\verbatim
 92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \
   vaddr=0x5c paddr=0x5c
\endverbatim

\subsection minorview minorview.py

Minorview (util/minorview.py) can be used to visualise the data created by
MinorTrace.

\verbatim
usage: minorview.py [-h] [--picture picture-file] [--prefix name]
                    [--start-time time] [--end-time time] [--mini-views]
                    event-file

Minor visualiser

positional arguments:
  event-file

optional arguments:
  -h, --help            show this help message and exit
  --picture picture-file
                        markup file containing blob information (default:
                        <minorview-path>/minor.pic)
  --prefix name         name prefix in trace for CPU to be visualised
                        (default: system.cpu)
  --start-time time     time of first event to load from file
  --end-time time       time of last event to load from file
  --mini-views          show tiny views of the next 10 time steps
\endverbatim

Raw debugging output can be passed to minorview.py as the event-file. It will
pick out the MinorTrace lines; other lines in which units in the simulation
are named (such as system.cpu.dcachePort in the above example) will appear as
'comments' when those units are clicked on in the visualiser.

Clicking on a unit which contains instructions or lines will bring up a speech
bubble giving extra information derived from the MinorInst/MinorLine lines.

--start-time and --end-time allow only sections of debug files to be loaded.
--prefix allows the name prefix of the CPU to be inspected to be supplied.
This defaults to 'system.cpu'.

In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.

The diagonally striped coloured blocks show the InstId of the instruction or
line they represent. Note that lines in Fetch1 and f1ToF2.F only show the id
fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers. The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display. Useful combinations are:

<table>
<tr>
  <td><b>Combination</b></td>
  <td><b>Reason</b></td>
</tr>
<tr>
  <td>E</td>
  <td>just show the final execute sequence number</td>
</tr>
<tr>
  <td>F/E</td>
  <td>show the instruction-related numbers</td>
</tr>
<tr>
  <td>S/P</td>
  <td>show just the stream-related numbers (watch the stream sequence
  change with branches and not change with predicted branches)</td>
</tr>
<tr>
  <td>S/E</td>
  <td>show instructions and their stream</td>
</tr>
</table>

The key to the right shows all the displayable colours (some of the colour
choices are quite bad!):

<table>
<tr>
  <td><b>Symbol</b></td>
  <td><b>Meaning</b></td>
</tr>
<tr>
  <td>U</td>
  <td>Unknown data</td>
</tr>
<tr>
  <td>B</td>
  <td>Blocked stage</td>
</tr>
<tr>
  <td>-</td>
  <td>Bubble</td>
</tr>
<tr>
  <td>E</td>
  <td>Empty queue slot</td>
</tr>
<tr>
  <td>R</td>
  <td>Reserved queue slot</td>
</tr>
<tr>
  <td>F</td>
  <td>Fault</td>
</tr>
<tr>
  <td>r</td>
  <td>Read (used as the leftmost stripe on data in the dcachePort)</td>
</tr>
<tr>
  <td>w</td>
  <td>Write (likewise)</td>
</tr>
<tr>
  <td>0 to 9</td>
  <td>last decimal
digit of the corresponding data</td>
</tr>
</table>

\verbatim

 ,---------------.         .--------------.        *U
 | |=|->|=|->|=| |         ||=|||->||->|| |        *-  <- Fetch queues/LSQ
 `---------------'         `--------------'        *R
        ===       ======                           *w  <- Activity/Stage activity
                   ,--------------.                *1
 ,--.  ,.    ,.    | ============ |                *3  <- Scoreboard
 | |-\[]-\||-\[]-\||-\[]-\| ============ |         *5  <- Execute::inFlightInsts
 | | :[] :||-/[]-/||-/[]-/| -. --------  |         *7
 | |-/[]-/||  ^   ||      | |  --------- |         *9
 | |      ||  |   ||      | |  ------    |
[]->|     | ->||  |   ||  | |  ----      |
 | |<-[]<-||<-+-<-||<-[]<-| | -' ------  |->[]  <- Execute to Fetch1,
 '--`     `'  ^   `'      |      ------  |         Fetch2 branch data
  ---.        |   ---.    `--------------'
  ---'        |   ---'      ^           ^
   |          ^     |       `------------ Execute
 MinorBuffer ----'  input   `-------------------- Execute input buffer
                    buffer

\endverbatim

Stages show the colours of the instructions currently being
generated/processed.

Forward FIFOs between stages show the data being pushed into them at the
current tick (to the left), the data in transit, and the data available at
their outputs (to the right).

The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.

In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked. Each FIFO
has, therefore, an extra slot to show the asserted new input data, and all the
data currently within the FIFO.

Input buffers for each stage are shown below the corresponding stage and show
the contents of those buffers as horizontal strips. Strips marked as reserved
(cyan by default) are reserved to be filled by the previous stage. An input
buffer with all reserved or occupied slots will, therefore, block the previous
stage from generating output.
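The reservation and blocking behaviour just described can be sketched as a
toy model in Python. All names here are invented for illustration; this is
not Minor's actual InputBuffer implementation:

```python
class InputBuffer:
    """Toy stage input buffer: each slot is free, reserved (claimed by
    the previous stage) or occupied. With no free slots, the previous
    stage is blocked from generating output (illustrative only)."""

    def __init__(self, num_slots):
        self.free = num_slots
        self.reserved = 0
        self.occupied = 0

    def can_reserve(self):
        # The producer may only generate output if a slot can be reserved
        return self.free > 0

    def reserve(self):
        assert self.can_reserve()
        self.free -= 1
        self.reserved += 1

    def fill(self):
        # The previous stage delivers into a slot it reserved earlier
        assert self.reserved > 0
        self.reserved -= 1
        self.occupied += 1

    def pop(self):
        # This stage consumes an entry, freeing the slot
        assert self.occupied > 0
        self.occupied -= 1
        self.free += 1


buf = InputBuffer(num_slots=2)
buf.reserve(); buf.reserve()
print(buf.can_reserve())  # False: all slots reserved, producer blocks
buf.fill(); buf.pop()
print(buf.can_reserve())  # True: a slot was freed
```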
Fetch queues and LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours of the top of their frames.

Inside Execute, the horizontal bars represent the individual FU pipelines.
The vertical bar to the left is the input buffer and the bar to the right, the
instructions committed this cycle. The background of Execute shows
instructions which are being committed this cycle in their original FU
pipeline positions.

The strip at the top of the Execute block shows the current streamSeqNum that
Execute is committing. A similar stripe at the top of Fetch1 shows that
stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
issuing predictionSeqNum.

The scoreboard shows the number of instructions in flight which will commit a
result to the register in the position shown. The scoreboard contains slots
for each integer and floating point register.

The Execute::inFlightInsts queue shows all the instructions in flight in
Execute with the oldest instruction (the next instruction to be committed) to
the right.

'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.

\subsection picformat minor.pic format

The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser. Its format is described in the supplied
minor.pic file.

*/

}