1# Copyright (c) 2014 ARM Limited
2# All rights reserved
3#
4# The license below extends only to copyright in the software and shall
5# not be construed as granting a license to any other intellectual
6# property including but not limited to intellectual property relating
7# to a hardware implementation of the functionality of the software
8# licensed hereunder.  You may use the software subject to the license
9# terms below provided that you ensure that this notice is replicated
10# unmodified and in its entirety in all distributions of the software,
11# modified or unmodified, in source code or in binary form.
12#
13# Redistribution and use in source and binary forms, with or without
14# modification, are permitted provided that the following conditions are
15# met: redistributions of source code must retain the above copyright
16# notice, this list of conditions and the following disclaimer;
17# redistributions in binary form must reproduce the above copyright
18# notice, this list of conditions and the following disclaimer in the
19# documentation and/or other materials provided with the distribution;
20# neither the name of the copyright holders nor the names of its
21# contributors may be used to endorse or promote products derived from
22# this software without specific prior written permission.
23#
24# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
25# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
26# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
27# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
28# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
29# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
30# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
31# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
32# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
33# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
34# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
35#
36# Authors: Andrew Bardsley
37
38namespace Minor
39{
40
41/*!
42
43\page minor Inside the Minor CPU model
44
45\tableofcontents
46
47This document contains a description of the structure and function of the
48Minor gem5 in-order processor model.  It is recommended reading for anyone who
49wants to understand Minor's internal organisation, design decisions, C++
50implementation and Python configuration.  A familiarity with gem5 and some of
51its internal structures is assumed.  This document is meant to be read
52alongside the Minor source code and to explain its general structure without
53being too slavish about naming every function and data type.
54
55\section whatis What is Minor?
56
57Minor is an in-order processor model with a fixed pipeline but configurable
58data structures and execute behaviour.  It is intended to be used to model
59processors with strict in-order execution behaviour and allows visualisation
60of an instruction's position in the pipeline through the
61MinorTrace/minorview.py format/tool.  The intention is to provide a framework
62for micro-architecturally correlating the model with a particular, chosen
63processor with similar capabilities.
64
65\section philo Design philosophy
66
67\subsection mt Multithreading
68
69The model isn't currently capable of multithreading but there are THREAD
70comments in key places where stage data needs to be arrayed to support
71multithreading.
72
73\subsection structs Data structures
74
75Decorating data structures with large amounts of life-cycle information is
76avoided.  Only instructions (MinorDynInst) contain a significant proportion of
77their data content whose values are not set at construction.
78
79All internal structures have fixed sizes on construction.  Data held in queues
80and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface to
81allow a distinct 'bubble'/no data value option for each type.
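
As a minimal sketch of the bubble idea (not Minor's actual classes), the
following shows a fixed-capacity FIFO whose element type supplies its own
'no data' value.  The names LineSketch and FixedQueueSketch, and the use of a
negative sequence number as the bubble marker, are assumptions made for
illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical element type: it can report whether it is a bubble and can
// manufacture a bubble value of its own type.
struct LineSketch
{
    int lineSeqNum = -1;    // -1 is this sketch's bubble marker

    bool isBubble() const { return lineSeqNum < 0; }
    static LineSketch bubble() { return LineSketch(); }
};

// Hypothetical FIFO whose capacity is fixed at construction.
template <typename T>
class FixedQueueSketch
{
    std::deque<T> slots;
    std::size_t capacity;

  public:
    explicit FixedQueueSketch(std::size_t capacity_) : capacity(capacity_)
    {
        assert(capacity_ > 0);
    }

    // Bubbles carry no data, so they are never stored.
    bool push(const T &item)
    {
        if (item.isBubble() || slots.size() == capacity)
            return false;
        slots.push_back(item);
        return true;
    }

    // An empty queue yields a bubble rather than an error.
    T pop()
    {
        if (slots.empty())
            return T::bubble();
        T head = slots.front();
        slots.pop_front();
        return head;
    }

    std::size_t occupancy() const { return slots.size(); }
};
```

Giving each type its own bubble value lets a consumer treat 'nothing
arrived this cycle' as ordinary data instead of a special case.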
82
83Inter-stage 'struct' data is packaged in structures which are passed by value.
84Only MinorDynInst, the line data in ForwardLineData and the memory-interfacing
85objects Fetch1::FetchRequest and LSQ::LSQRequest are '::new' allocated while
86running the model.
87
88\section model Model structure
89
90Objects of class MinorCPU are provided by the model to gem5.  MinorCPU
91implements the interfaces of (cpu.hh) and can provide data and
92instruction interfaces for connection to a cache system.  The model is
93configured in a similar way to other gem5 models through Python.  That
94configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
95actually implements the processor pipeline.
96
97The hierarchy of major unit ownership from MinorCPU down looks like this:
98
99<ul>
100<li>MinorCPU</li>
101<ul>
102    <li>Pipeline - container for the pipeline, owns the cyclic 'tick'
103    event mechanism and the idling (cycle skipping) mechanism.</li>
104    <ul>
        <li>Fetch1 - instruction fetch unit responsible for fetching cache
            lines (or parts of lines) from the I-cache interface</li>
107        <ul>
108            <li>Fetch1::IcachePort - interface to the I-cache from
109                Fetch1</li>
110            </ul>
111            <li>Fetch2 - line to instruction decomposition</li>
112            <li>Decode - instruction to micro-op decomposition</li>
113            <li>Execute - instruction execution and data memory
114                interface</li>
115            <ul>
116                <li>LSQ - load store queue for memory ref. instructions</li>
117                <li>LSQ::DcachePort - interface to the D-cache from
118                    Execute</li>
119            </ul>
120        </ul>
121    </ul>
122</ul>
123
124\section keystruct Key data structures
125
126\subsection ids Instruction and line identity: InstId (dyn_inst.hh)
127
128An InstId contains the sequence numbers and thread numbers that describe the
129life cycle and instruction stream affiliations of individual fetched cache
130lines and instructions.
131
132An InstId is printed in one of the following forms:
133
134    - T/S.P/L - for fetched cache lines
135    - T/S.P/L/F - for instructions before Decode
136    - T/S.P/L/F.E - for instructions from Decode onwards
137
138for example:
139
140    - 0/10.12/5/6.7
141
142InstId's fields are:
143
144<table>
145<tr>
146    <td><b>Field</b></td>
147    <td><b>Symbol</b></td>
148    <td><b>Generated by</b></td>
149    <td><b>Checked by</b></td>
150    <td><b>Function</b></td>
151</tr>
152
153<tr>
154    <td>InstId::threadId</td>
155    <td>T</td>
156    <td>Fetch1</td>
157    <td>Everywhere the thread number is needed</td>
158    <td>Thread number (currently always 0).</td>
159</tr>
160
161<tr>
162    <td>InstId::streamSeqNum</td>
163    <td>S</td>
164    <td>Execute</td>
165    <td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
166    <td>Stream sequence number as chosen by Execute.  Stream
167        sequence numbers change after changes of PC (branches, exceptions) in
168        Execute and are used to separate pre and post branch instruction
169        streams.</td>
170</tr>
171
172<tr>
173    <td>InstId::predictionSeqNum</td>
174    <td>P</td>
175    <td>Fetch2</td>
176    <td>Fetch2 (while discarding lines after prediction)</td>
177    <td>Prediction sequence numbers represent branch prediction decisions.
178    This is used by Fetch2 to mark lines/instructions according to the last
179    followed branch prediction made by Fetch2.  Fetch2 can signal to Fetch1
180    that it should change its fetch address and mark lines with a new
181    prediction sequence number (which it will only do if the stream sequence
182    number Fetch1 expects matches that of the request).  </td> </tr>
183
184<tr>
185<td>InstId::lineSeqNum</td>
186<td>L</td>
187<td>Fetch1</td>
188<td>(Just for debugging)</td>
189<td>Line fetch sequence number of this cache line or the line
190    this instruction was extracted from.
191    </td>
192</tr>
193
194<tr>
195<td>InstId::fetchSeqNum</td>
196<td>F</td>
197<td>Fetch2</td>
198<td>Fetch2 (as the inst. sequence number for branches)</td>
199<td>Instruction fetch order assigned by Fetch2 when lines
200    are decomposed into instructions.
201    </td>
202</tr>
203
204<tr>
205<td>InstId::execSeqNum</td>
206<td>E</td>
207<td>Decode</td>
208<td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
209<td>Instruction order after micro-op decomposition.</td>
210</tr>
211
212</table>
213
214The sequence number fields are all independent of each other and although, for
215instance, InstId::execSeqNum for an instruction will always be >=
216InstId::fetchSeqNum, the comparison is not useful.
217
218The originating stage of each sequence number field keeps a counter for that
219field which can be incremented in order to generate new, unique numbers.
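
The printed forms listed above can be reproduced with a small formatter.  This
is a sketch only: InstIdSketch and formatInstId are hypothetical names, and
treating a zero fetch or execute sequence number as 'not yet assigned' is an
assumption of this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical mirror of the InstId fields described in the table.
struct InstIdSketch
{
    unsigned threadId = 0;
    unsigned long streamSeqNum = 0;
    unsigned long predictionSeqNum = 0;
    unsigned long lineSeqNum = 0;
    unsigned long fetchSeqNum = 0;  // 0: not yet decomposed by Fetch2
    unsigned long execSeqNum = 0;   // 0: not yet through Decode
};

// Format as T/S.P/L, T/S.P/L/F or T/S.P/L/F.E depending on which
// sequence numbers have been assigned.
std::string formatInstId(const InstIdSketch &id)
{
    // An execute number without a fetch number makes no sense here.
    assert(id.execSeqNum == 0 || id.fetchSeqNum != 0);

    std::ostringstream os;
    os << id.threadId << '/' << id.streamSeqNum << '.'
       << id.predictionSeqNum << '/' << id.lineSeqNum;
    if (id.fetchSeqNum != 0) {
        os << '/' << id.fetchSeqNum;
        if (id.execSeqNum != 0)
            os << '.' << id.execSeqNum;
    }
    return os.str();
}
```

With thread 0, stream 10, prediction 12, line 5, fetch 6 and execute 7 this
produces the example string 0/10.12/5/6.7 given above.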
220
221\subsection insts Instructions: MinorDynInst (dyn_inst.hh)
222
223MinorDynInst represents an instruction's progression through the pipeline.  An
224instruction can be three things:
225
226<table>
227<tr>
228    <td><b>Thing</b></td>
229    <td><b>Predicate</b></td>
230    <td><b>Explanation</b></td>
231</tr>
232<tr>
233    <td>A bubble</td>
234    <td>MinorDynInst::isBubble()</td>
235    <td>no instruction at all, just a space-filler</td>
236</tr>
237<tr>
238    <td>A fault</td>
239    <td>MinorDynInst::isFault()</td>
240    <td>a fault to pass down the pipeline in an instruction's clothing</td>
241</tr>
242<tr>
243    <td>A decoded instruction</td>
244    <td>MinorDynInst::isInst()</td>
245    <td>instructions are actually passed to the gem5 decoder in Fetch2 and so
246    are created fully decoded.  MinorDynInst::staticInst is the decoded
247    instruction form.</td>
248</tr>
249</table>
250
251Instructions are reference counted using the gem5 RefCountingPtr
252(base/refcnt.hh) wrapper.  They therefore usually appear as MinorDynInstPtr in
253code.  Note that as RefCountingPtr initialises as nullptr rather than an
254object that supports BubbleIF::isBubble, passing raw MinorDynInstPtrs to
255Queue%s and other similar structures from stage.hh without boxing is
256dangerous.
257
258\subsection fld ForwardLineData (pipe_data.hh)
259
ForwardLineData is used to pass cache lines from Fetch1 to Fetch2.  Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying or can contain a line (or partial line) fetched by Fetch1.  The
data carried by ForwardLineData is owned by a Packet object returned from
memory.  It is explicitly memory managed and so must be deleted once processed
(by Fetch2 deleting the Packet).
266
267\subsection fid ForwardInstData (pipe_data.hh)
268
269ForwardInstData can contain up to ForwardInstData::width() instructions in its
270ForwardInstData::insts vector.  This structure is used to carry instructions
271between Fetch2, Decode and Execute and to store input buffer vectors in Decode
272and Execute.
273
274\subsection fr Fetch1::FetchRequest (fetch1.hh)
275
FetchRequests represent I-cache line fetch requests.  They are used in the
memory queues of Fetch1 and are pushed into/popped from Packet::senderState
while traversing the memory system.
279
280FetchRequests contain a memory system Request (mem/request.hh) for that fetch
281access, a packet (Packet, mem/packet.hh), if the request gets to memory, and a
282fault field that can be populated with a TLB-sourced prefetch fault (if any).
283
284\subsection lsqr LSQ::LSQRequest (execute.hh)
285
286LSQRequests are similar to FetchRequests but for D-cache accesses.  They carry
287the instruction associated with a memory access.
288
289\section pipeline The pipeline
290
291\verbatim
292------------------------------------------------------------------------------
293    Key:
294
295    [] : inter-stage BufferBuffer
296    ,--.
297    |  | : pipeline stage
298    `--'
299    ---> : forward communication
300    <--- : backward communication
301
302    rv : reservation information for input buffers
303
304                ,------.     ,------.     ,------.     ,-------.
305 (from  --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
306 Execute)    |  |      |<-[]-|      |<-rv-|      |<-rv-|       |     & Fetch2)
307             |  `------'<-rv-|      |     |      |     |       |
308             `-------------->|      |     |      |     |       |
309                             `------'     `------'     `-------'
310------------------------------------------------------------------------------
311\endverbatim
312
The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled.  There are MinorBuffer%s between adjacent
stages in the forward direction (for example: passing lines from Fetch1 to
Fetch2) and, between Fetch2 and Fetch1, a buffer in the backwards direction
carrying branch predictions.
319
320Stages Fetch2, Decode and Execute have input buffers which, each cycle, can
321accept input data from the previous stage and can hold that data if the stage
322is not ready to process it.  Input buffers store data in the same form as it
323is received and so Decode and Execute's input buffers contain the output
324instruction vector (ForwardInstData (pipe_data.hh)) from their previous stages
325with the instructions and bubbles in the same positions as a single buffer
326entry.
327
328Stage input buffers provide a Reservable (stage.hh) interface to their
329previous stages, to allow slots to be reserved in their input buffers, and
330communicate their input buffer occupancy backwards to allow the previous stage
331to plan whether it should make an output in a given cycle.
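
The reservation idea can be sketched as follows.  InputBufferSketch and its
member names are hypothetical and are not the Reservable interface declared in
stage.hh; the sketch only shows how reserved slots and occupied slots together
bound what a previous stage may send.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical input buffer with slot reservation.
template <typename T>
class InputBufferSketch
{
    std::deque<T> entries;
    std::size_t capacity;
    std::size_t numReserved = 0;

  public:
    explicit InputBufferSketch(std::size_t capacity_) : capacity(capacity_) {}

    // A slot is free only if it is neither occupied nor already promised.
    bool canReserve() const
    {
        return entries.size() + numReserved < capacity;
    }

    // Called by the previous stage before it starts work whose result
    // must land in this buffer.
    void reserve()
    {
        assert(canReserve());
        ++numReserved;
    }

    // Deliver data into a slot reserved earlier.
    void pushReserved(const T &data)
    {
        assert(numReserved > 0);
        --numReserved;
        entries.push_back(data);
    }

    std::size_t occupancy() const { return entries.size(); }

    T popFront()
    {
        assert(!entries.empty());
        T head = entries.front();
        entries.pop_front();
        return head;
    }
};
```

Because a reservation counts against capacity before the data exists, a
producer that has reserved a slot is always able to deliver into it later.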
332
333\subsection events Event handling: MinorActivityRecorder (activity.hh,
334pipeline.hh)
335
336Minor is essentially a cycle-callable model with some ability to skip cycles
337based on pipeline activity.  External events are mostly received by callbacks
338(e.g. Fetch1::IcachePort::recvTimingResp) and cause the pipeline to be woken
339up to service advancing request queues.
340
Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject.  It provides a Ticked::start/stop
interface to start and stop the periodic issuing of clock events.
Pipeline is a derived class of Ticked.
345
During evaluate calls, stages can signal that they still have work to do in
the next cycle by calling either MinorCPU::activityRecorder->activity() (for
activity not related to callbacks) or MinorCPU::wakeupOnEvent(<stageId>) (for
stage callback-related 'wakeup' activity).
350
Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.
354
Within Pipeline (pipeline.hh), the stages are evaluated in reverse order
(Execute first, Fetch1 last), so their backwards data can be read immediately
after being written in each cycle, allowing output decisions to be 'perfect'
(allowing synchronous stalling of the whole pipeline).  Branch predictions
from Fetch2 to Fetch1 can also be transported in 0 cycles, making
fetch1ToFetch2BackwardDelay the only configurable delay which can be set as
low as 0 cycles.
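
A minimal sketch of reverse-order evaluation with idling is given below.
PipelineSketch and ActivitySketch are hypothetical stand-ins for Pipeline and
MinorActivityRecorder, and stages are modelled as plain callables rather than
the real stage classes.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical activity recorder: any stage may flag work for next cycle.
struct ActivitySketch
{
    bool active = false;
    void activity() { active = true; }
};

class PipelineSketch
{
    using Stage = std::function<void(ActivitySketch &)>;

    std::vector<Stage> stages;  // stored in forward (pipeline) order
    bool ticking = true;

  public:
    void addStage(Stage stage)
    {
        assert(stage);
        stages.push_back(std::move(stage));
    }

    // Evaluate the last stage first so that anything it writes backwards
    // is visible to earlier stages within the same cycle.
    void evaluate()
    {
        ActivitySketch recorder;
        for (auto it = stages.rbegin(); it != stages.rend(); ++it)
            (*it)(recorder);

        // Stop ticking when no stage asked for another cycle.
        ticking = recorder.active;
    }

    bool isTicking() const { return ticking; }
};
```

If a later stage sets a stall flag during its evaluation, an earlier stage
evaluated afterwards in the same call already sees it, which is what makes
same-cycle stalling of the whole pipeline possible.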
362
363The MinorCPU::activateContext and MinorCPU::suspendContext interface can be
364called to start and pause threads (threads in the MT sense) and to start and
365pause the pipeline.  Executing instructions can call this interface
366(indirectly through the ThreadContext) to idle the CPU/their threads.
367
368\subsection stages Each pipeline stage
369
370In general, the behaviour of a stage (each cycle) is:
371
372\verbatim
373    evaluate:
374        push input to inputBuffer
375        setup references to input/output data slots
376
377        do 'every cycle' 'step' tasks
378
379        if there is input and there is space in the next stage:
380            process and generate a new output
381            maybe re-activate the stage
382
383        send backwards data
384
385        if the stage generated output to the following FIFO:
386            signal pipe activity
387
388        if the stage has more processable input and space in the next stage:
389            re-activate the stage for the next cycle
390
391        commit the push to the inputBuffer if that data hasn't all been used
392\endverbatim
393
394The Execute stage differs from this model as its forward output (branch) data
395is unconditionally sent to Fetch1 and Fetch2.  To allow this behaviour, Fetch1
396and Fetch2 must be unconditionally receptive to that data.
397
398\subsection fetch1 Fetch1 stage
399
400Fetch1 is responsible for fetching cache lines or partial cache lines from the
401I-cache and passing them on to Fetch2 to be decomposed into instructions.  It
402can receive 'change of stream' indications from both Execute and Fetch2 to
403signal that it should change its internal fetch address and tag newly fetched
404lines with new stream or prediction sequence numbers.  When both Execute and
405Fetch2 signal changes of stream at the same time, Fetch1 takes Execute's
406change.
407
408Every line issued by Fetch1 will bear a unique line sequence number which can
409be used for debugging stream changes.
410
411When fetching from the I-cache, Fetch1 will ask for data from the current
412fetch address (Fetch1::pc) up to the end of the 'data snap' size set in the
413parameter fetch1LineSnapWidth.  Subsequent autonomous line fetches will fetch
414whole lines at a snap boundary and of size fetch1LineWidth.
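
As a worked example of the snapping arithmetic, the sketch below computes the
first (possibly partial) fetch after a change of address and the whole-line
fetch that follows it.  It is an illustration under stated assumptions, not
the code in fetch1.cc: the names are hypothetical and the snap width is
assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one fetch: start address and byte count.
struct FetchSpanSketch
{
    std::uint64_t addr;
    std::uint64_t size;
};

// First fetch: from pc up to the end of the enclosing snap-sized block.
FetchSpanSketch firstFetch(std::uint64_t pc, std::uint64_t snapWidth)
{
    // Assumption of this sketch: snapWidth is a non-zero power of two.
    assert(snapWidth != 0 && (snapWidth & (snapWidth - 1)) == 0);

    std::uint64_t offset = pc & (snapWidth - 1);
    return FetchSpanSketch{pc, snapWidth - offset};
}

// Later fetches: start where the previous one ended (a snap boundary)
// and cover a whole line.
FetchSpanSketch nextFetch(const FetchSpanSketch &prev,
                          std::uint64_t lineWidth)
{
    assert(lineWidth != 0);
    return FetchSpanSketch{prev.addr + prev.size, lineWidth};
}
```

For a fetch address of 0x1014 and a snap width of 0x20, the first fetch covers
12 bytes (0x1014 to 0x101f) and the next fetch starts at the boundary 0x1020.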
415
Fetch1 will only initiate a memory fetch if it can reserve space in Fetch2%'s
input buffer.  That input buffer serves as the fetch queue/LFL for the system.
418
Fetch1 contains two queues, requests and transfers, to handle the stages of
translating the address of a line fetch (via the TLB) and accommodating the
request/response of fetches to/from memory.
422
423Fetch requests from Fetch1 are pushed into the requests queue as newly
424allocated FetchRequest objects once they have been sent to the ITLB with a
425call to itb->translateTiming.
426
A response from the TLB moves the request from the requests queue to the
transfers queue.  If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of the
requests queue.  In that case, the TLB response is marked up as a state change
to Translated in the request object, and advancing the request to transfers
(and the memory system) is left to calls to Fetch1::stepQueues, which is
called in the cycle following the receipt of any event.
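
The interaction between out-of-order TLB responses and in-order queue
advancement can be sketched like this.  The types are hypothetical, and moving
at most one request per step is an assumption of the sketch rather than a
statement about Fetch1::stepQueues.

```cpp
#include <cassert>
#include <deque>

enum class FetchStateSketch { InTranslation, Translated, RequestIssuing };

// Hypothetical stand-in for a fetch request.
struct FetchRequestSketch
{
    int id;
    FetchStateSketch state;
};

struct FetchQueuesSketch
{
    std::deque<FetchRequestSketch> requests;
    std::deque<FetchRequestSketch> transfers;

    // A new request enters 'requests' once handed to the TLB.
    void sendToTlb(int id)
    {
        requests.push_back(
            FetchRequestSketch{id, FetchStateSketch::InTranslation});
    }

    // A TLB response only marks the request; it does not reorder queues.
    void tlbResponse(int id)
    {
        bool found = false;
        for (FetchRequestSketch &request : requests) {
            if (request.id == id) {
                request.state = FetchStateSketch::Translated;
                found = true;
            }
        }
        assert(found);
        (void) found;
    }

    // Only a translated request at the head of 'requests' may move on.
    void stepQueues()
    {
        if (requests.empty() ||
            requests.front().state != FetchStateSketch::Translated)
        {
            return;
        }
        FetchRequestSketch head = requests.front();
        requests.pop_front();
        head.state = FetchStateSketch::RequestIssuing;
        transfers.push_back(head);
    }
};
```

A response for the second request therefore waits in place until the first
request has been translated and moved, which keeps fetch order intact.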
434
435Fetch1::tryToSendToTransfers is responsible for moving requests between the
436two queues and issuing requests to memory.  Failed TLB lookups (prefetch
437aborts) continue to occupy space in the queues until they are recovered at the
438head of transfers.
439
440Responses from memory change the request object state to Complete and
441Fetch1::evaluate can pick up response data, package it in the ForwardLineData
442object, and forward it to Fetch2%'s input buffer.
443
444As space is always reserved in Fetch2::inputBuffer, setting the input buffer's
445size to 1 results in non-prefetching behaviour.
446
447When a change of stream occurs, translated requests queue members and
448completed transfers queue members can be unconditionally discarded to make way
449for new transfers.
450
451\subsection fetch2 Fetch2 stage
452
453Fetch2 receives a line from Fetch1 into its input buffer.  The data in the
454head line in that buffer is iterated over and separated into individual
455instructions which are packed into a vector of instructions which can be
456passed to Decode.  Packing instructions can be aborted early if a fault is
457found in either the input line as a whole or a decomposed instruction.
458
459\subsubsection bp Branch prediction
460
461Fetch2 contains the branch prediction mechanism.  This is a wrapper around the
462branch predictor interface provided by gem5 (cpu/pred/...).
463
464Branches are predicted for any control instructions found.  If prediction is
465attempted for an instruction, the MinorDynInst::triedToPredict flag is set on
466that instruction.
467
When a branch is predicted to be taken, the MinorDynInst::predictedTaken flag
is set and MinorDynInst::predictedTarget is set to the predicted target PC
value.  The predicted branch instruction is then packed into Fetch2%'s output
vector, the prediction sequence number is incremented, and the branch is
communicated to Fetch1.
473
474After signalling a prediction, Fetch2 will discard its input buffer contents
475and will reject any new lines which have the same stream sequence number as
476that branch but have a different prediction sequence number.  This allows
477following sequentially fetched lines to be rejected without ignoring new lines
478generated by a change of stream indicated from a 'real' branch from Execute
479(which will have a new stream sequence number).
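
The rejection rule just described reduces to a comparison of two sequence
numbers.  The following sketch uses hypothetical names (LineIdSketch,
Fetch2FilterSketch) and keeps only the state needed for that comparison.

```cpp
#include <cassert>

// Hypothetical subset of a line's identity.
struct LineIdSketch
{
    unsigned long streamSeqNum;
    unsigned long predictionSeqNum;
};

class Fetch2FilterSketch
{
    unsigned long expectedStreamSeqNum;
    unsigned long predictionSeqNum;

  public:
    Fetch2FilterSketch(unsigned long stream, unsigned long prediction)
        : expectedStreamSeqNum(stream), predictionSeqNum(prediction)
    {}

    // A taken prediction starts a new prediction sequence number, which
    // would also be sent to Fetch1 to tag lines from the new address.
    unsigned long predictTaken() { return ++predictionSeqNum; }

    // A 'real' branch from Execute starts a new stream.
    void changeStream(unsigned long newStream)
    {
        assert(newStream != expectedStreamSeqNum);
        expectedStreamSeqNum = newStream;
    }

    // Reject lines of the same stream fetched under an older prediction.
    // Lines of a different stream are not rejected by this rule.
    bool shouldDiscard(const LineIdSketch &line) const
    {
        return line.streamSeqNum == expectedStreamSeqNum &&
               line.predictionSeqNum != predictionSeqNum;
    }
};
```

Keeping the stream test in the rule is what stops a prediction from masking
lines that belong to a newer stream started by Execute.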
480
481The program counter value provided to Fetch2 by Fetch1 packets is only updated
482when there is a change of stream.  Fetch2::havePC indicates whether the PC
483will be picked up from the next processed input line.  Fetch2::havePC is
484necessary to allow line-wrapping instructions to be tracked through decode.
485
486Branches (and instructions predicted to branch) which are processed by Execute
487will generate BranchData (pipe_data.hh) data explaining the outcome of the
488branch which is sent forwards to Fetch1 and Fetch2.  Fetch1 uses this data to
489change stream (and update its stream sequence number and address for new
490lines).  Fetch2 uses it to update the branch predictor.   Minor does not
491communicate branch data to the branch predictor for instructions which are
492discarded on the way to commit.
493
494BranchData::BranchReason (pipe_data.hh) encodes the possible branch scenarios:
495
496<table>
497<tr>
498    <td>Branch enum val.</td>
499    <td>In Execute</td>
500    <td>Fetch1 reaction</td>
501    <td>Fetch2 reaction</td>
502</tr>
503<tr>
504    <td>NoBranch</td>
505    <td>(output bubble data)</td>
506    <td>-</td>
507    <td>-</td>
508</tr>
509<tr>
510    <td>CorrectlyPredictedBranch</td>
511    <td>Predicted, taken</td>
512    <td>-</td>
513    <td>Update BP as taken branch</td>
514</tr>
515<tr>
516    <td>UnpredictedBranch</td>
    <td>Not predicted to be taken, but was taken</td>
518    <td>New stream</td>
519    <td>Update BP as taken branch</td>
520</tr>
521<tr>
522    <td>BadlyPredictedBranch</td>
523    <td>Predicted, not taken</td>
524    <td>New stream to restore to old inst. source</td>
525    <td>Update BP as not taken branch</td>
526</tr>
527<tr>
528    <td>BadlyPredictedBranchTarget</td>
529    <td>Predicted, taken, but to a different target than predicted one</td>
530    <td>New stream</td>
531    <td>Update BTB to new target</td>
532</tr>
533<tr>
534    <td>SuspendThread</td>
535    <td>Hint to suspend fetching</td>
536    <td>Suspend fetch for this thread (branch to next inst. as wakeup
537        fetch addr)</td>
538    <td>-</td>
539</tr>
540<tr>
541    <td>Interrupt</td>
542    <td>Interrupt detected</td>
543    <td>New stream</td>
544    <td>-</td>
545</tr>
546</table>
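
The table can be read as two small lookup functions, one per receiving stage.
The sketch below transcribes it directly; the enum and function names are
hypothetical and only the reason names come from the table.

```cpp
#include <cassert>

enum class BranchReasonSketch
{
    NoBranch,
    CorrectlyPredictedBranch,
    UnpredictedBranch,
    BadlyPredictedBranch,
    BadlyPredictedBranchTarget,
    SuspendThread,
    Interrupt
};

enum class Fetch1ReactionSketch { None, NewStream, SuspendFetch };
enum class PredictorUpdateSketch { None, Taken, NotTaken, NewTarget };

// 'Fetch1 reaction' column of the table.
Fetch1ReactionSketch fetch1Reaction(BranchReasonSketch reason)
{
    switch (reason) {
      case BranchReasonSketch::NoBranch:
      case BranchReasonSketch::CorrectlyPredictedBranch:
        return Fetch1ReactionSketch::None;
      case BranchReasonSketch::UnpredictedBranch:
      case BranchReasonSketch::BadlyPredictedBranch:
      case BranchReasonSketch::BadlyPredictedBranchTarget:
      case BranchReasonSketch::Interrupt:
        return Fetch1ReactionSketch::NewStream;
      case BranchReasonSketch::SuspendThread:
        return Fetch1ReactionSketch::SuspendFetch;
    }
    assert(false && "unhandled branch reason");
    return Fetch1ReactionSketch::None;
}

// 'Fetch2 reaction' column of the table.
PredictorUpdateSketch fetch2Reaction(BranchReasonSketch reason)
{
    switch (reason) {
      case BranchReasonSketch::CorrectlyPredictedBranch:
      case BranchReasonSketch::UnpredictedBranch:
        return PredictorUpdateSketch::Taken;
      case BranchReasonSketch::BadlyPredictedBranch:
        return PredictorUpdateSketch::NotTaken;
      case BranchReasonSketch::BadlyPredictedBranchTarget:
        return PredictorUpdateSketch::NewTarget;
      case BranchReasonSketch::NoBranch:
      case BranchReasonSketch::SuspendThread:
      case BranchReasonSketch::Interrupt:
        return PredictorUpdateSketch::None;
    }
    assert(false && "unhandled branch reason");
    return PredictorUpdateSketch::None;
}
```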
547
The parameter decodeInputWidth sets the number of instructions which can be
packed into the output per cycle.  If the parameter fetch2CycleInput is true,
Fetch2 can try to take lines from more than one entry in its input buffer per
cycle.
552
553\subsection decode Decode stage
554
555Decode takes a vector of instructions from Fetch2 (via its input buffer) and
556decomposes those instructions into micro-ops (if necessary) and packs them
557into its output instruction vector.
558
559The parameter executeInputWidth sets the number of instructions which can be
560packed into the output per cycle.  If the parameter decodeCycleInput is true,
561Decode can try to take instructions from more than one entry in its input
562buffer per cycle.
563
564\subsection execute Execute stage
565
Execute provides all the instruction execution and memory access mechanisms.
An instruction's passage through Execute can take multiple cycles with its
precise timing modelled by a functional unit pipeline FIFO.
569
A vector of instructions (possibly including fault 'instructions') is provided
to Execute by Decode and can be queued in the Execute input buffer before
being issued.  Setting the parameter executeCycleInput allows Execute to
examine more than one input buffer entry (more than one instruction vector).
The number of instructions in the input vector can be set with
executeInputWidth and the depth of the input buffer can be set with parameter
executeInputBufferSize.
577
578\subsubsection fus Functional units
579
The Execute stage contains pipelines for each functional unit comprising the
computational core of the CPU.  Functional units are configured via the
executeFuncUnits parameter.  Each functional unit has a number of instruction
classes it supports, a stated delay between instruction issues, a delay from
instruction issue to (possible) commit, and an optional timing annotation
capable of more complicated timing.
586
587Each active cycle, Execute::evaluate performs this action:
588
589\verbatim
590    Execute::evaluate:
591        push input to inputBuffer
592        setup references to input/output data slots and branch output slot
593
594        step D-cache interface queues (similar to Fetch1)
595
596        if interrupt posted:
597            take interrupt (signalling branch to Fetch1/Fetch2)
598        else
599            commit instructions
600            issue new instructions
601
602        advance functional unit pipelines
603
604        reactivate Execute if the unit is still active
605
606        commit the push to the inputBuffer if that data hasn't all been used
607\endverbatim
608
609\subsubsection fifos Functional unit FIFOs
610
611Functional units are implemented as SelfStallingPipelines (stage.hh).  These
612are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires.  They respond
613to SelfStallingPipeline::advance in the same way as TimeBuffers <b>unless</b>
614there is data at the far, 'pop', end of the FIFO.  A 'stalled' flag is
615provided for signalling stalling and to allow a stall to be cleared.  The
616intention is to provide a pipeline for each functional unit which will never
617advance an instruction out of that pipeline until it has been processed and
618the pipeline is explicitly unstalled.
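
The stalling behaviour can be sketched with a small shift register.  This is
not the SelfStallingPipeline template from stage.hh: the class name is
hypothetical, instructions are reduced to non-zero integers with 0 as the
bubble, and the consumer is assumed to clear the stall when it pops.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

class SelfStallingPipelineSketch
{
    // front() is the 'pop' end, back() is the 'push' end; 0 is a bubble.
    std::deque<int> slots;

  public:
    bool stalled = false;

    // depth + 1 slots keep the push and pop ends distinct.
    explicit SelfStallingPipelineSketch(std::size_t depth)
        : slots(depth + 1, 0)
    {}

    void push(int inst)
    {
        assert(inst != 0 && slots.back() == 0);
        slots.back() = inst;
    }

    int front() const { return slots.front(); }

    // The consumer has processed the head: remove it and allow movement.
    void popAndUnstall()
    {
        slots.front() = 0;
        stalled = false;
    }

    // Shift towards the pop end unless stalled; stall as soon as real
    // data reaches the pop end so that it cannot be advanced away.
    void advance()
    {
        if (stalled)
            return;
        assert(slots.front() == 0);
        slots.pop_front();
        slots.push_back(0);
        stalled = slots.front() != 0;
    }
};
```

An instruction that reaches the pop end therefore stays there, however many
times advance is called, until the consumer explicitly removes it.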
619
620The actions 'issue', 'commit', and 'advance' act on the functional units.
621
622\subsubsection issue Issue
623
624Issuing instructions involves iterating over both the input buffer
625instructions and the heads of the functional units to try and issue
626instructions in order.  The number of instructions which can be issued each
627cycle is limited by the parameter executeIssueLimit, how executeCycleInput is
628set, the availability of pipeline space and the policy used to choose a
629pipeline in which the instruction can be issued.
630
At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence.  For greater flexibility,
better (and more specific) policies will need to be possible.
634
Memory operation instructions traverse their functional units to perform their
EA calculations.  On 'commit', the ExecContext::initiateAcc execution phase is
performed and any memory access is issued (via ExecContext::{read,write}Mem
calling LSQ::pushRequest) to the LSQ.
639
640Note that faults are issued as if they are instructions and can (currently) be
641issued to *any* functional unit.
642
Every issued instruction is also pushed into the Execute::inFlightInsts queue.
Memory ref. instructions are also pushed into the Execute::inFUMemInsts queue.
645
646\subsubsection commit Commit
647
648Instructions are committed by examining the head of the Execute::inFlightInsts
649queue (which is decorated with the functional unit number to which the
650instruction was issued).  Instructions which can then be found in their
651functional units are executed and popped from Execute::inFlightInsts.
652
653Memory operation instructions are committed into the memory queues (as
654described above) and exit their functional unit pipeline but are not popped
655from the Execute::inFlightInsts queue.  The Execute::inFUMemInsts queue
656provides ordering to memory operations as they pass through the functional
657units (maintaining issue order).  On entering the LSQ, instructions are popped
658from Execute::inFUMemInsts.
659
660If the parameter executeAllowEarlyMemoryIssue is set, memory operations can be
661sent from their FU to the LSQ before reaching the head of
662Execute::inFlightInsts but after their dependencies are met.
663MinorDynInst::instToWaitFor is marked up with the latest dependent instruction
664execSeqNum required to be committed for a memory operation to progress to the
665LSQ.
666
667Once a memory response is available (by testing the head of
668Execute::inFlightInsts against LSQ::findResponse), commit will process that
669response (ExecContext::completeAcc) and pop the instruction from
670Execute::inFlightInsts.
671
672Any branch, fault or interrupt will cause a stream sequence number change and
673signal a branch to Fetch1/Fetch2.  Only instructions with the current stream
674sequence number will be issued and/or committed.
675
676\subsubsection advance Advance
677
All non-stalled pipelines are advanced and may, thereafter, become stalled.
Potential activity in the next cycle is signalled if there are any
instructions remaining in any pipeline.
681
682\subsubsection sb Scoreboard
683
The scoreboard (Scoreboard) is used to control instruction issue.  It contains
a count of the number of in flight instructions which will write each general
purpose CPU integer or float register.  An instruction will only be issued
when the scoreboard contains a count of 0 for every one of the instruction's
source registers.
689
690Once an instruction is issued, the scoreboard counts for each destination
691register for an instruction will be incremented.
692
693The estimated delivery time of the instruction's result is marked up in the
694scoreboard by adding the length of the issued-to FU to the current time.  The
695timings parameter on each FU provides a list of additional rules for
696calculating the delivery time.  These are documented in the parameter comments
697in MinorCPU.py.
698
On commit (for memory operations, on memory response commit), the scoreboard
counters for an instruction's destination registers are decremented.
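
The counting part of the scoreboard can be sketched as below.  The names are
hypothetical, registers are reduced to flat indices, and the result delivery
time tracking described above is deliberately left out.  The sketch decrements
the counts of destination registers on commit, mirroring the increments made
at issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of an instruction: just its register indices.
struct InstRegsSketch
{
    std::vector<std::size_t> srcRegs;
    std::vector<std::size_t> destRegs;
};

class ScoreboardSketch
{
    // Number of in-flight instructions that will write each register.
    std::vector<unsigned> numWriters;

  public:
    explicit ScoreboardSketch(std::size_t numRegs) : numWriters(numRegs, 0)
    {}

    // Issue is allowed only if no source register has a pending writer.
    bool canIssue(const InstRegsSketch &inst) const
    {
        for (std::size_t reg : inst.srcRegs) {
            if (numWriters.at(reg) != 0)
                return false;
        }
        return true;
    }

    // On issue: one more pending writer for each destination register.
    void markIssued(const InstRegsSketch &inst)
    {
        for (std::size_t reg : inst.destRegs)
            ++numWriters.at(reg);
    }

    // On commit: the pending writes of this instruction are done.
    void clearOnCommit(const InstRegsSketch &inst)
    {
        for (std::size_t reg : inst.destRegs) {
            assert(numWriters.at(reg) > 0);
            --numWriters.at(reg);
        }
    }
};
```

A consumer of register 1 is held back while a producer of register 1 is in
flight and becomes issuable once that producer commits.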
702
703\subsubsection ifi Execute::inFlightInsts
704
705The Execute::inFlightInsts queue will always contain all instructions in
706flight in Execute in the correct issue order.  Execute::issue is the only
707process which will push an instruction into the queue.  Execute::commit is the
708only process that can pop an instruction.
709
710\subsubsection lsq LSQ
711
712The LSQ can support multiple outstanding transactions to memory in a number of
713conservative cases.
714
There are three queues to contain requests: requests, transfers and the store
buffer.  The requests and transfers queues operate in a similar manner to the
queues in Fetch1.  The store buffer is used to decouple the delay of
completing store operations from following loads.
719
Requests are issued to the DTLB as their instructions leave their functional
unit.  At the head of requests, cacheable load requests can be sent to memory
and on to the transfers queue.  Cacheable stores will be passed to transfers
unprocessed and progress through that queue maintaining order with other
transactions.
724
725The conditions in LSQ::tryToSendToTransfers dictate when requests can
726be sent to memory.
727
All uncacheable transactions, split transactions and locked transactions are
processed in order at the head of requests.  Additionally, store results
residing in the store buffer can have their data forwarded to cacheable loads
(removing the need to perform a read from memory) but no cacheable load can be
issued to the transfers queue until that queue's stores have drained into the
store buffer.
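
Store-to-load forwarding of the kind mentioned above can be sketched as
follows.  This is an illustration only: the function names coverage and
forward, the (address, data) tuple layout and the decision to give up on a
partial overlap are all assumptions made for the example, not a description
of the LSQ code.

```python
def coverage(store_addr, store_size, load_addr, load_size):
    """Classify how a buffered store's byte range covers a load's range."""
    store_end = store_addr + store_size
    load_end = load_addr + load_size
    if store_addr <= load_addr and load_end <= store_end:
        return "full"       # every load byte lies inside the store
    if load_addr < store_end and store_addr < load_end:
        return "partial"    # the ranges overlap but the store is too small
    return "none"           # disjoint ranges


def forward(store_buffer, load_addr, load_size):
    """Return forwarded bytes, or None if nothing can be forwarded.

    store_buffer is a list of (address, data) tuples, oldest first
    (an assumed layout for this sketch).
    """
    # Search the youngest store first so the most recent data wins.
    for store_addr, data in reversed(store_buffer):
        kind = coverage(store_addr, len(data), load_addr, load_size)
        if kind == "full":
            offset = load_addr - store_addr
            return data[offset:offset + load_size]
        if kind == "partial":
            return None     # assumed: this sketch does not merge data
    return None


stores = [(0x100, b"\x01\x02\x03\x04"), (0x100, b"\xaa\xbb\xcc\xdd")]
hit = forward(stores, 0x102, 2)     # served by the younger store
miss = forward(stores, 0x200, 4)    # no overlapping store
```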
734
At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute and committed
(ExecContext::completeAcc) and, for stores, sent to the store buffer.
739
740Barrier instructions do not prevent cacheable loads from progressing to memory
741but do cause a stream change which will discard that load.  Stores will not be
742committed to the store buffer if they are in the shadow of the barrier but
743before the new instruction stream has arrived at Execute.  As all other memory
744transactions are delayed at the end of the requests queue until they are at
745the head of Execute::inFlightInsts, they will be discarded by any barrier
746stream change.
747
748After commit, LSQ::BarrierDataRequest requests are inserted into the
749store buffer to track each barrier until all preceding memory transactions
750have drained from the store buffer.  No further memory transactions will be
751issued from the ends of FUs until after the barrier has drained.
752
753\subsubsection drain Draining
754
755Draining is mostly handled by the Execute stage.  When initiated by calling
756MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
757each cycle and keeps the pipeline active until draining is complete.  It is
758Pipeline that signals the completion of draining.  Execute is triggered by
759MinorCPU::drain and starts stepping through its Execute::DrainState state
760machine, starting from state Execute::NotDraining, in this order:
761
762<table>
763<tr>
764    <td><b>State</b></td>
765    <td><b>Meaning</b></td>
766</tr>
767<tr>
768    <td>Execute::NotDraining</td>
769    <td>Not trying to drain, normal execution</td>
770</tr>
771<tr>
772    <td>Execute::DrainCurrentInst</td>
773    <td>Draining micro-ops to complete inst.</td>
774</tr>
775<tr>
776    <td>Execute::DrainHaltFetch</td>
777    <td>Halt fetching instructions</td>
778</tr>
779<tr>
780    <td>Execute::DrainAllInsts</td>
781    <td>Discarding all instructions presented</td>
782</tr>
783</table>
784
785When complete, a drained Execute unit will be in the Execute::DrainAllInsts
786state where it will continue to discard instructions but has no knowledge of
787the drained state of the rest of the model.
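
The progression through the states in the table can be sketched as below.
The state names are taken from Execute::DrainState, but the numeric values,
the DRAIN_ORDER list and the next_state helper are hypothetical, and any
conditions governing when each transition happens are not modelled.

```python
from enum import Enum


class DrainState(Enum):
    """States named after Execute::DrainState; the values are arbitrary."""
    NotDraining = 0
    DrainCurrentInst = 1
    DrainHaltFetch = 2
    DrainAllInsts = 3


# Order in which the states are stepped through, as listed in the table.
DRAIN_ORDER = list(DrainState)


def next_state(state):
    """Advance one step; DrainAllInsts is terminal in this sketch."""
    index = DRAIN_ORDER.index(state)
    return DRAIN_ORDER[min(index + 1, len(DRAIN_ORDER) - 1)]


state = DrainState.NotDraining
visited = [state]
while state is not DrainState.DrainAllInsts:
    state = next_state(state)
    visited.append(state)
```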
788
789\section debug Debug options
790
791The model provides a number of debug flags which can be passed to gem5 with
792the --debug-flags option.
793
794The available flags are:
795
796<table>
797<tr>
798    <td><b>Debug flag</b></td>
799    <td><b>Unit which will generate debugging output</b></td>
800</tr>
801<tr>
802    <td>Activity</td>
803    <td>Debug ActivityMonitor actions</td>
804</tr>
805<tr>
806    <td>Branch</td>
807    <td>Fetch2 and Execute branch prediction decisions</td>
808</tr>
809<tr>
810    <td>MinorCPU</td>
811    <td>CPU global actions such as wakeup/thread suspension</td>
812</tr>
813<tr>
814    <td>Decode</td>
815    <td>Decode</td>
816</tr>
817<tr>
    <td>MinorExecute</td>
819    <td>Execute behaviour</td>
820</tr>
821<tr>
822    <td>Fetch</td>
823    <td>Fetch1 and Fetch2</td>
824</tr>
825<tr>
826    <td>MinorInterrupt</td>
827    <td>Execute interrupt handling</td>
828</tr>
829<tr>
830    <td>MinorMem</td>
831    <td>Execute memory interactions</td>
832</tr>
833<tr>
834    <td>MinorScoreboard</td>
835    <td>Execute scoreboard activity</td>
836</tr>
837<tr>
838    <td>MinorTrace</td>
839    <td>Generate MinorTrace cyclic state trace output (see below)</td>
840</tr>
841<tr>
842    <td>MinorTiming</td>
843    <td>MinorTiming instruction timing modification operations</td>
844</tr>
845</table>
846
847The group flag Minor enables all of the flags beginning with Minor.
848
849\section trace MinorTrace and minorview.py
850
851The debug flag MinorTrace causes cycle-by-cycle state data to be printed which
852can then be processed and viewed by the minorview.py tool.  This output is
853very verbose and so it is recommended it only be used for small examples.
854
855\subsection traceformat MinorTrace format
856
MinorTrace outputs three types of line:
858
859\subsubsection state MinorTrace - Ticked unit cycle state
860
861For example:
862
863\verbatim
864 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
865\endverbatim
866
867For each time step, the MinorTrace flag will cause one MinorTrace line to be
868printed for every named element in the model.
869
870\subsubsection traceunit MinorInst - summaries of instructions issued by \
871    Decode
872
873For example:
874
875\verbatim
876 140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \
877                             inst="  mov r0, #0" class=IntAlu
878\endverbatim
879
880MinorInst lines are currently only generated for instructions which are
881committed.
882
883\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by \
884    Fetch1
885
886For example:
887
888\verbatim
889  92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \
890                                vaddr=0x5c paddr=0x5c
891\endverbatim
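
The three line types share a "tick: unit: type: key=value ..." shape, so a
small parser can be sketched as follows.  This is not the parser used by
minorview.py.  The names LINE_RE, PAIR_RE and parse_line are hypothetical,
and the sketch assumes that each record occupies a single line (the
backslash continuations in the examples above are treated as wrapping for
this document).

```python
import re

# tick: unit: type: key=value pairs, as in the examples above.
LINE_RE = re.compile(
    r"^\s*(?P<tick>\d+): (?P<unit>[\w.]+): "
    r"(?P<kind>MinorTrace|MinorInst|MinorLine): (?P<rest>.*)$")

# A value is either a double-quoted string or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Return (tick, unit, kind, fields), or None for other debug output."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = {key: value.strip('"')
              for key, value in PAIR_RE.findall(match.group("rest"))}
    return (int(match.group("tick")), match.group("unit"),
            match.group("kind"), fields)


state_line = (" 110000: system.cpu.dcachePort: MinorTrace: "
              "state=MemoryRunning in_tlb_mem=0/0")
inst_line = (' 140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 '
             'addr=0x5c inst="  mov r0, #0" class=IntAlu')
state_record = parse_line(state_line)
inst_record = parse_line(inst_line)
```

Lines that do not match any of the three types return None, so unrelated
debug output can be skipped or collected separately.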
892
893\subsection minorview minorview.py
894
895Minorview (util/minorview.py) can be used to visualise the data created by
896MinorTrace.
897
898\verbatim
899usage: minorview.py [-h] [--picture picture-file] [--prefix name]
900                   [--start-time time] [--end-time time] [--mini-views]
901                   event-file
902
903Minor visualiser
904
905positional arguments:
906  event-file
907
908optional arguments:
909  -h, --help            show this help message and exit
910  --picture picture-file
911                        markup file containing blob information (default:
912                        <minorview-path>/minor.pic)
913  --prefix name         name prefix in trace for CPU to be visualised
914                        (default: system.cpu)
915  --start-time time     time of first event to load from file
916  --end-time time       time of last event to load from file
917  --mini-views          show tiny views of the next 10 time steps
918\endverbatim
919
Raw debugging output can be passed to minorview.py as the event-file.  It will
pick out the MinorTrace lines.  Other lines in which units in the simulation
are named (such as system.cpu.dcachePort in the above example) will appear as
'comments' when those units are clicked on in the visualiser.
924
925Clicking on a unit which contains instructions or lines will bring up a speech
926bubble giving extra information derived from the MinorInst/MinorLine lines.
927
928--start-time and --end-time allow only sections of debug files to be loaded.
929
930--prefix allows the name prefix of the CPU to be inspected to be supplied.
931This defaults to 'system.cpu'.
932
In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.
935
The diagonally striped coloured blocks show the InstId of the instruction or
line they represent.  Note that lines in Fetch1 and f1ToF2.F only show the id
fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers.  The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display.  Useful combinations are:
942
943<table>
944<tr>
945    <td><b>Combination</b></td>
946    <td><b>Reason</b></td>
947</tr>
948<tr>
949    <td>E</td>
950    <td>just show the final execute sequence number</td>
951</tr>
952<tr>
953    <td>F/E</td>
954    <td>show the instruction-related numbers</td>
955</tr>
956<tr>
957    <td>S/P</td>
958    <td>show just the stream-related numbers (watch the stream sequence
959        change with branches and not change with predicted branches)</td>
960</tr>
961<tr>
962    <td>S/E</td>
963    <td>show instructions and their stream</td>
964</tr>
965</table>
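
To make the T/S.P/L/F.E layout concrete, the sketch below splits an id string
such as the one in the MinorInst example into its lettered fields.  The
function parse_inst_id is hypothetical and assumes every field present is a
decimal number.  It is not taken from minorview.py.

```python
def parse_inst_id(text):
    """Split an id such as '0/1.1/1/1.1' into its T/S.P/L/F.E fields.

    Line ids such as '0/1.1/1' have no F.E part, so those keys are omitted.
    """
    parts = text.split("/")
    fields = {"T": int(parts[0])}
    stream, prediction = parts[1].split(".")
    fields["S"] = int(stream)
    fields["P"] = int(prediction)
    if len(parts) > 2:
        fields["L"] = int(parts[2])
    if len(parts) > 3:
        fetch, execute = parts[3].split(".")
        fields["F"] = int(fetch)
        fields["E"] = int(execute)
    return fields


inst_fields = parse_inst_id("0/1.1/1/1.1")   # id from the MinorInst example
line_fields = parse_inst_id("0/1.1/1")       # id from the MinorLine example
```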
966
967The key to the right shows all the displayable colours (some of the colour
968choices are quite bad!):
969
970<table>
971<tr>
972    <td><b>Symbol</b></td>
973    <td><b>Meaning</b></td>
974</tr>
975<tr>
976    <td>U</td>
977    <td>Unknown data</td>
978</tr>
979<tr>
980    <td>B</td>
981    <td>Blocked stage</td>
982</tr>
983<tr>
984    <td>-</td>
985    <td>Bubble</td>
986</tr>
987<tr>
988    <td>E</td>
989    <td>Empty queue slot</td>
990</tr>
991<tr>
992    <td>R</td>
993    <td>Reserved queue slot</td>
994</tr>
995<tr>
996    <td>F</td>
997    <td>Fault</td>
998</tr>
999<tr>
1000    <td>r</td>
1001    <td>Read (used as the leftmost stripe on data in the dcachePort)</td>
1002</tr>
1003<tr>
1004    <td>w</td>
    <td>Write (used as the leftmost stripe on data in the dcachePort)</td>
1006</tr>
1007<tr>
1008    <td>0 to 9</td>
1009    <td>last decimal digit of the corresponding data</td>
1010</tr>
1011</table>
1012
1013\verbatim
1014
1015    ,---------------.         .--------------.  *U
1016    | |=|->|=|->|=| |         ||=|||->||->|| |  *-  <- Fetch queues/LSQ
1017    `---------------'         `--------------'  *R
1018    === ======                                  *w  <- Activity/Stage activity
1019                              ,--------------.  *1
1020    ,--.      ,.      ,.      | ============ |  *3  <- Scoreboard
1021    |  |-\[]-\||-\[]-\||-\[]-\| ============ |  *5  <- Execute::inFlightInsts
1022    |  | :[] :||-/[]-/||-/[]-/| -. --------  |  *7
1023    |  |-/[]-/||  ^   ||      |  | --------- |  *9
1024    |  |      ||  |   ||      |  | ------    |
1025[]->|  |    ->||  |   ||      |  | ----      |
1026    |  |<-[]<-||<-+-<-||<-[]<-|  | ------    |->[] <- Execute to Fetch1,
1027    '--`      `'  ^   `'      | -' ------    |        Fetch2 branch data
1028             ---. |  ---.     `--------------'
1029             ---' |  ---'       ^       ^
1030                  |   ^         |       `------------ Execute
1031  MinorBuffer ----' input       `-------------------- Execute input buffer
1032                    buffer
1033\endverbatim
1034
1035Stages show the colours of the instructions currently being
1036generated/processed.
1037
1038Forward FIFOs between stages show the data being pushed into them at the
1039current tick (to the left), the data in transit, and the data available at
1040their outputs (to the right).
1041
1042The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.
1043
In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked.  Each FIFO
has, therefore, an extra slot to show the asserted new input data in addition
to all the data currently within the FIFO.
1048
1049Input buffers for each stage are shown below the corresponding stage and show
1050the contents of those buffers as horizontal strips.  Strips marked as reserved
1051(cyan by default) are reserved to be filled by the previous stage.  An input
1052buffer with all reserved or occupied slots will, therefore, block the previous
1053stage from generating output.
1054
Fetch queues and LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours at the top of their frames.
1058
1059Inside Execute, the horizontal bars represent the individual FU pipelines.
1060The vertical bar to the left is the input buffer and the bar to the right, the
1061instructions committed this cycle.  The background of Execute shows
1062instructions which are being committed this cycle in their original FU
1063pipeline positions.
1064
1065The strip at the top of the Execute block shows the current streamSeqNum that
1066Execute is committing.  A similar stripe at the top of Fetch1 shows that
1067stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
1068issuing predictionSeqNum.
1069
1070The scoreboard shows the number of instructions in flight which will commit a
1071result to the register in the position shown.  The scoreboard contains slots
1072for each integer and floating point register.
1073
1074The Execute::inFlightInsts queue shows all the instructions in flight in
1075Execute with the oldest instruction (the next instruction to be committed) to
1076the right.
1077
'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.
1082
1083\subsection picformat minor.pic format
1084
The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser.  Its format is described in the supplied
minor.pic file.
1088
1089*/
1090
1091}
1092