14198:9c2f67392409 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Make get(Data|Inst)Port return a Port and not a MasterPort.
No caller uses any of the MasterPort specific properties of these function's return values, so we can instead return a reference to the base Port class. This makes it possible for the data and inst ports to be of any port type, not just gem5 style MasterPorts. This makes life simpler for, for example, systemc based CPUs which might have TLM ports.
It also makes it possible for any two CPUs which have compatible ports to be switched between, as long as the ports they use support being unbound. Unfortunately that does not include TLM or systemc ports which are bound permanently.
Change-Id: I98fce5a16d2ef1af051238e929dd96d57a4ac838 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20240 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
14192:595a4358b844 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu, dev, mem: Use the new Port methods.
Use getPeer, takeOverFrom, and << to simplify the use of ports in some areas.
Change-Id: Idfbda27411b5d6b742f5e4927894302ea6d6a53d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20235 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
11633:40c951e58c2b |
15-Sep-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Support exit when any one Trace CPU completes replay
This change adds a Trace CPU param to exit simulation early, i.e. when the first (any one) trace execution is complete. With this change the user gets a choice to configure exit as either when the last CPU finishes (default) or first CPU finishes replay. Configuring an early exit enables simulating and measuring stats strictly when memory-system resources are being stressed by all Trace CPUs.
Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
11632:a96d6787b385 |
15-Sep-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Adjust for trace offset and fix stats
This change subtracts the time offset present in the trace from all the event times when nodes and request are sent so that the replay starts immediately when the simulation starts. This makes the stats accurate when the time offset in traces is large, for example when traces are generated in the middle of a workload execution. It also solves the problem of unnecessary DRAM refresh events that would keep occuring during the large time offset before even a single request is replayed into the system.
Change-Id: Ie0898842615def867ffd5c219948386d952af7f7 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
11249:0733a1c08600 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Add TraceCPU to playback elastic traces
This patch defines a TraceCPU that replays trace generated using the elastic trace probe attached to the O3 CPU model. The elastic trace is an execution trace with data dependencies and ordering dependencies annoted to it. It also replays fixed timestamp instruction fetch trace that is also generated by the elastic trace probe.
The TraceCPU inherits from BaseCPU as a result of which some methods need to be defined. It has two port subclasses inherited from MasterPort for instruction and data ports. It issues the memory requests deducing the timing from the trace and without performing real execution of micro-ops. As soon as the last dependency for an instruction is complete, its computational delay, also provided in the input trace is added. The dependency-free nodes are maintained in a list, called 'ReadyList', ordered by ready time. Instructions which depend on load stall until the responses for read requests are received thus achieving elastic replay. If the dependency is not found when adding a new node, it is assumed complete. Thus, if this node is found to be completely dependency-free its issue time is calculated and it is added to the ready list immediately. This is encapsulated in the subclass ElasticDataGen.
If ready nodes are issued in an unconstrained way there can be more nodes outstanding which results in divergence in timing compared to the O3CPU. Therefore, the Trace CPU also models hardware resources. A sub-class to model hardware resources is added which contains the maximum sizes of load buffer, store buffer and ROB. If resources are not available, the node is not issued. The 'depFreeQueue' structure holds nodes that are pending issue.
Modeling the ROB size in the Trace CPU as a resource limitation is arguably the most important parameter of all resources. The ROB occupancy is estimated using the newly added field 'robNum'. We need to use ROB number as sequence number is at times much higher due to squashing and trace replay is focused on correct path modeling.
A map called 'inFlightNodes' is added to track nodes that are not only in the readyList but also load nodes that are executed (and thus removed from readyList) but are not complete. ReadyList handles what and when to execute next node while the inFlightNodes is used for resource modelling. The oldest ROB number is updated when any node occupies the ROB or when an entry in the ROB is released. The ROB occupancy is equal to the difference in the ROB number of the newly dependency-free node and the oldest ROB number in flight.
If no node dependends on a non load/store node then there is no reason to track it in the dependency graph. We filter out such nodes but count them and add a weight field to the subsequent node that we do include in the trace. The weight field is used to model ROB occupancy during replay.
The depFreeQueue is chosen to be FIFO so that child nodes which are in program order get pushed into it in that order and thus issued in the in program order, like in the O3CPU. This is also why the dependents is made a sequential container, std::set to std::vector. We only check head of the depFreeQueue as nodes are issued in order and blocking on head models that better than looping the entire queue. An alternative choice would be to inspect top N pending nodes where N is the issue-width. This is left for future as the timing correlation looks good as it is.
At the start of an execution event, first we attempt to issue such pending nodes by checking if appropriate resources have become available. If yes, we compute the execute tick with respect to the time then. Then we proceed to complete nodes from the readyList.
When a read response is received, sometimes a dependency on it that was supposed to be released when it was issued is still not released. This occurs because the dependent gets added to the graph after the read was sent. So the check is made less strict and the dependency is marked complete on read response instead of insisting that it should have been removed on read sent.
There is a check for requests spanning two cache lines as this condition triggers an assert fail in the L1 cache. If it does then truncate the size to access only until the end of that line and ignore the remainder. Strictly-ordered requests are skipped and the dependencies on such requests are handled by simply marking them complete immediately.
The simulated seconds can be calculated as the difference between the final_tick stat and the tickOffset stat. A CountedExitEvent that contains a static int belonging to the Trace CPU class as a down counter is used to implement multi Trace CPU simulation exit. |
2667:fe64b8353b1c |
09-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Move main control from C++ into Python. User script now invokes initialization and simulation loop after building configuration. These functions are exported from C++ to Python using SWIG.
SConstruct: Set up SWIG builder & scanner. Set up symlinking of source files into build directory (by not disabling the default behavior). configs/test/test.py: Rewrite to use new script-driven interface. Include a sample option. src/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). Add SWIG-generated main_wrap.cc to source list. src/arch/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/trace/opt_cpu.cc: src/cpu/trace/trace_cpu.cc: src/sim/pseudo_inst.cc: src/sim/root.cc: src/sim/serialize.cc: src/sim/syscall_emul.cc: SimExit() is now exitSimLoop(). src/cpu/base.cc: SimExitEvent is now SimLoopExitEvent src/python/SConscript: Add SWIG build command for main.i. Use python/m5 in build dir as source for zip archive... easy now with file duplication enabled. src/python/m5/__init__.py: - Move copyright notice back to C++ so we can print it right away, even for interactive sessions. - Get rid of argument parsing code; just provide default option descriptors for user script to call optparse with. - Don't clutter m5 namespace by sucking in all of m5.config and m5.objects. - Move instantiate() function here from config.py. src/python/m5/config.py: - Move instantiate() function to __init__.py. - Param.Foo deferred type lookups must use m5.objects namespace now (not m5). src/python/m5/objects/AlphaConsole.py: src/python/m5/objects/AlphaFullCPU.py: src/python/m5/objects/AlphaTLB.py: src/python/m5/objects/BadDevice.py: src/python/m5/objects/BaseCPU.py: src/python/m5/objects/BaseCache.py: src/python/m5/objects/Bridge.py: src/python/m5/objects/Bus.py: src/python/m5/objects/CoherenceProtocol.py: src/python/m5/objects/Device.py: src/python/m5/objects/DiskImage.py: src/python/m5/objects/Ethernet.py: src/python/m5/objects/Ide.py: src/python/m5/objects/IntrControl.py: src/python/m5/objects/MemObject.py: src/python/m5/objects/MemTest.py: src/python/m5/objects/Pci.py: src/python/m5/objects/PhysicalMemory.py: src/python/m5/objects/Platform.py: src/python/m5/objects/Process.py: src/python/m5/objects/Repl.py: src/python/m5/objects/Root.py: src/python/m5/objects/SimConsole.py: src/python/m5/objects/SimpleDisk.py: src/python/m5/objects/System.py: src/python/m5/objects/Tsunami.py: src/python/m5/objects/Uart.py: Fix up imports (m5 namespace no longer includes m5.config). src/sim/eventq.cc: src/sim/eventq.hh: Support for Python-called simulate() function: - Use IsExitEvent flag to signal events that want to exit the simulation loop gracefully (instead of calling exit() to terminate the process). - Modify interface to hand exit event object back to caller so it can be inspected for cause. src/sim/host.hh: Add MaxTick constant. src/sim/main.cc: Move copyright notice back to C++ so we can print it right away, even for interactive sessions. Use PYTHONPATH environment var to set module path (instead of clunky code injection method). Move main control from here into Python: - Separate initialization code and simulation loop into separate functions callable from Python. - Make Python interpreter invocation more pure (more like directly invoking interpreter). Add -i and -p flags (only options on binary itself; other options processed by Python). Import readline package when using interactive mode. src/sim/sim_events.cc: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). src/sim/sim_events.hh: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). Get rid of a few unused constructors. src/sim/sim_exit.hh: SimExit() is now exitSimLoop(). Get rid of unused functions. Add comments. |