#
14022:a7cdc33dab35 |
|
02-May-2019 |
Gabe Black <gabeblack@google.com> |
cpu, sim: Return PortProxy &s from all the proxy accessors.
This is a step towards merging the accessors for SE and FS modes.
Change-Id: I76818ab88b97097ac363e243be9cc1911b283090 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18579 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13865:cca49fc49c57 |
|
13-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Eliminate the ProxyThreadContext class.
Replace it with direct inheritance from the ThreadContext class in the SimpleThread class which was the only place it was used.
Also take the opportunity to use some specialized types instead of ints, etc., add some consts, and fix some style issues.
Change-Id: I5d2cfa87b20dc43615e33e6755c9d016564e9c0e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18048 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13641:648f3106ebdf |
|
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fixed how O3 CPU executes an exit system call
When a thread executed an exit syscall in SE mode, the thread context was removed immediately in the same cycle, which left inflight squash operations and trap event incomplete. The problem happened when a new thread was assigned to the CPU later. The new thread started with some incomplete transactions of the previous thread (e.g., squashing). This problem could cause incorrect execution flow for the new thread (i.e., pc was not reset properly at the exit point), deadlock (i.e., some stage-to-stage signals were not reset) and incorrect rename map between logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and defers removing thread context from a CPU until a trap event initiated by an exit syscall execution is processed. This patch also makes sure that the removal of a thread context happens after all inflight transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8 Reviewed-on: https://gem5-review.googlesource.com/c/8184 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13622:ba31c2a23eca |
|
21-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu, arch: Replace the CCReg type with RegVal.
Most architectures weren't using the CCReg type, and in x86 and arm it was already a uint64_t.
Change-Id: I0b3d5e690e6b31db6f2627f449c89bde0f6750a6 Reviewed-on: https://gem5-review.googlesource.com/c/14515 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13611:c8b7847b4171 |
|
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Rename *FloatRegBits* to *FloatReg*.
Now that there's no plain FloatReg, there's no reason to distinguish FloatRegBits with a special suffix since it's the only way to read or write FP registers.
Change-Id: I3a60168c1d4302aed55223ea8e37b421f21efded Reviewed-on: https://gem5-review.googlesource.com/c/14460 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13610:5d5404ac6288 |
|
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch,cpu: Add vector predicate registers
Latest-gen. vector/SIMD extensions, including the Arm Scalable Vector Extension (SVE), introduce the notion of a predicate register file. This changeset adds this feature across architectures and CPU models.
Change-Id: Iebcadbad89c0a582ff8b1b70de353305db603946 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13715 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
13601:f5c84915eb7f |
|
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu, arch, arch-arm: Wire unused VecElem code in the O3 model
VecElem code had been introduced in order to simulate change of renaming for vector registers. Most of the work is happening on the rename_map switchRenameMode. Change of renaming can happen after a squash in the pipeline. This patch is also changing the interface to the ISA part so that a PCState is used instead of ISA in order to check if rename mode has changed.
Change-Id: I8af795d771b958e0a0d459abfeceff5f16b4b5d4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15601
|
#
13582:989577bf6abc |
|
18-Oct-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Stop passing around misc registers by reference.
These values are all basic integers (specifically uint64_t now), and so passing them by const & is actually less efficient since there's a extra level of indirection and an extra value, and the same sized value (a 64 bit pointer vs. a 64 bit int) is being passed around.
Change-Id: Ie9956b8dc4c225068ab1afaba233ec2b42b76da3 Reviewed-on: https://gem5-review.googlesource.com/c/13626 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
13557:fc33e6048b25 |
|
13-Oct-2018 |
Gabe Black <gabeblack@google.com> |
cpu: dev: sim: gpu-compute: Banish some ISA specific register types.
These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are some remaining types, specifically the vector registers and the CCReg. I'm less familiar with these new types of registers, and so will look at getting rid of them at some later time.
Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b Reviewed-on: https://gem5-review.googlesource.com/c/13624 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13500:6e0a2a7c6d8c |
|
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, cpu: Remove float type accessors.
Use the binary accessors instead.
Change-Id: Iff1877e92c79df02b3d13635391a8c2f025776a2 Reviewed-on: https://gem5-review.googlesource.com/c/14457 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
12429:beefb9f5f551 |
|
09-Jan-2018 |
BKP <brandon.potter@amd.com> |
style: change C/C++ source permissions to noexec
Several files in the repository were tracked with execute permissions even though the files are just normal C/C++ files (and the one .isa).
Change-Id: I976b096acab4a1fc74c5699ef1f9b222c1e635c2 Reviewed-on: https://gem5-review.googlesource.com/7241 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12279:48bca1fee7a0 |
|
09-Oct-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Prevent cpu from suspending if it is already draining
Suspending the current thread context while draining due to a quiesce pseudo instruction (for example a wfi instruction) could deadlock the cpu and prevent it from successfully draining. This change ensures that the cpu is not draining before suspending the thread context.
Change-Id: I7c019847f5a870d4bc9ce2b19936bc3dc45e5fd7 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5881 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12181:2150eff234c1 |
|
25-Aug-2017 |
Gabe Black <gabeblack@google.com> |
stats: Get rid of some kernel stats related cruft.
The kernel stat mechanism should really be refactored and moved somewhere else, but in the mean time there's some old cruft that can be cleared away.
Change-Id: I21e725de590dda0d20bf3bc675bbe976c7b1bd86 Reviewed-on: https://gem5-review.googlesource.com/4600 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12109:f29e9c5418aa |
|
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Added interface for vector reg file
This patch adds some more functionality to the cpu model and the arch to interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts with calls to get/set full vectors, underlying microarchitectural elements or lanes. Those are meant to interface with the vector register file. All classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy numPhysVecRegs parameter values for architectures that do not implement vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues and CC reg free list initialisation ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2705
|
#
12106:7784fac1b159 |
|
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Simplify the rename interface and use RegId
With the hierarchical RegId there are a lot of functions that are redundant now.
The idea behind the simplification is that instead of having the regId, telling which kind of register read/write/rename/lookup/etc. and then the function panic_if'ing if the regId is not of the appropriate type, we provide an interface that decides what kind of register to read depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2702
|
#
10935:acd48ddd725f |
|
28-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
revert 5af8f40d8f2c
|
#
10934:5af8f40d8f2c |
|
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: implements vector registers
This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
|
#
10407:a9023811bf9e |
|
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed.
|
#
10033:21c14a2b2117 |
|
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the othre ISAs this is currently a null-implementation.
Additionally, a system variable is added to all the ISA objects.
|
#
9944:4ff1c5c6dcbc |
|
17-Oct-2013 |
Matt Horsnell <matt.horsnell@ARM.com> |
cpu: add consistent guarding to *_impl.hh files.
|
#
9920:028e4da64b42 |
|
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
cpu: add a condition-code register class
Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though.
|
#
9478:ba80f7d4f452 |
|
22-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch The changes made by the changeset 270c9a75e91f do not work well with switching of cpus. The problem is that decoder for the old thread context holds state that is not taken over by the new decoder.
This patch adds a takeOverFrom() function to Decoder class in each ISA. Except for x86, functions in other ISAs are blank. For x86, the function copies state from the old decoder to the new decoder.
|
#
9441:1133617844c8 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers were performed during the same quiesce period. Previously, the thread contexts would assign the TC pointer in the old quiesce event to the new TC. This obviously broke in cases where multiple switches were performed within the same quiesce period, in which case the TC pointer in the quiesce event would point to an old CPU.
The new implementation deschedules pending quiesce events in the old TC and schedules a new quiesce event in the new TC. The code has been refactored to remove most of the code duplication.
|
#
9436:4a0223da4924 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
o3 cpu: Remove unused variables
|
#
9428:029dfe6324d3 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order to do serialization. This was a bit of a hack involving two static SimpleThread instances and a magic constructor that was only used by the O3 CPU.
This patch moves the ThreadContext serialization code into two global procedures that, in addition to the normal serialization parameters, take a ThreadContext reference as a parameter. This allows us to reuse the serialization code in all ThreadContext implementations.
|
#
9426:0548b3e9734d |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode of operations. There is currently no architecture independent way of accessing all registers. This patch introduces a flat register interface to the ThreadContext class. This interface is useful, for example, when serializing or copying thread contexts.
|
#
9384:877293183bdf |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
arch: Make the ISA class inherit from SimObject
The ISA class on stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU.
|
#
9382:1c97b57d5169 |
|
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: rename the misleading inSyscall to noSquashFromTC
isSyscall was originally created because during handling of a syscall in SE mode the threadcontext had to be updated. However, in many places this is used in FS mode (e.g. fault handlers) and the name doesn't make much sense. The boolean actually stops gem5 from squashing speculative and non-committed state when a write to a threadcontext happens, so re-name the variable to something more appropriate
|
#
9180:ee8d7a51651d |
|
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles.
In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes.
|
#
8887:20ea02da9c53 |
|
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.
|
#
8852:c744483edfcf |
|
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Make port proxies use references rather than pointers
This patch is adding a clearer design intent to all objects that would not be complete without a port proxy by making the proxies members rathen than dynamically allocated. In essence, if NULL would not be a valid value for the proxy, then we avoid using a pointer to make this clear.
The same approach is used for the methods using these proxies, such as loadSections, that now use references rather than pointers to better reflect the fact that NULL would not be an acceptable value (in fact the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional composition where a NULL pointer is never valid" could be done on a much broader scale throughout the code base, but for now it is only done in the locations affected by the proxies.
|
#
8809:bb10807da889 |
|
01-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head, hopefully the last time for this batch.
|
#
8799:dac1e33e07b0 |
|
28-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repo.
|
#
8793:5f25086326ac |
|
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in the CPU directory.
|
#
8777:dd43f1c9fa0a |
|
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make the functions available from the TC consistent between SE and FS.
|
#
8767:e575781f71b8 |
|
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.
|
#
8761:20322354b80b |
|
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build/expose vport in SE mode.
|
#
8733:64a7bf8fa56c |
|
31-Jan-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.
|
#
8706:b1838faf3bcc |
|
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy.
The following replacements are made: FunctionalPort > PortProxy TranslatingPort > SETranslatingPortProxy VirtualPort > FSTranslatingPortProxy
|
#
8518:9c87727099ce |
|
19-Aug-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
Fix bugs due to interaction between SEV instructions and O3 pipeline
SEV instructions were originally implemented to cause asynchronous squashes via the generateTCSquash() function in the O3 pipeline when updating the SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system that would lead to a pipeline either going inactive indefinitely or not being able to commit squashed instructions. Fixed SEV instructions to behave like interrupts and cause synchronous sqaushes inside the pipeline, eliminating the race conditions. Also fixed up the semantics of the WFE instruction to behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1 or unmasked interrupts are pending.
|
#
8232:b28d06a175be |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
|
#
8208:45331a355c38 |
|
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode.
|
#
7823:dac01f14f20f |
|
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Replace curTick global variable with accessor functions. This step makes it easy to replace the accessor functions (which still access a global variable) with ones that access per-thread curTick values.
|
#
7763:ff2213d13e58 |
|
15-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: reset architetural state by calling clear()
|
#
7720:65d338a8dba4 |
|
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
|
#
7679:f26cc2c68b48 |
|
14-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of the now unnecessary getInst/setInst family of functions.
This code is no longer needed because of the preceeding change which adds a StaticInstPtr parameter to the fault's invoke method, obviating the only use for this pair of functions.
|
#
7467:91994f36de7f |
|
22-Jun-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3ThreadContext: When taking over from a previous context, only assert that the system pointers match in Full System mode.
|
#
6658:f4de76601762 |
|
23-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh
|
#
6329:5d8b91875859 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add a registers.hh file as an ISA switched header. This file is for register indices, Num* constants, and register types. copyRegs and copyMiscRegs were moved to utility.hh and utility.cc.
|
#
6314:781969fbeca9 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Get rid of the float register width parameter.
|
#
6313:95f69a436c82 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU.
|
#
6221:58a3c04e6344 |
|
26-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: add a type for thread IDs and try to use it everywhere
|
#
6029:007c36616f47 |
|
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Get rid of the Unallocated thread context state. Basically merge it in with Halted. Also had to get rid of a few other functions that called ThreadContext::deallocate(), including: - InOrderCPU's setThreadRescheduleCondition. - ThreadContext::exit(). This function was there to avoid terminating simulation when one thread out of a multi-thread workload exits, but we need to find a better (non-cpu-centric) way.
|
#
5958:2d9737bf3c2f |
|
27-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Processes: Make getting and setting system call arguments part of a process object.
|
#
5803:aae3d7089925 |
|
19-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
thread_context: move getSystemPtr so SE mode can get to it. There was really no reason that it should be FS only.
|
#
5715:e8c1d4e669a7 |
|
04-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
get rid of all instances of readTid() and getThreadNum(). Unify and eliminate redundancies with threadId() as their replacement.
|
#
5714:76abee886def |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate.
|
#
5712:199d31b47f7b |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered across the subclasses. generally make it so that member data is _cpuId and accessor functions are cpuId(). The ID val comes from the python (default -1 if none provided), and if it is -1, the index of cpuList will be given. this has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch.
|
#
5704:98224505352a |
|
21-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
style: Use the correct m5 style for things relating to interrupts.
|
#
5499:8bfc7650c344 |
|
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Remove delVirtPort() and make getVirtPort() only return cached version.
|
#
5494:85c8d296c1cb |
|
28-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Backed out changeset 94a7bb476fca: caused memory leak.
|
#
5489:94a7bb476fca |
|
21-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Generate more useful error messages for unconnected ports. Force all non-default ports to provide a name and an owner in the constructor.
|
#
5258:fcccd87d5178 |
|
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
put the flattenIndex stuff back in O3 AND put fatal() back in faults
|
#
5250:42577371ff31 |
|
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
Get MIPS simple regression working. Take out unecessary functions "setShadowSet", "CacheOp"
|
#
5235:f07f46843886 |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the micropc available through the thread context objects. This is necssary for fault handlers that branch to non-zero micro PCs.
|
#
5082:82dd253231c8 |
|
19-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put in the foundation for x87 stack based fp registers.
|
#
4217:4c966fec2324 |
|
13-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix segfault when peer owner attempts to use functional port
|
#
4172:141705d83494 |
|
07-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
*MiscReg->*MiscRegNoEffect, *MiscRegWithEffect->*MiscReg
|
#
3778:ac52cbef744c |
|
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/cpu/o3/commit_impl.hh: Hand Merge
|
#
3776:4f88e76d8ebe |
|
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Flattening and syscallReturn fixes
src/cpu/o3/thread_context_impl.hh: Use flattened indices src/cpu/simple_thread.hh: Use flattened indices, and pass a thread context to setSyscallReturn rather than a register file. src/cpu/thread_context.hh: The SyscallReturn class is no longer in arch/syscallreturn.hh
|
#
3686:fa8d8b90cd8a |
|
29-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change the connecting of the physPort and virtPort to the memory object below the CPU to happen every time activateContext is called. The overhead is probably a little higher than necessary, but allows these connections to properly be made when there are CPUs that are inactive until they are switched in.
Right now this introduces a minor memory leak as old physPorts and virtPorts are not deleted when new ones are created. A flyspray task has been created for this issue. It can not be resolved until we determine how the bus will handle giving out ID's to functional ports that may be deleted.
src/cpu/o3/cpu.cc: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Change the setup of the physPort and virtPort to instead happen every time the CPU has a context activated. This is a little high overhead, but keeps it working correctly when the CPU does not have a physical memory attached to it until it switches in (like the case of switch CPUs). src/cpu/o3/thread_context.hh: Change function from being called at init() to just being called whenever the memory ports need to be connected. src/cpu/o3/thread_context_impl.hh: Update this to not delete the port if it's the same as the virtPort. src/cpu/thread_context.hh: Change function from being called at init() to whenever the memory ports need to be connected. src/cpu/thread_state.cc: Instead of initializing the ports, simply connect them, deleting any old ports that might exist. This allows these functions to be called multiple times. src/cpu/thread_state.hh: Ports are no longer initialized, but rather connected at context activation time.
|
#
3675:dc883b610345 |
|
19-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Update Virtual and Physical ports.
src/cpu/o3/alpha/cpu_impl.hh: Handle the PhysicalPort and VirtualPort in the ThreadState. src/cpu/o3/cpu.cc: Initialize the thread context. src/cpu/o3/thread_context.hh: Add new function to initialize thread context. src/cpu/o3/thread_context_impl.hh: Use code now put into function. src/cpu/simple_thread.cc: Move code to ThreadState and use the new helper function. src/cpu/simple_thread.hh: Remove init() in this derived class; use init() from ThreadState base class. src/cpu/thread_state.cc: Move setting up of Physical and Virtual ports here. Change getMemFuncPort() to connectToMemFunc(), which connects a port to a functional port of the memory object below the CPU. src/cpu/thread_state.hh: Update functions.
|
#
3548:85e64c82c522 |
|
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Moved the switched version of kernel_stats.hh back to kern, and moved the base kernel_stats to base_kernel_stats
|
#
3468:cf23ad1ceef2 |
|
01-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Adjustments for the AlphaTLB changing to AlphaISA::TLB and changing register file functions to not take faults
|
#
3221:669a04468c0d |
|
08-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to O3 CPU. It should now work in FS mode, although sampling still has a bug.
src/cpu/o3/commit_impl.hh: Fixes for compile and sampling. src/cpu/o3/cpu.cc: Deallocate and activate threads properly. Also hopefully fix being able to use caches while switching over. src/cpu/o3/cpu.hh: Fixes for deallocating and activating threads. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: Handle getting back a BadAddress result from the access. src/cpu/o3/iew_impl.hh: More debug output. src/cpu/o3/lsq_unit_impl.hh: Fixup store conditional handling (still a bit of a hack, but works now).
Also handle getting back a BadAddress result from the access. src/cpu/o3/thread_context_impl.hh: Deallocate context now records if the context should be fully removed.
|
#
3126:756092c6383c |
|
02-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to fix merge issues and bring almost everything up to working speed. Ozone CPU remains untested, but everything else compiles and runs.
src/arch/alpha/isa_traits.hh: This got changed to the wrong version by accident. src/cpu/base.cc: Fix up progress event to not schedule itself if the interval is set to 0. src/cpu/base.hh: Fix up the CPU Progress Event to not print itself if it's set to 0. Also remove stats_reset_inst (something I added to m5 but isn't necessary here). src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: Remove float variable of instResult; it's always held within the double part now. src/cpu/checker/cpu_impl.hh: Use thread and not cpuXC. src/cpu/o3/alpha/cpu_builder.cc: src/cpu/o3/checker_builder.cc: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu_builder.cc: src/python/m5/objects/BaseCPU.py: Remove stats_reset_inst. src/cpu/o3/commit_impl.hh: src/cpu/ozone/lw_back_end_impl.hh: Get TC, not XCProxy. src/cpu/o3/cpu.cc: Switch out updates from the version of m5 I have. Also remove serialize code that got added twice. src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/thread_state.hh: Remove code that was added twice. src/cpu/o3/lsq_unit.hh: Add back in stats that got lost in the merge. src/cpu/o3/lsq_unit_impl.hh: Use proper method to get flags. Also wake CPU if we're coming back from a cache miss. src/cpu/o3/thread_context_impl.hh: src/cpu/o3/thread_state.hh: Support profiling. src/cpu/ozone/cpu.hh: Update to use proper typename. src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/dyn_inst_impl.hh: Updates for newmem. src/cpu/ozone/lw_lsq_impl.hh: Get flags correctly. src/cpu/ozone/thread_state.hh: Reorder constructor initialization, use tc. src/sim/pseudo_inst.cc: Allow for loading of symbol file. Be sure to use ThreadContext and not ExecContext.
|
#
2986:99640058db70 |
|
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Some touchup to the reorganized includes and "using" directives.
|
#
2980:eab855f06b79 |
|
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Cleaned up include files and got rid of many using directives in header files.
|
#
2875:9b6f6b75b187 |
|
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix so that O3CPU doesnt segfault on exit. Major thing was to not execute commit if there are no active threads in CPU.
src/cpu/o3/alpha/thread_context.hh: call deallocate instead of deallocateContext src/cpu/o3/commit_impl.hh: dont run commit stage if there are no instructions src/cpu/o3/cpu.cc: add deallocate event, deactivateThread function, and edit deallocateContext. src/cpu/o3/cpu.hh: add deallocate event and add optional delay to deallocateContext src/cpu/o3/thread_context.hh: optional delay for deallocate src/cpu/o3/thread_context_impl.hh: edit DPRINTFs to say Thread Context instead of Alpha TC src/cpu/thread_context.hh: optional delay src/sim/syscall_emul.hh: name stuff
|
#
2834:c8342a71404b |
|
03-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix for FS O3CPU compile ... missing forward class declaration/header file after files got split for ISA-independence
src/cpu/o3/alpha/thread_context.hh: Use 'this' when accessing cpu src/cpu/o3/cpu.hh: add numActiveThreds function src/cpu/o3/thread_context.hh: forward class declarations src/cpu/o3/thread_context_impl.hh: add quiesce event header file src/cpu/thread_context.hh: add exit() function to thread context (read comments in file) src/sim/syscall_emul.cc: adjust exitFunc syscall
|
#
2817:273f7fb94f83 |
|
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Make O3CPU model independent of the ISA
Use O3CPU when building instead of AlphaO3CPU.
I could use some better python magic in the cpu_models.py file!
AUTHORS: add middle initial SConstruct: change from AlphaO3CPU to O3CPU src/cpu/SConscript: edits to build O3CPU instead of AlphaO3CPU src/cpu/cpu_models.py: change substitution template to use proper CPU EXEC CONTEXT For O3CPU Model...
Actually, some Python expertise could be used here. The 'env' variable is not passed to this file, so I had to parse through the ARGV to find the ISA... src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use isa_specific.hh src/sim/process.cc: only initi NextNPC if not ALPHA src/cpu/o3/alpha/cpu.cc: alphao3cpu impl src/cpu/o3/alpha/cpu.hh: move AlphaTC to it's own file src/cpu/o3/alpha/cpu_impl.hh: Move AlphaTC to it's own file ... src/cpu/o3/alpha/dyn_inst.cc: src/cpu/o3/alpha/dyn_inst.hh: src/cpu/o3/alpha/dyn_inst_impl.hh: include paths src/cpu/o3/alpha/impl.hh: include paths, set default MaxThreads to 2 instead of 4 src/cpu/o3/alpha/params.hh: set Alpha Specific Params here src/python/m5/objects/O3CPU.py: add O3CPU class src/cpu/o3/SConscript: include isa-specific build files src/cpu/o3/alpha/thread_context.cc: NEW HOME of AlphaTC src/cpu/o3/alpha/thread_context.hh: new home of AlphaTC src/cpu/o3/isa_specific.hh: includes ISA specific files src/cpu/o3/params.hh: base o3 params src/cpu/o3/thread_context.hh: base o3 thread context src/cpu/o3/thread_context_impl.hh: base o3 thead context impl
|