#
14195:c5efdb3319aa |
|
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move the instruction port into o3's fetch stage.
That's where it's used, and that avoids having to pass it around using the top level getInstPort accessor.
Change-Id: I489a3f3239b3116292f3dcd78a3945fb468c6311 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20239 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
14194:967b9c450b04 |
|
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move O3's data port into the LSQ.
That's where it's used, and putting it there avoids having to pass around the port using the top level getDataPort function.
Change-Id: I0dea25d0c5f4bb3f58a6574a8f2b2d242784caf2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20238 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
14085:0075b0d29d55 |
|
28-Jun-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: isDrained renamed to isCpuDrained
cpu models inheriting from BaseCPU implement a draining checker called isDrained. This hides the base Drainable::isDrained method and might create confusion in the reader. This patch is renaming it to isCpuDrained in order to avoid any ambiguity
Change-Id: Ie5221da6a4673432c2403996e42d451cae960bbf Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19468 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13981:577196ddd040 |
|
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, dev, mem, sim: Remove #if 0-ed out code.
This code will be preserved through version control, but otherwise creates clutter and will rot in place since it's never compiled.
Change-Id: Id265f6deac445116843956ea5cf1210d8127274e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18608 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13910:d5deee7b4279 |
|
28-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: alpha: Delete all occurrances of the simPalCheck function.
This is now handled within the ISA description.
Change-Id: Ie409bb46d102e59d4eb41408d9196fe235626d32 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18434 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13908:6ab98c626b06 |
|
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Remove hwrei from the generic interfaces.
This mechanism is specific to Alpha and doesn't belong sprinkled around the CPU's generic mechanisms.
Change-Id: I87904d1a08df2b03eb770205e2c4b94db25201a1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18432 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13905:5cf30883255c |
|
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Track kernel stats using the base ISA agnostic type.
Then cast to the ISA specific type when necessary. This removes (mostly) an ISA specific aspect to some of the interfaces. The ISA specific version of the kernel stats still needs to be constructed and stored in a few places which means that kernel_stats.hh still needs to be a switching arch header, for instance.
In the future, I'd like to make the kernel its own object like the Process objects in SE mode, and then it would be able to instantiate and maintain its own stats.
Change-Id: I8309d49019124f6bea1482aaea5b5b34e8c97433 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18429 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13831:4fba790d88be |
|
06-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Removed inconsistency in O3* debug msgs
Added consistency in the DEBUG message form, to allow a better parsing. Fixed sn/tid type parameter. Removed some annoying newlines
Change-Id: I4761c49fc12b874a7d8b46779475b606865cad4b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17248 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13818:f0126488ef9e |
|
26-Mar-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added a probe to notify the address of retired instructions
A probe is added to notify the address of each retired instruction.
Change-Id: Iefc1b09d74b3aa0aa5773b17ba637bf51f5a59c9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17632 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13644:6180ee72e061 |
|
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
sim,cpu: make exit_group halt all threads in a group
When a thread calls exit_group, in addition to halting the thread itself, it needs to halt all other threads in its group (i.e., threads sharing the same thread group ID). This patch enables threads to do that.
Change-Id: Ib2e158fb27cf98843f177a64a2d643b1bbc94d03 Reviewed-on: https://gem5-review.googlesource.com/c/9623 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13641:648f3106ebdf |
|
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fixed how O3 CPU executes an exit system call
When a thread executed an exit syscall in SE mode, the thread context was removed immediately in the same cycle, which left inflight squash operations and trap event incomplete. The problem happened when a new thread was assigned to the CPU later. The new thread started with some incomplete transactions of the previous thread (e.g., squashing). This problem could cause incorrect execution flow for the new thread (i.e., pc was not reset properly at the exit point), deadlock (i.e., some stage-to-stage signals were not reset) and incorrect rename map between logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and defers removing thread context from a CPU until a trap event initiated by an exit syscall execution is processed. This patch also makes sure that the removal of a thread context happens after all inflight transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8 Reviewed-on: https://gem5-review.googlesource.com/c/8184 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13622:ba31c2a23eca |
|
21-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu, arch: Replace the CCReg type with RegVal.
Most architectures weren't using the CCReg type, and in x86 and arm it was already a uint64_t.
Change-Id: I0b3d5e690e6b31db6f2627f449c89bde0f6750a6 Reviewed-on: https://gem5-review.googlesource.com/c/14515 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13611:c8b7847b4171 |
|
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Rename *FloatRegBits* to *FloatReg*.
Now that there's no plain FloatReg, there's no reason to distinguish FloatRegBits with a special suffix since it's the only way to read or write FP registers.
Change-Id: I3a60168c1d4302aed55223ea8e37b421f21efded Reviewed-on: https://gem5-review.googlesource.com/c/14460 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13610:5d5404ac6288 |
|
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch,cpu: Add vector predicate registers
Latest-gen. vector/SIMD extensions, including the Arm Scalable Vector Extension (SVE), introduce the notion of a predicate register file. This changeset adds this feature across architectures and CPU models.
Change-Id: Iebcadbad89c0a582ff8b1b70de353305db603946 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13715 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
13601:f5c84915eb7f |
|
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu, arch, arch-arm: Wire unused VecElem code in the O3 model
VecElem code had been introduced in order to simulate change of renaming for vector registers. Most of the work is happening on the rename_map switchRenameMode. Change of renaming can happen after a squash in the pipeline. This patch is also changing the interface to the ISA part so that a PCState is used instead of ISA in order to check if rename mode has changed.
Change-Id: I8af795d771b958e0a0d459abfeceff5f16b4b5d4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15601
|
#
13598:39220222740c |
|
04-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix VecElemClass bugs in cpu models
This patch is:
* Adding a missing VecElemClass entry * Fixing assertion in rename map which was checking the number of free vector registers rather than free vector element registers * Fixing assertion in read/setVecElemOperand APIs. * Using the right register index in SimpleThread * Using VecElem instead of VecReg on O3 readArchVecElem
Change-Id: I265320dcbe35eb47075991301dfc99333c5190c4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15598 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13590:d7e018859709 |
|
13-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu-o3: O3 LSQ Generalisation
This patch does a large modification of the LSQ in the O3 model. The main goal of the patch is to remove the 'an operation can be served with one or two memory requests' assumption that is present in the LSQ and the instruction with the req, reqLow, reqHigh triplet, and generalising it to operations that can be addressed with one request, and operations that require many requests, embodied in the SingleDataRequest and the SplitDataRequest.
This modification has been done mimicking the minor model to an extent, shifting the responsibilities of dealing with VtoP translation and tracking the status and resources from the DynInst to the LSQ via the LSQRequest. The LSQRequest models the information concerning the operation, handles the creation of fragments for translation and request as well as assembling/splitting the data accordingly.
With this modifications, the implementation of vector ISAs, particularly on the memory side, become more rich, as the new model permits a dissociation of the ISA characteristics as vector length, from the microarchitectural characteristics that govern how contiguous loads are executing, allowing exploration of different LSQ to DL1 bus widths to understand the tradeoffs in complexity and performance.
Part of the complexities introduced stem from the fact that gem5 keeps a large amount of metadata regarding, in particular, memory operations, thus, when an instruction is squashed while some operation as TLB lookup or cache access is ongoing, when the relevant structure communicates to the LSQ that the operation is over, it tries to access some pieces of data that should have died when the instruction is squashed, leading to asserts, panics, or memory corruption. To ensure the correct behaviour, the LSQRequest rely on assesing who is their owner, and self-destroying if they detect their owner is done with the request, and there will be no subsequent action. For example, in the case of an instruction squashed whal the TLB is doing a walk to serve the translation, when the translation is served by the TLB, the LSQRequest detects that the instruction was squashed, and as the translation is done, no one else expect to access its information, and therefore, it self-destructs. Having destroyed the LSQRequest earlier, would lead to wrong behaviour as the TLB walk may access some fields of it.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13516 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
|
#
13582:989577bf6abc |
|
18-Oct-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Stop passing around misc registers by reference.
These values are all basic integers (specifically uint64_t now), and so passing them by const & is actually less efficient since there's a extra level of indirection and an extra value, and the same sized value (a 64 bit pointer vs. a 64 bit int) is being passed around.
Change-Id: Ie9956b8dc4c225068ab1afaba233ec2b42b76da3 Reviewed-on: https://gem5-review.googlesource.com/c/13626 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
13557:fc33e6048b25 |
|
13-Oct-2018 |
Gabe Black <gabeblack@google.com> |
cpu: dev: sim: gpu-compute: Banish some ISA specific register types.
These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are some remaining types, specifically the vector registers and the CCReg. I'm less familiar with these new types of registers, and so will look at getting rid of them at some later time.
Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b Reviewed-on: https://gem5-review.googlesource.com/c/13624 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13546:6cd6d7b19498 |
|
12-Dec-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix usage of setArchVecElem
setArchVecElem should create a VecElemClass RegId, and not a VecRegClass. Initializing a VecRegClass with three arguments makes it panic
Change-Id: I6c398d67305bfe7bea12cb02edd4f4c3a202e69a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15655 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13500:6e0a2a7c6d8c |
|
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, cpu: Remove float type accessors.
Use the binary accessors instead.
Change-Id: Iff1877e92c79df02b3d13635391a8c2f025776a2 Reviewed-on: https://gem5-review.googlesource.com/c/14457 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13429:a1e199fd8122 |
|
06-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the usage of const DynInstPtr
Summary: Usage of const DynInstPtr& when possible and introduction of move operators to RefCountingPtr.
In many places, scoped references to dynamic instructions do a copy of the DynInstPtr when a reference would do. This is detrimental to performance. On top of that, in case there is a need for reference tracking for debugging, the redundant copies make the process much more painful than it already is.
Also, from the theoretical point of view, a function/method that defines a convenience name to access an instruction should not be considered an owner of the data, i.e., doing a copy and not a reference is not justified.
On a related topic, C++11 introduces move semantics, and those are useful when, for example, there is a class modelling a HW structure that contains a list, and has a getHeadOfList function, to prevent doing a copy to an internal variable -> update pointer, remove from the list -> update pointer, return value making a copy to the assined variable -> update pointer, destroy the returned value -> update pointer.
Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13105 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12284:b91c036913da |
|
20-Jul-2017 |
Jose Marinho <jose.marinho@arm.com> |
cpu, cpu, sim: move Cycle probe update
Move the code responsible for performing the actual probe point notify into BaseCPU. Use BaseCPU activateContext and suspendContext to keep track of sleep cycles. Create a probe point (ppActiveCycles) that does not count cycles where the processor was asleep. Rename ppCycles to ppAllCycles to reflect its nature.
Change-Id: I1907ddd07d0ff9f2ef22cc9f61f5f46c630c9d66 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5762 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
12276:22c220be30c5 |
|
16-Mar-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
pwr: Adds logic to enter power gating for the cpu model
If the CPU has been clock gated for a sufficient amount of time (configurable via pwrGatingLatency), the CPU will go into the OFF power state. This does not model hardware, just behaviour.
Change-Id: Ib3681d1ffa6ad25eba60f47b4020325f63472d43 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3969 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12143:e48005f585f2 |
|
06-Apr-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
cpu,o3: Fixed checkpointing bug occuring in the o3 CPU
Checkpointing a system with out-of-order CPUs might get stuck if one of the CPUs has been put to sleep. The quiesce instruction cannot get drained hence checkpointing never finishes.
This commit resolves that by activating all suspended thread contexts when draining the system.
Change-Id: I817ab1672b4ead777bd8e12a0445829481c46fdc Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3970 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12127:4207df055b0d |
|
28-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu: Refactor some Event subclasses to lambdas
Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3923 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12109:f29e9c5418aa |
|
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Added interface for vector reg file
This patch adds some more functionality to the cpu model and the arch to interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts with calls to get/set full vectors, underlying microarchitectural elements or lanes. Those are meant to interface with the vector register file. All classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy numPhysVecRegs parameter values for architectures that do not implement vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues and CC reg free list initialisation ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2705
|
#
12106:7784fac1b159 |
|
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Simplify the rename interface and use RegId
With the hierarchical RegId there are a lot of functions that are redundant now.
The idea behind the simplification is that instead of having the regId, telling which kind of register read/write/rename/lookup/etc. and then the function panic_if'ing if the regId is not of the appropriate type, we provide an interface that decides what kind of register to read depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2702
|
#
12105:742d80361989 |
|
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
cpu: Physical register structural + flat indexing
Mimic the changes done on the architectural register indexes on the physical register indexes. This is specific to the O3 model. The structure, called PhysRegId, contains a register class, a register index and a flat register index. The flat register index is kept because it is useful in some cases where the type of register is not important (dependency graph and scoreboard for example). Instead of directly using the structure, most of the code is working with a const PhysRegId* (typedef to PhysRegIdPtr). The actual PhysRegId objects are stored in the regFile.
Change-Id: Ic879a3cc608aa2f34e2168280faac1846de77667 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2701 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12104:edd63f9c6184 |
|
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
arch, cpu: Architectural Register structural indexing
Replace the unified register mapping with a structure associating a class and an index. It is now much easier to know which class of register the index is referring to. Also, when adding a new class there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2700
|
#
11877:5ea85692a53e |
|
20-Jul-2015 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [patch 13/22] add system call retry capability
This changeset adds functionality that allows system calls to retry without affecting thread context state such as the program counter or register values for the associated thread context (when system calls return with a retry fault).
This functionality is needed to solve problems with blocking system calls in multi-process or multi-threaded simulations where information is passed between processes/threads. Blocking system calls can cause deadlock because the simulator itself is single threaded. There is only a single thread servicing the event queue which can cause deadlock if the thread hits a blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer sharing model. The processes can use file descriptors and the read and write calls to pass information to one another. If the consumer calls the blocking read system call before the producer has produced anything, the call will block the event queue (while executing the system call instruction) and deadlock the simulation.
The solution implemented in this changeset is to recognize that the system calls will block and then generate a special retry fault. The fault will be sent back up through the function call chain until it is exposed to the cpu model's pipeline where the fault becomes visible. The fault will trigger the cpu model to replay the instruction at a future tick where the call has a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a non-blocking poll (from inside the system call implementation) and checking for events. When events show up during the poll, it signifies that the call would not have blocked and the syscall is allowed to proceed (calling an underlying host system call if necessary). If no events are returned from the poll, we generate the fault and try the instruction for the thread context at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event queue, but it is not used by default and needs work. Even if the event queue was completely multi-threaded, meaning that there is a hardware thread on the host servicing a single simulator thread contexts with a 1:1 mapping between them, it's still possible to run into deadlock due to the event queue barriers on quantum boundaries. The solution of replaying at a later tick is the simplest solution and solves the problem generally.
|
#
11793:ef606668d247 |
|
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes
|
#
11627:fe32a5238754 |
|
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
sim: Refactor quiesce and remove FS asserts The quiesce family of magic ops can be simplified by the inclusion of quiesceTick() and quiesce() functions on ThreadContext. This patch also gets rid of the FS guards, since suspending a CPU is also a valid operation for SE mode.
|
#
11526:5b81895e5d5e |
|
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
pwr: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
|
#
11429:cf5af0cc3be4 |
|
06-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models 831c7f2f9e39: power: Low-power idle power state for idle CPUs 4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
11423:831c7f2f9e39 |
|
09-Dec-2014 |
Akash Bagdia <akash.bagdia@ARM.com> |
power: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
|
#
11284:b3926db25371 |
|
31-Dec-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
|
#
11246:93d2a1526103 |
|
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.
A probe point is added in the Fetch stage for probing when a fetch request is sent. Notify is fired on the probe point when a request is sent succesfully in the first attempt as well as on a retry attempt.
Probe points are added in the IEW stage when an instruction begins to execute and when execution is complete. This points can be used for monitoring the execution time of an instruction.
Probe points are added in the Rename stage to probe renaming of source and destination registers and when there is squashing. These probe points can be used to track register dependencies and remove when there is squashing.
A probe point for squashing is added in Commit to probe squashed instructions.
|
#
11225:9bc552f9e4b0 |
|
22-Nov-2015 |
Nathanael Premillieu <nathananel.premillieu@arm.com> |
cpu: Fix base FP and CC register index in o3 insertThread()
Note that the method is not used, and could possibly be deleted.
|
#
11151:ca4ea9b5c052 |
|
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT capable cpus are woken.
|
#
11150:a8a64cca231b |
|
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems.
|
#
11148:1bc3d93c7eaa |
|
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT.
|
#
10935:acd48ddd725f |
|
28-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
revert 5af8f40d8f2c
|
#
10934:5af8f40d8f2c |
|
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: implements vector registers
This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
|
#
10913:38dbdeea7f1f |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining.
This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error.
Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects.
|
#
10910:32f3d1c454ec |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario.
|
#
10905:a6ca6831e775 |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section.
* Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections).
* The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects.
* Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code.
|
#
10821:581fb2484bd6 |
|
05-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.
|
#
10774:68d688cbe26c |
|
03-Apr-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
cpu: fix system total instructions accounting
The totalInstructions counter is only incremented when the whole instruction is commited and not on every microop. It was incorrectly reset in atomic and timing cpus.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
|
#
10713:eddb533708cb |
|
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
|
#
10698:829adc48e175 |
|
16-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM.
|
#
10683:94901e131a7f |
|
06-Feb-2015 |
Alexandru Dutu <alexandru.dutu@amd.com> |
cpu: Idle CPU status logic revised
This patch sets the CPU status to idle when the last active thread gets suspended.
|
#
10529:05b5a6cf3521 |
|
06-Nov-2014 |
Marc Orr <morr@cs.wisc.edu> |
x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
#
10487:5914229e6b16 |
|
20-Oct-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: corrects base FP and CC register index in removeThread()
|
#
10464:2a0fe8bca031 |
|
16-Oct-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Probe points for basic PMU stats
This changeset adds probe points that can be used to implement PMU counters for CPU stats. The following probes are supported:
* BaseCPU::ppCycles / Cycles * BaseCPU::ppRetiredInsts / RetiredInsts * BaseCPU::ppRetiredLoads / RetiredLoads * BaseCPU::ppRetiredStores / RetiredStores * BaseCPU::ppRetiredBranches RetiredBranches
|
#
10417:710ee116eb68 |
|
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer.
|
#
10408:a59c189de383 |
|
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Remove unused deallocateContext calls
The call paths for de-scheduling a thread are halt() and suspend(), from the thread context. There is no call to deallocateContext() in general, though some CPUs chose to define it. This patch removes the function from BaseCPU and the cores which do not require it.
|
#
10407:a9023811bf9e |
|
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed.
|
#
10379:c00f6d7e2681 |
|
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unecessary copies being created, contributing to the increment/decrement of the reference counters.
|
#
10331:ed05298e8566 |
|
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode.
This is because it maintained an independent priority list of threads from the active thread list. This priority list could be come stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads.
Additionally the fetch queue is now forcibly flushed of instrctuctions from the de-scheduled thread.
Relevant output:
24557000: system.cpu: [tid:1]: Calling deactivate thread. 24557000: system.cpu: [tid:1]: Removing from active threads list
24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU. 24557500: system.cpu.fetch: Running stage. 24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
|
#
10239:592f0bb6bd6f |
|
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa.
This work was done while Binh was an intern at AMD Research.
|
#
10225:01df075d9f93 |
|
23-May-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove stat totalCommittedInsts This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded invidually. The total would now be generated by summing these individual counts.
|
#
10164:2d2c60bda8b2 |
|
19-Apr-2014 |
Faissal Sleiman <sleimanf@umich.edu> |
o3: Fix occupancy checks for SMT A number of calls to isEmpty() and numFreeEntries() should be thread-specific.
In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList.
In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of.
In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it).
In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
#
10023:91faf6649de0 |
|
24-Jan-2014 |
Matt Horsnell <matt.horsnell@ARM.com> |
base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model.
What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py
What is happening under the hood: * every SimObject maintains has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and the regProbeListeners is called on each SimObject. this hooks up the probe point notify calls with the listeners.
FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful.
What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction)
What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event.
What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener.
What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes.
Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats.
Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting.
What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%).
|
#
9992:6e39e3641dd8 |
|
03-Dec-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: call BaseCPU startup() function in o3 cpu
|
#
9954:72a72649a156 |
|
31-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Construct ROB with cpu params struct instead of each variable
Most other structures/stages get passed the cpu params struct.
|
#
9920:028e4da64b42 |
|
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
cpu: add a condition-code register class
Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though.
|
#
9919:803903a8dac1 |
|
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up rename map and free list
Restructured rename map and free list to clean up some extraneous code and separate out common code that can be reused across different register classes (int and fp at this point). Both components now consist of a set of Simple* objects that are stand-alone rename map & free list for each class, plus a Unified* object that presents a unified interface across all register classes and then redirects accesses to the appropriate Simple* object as needed.
Moved free list initialization to PhysRegFile to better isolate knowledge of physical register index mappings to that class (and remove the need to pass a number of parameters to the free list constructor).
Causes a small change to these stats: cpu.rename.int_rename_lookups cpu.rename.fp_rename_lookups because they are now categorized on a per-operand basis rather than a per-instruction basis. That is, an instruction with mixed fp/int/misc operand types will have each operand categorized independently, where previously the lookup was categorized based on the instruction type.
|
#
9916:9c3a4595cce9 |
|
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor parameters) thet it didn't really use, and the array initialization was needlessly verbose.
Also just hardwired the getReg() method to aleays return true for misc regs, rather than having an array of bits that we always kept marked as ready.
|
#
9915:d9e3ad574162 |
|
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or have a pointer back to the CPU. Also made some methods for checking the physical register type (int vs. float) based on the phys reg index, which will come in handy later.
|
#
9648:f10eb34e3e38 |
|
22-Apr-2013 |
Dam Sunwoo <dam.sunwoo@arm.com> |
sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was already aligned with the clock edge. This behavior is not only confusing (not quite what the function name implies), but also caused problems in the drainResume() function. When exiting/re-entering the sim loop (e.g., to take checkpoints), the CPUs will drain and resume. Due to the previous behavior of nextCycle(), the CPU tick events were being rescheduled in the same ticks that were already processed before draining. This caused divergence from runs that did not exit/re-entered the sim loop. (Initially a cycle difference, but a significant impact later on.)
This patch separates out the two behaviors (nextCycle() and clockEdge()), uses nextCycle() in drainResume, and uses clockEdge() everywhere else. Nothing (other than name) should change except for the drainResume timing.
|
#
9524:d6ffa982a68b |
|
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virutalized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'. * isTimingMode() -- True if the memory mode is 'timing'. * bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.
|
#
9523:b8c8437f71d9 |
|
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two places, when the CPU is initialized (unless it's switched out) and on a drainResume(). This led to some code duplication in the CPU models. This changeset introduces the verifyMemoryMode() method which is called by BaseCPU::init() if the CPU isn't switched out. The individual CPU models are responsible for calling this method when resuming from a drain as this code is CPU model specific.
|
#
9461:67a6ba6604c8 |
|
12-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Changes to decoder, corrects 9376 The changes made by the changeset 9376 were not quite correct. The patch made changes to the code which resulted in decoder not getting initialized correctly when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function sets the required state in the decoder. For other ISAs, the function is empty right now.
|
#
9448:569d1e8f74e4 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify the serialization code for all of the CPU models
Cleanup the serialization code for the simple CPUs and the O3 CPU. The CPU-specific code has been replaced with a (un)serializeThread that serializes the thread state / context of a specific thread. Assuming that the thread state class uses the CPU-specific thread state uses the base thread state serialization code, this allows us to restore a checkpoint with any of the CPU models.
|
#
9444:ab47fe7f03f0 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust.
Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after that the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty.
|
#
9436:4a0223da4924 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
o3 cpu: Remove unused variables
|
#
9433:34971d2e0019 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from initializing at startup, leaving it in the "switched out" mode. The name of this parameter (and the help string) is confusing. This patch renames it to switched_out, which should be more descriptive.
|
#
9429:7c787b8030c6 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure that CPU models consistently call the parent on switchOut() and takeOverFrom(). This has the following implications that might alter current functionality:
* The call to BaseCPU::switchout() in the O3 CPU is moved from signalDrained() (!) to switchOut().
* A call to BaseSimpleCPU::switchOut() is introduced in the simple CPUs.
|
#
9428:029dfe6324d3 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order to do serialization. This was a bit of a hack involving two static SimpleThread instances and a magic constructor that was only used by the O3 CPU.
This patch moves the ThreadContext serialization code into two global procedures that, in addition to the normal serialization parameters, take a ThreadContext reference as a parameter. This allows us to reuse the serialization code in all ThreadContext implementations.
|
#
9427:ddf45c1d54d4 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is called before initState() or unserialize(). This causes the pipeline to be initialized from an incorrect thread context. This doesn't currently lead to correctness problems as instructions fetched from the incorrect start PC will be squashed a few cycles after initialization.
This patch will affect the regressions since the O3 CPU now issues its first instruction fetch to the correct PC instead of 0x0.
|
#
9424:d631aac65246 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory system is in the correct mode at startup and when resuming after a drain. Previously, we only checked that the memory system was in the right mode when resuming. This is inadequate since this is a configuration error that should be detected at startup as well as when resuming. Additionally, since the check was done using an assert, it wasn't performed when NDEBUG was set (e.g., the fast target).
|
#
9384:877293183bdf |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
arch: Make the ISA class inherit from SimObject
The ISA class on stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU.
|
#
9382:1c97b57d5169 |
|
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: rename the misleading inSyscall to noSquashFromTC
isSyscall was originally created because during handling of a syscall in SE mode the threadcontext had to be updated. However, in many places this is used in FS mode (e.g. fault handlers) and the name doesn't make much sense. The boolean actually stops gem5 from squashing speculative and non-committed state when a write to a threadcontext happens, so re-name the variable to something more appropriate
|
#
9342:6fec8f26e56d |
|
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager.
|
#
9180:ee8d7a51651d |
|
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles.
In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes.
|
#
9179:666bc9df1e49 |
|
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future.
In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch.
Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well.
|
#
9158:d152d34a4adf |
|
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Make Tick unsigned and remove UTick
This patch makes the Tick unsigned and removes the UTick typedef. The ticks should never be negative, and there was only one major issue with removing it, caused by the o3 CPU using a -1 as an initial value.
The patch has no impact on any regressions.
|
#
9152:86c0e6ca5e7c |
|
15-Aug-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements.
This is an incremental fix as there are still a few bugs/mem leaks with the switchout code. Particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation.
|
#
8975:7f36d4436074 |
|
01-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses.
For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the apprioriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not alow snoop requests to be rejected or stalled. All these assumptions are now excplicitly part of the port interface itself.
|
#
8948:e95ee70f876c |
|
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separates snoop request/responses from normal memory request/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all use the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal request and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with request being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses.
The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions.
|
#
8921:e53972f72165 |
|
30-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Unify initMemProxies across CPUs and simulation modes
This patch unifies where initMemProxies is called, in the init() method of each BaseCPU subclass, before TheISA::initCPU is called. Moreover, it also ensures that initMemProxies is called in both full-system and syscall-emulation mode, thus unifying also across the modes. An additional check is added in the ThreadState to ensure that initMemProxies is only called once.
|
#
8887:20ea02da9c53 |
|
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.
|
#
8876:44f8e7bb7fdf |
|
02-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Check that the interrupt controller is created when needed
This patch adds a creation-time check to the CPU to ensure that the interrupt controller is created for the cases where it is needed, i.e. if the CPU is not being switched in later and not a checker CPU.
The patch also adds the "createInterruptController" call to a number of the regression scripts.
|
#
8863:50ce4deacda9 |
|
01-Mar-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Fix switching of CPUs This patch prevents creation of interrupt controller for cpus that will be switched in later
|
#
8850:ed91b534ed04 |
|
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Round-two unifying instr/data CPU ports across models
This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps in simplifying the code, make assumptions more explicit, andfurther ease future patches related to the CPU ports.
The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports.
|
#
8834:21e8d54ecf07 |
|
12-Feb-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: add separate stats for insts/ops both globally and per cpu model
|
#
8809:bb10807da889 |
|
01-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head, hopefully the last time for this batch.
|
#
8799:dac1e33e07b0 |
|
28-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repo.
|
#
8796:a2ae5c378d0a |
|
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repository again.
|
#
8795:0909f8ed7aa0 |
|
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with main repository.
|
#
8793:5f25086326ac |
|
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in the CPU directory.
|
#
8779:2a590c51adb1 |
|
01-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Expose the same methods on the CPUs in SE and FS modes.
|
#
8777:dd43f1c9fa0a |
|
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make the functions available from the TC consistent between SE and FS.
|
#
8766:b0773af78423 |
|
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build the base process class in FS.
|
#
8737:770ccf3af571 |
|
31-Jan-2012 |
Koan-Sin Tan <koansin.tan@gmail.com> |
clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places.
|
#
8733:64a7bf8fa56c |
|
31-Jan-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.
|
#
8707:489489c67fd9 |
|
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data ports from specialised subclasses to the base CPU (to the largest degree possible). Ultimately it servers to make the CPU(s) have a well-defined interface to the memory sub-system.
|
#
8706:b1838faf3bcc |
|
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy.
The following replacements are made: FunctionalPort > PortProxy TranslatingPort > SETranslatingPortProxy VirtualPort > FSTranslatingPortProxy
|
#
8627:86358c187837 |
|
01-Dec-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Add stat that counts how many cycles the O3 cpu was quiesced.
|
#
8607:5fb918115c07 |
|
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions.
|
#
8491:606cf2660887 |
|
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of the unused addToRemoveList function.
|
#
8460:3893d9d2c6c2 |
|
10-Jul-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Make sure fetch doesn't go off into the weeds during speculation.
|
#
8232:b28d06a175be |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
|
#
8229:78bf55f23338 |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: sort all includes
|
#
8138:f08692f2932e |
|
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Send instruction back to fetch on squash to seed predecoder correctly.
|
#
7897:d9e8b1fd1a9f |
|
07-Feb-2011 |
Joel Hestness <hestness@cs.utexas.edu> |
mcpat: Adds McPAT performance counters
Updated patches from Rick Strong's set that modify performance counters for McPAT
|
#
7823:dac01f14f20f |
|
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Replace curTick global variable with accessor functions. This step makes it easy to replace the accessor functions (which still access a global variable) with ones that access per-thread curTick values.
|
#
7720:65d338a8dba4 |
|
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
|
#
7684:ce48527a3edb |
|
20-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Fix O3 and possible InOrder segfaults in FS.
|
#
7678:f19b6a3a8cec |
|
13-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.
Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense.
|
#
7507:b1ac6773e83d |
|
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: O3's tick event gets squashed when it is switched out. When repeatedly switching between O3 and another CPU, O3's tick event might still be scheduled in the event queue (as squashed). Therefore, check for a squashed tick event as well as a non-scheduled event when taking over from another CPU and deal with it accordingly.
|
#
6711:c79d72abdbe5 |
|
04-Nov-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: get rid of unused physmem pointer
|
#
6658:f4de76601762 |
|
23-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh
|
#
6331:d947798df4a1 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of the unused get(Data|Inst)Asid and (inst|data)Asid functions.
|
#
6314:781969fbeca9 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Get rid of the float register width parameter.
|
#
6313:95f69a436c82 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU.
|
#
6221:58a3c04e6344 |
|
26-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: add a type for thread IDs and try to use it everywhere
|
#
6034:fc2e234b4404 |
|
17-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3, inorder: fix FS bug due to initializing ThreadState to Halted. For some reason o3 FS init() only called initCPU if the thread state was Suspended, which was no longer the case. There's no apparent reason to check, so I whacked the test completely rather than changing the check to Halted. The inorder init() was also updated to be symmetric, though the previous code was just a fancy no-op.
|
#
6032:e5c792a67b3d |
|
16-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: fix {read,set}ArchFloatReg* functions. Register indices were not being calculated properly.
|
#
6031:be16ad28822f |
|
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
ThreadState: initialize status to Halted in constructor. This provides a common initial status for all threads independent of CPU model (unlike the prior situation where CPUs initialized threads to inconsistent states). This mostly matters for SE mode; in FS mode, ISA-specific startupCPU() methods generally handle boot-time initialization of thread contexts (since the right thing to do is ISA-dependent).
|
#
5958:2d9737bf3c2f |
|
27-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Processes: Make getting and setting system call arguments part of a process object.
|
#
5807:57f9f8b8e62f |
|
24-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
cpu: provide a wakeup mechanism that can be used to pull CPUs out of sleep. Make interrupts use the new wakeup method, and pull all of the interrupt stuff into the cpu base class so that only the wakeup code needs to be updated. I tried to make wakeup, wakeCPU, and the various other mechanisms for waking and sleeping a little more sane, but I couldn't understand why the statistics were changing the way they were. Maybe we'll try again some day.
|
#
5804:34fe9bbc6705 |
|
21-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
o3cpu: give a name to the activity recorder for better tracing
|
#
5737:f43dbc09fad3 |
|
10-Nov-2008 |
Clint Smullen <cws3k@cs.virginia.edu> |
O3CPU: Make the instcount debugging stuff per-cpu. This is to prevent the assertion from firing if you have a large multicore. Also make sure that it's not compiled in when NDEBUG is defined
|
#
5714:76abee886def |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate.
|
#
5712:199d31b47f7b |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered across the subclasses. generally make it so that member data is _cpuId and accessor functions are cpuId(). The ID val comes from the python (default -1 if none provided), and if it is -1, the index of cpuList will be given. this has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch.
|
#
5707:da86e00f87a0 |
|
23-Oct-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
s/cpu_id/cpuId in o3 (to be consistent and match style), also fix some typos in comments.
|
#
5704:98224505352a |
|
21-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
style: Use the correct m5 style for things relating to interrupts.
|
#
5702:bf84e2fa05f7 |
|
20-Oct-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
O3CPU: Undo Gabe's changes to remove hwrei and simpalcheck from O3 CPU. Removing hwrei causes the instruction after the hwrei to be fetched before the ITB/DTB_CM register is updated in a call pal call sys and thus the translation fails because the user is attempting to access a super page address.
Minimally, it seems as though some sort of fetch stall or refetch after a hwrei is required. I think this works currently because the hwrei uses the exec context interface, and the o3 stalls when that occurs.
Additionally, these changes don't update the LOCK register and probably break ll/sc. Both o3 changes were removed since a great deal of manual patching would be required to only remove the hwrei change.
|
#
5647:b06b49498c79 |
|
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object.
|
#
5640:c811ced9efc1 |
|
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the simPalCheck funciton.
|
#
5639:67cc7f0427e7 |
|
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the hwrei function.
|
#
5606:6da7a58b0bc8 |
|
09-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: convert all usage of events to use the new API. For now, there is still a single global event queue, but this is necessary for making the steps towards a parallelized m5.
|
#
5595:6ebdae3f619b |
|
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 CPU object so it isn't split out by ISA.
|
#
5570:13592d41f290 |
|
28-Sep-2008 |
Nathan Binkert <nate@binkert.org> |
gcc: Add extra parens to quell warnings. Even though we're not incorrect about operator precedence, let's add some parens in some particularly confusing places to placate GCC 4.3 so that we don't have to turn the warning off. Agreed that this is a bit of a pain for those users who get the order of operations correct, but it is likely to prevent bugs in certain cases.
|
#
5529:9ae69b9cd7fd |
|
11-Aug-2008 |
Nathan Binkert <nate@binkert.org> |
params: Convert the CPU objects to use the auto generated param structs. A whole bunch of stuff has been converted to use the new params stuff, but the CPU wasn't one of them. While we're at it, make some things a bit more stylish. Most of the work was done by Gabe, I just cleaned stuff up a bit more at the end.
|
#
5497:89a6483d7047 |
|
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Make the cached virtPort have a thread context so it can do everything that a newly created one can.
|
#
5364:66d1251b7ae6 |
|
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Add comments in code to describe bug conditions. This should help if somebody gets to the bug fix before me (or someone else)...
|
#
5363:c474cb7a2b9c |
|
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix Load/Store Queue squashing after a SMT thread is removed but ensuring you are squashing from the current instruction # causing the thread exit.
|
#
5362:0adba9a562c9 |
|
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix offset in removeThread() function so that float registers start freeing up from the right point (#32 usually) instead of restarting at 0 and double-freeing.
Commented out assert line in free_list.hh that will check for when double-free condition goes bad.
|
#
5336:c7e21f4e5a2e |
|
06-Feb-2008 |
Stephen Hines <hines@cs.fsu.edu> |
Make the Event::description() a const function
|
#
5314:e902f12a3af1 |
|
02-Jan-2008 |
Steve Reinhardt <stever@gmail.com> |
Add functional PrintReq command for memory-system debugging.
|
#
5100:7a0180040755 |
|
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Rename cycles() function to ticks()
|
#
5099:8ff1345b3ae4 |
|
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Update statistics to use cycles properly instead of ticks
|
#
4997:e7380529bd2d |
|
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Address Translation: Make SE mode use an actual TLB/MMU for translation like FS.
|
#
4918:3214e3694fb2 |
|
27-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Merge python and x86 changes with cache branch
|
#
4873:b135f6e6adfe |
|
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Event descriptions should not end in "event" (they function as adjectives not nouns)
|
#
4762:c94e103c83ad |
|
24-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Major changes to how SimObjects are created and initialized. Almost all creation and initialization now happens in python. Parameter objects are generated and initialized by python. The .ini file is now solely for debugging purposes and is not used in construction of the objects in any way.
|
#
4656:dbfa364feec8 |
|
21-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-micro
src/cpu/o3/fetch_impl.hh: hand merge
|
#
4644:4e77ab0671e8 |
|
23-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/wexford/x/gblack/m5/newmem-o3-spec
|
#
4636:afc8da9f526e |
|
14-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add support for microcode and pull out the special branch delay slot handling. Branch delay slots need to be squash on a mispredict as well because the nnpc they saw was incorrect.
|
#
4632:be5b8f67b8fb |
|
13-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Remove most of the special handling for delay slots since they have to be squashed anyway on a mispredict. This is because the NNPC value they saw when executing was incorrect.
|
#
4598:56adf2e778a8 |
|
20-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Don't do checker stuff if the checker is not defined
|
#
4392:271b73b42e34 |
|
22-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Use proper cycles for IPC and CPI equations.
src/cpu/o3/cpu.cc: Use proper cycles for these equations.
|
#
4329:52057dbec096 |
|
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.
src/cpu/o3/alpha/cpu_impl.hh: Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions.
|
#
4284:c8800319ed0c |
|
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2
src/cpu/base_dyn_inst.hh: Hand merge. Line is no longer needed because it's handled in the ISA.
|
#
4192:7accc6365bb9 |
|
09-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts(). 2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc: Move common code of switching over peers to base CPU. src/cpu/base.hh: Move common code of switching over peers to BaseCPU. src/cpu/o3/cpu.cc: Add in function that updates thread context's ports. Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code. src/cpu/o3/cpu.hh: Include function to update thread context's memory ports. src/cpu/o3/lsq.hh: Add function to dcache port that will update the memory ports upon getting a new peer. Also include a function that will tell the CPU to update those memory ports. src/cpu/o3/lsq_impl.hh: Add function that will update the memory ports upon getting a new peer. src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Add function that will update thread context's memory ports upon getting a new peer. Also use the new BaseCPU's take over from function. src/cpu/simple/atomic.hh: Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer. src/cpu/simple/timing.hh: Add function that will update thread context's memory ports upon getting a new peer. src/mem/port.hh: Make setPeer virtual so that other classes can override it.
|
#
4167:ce5d0f62f13b |
|
06-Mar-2007 |
Nathan Binkert <binkertn@umich.edu> |
Move all of the parameters of the Root SimObject so they are directly configured by python. Move stuff from root.(cc|hh) to core.(cc|hh) since it really belogs there now. In the process, simplify how ticks are used in the python code.
|
#
4030:4046b2213995 |
|
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
A couple of minor fixes. 1. Set CPU ID in all modes for the O3 CPU. 2. Use nextCycle() function to prevent phase drift in O3 CPU. 3. Remove assertion in rename map that is no longer true.
src/cpu/o3/alpha/cpu_builder.cc: Allow for CPU id in all modes, not just full system. Also include a parameter that was left out by accident. src/cpu/o3/alpha/cpu_impl.hh: Set the CPU ID properly. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Use nextCycle() function so that the CPU does not get out of phase when starting up from quiesces. src/cpu/o3/rename_map.cc: Remove assertion that is no longer true. tests/configs/o3-timing.py: Set CPU's id to 0.
|
#
3970:d54945bab95d |
|
03-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
|
#
3965:b4cab77371ed |
|
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Implement a stub nnpc for alpha that is read only as npc+4.
|
#
3859:9278f759e55c |
|
21-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
<scold> Make sure that variables are always initalized! </scold>
|
#
3795:60ecc96c3cee |
|
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Made branch delay slots get squashed, and passed back an NPC and NNPC to start fetching from.
|
#
3781:b00795985f07 |
|
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of some typedefs, moved the tlbs to the base o3 cpu, and called the architecture defined setSyscallReturn function instead of a duplicate copy.
src/cpu/o3/alpha/cpu.hh: Got rid of some typedefs, and moved the tlbs to the base o3 cpu. src/cpu/o3/alpha/thread_context.hh: src/cpu/o3/cpu.cc: Moved the tlbs to the base o3 cpu.
|
#
3686:fa8d8b90cd8a |
|
29-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change the connecting of the physPort and virtPort to the memory object below the CPU to happen every time activateContext is called. The overhead is probably a little higher than necessary, but allows these connections to properly be made when there are CPUs that are inactive until they are switched in.
Right now this introduces a minor memory leak as old physPorts and virtPorts are not deleted when new ones are created. A flyspray task has been created for this issue. It can not be resolved until we determine how the bus will handle giving out ID's to functional ports that may be deleted.
src/cpu/o3/cpu.cc: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Change the setup of the physPort and virtPort to instead happen every time the CPU has a context activated. This is a little high overhead, but keeps it working correctly when the CPU does not have a physical memory attached to it until it switches in (like the case of switch CPUs). src/cpu/o3/thread_context.hh: Change function from being called at init() to just being called whenever the memory ports need to be connected. src/cpu/o3/thread_context_impl.hh: Update this to not delete the port if it's the same as the virtPort. src/cpu/thread_context.hh: Change function from being called at init() to whenever the memory ports need to be connected. src/cpu/thread_state.cc: Instead of initializing the ports, simply connect them, deleting any old ports that might exist. This allows these functions to be called multiple times. src/cpu/thread_state.hh: Ports are no longer initialized, but rather connected at context activation time.
|
#
3675:dc883b610345 |
|
19-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Update Virtual and Physical ports.
src/cpu/o3/alpha/cpu_impl.hh: Handle the PhysicalPort and VirtualPort in the ThreadState. src/cpu/o3/cpu.cc: Initialize the thread context. src/cpu/o3/thread_context.hh: Add new function to initialize thread context. src/cpu/o3/thread_context_impl.hh: Use code now put into function. src/cpu/simple_thread.cc: Move code to ThreadState and use the new helper function. src/cpu/simple_thread.hh: Remove init() in this derived class; use init() from ThreadState base class. src/cpu/thread_state.cc: Move setting up of Physical and Virtual ports here. Change getMemFuncPort() to connectToMemFunc(), which connects a port to a functional port of the memory object below the CPU. src/cpu/thread_state.hh: Update functions.
|
#
3512:cefe7f965104 |
|
09-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Draining fixes.
src/cpu/o3/cpu.cc: Handle draining properly when CPU isn't actually being used. src/cpu/simple/atomic.cc: Be sure to set status properly when draining. src/mem/bus.cc: Fix for draining.
|
#
3402:db60546818d0 |
|
31-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a paramter for the MemObject.
configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more. src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter). src/cpu/checker/cpu.hh: Remove MemObject parameter. src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner. src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter. src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter.
|
#
3319:1ec49a9bfaa3 |
|
18-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
only do this assert after you know you're not switched out or idle.
|
#
3229:cfb4b2250d26 |
|
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Comment out code that messed up SMT (but will be needed eventually).
src/cpu/o3/cpu.cc: Comment out reseting CPU structures for now. This can be updated to work in the future.
|
#
3227:fe19356d6f88 |
|
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix caches plus sampling switch over.
src/cpu/o3/cpu.cc: Fix up caches plus sampling switch over.
|
#
3226:de4981baa276 |
|
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix outstanding bug (FS#158).
src/cpu/o3/cpu.cc: Extra debugging, fix a bug brought up on bug tracker.
|
#
3221:669a04468c0d |
|
08-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to O3 CPU. It should now work in FS mode, although sampling still has a bug.
src/cpu/o3/commit_impl.hh: Fixes for compile and sampling. src/cpu/o3/cpu.cc: Deallocate and activate threads properly. Also hopefully fix being able to use caches while switching over. src/cpu/o3/cpu.hh: Fixes for deallocating and activating threads. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: Handle getting back a BadAddress result from the access. src/cpu/o3/iew_impl.hh: More debug output. src/cpu/o3/lsq_unit_impl.hh: Fixup store conditional handling (still a bit of a hack, but works now).
Also handle getting back a BadAddress result from the access. src/cpu/o3/thread_context_impl.hh: Deallocate context now records if the context should be fully removed.
|
#
3126:756092c6383c |
|
02-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to fix merge issues and bring almost everything up to working speed. Ozone CPU remains untested, but everything else compiles and runs.
src/arch/alpha/isa_traits.hh: This got changed to the wrong version by accident. src/cpu/base.cc: Fix up progress event to not schedule itself if the interval is set to 0. src/cpu/base.hh: Fix up the CPU Progress Event to not print itself if it's set to 0. Also remove stats_reset_inst (something I added to m5 but isn't necessary here). src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: Remove float variable of instResult; it's always held within the double part now. src/cpu/checker/cpu_impl.hh: Use thread and not cpuXC. src/cpu/o3/alpha/cpu_builder.cc: src/cpu/o3/checker_builder.cc: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu_builder.cc: src/python/m5/objects/BaseCPU.py: Remove stats_reset_inst. src/cpu/o3/commit_impl.hh: src/cpu/ozone/lw_back_end_impl.hh: Get TC, not XCProxy. src/cpu/o3/cpu.cc: Switch out updates from the version of m5 I have. Also remove serialize code that got added twice. src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/thread_state.hh: Remove code that was added twice. src/cpu/o3/lsq_unit.hh: Add back in stats that got lost in the merge. src/cpu/o3/lsq_unit_impl.hh: Use proper method to get flags. Also wake CPU if we're coming back from a cache miss. src/cpu/o3/thread_context_impl.hh: src/cpu/o3/thread_state.hh: Support profiling. src/cpu/ozone/cpu.hh: Update to use proper typename. src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/dyn_inst_impl.hh: Updates for newmem. src/cpu/ozone/lw_lsq_impl.hh: Get flags correctly. src/cpu/ozone/thread_state.hh: Reorder constructor initialization, use tc. src/sim/pseudo_inst.cc: Allow for loading of symbol file. Be sure to use ThreadContext and not ExecContext.
|
#
3125:febd811bccc6 |
|
30-Sep-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:./local/clean/o3-merge/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
configs/boot/micro_memlat.rcS: configs/boot/micro_tlblat.rcS: src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa_traits.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/params.hh: src/cpu/o3/checker_builder.cc: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/base.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.hh: src/dev/ide_disk.cc: src/python/m5/objects/O3CPU.py: src/python/m5/objects/Root.py: src/python/m5/objects/System.py: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/system.hh: util/m5/m5.c: Hand merge.
|
#
3093:b09c33e66bce |
|
31-Aug-2006 |
Korey Sewell <ksewell@umich.edu> |
add ISA_HAS_DELAY_SLOT directive instead of "#if THE_ISA == ALPHA_ISA" throughout CPU models
src/arch/alpha/isa_traits.hh: src/arch/mips/isa_traits.hh: src/arch/sparc/isa_traits.hh: define 'ISA_HAS_DELAY_SLOT' src/cpu/base_dyn_inst.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/simple/base.cc: use ISA_HAS_DELAY_SLOT instead of THE_ISA == ALPHA_ISA
|
#
2935:d1223a6c9156 |
|
23-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
This changeset gets the MIPS ISA pretty much working in the O3CPU. It builds, runs, and gets very very close to completing the hello world succesfully but there are some minor quirks to iron out. Who would've known a DELAY SLOT introduces that much complexity?! arrgh!
Anyways, a lot of this stuff had to do with my project at MIPS and me needing to know how I was going to get this working for the MIPS ISA. So I figured I would try to touch it up and throw it in here (I hate to introduce non-completely working components... )
src/arch/alpha/isa/mem.isa: spacing src/arch/mips/faults.cc: src/arch/mips/faults.hh: Gabe really authored this src/arch/mips/isa/decoder.isa: add StoreConditional Flag to instruction src/arch/mips/isa/formats/basic.isa: Steven really did this file src/arch/mips/isa/formats/branch.isa: fix bug for uncond/cond control src/arch/mips/isa/formats/mem.isa: Adjust O3CPU memory access to use new memory model interface. src/arch/mips/isa/formats/util.isa: update LoadStoreBase template src/arch/mips/isa_traits.cc: update SERIALIZE partially src/arch/mips/process.cc: src/arch/mips/process.hh: no need for this for NOW. ASID/Virtual addressing handles it src/arch/mips/regfile/misc_regfile.hh: add in clear() function and comments for future usage of special misc. regs src/cpu/base_dyn_inst.hh: add in nextNPC variable and supporting functions.
add isCondDelaySlot function
Update predTaken and mispredicted functions src/cpu/base_dyn_inst_impl.hh: init nextNPC src/cpu/o3/SConscript: add MIPS files to compile src/cpu/o3/alpha/thread_context.hh: no need for my name on this file src/cpu/o3/bpred_unit_impl.hh: Update RAS appropriately for MIPS src/cpu/o3/comm.hh: add some extra communication variables to aid in handling the delay slots src/cpu/o3/commit.hh: minor name fix for nextNPC functions. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: Fix necessary variables and functions for squashes with delay slots src/cpu/o3/cpu.cc: Update function interface ...
adjust removeInstsNotInROB function to recognize delay slots insts src/cpu/o3/cpu.hh: update removeInstsNotInROB src/cpu/o3/decode.hh: declare necessary variables for handling delay slot src/cpu/o3/dyn_inst.hh: Add in MipsDynInst src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/rename.hh: declare necessary variables and adjust functions for handling delay slot src/cpu/o3/inst_queue.hh: src/cpu/simple/base.cc: no need for my name here src/cpu/o3/isa_specific.hh: add in MIPS files src/cpu/o3/scoreboard.hh: dont include alpha specific isa traits! src/cpu/o3/thread_context.hh: no need for my name here, i just rearranged where the file goes src/cpu/static_inst.hh: add isCondDelaySlot function src/cpu/o3/mips/cpu.cc: src/cpu/o3/mips/cpu.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/mips/dyn_inst.cc: src/cpu/o3/mips/dyn_inst.hh: src/cpu/o3/mips/dyn_inst_impl.hh: src/cpu/o3/mips/impl.hh: src/cpu/o3/mips/params.hh: src/cpu/o3/mips/thread_context.cc: src/cpu/o3/mips/thread_context.hh: MIPS file for O3CPU...mirrors ALPHA definition
|
#
2923:db8a876258df |
|
14-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
configs/test/fs.py: configs/test/test.py: SCCS merged
|
#
2918:20cdaf201249 |
|
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Serialization changes to make O3CPU consistent with the other models.
src/cpu/o3/commit_impl.hh: Always set instruction. This is necessary for serialization as the instruction is also serialized. src/cpu/o3/cpu.cc: Change serialization so it matches other CPU's output. Also fix up some indexing.
|
#
2911:854ee6cd377e |
|
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
forgot tid
|
#
2910:7eb6f817e267 |
|
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
For now, halt context is the same as deallocating. suspend context will now take the thread off the activeThread list.
src/arch/mips/isa_traits.cc: add in copy MiscRegs unimplemented function
|
#
2905:62879b0282eb |
|
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Update for changes to draining.
|
#
2886:2fdb9976b0a3 |
|
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
|
#
2880:a48d5059cd35 |
|
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3
|
#
2877:4b56debc25d1 |
|
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Minor fix for SMT Hello Worlds to finish correctly. Still, there is a problem with the LSQ and indexing out of range in the buffer. I havent nailed down the fix yet, but it's coming ...
src/cpu/o3/commit_impl.hh: add space to DPRINT src/cpu/o3/cpu.cc: add newline to DPRINT src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: Each thread needs it's own squashedSeqNum for the case where they are both squashing at the same time and they dont write over each other's squash number.
|
#
2876:a862ab9f93f8 |
|
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3
|
#
2875:9b6f6b75b187 |
|
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix so that O3CPU doesnt segfault on exit. Major thing was to not execute commit if there are no active threads in CPU.
src/cpu/o3/alpha/thread_context.hh: call deallocate instead of deallocateContext src/cpu/o3/commit_impl.hh: dont run commit stage if there are no instructions src/cpu/o3/cpu.cc: add deallocate event, deactivateThread function, and edit deallocateContext. src/cpu/o3/cpu.hh: add deallocate event and add optional delay to deallocateContext src/cpu/o3/thread_context.hh: optional delay for deallocate src/cpu/o3/thread_context_impl.hh: edit DPRINTFs to say Thread Context instead of Alpha TC src/cpu/thread_context.hh: optional delay src/sim/syscall_emul.hh: name stuff
|
#
2873:1377a68cd00e |
|
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Add parameters for backwards and forwards sizes for time buffers.
src/base/timebuf.hh: Add a function to return the size of the time buffer.
|
#
2871:7ed5c9ef3eb6 |
|
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support Ron's changes for hooking up ports.
src/cpu/checker/cpu.hh: Now that BaseCPU is a MemObject, the checker must define this function. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: Implement getPort function so the connector can connect the ports properly. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: The connector handles connecting the ports now. src/python/m5/objects/O3CPU.py: Add ports to the parameters.
|
#
2867:cc92d58a3210 |
|
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Switch out fixes for CPUs.
src/cpu/o3/cpu.cc: Fix up keeping proper state when switched out and drained. src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Keep track of the event we use to schedule fetch initially and upon resume. We may have to cancel the event if the CPU is switched out.
|
#
2864:eab7ff8f6d72 |
|
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support serializing and unserializing in the O3 CPU. Also a few small fixes for draining/switching CPUs.
src/cpu/o3/commit_impl.hh: Fix to clear drainPending variable on call to resume. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Support serializing and unserializing in the O3 CPU. src/cpu/o3/lsq_impl.hh: Be sure to say we have no stores to write back if the active thread list is empty. src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: Slightly change how SimpleThread is used to copy from other ThreadContexts.
|
#
2863:2592e056dc5c |
|
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix the O3CPU to support the multi-pass method for checking if the system has fully drained.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Return a value so that the CPU can instantly return from draining if the pipeline is already drained. src/cpu/o3/cpu.cc: Use values returned from pipeline stages so that the CPU can instantly return from draining if the pipeline is already drained.
|
#
2852:7fc1b748dd81 |
|
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
|
#
2849:c285bf8ffb4a |
|
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3
|
#
2847:6b19f07d9666 |
|
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
more steps toward O3 SMT
src/arch/mips/isa/formats/fp.isa: Adjust for newmem src/cpu/cpu_models.py: Use O3DynInst instead of convoluted way src/cpu/o3/alpha/impl.hh: take out O3DynInst typedef here ... src/cpu/o3/cpu.cc: open up the SMT functions in the O3CPU src/cpu/static_inst.hh: Add O3DynInst src/cpu/o3/dyn_inst.hh: Use to get ISA-specific O3DynInst
|
#
2843:19c4c6c2b5b1 |
|
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support for draining, and the new method of switching out. Now switching out happens after the pipeline has been drained, deferring the three way handshake to the normal drain mechanism. The calls of switchOut() and takeOverFrom() both take action immediately.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Support for draining, new method of switching out.
|
#
2840:227f7c4f8c81 |
|
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove sampler and serializer. Now they are handled through C++ interacting with Python.
src/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/sim/pseudo_inst.cc: Remove sampler. src/sim/sim_object.cc: Remove serializer.
|
#
2831:0a42b294727c |
|
02-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix default SMT configuration in O3CPU (i.e. fetch policy, workloads/numThreads)
Edit Test3 for newmem
src/base/traceflags.py: Add O3CPU flag src/cpu/base.cc: for some reason adding a BaseCPU flag doesnt work so just go back to old way... src/cpu/o3/alpha/cpu_builder.cc: Determine number threads by workload size instead of solely by parameter.
Default SMT fetch policy to RoundRobin if it's not specified in Config file src/cpu/o3/commit.hh: only use nextNPC for !ALPHA src/cpu/o3/commit_impl.hh: add FetchTrapPending as condition for commit src/cpu/o3/cpu.cc: panic if active threads is more than Impl::MaxThreads src/cpu/o3/fetch.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: name stuff src/cpu/o3/fetch_impl.hh: fatal if try to use SMT branch count, that's unimplemented right now src/python/m5/config.py: make it clearer that a parameter is not valid within a configuration class
|
#
2829:f354c00bba05 |
|
01-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
fix cpu builder to build the correct name...
add activateThread event and functions
src/cpu/o3/alpha/cpu_builder.cc: Have CPU builder build a DerivO3CPU not a DerivAlphaO3CPU src/cpu/o3/cpu.cc: add activateThread Event
add activateThread function
adjust activateContext to schedule a thread to activate within the CPU instead of activating thread right away. This will lead to stages trying to use threads that arent ready yet and wasting execution time & possibly performance. src/cpu/o3/cpu.hh: add activateThread Event
add activateThread function
add schedule/descheculed activate thread event
|
#
2818:a2b6429690b6 |
|
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
now O3CPU is totally independent of the ISA... all alpha specific stuff is the cpu/o3/alpha directory
src/cpu/o3/alpha/cpu.cc: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/impl.hh: filenames src/cpu/o3/alpha/thread_context.hh: public src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use O3CPUImpl ... not Alpha src/cpu/o3/checker_builder.cc: filename
|
#
2817:273f7fb94f83 |
|
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Make O3CPU model independent of the ISA
Use O3CPU when building instead of AlphaO3CPU.
I could use some better python magic in the cpu_models.py file!
AUTHORS: add middle initial SConstruct: change from AlphaO3CPU to O3CPU src/cpu/SConscript: edits to build O3CPU instead of AlphaO3CPU src/cpu/cpu_models.py: change substitution template to use proper CPU EXEC CONTEXT For O3CPU Model...
Actually, some Python expertise could be used here. The 'env' variable is not passed to this file, so I had to parse through the ARGV to find the ISA... src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use isa_specific.hh src/sim/process.cc: only initi NextNPC if not ALPHA src/cpu/o3/alpha/cpu.cc: alphao3cpu impl src/cpu/o3/alpha/cpu.hh: move AlphaTC to it's own file src/cpu/o3/alpha/cpu_impl.hh: Move AlphaTC to it's own file ... src/cpu/o3/alpha/dyn_inst.cc: src/cpu/o3/alpha/dyn_inst.hh: src/cpu/o3/alpha/dyn_inst_impl.hh: include paths src/cpu/o3/alpha/impl.hh: include paths, set default MaxThreads to 2 instead of 4 src/cpu/o3/alpha/params.hh: set Alpha Specific Params here src/python/m5/objects/O3CPU.py: add O3CPU class src/cpu/o3/SConscript: include isa-specific build files src/cpu/o3/alpha/thread_context.cc: NEW HOME of AlphaTC src/cpu/o3/alpha/thread_context.hh: new home of AlphaTC src/cpu/o3/isa_specific.hh: includes ISA specific files src/cpu/o3/params.hh: base o3 params src/cpu/o3/thread_context.hh: base o3 thread context src/cpu/o3/thread_context_impl.hh: base o3 thead context impl
|
#
2794:0dd6cb8820e1 |
|
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker related updates.
src/cpu/o3/cpu.cc: Updates to make sure the checker is compiled in if enabled and also to include it only when it's used.
|
#
2757:58e3a66e72f7 |
|
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
|
#
2756:7bf0d6481df9 |
|
15-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Initial changes to allowed DetailedCPU to work with other architectures (i.e. Sparc & MIPS)
Still need to add some code to fetch & commit stages
src/cpu/o3/commit.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Add nextNPC read & set functions src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: Add nextNPC
|
#
2733:e0eac8fc5774 |
|
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.
First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.
Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.
SConstruct: Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled. configs/test/test.py: Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick. src/base/traceflags.py: Add in O3CPU trace flag. src/cpu/SConscript: Rename AlphaFullCPU to AlphaO3CPU.
Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs are also being included. src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU. src/cpu/cpu_models.py: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model. src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/thread_state.hh: src/python/m5/objects/AlphaO3CPU.py: Rename FullCPU to O3CPU to differentiate from old FullCPU model. src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Rename FullCPU to O3CPU to differentiate from old FullCPU model. Also #ifdef the checker code so it doesn't need to be included if it's not selected.
|
#
2690:f4337c0d9e6f |
|
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Get O3 CPU mostly working in full system, and fix an FP bug that showed up.
It still does not yet handle retries.
src/cpu/base_dyn_inst.hh: Get working in full-system mode and fix some FP bugs. src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/thread_state.hh: src/cpu/thread_state.hh: Get working in full system. src/cpu/checker/o3_cpu_builder.cc: Checker does not take a MemObject as a simobj parameter. src/cpu/o3/alpha_dyn_inst.hh: Fix up float regs. src/cpu/o3/regfile.hh: Fix up an fp error, print out more useful output messages.
|
#
2683:d6b72bb2ed97 |
|
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Reorganization/renaming of CPUExecContext. Now it is called SimpleThread in order to clear up the confusion due to the many ExecContexts. It also derives from a common ThreadState object, which holds various state common to threads across CPU models.
Following with the previous check-in, ExecContext now refers only to the interface provided to the ISA in order to access CPU state. ThreadContext refers to the interface provided to all objects outside the CPU in order to access thread state. SimpleThread provides all thread state and the interface to access it, and is suitable for simple execution models such as the SimpleCPU.
src/SConscript: Include thread state file. src/arch/alpha/ev5.cc: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/cpu.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: Rename CPUExecContext to SimpleThread. src/cpu/base_dyn_inst.hh: Make thread member variables protected.. src/cpu/o3/alpha_cpu.hh: src/cpu/o3/cpu.hh: Make various members of ThreadState protected. src/cpu/o3/alpha_cpu_impl.hh: Push generation of TranslatingPort into the CPU itself. Make various members of ThreadState protected. src/cpu/o3/thread_state.hh: Pull a lot of common code into the base ThreadState class. src/cpu/ozone/thread_state.hh: Rename CPUExecContext to SimpleThread, move a lot of common code into base ThreadState class. src/cpu/thread_state.hh: Push a lot of common code into base ThreadState class. This goes along with renaming CPUExecContext to SimpleThread, and making it derive from ThreadState. src/cpu/simple_thread.cc: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. This helps push a lot of common code/state into a single class that can be used by all CPUs. src/cpu/simple_thread.hh: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. src/kern/system_events.cc: Rename cpu_exec_context to thread_context. src/sim/process.hh: Remove unused forward declaration.
|
#
2680:246e7104f744 |
|
06-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Change ExecContext to ThreadContext. This is being renamed to differentiate between the interface used objects outside of the CPU, and the interface used by the ISA. ThreadContext is used by objects outside of the CPU and is specifically defined in thread_context.hh. ExecContext is more implicit, and is defined by files such as base_dyn_inst.hh or cpu/simple/base.hh.
Further renames/reorganization will be coming shortly; what is currently CPUExecContext (the old ExecContext from m5) will be renamed to SimpleThread or something similar.
src/arch/alpha/arguments.cc: src/arch/alpha/arguments.hh: src/arch/alpha/ev5.cc: src/arch/alpha/faults.cc: src/arch/alpha/faults.hh: src/arch/alpha/freebsd/system.cc: src/arch/alpha/freebsd/system.hh: src/arch/alpha/isa/branch.isa: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa/main.isa: src/arch/alpha/linux/process.cc: src/arch/alpha/linux/system.cc: src/arch/alpha/linux/system.hh: src/arch/alpha/linux/threadinfo.hh: src/arch/alpha/process.cc: src/arch/alpha/regfile.hh: src/arch/alpha/stacktrace.cc: src/arch/alpha/stacktrace.hh: src/arch/alpha/tlb.cc: src/arch/alpha/tlb.hh: src/arch/alpha/tru64/process.cc: src/arch/alpha/tru64/system.cc: src/arch/alpha/tru64/system.hh: src/arch/alpha/utility.hh: src/arch/alpha/vtophys.cc: src/arch/alpha/vtophys.hh: src/arch/mips/faults.cc: src/arch/mips/faults.hh: src/arch/mips/isa_traits.cc: src/arch/mips/isa_traits.hh: src/arch/mips/linux/process.cc: src/arch/mips/process.cc: src/arch/mips/regfile/float_regfile.hh: src/arch/mips/regfile/int_regfile.hh: src/arch/mips/regfile/misc_regfile.hh: src/arch/mips/regfile/regfile.hh: src/arch/mips/stacktrace.hh: src/arch/sparc/faults.cc: src/arch/sparc/faults.hh: src/arch/sparc/isa_traits.hh: src/arch/sparc/linux/process.cc: src/arch/sparc/linux/process.hh: src/arch/sparc/process.cc: src/arch/sparc/regfile.hh: src/arch/sparc/solaris/process.cc: src/arch/sparc/stacktrace.hh: src/arch/sparc/ua2005.cc: src/arch/sparc/utility.hh: src/arch/sparc/vtophys.cc: src/arch/sparc/vtophys.hh: src/base/remote_gdb.cc: src/base/remote_gdb.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/cpuevent.cc: src/cpu/cpuevent.hh: src/cpu/exetrace.hh: src/cpu/intr_control.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/pc_event.cc: src/cpu/pc_event.hh: src/cpu/profile.cc: src/cpu/profile.hh: src/cpu/quiesce_event.cc: src/cpu/quiesce_event.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/static_inst.cc: src/cpu/static_inst.hh: src/cpu/thread_state.hh: src/dev/alpha_console.cc: src/dev/ns_gige.cc: src/dev/sinic.cc: src/dev/tsunami_cchip.cc: src/kern/kernel_stats.cc: src/kern/kernel_stats.hh: src/kern/linux/events.cc: src/kern/linux/events.hh: src/kern/system_events.cc: src/kern/system_events.hh: src/kern/tru64/dump_mbuf.cc: src/kern/tru64/tru64.hh: src/kern/tru64/tru64_events.cc: src/kern/tru64/tru64_events.hh: src/mem/vport.cc: src/mem/vport.hh: src/sim/faults.cc: src/sim/faults.hh: src/sim/process.cc: src/sim/process.hh: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/syscall_emul.cc: src/sim/syscall_emul.hh: src/sim/system.cc: src/cpu/thread_context.hh: src/sim/system.hh: src/sim/vptr.hh: Change ExecContext to ThreadContext.
|
#
2670:9107b8bd08cd |
|
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zizzer.eecs.umich.edu:/.automount/zamp/z/ktlim2/clean/newmem
|
#
2669:f2b336e89d2a |
|
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.
Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.
src/SConscript: Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future. src/base/traceflags.py: Include new traces flags, fix up merge mess up. src/cpu/SConscript: Include the base_dyn_inst.cc as one of othe sources. Don't compile the Ozone CPU for now. src/cpu/base.cc: Remove an extra } from the merge. src/cpu/base_dyn_inst.cc: Fixes to make compiling work. Don't instantiate the OzoneCPU for now. src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/btb.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/sat_counter.hh: src/cpu/op_class.hh: src/cpu/ozone/cpu.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/checker/o3_cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/mem/request.hh: src/cpu/o3/fu_pool.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/dyn_inst.cc: src/cpu/ozone/dyn_inst.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/ozone_impl.hh: src/cpu/ozone/thread_state.hh: Fixes to get compiling to work. src/cpu/o3/alpha_cpu.hh: Fixes to get compiling to work. Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs. src/cpu/o3/alpha_dyn_inst_impl.hh: Fixes to get compiling to work. Pass in the packet to the completeAcc function. Fix up syscall function.
|
#
2665:a124942bacb8 |
|
31-May-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Updated Authors from bk prs info
|
#
2654:9559cfa91b9d |
|
30-May-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem
SConstruct: src/SConscript: src/arch/SConscript: src/arch/alpha/faults.cc: src/arch/alpha/tlb.cc: src/base/traceflags.py: src/cpu/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.cc: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/exec_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/regfile.hh: src/cpu/ozone/cpu.hh: src/cpu/simple/base.cc: src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/2bit_local_pred.hh: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_dyn_inst.cc: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/bpred_unit.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/btb.cc: src/cpu/o3/btb.hh: src/cpu/o3/comm.hh: src/cpu/o3/commit.cc: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu_policy.hh: src/cpu/o3/decode.cc: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.cc: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.cc: src/cpu/o3/free_list.hh: src/cpu/o3/iew.cc: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.cc: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/mem_dep_unit.hh: src/cpu/o3/mem_dep_unit_impl.hh: src/cpu/o3/ras.cc: src/cpu/o3/ras.hh: src/cpu/o3/rename.cc: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rename_map.cc: src/cpu/o3/rename_map.hh: src/cpu/o3/rob.cc: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/sat_counter.cc: src/cpu/o3/sat_counter.hh: src/cpu/o3/store_set.cc: src/cpu/o3/store_set.hh: src/cpu/o3/tournament_pred.cc: src/cpu/o3/tournament_pred.hh: Hand merges.
|
#
2632:1bb2f91485ea |
|
22-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
New directory structure: - simulator source now in 'src' subdirectory - imported files from 'ext' repository - support building in arbitrary places, including outside of the source tree. See comment at top of SConstruct file for more details. Regression tests are temporarily disabled; that syetem needs more extensive revisions.
SConstruct: Update for new directory structure. Modify to support build trees that are not subdirectories of the source tree. See comment at top of file for more details. Regression tests are temporarily disabled. src/arch/SConscript: src/arch/isa_parser.py: src/python/SConscript: Update for new directory structure.
|