14298:e1c8c253ce95 |
13-Sep-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Fix checker cpu instantiation
This change uses the params as instantiated from the default constructor to create the checker cpu. If any of these parameters are invalid for the checker cpu, the simulation will exit with a warning.
Change-Id: I0e58ed096c9ea5f413f2e9b64d8d184d9b0fc84e Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21079 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14297:b4519e586f5e |
10-Sep-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu, mem: Changing AtomicOpFunctor* for unique_ptr<AtomicOpFunctor>
This change modifies the way we move the AtomicOpFunctor* through gem5 in order to maintain proper ownership of the object and ensure its destruction when it is no longer used.
Doing so also fixes a memory leak in Request.hh, where we were assigning a new AtomicOpFunctor* without destroying the previous one.
This change creates a new type AtomicOpFunctor_ptr as a std::unique_ptr<AtomicOpFunctor> and moves its ownership as needed, except for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
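Note on 14297: the ownership change described above can be illustrated with a minimal sketch. The classes below (`RequestSketch`, `AddByte`, the `liveFunctors` counter, and the stand-in `AtomicOpFunctor` base) are hypothetical and are not the real gem5 declarations; only the alias name `AtomicOpFunctor_ptr` comes from the commit message. The point shown is that assigning a new functor to a `std::unique_ptr` member destroys the previous one, which a raw pointer assignment would leak.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for gem5's AtomicOpFunctor; only the shape matters.
struct AtomicOpFunctor {
    virtual ~AtomicOpFunctor() = default;
    virtual void operator()(uint8_t *mem) = 0;
};

// Alias with the name given in the commit message.
using AtomicOpFunctor_ptr = std::unique_ptr<AtomicOpFunctor>;

// Number of functor objects currently alive, so destruction can be observed.
int liveFunctors = 0;

// Illustrative functor (an assumption): adds a constant to a single byte.
struct AddByte : AtomicOpFunctor {
    uint8_t operand;
    explicit AddByte(uint8_t v) : operand(v) { ++liveFunctors; }
    ~AddByte() override { --liveFunctors; }
    void operator()(uint8_t *mem) override {
        assert(mem != nullptr);
        *mem = static_cast<uint8_t>(*mem + operand);
    }
};

// Simplified request object (not the real Request class).
class RequestSketch {
    AtomicOpFunctor_ptr amoOp;  // owning pointer
  public:
    // Takes ownership; any previously held functor is destroyed here.
    void setAtomicOpFunctor(AtomicOpFunctor_ptr op) { amoOp = std::move(op); }
    // Non-owning access for the one place that only needs to call it.
    AtomicOpFunctor *getAtomicOpFunctor() const { return amoOp.get(); }
};
```

With a raw `AtomicOpFunctor*` member, the second call to the setter would overwrite the pointer and leave the first object alive; with the owning alias it is released automatically.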
14198:9c2f67392409 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Make get(Data|Inst)Port return a Port and not a MasterPort.
No caller uses any of the MasterPort specific properties of these functions' return values, so we can instead return a reference to the base Port class. This makes it possible for the data and inst ports to be of any port type, not just gem5 style MasterPorts. This makes life simpler for, e.g., systemc based CPUs which might have TLM ports.
It also makes it possible for any two CPUs which have compatible ports to be switched between, as long as the ports they use support being unbound. Unfortunately that does not include TLM or systemc ports which are bound permanently.
Change-Id: I98fce5a16d2ef1af051238e929dd96d57a4ac838 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20240 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
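Note on 14198: a minimal sketch of the idea, assuming hypothetical class names (`PortSketch`, `MasterPortSketch`, `TlmPortSketch`, `CpuSketch` and the two derived CPUs are illustrative, not gem5 types). Because the accessor returns a reference to the base port type, a CPU can back it with any concrete port and callers that only use the base interface are unaffected.

```cpp
#include <string>

// Hypothetical base port interface.
struct PortSketch {
    virtual ~PortSketch() = default;
    virtual std::string kind() const = 0;
};

// Stand-in for a gem5-style master port.
struct MasterPortSketch : PortSketch {
    std::string kind() const override { return "master"; }
};

// Stand-in for a TLM-backed port.
struct TlmPortSketch : PortSketch {
    std::string kind() const override { return "tlm"; }
};

// The accessor returns the base type, so each CPU picks its own port class.
struct CpuSketch {
    virtual ~CpuSketch() = default;
    virtual PortSketch &getDataPort() = 0;
};

struct Gem5StyleCpu : CpuSketch {
    MasterPortSketch dataPort;
    PortSketch &getDataPort() override { return dataPort; }
};

struct SystemcStyleCpu : CpuSketch {
    TlmPortSketch dataPort;
    PortSketch &getDataPort() override { return dataPort; }
};

// A caller that relies only on the base interface works with both CPUs.
std::string describeDataPort(CpuSketch &cpu)
{
    return cpu.getDataPort().kind();
}
```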
14195:c5efdb3319aa |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move the instruction port into o3's fetch stage.
That's where it's used, and that avoids having to pass it around using the top level getInstPort accessor.
Change-Id: I489a3f3239b3116292f3dcd78a3945fb468c6311 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20239 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
14194:967b9c450b04 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move O3's data port into the LSQ.
That's where it's used, and putting it there avoids having to pass around the port using the top level getDataPort function.
Change-Id: I0dea25d0c5f4bb3f58a6574a8f2b2d242784caf2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20238 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
14136:67b0ce25b683 |
05-Aug-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu-o3: fix atomic instructions non-speculative
Fix problem with O3 and AMO instructions. At initial stages an AMO instruction is considered a type of non-speculative store. After the instruction has been committed and during the squash step, the acquire_release version of the AMO operation is considered speculative; that difference results in an assertion failure. This fix ensures that AMO instructions are always considered non-speculative, both during early stages and during squash/removal of the instruction.
Change-Id: Ia0c5fbb9dc44a9991337b57eb759b1ed08e4149e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19815 Maintainer: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14135:a0affe46d00a |
26-Jul-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu-o3: added _amo_op parameter in o3 LSQ
Fix bug with AMO (or RMW) instructions where the amo_op variable is not being propagated to the LSQ request.
Change-Id: I60c59641d9b497051376f638e27f3c4cc361f615 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19814 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> |
14111:14c05f862590 |
15-Nov-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Fix too strict assert condition in writeback()
The assert() in the LSQ writeback() only allowed ReExec faults. However, a SplitRequest which completed the translation in PartialFault state (i.e. any but the very first cacheline translation failed) may end up here. The assert() condition is extended accordingly.
The patch also removes the superfluous/unused Complete/Squashed states from the LSQ request. (The completion of the request is recorded in the flags still.)
Change-Id: Ie575f4d3b4d5295585828ad8c7d3f4c7c1fe15d0 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19174 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
14105:969b4e972b07 |
27-Feb-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu: Add first-/non-faulting load support to Minor and O3
Some architectures allow masking faults of memory load instructions in some specific circumstances (e.g. first-faulting and non-faulting loads in Arm SVE). This patch adds support for such loads in the Minor and O3 CPU models.
Change-Id: I264a81a078f049127779aa834e89f0e693ba0bea Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14085:0075b0d29d55 |
28-Jun-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: isDrained renamed to isCpuDrained
CPU models inheriting from BaseCPU implement a draining checker called isDrained. This hides the base Drainable::isDrained method and might confuse the reader. This patch renames it to isCpuDrained in order to avoid any ambiguity.
Change-Id: Ie5221da6a4673432c2403996e42d451cae960bbf Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19468 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
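Note on 14085: the name-hiding problem can be reproduced in a few lines. This is a sketch with hypothetical types (`DrainableSketch`, `CpuBefore`, `CpuAfter` and their members are assumptions, not the gem5 classes); it only shows why two unrelated methods with the same name in a base and a derived class are easy to confuse.

```cpp
// Hypothetical base with its own notion of "drained".
struct DrainableSketch {
    virtual ~DrainableSketch() = default;
    bool frameworkDrained = false;
    bool isDrained() const { return frameworkDrained; }
};

// Before: the CPU-level check reuses the name and hides the base method.
struct CpuBefore : DrainableSketch {
    bool pipelineEmpty = false;
    bool isDrained() const { return pipelineEmpty; }  // hides the base one
};

// After: a distinct name keeps both queries visible and unambiguous.
struct CpuAfter : DrainableSketch {
    bool pipelineEmpty = false;
    bool isCpuDrained() const { return pipelineEmpty; }
};
```

Calling `isDrained()` on a `CpuBefore` object answers the pipeline question, while the same call through a `DrainableSketch` reference answers the framework question; with `CpuAfter` the two queries have different names.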
14083:057fe59ed45a |
12-Jul-2019 |
Pouya Fotouhi <Pouya.Fotouhi@amd.com> |
cpu-o3: Set packet data type for IPR read
This change sets the packet data type to static for IPR reads. The problem was caused by change (e13d6dc9c0d7a4ae0215f1ee6793eb32570c5169) and has been reported a few times on the mailing list.
Change-Id: I0f02c20a16824e220df876e9e552bbc1c9636f95 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19449 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14080:4472576445e7 |
15-Feb-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Reset fault status for mem access in pushRequest
Reset the fault status always before translation is initiated in pushRequest() in the LSQ. This avoids the problem when a strictly ordered load needs to be re-executed multiple times. If the translation is delayed at one of those attempts then the internal panicFault (from the previous execution attempt) can get fired at commit.
Change-Id: I0c22b2f7afd6e2cb00bc359a4a01042efd2d01d2 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19388 Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14030:a58e14bf581c |
08-Feb-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Increase LSQ buffer sizes to match max vector length
Change-Id: I5890c7cfa147125ce3389001f85d56d4b5a9911d Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13525 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
14025:3a133070aa2e |
26-Feb-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu-o3: Add support for pinned writes
This patch adds support for pinning registers for a certain number of consecutive writes. This is only relevant for timing CPU models (functional-only models are unaffected), and it is primarily needed to provide a realistic execution model for micro-coded operations whose microops can write to non-overlapping portions of a destination register, e.g. vector gather loads. In those cases, this mechanism can disable renaming for a sequence of consecutive writes, thus making the resulting execution more efficient: allocating a new physical register for each microop would introduce a read-modify-write chain of dependencies, while with these modifications the microops can write back in parallel.
Please note that this new feature is only leveraged by O3CPU for the time being.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I07eb5fdbd1fa0b748c9bdc1174d9f330fda34f81 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13520 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
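Note on 14025: a loose sketch of the pinning idea, not the O3 rename implementation. `RenameSketch`, `renameDst` and the `pinnedWrites` parameter are hypothetical names, and the exact counting semantics are an assumption. It shows a destination register that normally gets a fresh physical register on every write, but keeps the same one for a pinned run of consecutive writes.

```cpp
#include <cassert>
#include <map>

class RenameSketch {
    std::map<int, int> archToPhys;  // current architectural -> physical map
    std::map<int, int> pinsLeft;    // remaining pinned writes per arch reg
    int nextPhys = 0;               // naive allocator: never frees registers
  public:
    // Renames a destination register. The first write of a pinned sequence
    // passes pinnedWrites = N; the following N-1 writes to the same
    // architectural register reuse the physical register (no renaming).
    int renameDst(int archReg, int pinnedWrites = 1)
    {
        assert(pinnedWrites >= 1);
        auto pin = pinsLeft.find(archReg);
        if (pin != pinsLeft.end() && pin->second > 0) {
            --pin->second;
            return archToPhys.at(archReg);
        }
        const int phys = nextPhys++;
        archToPhys[archReg] = phys;
        pinsLeft[archReg] = pinnedWrites - 1;
        return phys;
    }
};
```

In this model, microops that each write a disjoint part of the destination all target one physical register, so none of them has to read the result of the previous one.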
14024:abe47b13653d |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, sim: Merge getMemProxy and getVirtProxy.
These two functions were performing the same function but had two different names for historical reasons. This change merges them together, keeping the getVirtProxy name to be consistent with the getPhysProxy method used to get a non-translating proxy port.
Change-Id: Idd83c6b899f9343795075b030ccbc723a79e52a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18581 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14022:a7cdc33dab35 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
cpu, sim: Return PortProxy &s from all the proxy accessors.
This is a step towards merging the accessors for SE and FS modes.
Change-Id: I76818ab88b97097ac363e243be9cc1911b283090 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18579 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14016:265e8272c728 |
25-May-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
cpu: Added correct return type for ROB::countInsts
- return size_t (unsigned) according to the .size() return type
- fixed typo in doc (source of warning with some compilers)
Change-Id: I48ee2e317cf41011a6fcb5ca45aef67e75329bfa Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18948 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13981:577196ddd040 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, dev, mem, sim: Remove #if 0-ed out code.
This code will be preserved through version control, but otherwise creates clutter and will rot in place since it's never compiled.
Change-Id: Id265f6deac445116843956ea5cf1210d8127274e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18608 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13954:2f400a5f2627 |
07-Jul-2017 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu,mem: Add support for partial loads/stores and wide mem. accesses
This changeset adds support for partial (or masked) loads/stores, i.e. loads/stores that can disable accesses to individual bytes within the target address range. In addition, this changeset extends the code to crack memory accesses across most CPU models (TimingSimpleCPU still TBD), so that arbitrarily wide memory accesses are supported. These changes are required for supporting ISAs with wide vectors.
Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
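Note on 13954: a minimal sketch of a partial (masked) store, assuming a flat byte array as memory. The function name `maskedStore` and the `byteEnable` parameter are hypothetical and do not reflect the gem5 interfaces; the sketch only shows that bytes whose enable flag is false are left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = uint64_t;

// Writes data[i] to mem[addr + i] only where byteEnable[i] is true.
void maskedStore(std::vector<uint8_t> &mem, Addr addr,
                 const std::vector<uint8_t> &data,
                 const std::vector<bool> &byteEnable)
{
    assert(data.size() == byteEnable.size());
    assert(addr + data.size() <= mem.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (byteEnable[i])
            mem[addr + i] = data[i];
    }
}
```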
13953:43ae8a30ec1f |
23-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu: Add a memory access predicate
This changeset introduces a new predicate to guard memory accesses. The most immediate use for this is to allow proper handling of predicated-false vector contiguous loads and predicated-false micro-ops of vector gather loads (added in separate changesets).
Change-Id: Ice6894fe150faec2f2f7ab796a00c99ac843810a Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17991 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
13910:d5deee7b4279 |
28-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: alpha: Delete all occurrences of the simPalCheck function.
This is now handled within the ISA description.
Change-Id: Ie409bb46d102e59d4eb41408d9196fe235626d32 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18434 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13908:6ab98c626b06 |
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Remove hwrei from the generic interfaces.
This mechanism is specific to Alpha and doesn't belong sprinkled around the CPU's generic mechanisms.
Change-Id: I87904d1a08df2b03eb770205e2c4b94db25201a1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18432 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13905:5cf30883255c |
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Track kernel stats using the base ISA agnostic type.
Then cast to the ISA specific type when necessary. This removes (mostly) an ISA specific aspect to some of the interfaces. The ISA specific version of the kernel stats still needs to be constructed and stored in a few places which means that kernel_stats.hh still needs to be a switching arch header, for instance.
In the future, I'd like to make the kernel its own object like the Process objects in SE mode, and then it would be able to instantiate and maintain its own stats.
Change-Id: I8309d49019124f6bea1482aaea5b5b34e8c97433 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18429 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13900:d4bcfecd871e |
28-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Get rid of the (read|set)RegOtherThread methods.
These are implemented by MIPS internally now.
Change-Id: If7465e1666e51e1314968efb56a5a814e62ee2d1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18436 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13892:0182a0601f66 |
22-Apr-2019 |
Gabe Black <gabeblack@google.com> |
mem: Minimize the use of MemObject.
MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
13865:cca49fc49c57 |
13-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Eliminate the ProxyThreadContext class.
Replace it with direct inheritance from the ThreadContext class in the SimpleThread class which was the only place it was used.
Also take the opportunity to use some specialized types instead of ints, etc., add some consts, and fix some style issues.
Change-Id: I5d2cfa87b20dc43615e33e6755c9d016564e9c0e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18048 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13843:2d8dfe55d22a |
29-Mar-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: O3 switchFreeList checking VecElems instead of FloatRegs
Vector elements should be checked instead of floats since those are the ones mapped to the vector registers.
Change-Id: I36088ab90e63720d846fcf5b43360da105b6c736 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17850 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13831:4fba790d88be |
06-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Removed inconsistency in O3* debug msgs
Made the DEBUG message format consistent, to allow better parsing. Fixed the sn/tid parameter types. Removed some annoying newlines.
Change-Id: I4761c49fc12b874a7d8b46779475b606865cad4b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17248 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13830:b5d6aa6c0e99 |
25-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
arch-mips: added missing override specifier (o3)
Change-Id: Ic538825a2964fd62def672b933a83067a15bd12a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17648 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13818:f0126488ef9e |
26-Mar-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added a probe to notify the address of retired instructions
A probe is added to notify the address of each retired instruction.
Change-Id: Iefc1b09d74b3aa0aa5773b17ba637bf51f5a59c9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17632 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13762:36d5a1d9f5e6 |
01-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
cpu: Refactor of Physical Register implementation
The implementation of the PhyRegId class is shared between multiple cpu models. The o3/misc.hh should only be included in o3 models.
This patch removes the dependencies between different model implementations, making it possible to add new O3-like CPU models.
Change-Id: Ibb812517043befe75c48fab3ce9605a0d272870b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/16908 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13759:9941fca869a9 |
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch-arm,cpu: Add initial support for Arm SVE
This changeset adds initial support for the Arm Scalable Vector Extension (SVE) by implementing:
- support for most data-processing instructions (no loads/stores yet);
- basic system-level support.
Additional authors:
- Javier Setoain <javier.setoain@arm.com>
- Gabor Dozsa <gabor.dozsa@arm.com>
- Giacomo Travaglini <giacomo.travaglini@arm.com>
Thanks to Pau Cabre for his contribution of bugfixes.
Change-Id: I1808b5ff55b401777eeb9b99c9a1129e0d527709 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13515 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13734:a57152849a55 |
11-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Segmentation Fault during O3PipeView execution
During the O3PipeView execution, a potentially invalid iterator is used to update the instruction storeTick field.
If the store_idx iterator is the first() of the StoreQueue, the corresponding instruction is removed from the queue, leaving the iterator invalid and not usable in the TRACING_ON block.
This patch uses the store_inst variable to access (and update) the instruction tick, instead of the (potentially) invalid iterator.
Change-Id: I671052ef282b9048e5239da8629b89e8afa86bf0 Reviewed-on: https://gem5-review.googlesource.com/c/16322 Maintainer: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
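Note on 13734: the pattern behind this fix, sketched with a `std::list` standing in for the store queue. `InstSketch`, `InstPtr` and `completeStore` are hypothetical names; only `store_inst` and `storeTick` echo the commit message. The instruction pointer is copied out before the queue entry may be erased, so the tick update never touches a dead iterator.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>

struct InstSketch { uint64_t storeTick = 0; };
using InstPtr = std::shared_ptr<InstSketch>;

// Completes the store at `it`. If it is the head of the queue, the entry is
// removed, which invalidates `it`. The tick is recorded through store_inst.
void completeStore(std::list<InstPtr> &queue,
                   std::list<InstPtr>::iterator it, uint64_t now)
{
    assert(it != queue.end());
    InstPtr store_inst = *it;      // copy taken while the iterator is valid
    if (it == queue.begin())
        queue.erase(it);           // `it` must not be used after this point
    store_inst->storeTick = now;   // safe: does not go through `it`
}
```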
13710:5ba1d8066ef0 |
25-Jun-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Add cache read ports limit to LSQ
This change introduces cache read ports to limit the number of per-cycle loads. Previously only the number of per-cycle stores could be limited.
Change-Id: I39bbd984056c5a696725ee2db462a55b2079e2d4 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13517 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
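Note on 13710: a minimal sketch of a per-cycle port budget, assuming hypothetical names (`PortBudget`, `tryLoad`, `tryStore`, `newCycle`); it is not the LSQ code. Loads and stores each draw from their own counter, and both counters reset at the start of a cycle.

```cpp
#include <cassert>

class PortBudget {
    int readPorts;
    int storePorts;
    int usedReads = 0;
    int usedStores = 0;
  public:
    PortBudget(int reads, int stores) : readPorts(reads), storePorts(stores)
    {
        assert(reads >= 0 && stores >= 0);
    }

    // Called once at the start of every simulated cycle.
    void newCycle() { usedReads = 0; usedStores = 0; }

    // Returns true if a load may access the cache in this cycle.
    bool tryLoad()
    {
        if (usedReads >= readPorts)
            return false;
        ++usedReads;
        return true;
    }

    // Returns true if a store may access the cache in this cycle.
    bool tryStore()
    {
        if (usedStores >= storePorts)
            return false;
        ++usedStores;
        return true;
    }
};
```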
13693:85fa3a41014b |
14-Feb-2019 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu: Add ISA* getter in Thread interface
This patch adds an ISA* getter to the TC interface.
Change-Id: Ib8ddc5d8fdd44e782f50a2ad15878a6bcf931e58 Reviewed-on: https://gem5-review.googlesource.com/c/16462 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13688:5bb3bf2f2559 |
15-Feb-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix fast build broken due to unused variable
This fixes fast build for commit 25dc765889d948693995cfa622f001aa94b5364b (fast build is stripping out assertions)
Change-Id: I9536ad58a3d85990b16a1f8c2515f6bf5d3acf71 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/16463 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
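Note on 13688: the commit message does not say how the unused variable was removed, so the following is only a generic sketch of the failure mode and two common remedies. The function names are hypothetical. When assertions are compiled out (NDEBUG), a variable that is read only inside `assert()` becomes unused, which breaks a build that treats warnings as errors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Problematic shape (shown as a comment):
//
//     std::size_t count = insts.size();
//     assert(count > 0);      // the only use of `count`
//
// Remedy 1: evaluate the condition inside the assert itself.
int oldestInst(const std::vector<int> &insts)
{
    assert(!insts.empty());
    return insts.front();
}

// Remedy 2: keep the variable but mark it as intentionally unused.
int youngestInst(const std::vector<int> &insts)
{
    const std::size_t count = insts.size();
    (void)count;  // silences the warning when assert() expands to nothing
    assert(count > 0);
    return insts.back();
}
```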
13665:9c7fe3811b88 |
25-Jan-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Don't assume SimObjects live in the global namespace
The importer in Python 3 doesn't like the way we import SimObjects from the global namespace. Convert the existing SimObject declarations to import from m5.objects. As a side-effect, this makes these files consistent with configuration files.
Change-Id: I11153502b430822130722839e1fa767b82a027aa Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15981 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> |
13652:45d94ac03a27 |
22-Jan-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: support atomic memory request type with AtomicOpFunctor
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU, MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory system.
Atomic memory instruction is treated as a special store instruction in all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any conflicting store request(s) currently in the store buffer to retire before the AMO request is sent to the cache. AMO requests are not buffered in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it is delivered to the cache only after all previous stores are issued to the cache. Data forwarding between an outstanding AMO in the store buffer and a subsequent load is not allowed since the AMO request does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert enough memory fences as micro-ops around an atomic instruction to enforce a correct order of memory instructions with respect to its memory consistency model. Without extra memory fences, this implementation can allow AMOs and other memory instructions that do not conflict (i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within a cache line boundary since the cache for now is not able to execute an operation on two different cache lines in one single step. Therefore, ISAs like x86 that require multi-cache-line atomic instructions need to either use a pair of locking load and unlocking store or change the cache implementation to guarantee the atomicity of an atomic instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a Reviewed-on: https://gem5-review.googlesource.com/c/8188 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
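Note on 13652: a minimal sketch of two points from the description, assuming a flat byte array as the cache contents. `amoAdd` and `fitsInOneLine` are hypothetical helpers, not gem5 functions. The first performs the read-modify-write where the data lives and returns the old value (which is why the request itself holds no valid data beforehand); the second checks the single-cache-line assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if [addr, addr + size) lies within one cache line.
bool fitsInOneLine(uint64_t addr, std::size_t size, std::size_t lineSize)
{
    assert(lineSize > 0);
    return size > 0 && (addr / lineSize) == ((addr + size - 1) / lineSize);
}

// Atomic add executed at the memory: read old value, write old + operand,
// and hand the old value back to the requester.
template <typename T>
T amoAdd(std::vector<uint8_t> &mem, uint64_t addr, T operand)
{
    assert(addr + sizeof(T) <= mem.size());
    T old;
    std::memcpy(&old, &mem[addr], sizeof(T));
    const T result = static_cast<T>(old + operand);
    std::memcpy(&mem[addr], &result, sizeof(T));
    return old;
}
```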
13644:6180ee72e061 |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
sim,cpu: make exit_group halt all threads in a group
When a thread calls exit_group, in addition to halting the thread itself, it needs to halt all other threads in its group (i.e., threads sharing the same thread group ID). This patch enables threads to do that.
Change-Id: Ib2e158fb27cf98843f177a64a2d643b1bbc94d03 Reviewed-on: https://gem5-review.googlesource.com/c/9623 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
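Note on 13644: the behaviour described above, sketched over a plain vector of threads. `ThreadSketch` and `exitGroup` are hypothetical names and the real syscall emulation involves more state; the sketch only shows halting every thread that shares the caller's thread group ID.

```cpp
#include <cstddef>
#include <vector>

struct ThreadSketch {
    int tid;      // thread ID
    int tgid;     // thread group ID
    bool halted;
};

// Halts the caller and every other thread in its group.
// Returns the number of threads newly halted (0 if the caller is unknown).
std::size_t exitGroup(std::vector<ThreadSketch> &threads, int callerTid)
{
    bool found = false;
    int tgid = 0;
    for (const ThreadSketch &t : threads) {
        if (t.tid == callerTid) {
            tgid = t.tgid;
            found = true;
            break;
        }
    }
    if (!found)
        return 0;

    std::size_t haltedNow = 0;
    for (ThreadSketch &t : threads) {
        if (t.tgid == tgid && !t.halted) {
            t.halted = true;
            ++haltedNow;
        }
    }
    return haltedNow;
}
```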
13641:648f3106ebdf |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fixed how O3 CPU executes an exit system call
When a thread executed an exit syscall in SE mode, the thread context was removed immediately in the same cycle, which left inflight squash operations and trap event incomplete. The problem happened when a new thread was assigned to the CPU later. The new thread started with some incomplete transactions of the previous thread (e.g., squashing). This problem could cause incorrect execution flow for the new thread (i.e., pc was not reset properly at the exit point), deadlock (i.e., some stage-to-stage signals were not reset) and incorrect rename map between logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and defers removing thread context from a CPU until a trap event initiated by an exit syscall execution is processed. This patch also makes sure that the removal of a thread context happens after all inflight transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8 Reviewed-on: https://gem5-review.googlesource.com/c/8184 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13628:332f730a1855 |
04-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: added missing override specifier
Added missing specifier for various virtual functions.
Change-Id: I4783e92d78789a9ae182fad79aadceafb00b2458 Reviewed-on: https://gem5-review.googlesource.com/c/16103 Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13622:ba31c2a23eca |
21-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu, arch: Replace the CCReg type with RegVal.
Most architectures weren't using the CCReg type, and in x86 and arm it was already a uint64_t.
Change-Id: I0b3d5e690e6b31db6f2627f449c89bde0f6750a6 Reviewed-on: https://gem5-review.googlesource.com/c/14515 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13611:c8b7847b4171 |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Rename *FloatRegBits* to *FloatReg*.
Now that there's no plain FloatReg, there's no reason to distinguish FloatRegBits with a special suffix since it's the only way to read or write FP registers.
Change-Id: I3a60168c1d4302aed55223ea8e37b421f21efded Reviewed-on: https://gem5-review.googlesource.com/c/14460 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13610:5d5404ac6288 |
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch,cpu: Add vector predicate registers
Latest-gen. vector/SIMD extensions, including the Arm Scalable Vector Extension (SVE), introduce the notion of a predicate register file. This changeset adds this feature across architectures and CPU models.
Change-Id: Iebcadbad89c0a582ff8b1b70de353305db603946 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13715 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13601:f5c84915eb7f |
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu, arch, arch-arm: Wire unused VecElem code in the O3 model
VecElem code had been introduced in order to simulate change of renaming for vector registers. Most of the work is happening on the rename_map switchRenameMode. Change of renaming can happen after a squash in the pipeline. This patch is also changing the interface to the ISA part so that a PCState is used instead of ISA in order to check if rename mode has changed.
Change-Id: I8af795d771b958e0a0d459abfeceff5f16b4b5d4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15601 |
13600:f39b1083ac84 |
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: O3 rename using the flatIndex instead of index
This patch is replacing the RegId::index with RegId::flatIndex so that it provides a valid register number when used by a VecElem register.
Change-Id: I5b000abb9457cd325c2a3021e772a75ea33d8a4c Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15600 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
13598:39220222740c |
04-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix VecElemClass bugs in cpu models
This patch is:
* Adding a missing VecElemClass entry
* Fixing assertion in rename map which was checking the number of free vector registers rather than free vector element registers
* Fixing assertion in read/setVecElemOperand APIs.
* Using the right register index in SimpleThread
* Using VecElem instead of VecReg on O3 readArchVecElem
Change-Id: I265320dcbe35eb47075991301dfc99333c5190c4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15598 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13590:d7e018859709 |
13-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu-o3: O3 LSQ Generalisation
This patch makes a large modification to the LSQ in the O3 model. The main goal of the patch is to remove the 'an operation can be served with one or two memory requests' assumption that is present in the LSQ and in the instruction, with its req, reqLow, reqHigh triplet, and to generalise it to operations that can be addressed with one request and operations that require many requests, embodied in the SingleDataRequest and the SplitDataRequest.
This modification has been done mimicking the minor model to an extent, shifting the responsibilities of dealing with VtoP translation and tracking the status and resources from the DynInst to the LSQ via the LSQRequest. The LSQRequest models the information concerning the operation, handles the creation of fragments for translation and request as well as assembling/splitting the data accordingly.
With these modifications, the implementation of vector ISAs, particularly on the memory side, becomes richer, as the new model permits a dissociation of ISA characteristics, such as vector length, from the microarchitectural characteristics that govern how contiguous loads are executed, allowing exploration of different LSQ to DL1 bus widths to understand the tradeoffs in complexity and performance.
Part of the complexity introduced stems from the fact that gem5 keeps a large amount of metadata regarding, in particular, memory operations. When an instruction is squashed while some operation, such as a TLB lookup or cache access, is ongoing, the relevant structure later communicates to the LSQ that the operation is over and tries to access pieces of data that should have died when the instruction was squashed, leading to asserts, panics, or memory corruption. To ensure correct behaviour, LSQRequests rely on assessing who their owner is, and self-destruct if they detect that their owner is done with the request and there will be no subsequent action. For example, in the case of an instruction squashed while the TLB is doing a walk to serve the translation, when the translation is served by the TLB, the LSQRequest detects that the instruction was squashed and, as the translation is done, no one else expects to access its information; therefore, it self-destructs. Destroying the LSQRequest earlier would lead to wrong behaviour, as the TLB walk may access some of its fields.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13516 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
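The single-request versus many-requests distinction in the "O3 LSQ Generalisation" entry above can be sketched as follows. This is a minimal illustration, not the patch's code: `Fragment`, `splitAccess` and the choice to split at a fixed line size are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fragment of a memory access: one fragment maps to one request.
struct Fragment
{
    std::uint64_t addr;
    std::uint64_t size;
};

// Split the byte range [addr, addr + size) at multiples of lineBytes.
// An access that fits in one line yields a single fragment; a wider or
// misaligned access yields as many fragments as lines it touches.
std::vector<Fragment>
splitAccess(std::uint64_t addr, std::uint64_t size, std::uint64_t lineBytes)
{
    assert(lineBytes > 0);
    std::vector<Fragment> frags;
    while (size > 0) {
        // First address of the next line after addr.
        const std::uint64_t lineEnd = (addr / lineBytes + 1) * lineBytes;
        const std::uint64_t chunk = std::min(size, lineEnd - addr);
        frags.push_back({addr, chunk});
        addr += chunk;
        size -= chunk;
    }
    return frags;
}
```

With a 64-byte line, an 8-byte access at address 60 produces two fragments, while a 256-byte access at address 32 produces five, which is the kind of case the old req/reqLow/reqHigh triplet could not express.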
13582:989577bf6abc |
18-Oct-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Stop passing around misc registers by reference.
These values are all basic integers (specifically uint64_t now), and so passing them by const & is actually less efficient since there's an extra level of indirection and an extra value, and the same sized value (a 64 bit pointer vs. a 64 bit int) is being passed around.
Change-Id: Ie9956b8dc4c225068ab1afaba233ec2b42b76da3 Reviewed-on: https://gem5-review.googlesource.com/c/13626 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
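The signature change in the "Stop passing around misc registers by reference" entry above might look roughly like this. It is a sketch only: `RegVal` and `ToyMiscRegs` are hypothetical names for this example, not the interfaces the commit touched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alias: a misc register value is a plain 64-bit integer.
using RegVal = std::uint64_t;

// Hypothetical toy register file used only to contrast the two signatures.
class ToyMiscRegs
{
  public:
    // Before: the caller passes an address, and the callee dereferences it.
    void
    setByRef(std::size_t idx, const RegVal &val)
    {
        assert(idx < regs.size());
        regs[idx] = val;
    }

    // After: the 64-bit value itself is passed, with no indirection.
    void
    setByValue(std::size_t idx, RegVal val)
    {
        assert(idx < regs.size());
        regs[idx] = val;
    }

    RegVal
    read(std::size_t idx) const
    {
        assert(idx < regs.size());
        return regs[idx];
    }

  private:
    std::array<RegVal, 4> regs{};  // value-initialised to zero
};
```

Both setters behave identically from the caller's side; the difference is only in what is handed to the callee.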
13563:68c171235dc5 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtCommitPolicy a Param.ScopedEnum
The smtCommitPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: I3625f2c08a1ae0c3b0dce7a641c6ae1ce3fd79a5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15400 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
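The string-to-enum change described in the "Make the smtCommitPolicy a Param.ScopedEnum" entry above (and repeated for the other SMT policies below) can be pictured with the following sketch. The enumerator names, the accepted strings and `ToyParams` are assumptions for illustration; the entry only states that there are three values.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical policy values; the entry above does not name them.
enum class CommitPolicy { Aggressive, RoundRobin, OldestReady };

// Old style: configuration carries a free-form string and C++ parses it,
// so a misspelt value is only detected when this function runs.
CommitPolicy
parseCommitPolicy(const std::string &name)
{
    if (name == "aggressive")
        return CommitPolicy::Aggressive;
    if (name == "roundrobin")
        return CommitPolicy::RoundRobin;
    if (name == "oldestready")
        return CommitPolicy::OldestReady;
    throw std::invalid_argument("unknown commit policy: " + name);
}

// New style: the parameter struct already holds a typed enum value, so the
// C++ side needs no parser and cannot receive an unknown string.
struct ToyParams
{
    CommitPolicy smtCommitPolicy = CommitPolicy::RoundRobin;
};
```

The design point is that validation moves from a hand-written parser to the type system of the parameter itself.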
13562:8fe39a3fc056 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtROBPolicy a Param.ScopedEnum
The smtROBPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Ie104d055dbbc6e44997ae0c1470de714239be5a3 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15399 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13561:523608bb180c |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtIQPolicy a Param.ScopedEnum
The smtIQPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Ieecf0a19427dd250b0d5ae3d531ab46a37326ae5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15398 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13560:f8732494c155 |
24-Dec-2018 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtLSQPolicy a Param.ScopedEnum
The smtLSQPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: I82041b88bd914c5dc660058d9e3998e3114e7c35 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15397 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13559:e9983a972327 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtFetchPolicy a Param.ScopedEnum
The smtFetchPolicy is a parameter in the o3 cpu that can have 5 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Iafb4b4b27587541185ea912e5ed581bce09695f5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15396 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13557:fc33e6048b25 |
13-Oct-2018 |
Gabe Black <gabeblack@google.com> |
cpu: dev: sim: gpu-compute: Banish some ISA specific register types.
These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are some remaining types, specifically the vector registers and the CCReg. I'm less familiar with these new types of registers, and so will look at getting rid of them at some later time.
Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b Reviewed-on: https://gem5-review.googlesource.com/c/13624 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13546:6cd6d7b19498 |
12-Dec-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix usage of setArchVecElem
setArchVecElem should create a VecElemClass RegId, and not a VecRegClass. Initializing a VecRegClass with three arguments makes it panic
Change-Id: I6c398d67305bfe7bea12cb02edd4f4c3a202e69a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15655 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13501:ce73744918e7 |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Stop using unions to store FP registers.
These are now accessed only as integer values.
Change-Id: I21ae6537ebbcbaa02890384194ee1ce001c092bb Reviewed-on: https://gem5-review.googlesource.com/c/14458 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13500:6e0a2a7c6d8c |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, cpu: Remove float type accessors.
Use the binary accessors instead.
Change-Id: Iff1877e92c79df02b3d13635391a8c2f025776a2 Reviewed-on: https://gem5-review.googlesource.com/c/14457 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13492:3679580cd1e7 |
10-Dec-2018 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
cpu-o3: Fix bug in LSQUnit(uint32_t, uint32_t) ctor
Change 9af1214 added a new ctor to the LSQUnit, however there is a typo/bug because it sizes the SQEntries member variable to lqEntries + 1, as opposed to sqEntries + 1. This change corrects the issue by using sqEntries.
Change-Id: I19dfaa5c0e335bd7b84343a92034147d7c5d914e Reviewed-on: https://gem5-review.googlesource.com/c/15015 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
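The constructor fix in the "Fix bug in LSQUnit(uint32_t, uint32_t) ctor" entry above amounts to sizing each queue from its own entry count. A minimal sketch, assuming a hypothetical `ToyLSQUnit` with `std::vector` members rather than the real class:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the unit described above.
struct ToyLSQUnit
{
    std::vector<int> loadQueue;
    std::vector<int> storeQueue;

    // Each queue is sized from its own parameter, plus one as in the entry.
    // The bug described above corresponds to writing lqEntries + 1 for
    // storeQueue, which goes unnoticed whenever both counts are equal.
    ToyLSQUnit(std::uint32_t lqEntries, std::uint32_t sqEntries)
        : loadQueue(lqEntries + 1), storeQueue(sqEntries + 1)
    {}
};
```

Constructing the unit with different load and store counts is the simplest way to expose this class of typo.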
13472:7ceacede4f1e |
01-Mar-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Change raw pointers to STL Containers
This patch changes two members from being raw pointers to being STL containers. The reason, other than cleanliness and arguably OO best practices, is that containers have more introspection capabilities than naked pointers do, as the size is known.
Using STL containers adds little overhead and eases the automation of process during debugging (gdb).
Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13126 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13453:4a7a060ea26e |
10-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu,arch-arm: Initialise data members
A data member that is not initialised holds a bogus value that shows up when using some debug flags, which makes tracediff a bit more challenging to use.
In addition, while debugging with other techniques, it introduces the problem of understanding if the value of a field is 'intended' or just an effect of the lack of initialisation.
Change-Id: Ied88caa77479c6f1d5166d80d1a1a057503cb106 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13125 Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13449:2f7efa89c58b |
26-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, mem: Replace assert(0 or false with panic.
Neither assert(0) nor assert(false) give any hint as to why control getting to them is bad, and their more descriptive versions, assert(0 && "description") and assert(false && "description"), jury rig assert to add an error message when the utility function panic() already does that directly with better formatting options.
This change replaces that flavor of call to assert with panic, except in the actual code which processes the formatting that panic uses (to avoid infinitely recurring error handling), and in some *.sm files since I don't know what rules those have to follow and don't want to accidentally break them.
Change-Id: I8addfbfaf77eaed94ec8191f2ae4efb477cefdd0 Reviewed-on: https://gem5-review.googlesource.com/c/14636 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
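The replacement described in the "Replace assert(0 or false with panic" entry above can be illustrated with a stand-in helper. `toyPanic` is hypothetical and throws so that it can be exercised in a test; the simulator's own panic() is a different function with its own formatting support.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a panic-style helper: it never returns and
// carries a message explaining why reaching the call site is an error.
[[noreturn]] void
toyPanic(const std::string &msg)
{
    throw std::runtime_error(msg);
}

enum class ToyRegClass { Int, Float, Unknown };

const char *
regClassName(ToyRegClass c)
{
    switch (c) {
      case ToyRegClass::Int:
        return "int";
      case ToyRegClass::Float:
        return "float";
      default:
        // Previously this would have been a bare assert(0), which says
        // nothing about what went wrong.
        toyPanic("Unhandled register class " +
                 std::to_string(static_cast<int>(c)));
    }
}
```

Unlike `assert(0 && "description")`, the message here can include run-time values such as the offending enumerator.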
13429:a1e199fd8122 |
06-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the usage of const DynInstPtr
Summary: Usage of const DynInstPtr& when possible and introduction of move operators to RefCountingPtr.
In many places, scoped references to dynamic instructions do a copy of the DynInstPtr when a reference would do. This is detrimental to performance. On top of that, in case there is a need for reference tracking for debugging, the redundant copies make the process much more painful than it already is.
Also, from the theoretical point of view, a function/method that defines a convenience name to access an instruction should not be considered an owner of the data, i.e., doing a copy and not a reference is not justified.
On a related topic, C++11 introduces move semantics, and those are useful when, for example, there is a class modelling a HW structure that contains a list, and has a getHeadOfList function, to prevent doing a copy to an internal variable -> update pointer, remove from the list -> update pointer, return value making a copy to the assigned variable -> update pointer, destroy the returned value -> update pointer.
Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13105 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
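The cost argument in the "Fix the usage of const DynInstPtr" entry above can be made concrete with a reference-counted pointer. The sketch below uses `std::shared_ptr` as a stand-in for the simulator's own RefCountingPtr, and `ToyInst`, `countSeenByValue`, `countSeenByRef` and `takeHead` are hypothetical names.

```cpp
#include <memory>
#include <utility>

struct ToyInst
{
    int seqNum;
};

// std::shared_ptr stands in for a reference-counting instruction pointer.
using ToyInstPtr = std::shared_ptr<ToyInst>;

// Pass by value: a copy is made, so the count is bumped for the duration
// of the call and dropped again on return.
long
countSeenByValue(ToyInstPtr inst)
{
    return inst.use_count();
}

// Pass by const reference: the callee only borrows the pointer and the
// count is left untouched.
long
countSeenByRef(const ToyInstPtr &inst)
{
    return inst.use_count();
}

// Move: ownership is transferred out of `head` without incrementing and
// then decrementing the count, leaving `head` empty.
ToyInstPtr
takeHead(ToyInstPtr &head)
{
    return std::move(head);
}
```

For a pointer with a single owner, the by-value call observes a count of 2 and the by-reference call a count of 1; after `takeHead` the source pointer is null and the count is still 1.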
12857:6fc1b2a47d76 |
19-Jul-2018 |
Bradley Wang <radwang@ucdavis.edu> |
cpu: Removed unnecessary file reg_class_impl.hh
Previously, reg_class_impl.hh was added in order to prevent a cyclic dependency between it and the_isa.hh (See http://reviews.gem5.org/r/3754). It was determined that this was not necessary. The two files had almost entirely the same includes, and the current test-suite including multiple gcc and clang compilers on both MacOS and Linux successfully built the library with all functionality moved into the reg_class.hh file.
Change-Id: I0319e187b9eb280726a003951bb1ce315ffe17f5 Signed-off-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/11869 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12833:4566a9331697 |
20-Jan-2018 |
Hanhwi Jang <jang.hanhwi@gmail.com> |
cpu-o3: Missing freeing the heads of DepGraph in IQ squashing
Free the squashed instructions' heads of DepGraph in IQ squashing
In a system with a large register file (e.g. 2048), the number of DynInsts hits the hardcoded limit (1500). This is caused by not freeing the heads of DepGraph in the IQ. The IQ only clears out the heads when instructions reach the writeback stage. If an instruction is squashed before the writeback stage, its head of the dependency graph, which holds the instruction's DynInstPtr, would not be cleared out. This prevents freeing the DynInst of the squashed instruction even after it is committed.
Change-Id: I05b3db93cb6ad8960183d7ae765149c7f292e5b3 Reviewed-on: https://gem5-review.googlesource.com/7481 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12749:223c83ed9979 |
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
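The ownership model described in the "Using smart pointers for memory Requests" entry above can be sketched as below. `ToyRequest`, `ToyRequestPtr` and `makeRequest` are hypothetical; the live-object counter exists only to make the lifetime visible.

```cpp
#include <memory>

// Hypothetical request type that counts how many instances are alive.
struct ToyRequest
{
    static int live;
    unsigned long addr;

    explicit ToyRequest(unsigned long a) : addr(a) { ++live; }
    ~ToyRequest() { --live; }
};

int ToyRequest::live = 0;

// With an alias in place, switching the underlying type from a raw
// pointer to a smart pointer is a change to this one line.
using ToyRequestPtr = std::shared_ptr<ToyRequest>;

ToyRequestPtr
makeRequest(unsigned long addr)
{
    return std::make_shared<ToyRequest>(addr);
}
```

The request is destroyed when its last owner lets go, so no component has to decide who calls delete, and no owner can be left holding a dangling pointer.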
12748:ae5ce8e42de7 |
03-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Substitute pointer to Request with aliased RequestPtr
Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparatory patch for when RequestPtr will be typedefed to a smart pointer to Request rather than a raw pointer to Request.
Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12622:91cce46512f2 |
27-Mar-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Remove ExtMachInst typedefs from the O3 CPU model.
These typedefs aren't used, and they expose ISA specific types outside the ISA implementations.
Change-Id: I64b9cec18d6f92765eebbdf8c8f1de15c0deba34 Reviewed-on: https://gem5-review.googlesource.com/9404 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
12563:8d59ed22ae79 |
06-Mar-2018 |
Gabe Black <gabeblack@google.com> |
scons: Switch from the print statement to the print function.
Starting with version 3, scons imposes using the print function instead of the print statement in code it processes. To get things building again, this change moves all python code within gem5 to use the function version. Another change by another author separately made this same change to the site_tools and site_init.py files.
Change-Id: I2de7dc3b1be756baad6f60574c47c8b7e80ea3b0 Reviewed-on: https://gem5-review.googlesource.com/8761 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12537:aeff8f3d80c9 |
13-Feb-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu-o3: Don't add non-speculative mem barriers to the IQ twice
There are cases where the IEW adds a non-speculative instruction to the IQ twice. This can happen if an instruction is flagged as IsMemBarrier and IsNonSpeculative. Avoid adding non-speculative instructions in the IEW to the IQ by checking if it has been added already.
Change-Id: Ifcff676a451b57b2406ce00ed8dae19ed399515f Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Javier Setoain <javier.setoain@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/8374 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
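The guard described in the "Don't add non-speculative mem barriers to the IQ twice" entry above can be sketched as an insert that first checks whether the instruction is already present. `ToyIQ` and its sequence-number bookkeeping are assumptions for illustration, not the mechanism the change uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical instruction queue that tracks pending non-speculative
// instructions by sequence number.
class ToyIQ
{
  public:
    // Returns true if the instruction was added, false if it was already
    // in the queue (for example, reached via both the memory-barrier path
    // and the non-speculative path).
    bool
    insertNonSpec(std::uint64_t seqNum)
    {
        if (!pending.insert(seqNum).second)
            return false;
        order.push_back(seqNum);
        return true;
    }

    std::size_t size() const { return order.size(); }

  private:
    std::set<std::uint64_t> pending;   // membership check
    std::vector<std::uint64_t> order;  // insertion order
};
```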
12429:beefb9f5f551 |
09-Jan-2018 |
BKP <brandon.potter@amd.com> |
style: change C/C++ source permissions to noexec
Several files in the repository were tracked with execute permissions even though the files are just normal C/C++ files (and the one .isa).
Change-Id: I976b096acab4a1fc74c5699ef1f9b222c1e635c2 Reviewed-on: https://gem5-review.googlesource.com/7241 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12427:b0611f1ad833 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
alpha,arm,mips,power,riscv,sparc,x86,cpu: Get rid of ISA_HAS_DELAY_SLOT.
This constant is, first, a #define, and second only used in one place.
In that one place, it appears that the code it guards is no longer necessary in general. It was originally written to avoid refetching a block of data that you're still in, even if you've moved slightly farther in it because you're skipping the next instruction due to an annulled branch delay slot. In reality however, in SPARC, the one ISA I'm aware of which has this sort of branching behavior, the PC state object will correctly determine that no branch is happening in these cases. Code lower down in the loop will then recompute where fetching should continue based on the next PC, automatically skipping the annulled branch slot without misinterpreting the gap as a branch.
This change therefore also removes this block of code.
Change-Id: I820ebc9df10aeb4fcb69c12f6a784e9ec616743c Reviewed-on: https://gem5-review.googlesource.com/6821 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12422:9d6162c8c1de |
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Use the NotAnInst flag to avoid passing an inst to fetch faults.
When a fault happens in fetch in O3, a dummy inst is created to carry the fault through the pipeline to commit, but conceptually there isn't actually any instruction since we failed to fetch one.
This change marks the dummy instruction as NotAnInst, and when any such instruction gets to commit, the fault object associated with it is invoked and passed a null static inst pointer instead of a pointer to the dummy inst.
Change-Id: I18d993083406deb625402e06af4ba0d4772ca5a3 Reviewed-on: https://gem5-review.googlesource.com/7124 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12406:86bde4a026b5 |
22-Dec-2017 |
Gabe Black <gabeblack@google.com> |
arch,cpu: "virtualize" the TLB interface.
CPUs have historically instantiated the architecture specific version of the TLBs to avoid a virtual function call, making them a little bit more dependent on what the current ISA is. Some simple performance measurement, the x86 twolf regression on the atomic CPU, shows that there isn't actually any performance benefit, and if anything the simulator goes slightly faster (although still within margin of error) when the TLB functions are virtual.
This change switches everything outside of the architectures themselves to use the generic BaseTLB type, and then inside the ISA for them to cast that to their architecture specific type to call into architecture specific interfaces.
The ARM TLB needed the most adjustment since it was using non-standard translation function signatures. Specifically, they all took an extra "type" parameter which defaulted to normal, and translateTiming returned a Fault. translateTiming actually doesn't need to return a Fault because everywhere that consumed it just stored it into a structure which it then deleted(?), and the fault is stored in the Translation object when the translation is done.
A little more work is needed to fully obviate the arch/tlb.hh header, so the TheISA::TLB type is still visible outside of the ISAs. Specifically, the TlbEntry type is used in the generic PageTable which lives in src/mem.
Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575 Reviewed-on: https://gem5-review.googlesource.com/6921 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12405:01736aa058a5 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu: Use the generic nop static inst instead of decoding the arch version.
This removes a dependence on the ISA.
Change-Id: I01013bc70558f0831327213912bcac11258066a6 Reviewed-on: https://gem5-review.googlesource.com/6824 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12385:288c62455dde |
13-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu,alpha,mips,power,riscv,sparc: Get rid of eaComp and memAccInst.
Neither of these were used, particularly memAccInst.
Change-Id: I4ac9e44cf624e5de42519d586d7b699f08a2cdfc Reviewed-on: https://gem5-review.googlesource.com/6601 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
12355:568ec3a0c614 |
07-Feb-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Add support for CMOs in the cpu models
Cache maintenance operations go through the write channel of the cpu. This change makes sure that the cpu does not try to fill in the packet with data.
Change-Id: Ic83205bb1cda7967636d88f15adcb475eb38d158 Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5055 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12334:e0ab29a34764 |
30-Nov-2017 |
Gabe Black <gabeblack@google.com> |
misc: Rename misc.(hh|cc) to logging.(hh|cc)
These files aren't a collection of miscellaneous stuff, they're the definition of the Logger interface, and a few utility macros for calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1 Reviewed-on: https://gem5-review.googlesource.com/6226 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
12319:db37ad4d5395 |
23-Nov-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu-o3: Add missing vector stat initializers
All of the O3 vector stats added by 'arch: ISA parser additions of vector registers' are currently missing their stat initializers. Add the missing stat initialization to InstructionQueue::regStats.
Change-Id: Idc4b8e2824120a2542d8a604340a1b41bde6aa28 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/6101 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12284:b91c036913da |
20-Jul-2017 |
Jose Marinho <jose.marinho@arm.com> |
cpu, cpu, sim: move Cycle probe update
Move the code responsible for performing the actual probe point notify into BaseCPU. Use BaseCPU activateContext and suspendContext to keep track of sleep cycles. Create a probe point (ppActiveCycles) that does not count cycles where the processor was asleep. Rename ppCycles to ppAllCycles to reflect its nature.
Change-Id: I1907ddd07d0ff9f2ef22cc9f61f5f46c630c9d66 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5762 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12279:48bca1fee7a0 |
09-Oct-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Prevent cpu from suspending if it is already draining
Suspending the current thread context while draining due to a quiesce pseudo instruction (for example a wfi instruction) could deadlock the cpu and prevent it from successfully draining. This change ensures that the cpu is not draining before suspending the thread context.
Change-Id: I7c019847f5a870d4bc9ce2b19936bc3dc45e5fd7 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5881 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12276:22c220be30c5 |
16-Mar-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
pwr: Adds logic to enter power gating for the cpu model
If the CPU has been clock gated for a sufficient amount of time (configurable via pwrGatingLatency), the CPU will go into the OFF power state. This does not model hardware, just behaviour.
Change-Id: Ib3681d1ffa6ad25eba60f47b4020325f63472d43 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3969 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
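The rule in the "Adds logic to enter power gating for the cpu model" entry above can be written as a small state function. Only `pwrGatingLatency` comes from the entry; the enum, the function name, the cycle unit and the use of `>=` as the threshold comparison are assumptions.

```cpp
#include <cstdint>

// Hypothetical power states for this sketch.
enum class ToyPowerState { On, ClockGated, Off };

// Given how long the CPU has been clock gated, decide its power state.
// The CPU is assumed to be clock gated already when this is called.
ToyPowerState
stateAfterGating(std::uint64_t gatedCycles, std::uint64_t pwrGatingLatency)
{
    return gatedCycles >= pwrGatingLatency ? ToyPowerState::Off
                                           : ToyPowerState::ClockGated;
}
```

As the entry notes, this models behaviour only: nothing here represents the hardware that performs the gating.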
12255:9ef9176e4bb2 |
21-Sep-2017 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu, probe: Fix elastic trace register dependency
Change-Id: I017852eac183fac3f914fdb96d7e72a56ea9d682 Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5121 Reviewed-by: Matthias Jung <jungma@eit.uni-kl.de> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12224:5c4d885d3507 |
02-Oct-2017 |
Jason Lowe-Power <jason@lowepower.com> |
cpu-o3: Add M5_VAR_USED to variable
Fixes compile error for gem5.fast on CLANG due to unused variable.
Change-Id: Iabe777a27d75ee8bfa7b214fff577aed3c7582c7 Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/4980 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12217:0a16f4c03c02 |
27-Jul-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Check predication before the SQ size for a debug print
The size of the store entry in the LSQ is used to indicate a fault in the execution of the store. At the same time, a store that is predicated false will also have 0 size in the corresponding store queue entry. This changeset ensures that we check if the store was predicated false before checking the size field. This way we avoid printing stores as faulting when they are only predicated false.
Change-Id: Ie07982197bd73d7b44d26a3257d54ecb103a952a Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4821 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
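The ordering of checks in the "Check predication before the SQ size for a debug print" entry above can be sketched as follows. `ToyStoreEntry` and `describeStore` are hypothetical; the only facts taken from the entry are that a zero size marks a faulting store and that a store predicated false also has size zero.

```cpp
#include <string>

// Hypothetical store queue entry with just the two fields that matter here.
struct ToyStoreEntry
{
    bool predicatedFalse;
    unsigned size;
};

// Classify a store for a debug message. Predication is tested first,
// because a store predicated false also has size 0 and would otherwise
// be reported as faulting.
std::string
describeStore(const ToyStoreEntry &entry)
{
    if (entry.predicatedFalse)
        return "predicated false";
    if (entry.size == 0)
        return "faulting";
    return "normal";
}
```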
12216:70bb3ae0fbfc |
25-Jul-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Avoid early checker verification for store conditionals
The O3CPU allows stores to commit before they are completed and as soon as they enter the store queue. This is the reason why stores are verified by the checker CPU, separately, once they complete and after they are sent to the memory.
Store conditionals, on the other hand, have an additional writeback stage in the pipeline as they return their result to a register, similarly to loads. This is the reason why they do not commit before they receive a response from the memory. This allows store conditionals to be verified by the checker CPU as soon as they commit in the same way as all other non-store instructions.
At the same time, the presence of a checker CPU should not require changes to the way we handle instructions. This change removes explicit calls to:
* incorrectly set the extra data of the request to 0 (a subsequent call to completeAcc already does this without making any ISA assumptions about the return value of the failed store conditional)
* complete failing store conditionals
Change-Id: If21d70b21caa55b35e9fdcc50f254c590465d3c3 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4820 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12181:2150eff234c1 |
25-Aug-2017 |
Gabe Black <gabeblack@google.com> |
stats: Get rid of some kernel stats related cruft.
The kernel stat mechanism should really be refactored and moved somewhere else, but in the meantime there's some old cruft that can be cleared away.
Change-Id: I21e725de590dda0d20bf3bc675bbe976c7b1bd86 Reviewed-on: https://gem5-review.googlesource.com/4600 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12171:b11b56bba18f |
28-Aug-2017 |
Matthias Hille <matthiashille8@gmail.com> |
cpu-o3: fix data pkt initialization for split load
When a split load hits a memory region where IPRs are mapped, the Writeback event scheduled for it carried a data packet that was not correctly initialized, which caused an assertion to fire when the Writeback event was processed.
Change-Id: I71a4e291f0086f7468d7e8124a0a8f098088972f Signed-off-by: Matthias Hille <matthiashille8@gmail.com> Reported-by: Matthias Hille <matthiashille8@gmail.com> Reviewed-on: https://gem5-review.googlesource.com/4620 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12144:3f2976f87529 |
18-Jul-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Add missing rename of vector registers in the O3 CPU
The introduction of a new vector register class broke rename in the O3 CPU due to an unhandled register class in DefaultRename<Impl>::renameSrcRegs(). This patch adds the necessary handling to avoid a panic when the vector register file is used.
Change-Id: Ie380ab35ec4a151db15402f25b25b58931ee0581 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4140 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12143:e48005f585f2 |
06-Apr-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
cpu,o3: Fixed checkpointing bug occurring in the o3 CPU
Checkpointing a system with out-of-order CPUs might get stuck if one of the CPUs has been put to sleep. The quiesce instruction cannot be drained, hence checkpointing never finishes.
This commit resolves that by activating all suspended thread contexts when draining the system.
Change-Id: I817ab1672b4ead777bd8e12a0445829481c46fdc Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3970 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12127:4207df055b0d |
28-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu: Refactor some Event subclasses to lambdas
Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3923 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12110:c24ee249b8ba |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
arch: ISA parser additions of vector registers
Reiley's update :) of the isa parser definitions. My addition of the vector element operand concept for the ISA parser. Nathanael's modification creating a hierarchy between vector registers and its constituencies to the isa parser.
Some fixes/updates on top to consider instructions as vectors instead of floating when they use the VectorRF. Some counters added to all the models to keep faithful counts.
Change-Id: Id8f162a525240dfd7ba884c5a4d9fa69f4050101 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2706 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12109:f29e9c5418aa |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Added interface for vector reg file
This patch adds some more functionality to the cpu model and the arch to interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts with calls to get/set full vectors, underlying microarchitectural elements or lanes. Those are meant to interface with the vector register file. All classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy numPhysVecRegs parameter values for architectures that do not implement vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues and CC reg free list initialisation ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2705 |
12106:7784fac1b159 |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Simplify the rename interface and use RegId
With the hierarchical RegId there are a lot of functions that are redundant now.
The idea behind the simplification is that instead of having the regId, telling which kind of register read/write/rename/lookup/etc. and then the function panic_if'ing if the regId is not of the appropriate type, we provide an interface that decides what kind of register to read depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2702 |
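The idea in the entry above (one access function that selects the register bank from the class carried by the id) can be sketched as follows. `RegClass`, `RegId`, and `RegFile` here are simplified stand-ins written for this illustration and do not reproduce the gem5 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-ins for a hierarchical register id.
enum class RegClass { Int, Float, Vec };

struct RegId {
    RegClass cls;
    std::size_t index;
};

// Toy register file: callers pass a RegId and the file picks the bank,
// instead of callers choosing readIntReg/readFloatReg/... themselves.
class RegFile {
  public:
    explicit RegFile(std::size_t n) : intRegs(n), floatRegs(n), vecRegs(n) {}

    std::uint64_t read(const RegId &id) const
    {
        switch (id.cls) {
          case RegClass::Int:   return intRegs.at(id.index);
          case RegClass::Float: return floatRegs.at(id.index);
          case RegClass::Vec:   return vecRegs.at(id.index);
        }
        throw std::logic_error("unknown register class");
    }

    void write(const RegId &id, std::uint64_t value)
    {
        switch (id.cls) {
          case RegClass::Int:   intRegs.at(id.index) = value; return;
          case RegClass::Float: floatRegs.at(id.index) = value; return;
          case RegClass::Vec:   vecRegs.at(id.index) = value; return;
        }
        throw std::logic_error("unknown register class");
    }

  private:
    std::vector<std::uint64_t> intRegs, floatRegs, vecRegs;
};

// Usage: the same index in two classes refers to two different registers.
inline void regFileExample()
{
    RegFile rf(4);
    rf.write({RegClass::Int, 1}, 42);
    assert(rf.read({RegClass::Int, 1}) == 42);
    assert(rf.read({RegClass::Float, 1}) == 0);
}
```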
12105:742d80361989 |
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
cpu: Physical register structural + flat indexing
Mimic the changes done on the architectural register indexes on the physical register indexes. This is specific to the O3 model. The structure, called PhysRegId, contains a register class, a register index and a flat register index. The flat register index is kept because it is useful in some cases where the type of register is not important (dependency graph and scoreboard for example). Instead of directly using the structure, most of the code is working with a const PhysRegId* (typedef to PhysRegIdPtr). The actual PhysRegId objects are stored in the regFile.
Change-Id: Ic879a3cc608aa2f34e2168280faac1846de77667 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2701 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
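A rough sketch of the structure described in the entry above: an id carrying a class, a per-class index and a flat index, with the objects owned by the register file and handed out as const pointers. The field names, `PhysRegFile`, and `Scoreboard` below are assumptions made for this example; only `PhysRegId` and `PhysRegIdPtr` are names mentioned in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RegClass { Int, Float };

struct PhysRegId {
    RegClass cls;
    std::size_t index;      // index within its register class
    std::size_t flatIndex;  // unique across all classes
};
using PhysRegIdPtr = const PhysRegId *;

// The register file owns the PhysRegId objects; other code keeps pointers.
class PhysRegFile {
  public:
    PhysRegFile(std::size_t numIntRegs, std::size_t numFloatRegs)
        : numInt(numIntRegs)
    {
        ids.reserve(numIntRegs + numFloatRegs);
        for (std::size_t i = 0; i < numIntRegs; ++i)
            ids.push_back({RegClass::Int, i, ids.size()});
        for (std::size_t i = 0; i < numFloatRegs; ++i)
            ids.push_back({RegClass::Float, i, ids.size()});
    }

    PhysRegIdPtr get(RegClass cls, std::size_t index) const
    {
        const std::size_t flat = (cls == RegClass::Int) ? index : numInt + index;
        assert(flat < ids.size() && ids[flat].cls == cls);
        return &ids[flat];
    }

    std::size_t total() const { return ids.size(); }

  private:
    std::size_t numInt;
    std::vector<PhysRegId> ids;
};

// A scoreboard does not care about the register class, so it can be a
// single array indexed by the flat index.
class Scoreboard {
  public:
    explicit Scoreboard(std::size_t n) : ready(n, false) {}
    void setReady(PhysRegIdPtr reg) { ready.at(reg->flatIndex) = true; }
    bool isReady(PhysRegIdPtr reg) const { return ready.at(reg->flatIndex); }

  private:
    std::vector<bool> ready;
};
```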
12104:edd63f9c6184 |
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
arch, cpu: Architectural Register structural indexing
Replace the unified register mapping with a structure associating a class and an index. It is now much easier to know which class of register the index is referring to. Also, when adding a new class there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2700 |
12085:de78ea63e0ca |
07-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu, gpu-compute: Replace EventWrapper use with EventFunctionWrapper
Change-Id: Idd5992463bcf9154f823b82461070d1f1842cea3 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3746 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12022:256a709054f3 |
09-Apr-2017 |
Alec Roelke <ar4jc@virginia.edu> |
cpu: fix problem with forwarding and locked load
If a (regular) store is followed closely enough by a locked load that overlaps, the LSQ will forward the store's data to the locked load and never tell the cache about the locked load. As a result, the cache will not lock the address and all future store-conditional requests on that address will fail. This patch fixes that by preventing forwarding if the memory request is a locked load and adding another case to the LSQ forwarding logic that delays the locked load request if a store in the LSQ contains all or part of the data that is requested.
[Merge second and last if blocks because their bodies are the same.]
Change-Id: I895cc2b9570035267bdf6ae3fdc8a09049969841 Reviewed-on: https://gem5-review.googlesource.com/2400 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
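The forwarding rule from the entry above can be sketched as a small decision function. All names are hypothetical, and the handling of a regular load that only partially overlaps a store (delayed here) is an assumption of this sketch rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdint>

// Byte range [addr, addr + size).
struct MemRange {
    std::uint64_t addr;
    std::uint64_t size;
};

enum class LoadAction { AccessCache, ForwardFromStore, Delay };

inline bool overlaps(const MemRange &a, const MemRange &b)
{
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

inline bool covers(const MemRange &outer, const MemRange &inner)
{
    return outer.addr <= inner.addr &&
           inner.addr + inner.size <= outer.addr + outer.size;
}

// Decide what a load does when an older store is still in the queue.
inline LoadAction decideLoad(const MemRange &store, const MemRange &load,
                             bool lockedLoad)
{
    if (!overlaps(store, load))
        return LoadAction::AccessCache;
    // A locked load must reach the cache so the address gets locked:
    // never forward to it, delay it while any overlapping store is queued.
    if (lockedLoad)
        return LoadAction::Delay;
    // Regular load: forward only when the store holds all requested bytes
    // (assumption: partial overlap waits for the store to drain).
    return covers(store, load) ? LoadAction::ForwardFromStore
                               : LoadAction::Delay;
}

// Usage: the same overlap forwards for a regular load, delays a locked one.
inline void decideLoadExample()
{
    assert(decideLoad({0x100, 8}, {0x100, 4}, false) == LoadAction::ForwardFromStore);
    assert(decideLoad({0x100, 8}, {0x100, 4}, true) == LoadAction::Delay);
}
```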
11886:43b882cada33 |
27-Feb-2017 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [PATCH 15/22] add clone/execve for threading and multiprocess simulations
Modifies the clone system call and adds execve system call. Requires allowing processes to steal thread contexts from other processes in the same system object and the ability to detach pieces of process state (such as MemState) to allow dynamic sharing. |
11877:5ea85692a53e |
20-Jul-2015 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [patch 13/22] add system call retry capability
This changeset adds functionality that allows system calls to retry without affecting thread context state such as the program counter or register values for the associated thread context (when system calls return with a retry fault).
This functionality is needed to solve problems with blocking system calls in multi-process or multi-threaded simulations where information is passed between processes/threads. Blocking system calls can cause deadlock because the simulator itself is single threaded. There is only a single thread servicing the event queue which can cause deadlock if the thread hits a blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer sharing model. The processes can use file descriptors and the read and write calls to pass information to one another. If the consumer calls the blocking read system call before the producer has produced anything, the call will block the event queue (while executing the system call instruction) and deadlock the simulation.
The solution implemented in this changeset is to recognize that the system calls will block and then generate a special retry fault. The fault will be sent back up through the function call chain until it is exposed to the cpu model's pipeline where the fault becomes visible. The fault will trigger the cpu model to replay the instruction at a future tick where the call has a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a non-blocking poll (from inside the system call implementation) and checking for events. When events show up during the poll, it signifies that the call would not have blocked and the syscall is allowed to proceed (calling an underlying host system call if necessary). If no events are returned from the poll, we generate the fault and try the instruction for the thread context at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event queue, but it is not used by default and needs work. Even if the event queue were completely multi-threaded, meaning that there is a hardware thread on the host servicing a single simulator thread context with a 1:1 mapping between them, it would still be possible to run into deadlock due to the event queue barriers on quantum boundaries. The solution of replaying at a later tick is the simplest solution and solves the problem generally. |
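A minimal sketch of the "poll first, retry if it would block" idea from the entry above, using the POSIX `poll()` call with a zero timeout. `SyscallOutcome` and `checkRead()` are hypothetical names for this illustration; the real retry fault and its scheduling are not modelled here.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

enum class SyscallOutcome { Proceed, Retry };

// Ask the host whether a read on fd would block, without blocking.
// poll() with a timeout of 0 returns immediately; a return value of 0
// means no events are pending, so the emulated call should be retried
// at a later tick instead of being executed now.
inline SyscallOutcome checkRead(int fd)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    const int ready = poll(&pfd, 1, 0);
    // On events (or on an error) let the real call run and report it.
    return ready == 0 ? SyscallOutcome::Retry : SyscallOutcome::Proceed;
}

// Usage: an empty pipe asks for a retry; once the producer has written
// a byte, the consumer's read may proceed.
inline void checkReadExample()
{
    int fds[2];
    if (pipe(fds) != 0)
        return;
    assert(checkRead(fds[0]) == SyscallOutcome::Retry);
    const char byte = 'x';
    if (write(fds[1], &byte, 1) == 1)
        assert(checkRead(fds[0]) == SyscallOutcome::Proceed);
    close(fds[0]);
    close(fds[1]);
}
```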
11793:ef606668d247 |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes |
11781:1ae84c76066b |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: Resolve targets of predicted 'taken' decode for O3
The target of taken conditional direct branches does not need to be resolved in IEW: the target can be computed at decode, usually using the decoded instruction word and the PC.
The higher-than-necessary penalty is taken only on conditional branches that are predicted taken but miss in the BTB. Thus, this is mostly inconsequential on IPC if the BTB is big/associative enough (fewer capacity/conflict misses). Nonetheless, what gem5 simulates is not representative of how conditional branch targets can be handled.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11780:9af039ea0c1e |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3
cachePorts currently constrains the number of store packets written to the D-Cache each cycle, but loads currently affect this variable. This leads to unexpected congestion (e.g., setting cachePorts to a realistic 1 will in fact allow a store to WB only if no loads have accessed the D-Cache this cycle). In the absence of arbitration, this patch decouples how many loads can be done per cycle from how many stores can be done per cycle.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
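The decoupling described in the entry above can be pictured as two independent per-cycle counters. `DCachePortBudget` and its members are hypothetical; in particular, a separate load limit is an assumption of this sketch, since the commit message only says that loads no longer consume the store budget.

```cpp
#include <cassert>

// Hypothetical per-cycle accounting with separate load and store budgets.
class DCachePortBudget {
  public:
    DCachePortBudget(int loadPorts, int storePorts)
        : maxLoads(loadPorts), maxStores(storePorts) {}

    // Call at the start of every cycle.
    void startCycle() { loadsUsed = 0; storesUsed = 0; }

    // Loads consume only the load budget.
    bool tryLoad()
    {
        if (loadsUsed >= maxLoads)
            return false;
        ++loadsUsed;
        return true;
    }

    // Stores consume only the store budget, so loads issued earlier in
    // the same cycle can no longer prevent a store from writing back.
    bool tryStore()
    {
        if (storesUsed >= maxStores)
            return false;
        ++storesUsed;
        return true;
    }

  private:
    int maxLoads;
    int maxStores;
    int loadsUsed = 0;
    int storesUsed = 0;
};

// Usage: with one store port, a store still goes through after a load.
inline void portBudgetExample()
{
    DCachePortBudget budget(1, 1);
    budget.startCycle();
    const bool loadOk = budget.tryLoad();
    const bool storeOk = budget.tryStore();
    assert(loadOk && storeOk);
}
```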
11683:f1e198a028be |
15-Oct-2016 |
Fernando Endo <fernando.endo2@gmail.com> |
cpu, arm: Distinguish Float* and SimdFloat*, create FloatMem* opClass
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which distinguish writes to the INT and FP register banks. Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72, where the "latency" of FMADD is 3 if the next instruction is a FMADD and has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11650:fe601d7bd955 |
22-Sep-2016 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the O3 CPU Drain
The drain did not wait until stages were ready again. Therefore, as a result of messages in the TimeBuffer being drained, the state after the drain was not consistent and asserts fired in some places when the draining happened after a stage got blocked, but before the notification arrived at the previous stages.
Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11627:fe32a5238754 |
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of quiesceTick() and quiesce() functions on ThreadContext. This patch also gets rid of the FS guards, since suspending a CPU is also a valid operation for SE mode. |
11526:5b81895e5d5e |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
pwr: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b |
11523:81332eb10367 |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
stats: Fixing regStats function for some SimObjects
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11435:0f1b46dde3fa |
07-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit. |
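Because ContextIDs are allocated sequentially per thread, the requesting thread can be recovered by subtraction, as the entry above notes. A small sketch, with hypothetical type aliases and a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical aliases for this sketch.
using ContextID = int;
using ThreadID = int;

// Recover the thread index on an MT-enabled CPU from a ContextID, given
// the CPU's base ContextID and its number of hardware threads.
inline ThreadID threadFromContext(ContextID ctx, ContextID baseCtx,
                                  int numThreads)
{
    assert(ctx >= baseCtx && ctx < baseCtx + numThreads);
    return ctx - baseCtx;
}
```

For a two-thread CPU whose base ContextID is 4, ContextID 5 maps to thread 1.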
11429:cf5af0cc3be4 |
06-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11428:20264eb69fbf |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu. |
11423:831c7f2f9e39 |
09-Dec-2014 |
Akash Bagdia <akash.bagdia@ARM.com> |
power: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle. |
11365:83c3e117464e |
05-May-2015 |
Rekai Gonzalez Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Change literal integer constants to meaningful labels
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those FUs are busy at the moment" when requesting for a FU and replying. This patch introduces new constants NoCapableFU and NoFreeFU respectively.
In addition, the condition (idx == -2 || idx != -1) is equivalent to (idx != -1), so this patch also simplifies that. |
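The simplification in the entry above can be checked mechanically. The constant names come from the commit message; declaring them as free `constexpr int` values is an assumption of this sketch.

```cpp
#include <cassert>

// Named replacements for the magic values -1 and -2.
constexpr int NoCapableFU = -1;  // no functional unit can execute this op
constexpr int NoFreeFU = -2;     // capable units exist, but all are busy

// Condition as written before the patch.
inline bool oldCondition(int idx) { return idx == -2 || idx != -1; }

// Simplified condition: idx == -2 already implies idx != -1, so the
// first disjunct is redundant.
inline bool newCondition(int idx) { return idx != NoCapableFU; }

// Usage: the two forms agree on both sentinel values and on real indices.
inline void conditionExample()
{
    for (int idx = -5; idx <= 5; ++idx)
        assert(oldCondition(idx) == newCondition(idx));
}
```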
11359:b0b976a1ceda |
27-Nov-2015 |
Andreas Sandberg <andreas@sandberg.pp.se> |
base: Add support for changing output directories
This changeset adds support for changing the simulator output directory. This can be useful when the simulation goes through several stages (e.g., a warming phase, a simulation phase, and a verification phase) since it allows the output from each stage to be located in a different directory. Relocation is done by calling core.setOutputDir() from Python or simout.setOutputDirectory() from C++.
This change affects several parts of the design of the gem5's output subsystem. First, files returned by an OutputDirectory instance (e.g., simout) are of the type OutputStream instead of a std::ostream. This allows us to do some more book keeping and control re-opening of files when the output directory is changed. Second, new subdirectories are OutputDirectory instances, which should be used to create files in that sub-directory.
Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11357:6668387fa488 |
10-Aug-2015 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
mem, cpu: Add assertions to snoop invalidation logic
This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades(). |
11356:a80884911971 |
19-Jul-2015 |
Krishnendra Nathella <krinat01@arm.com> |
cpu: Fix LLSC atomic CPU wakeup
Writes to locked memory addresses (LLSC) did not wake up the locking CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop was checking if the incoming packet was an invalidation (isInvalidate) and only then handled a locked snoop. But, writes are seen instead of invalidates when running without caches (fast-forward configurations). As a simple fix, handleLockedSnoop is now also called when the incoming snoop packet is a write. |
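The change in the entry above amounts to widening one predicate. The sketch below uses a hypothetical `SnoopPacket` struct with two flags in place of the real packet class.

```cpp
#include <cassert>

// Hypothetical stand-in for the snooped packet.
struct SnoopPacket {
    bool isInvalidate;
    bool isWrite;
};

// Before the fix: only invalidations reached the locked-snoop handling.
inline bool handlesLockedSnoopBefore(const SnoopPacket &pkt)
{
    return pkt.isInvalidate;
}

// After the fix: plain writes are handled too, because cacheless
// (fast-forward) configurations see writes instead of invalidations.
inline bool handlesLockedSnoopAfter(const SnoopPacket &pkt)
{
    return pkt.isInvalidate || pkt.isWrite;
}

// Usage: a write without an invalidate was missed before the fix.
inline void lockedSnoopExample()
{
    const SnoopPacket write{false, true};
    assert(!handlesLockedSnoopBefore(write));
    assert(handlesLockedSnoopAfter(write));
}
```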
11331:cd5c48db28e6 |
10-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be forwarded from the memory side to the CPU side. Instead of having a parameter, the cache now looks at the port connected on the CPU side, and if it is a snooping port, then snoops are forwarded. Less error prone, and fewer parameters to worry about.
The patch also tidies up the CPU classes to ensure that their I-side port is not snooping by removing overrides to the snoop request handler, such that snoop requests will panic via the default MasterPort implement |
11321:02e930db812d |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'. |
11320:42ecb523c64a |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'. |
11302:bce9037689b0 |
17-Jan-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the data doesn't arrive until the access completes and a response packet is received from the memory system. Thus there's no need to provide a data pointer; its existence is historical.
Getting this pointer out of this internal o3 interface sets the stage for similar cleanup in the ExecContext interface. Also found that we were pointlessly setting the contents at this pointer on a store forward (the useful memcpy happens just a few lines below the deleted one). |
11284:b3926db25371 |
31-Dec-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed to responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. |
11253:daf9f91b11e9 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Support virtual addr in elastic traces
This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay. |
11252:18bb597fc40c |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Create record type enum for elastic traces
This patch replaces the booleans that specified the elastic trace record type with an enum type. The source of change is the proto message for elastic trace where the enum is introduced. The struct definitions in the elastic trace probe listener as well as the Trace CPU replace the booleans with the proto message enum.
The patch does not impact functionality, but traces are not compatible with previous version. This is preparation for adding new types of records in subsequent patches. |
11247:76f75db08e09 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
proto, probe: Add elastic trace probe to o3 cpu
The elastic trace is a type of probe listener and listens to probe points in multiple stages of the O3CPU. The notify method is called on a probe point typically when an instruction successfully progresses through that stage.
As different listener methods mapped to the different probe points execute, relevant information about the instruction, e.g. timestamps and register accesses, are captured and stored in temporary InstExecInfo class objects. When the instruction progresses through the commit stage, the timing and the dependency information about the instruction is finalised and encapsulated in a struct called TraceInfo. TraceInfo objects are collected in a list instead of writing them out to the trace file one at a time. This is required as the trace is processed in chunks to evaluate order dependencies and computational delay in case an instruction does not have any register dependencies. By this we achieve a simpler algorithm during replay because every record in the trace can be hooked onto a record in its past. The instruction dependency trace is written out as a protobuf format file. A second trace containing fetch requests at absolute timestamps is written to a separate protobuf format file.
If the instruction is not executed then it is not added to the trace. The code checks if the instruction had a fault, if it was predicated false and thus previous register values were restored, or if it was a load/store that did not have a request (e.g. when the size of the request is zero). In all these cases the instruction is set as executed by the Execute stage and is picked up by the commit probe listener. But a request is not issued and registers are not written. So practically, skipping these should not hurt the dependency modelling.
If squashing results in squashing younger instructions, it may happen that the squash probe discards the inst and removes it from the temporary store but execute stage deals with the instruction in the next cycle which results in the execute probe seeing this inst as 'new' inst. A sequence number of the last processed trace record is used to trap these cases and not add to the temporary store.
The elastic instruction trace and fetch request trace can be read in and played back by the TraceCPU. |
11246:93d2a1526103 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.
A probe point is added in the Fetch stage for probing when a fetch request is sent. Notify is fired on the probe point when a request is sent successfully in the first attempt as well as on a retry attempt.
Probe points are added in the IEW stage when an instruction begins to execute and when execution is complete. These points can be used for monitoring the execution time of an instruction.
Probe points are added in the Rename stage to probe renaming of source and destination registers and when there is squashing. These probe points can be used to track register dependencies and remove them when there is squashing.
A probe point for squashing is added in Commit to probe squashed instructions. |
11243:f876d08c7b21 |
04-Dec-2015 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: fix uninitialized variable which may cause assertion failure
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to NULL (I got the assertion failure several times without the fix).
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
11225:9bc552f9e4b0 |
22-Nov-2015 |
Nathanael Premillieu <nathananel.premillieu@arm.com> |
cpu: Fix base FP and CC register index in o3 insertThread()
Note that the method is not used, and could possibly be deleted. |
11213:f0c7b76cadab |
16-Nov-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3: drop unused statistic wbPenalized and wbPenalizedRate |
11169:44b5c183c3cd |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication.
As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). |
11168:f98eb2da15a4 |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. |
11165:d90aec9435bd |
09-Oct-2015 |
Rekai Gonzalez Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
isa: Add parameter to pick different decoder inside ISA
The decoder is responsible for splitting instructions in micro operations (uops). Given that different micro architectures may split operations differently, this patch allows specifying which micro architecture each ISA implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done by making the decode calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time. |
11151:ca4ea9b5c052 |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT capable cpus are woken. |
11150:a8a64cca231b |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems. |
11148:1bc3d93c7eaa |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT. |
11097:da477ae38907 |
15-Sep-2015 |
Hongil Yoon <ongal@cs.wisc.edu> |
cpu, o3: consider split requests for LSQ checksnoop operations
This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions.
The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
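A sketch of why both physical addresses matter for the snoop check in the entry above. The `LoadEntry` layout, the 64-byte line size, and `snoopHits()` are assumptions made for this illustration and do not mirror the LSQ code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache line size for this sketch.
constexpr std::uint64_t cacheLineBytes = 64;

// Hypothetical LSQ load entry that tracks both halves of a split request.
struct LoadEntry {
    std::uint64_t physAddrLow;   // address of the first (or only) request
    std::uint64_t physAddrHigh;  // address of the second request, if split
    bool isSplit;
};

inline std::uint64_t lineOf(std::uint64_t addr)
{
    return addr & ~(cacheLineBytes - 1);
}

// A snoop hits the load if it targets the line of either request.
// Checking only physAddrLow would miss invalidations of the second line.
inline bool snoopHits(const LoadEntry &load, std::uint64_t snoopAddr)
{
    const std::uint64_t line = lineOf(snoopAddr);
    if (lineOf(load.physAddrLow) == line)
        return true;
    return load.isSplit && lineOf(load.physAddrHigh) == line;
}

// Usage: a load crossing from line 0x1000 into line 0x1040.
inline void snoopHitsExample()
{
    const LoadEntry split{0x1038, 0x1040, true};
    assert(snoopHits(split, 0x1000));
    assert(snoopHits(split, 0x1040));
}
```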
11005:e7f403b6b76f |
07-Aug-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
base: Declare a type for context IDs
Context IDs used to be declared as ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs. |
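What such a declaration could look like is sketched below. The underlying type `int` follows the "usually as int" remark in the entry above; the constant's name and its value of -1 are assumptions of this sketch.

```cpp
#include <cassert>

// One named type instead of ad hoc ints spread across the code base.
typedef int ContextID;

// Assumed sentinel for "no context"; name and value are illustrative.
const ContextID InvalidContextID = static_cast<ContextID>(-1);

inline bool isValidContext(ContextID id)
{
    return id != InvalidContextID;
}

// Usage: a freshly declared id starts out invalid until one is assigned.
inline void contextIdExample()
{
    ContextID id = InvalidContextID;
    assert(!isValidContext(id));
    id = 0;
    assert(isValidContext(id));
}
```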
10960:b51a2a09ac7d |
20-Jul-2015 |
David Hashe <david.hashe@amd.com> |
cpu: Fixed a bug on where to fetch the next instruction from
Figure out if the next instruction to fetch comes from the micro-op ROM or not. Otherwise, wrong instructions may be fetched. |
10935:acd48ddd725f |
28-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
revert 5af8f40d8f2c |
10934:5af8f40d8f2c |
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: implements vector registers
This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now. |
10933:e1309937d313 |
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: slight correction to indentation in rename_impl.hh |
10913:38dbdeea7f1f |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining.
This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error.
Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects. |
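The protocol described above can be sketched roughly as follows. This is a simplified stand-in, not gem5's Drainable class: requestDrain(), notifications() and BusyObject are hypothetical names, and the notification counter replaces the real call into the drain manager.

```cpp
#include <cassert>

enum class DrainState { Running, Draining, Drained };

// Simplified model of a drainable object (illustration only).
class Drainable
{
  public:
    virtual ~Drainable() = default;

    // Hypothetical entry point used by the global drain manager.
    DrainState requestDrain()
    {
        state = drain();
        // Returning Running from drain() is considered an error.
        assert(state != DrainState::Running);
        return state;
    }

    DrainState drainState() const { return state; }
    int notifications() const { return notified; }

  protected:
    // Draining: needs more time. Drained: already consistent.
    virtual DrainState drain() = 0;

    // Safe to call at any time: it only notifies while draining.
    void signalDrainDone()
    {
        if (state != DrainState::Draining)
            return;
        state = DrainState::Drained;
        ++notified; // stand-in for notifying the drain manager
    }

  private:
    DrainState state = DrainState::Running;
    int notified = 0;
};

// Hypothetical object with in-flight work that completes later.
class BusyObject : public Drainable
{
  public:
    int inflight = 1;

    void completeOne()
    {
        if (inflight > 0 && --inflight == 0)
            signalDrainDone(); // no "are we draining?" check needed
    }

  protected:
    DrainState drain() override
    {
        return inflight ? DrainState::Draining : DrainState::Drained;
    }
};
```

Because signalDrainDone() checks the state itself, a simple object can call it whenever its work finishes without tracking whether a drain was requested.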
10910:32f3d1c454ec |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario. |
10905:a6ca6831e775 |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor the serialization base class
Objects that can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section.
* Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections).
* The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects.
* Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. |
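The scoped-section idea from the list above can be illustrated with a small RAII sketch. The CheckpointOut and ScopedSection classes below are hypothetical stand-ins written for this example; they do not reproduce gem5's checkpoint types or file format.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical checkpoint writer that tracks the current section path.
class CheckpointOut
{
  public:
    std::vector<std::string> path;   // stack of nested section names
    std::vector<std::string> lines;  // "section:key=value" records

    std::string currentSection() const
    {
        std::string s;
        for (const auto &p : path)
            s += (s.empty() ? "" : ".") + p;
        return s;
    }

    void write(const std::string &key, const std::string &value)
    {
        lines.push_back(currentSection() + ":" + key + "=" + value);
    }
};

// RAII helper: enters a subsection on construction and leaves it on
// destruction, so nested sections unwind automatically.
class ScopedSection
{
  public:
    ScopedSection(CheckpointOut &out, const std::string &name) : cp(out)
    {
        cp.path.push_back(name);
    }
    ~ScopedSection() { cp.path.pop_back(); }

    ScopedSection(const ScopedSection &) = delete;
    ScopedSection &operator=(const ScopedSection &) = delete;

  private:
    CheckpointOut &cp;
};

// Serializes data into a section and a nested subsection.
inline void serializeExample(CheckpointOut &cp)
{
    ScopedSection outer(cp, "cpu");
    cp.write("ticks", "42");
    {
        ScopedSection inner(cp, "lsq");
        cp.write("entries", "8");
    }
    cp.write("done", "1");
}
```

Tying the section lifetime to a scope removes the manual section-name bookkeeping that the commit message describes as the old approach.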
10897:a90d22342aa5 |
04-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3: correct the number of cc registers in rename map |
10835:d4b162a57400 |
15-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Appease gcc 5.1
Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by bitwise AND.
2. Somehow the compiler also gets confused and warns about NoopMachInst being unused (removing it causes compilation errors though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes false positives for the array-bounds check. For now, switch to std::array. Potentially we could disable the warning for newer gcc versions, but switching to std::array is probably a good move in any case. |
10824:308771bd2647 |
05-May-2015 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:
* Speculation: CPU models are not allowed to issue speculative loads.
* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.
Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior. |
10821:581fb2484bd6 |
05-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block. |
10814:46b6043bd32c |
05-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being convinced that indices outside the array bounds are used when initialising the FUPool members. |
10807:dac26eb4cb64 |
29-Apr-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after which another op of the same class can be issued. As of now, this latency can either be one cycle (fully pipelined) or same as execution latency of the op (not at all pipelined). The fact that issueLat is a parameter of type Cycles makes one believe that it can be set to any value. To avoid the confusion, the parameter is being renamed as 'pipelined' with type boolean. If set to true, the op would execute in a fully pipelined fashion. Otherwise, it would execute in an unpipelined fashion. |
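As a rough illustration of the two cases the message describes, the sketch below derives the issue interval from a boolean. OpDesc and issueInterval() are hypothetical names, not the actual functional-unit description classes:

```cpp
#include <cassert>

// Hypothetical description of an op class.
struct OpDesc
{
    unsigned opLat;   // execution latency in cycles
    bool pipelined;   // true: a new op of this class can issue every cycle
};

// Cycles until the unit can accept another op of the same class:
// one cycle if fully pipelined, the full latency otherwise.
inline unsigned issueInterval(const OpDesc &op)
{
    return op.pipelined ? 1 : op.opLat;
}
```

With a boolean there is no way to express an intermediate value, which matches the only two behaviours the model supported.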
10806:b9410e821c41 |
29-Apr-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle on x86. This is because the division instructions DIV and IDIV have been implemented as loops of div microops, where each microop computes a single bit of the quotient. |
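To show why a one-cycle step latency is reasonable when each step yields a single quotient bit, here is a generic shift-and-subtract (restoring) division sketch. It is not the x86 microcode from the source tree; divStep(), divide() and DivResult are names invented for this example, and operands are limited to 32 bits to keep the arithmetic overflow-free.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult
{
    uint32_t quotient;
    uint32_t remainder;
};

// One step: shift the next dividend bit into the remainder and
// produce exactly one bit of the quotient.
inline void divStep(uint32_t dividend, uint32_t divisor, int bit,
                    uint32_t &quotient, uint64_t &remainder)
{
    remainder = (remainder << 1) | ((dividend >> bit) & 1u);
    if (remainder >= divisor) {
        remainder -= divisor;
        quotient |= (uint32_t(1) << bit);
    }
}

// A full division is a loop of 32 single-bit steps, so the total
// latency comes from the loop length, not from each step.
inline DivResult divide(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    uint32_t q = 0;
    uint64_t r = 0;
    for (int bit = 31; bit >= 0; --bit)
        divStep(dividend, divisor, bit, q, r);
    return DivResult{q, static_cast<uint32_t>(r)};
}
```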
10797:855cafd64da1 |
22-Apr-2015 |
Brandon Potter <brandon.potter@amd.com> |
cpu: remove conditional check (count > 0) on o3 IQ squashes
The o3 cpu instruction queue model uses the count variable to track the number of unissued instructions in the queue. Previously, the squash method used this variable to avoid executing the doSquash method when there were no unissued instructions in the pipeline. A corner case problem exists when only issued instructions exist in the pipeline and a squash occurs; the doSquash code is not invoked and subsequently does not clean up state properly. |
10785:f56c10663a01 |
13-Apr-2015 |
Dibakar Gope <gope@wisc.edu> |
cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10774:68d688cbe26c |
03-Apr-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
cpu: fix system total instructions accounting
The totalInstructions counter is only incremented when the whole instruction is committed and not on every microop. It was incorrectly reset in atomic and timing cpus.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10734:cbed6a2cbc35 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: another assert instead of check |
10733:705aca3c1240 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: Remove unused code in iew, add assert instead. |
10732:60482901c996 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: commit: mark pipeline delay variable as consts |
10731:17c5d36dfdac |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove unused stat variables. |
10730:11cb85883e6a |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: combine if with same condition |
10729:41c93a3c1051 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove member variable squashCounter
The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call. |
10728:0fd6a08a7332 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove unused function annotateMemoryUnits() |
10715:ced453290507 |
02-Mar-2015 |
Rekai <Rekai.GonzalezAlberquilla@arm.com> |
cpu: o3 register renaming request handling improved
Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not. |
10713:eddb533708cb |
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing issue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fix the previously seen deadlocks. |
10698:829adc48e175 |
16-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM. |
10683:94901e131a7f |
06-Feb-2015 |
Alexandru Dutu <alexandru.dutu@amd.com> |
cpu: Idle CPU status logic revised
This patch sets the CPU status to idle when the last active thread gets suspended. |
10664:61a0b02aa800 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Remove all notion that we know when the cpu is misspeculating.
We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do. |
10596:1eec33d2fc52 |
05-Dec-2014 |
Gabe Black <gabeblack@google.com> |
cpu: Only check for PC events on instruction boundaries.
Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing. |
10575:a8d612fa170b |
02-Dec-2014 |
Marco Elver <Marco.Elver@ARM.com> |
cpu, o3: Ignored invalidate causing same-address load reordering
In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response.
If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5.
Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering.
A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed.
Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults. |
10573:3b405d11d6dc |
02-Dec-2014 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
cpu: Move packet deallocation to recvTimingResp in the O3 CPU
Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases. |
10566:c99c8d2a7c31 |
02-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Assume all dynamic packet data is array allocated
This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic was in the Ruby testers.
The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks.
As the last part of the patch, it renames dataDynamicArray to dataDynamic. |
10563:755b18321206 |
02-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add const getters for write packet data
This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used.
The patch also removes the unused isReadWrite function. |
10537:47fe87b0cf97 |
14-Nov-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arm: Fixes based on UBSan and static analysis
Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base.
Most of the fixes simply ensure proper initialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code. |
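A sketch of a sign-extension helper that returns uint64_t and uses only unsigned arithmetic, so no negative value is ever shifted. The function body is an illustration written for this note and is not copied from the source tree:

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low N bits of val to 64 bits. All intermediate
// values are unsigned, so every shift and subtraction is well defined.
template <int N>
inline uint64_t sext(uint64_t val)
{
    static_assert(N > 0 && N <= 64, "N must be in [1, 64]");
    const uint64_t sign = uint64_t(1) << (N - 1);
    const uint64_t mask = ~uint64_t(0) >> (64 - N);
    val &= mask;                 // drop any bits above bit N-1
    return (val ^ sign) - sign;  // wraps modulo 2^64 for negative inputs
}
```

Callers that later shift the result operate on an unsigned value, which avoids the undefined behaviour mentioned above.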
10529:05b5a6cf3521 |
06-Nov-2014 |
Marc Orr <morr@cs.wisc.edu> |
x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10511:e57f5bffc553 |
30-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add writeback modeling for drain functionality
It is possible for the O3 CPU to consider itself drained and later have a squashed instruction perform a writeback. This patch re-adds tracking of in-flight instructions to prevent falsely signaling a drained event. |
10510:7e54a9a9f6b2 |
30-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add drain check functionality to IEW
IEW did not check the instQueue and memDepUnit to ensure they were drained. This caused issues when drainSanityCheck() did check those structures after asserting IEW was drained. |
10487:5914229e6b16 |
20-Oct-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: corrects base FP and CC register index in removeThread() |
10474:799c8ee4ecba |
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared". |
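The kind of mechanical change described here might look like the sketch below. FaultBase, PageFault, translate() and this definition of NoFault are illustrative assumptions, not the definitions from the source tree:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical fault hierarchy used only for this example.
class FaultBase
{
  public:
    virtual ~FaultBase() = default;
    virtual std::string name() const = 0;
};

typedef std::shared_ptr<FaultBase> Fault;
const Fault NoFault; // assumption: a null pointer means "no fault"

class PageFault : public FaultBase
{
  public:
    explicit PageFault(unsigned long a) : addr(a) {}
    std::string name() const override { return "PageFault"; }
    unsigned long addr;
};

// Before (ad-hoc ref-counting pointer): return new PageFault(addr);
// After (shared_ptr):                   return std::make_shared<...>(addr);
inline Fault translate(unsigned long addr)
{
    if (addr == 0)
        return std::make_shared<PageFault>(addr);
    return NoFault;
}
```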
10473:4cbe53150053 |
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
o3: Use shared_ptr for MemDepEntry
This patch transitions the o3 MemDepEntry from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared". |
10464:2a0fe8bca031 |
16-Oct-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Probe points for basic PMU stats
This changeset adds probe points that can be used to implement PMU counters for CPU stats. The following probes are supported:
* BaseCPU::ppCycles / Cycles
* BaseCPU::ppRetiredInsts / RetiredInsts
* BaseCPU::ppRetiredLoads / RetiredLoads
* BaseCPU::ppRetiredStores / RetiredStores
* BaseCPU::ppRetiredBranches / RetiredBranches |
10450:933cc91f63e1 |
11-Oct-2014 |
Andrew Lukefahr <lukefahr@umich.edu> |
cpu: Fix o3 SMT IQCount bug
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10426:cba563d00376 |
09-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Remove Ozone CPU from the source tree
The Ozone CPU is now very much out of date and completely non-functional, with no one actively working on restoring it. It is a source of confusion for new users who attempt to use it before realizing its current state. RIP |
10417:710ee116eb68 |
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer. |
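The saving can be observed with std::shared_ptr, used here purely as a stand-in for the reference-counting pointer the patch refers to. StaticInst is reduced to a dummy struct and both functions are hypothetical:

```cpp
#include <cassert>
#include <memory>

struct StaticInst { int opcode = 0; };
typedef std::shared_ptr<StaticInst> StaticInstPtr;

// Pass by value: the pointer is copied, so the reference count is
// incremented on entry and decremented again on return.
inline long countByValue(StaticInstPtr inst)
{
    return inst.use_count();
}

// Pass by const reference: no copy, the count is left untouched.
inline long countByConstRef(const StaticInstPtr &inst)
{
    return inst.use_count();
}
```

The value returned by each function is the count seen inside the call, which makes the extra increment of the by-value version visible.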
10408:a59c189de383 |
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Remove unused deallocateContext calls
The call paths for de-scheduling a thread are halt() and suspend(), from the thread context. There is no call to deallocateContext() in general, though some CPUs chose to define it. This patch removes the function from BaseCPU and the cores which do not require it. |
10407:a9023811bf9e |
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemingly arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed. |
10386:c81407818741 |
20-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
base: Clean up redundant string functions and use C++11
This patch does a bit of housekeeping on the string helper functions and relies on the C++11 standard library where possible. It also does away with our custom string hash as an implementation is already part of the standard library. |
10379:c00f6d7e2681 |
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unnecessary copies being created, contributing to the increment/decrement of the reference counters. |
10378:a3e23d599e11 |
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Use a deque in o3 rename instruction queue
Switch from a list to a data structure with better data layout. |
10363:c870b43d2ba6 |
09-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Only iterate over possible threads on the o3 cpu
Some places in O3 always iterated over "Impl::MaxThreads" even if a CPU had fewer threads. This removes a few of those instances. |
10342:711eb0e64249 |
13-May-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
mem: Refactor assignment of Packet types
Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function. |
10340:40d24a672351 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 drain bug
For X86, the o3 CPU would get stuck with the commit stage not being drained if an interrupt arrived while drain was pending. isDrained() makes sure that pcState.microPC() == 0, thus ensuring that we are at an instruction boundary. However, when we take an interrupt we execute:
    pcState.upc(romMicroPC(entry));
    pcState.nupc(romMicroPC(entry) + 1);
    tc->pcState(pcState);
As a result, the MicroPC is no longer zero. This patch ensures the drain is delayed until no interrupts are present. Once draining, non-synchronous interrupts are deferred until after the switch. |
10338:8bee5f4edb92 |
29-Apr-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
arm: use condition code registers for ARM ISA
Analogous to ee049bf (for x86). Requires a bump of the checkpoint version and corresponding upgrader code to move the condition code register values to the new register file. |
10333:6be8945d226b |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked.
Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that.
Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. |
10332:1ba825974ee6 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 quiesce fetch bug
O3 is supposed to stop fetching instructions once a quiesce is encountered. However due to a bug, it would continue fetching instructions from the current fetch buffer. This is because of a break statement that only broke out of the first of 2 nested loops. It should have broken out of both. |
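The class of bug described here, a break that leaves only the inner of two nested loops, can be shown with a generic sketch. The loop structure below is invented for illustration and does not mirror the actual fetch code; -1 is an arbitrary marker for a quiesce instruction.

```cpp
#include <cassert>
#include <vector>

// Counts how many instructions are fetched before a quiesce is seen.
// Each inner vector stands for one group of instructions.
inline int fetchUntilQuiesce(const std::vector<std::vector<int>> &groups)
{
    int fetched = 0;
    bool quiesce = false;
    for (const auto &group : groups) {
        for (int inst : group) {
            if (inst == -1) {
                quiesce = true;
                break;        // leaves only the inner loop
            }
            ++fetched;
        }
        if (quiesce)
            break;            // without this, the outer loop keeps fetching
    }
    return fetched;
}
```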
10331:ed05298e8566 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode.
This is because it maintained an independent priority list of threads from the active thread list. This priority list could become stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads.
Additionally the fetch queue is now forcibly flushed of instructions from the de-scheduled thread.
Relevant output:
24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list
24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1] |
10329:12e3be8203a5 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add a fetch queue to the o3 cpu
This patch adds a fetch queue that sits between fetch and decode to the o3 cpu. This effectively decouples fetch from decode stalls allowing it to be more aggressive, running further ahead in the instruction stream. |
10328:867b536a68be |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 front-end pipeline interlock behavior
The o3 pipeline interlock/stall logic is incorrect. o3 unnecessarily stalled fetch and decode due to later stages in the pipeline. In general, a stage should usually only consider whether it is stalled by the adjacent, downstream stage. Forcing stalls due to later stages creates bubbles in the pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename) on a branch mispredict while the ROB is being serially walked to update the RAT (robSquashing). Only rename should have stalled. |
10327:5b6279635c49 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Change writeback modeling for outstanding instructions
As highlighted on the mailing list, gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit(). |
10319:4207f9bfcceb |
03-Sep-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models.
The main argument for using one set of ISA code per CPU model has always been performance as this avoids indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%. |
10244:d2deb51a4abf |
30-Jun-2014 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: implement a bi-mode branch predictor |
10240:15f822e9410a |
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: make dispatch LSQ full check more selective
Dispatch should not check LSQ size/LSQ stall for non load/store instructions.
This work was done while Binh was an intern at AMD Research. |
10239:592f0bb6bd6f |
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa.
This work was done while Binh was an intern at AMD Research. |
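The difference between a combined and a split queue check can be sketched as below. MemInst and both functions are hypothetical and only model the condition described in the message, not the rename stage itself:

```cpp
#include <cassert>

// Hypothetical view of an instruction reaching rename.
struct MemInst
{
    bool isLoad;
    bool isStore;
};

// Combined check (as described for the old behaviour): a full queue
// of either kind blocks the instruction.
inline bool canRenameCombined(const MemInst &, unsigned freeLQ,
                              unsigned freeSQ)
{
    return freeLQ > 0 && freeSQ > 0;
}

// Split check: only the queue the instruction actually needs matters.
inline bool canRenameSplit(const MemInst &inst, unsigned freeLQ,
                           unsigned freeSQ)
{
    if (inst.isLoad && freeLQ == 0)
        return false;
    if (inst.isStore && freeSQ == 0)
        return false;
    return true;
}
```

With the split check, a load can still be renamed while the Store Queue is full, and vice versa.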
10231:cb2e6950956d |
31-May-2014 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: eliminate equality tests with true and false
Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'.
It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up.
Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code. |
10225:01df075d9f93 |
23-May-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove stat totalCommittedInsts
This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded individually. The total would now be generated by summing these individual counts. |
10193:d717abc806aa |
09-May-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
cpu: add more instruction mix statistics
For the o3, add instruction mix (OpClass) histogram at commit (stats also already collected at issue). For the simple CPUs we add a histogram of executed instructions |
10190:fb83d025d1c3 |
09-May-2014 |
Akash Bagdia <akash.bagdia@arm.com> |
cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the MPIDR field in ARM systems. This allows studying multi-socket / cluster systems with ARM CPUs. |
10175:e639ff917d2e |
01-Apr-2014 |
Mitch Hayenga <Mitch.Hayenga@ARM.com> |
cpu: Fix case where o3 lsq could print out uninitialized data
In the O3 LSQ, data read/written is printed out in DPRINTFs. However, the data field is treated as a null-terminated character string, although it is not encoded this way. This patch removes that possibility by removing the data part of the print. |
10172:790a214be1f4 |
23-Apr-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: Add O3 CPU width checks
O3CPU has a compile-time maximum width set in o3/impl.hh, but checking the configuration against this limit was not implemented anywhere except for fetch. Configuring a wider pipe than the limit can silently cause various issues during the simulation. This patch adds the proper checking in the constructor of the various pipeline stages. |
10164:2d2c60bda8b2 |
19-Apr-2014 |
Faissal Sleiman <sleimanf@umich.edu> |
o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries() should be thread-specific.
In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList.
In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of.
In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it).
In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10149:45a67d84fd4a |
25-Mar-2014 |
Marco Elver <marco.elver@ed.ac.uk> |
cpu: o3: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO.
Prior to this patch, the following case could be violated:
P0 | P1 ;
MOV [x],$1 | MOV EAX,[y] ;
MOV [y],$1 | MOV EBX,[x] ;
exists (1:EAX=1 /\ 1:EBX=0) [is a violation]
The problem was found using litmus [http://diy.inria.fr].
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
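The litmus test in this entry can be checked against a small reference model. The sketch below enumerates every interleaving of the two threads that preserves program order. That is sequential consistency, which is stricter than TSO, but TSO also keeps stores ordered with stores and loads ordered with loads, so the outcome EAX=1, EBX=0 should be unreachable under both. It assumes the stores write the value 1, as the exists clause implies. The names `Outcome`, `enumerateOutcomes` and `violationReachable` are hypothetical and not part of gem5.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Outcome of the litmus test: (EAX, EBX) as observed by P1.
using Outcome = std::pair<int, int>;

// Enumerate all program-order-preserving interleavings of
//   P0: x = 1; y = 1;
//   P1: EAX = y; EBX = x;
// and collect the resulting (EAX, EBX) pairs.
std::set<Outcome> enumerateOutcomes()
{
    std::set<Outcome> outcomes;
    // A 4-bit mask chooses which of the four execution slots belong to P0.
    for (unsigned mask = 0; mask < 16; ++mask) {
        int p0_slots = 0;
        for (int i = 0; i < 4; ++i)
            p0_slots += (mask >> i) & 1;
        if (p0_slots != 2)
            continue; // P0 must execute exactly two operations.

        int x = 0, y = 0, eax = 0, ebx = 0;
        int p0_pc = 0, p1_pc = 0;
        for (int slot = 0; slot < 4; ++slot) {
            if ((mask >> slot) & 1) {
                if (p0_pc++ == 0) x = 1; else y = 1;      // P0 stores
            } else {
                if (p1_pc++ == 0) eax = y; else ebx = x;  // P1 loads
            }
        }
        outcomes.insert(Outcome{eax, ebx});
    }
    return outcomes;
}

// True if the forbidden outcome (EAX=1 and EBX=0) is reachable.
bool violationReachable()
{
    return enumerateOutcomes().count(Outcome{1, 0}) != 0;
}
```

A model that lets the second load bypass the first would add the pair (1, 0) to the set, which is the behaviour the patch squashes.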
10111:fd90d9e55e5c |
12-Mar-2014 |
Paul Rosenfeld <dramninjas@gmail.com> |
alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10110:580b47334a97 |
07-Mar-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by making them const where appropriate. |
10104:ff709c429b7b |
07-Mar-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions. |
10034:f2ce7114b137 |
24-Jan-2014 |
Geoffrey Blake <Geoffrey.Blake@arm.com> |
checker: CheckerCPU handling of MiscRegs was incorrect
The CheckerCPU model in pre-v8 code was not checking the updates to miscellaneous registers because some methods for setting misc regs were not instrumented. The v8 patches exposed this by calling the instrumented misc reg update methods and then invoking the checker before the main CPU had updated its misc regs, leading to false positives about register mismatches. This patch fixes the non-instrumented misc reg update methods and places calls to the checker in the proper places in the O3 model. |
10033:21c14a2b2117 |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the other ISAs this is currently a null-implementation.
Additionally, a system variable is added to all the ISA objects. |
10032:5a7852a013d4 |
24-Jan-2014 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
cpu: Add support for Memory+Barrier instruction types in O3 cpu. |
10031:79d034cd6ba3 |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Add support for instructions that zero cache lines. |
10030:b531e328342d |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Add CPU support for generating wake up events when LLSC addresses are snooped.
This patch adds support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support. |
10024:fc10e1f9f124 |
24-Jan-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump. |
10023:91faf6649de0 |
24-Jan-2014 |
Matt Horsnell <matt.horsnell@ARM.com> |
base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model.
What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module that carries out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py
What is happening under the hood:
* every SimObject has a ProbeManager.
* during initialization (src/python/m5/simulate.py), first regProbePoints and then regProbeListeners is called on each SimObject. This hooks up the probe point notify calls with the listeners.
FAQs:
Why did you develop probe points:
* to move trace, stats gathering, and analytical code out of the functional code.
* the belief that probes could be generically useful.
What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)
What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified about an event.
What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener.
What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea is that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes.
Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler); you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats.
Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality.
* of course this can be subverted by const-casting.
What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested that even with a large number of probes (60) their impact (when not active) is minimal (<1%). |
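The observer relationship described in this entry can be sketched in a few lines. This is a minimal illustration only, not gem5's probe API: `SketchProbePoint`, `CommittedInst` and `SketchAnalysis` are hypothetical names, and the sketch leaves out the ProbeManager and the registration phases. It shows a templated probe point that passes its argument by const reference, with two listeners attached to the same point (the 1:N case).

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a probe point: it forwards a const reference
// to every registered listener and does nothing if none are attached.
template <typename Arg>
class SketchProbePoint
{
  public:
    using Listener = std::function<void(const Arg &)>;

    void addListener(Listener l) { listeners.push_back(std::move(l)); }

    void notify(const Arg &arg) const
    {
        for (const auto &l : listeners)
            l(arg);
    }

  private:
    std::vector<Listener> listeners;
};

// Hypothetical payload for an "instruction committed" event.
struct CommittedInst
{
    std::string mnemonic;
    bool isMemRef;
};

// Two independent analyses listening to the same probe point (1:N).
// The object must outlive any notify() call made after attach().
struct SketchAnalysis
{
    std::vector<std::string> trace; // trace module
    int memRefs = 0;                // instruction-mix module

    void attach(SketchProbePoint<CommittedInst> &commit)
    {
        commit.addListener([this](const CommittedInst &i) {
            trace.push_back(i.mnemonic);
        });
        commit.addListener([this](const CommittedInst &i) {
            if (i.isMemRef)
                ++memRefs;
        });
    }
};
```

With no listener attached, `notify` only walks an empty vector, which matches the intent that inactive probes cost very little.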
10020:2f33cb012383 |
24-Jan-2014 |
Matt Horsnell <matt.horsnell@ARM.com> |
mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request. |
10017:c75015bbbd78 |
24-Jan-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Relax check on squashed non-speculative instructions
This patch relaxes the check performed when squashing non-speculative instructions, as it caused problems with loads that were marked ready, and then stalled on a blocked cache. The assertion is now allowing memory references to be non-faulting. |
9992:6e39e3641dd8 |
03-Dec-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: call BaseCPU startup() function in o3 cpu |
9982:b2bfc23f932c |
15-Nov-2013 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: allow the fetch buffer to be smaller than a cache line
the current implementation of the fetch buffer in the o3 cpu is only allowed to be the size of a cache line. some architectures, e.g., ARM, have fetch buffers smaller than a cache line, see slide 22 at: http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf
this patch allows the fetch buffer to be set to values smaller than a cache line. |
9954:72a72649a156 |
31-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Construct ROB with cpu params struct instead of each variable
Most other structures/stages get passed the cpu params struct. |
9948:6cbe5c9d0ebb |
31-Oct-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Fix O3 issue with load+barrier instructions.
Fix a problem in the O3 CPU for instructions that are both memory loads and memory barriers (e.g. load acquire) and that access uncacheable memory. This combination can confuse the commit stage into committing an instruction that hasn't executed and received its value yet. At the same time refactor the code slightly to remove duplication between two of the cases. |
9944:4ff1c5c6dcbc |
17-Oct-2013 |
Matt Horsnell <matt.horsnell@ARM.com> |
cpu: add consistent guarding to *_impl.hh files. |
9938:d3b7970e1b33 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Removing an unused variable in rename |
9937:49a534f54e72 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Change IEW DPRINTF to use IEW debug flag
IEW DPRINTF uses Decode debug flag, which appears to be a copying error. This patch changes this to the IEW Debug flag. |
9936:f00546aff354 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Put in assertions to check for maximum supported LQ/SQ size
LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to 256 entries (including the sentinel entry). Sending packets to memory with a higher index than 255 truncates the index, such that the response matches the wrong entry. For instance, this can result in a deadlock if a store completion does not clear the head entry. |
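The truncation described in this entry is easy to reproduce in isolation. The sketch below is not gem5 code; `packIndex` and `indexFits` are hypothetical helpers that show how an index above 255 wraps when stored in a `uint8_t`, and the kind of size check that guards against it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store a queue index in an 8-bit field, as the
// entry describes for LSQSenderState.
inline std::uint8_t packIndex(unsigned idx)
{
    return static_cast<std::uint8_t>(idx); // keeps only the low 8 bits
}

// Sketch of a size guard: num_entries is the total number of queue
// entries, sentinel included. Valid indices are 0 .. num_entries - 1 and
// must all fit in 8 bits.
inline bool indexFits(unsigned num_entries)
{
    return num_entries >= 1 && num_entries - 1 <= UINT8_MAX;
}
```

For example, index 256 packs to 0, so a response for entry 256 would be matched against entry 0.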
9921:ee049bfce978 |
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
arch/x86: add support for explicit CC register file
Convert condition code registers from being specialized ("pseudo") integer registers to using the recently added CC register class.
Nilay Vaish also contributed to this patch. |
9920:028e4da64b42 |
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
cpu: add a condition-code register class
Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though. |
9919:803903a8dac1 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up rename map and free list
Restructured rename map and free list to clean up some extraneous code and separate out common code that can be reused across different register classes (int and fp at this point). Both components now consist of a set of Simple* objects that are stand-alone rename map & free list for each class, plus a Unified* object that presents a unified interface across all register classes and then redirects accesses to the appropriate Simple* object as needed.
Moved free list initialization to PhysRegFile to better isolate knowledge of physical register index mappings to that class (and remove the need to pass a number of parameters to the free list constructor).
Causes a small change to the stats cpu.rename.int_rename_lookups and cpu.rename.fp_rename_lookups, because they are now categorized on a per-operand basis rather than a per-instruction basis. That is, an instruction with mixed fp/int/misc operand types will have each operand categorized independently, where previously the lookup was categorized based on the instruction type. |
9918:2c7219e2d999 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: rename *_DepTag constants to *_Reg_Base
Make these names more meaningful.
Specifically, made these substitutions:
s/FP_Base_DepTag/FP_Reg_Base/g; s/Ctrl_Base_DepTag/Misc_Reg_Base/g; s/Max_DepTag/Max_Reg_Index/g; |
9916:9c3a4595cce9 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor parameters) that it didn't really use, and the array initialization was needlessly verbose.
Also just hardwired the getReg() method to always return true for misc regs, rather than having an array of bits that we always kept marked as ready. |
9915:d9e3ad574162 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or have a pointer back to the CPU. Also made some methods for checking the physical register type (int vs. float) based on the phys reg index, which will come in handy later. |
9913:7f43babfde6a |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping of unified architectural register indices to register classes is hardcoded all over to one where there's an enum for the register classes and a function that encapsulates the mapping. |
9868:44a67004d6b4 |
11-Sep-2013 |
Joel Hestness <jthestness@gmail.com> |
cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of its object, but it would only initialize numThreads LSQUnits as specified by the user. This had the effect of leaving some LSQUnits uninitialized when the number of threads was less than MaxThreads, and when adding statistics to the LSQUnit that must be initialized, this caused the stats initialization check to fail. By dynamically instantiating LSQUnits, they are all initialized, which keeps uninitialized LSQUnits from floating around during runtime. |
9849:603e2ed487f3 |
04-Sep-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or out-of-order CPU as one of the available CPU models and therefore should not be used in the BaseCPU. This patch moves the parameter to the relevant CPU classes. |
9822:7f7cbcece75a |
19-Aug-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix a bug in the O3 CPU introduced by the cache line patch
This patch fixes a bug in the O3 fetch stage that was introduced when the cache line size was moved to the system. By mistake, the initialisation and resetting of the fetch stage was merged and put in the constructor. The resetting is now re-added where it should be. |
9814:7ad2b0186a32 |
18-Jul-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets the cache line size on the system level.
Previously the size was set per cache, and communicated through the interconnect. There were plenty of checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used.
A follow-on patch updates the configuration scripts accordingly. |
9793:6e6cefc1db1f |
27-Jun-2013 |
Akash Bagdia <akash.bagdia@arm.com> |
sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the ClockedObjects. As such, all clock information is moved to the clock domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock period, or derived domains that have a parent domain and a divider (potentially chained). For a piece of logic that runs at a derived clock (a ratio of the clock its parent is running at), the necessary derived clock domain is created from its corresponding parent clock domain. For now, the derived clock domain only supports a divider, thus ensuring a lower speed compared to its parent. Multiplier functionality implies PLL logic that has not been modelled yet (create a separate clock instead).
The clock domains should be used as a mechanism to provide a controllable clock source that affects clock for every clocked object lying beneath it. The clock of the domain can (in a future patch) be controlled by a handler responsible for dynamic frequency scaling of the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For the System a default SrcClockDomain is created. For CPUs that run at a different speed than the system, a separate clock domain is created. This domain incorporates the CPU and the associated caches. As before, Ruby runs under its own clock domain.
The clock period of every domain is pre-computed, such that no virtual functions or multiplications are needed when calling clockPeriod. Instead, the clock period is re-computed when any changes occur. For this to be possible, each clock domain tracks its children. |
9783:8d327ffdba62 |
27-Jun-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Consider instructions waiting for FU completion in draining
This patch changes the IEW drain check to include the FU pool as there can be instructions that are "stored" in FU completion events and thus not covered by the existing checks. With this patch, we simply include a check to see if all the FUs are considered non-busy in the next tick.
Without this patch, the pc-switcheroo-full regression fails after minor changes to the cache timing (aligning to clock edge). |
9648:f10eb34e3e38 |
22-Apr-2013 |
Dam Sunwoo <dam.sunwoo@arm.com> |
sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was already aligned with the clock edge. This behavior is not only confusing (not quite what the function name implies), but also caused problems in the drainResume() function. When exiting/re-entering the sim loop (e.g., to take checkpoints), the CPUs will drain and resume. Due to the previous behavior of nextCycle(), the CPU tick events were being rescheduled in the same ticks that were already processed before draining. This caused divergence from runs that did not exit/re-enter the sim loop. (Initially a cycle difference, but a significant impact later on.)
This patch separates out the two behaviors (nextCycle() and clockEdge()), uses nextCycle() in drainResume, and uses clockEdge() everywhere else. Nothing (other than name) should change except for the drainResume timing. |
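The difference between the two behaviors can be shown with plain integer arithmetic. The sketch below is not gem5's implementation: `clockEdgeSketch` and `nextCycleSketch` are hypothetical free functions, and it assumes that the "next cycle" is one full period after the edge that `clockEdgeSketch` returns.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

// Sketch: first clock edge at or after `now` (returns `now` if aligned).
inline Tick clockEdgeSketch(Tick now, Tick period)
{
    assert(period > 0);
    return ((now + period - 1) / period) * period;
}

// Sketch: an edge strictly after `now`, assumed here to be one period
// after the edge returned by clockEdgeSketch().
inline Tick nextCycleSketch(Tick now, Tick period)
{
    return clockEdgeSketch(now, period) + period;
}
```

With a period of 500 and an aligned tick of 1000, `clockEdgeSketch` returns 1000 again while `nextCycleSketch` returns 1500, so a tick event rescheduled with the latter cannot land on a tick that was already processed.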
9644:07352f119e48 |
22-Apr-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for a icache access. |
9624:43bd6562745e |
29-Mar-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object. Since the interrupt is usually handled several cycles after the commit stage becomes aware of it, it is possible that the local copy of the interrupt object may not be the interrupt that is actually handled. It is possible that another interrupt occurred in the interval between interrupt detection and interrupt handling.
This patch creates a copy of the interrupt just before the interrupt is handled. The local copy is ignored. |
9608:e2b6b86fda03 |
26-Mar-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort instead of the derived CpuPort. The functions of the CpuPort are now distributed across the relevant subclasses. The port accessor functions (getInstPort and getDataPort) now return a MasterPort instead of a CpuPort. This simplifies creating derivative CPUs that do not use the CpuPort. |
9574:5bb4346cbfa7 |
04-Mar-2013 |
Ali Saidi <saidi@eecs.umich.edu> |
cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for a icache access. |
9550:e0e2c8f83d08 |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Fix up numerous warnings about name shadowing
This patch addresses the most important name shadowing warnings (as produced when using gcc/clang with -Wshadow). There are many locations where constructor parameters and function parameters shadow local variables, but these are left unchanged. |
9532:01f0fac41c84 |
15-Feb-2013 |
Geoffrey Blake <geoffrey.blake@arm.com> |
cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without checking for duplicates. This can trigger the assert if an instruction gets replayed and writes to the same misc regs multiple times. This fix prevents duplicate entries and instead updates the value. |
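The update-instead-of-append behavior described in this entry can be sketched as follows. `SketchMiscRegWrites` is a hypothetical container, not the structure used in gem5; it only illustrates that a replayed write to the same register index overwrites the earlier entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical tracking structure: (misc reg index, value) pairs recorded
// by an instruction so they can be applied later.
class SketchMiscRegWrites
{
  public:
    // Record a write; a replayed write to the same register overwrites
    // the earlier entry instead of appending a duplicate.
    void set(int misc_reg, std::uint64_t val)
    {
        for (auto &entry : writes) {
            if (entry.first == misc_reg) {
                entry.second = val;
                return;
            }
        }
        writes.emplace_back(misc_reg, val);
    }

    // Returns true and fills `val` if a write to `misc_reg` is recorded.
    bool lookup(int misc_reg, std::uint64_t &val) const
    {
        for (const auto &entry : writes) {
            if (entry.first == misc_reg) {
                val = entry.second;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return writes.size(); }

  private:
    std::vector<std::pair<int, std::uint64_t>> writes;
};
```

A linear scan is used on the assumption that an instruction writes only a handful of misc regs.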
9531:1114ead790eb |
15-Feb-2013 |
Geoffrey Blake <geoffrey.blake@arm.com> |
cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets into a resource constrained situation and the serializing instruction has to be placed on the skid buffer to handle blocking. In this situation the instruction informs the pipeline it is serializing and logs that the next instruction must be serialized, but since we are blocking the pipeline defers this action to place the serializing instruction and incoming instructions into the skid buffer. When resuming from blocking, rename will pull the serializing instruction from the skid buffer and the current logic will see this as the "next" instruction that has to be serialized and because of flags set on the serializing instruction, it passes through the pipeline stage as normal and resets rename to non-serializing. This causes instructions to follow the serializing inst incorrectly and eventually leads to an error in the pipeline. To fix this rename should check first if it has to block before checking for serializing instructions. |
9527:68154bc0e0ea |
15-Feb-2013 |
Matt Horsnell <Matt.Horsnell@arm.com> |
o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename, which was always 1 less than the dispatch. This conflated the decode ticks when back pressure built in the pipeline.
- now picks up tick on entry.
Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ blocking)
Allows selection by tick range (previously this caused an infinite loop) |
9524:d6ffa982a68b |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virtualized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore. |
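A minimal sketch of the three predicates listed in this entry is given below. It is not the gem5 System class: `MemMode` and `SketchSystem` are hypothetical, and the sketch assumes that the cache-bypassing mode, which this entry calls both 'atomic_noncaching' and 'direct', is a single mode.

```cpp
// Hypothetical memory-mode enum; the names mirror the modes mentioned in
// this entry but are not copied from gem5.
enum class MemMode { Invalid, Atomic, Timing, AtomicNoncaching };

class SketchSystem
{
  public:
    explicit SketchSystem(MemMode m) : mode(m) {}

    // True for both atomic flavours, with or without caches.
    bool isAtomicMode() const
    {
        return mode == MemMode::Atomic ||
               mode == MemMode::AtomicNoncaching;
    }

    bool isTimingMode() const { return mode == MemMode::Timing; }

    // Only the non-caching atomic flavour bypasses the caches.
    bool bypassCaches() const
    {
        return mode == MemMode::AtomicNoncaching;
    }

  private:
    MemMode mode;
};
```

Callers written against the predicates do not need to change if another mode is added later, which is one reason to prefer them over comparing the raw mode value.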
9523:b8c8437f71d9 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two places, when the CPU is initialized (unless it's switched out) and on a drainResume(). This led to some code duplication in the CPU models. This changeset introduces the verifyMemoryMode() method which is called by BaseCPU::init() if the CPU isn't switched out. The individual CPU models are responsible for calling this method when resuming from a drain as this code is CPU model specific. |
9519:bed1c3244425 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python object hierarchy. This has two consequences:
* It makes CPU model discovery from the Python world somewhat complicated as there is no way of testing if a CPU is a checker.
* Parameters are duplicated in the checker configuration specification.
This changeset makes all checker CPUs inherit from the base checker CPU class. |
9518:8faae62af8c3 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Add CPU metadata on the Python classes
The configuration scripts currently hard-code the requirements of each CPU. This is clearly not optimal as it makes writing new configuration scripts painful and adding new CPU models requires existing scripts to be updated. This patch adds the following class methods to the base CPU and all relevant CPUs:
* memory_mode -- Return a string describing the current memory mode (invalid/atomic/timing).
* require_caches -- Does the CPU model require caches?
* support_take_over -- Does the CPU support CPU handover? |
9516:8bb2deb544a5 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pick up set from elsewhere, one version of gcc 4.7 complains, so explicitly add it. |
9514:40e2bf800921 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: fix case with o3 cpu blocking and unblocking decode in cycle
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a single cycle sending both signals to fetch which causes an assert or worse. The previous check could never work before since the status was set to Blocked before a test for the status being Unblocking was executed. |
9513:690357ffbce2 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Fix a livelock in the o3 cpu.
Check if an instruction just enabled interrupts and we've previously had an interrupt pending that was not handled because interrupts were subsequently disabled before the pipeline reached a place to handle the interrupt. In that case squash now to make sure the interrupt is handled. |
9480:d059f8a95a42 |
24-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu>, Timothy Jones <timothy.jones@cl.cam.ac.uk> |
branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories to src/cpu/pred. This allows sharing the branch predictor across different cpu models.
This patch was originally posted by Timothy Jones in July 2010 but never made it to the repository. |
9479:f9e76b1eb79a |
22-Jan-2013 |
Andrea Pellegrini <andrea.pellegrini@gmail.com> |
o3 cpu: fix zero reg problem
There was an issue w/ the rename logic, which would assign a previous physical register to the ZeroReg architectural register in x86. This issue was giving problems for instructions squashed in threads w/ ID different from 0, sometimes allowing non-mispredicted instructions to obtain a value different from zero when reading the zeroReg. |
9478:ba80f7d4f452 |
22-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching of cpus. The problem is that decoder for the old thread context holds state that is not taken over by the new decoder.
This patch adds a takeOverFrom() function to Decoder class in each ISA. Except for x86, functions in other ISAs are blank. For x86, the function copies state from the old decoder to the new decoder. |
9476:4a14ff47b8e3 |
19-Jan-2013 |
Joel Hestness <hestness@cs.wisc.edu> |
O3 IEW: Make incrWb and decrWb clearer
Move the increment/decrement of wbOutstanding outside of the comparison in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc 4.4.7, which incorrectly optimizes "-- ==" as "-=". |
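The restructuring described in this entry amounts to updating the counter in one statement and comparing it in another. The sketch below is a hypothetical stand-in, not the IEW code: `SketchWbCounter`, `ableToIssue` and the accessor names are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the IEW writeback bookkeeping.
class SketchWbCounter
{
  public:
    explicit SketchWbCounter(int max) : wbMax(max) {}

    void incrWb()
    {
        ++wbOutstanding;                // update first ...
        if (wbOutstanding == wbMax)     // ... then compare separately
            ableToIssue = false;
    }

    void decrWb()
    {
        assert(wbOutstanding > 0);
        if (wbOutstanding == wbMax)     // compare the old value ...
            ableToIssue = true;
        --wbOutstanding;                // ... then update separately
    }

    bool canIssue() const { return ableToIssue; }
    int outstanding() const { return wbOutstanding; }

  private:
    int wbMax;
    int wbOutstanding = 0;
    bool ableToIssue = true;
};
```

Keeping the side effect out of the condition makes the order of update and comparison explicit to the reader.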
9461:67a6ba6604c8 |
12-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made changes to the code which resulted in decoder not getting initialized correctly when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function sets the required state in the decoder. For other ISAs, the function is empty right now. |
9448:569d1e8f74e4 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify the serialization code for all of the CPU models
Clean up the serialization code for the simple CPUs and the O3 CPU. The CPU-specific code has been replaced with a (un)serializeThread that serializes the thread state / context of a specific thread. Assuming that the CPU-specific thread state uses the base thread state serialization code, this allows us to restore a checkpoint with any of the CPU models. |
9444:ab47fe7f03f0 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust.
Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty. |
9441:1133617844c8 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers were performed during the same quiesce period. Previously, the thread contexts would assign the TC pointer in the old quiesce event to the new TC. This obviously broke in cases where multiple switches were performed within the same quiesce period, in which case the TC pointer in the quiesce event would point to an old CPU.
The new implementation deschedules pending quiesce events in the old TC and schedules a new quiesce event in the new TC. The code has been refactored to remove most of the code duplication. |
9440:fdc91cab5760 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix O3 LSQ debug dumping constness and formatting |
9437:8088e94a9de0 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This confuses other stages since the signals coming from the commit stage can only signal either a squash or a commit in a cycle. This changeset changes the behavior of squashAfter so that it commits all instructions, including the instruction that requested the squash, in the first cycle and then starts to squash in the next cycle. |
9436:4a0223da4924 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
o3 cpu: Remove unused variables |
9433:34971d2e0019 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from initializing at startup, leaving it in the "switched out" mode. The name of this parameter (and the help string) is confusing. This patch renames it to switched_out, which should be more descriptive. |
9429:7c787b8030c6 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure that CPU models consistently call the parent on switchOut() and takeOverFrom(). This has the following implications that might alter current functionality:
* The call to BaseCPU::switchout() in the O3 CPU is moved from signalDrained() (!) to switchOut().
* A call to BaseSimpleCPU::switchOut() is introduced in the simple CPUs. |
9428:029dfe6324d3 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order to do serialization. This was a bit of a hack involving two static SimpleThread instances and a magic constructor that was only used by the O3 CPU.
This patch moves the ThreadContext serialization code into two global procedures that, in addition to the normal serialization parameters, take a ThreadContext reference as a parameter. This allows us to reuse the serialization code in all ThreadContext implementations. |
9427:ddf45c1d54d4 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is called before initState() or unserialize(). This causes the pipeline to be initialized from an incorrect thread context. This doesn't currently lead to correctness problems as instructions fetched from the incorrect start PC will be squashed a few cycles after initialization.
This patch will affect the regressions since the O3 CPU now issues its first instruction fetch to the correct PC instead of 0x0. |
9426:0548b3e9734d |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode of operations. There is currently no architecture independent way of accessing all registers. This patch introduces a flat register interface to the ThreadContext class. This interface is useful, for example, when serializing or copying thread contexts. |
9424:d631aac65246 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory system is in the correct mode at startup and when resuming after a drain. Previously, we only checked that the memory system was in the right mode when resuming. This is inadequate since this is a configuration error that should be detected at startup as well as when resuming. Additionally, since the check was done using an assert, it wasn't performed when NDEBUG was set (e.g., the fast target). |
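The difference between an assert-based check and an unconditional one can be sketched as follows. This is a minimal illustration, not the code from the patch: `MemoryMode`, `System`, `TimingCpu`, `verifyMemoryMode`, and the use of an exception in place of the simulator's fatal-error mechanism are all assumptions made for the example.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the simulator's types.
enum class MemoryMode { Atomic, Timing };

struct System {
    MemoryMode mode;
};

// Unconditional check: unlike assert(), this is still compiled in when
// NDEBUG is defined, so a misconfiguration is also reported in fast builds.
inline void verifyMemoryMode(const System &sys, MemoryMode required)
{
    if (sys.mode != required) {
        throw std::runtime_error(
            "memory system is in the wrong mode for this CPU model");
    }
}

// Hypothetical timing CPU that runs the same check at startup and on resume.
struct TimingCpu {
    const System &system;

    void startup() { verifyMemoryMode(system, MemoryMode::Timing); }
    void drainResume() { verifyMemoryMode(system, MemoryMode::Timing); }
};
```

Calling the one helper from both entry points keeps the startup and resume paths from drifting apart.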
9384:877293183bdf |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
arch: Make the ISA class inherit from SimObject
The ISA class stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU. |
9383:55fa95053ee8 |
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
o3: Fix issue with LLSC ordering and speculation
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked address. Previously we relied on the cache to handle the locking for us; however, some users on the gem5 mailing list reported a case where the cpu speculatively executes an ll operation after a pending sc operation in the pipeline and that makes the cache monitor valid. This should handle that case by invalidating the local monitor. |
9382:1c97b57d5169 |
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: rename the misleading inSyscall to noSquashFromTC
inSyscall was originally created because during handling of a syscall in SE mode the threadcontext had to be updated. However, in many places this is used in FS mode (e.g. fault handlers) and the name doesn't make much sense. The boolean actually stops gem5 from squashing speculative and non-committed state when a write to a threadcontext happens, so rename the variable to something more appropriate. |
9377:6f294e7a93d1 |
04-Jan-2013 |
Gabe Black <gblack@eecs.umich.edu> |
Decoder: Remove the thread context get/set from the decoder.
This interface is no longer used, and getting rid of it simplifies the decoders and code that sets up the decoders. The thread context had been used to read architectural state which was used to contextualize the instruction memory as it came in. That was changed so that the state is now sent to the decoders to keep locally if/when it changes. That's significantly more efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9360:515891d9057a |
06-Dec-2012 |
Erik Tomusk <E.Tomusk@sms.ed.ac.uk> |
TournamentBP: Fix some bugs with table sizes and counters
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled. globalHistoryBits controls how much history is kept, global and choice predictor sizes control how much of that history is used when accessing predictor tables. This way, global and choice predictors can actually be different sizes, and it is no longer possible to walk off the predictor arrays and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating counters, so that taken/not taken decisions are correct even when the predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
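The two ideas in this entry, a per-counter threshold and a table index that only uses as much history as the table holds, can be sketched as below. This is an illustration under stated assumptions: `SatCounter`, `tableIndex`, the "strictly above the midpoint means taken" rule, and power-of-two table sizes are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical n-bit saturating counter that carries its own threshold,
// so counters of different widths each make a correct taken decision.
class SatCounter
{
  public:
    // Assumes 1 <= bits < 32.
    explicit SatCounter(unsigned bits)
        : maxVal((1u << bits) - 1),
          // Assumption: values strictly above this midpoint predict taken.
          threshold((1u << (bits - 1)) - 1),
          counter(0)
    {}

    void increment() { if (counter < maxVal) ++counter; }
    void decrement() { if (counter > 0) --counter; }
    bool predictTaken() const { return counter > threshold; }

  private:
    unsigned maxVal;
    unsigned threshold;
    unsigned counter;
};

// Mask the history down to the table size so an access can never run
// past the end of the predictor array. Assumes tableSize is a power of two.
inline unsigned tableIndex(std::uint64_t history, unsigned tableSize)
{
    return static_cast<unsigned>(history & (tableSize - 1));
}
```

With this shape a 2-bit counter predicts taken from value 2 upwards and a 3-bit counter from value 4 upwards, so mixing widths does not skew the decision.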
9358:aa761458ddcb |
06-Dec-2012 |
Nathanael Premillieu <nathanael.premillieu@irisa.fr> |
o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9342:6fec8f26e56d |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager. |
9341:a0eff1e9c773 |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
cpu: O3 add a header declaring the DerivO3CPU
SWIG needs a complete declaration of all wrapped objects. This patch adds a header file with the DerivO3CPU class and includes it in the SWIG interface. |
9340:40f8c6a8f38d |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
cpu: Add header files for checker CPUs
In order to create reliable SWIG wrappers, we need to include the declaration of the wrapped class in the SWIG file. Previously, we didn't expose the declaration of checker CPUs. This patch adds header files for such CPUs and includes them in the SWIG wrapper. |
9338:97b4a2be1e5b |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses classical C-style casts ( (Foo *)bar; ). In some cases, this can degenerate into the equivalent of a reinterpret_cast (mainly if only a forward declaration of the type is available). This usually works for most compilers, but it is known to break if multiple inheritance is used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject definitions, which should be used to specify a header to include in the SWIG interface. The header should include the declaration of the wrapped object. We currently don't enforce the use of the header attribute, but a warning will be generated for objects that do not use it. |
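Why the full declaration matters can be sketched with a small multiple-inheritance hierarchy. The type names below (`Drainable`, `Serializable`, `SimObjectLike`) and the helper `toSerializable` are hypothetical and only illustrate the pointer adjustment; they are not the simulator's classes.

```cpp
// Two unrelated bases, each with its own data.
struct Drainable {
    int drainState = 1;
    virtual ~Drainable() = default;
};

struct Serializable {
    int version = 2;
    virtual ~Serializable() = default;
};

// Hypothetical object using multiple inheritance.
struct SimObjectLike : Drainable, Serializable {};

// With the complete class declaration visible, the cast can adjust the
// pointer so that it refers to the Serializable sub-object. With only a
// forward declaration, a C-style cast has no layout information and
// behaves like reinterpret_cast: the address is reused unchanged, which
// is wrong whenever the sub-object does not start at offset zero.
inline Serializable *toSerializable(SimObjectLike *obj)
{
    return static_cast<Serializable *>(obj);
}
```

Including the header in the generated wrapper gives the compiler the layout it needs to perform this adjustment.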
9260:9ca8345d24c4 |
25-Sep-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Pack the comm structures a bit better to reduce their size. |
9252:f350fac86d0f |
25-Sep-2012 |
Djordje Kovacevic <djordje.kovacevic@arm.com> |
CPU: Add abandoned instructions to O3 Pipe Viewer |
9218:7e9e34d4203b |
12-Sep-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
stats: remove duplicate instruction stats from the commit stage
these stats are duplicates of insts/opsCommitted, cause confusion, and are poorly named. |
9194:149a32e42697 |
07-Sep-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Get rid of incorrect assert in RAS. |
9184:a1a8f137b796 |
07-Sep-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The main affected modules are the CPUs and the Ruby caches. There are definitely plenty more places that are affected, but this patch serves as a starting point to making the transition.
An important part of this patch is to actually enable parameters to be specified as Param.Cycles which involves some changes to params.py. |
9180:ee8d7a51651d |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though.
The changes done as part of this patch mostly follow directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles.
In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes. |
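A wrapper of this kind can be sketched as below. This is a minimal version written for illustration, not the class added by the patch: the member names, the `value()` accessor, and the `cyclesToTicks` helper are assumptions.

```cpp
#include <cstdint>

using Tick = std::uint64_t;

// Hypothetical wrapper that makes "a number of cycles" a distinct type,
// so it cannot be mixed up with ticks or plain integers by accident.
class Cycles
{
  public:
    constexpr Cycles() : count(0) {}
    // Explicit: a bare integer does not silently become a cycle count.
    explicit constexpr Cycles(std::uint64_t c) : count(c) {}

    constexpr std::uint64_t value() const { return count; }

    constexpr Cycles operator+(Cycles rhs) const
    {
        return Cycles(count + rhs.count);
    }
    Cycles &operator+=(Cycles rhs) { count += rhs.count; return *this; }
    Cycles &operator++() { ++count; return *this; }

    constexpr bool operator==(Cycles rhs) const { return count == rhs.count; }
    constexpr bool operator<(Cycles rhs) const { return count < rhs.count; }

  private:
    std::uint64_t count;
};

// Converting to ticks requires naming the clock period explicitly.
constexpr Tick cyclesToTicks(Cycles c, Tick clockPeriod)
{
    return c.value() * clockPeriod;
}
```

Because the constructor is explicit, passing a tick value where a cycle count is expected becomes a compile error rather than a silent unit mismatch.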
9179:666bc9df1e49 |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future.
In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch.
Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well. |
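The clock update idea can be sketched as follows: keep a tick and a cycle member in step and advance them incrementally, falling back to a division only when the object has been left behind by more than one period. The class and member names are assumptions for illustration, and the current time is passed in explicitly rather than read from a global.

```cpp
#include <cstdint>

using Tick = std::uint64_t;

// Hypothetical clocked object. Assumes 'now' never decreases between calls.
class ClockedSketch
{
  public:
    explicit ClockedSketch(Tick p) : period(p), tick(0), cycle(0) {}

    // Tick of the first clock edge at or after 'now', plus 'ahead' cycles.
    Tick clockEdge(Tick now, std::uint64_t ahead = 0)
    {
        update(now);
        return tick + ahead * period;
    }

    // Cycle number of that edge.
    std::uint64_t curCycle(Tick now)
    {
        update(now);
        return cycle;
    }

  private:
    void update(Tick now)
    {
        // Common case: the cached edge is still current, nothing to do.
        if (tick >= now)
            return;

        // Next most common case: exactly one period has gone by.
        tick += period;
        ++cycle;

        // Rare case: idle for longer, so catch up with one rounded-up division.
        if (tick < now) {
            std::uint64_t elapsed = (now - tick + period - 1) / period;
            cycle += elapsed;
            tick += elapsed * period;
        }
    }

    Tick period;
    Tick tick;            // tick of the cached clock edge
    std::uint64_t cycle;  // cycle number of the cached clock edge
};
```

An object that is queried every cycle therefore pays only for a comparison and an addition per query.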
9165:f9e3dac185ba |
22-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK from the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occurred in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up. |
9161:e353c178fb36 |
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Remove overloaded function_trace_start parameter
This patch removes the overloading of the parameter, which seems both redundant, and possibly incorrect.
The inorder CPU is particularly interesting as it uses a different name for the parameter, and never makes any use of it internally. |
9158:d152d34a4adf |
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Make Tick unsigned and remove UTick
This patch makes the Tick unsigned and removes the UTick typedef. The ticks should never be negative, and there was only one major issue with removing it, caused by the o3 CPU using a -1 as an initial value.
The patch has no impact on any regressions. |
9152:86c0e6ca5e7c |
15-Aug-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements.
This is an incremental fix as there are still a few bugs/mem leaks with the switchout code, particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation. |
9132:c8d4b0595448 |
27-Jul-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
checker: make checker cpu id match its host's cpu id
when using the checker i ran into problems where an instruction reading the cpu id register failed because the ids did not match, and hence, the result of the instruction did not match. this patch ensures that the ids match so this instruction does not fail. this problem only seemed to manifest itself when multiple cores were in the system, either multi-core, or extra switched-out cores present in the system. |
9095:0e6bd7082fac |
09-Jul-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Align port names in C++ and Python
This patch is a first step to align the port names used in the Python world and the C++ world. Ultimately it serves to make the use of config.json together with output from the simulation easier, including post-processing of statistics.
Most notably, the CPU, cache, and bus are addressed in this patch, and there might be other ports that should be updated accordingly. The dash name separator has also been replaced with a "." which is what is used to concatenate the names in python, and a separation is made between the master and slave port in the bus. |
9086:496304c8017d |
09-Jul-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Fix: Address a few benign memory leaks
This patch is the result of static analysis identifying a number of memory leaks. The leaks are all benign as they are a result of not deallocating memory in the destructor. The fix still has value as it removes false positives in the static analysis. |
9078:6222624550e7 |
29-Jun-2012 |
Nathanael Premillieu <npremill@irisa.fr> |
O3: Track if the RAS has been pushed or not to pop the RAS if necessary.
Add new flag (named pushedRAS) in the PredictorHistory structure. This flag tracks whether the RAS has been pushed or not during a prediction. Then, in the squash function it is used to pop the RAS if necessary. |
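The role of the flag can be sketched as below. Only `pushedRAS` and `PredictorHistory` are names from the commit message; the surrounding class, its methods, and the use of `std::vector` as the return address stack are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;

// Per-prediction record; pushedRAS remembers whether this prediction
// pushed an entry onto the return address stack.
struct PredictorHistory {
    Addr pc;
    bool pushedRAS;
};

// Hypothetical predictor front end, reduced to the RAS bookkeeping.
class BranchPredictorSketch
{
  public:
    void predict(Addr pc, bool isCall, Addr returnAddr)
    {
        PredictorHistory entry{pc, false};
        if (isCall) {
            ras.push_back(returnAddr);
            entry.pushedRAS = true;
        }
        history.push_back(entry);
    }

    // Undo the most recent prediction, popping the RAS only if that
    // prediction was the one that pushed onto it.
    void squashLast()
    {
        if (history.empty())
            return;
        if (history.back().pushedRAS)
            ras.pop_back();
        history.pop_back();
    }

    std::size_t rasDepth() const { return ras.size(); }

  private:
    std::vector<Addr> ras;
    std::vector<PredictorHistory> history;
};
```

Without the flag, squashing a non-call branch could pop an entry that an older, still valid call had pushed.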
9057:f5ee56466b91 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
ISA: Back-out NoopMachInst as a StaticInstPtr change. |
9046:a1104cc13db2 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Clean up the O3 structures and try to pack them a bit better.
DynInst is extremely large; the hope is that this re-organization will put the most used members close to each other. |
9044:904ddeecc653 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe. After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc when running twolf for ARM. |
9040:cdfe09f9bdee |
04-Jun-2012 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Turn the ExtMachInst NoopMachinst into the StaticInstPtr NoopStaticInst.
This eliminates a use of the ExtMachInst type outside of the ISAs. |
9023:e9201a7bce59 |
26-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Merge the predecoder and decoder.
These classes are always used together, and merging them will give the ISAs more flexibility in how they cache things and manage the process. |
9020:14321ce30881 |
25-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Decode: Make the Decoder class defined per ISA. |
8975:7f36d4436074 |
01-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses.
For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the appropriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not allow snoop requests to be rejected or stalled. All these assumptions are now explicitly part of the port interface itself. |
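The split between request and response paths can be sketched as below. The member function names follow the commit message; everything else (the `Sketch` class names, the two-field `Packet`, `bind`, and the memory that answers immediately instead of at a later time) is a simplification assumed for the example. Snoops and retries are left out.

```cpp
#include <cassert>
#include <cstdint>

// Reduced packet: just an address and a direction marker.
struct Packet {
    std::uint64_t addr;
    bool isResponse;
};

class SlavePortSketch;

// A master sends requests and receives responses.
class MasterPortSketch
{
  public:
    virtual ~MasterPortSketch() = default;
    void bind(SlavePortSketch &p) { peer = &p; }
    bool sendTimingReq(Packet &pkt);
    virtual bool recvTimingResp(Packet &pkt) = 0;

  private:
    SlavePortSketch *peer = nullptr;
};

// A slave receives requests and sends responses.
class SlavePortSketch
{
  public:
    virtual ~SlavePortSketch() = default;
    void bind(MasterPortSketch &p) { peer = &p; }
    bool sendTimingResp(Packet &pkt);
    virtual bool recvTimingReq(Packet &pkt) = 0;

  private:
    MasterPortSketch *peer = nullptr;
};

// The direction checks live in the send functions, not in the receivers.
inline bool MasterPortSketch::sendTimingReq(Packet &pkt)
{
    assert(peer && !pkt.isResponse);
    return peer->recvTimingReq(pkt);
}

inline bool SlavePortSketch::sendTimingResp(Packet &pkt)
{
    assert(peer && pkt.isResponse);
    return peer->recvTimingResp(pkt);
}

// Hypothetical memory that turns each request into a response at once.
class EchoMemoryPort : public SlavePortSketch
{
  public:
    bool recvTimingReq(Packet &pkt) override
    {
        pkt.isResponse = true;
        return sendTimingResp(pkt);
    }
};

// Hypothetical CPU-side port that records what came back.
class CpuPort : public MasterPortSketch
{
  public:
    bool recvTimingResp(Packet &pkt) override
    {
        lastAddr = pkt.addr;
        ++responses;
        return true;
    }

    std::uint64_t lastAddr = 0;
    int responses = 0;
};
```

A component with ports of both kinds, such as a bus, no longer needs to branch on the packet type: the function that was called already says whether it is handling a request or a response.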
8949:3fa1ee293096 |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Remove the Broadcast destination from the packet
This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple masters (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field.
Modules that connect multiplexing components, such as caches and bridges store any existing source and destination field in the sender state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class.
In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing. |
8948:e95ee70f876c |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separate snoop requests/responses from normal memory requests/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all use the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal requests and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with requests being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses.
The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions. |
8931:7a1dfb191e3f |
06-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Enable multiple distributed generalized memories
This patch removes the assumption of having a single instance of PhysicalMemory, and enables a distributed memory where the individual memories in the system are each responsible for a single contiguous address range.
All memories inherit from an AbstractMemory that encompasses the basic behaviour of a random access memory, and provides untimed access methods. What was previously called PhysicalMemory is now SimpleMemory, and a subclass of AbstractMemory. All future types of memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now distributed memory, the system has a wrapper class, called PhysicalMemory that is aware of all the memories in the system and their associated address ranges. This class thus acts as an infinitely-fast bus and performs address decoding for these "shortcut" accesses. Each memory can specify that it should not be part of the global address map (used e.g. by the functional memories in some testers). Moreover, each memory can be configured to be reported to the OS configuration table, useful for populating ATAG structures, and any potential ACPI tables.
Checkpointing support currently assumes that all memories have the same size and organisation when creating and resuming from the checkpoint. A future patch will enable a more flexible re-organisation. |
8922:17f037ad8918 |
30-Mar-2012 |
William Wang <william.wang@arm.com> |
MEM: Introduce the master/slave port sub-classes in C++
This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add assumptions and enforce compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specialisations are to come in later patches.
The getPort function is now getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again. |
8921:e53972f72165 |
30-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Unify initMemProxies across CPUs and simulation modes
This patch unifies where initMemProxies is called, in the init() method of each BaseCPU subclass, before TheISA::initCPU is called. Moreover, it also ensures that initMemProxies is called in both full-system and syscall-emulation mode, thus unifying also across the modes. An additional check is added in the ThreadState to ensure that initMemProxies is only called once. |
8907:26256a3e8fa4 |
21-Mar-2012 |
Andrew Lukefahr <lukefahr@umich.edu> |
O3: Fix sizing of decode to rename skid buffer. |
8905:f6faef9f888d |
21-Mar-2012 |
Brian Grayson <b.grayson@samsung.com> |
O3: Fix size of skid buffer between fetch and decode when widths are different |
8902:75b524b64c28 |
19-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
gcc: Clean-up of non-C++0x compliant code, first steps
This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with -std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeeded by a comma, namespaces closed by a curly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts). |
8895:ad5f1f128faf |
11-Mar-2012 |
Brian Grayson <b.grayson@samsung.com> |
O3: Add fatal when fetchWidth > Impl::MaxWidth. |
8890:9cf2327b7f5d |
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3/Ozone: Eliminate dead code counting software prefetch insts
Eliminates dead code in the O3 and Ozone CPU models that counted software prefetch instructions separately for the ALPHA ISA only. |
8887:20ea02da9c53 |
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes. |
8876:44f8e7bb7fdf |
02-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Check that the interrupt controller is created when needed
This patch adds a creation-time check to the CPU to ensure that the interrupt controller is created for the cases where it is needed, i.e. if the CPU is not being switched in later and not a checker CPU.
The patch also adds the "createInterruptController" call to a number of the regression scripts. |
8863:50ce4deacda9 |
01-Mar-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Fix switching of CPUs
This patch prevents creation of interrupt controller for cpus that will be switched in later |
8852:c744483edfcf |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Make port proxies use references rather than pointers
This patch is adding a clearer design intent to all objects that would not be complete without a port proxy by making the proxies members rather than dynamically allocated. In essence, if NULL would not be a valid value for the proxy, then we avoid using a pointer to make this clear.
The same approach is used for the methods using these proxies, such as loadSections, that now use references rather than pointers to better reflect the fact that NULL would not be an acceptable value (in fact the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional composition where a NULL pointer is never valid" could be done on a much broader scale throughout the code base, but for now it is only done in the locations affected by the proxies. |
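The design intent can be sketched as follows: the proxy is a plain member of its owner, and helpers take it by reference, so there is no NULL state to check for. `PortProxySketch`, `ThreadStateSketch`, and this simplified `loadSection` signature are assumptions for illustration, with a `std::map` standing in for simulated memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

using Addr = std::uint64_t;

// Hypothetical proxy backed by a map instead of a real memory system.
class PortProxySketch
{
  public:
    void write(Addr addr, std::uint8_t byte) { mem[addr] = byte; }

    std::uint8_t read(Addr addr) const
    {
        auto it = mem.find(addr);
        return it == mem.end() ? 0 : it->second;
    }

  private:
    std::map<Addr, std::uint8_t> mem;
};

// Taking a reference states in the signature that a proxy is required;
// a caller cannot pass NULL.
inline void loadSection(PortProxySketch &proxy, Addr base,
                        const std::uint8_t *data, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        proxy.write(base + i, data[i]);
}

// The owner holds the proxy as a member, so the proxy exists for exactly
// as long as the owner does and is never dynamically allocated.
class ThreadStateSketch
{
  public:
    PortProxySketch &getProxy() { return proxy; }

  private:
    PortProxySketch proxy;
};
```

The member also removes the need for a matching delete in the owner's destructor.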
8850:ed91b534ed04 |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Round-two unifying instr/data CPU ports across models
This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps in simplifying the code, makes assumptions more explicit, and further eases future patches related to the CPU ports.
The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports. |
8843:7d3ac6813147 |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BPred: Fix RAS to handle predicated call/return instructions.
Change RAS to fix issues with predicated call/return instructions. Handled all cases in the life of a predicated call and return instruction. |
8842:a02932e2e73d |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BP: Fix several Branch Predictor issues.
1. Updates the Branch Predictor correctly to the state just after a mispredicted branch, if a squash occurs.
2. If a BTB does not find an entry, the branch is predicted not taken. The global history is modified to correctly reflect this prediction.
3. Local history is now updated at the fetch stage instead of execute stage.
4. In the Update stage of the branch predictor the local predictors are now correctly updated according to the state of local history during fetch stage.
This patch also improves performance by as much as 17% on some benchmarks |
8834:21e8d54ecf07 |
12-Feb-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: add separate stats for insts/ops both globally and per cpu model |
8832:247fee427324 |
12-Feb-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
mem: Add a master ID to each request object.
This change adds a master id to each request object which can be used to identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python. |
8824:a42647b4a6b6 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Improve handling of delayed commit flag
The delayed commit flag is used in conjunction with interrupt pending flag to figure out whether or not fetch stage should get more instructions. This patch clears this flag when instructions are squashed. Also, in case an interrupt is pending, currently it is not possible to access the instruction cache. This patch allows accessing the cache in case this flag is set. |
8823:ae411fcf4935 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Strengthen condition for handling interrupts
The condition for handling interrupts is to check whether or not the cpu's instruction list is empty. As observed, this can lead to cases in which even though the instruction list is empty, interrupts are handled when they should not be. The condition is being strengthened so that interrupts get handled only when the last committed microop did not have IsDelayedCommit set. |
8822:e7ae13867098 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Provide the squashing instruction
This patch adds a function to the ROB that will get the squashing instruction from the ROB's list of instructions. This squashing instruction is used for figuring out the macroop from which the fetch stage should fetch the microops. Further, a check has been added that if the instructions are to be fetched from the cache maintained by the fetch stage, then the data in the cache should be valid and the PC of the thread being fetched from is the same as the address of the cache block. |
8821:bba1a976c293 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 Fetch: Check if PC is pointing to Microcode ROM |
8817:c36441eed919 |
07-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Turn off arch/faults.hh
Because there are no longer architecture independent but specialized functions in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA no longer needs to be able to include them through the switching header file arch/faults.hh. By removing that header file (arch/faults.hh), the potential interface between ISA code and non ISA code is narrowed. |
8809:bb10807da889 |
01-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head, hopefully the last time for this batch. |
8808:8af87554ad7e |
31-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with main repository. |
8807:35e77c938919 |
29-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Yet another merge with the main repository. |
8806:669e93d79ed9 |
29-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Implement Ali's review feedback.
Try to decrease indentation, and remove some redundant FullSystem checks. |
8799:dac1e33e07b0 |
28-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repo. |
8798:adaa92be9037 |
16-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge yet again with the main repository. |
8797:3202eb01e01e |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Another merge with the main repository. |
8796:a2ae5c378d0a |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repository again. |
8795:0909f8ed7aa0 |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with main repository. |
8794:e2ac2b7164dd |
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of includes of config/full_system.hh. |
8793:5f25086326ac |
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in the CPU directory. |
8779:2a590c51adb1 |
01-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Expose the same methods on the CPUs in SE and FS modes. |
8777:dd43f1c9fa0a |
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make the functions available from the TC consistent between SE and FS. |
8767:e575781f71b8 |
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs. |
8766:b0773af78423 |
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build the base process class in FS. |
8764:e4660687c49f |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Include getMemPort in FS. |
8761:20322354b80b |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build/expose vport in SE mode. |
8754:0996451df6de |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make physPort and getPhysPort available in SE mode. |
8737:770ccf3af571 |
31-Jan-2012 |
Koan-Sin Tan <koansin.tan@gmail.com> |
clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. |
8733:64a7bf8fa56c |
31-Jan-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification. |
8730:0a742249f76b |
30-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Clean-up of Functional/Virtual/TranslatingPort remnants
This patch cleans up forward declarations and a member-function prototype that still referred to the old FunctionalPort, VirtualPort and TranslatingPort. There is no change in functionality. |
8727:b3995530319f |
28-Jan-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially only the store at the head of the store buffer is allowed to be in flight. Only after that store completes, the next store is issued to the memory system. By default, the x86 architecture will have TSO. |
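The ordering rule in the entry above, where only the head of the store buffer may be in flight, can be sketched as a small model. TsoStoreBuffer and its methods are hypothetical names for illustration and do not reflect the actual LSQ interface.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical store buffer that keeps stores in total order:
// only the store at the head may be in flight at any time.
class TsoStoreBuffer {
  public:
    void push(uint64_t addr) { stores.push_back(addr); }

    // Try to issue the next store. Returns true and sets addr only if
    // no other store is currently in flight.
    bool issue(uint64_t &addr)
    {
        if (inFlight || stores.empty())
            return false;
        addr = stores.front();
        inFlight = true;
        return true;
    }

    // Called when the memory system completes the head store.
    void complete()
    {
        assert(inFlight && !stores.empty());
        stores.pop_front();
        inFlight = false;
    }

  private:
    std::deque<uint64_t> stores;  // oldest store at the front
    bool inFlight = false;
};
```

A second call to issue() fails until complete() has retired the head store, which is what serialises the stores.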
8711:c7e14f52c682 |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate queries for snooping and address ranges
This patch simplifies the address-range determination mechanism and also unifies the naming across ports and devices. It further splits the queries for determining if a port is snooping and what address ranges it responds to (aiming towards a separation of cache-maintenance ports and pure memory-mapped ports). Default behaviours are such that most ports do not have to define isSnooping, and master ports need not implement getAddrRanges. |
8707:489489c67fd9 |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data ports from specialised subclasses to the base CPU (to the largest degree possible). Ultimately it serves to make the CPU(s) have a well-defined interface to the memory sub-system. |
8706:b1838faf3bcc |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy |
8674:a9476951e3a2 |
10-Jan-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
DPRINTF: Improve some dprintf messages. |
8665:e75d9251f7e6 |
09-Jan-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Remove some asserts that no longer seem to be valid. |
8662:d4548b381e87 |
09-Jan-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Add support of function tracing with O3 CPU. |
8641:4d3ecac1abec |
13-Dec-2011 |
Nathan Binkert <nate@binkert.org> |
gcc: fix unused variable warnings from GCC 4.6.1 |
8631:8c038d4cd210 |
01-Dec-2011 |
Chander Sudanthi <chander.sudanthi@arm.com> |
O3: Remove hardcoded tgts_per_mshr in O3CPU.py.
There are two lines in O3CPU.py that set the dcache and icache tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr. This patch removes these hardcoded lines from O3CPU.py and sets the default L1 cache mshr targets to 20. |
8627:86358c187837 |
01-Dec-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Add stat that counts how many cycles the O3 cpu was quiesced. |
8607:5fb918115c07 |
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions. |
8592:30a97c4198df |
27-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Tidy up some DPRINTFs in the LSQ. |
8591:8f23aeaf6a91 |
27-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Replace calls to genMachineCheckFault with M5PanicFault. |
8587:acce52081b45 |
26-Sep-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
LSQ: Moved a couple of lines to enable O3 + Ruby
This patch makes O3 CPU work along with the Ruby memory model. Ruby overwrites the senderState pointer with another pointer. The pointer is restored only when Ruby gets done with the packet. LSQ makes use of senderState just after sendTiming() returns. But the dynamic_cast returns a NULL pointer since Ruby's senderState pointer is from a different class. Storing the senderState pointer before calling sendTiming() does away with the problem. |
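The ordering problem described above can be reproduced in a few lines. This is a sketch under assumptions: the types below are simplified stand-ins, sendTiming() here only models a memory system that swaps the pointer, and sendAndKeepState is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical stand-ins; gem5's real Packet and SenderState differ.
struct SenderState { virtual ~SenderState() = default; };
struct LSQSenderState : SenderState { int loadIdx = 0; };
struct RubySenderState : SenderState {};

struct Packet { SenderState *senderState = nullptr; };

// Models a memory system that replaces senderState while it owns
// the packet and has not yet restored the original pointer.
inline bool sendTiming(Packet &pkt)
{
    static RubySenderState rubyState;
    pkt.senderState = &rubyState;
    return true;
}

// Save the LSQ's state *before* sending. Afterwards pkt.senderState
// may point to an unrelated class and the dynamic_cast would yield NULL.
inline LSQSenderState *sendAndKeepState(Packet &pkt)
{
    LSQSenderState *state = dynamic_cast<LSQSenderState *>(pkt.senderState);
    assert(state != nullptr);
    sendTiming(pkt);
    return state;
}
```

Casting after the send would fail in this model, because the packet then carries the memory system's own state object.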
8581:56f97760eadd |
22-Sep-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
event: minor cleanup
Initialize flags via the Event constructor instead of calling setFlags() in the body of the derived class's constructor. I forget exactly why, but this made life easier when implementing multi-queue support.
Also rename Event::getFlags() to isFlagSet() to better match common usage, and get rid of some unused Event methods. |
8557:f44572edfba3 |
19-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Syscall: Make the syscall function available in both SE and FS modes.
In FS mode the syscall function will panic, but the interface will be consistent and code which calls syscall can be compiled in. This will allow, for instance, instructions that use syscall to be built unconditionally but then not returned by the decoder. |
8545:a3992291e230 |
13-Sep-2011 |
Ali Saidi <saidi@eecs.umich.edu> |
LSQ: Only trigger a memory violation with a load/load if the value changes.
Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently, because a translation will miss and start a table walk, and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block, depending on the dependency checking).
This patch has been tested with a couple of self-checking hand crafted programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%). |
8541:27aaee8ec7cc |
09-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Decode: Pull instruction decoding out of the StaticInst class into its own.
This change pulls the instruction decoding machinery (including caches) out of the StaticInst class and puts it into its own class. This has a few intrinsic benefits. First, the StaticInst code, which has gotten to be quite large, gets simpler. Second, the code that handles decode caching is now separated out into its own component and can be looked at in isolation, making it easier to understand. I took the opportunity to restructure the code a bit which will hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its own, potentially stateful decode object. We'd be able to include less contextualizing information in the ExtMachInst objects since that context would be applied at the decoder. Also, the decoder could "know" ahead of time that all the instructions it's going to see are going to be, for instance, 64 bit mode, and it will have one less thing to check when it decodes them. Because the decode caching mechanism has been separated out, it's now possible to have multiple caches which correspond to different types of decoding context. Having one cache for each element of the cross product of different configurations may become prohibitive, so it may be desirable to clear out the cache when relatively static state changes and not to have one for each setting.
Because the decode function is no longer universally accessible as a static member of the StaticInst class, a new function was added to the ThreadContexts that returns the applicable decode object. |
8519:ef35ce2bd73f |
19-Aug-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
LSQ: Set store predictor to periodically clear itself as recommended in the storesets paper.
This patch improves performance by as much as 10% on some spec benchmarks. |
8518:9c87727099ce |
19-Aug-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
Fix bugs due to interaction between SEV instructions and O3 pipeline
SEV instructions were originally implemented to cause asynchronous squashes via the generateTCSquash() function in the O3 pipeline when updating the SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system that would lead to a pipeline either going inactive indefinitely or not being able to commit squashed instructions. Fixed SEV instructions to behave like interrupts and cause synchronous squashes inside the pipeline, eliminating the race conditions. Also fixed up the semantics of the WFE instruction to behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1 or unmasked interrupts are pending. |
8516:a9c0d2ab490a |
19-Aug-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
LSQ: Add some better dprintfs for storeset predictor. |
8515:12420b96b364 |
19-Aug-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
LSQ: Fix a few issues with the storeset predictor.
Two issues are fixed in this patch:
1. The load and store pc passed to the predictor are passed in reverse order.
2. The flag indicating that a barrier is inflight was never cleared when the barrier was squashed instead of committed. This made all load insts dependent on a non-existent barrier in-flight. |
8513:f4272aa61e74 |
19-Aug-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Squash the violator and younger instructions instead of all insts.
Change the way instructions are squashed on memory ordering violations to squash the violator and younger instructions, not all instructions that are younger than the instruction they violated (no reason to throw away valid work). |
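The difference between the two squash policies can be illustrated with sequence numbers. This is only a sketch, assuming larger sequence numbers are younger. The survivors helper and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using InstSeqNum = uint64_t;

// Return the in-flight instructions that survive a memory ordering
// violation. Larger sequence numbers are younger. Only the violator
// and everything younger than it are squashed; instructions between
// the violated instruction and the violator are valid work and stay.
inline std::vector<InstSeqNum>
survivors(const std::vector<InstSeqNum> &inFlight,
          InstSeqNum violated, InstSeqNum violator)
{
    assert(violated < violator);  // the violator is the younger one
    std::vector<InstSeqNum> kept;
    for (InstSeqNum sn : inFlight) {
        if (sn < violator)
            kept.push_back(sn);
    }
    return kept;
}
```

Under the older policy, everything younger than the violated instruction would have been discarded, including instructions that executed correctly.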
8506:5a9c6f49f882 |
16-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Make lsq_unit.hh include arch/isa_traits.hh directly, not transitively. |
8503:479b186a4652 |
14-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: When squashing, restore the macroop that should be used for fetching. |
8502:f1fc7102c970 |
14-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Add a pointer to the macroop for a microop in the dyninst. |
8499:e5f14b00c0ae |
13-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: At the end of an instruction, force fetchAddr to something sensible.
It's possible (though until now very unlikely) for fetchAddr to get out of sync with the actual PC of the current instruction. This change forcefully resets fetchAddr at the end of every instruction. |
8495:6ee3a2359fcb |
09-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Stop using the current macroop no matter why you're leaving it.
Until now, the only reason a macroop would be left was because it ended at a microop marked as the last microop. In O3 with branch prediction, it's possible for the branch predictor to have entries which originally came from different instructions which happened to have the same RIP. This could theoretically happen in many ways, but it was encountered specifically when different programs in different address spaces ran one after the other in X86_FS.
What would happen in that case was that the macroop would continue to be looped over and microops fetched from it until it reached the last microop even though the macropc had moved out from under it. If things lined up properly, this could mean that the end bytes of an instruction actually fell into the instruction sized block of memory after the one in the predecoder. The fetch loop implicitly assumes that the last instruction sized chunk of memory processed was the last one needed for the instruction it just finished executing. It would then tell the predecoder to move to an offset within the bytes it was given that is larger than those bytes, and that would trip an assert in the x86 predecoder.
This change fixes this problem by making fetch stop processing the current macroop if the address it should be fetching from changed when the PC is updated. That happens when the last microop was reached because the instruction handled it properly, and it also catches the case where the branch predictor makes fetch do a macro level branch when it shouldn't.
The check of isLastMicroop is retained because otherwise, a macroop that branches back to itself would act like a single, long macroop instead of multiple instances of the same microop. There may be situations (which may turn out to be purely hypothetical) where that matters.
This also fixes a relatively minor issue where the curMacroop variable would be set to NULL immediately after seeing that a microop was the last one before curMacroop was used to build the dyninst. The traceData structure would have a NULL pointer to the macroop for that microop. |
8493:0eca041a8c06 |
09-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: When waiting to handle an interrupt, let everything drain out.
Before this change, the commit stage would wait until the ROB and store queue were empty before recognizing an interrupt. The fetch stage would stop generating instructions at an appropriate point, so commit would then wait until a valid time to interrupt the instruction stream. Instructions might be in flight after fetch but not in the ROB or store queue (in rename, for instance), so this change makes commit wait until all in flight instructions are finished. |
8491:606cf2660887 |
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of the unused addToRemoveList function. |
8489:2e12a633d269 |
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Let squashed and deferred instructions issue.
Let squashed and deferred instructions issue so they don't accumulate and clog up the CPU. |
8484:3c641509bf3e |
02-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of the raw ExtMachInst constructor on DynInsts.
This constructor assumes that the ExtMachInst can be decoded directly into a StaticInst that's useful to execute. With the advent of microcoded instructions that's no longer true. |
8481:818aea9960f5 |
31-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Implement memory mapped IPRs for O3. |
8479:e68b1ad09c6b |
31-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix corner case squashing into the microcode ROM.
When fetching from the microcode ROM, if the PC is set so that it isn't in the cache block that's been fetched the CPU will get stuck. The fetch stage notices that it's in the ROM so it doesn't try to fetch from the current PC. It then later notices that it's outside of the current cache block so it skips generating instructions expecting to continue once the right bytes have been fetched. This change lets the fetch stage attempt to generate instructions, and only checks if the bytes it's going to use are valid if it's really going to use them. |
8471:18e560ba1539 |
15-Jul-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Create a pipeline activity viewer for the O3 CPU model.
Implemented a pipeline activity viewer as a python script (util/o3-pipeview.py) and modified O3 code base to support an extra trace flag (O3PipeView) for generating traces to be used as inputs by the tool. |
8462:80492ae5148e |
10-Jul-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix up pipelining icache accesses in fetch stage to function properly
Fixed up the patch from Yasuko Watanabe that enabled pipelining of fetch accesses to icache to work with recent changes to main repository. Also added in ability for fetch stage to delay issuing the fault carrying nop when a pipeline fetch causes a fault and no fetch bandwidth is available until the next cycle. |
8460:3893d9d2c6c2 |
10-Jul-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Make sure fetch doesn't go off into the weeds during speculation. |
8346:ce8b9a250021 |
10-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
o3: missing newlines on some dprintfs |
8335:9228e00459d4 |
02-Jun-2011 |
Nathan Binkert <nate@binkert.org> |
scons: rename TraceFlags to DebugFlags |
8316:6fd588813142 |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix offset calculation into storeQueue buffer for store->load forwarding
Calculation of offset to copy from storeQueue[idx].data structure for load to store forwarding fixed to be difference in bytes between store and load virtual addresses. Previous method would induce bug where a load would index into buffer at the wrong location. |
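The corrected offset calculation amounts to a byte difference between the two virtual addresses. The sketch below assumes a store that fully covers the load. StoreEntry and forward are hypothetical names, not the LSQ's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical store queue entry: the store's virtual address and data.
struct StoreEntry {
    uint64_t vaddr;
    std::vector<uint8_t> data;
};

// Forward loadSize bytes from a store that fully covers the load.
// The offset into the store data is the byte distance between the
// load and store virtual addresses.
inline bool forward(const StoreEntry &st, uint64_t loadVaddr,
                    std::size_t loadSize, uint8_t *out)
{
    if (loadVaddr < st.vaddr ||
        loadVaddr + loadSize > st.vaddr + st.data.size())
        return false;  // the store does not cover the load
    std::size_t offset = loadVaddr - st.vaddr;
    assert(offset + loadSize <= st.data.size());
    std::memcpy(out, st.data.data() + offset, loadSize);
    return true;
}
```

For an 8-byte store at 0x1000 and a 2-byte load at 0x1004, the offset is 4, so bytes 4 and 5 of the store data are forwarded.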
8315:6173b87e7652 |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix issue w/wbOutstanding being decremented multiple times on blocked cache.
If a split load fails on a blocked cache wbOutstanding can be decremented twice if the first part of the split load succeeds and the second part fails. Condition the decrementing on not having completed the first part of the load. |
8314:13ac7b9939ef |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix issue with interrupts/faults occurring in the middle of a macro-op
This patch fixes two problems with the O3 cpu model. The first is an issue with an instruction fetch causing a fault on the next address while the current macro-op is being issued. This happens when the micro-ops exceed the fetch bandwidth and then, on the next cycle, the fetch stage attempts to issue a request to the next line while it still has micro-ops to issue. If the next line faults, a fault is attached to a micro-op in the currently executing macro-op rather than a "nop" from the next instruction block. This leads to an instruction incorrectly faulting on fetch when it had no reason to fault.
A similar problem occurs with interrupts. When an interrupt occurs the fetch stage nominally stops issuing instructions immediately. This is incorrect in the case of a macro-op as the current location might not be interruptible. |
8298:3c1296738e34 |
13-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix an issue with a load & branch instruction and mem dep squashing
Instructions that load an address and are control instructions can execute down the wrong path if they were predicted correctly and then instructions following them are squashed. If an instruction is a memory and control op use the predicted address for the next PC instead of just advancing the PC. Without this change NPC is used for the next instruction, but predPC is used to verify that the branch was successful so the wrong path is silently executed. |
8275:8c88a94c2f4f |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Remove assertion for case that is actually handled in code.
If a nonspeculative instruction has a fault it might not be in the nonSpecInsts map. |
8272:82057507f2f9 |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix a small corner case with the lsq hazard detection logic. |
8247:acf4b902c02e |
20-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
stats: one more name violation |
8240:38befb82b2c9 |
19-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
stats: rename stats so they can be used as python expressions |
8232:b28d06a175be |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector
At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help |
8230:845c8eb5ac49 |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: fix up code after sorting |
8229:78bf55f23338 |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: sort all includes |
8208:45331a355c38 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the ad hoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. |
8205:7ecbffb674aa |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Cleanup implementation of ITSTATE and put important code in PCState.
Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. |
8201:89221928d131 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Remove references to memory copy operations |
8199:3d6c08c877a9 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Tighten memory order violation checking to 16 bytes.
The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. |
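The arithmetic behind this change is that dropping the low n address bits groups addresses into blocks of 1 << n bytes. A minimal sketch follows. sameBlock, oldShift and newShift are hypothetical names, and the new shift of 4 is inferred from the 16-byte figure in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Two addresses fall in the same checking block when they are equal
// after dropping the low `shift` bits; a block is 1 << shift bytes.
inline bool sameBlock(uint64_t a, uint64_t b, unsigned shift)
{
    assert(shift < 64);
    return (a >> shift) == (b >> shift);
}

constexpr unsigned oldShift = 8;  // 1 << 8 = 256-byte granularity
constexpr unsigned newShift = 4;  // 1 << 4 = 16-byte granularity
```

With the larger shift, addresses 0x1000 and 0x1080 count as conflicting even though they are 128 bytes apart. With a shift of 4 they fall in different blocks.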
8138:f08692f2932e |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Send instruction back to fetch on squash to seed predecoder correctly. |
8137:48371b9fb929 |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Cleanup the commitInfo comm struct.
Get rid of unused members and use base types rather than derived values where possible to limit amount of state. |
8134:b01a51ff05fa |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.
This change fixes the problem for all the cases we actively use. If you want to try more creative I/O device attachments (E.g. sharing an L2), this won't work. You would need another level of caching between the I/O device and the cache (which you actually need anyway with our current code to make sure writes propagate). This is required so that you can mark the cache in between as top level and it won't try to send ownership of a block to the I/O device. Asserts have been added that should catch any issues. |
8133:9f704aa10eb4 |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix unaligned stores when cache blocked
Without this change a store can be issued to the cache multiple times. If this case occurs when the l1 cache is out of mshrs (and thus blocked) the processor will never make forward progress because each cycle it will send a single request using the recently freed mshr and not completing the multipart store. This will continue forever. |
8089:4a59661d3fd1 |
25-Feb-2011 |
Timothy M. Jones <timothy.jones@cl.cam.ac.uk> |
O3CPU: Fix iqCount and lsqCount SMT fetch policies.
Fixes two of the SMT fetch policies in O3CPU that were returning the count of instructions in the IQ or LSQ rather than the thread ID to fetch from. |
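The class of bug fixed here is returning the metric instead of the index that minimises it. The sketch below assumes the policy prefers the thread with the fewest instructions in the IQ. iqCountPolicy and its argument are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ThreadID = int;

// Pick the thread with the fewest instructions in the IQ.
// The policy must return the thread's ID, not its instruction count.
inline ThreadID iqCountPolicy(const std::vector<unsigned> &iqCounts)
{
    assert(!iqCounts.empty());
    std::size_t best = 0;
    for (std::size_t tid = 1; tid < iqCounts.size(); ++tid) {
        if (iqCounts[tid] < iqCounts[best])
            best = tid;
    }
    return static_cast<ThreadID>(best);
}
```

With counts of 12, 3 and 9 for threads 0 to 2, the function returns thread 1. Returning the count 3 instead would name a thread that does not exist.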
8073:e154b9b8e366 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: When a prefetch causes a fault, don't record it in the inst |
8071:7bf6fccab013 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: If there is an outstanding table walk don't let the inst queue sleep.
If there is an outstanding table walk and no other activity in the CPU it can go to sleep and never wake up. This change makes the instruction queue always active if the CPU is waiting for a store to translate.
If Gabe changes the way this code works then the below should be removed as indicated by the todo. |
8068:749581c26e71 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Do something for ISB, DSB, DMB |
8067:21f14583aa6a |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix bug that let two table walks occur in parallel. |
8064:5b111ae7e7d4 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix bug when a squash occurs right before TLB miss returns.
In this case we need to throw away the TLB miss, not assume it was the one we were waiting for. |
7963:6d955240bb62 |
13-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fetch from the microcode ROM when needed. |
7962:404170ece9a4 |
13-Feb-2011 |
Ali Saidi <saidi@eecs.umich.edu> |
O3: Fix GCC 4.2.4 complaint |
7947:6d07db809a81 |
11-Feb-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Fix pipeline restart when a table walk completes in the fetch stage.
When a table walk is initiated by the fetch stage, the CPU can potentially move to the idle state and never wake up.
The fetch stage must call cpu->wakeCPU() when a translation completes (in finishTranslation()). |
7944:1daf51f62013 |
11-Feb-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) rely on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished.
Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept in the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution. |
7897:d9e8b1fd1a9f |
07-Feb-2011 |
Joel Hestness <hestness@cs.utexas.edu> |
mcpat: Adds McPAT performance counters
Updated patches from Rick Strong's set that modify performance counters for McPAT |
7876:189b9b258779 |
03-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Config: Keep track of uncached and cached ports separately.
This makes sure that the address ranges requested for caches and uncached ports don't conflict with each other, and that accesses which are always uncached (message signaled interrupts for instance) don't waste time passing through caches. |
7875:4afd05b9485e |
03-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix a style bug in O3. |
7868:6029008db669 |
01-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Add L1 caches for the TLB walkers.
Small L1 caches are connected to the TLB walkers when caches are used. This allows them to participate in the coherence protocol properly. |
7857:b2c7e56572a4 |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fix some variable length instruction issues with the O3 CPU and ARM ISA. |
7856:d25827665112 |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Don't test misprediction on load instructions until executed. |
7855:c0be563517da |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Keep around the last committed instruction and use for squashing.
Without this change, 0 is always used for the youngest sequence number if a squash occurred and the ROB was empty (e.g. an instruction is marked serializeAfter or a fetch stall prevents other instructions from issuing). When 0 is used, there is a race with rename in which an instruction that committed in the same cycle as the squashing instruction can have its renamed state undone by the squash using sequence number 0. |
7854:3c6783497976 |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Don't try to scoreboard misc registers.
I'm not positive this is the correct fix, but it's working right now. Either we need to do something like this, prevent the misc reg from being renamed at all, or there is something else going on. We need to find the root cause as to why this is only a problem sometimes. |
7852:07ba4754ae0a |
18-Jan-2011 |
Matt.Horsnell <Matt.Horsnell@arm.com> |
O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf. |
7851:bb38f0c47ade |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fix mispredicts from non control instructions. The squash inside the fetch unit should not attempt to remove them from the branch predictor as non-control instructions are not pushed into the predictor. |
7850:02450f4443ce |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fixes the way prefetches are handled inside the iew unit.
This patch prevents the prefetch being added to the instCommit queue twice. |
7849:2290428b5f04 |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Support timing translations for O3 CPU fetch. |
7848:cc5e64f8423f |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Add support for moving predicated false dest operands from sources. |
7847:0c6613ad8f18 |
18-Jan-2011 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.
When this condition occurs the cpu should restart the fetch stage to fetch from the original execution path. Fault handling in the commit stage is cleaned up a little bit so the control flow is simpler. Finally, if an instruction is being used to carry a fault it isn't executed, so the fault propagates appropriately. |
7823:dac01f14f20f |
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Replace curTick global variable with accessor functions. This step makes it easy to replace the accessor functions (which still access a global variable) with ones that access per-thread curTick values. |
7813:7338bc628489 |
03-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Move sched_list.hh and timebuf.hh from src/base to src/cpu. These files really aren't general enough to belong in src/base. This patch doesn't reorder include lines, leaving them unsorted in many cases, but Nate's magic script will fix that up shortly. |
7786:bafa8a197088 |
07-Dec-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Allow a store entry to store up to 16 bytes (instead of TheISA::IntReg).
The store queue doesn't need to be ISA specific, and architectures can frequently store more than an int register's worth of data. 128 bits seems more common, but even 256 bits may be appropriate. Pretty much anything less than a cache line size is buildable. |
7784:e7649570ff3a |
07-Dec-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Support squashing all state after special instruction
For SPARC, ASIs are added to the ExtMachInst. If the ASI is changed, simply marking the instruction as Serializing isn't enough because that only stops rename. This provides a mechanism to squash all the instructions and refetch them. |
7783:9b880b40ac10 |
07-Dec-2010 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Make all instructions that write a misc. register not perform the write until commit.
ARM instructions updating cumulative flags (ARM FP exceptions and saturation flags) are not serialized.
Added aliases for ARM FP exceptions and saturation flags in FPSCR. Removed write accesses to the FP condition codes for most ARM VFP instructions: only VCMP and VCMPE instructions update the FP condition codes. Removed a potential cause of seg. faults in the O3 model for NEON memory macro-ops (ARM). |
7782:9b87755cb699 |
07-Dec-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Support SWAP and predicated loads/store in ARM. |
7767:bf5377d8f5c1 |
18-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix fp destination register flattening, and index offset adjusting.
This change makes O3 flatten floating point destination registers, and also fixes misc register flattening so that it's correctly repositioned relative to the resized regions for integer and floating point indices.
It also fixes some overly long lines. |
7764:03efcdc3421f |
15-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Make O3 support variable-length instructions. |
7763:ff2213d13e58 |
15-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: reset architectural state by calling clear() |
7760:e93e7e0caae1 |
15-Nov-2010 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
CPU/ARM: Add SIMD op classes to CPU models and ARM ISA. |
7758:28a677d7cb51 |
15-Nov-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: prevent a squash when completeAcc() modifies misc reg through TC.
This happens on ARM instructions when they update the IT state bits. The code and associated comment were copied from the execute() and initiateAcc() methods. |
7720:65d338a8dba4 |
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARM's mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extent MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes is defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs, however it needs to, for use in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
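As an illustration only (this is not code from the change itself), the PCState idea described above might look like the following for an ISA without delay slots or microcode. The class name SimplePCState, the fixed 4-byte instruction size, and the set(), npc() and advance() helpers are assumptions made for this sketch; only the accessor names instAddr(), nextInstAddr(), microPC() and branching() come from the message.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical, simplified PC state for an ISA with fixed-size
// instructions, no delay slots and no microcode. The real classes in the
// "generic" ISA directory carry more state than this sketch does.
class SimplePCState
{
  public:
    SimplePCState() = default;
    explicit SimplePCState(Addr pc, Addr instSize = 4) { set(pc, instSize); }

    // Reset to a particular PC; NPC follows as pc + instruction size.
    void set(Addr pc, Addr instSize = 4)
    {
        _pc = pc;
        _npc = pc + instSize;
    }

    // Read-only accessors with the names used in the message.
    Addr instAddr() const { return _pc; }
    Addr nextInstAddr() const { return _npc; }
    Addr microPC() const { return 0; }  // no microcode in this sketch

    // Assumed setter used by a branch to redirect the next PC.
    void npc(Addr target) { _npc = target; }

    // True if the next PC is not the sequential successor of the PC.
    bool branching(Addr instSize = 4) const
    {
        return _npc != _pc + instSize;
    }

    // Step to the next instruction: NPC becomes PC, NPC moves on.
    void advance(Addr instSize = 4)
    {
        _pc = _npc;
        _npc += instSize;
    }

    bool operator==(const SimplePCState &other) const
    {
        return _pc == other._pc && _npc == other._npc;
    }

    // Inequality defined trivially in terms of equality.
    bool operator!=(const SimplePCState &other) const
    {
        return !(*this == other);
    }

  private:
    Addr _pc = 0;
    Addr _npc = 0;
};
```

A caller would construct SimplePCState pc(0x1000), redirect it with pc.npc(target) when a branch is taken, and call pc.advance() once the instruction completes. Keeping PC and NPC in one object is what lets them be read and written together.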
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
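A minimal sketch of this arrangement, under stated assumptions, is shown here. The type names PCStateSketch, StaticInstSketch, FixedWidthInst and MicroopSketch are hypothetical, as is the fixed 4-byte step; only the idea of a virtual advancePC on the static instruction is taken from the message.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical PC state with both macro and micro PCs.
struct PCStateSketch
{
    Addr pc = 0;    // address of the current instruction
    Addr npc = 0;   // address of the next instruction
    Addr upc = 0;   // current micro PC
    Addr nupc = 1;  // next micro PC
};

// Stand-in for the static instruction base class: each instruction knows
// how to move the PC past itself.
class StaticInstSketch
{
  public:
    virtual ~StaticInstSketch() = default;
    virtual void advancePC(PCStateSketch &pcs) const = 0;
};

// Fixed-width instruction (Alpha-like): assign NPC to PC and step NPC.
class FixedWidthInst : public StaticInstSketch
{
  public:
    void advancePC(PCStateSketch &pcs) const override
    {
        pcs.pc = pcs.npc;
        pcs.npc += 4;  // assumed instruction size
    }
};

// Microop: step the micro PC, or move on to the next macroop when this
// is the last microop of its macroop.
class MicroopSketch : public StaticInstSketch
{
  public:
    explicit MicroopSketch(bool last) : isLast(last) {}

    void advancePC(PCStateSketch &pcs) const override
    {
        if (isLast) {
            pcs.pc = pcs.npc;
            pcs.npc += 4;  // placeholder; a real ISA would use the decoded length
            pcs.upc = 0;
            pcs.nupc = 1;
        } else {
            pcs.upc = pcs.nupc;
            pcs.nupc++;
        }
    }

  private:
    bool isLast;
};
```

The CPU model only ever calls inst.advancePC(pcs) and never inspects which dimension of the PC moved, which is the decoupling the paragraph above argues for.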
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons: it looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder was adjusted so that it always stores the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There was no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. |
7717:f166f8bd8818 |
24-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of a bunch of commented out lines. |
7699:addb847910d2 |
04-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Alpha: Fix Alpha NumMiscArchRegs constant.
Also add asserts in O3's Scoreboard class to catch bad indexes. |
7684:ce48527a3edb |
20-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Fix O3 and possible InOrder segfaults in FS. |
7679:f26cc2c68b48 |
14-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of the now unnecessary getInst/setInst family of functions.
This code is no longer needed because of the preceding change which adds a StaticInstPtr parameter to the fault's invoke method, obviating the only use for this pair of functions. |
7678:f19b6a3a8cec |
13-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.
Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense. |
7676:92274350b953 |
10-Sep-2010 |
Nathan Binkert <nate@binkert.org> |
style: fix sorting of includes and whitespace in some files |
7649:a6a6177a5ffa |
25-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: Fixed register flattening logic (FP_Base_DepTag was set too low)
When decoding a srs instruction, invalid mode encoding returns invalid instruction. This can happen when garbage instructions are fetched from mispredicted path |
7627:3b0c4b819651 |
23-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Get rid of old, unused utility functions cluttering up the ISAs. |
7616:1a0ab2308bbe |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Skipping mem-order violation check for uncachable loads. Uncachable load is not executed until it reaches the head of the ROB, hence cannot cause one. |
7615:50f6494d9b55 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: Improve printing of uop disassembly. |
7600:eff7f79f7dfd |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
CPU: Make Exec trace to print predication result (if false) for memory instructions |
7599:f6bbf266f2c8 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: mark msr/mrs instructions as SerializeBefore/After. Since miscellaneous registers bypass wakeup logic, force serialization to resolve data dependencies through them. * * * ARM: adding non-speculative/serialize flags for instructions that change CPSR |
7598:c0ae58952ed0 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Handle loads when the destination is the PC. For loads whose destination is the PC, check again whether the load was mispredicted when the value being loaded returns from memory. |
7597:063f160e8b50 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate. This allows the CPU to handle predicated-false instructions accordingly. This particular patch makes loads that are predicated-false go straight to the commit stage instead of waiting for the return of data that was never requested. |
7520:67c670459d01 |
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add readBytes and writeBytes functions to the exec contexts. |
7511:bd104adbf04d |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
LSQ Unit: After deleting part of a split request, set it to NULL so that it isn't accidentally deleted again later (causing a segmentation fault). |
7509:3bd51d6ac9ef |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: Fix a bug where stores in the cpu were never marked as split. |
7507:b1ac6773e83d |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: O3's tick event gets squashed when it is switched out. When repeatedly switching between O3 and another CPU, O3's tick event might still be scheduled in the event queue (as squashed). Therefore, check for a squashed tick event as well as a non-scheduled event when taking over from another CPU and deal with it accordingly. |
7467:91994f36de7f |
22-Jun-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3ThreadContext: When taking over from a previous context, only assert that the system pointers match in Full System mode. |
6994:c6951099a1cb |
26-Feb-2010 |
Nathan Binkert <nate@binkert.org> |
cpu_models: get rid of cpu_models.py and move the stuff into SCons |
6974:4d4903a3e7c5 |
12-Feb-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: Split loads and stores that cross cache line boundaries.
When each load or store is sent to the LSQ, we check whether it will cross a cache line boundary and, if so, split it in two. This creates two TLB translations and two memory requests. Care has to be taken if the first packet of a split load is sent but the second blocks the cache. Similarly, for a store, if the first packet cannot be sent, we must store the second one somewhere to retry later.
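The boundary check and the split described above can be sketched as follows. This is an illustrative example rather than the LSQ code from the change: the function names crossesLine and splitAccess, the Fragment struct, and the assumption that the line size is a power of two are all introduced here.

```cpp
#include <cstdint>
#include <utility>

using Addr = std::uint64_t;

// One piece of a (possibly split) memory access.
struct Fragment
{
    Addr addr;
    unsigned size;
};

// True if the access [addr, addr + size) touches two cache lines.
// lineSize is assumed to be a power of two.
bool crossesLine(Addr addr, unsigned size, unsigned lineSize)
{
    const Addr mask = ~static_cast<Addr>(lineSize - 1);
    return size > 0 && (addr & mask) != ((addr + size - 1) & mask);
}

// Split an access that crosses a line boundary into a low fragment that
// ends at the boundary and a high fragment that starts at it. Each
// fragment would then get its own translation and memory request.
std::pair<Fragment, Fragment>
splitAccess(Addr addr, unsigned size, unsigned lineSize)
{
    const Addr boundary =
        (addr & ~static_cast<Addr>(lineSize - 1)) + lineSize;
    const unsigned lowSize = static_cast<unsigned>(boundary - addr);
    return {Fragment{addr, lowSize}, Fragment{boundary, size - lowSize}};
}
```

For an 8-byte access at 0x103c with 64-byte lines, the low fragment covers 0x103c to 0x103f and the high fragment covers 0x1040 to 0x1043. splitAccess is only meaningful when crossesLine returns true.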
This modifies the LSQSenderState class to record both packets in a split load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA to indicate whether unaligned memory accesses are allowed. This is used throughout the changed code so that the compiler can optimise away code dealing with split requests for ISAs that don't need them. |
6711:c79d72abdbe5 |
04-Nov-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: get rid of unused physmem pointer |
6667:8b5bc1a777bc |
26-Sep-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
O3: Add flag to control whether faulting instructions are traced. When enabled, faulting instructions appear in the trace twice (once when they fault and again when they're re-executed). This flag is set by the Exec compound flag for backwards compatibility. |
6664:4df6f4bd36cd |
26-Sep-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
O3: Mark fetch stage as active if it faults. Otherwise if the rest of the pipeline is idle then fault will never propagate to commit to be handled, causing CPU to deadlock. |
6658:f4de76601762 |
23-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh |
6654:4c84e771cca7 |
22-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
python: Move more code into m5.util to allow SCons to use that code. Get rid of misc.py and just stick misc things in __init__.py. Move utility functions out of SCons files and into m5.util. Move utility type stuff from m5/__init__.py to m5/util/__init__.py. Remove buildEnv from m5 and allow access only from m5.defines. Rename AddToPath to addToPath while we're moving it to m5.util. Rename read_command to readCommand while we're moving it. Rename compare_versions to compareVersions while we're moving it. |
6429:7ed8937e375a |
02-Aug-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Fix setting of INST_FETCH flag for O3 CPU. It's still broken in inorder. Also enhance DPRINTFs in cache and physical memory so we can see more easily whether it's getting set or not. |
6387:70172be3f986 |
25-Jul-2009 |
Korey Sewell <ksewell@umich.edu> |
o3-smt: enforce numThreads parameter for SMT SE mode |
6331:d947798df4a1 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of the unused get(Data|Inst)Asid and (inst|data)Asid functions. |
6329:5d8b91875859 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add a registers.hh file as an ISA switched header. This file is for register indices, Num* constants, and register types. copyRegs and copyMiscRegs were moved to utility.hh and utility.cc. |
6314:781969fbeca9 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Get rid of the float register width parameter. |
6313:95f69a436c82 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU. |
6226:f1076450ab2b |
05-Jun-2009 |
Nathan Binkert <nate@binkert.org> |
move: put predictor includes and cc files into the same place |
6221:58a3c04e6344 |
26-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: add a type for thread IDs and try to use it everywhere |
6216:2f4020838149 |
17-May-2009 |
Nathan Binkert <nate@binkert.org> |
includes: sort includes again |
6214:1ec0ec8933ae |
17-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: Move stuff for global types into src/base/types.hh |
6184:c947586b3d9e |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-o3: allow both to compile together. Allow InOrder and O3CPU to be compiled at the same time; this requires the branch prediction files to be shared by both models. |
6180:1a8950d566ff |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-bpred: edits to handle non-delay-slot ISAs Changes so that InOrder can work for a non-delay-slot ISA like Alpha. Typically, changes have to do with handling misspeculated branches at different points in pipeline |
6102:7fbf97dc6540 |
20-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Mem: Change isLlsc to isLLSC. |
6076:e141cc7896ce |
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Memory: Rename LOCKED for load locked store conditional to LLSC. |
6036:f0841ee466a5 |
18-Apr-2009 |
Korey Sewell <ksewell@umich.edu> |
o3-delay-slot-bpred: fix decode stage handling of unconditional branches. The decode stage was not setting the predicted PC correctly or passing that information back to fetch correctly. |
6034:fc2e234b4404 |
17-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3, inorder: fix FS bug due to initializing ThreadState to Halted. For some reason o3 FS init() only called initCPU if the thread state was Suspended, which was no longer the case. There's no apparent reason to check, so I whacked the test completely rather than changing the check to Halted. The inorder init() was also updated to be symmetric, though the previous code was just a fancy no-op. |
6033:f1a9f7f6e7c6 |
16-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: handle fetch with no active threads correctly. This situation can arise now on the first fetch cycle after the last active thread is halted. It seems easy enough to deal with when it happens rather than trying to avoid it. |
6032:e5c792a67b3d |
16-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: fix {read,set}ArchFloatReg* functions. Register indices were not being calculated properly. |
6031:be16ad28822f |
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
ThreadState: initialize status to Halted in constructor. This provides a common initial status for all threads independent of CPU model (unlike the prior situation where CPUs initialized threads to inconsistent states). This mostly matters for SE mode; in FS mode, ISA-specific startupCPU() methods generally handle boot-time initialization of thread contexts (since the right thing to do is ISA-dependent). |
6029:007c36616f47 |
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Get rid of the Unallocated thread context state. Basically merge it in with Halted. Also had to get rid of a few other functions that called ThreadContext::deallocate(), including: - InOrderCPU's setThreadRescheduleCondition. - ThreadContext::exit(). This function was there to avoid terminating simulation when one thread out of a multi-thread workload exits, but we need to find a better (non-cpu-centric) way. |
6023:47b4fcb10c11 |
09-Apr-2009 |
Nathan Binkert <nate@binkert.org> |
tlb: More fixing of unified TLB |
6022:410194bb3049 |
09-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
tlb: Don't separate the TLB classes into an instruction TLB and a data TLB |
6020:0647c8b31a99 |
06-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Merge ARM into the head. ARM will compile but may not actually work. |
6005:1dc178e53487 |
07-Mar-2009 |
Nathan Binkert <nate@binkert.org> |
stats: fix duplicate statistics names. This generally requires providing a more meaningful name() function for a class. |
5999:3cf8e71257e0 |
05-Mar-2009 |
Nathan Binkert <nate@binkert.org> |
stats: Fix all stats usages to deal with template fixes |
5982:de47df436ace |
04-Mar-2009 |
Steve Reinhardt <stever@gmail.com> |
O3: Make numThreads error message more helpful. |
5958:2d9737bf3c2f |
27-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Processes: Make getting and setting system call arguments part of a process object. |
5953:899ecfbce5af |
26-Feb-2009 |
Ali Saidi <saidi@eecs.umich.edu> |
CPA: Add code to automatically record function symbols as CPU executes. |
5891:73084c6bb183 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Replace the translate functions in the TLBs with translateAtomic. |
5890:bdef71accd68 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of translate... functions from various interface classes. |
5865:54ed46881217 |
10-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
CPU: Prepare CPU models for the new in-order CPU model. Some new functions and forward declarations are necessary to make things work |
5807:57f9f8b8e62f |
24-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
cpu: provide a wakeup mechanism that can be used to pull CPUs out of sleep. Make interrupts use the new wakeup method, and pull all of the interrupt stuff into the cpu base class so that only the wakeup code needs to be updated. I tried to make wakeup, wakeCPU, and the various other mechanisms for waking and sleeping a little more sane, but I couldn't understand why the statistics were changing the way they were. Maybe we'll try again some day. |
5804:34fe9bbc6705 |
21-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
o3cpu: give a name to the activity recorder for better tracing |
5803:aae3d7089925 |
19-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
thread_context: move getSystemPtr so SE mode can get to it. There was really no reason that it should be FS only. |
5769:e53bdd0e4bf1 |
06-Dec-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: use the flags data structure |
5737:f43dbc09fad3 |
10-Nov-2008 |
Clint Smullen <cws3k@cs.virginia.edu> |
O3CPU: Make the instcount debugging stuff per-cpu. This is to prevent the assertion from firing if you have a large multicore. Also make sure that it's not compiled in when NDEBUG is defined |
5715:e8c1d4e669a7 |
04-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
get rid of all instances of readTid() and getThreadNum(). Unify and eliminate redundancies with threadId() as their replacement. |
5714:76abee886def |
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate. |
5712:199d31b47f7b |
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered across the subclasses. generally make it so that member data is _cpuId and accessor functions are cpuId(). The ID val comes from the python (default -1 if none provided), and if it is -1, the index of cpuList will be given. this has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch. |
5707:da86e00f87a0 |
23-Oct-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
s/cpu_id/cpuId in o3 (to be consistent and match style), also fix some typos in comments. |
5704:98224505352a |
21-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
style: Use the correct m5 style for things relating to interrupts. |
5702:bf84e2fa05f7 |
20-Oct-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
O3CPU: Undo Gabe's changes to remove hwrei and simpalcheck from O3 CPU. Removing hwrei causes the instruction after the hwrei to be fetched before the ITB/DTB_CM register is updated in a call pal call sys and thus the translation fails because the user is attempting to access a super page address.
Minimally, it seems as though some sort of fetch stall or refetch after a hwrei is required. I think this works currently because the hwrei uses the exec context interface, and the o3 stalls when that occurs.
Additionally, these changes don't update the LOCK register and probably break ll/sc. Both o3 changes were removed since a great deal of manual patching would be required to only remove the hwrei change. |
5668:5b5a9f4203d1 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of old RegContext code. |
5647:b06b49498c79 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object. |
5640:c811ced9efc1 |
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the simPalCheck function. |
5639:67cc7f0427e7 |
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the hwrei function. |
5606:6da7a58b0bc8 |
09-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: convert all usage of events to use the new API. For now, there is still a single global event queue, but this is necessary for making the steps towards a parallelized m5. |
5597:e2983d751be4 |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 IMPL class so it isn't split out by ISA. |
5596:cdc8893c649e |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 dynamic instruction class so it isn't split out by ISA. |
5595:6ebdae3f619b |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 CPU object so it isn't split out by ISA. |
5570:13592d41f290 |
28-Sep-2008 |
Nathan Binkert <nate@binkert.org> |
gcc: Add extra parens to quell warnings. Even though we're not incorrect about operator precedence, let's add some parens in some particularly confusing places to placate GCC 4.3 so that we don't have to turn the warning off. Agreed that this is a bit of a pain for those users who get the order of operations correct, but it is likely to prevent bugs in certain cases. |
5557:03c186e416aa |
26-Sep-2008 |
Kevin Lim <ktlim@umich.edu> |
O3CPU: Fix thread writeback logic. Fix the logic in the LSQ that determines if there are any stores to write back. In the commit stage, check for thread specific writebacks instead of just any writeback. |
5556:c9f52fae6b37 |
26-Sep-2008 |
Kevin Lim <ktlim@umich.edu> |
O3CPU: Add a hack to ensure that nextPC is set correctly after syscalls. Just check CPU's nextPC before and after syscall and if it changes, update this instruction's nextPC because the syscall must have changed the nextPC. |
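The before/after check described here can be sketched as follows. This is a simplified model under stated assumptions: `CpuSketch`, `InstSketch`, and `runSyscall` are hypothetical stand-ins, not the O3 classes themselves.

```cpp
#include <cstdint>
#include <functional>

using Addr = std::uint64_t;  // stand-in for the simulator's address type

struct CpuSketch  { Addr nextPC; };  // hypothetical CPU-side state
struct InstSketch { Addr nextPC; };  // hypothetical per-instruction state

// Run the syscall, then copy the CPU's nextPC into the instruction only if
// the syscall changed it.
inline void runSyscall(CpuSketch &cpu, InstSketch &inst,
                       const std::function<void(CpuSketch &)> &syscall)
{
    const Addr before = cpu.nextPC;
    syscall(cpu);
    if (cpu.nextPC != before)
        inst.nextPC = cpu.nextPC;
}
```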
5553:de0fa35df4cb |
22-Sep-2008 |
Nathan Binkert <nate@binkert.org> |
gcc: Version 4.3 is pretty anal about shadowing types, placate it. In the future, it would be nice to put the O3CPU into its own namespace so that we don't end up hardcoding pointers to the global namespace. |
5543:3af77710f397 |
10-Sep-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
style: Remove non-leading tabs everywhere they shouldn't be. Developers should configure their editors to not insert tabs |
5536:17c0c17726ff |
18-Aug-2008 |
Richard Strong <rstrong@hp.com> |
Changed BaseCPU::ProfileEvent's interval member to be of type Tick. This was done to be consistent with its python type of a latency. In addition, the multiple definitions of profile in the different cpu models caused problems for initialization of the interval value. If a child class's profile value was defined, the parent BaseCPU::ProfileEvent interval field would be initialized with a garbage value. The fix was to remove the multiple redefinitions of profile in the child CPU classes. |
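The shape of the fix can be sketched like this. All type names are hypothetical stand-ins, and this is only a model of the pattern, not the actual BaseCPU code. With a single `profile` member in the base class, the event's interval is always initialized from the value the constructor received. A second `profile` declared in a child class would shadow the base member and leave it unset.

```cpp
#include <cstdint>

using Tick = std::uint64_t;  // stand-in for the simulator's Tick type

struct BaseCpuSketch
{
    Tick profile;  // the only definition of 'profile' in the hierarchy

    struct ProfileEvent { Tick interval; };
    ProfileEvent profileEvent;

    // 'profile' is declared before 'profileEvent', so it is initialized first.
    explicit BaseCpuSketch(Tick p) : profile(p), profileEvent{profile} {}
};

struct ChildCpuSketch : BaseCpuSketch
{
    // Deliberately no second 'Tick profile;' here: a redefinition would
    // shadow the base member instead of setting it.
    explicit ChildCpuSketch(Tick p) : BaseCpuSketch(p) {}
};
```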
5529:9ae69b9cd7fd |
11-Aug-2008 |
Nathan Binkert <nate@binkert.org> |
params: Convert the CPU objects to use the auto generated param structs. A whole bunch of stuff has been converted to use the new params stuff, but the CPU wasn't one of them. While we're at it, make some things a bit more stylish. Most of the work was done by Gabe, I just cleaned stuff up a bit more at the end. |
5499:8bfc7650c344 |
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Remove delVirtPort() and make getVirtPort() only return cached version. |
5497:89a6483d7047 |
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Make the cached virtPort have a thread context so it can do everything that a newly created one can. |
5494:85c8d296c1cb |
28-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Backed out changeset 94a7bb476fca: caused memory leak. |
5489:94a7bb476fca |
21-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Generate more useful error messages for unconnected ports. Force all non-default ports to provide a name and an owner in the constructor. |
5386:5614618f4027 |
24-Mar-2008 |
Steve Reinhardt <stever@gmail.com> |
Don't FastAlloc MSHRs since we don't allocate them on the fly. |
5364:66d1251b7ae6 |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Add comments in code to describe bug conditions. This should help if somebody gets to the bug fix before me (or someone else)... |
5363:c474cb7a2b9c |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix Load/Store Queue squashing after an SMT thread is removed by ensuring you are squashing from the current instruction # causing the thread exit. |
5362:0adba9a562c9 |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix offset in removeThread() function so that float registers start freeing up from the right point (#32 usually) instead of restarting at 0 and double-freeing.
Commented out assert line in free_list.hh that will check for when double-free condition goes bad. |
5358:e9acb84bbafb |
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
TLB: Make a TLB base class and put a virtual demapPage function in it. |
5336:c7e21f4e5a2e |
06-Feb-2008 |
Stephen Hines <hines@cs.fsu.edu> |
Make the Event::description() a const function |
5335:69d45f5f21a2 |
05-Feb-2008 |
Stephen Hines <hines@cs.fsu.edu> |
Add base ARM code to M5 |
5327:3390941f0643 |
14-Jan-2008 |
Ke Meng <mengke97@hotmail.com> |
The reason is that the event is supposed to put the instructions ready to execute for next cycle. And the FUCompletion event has a lower priority than CPU tick event. It is called after the iew->tick() for current cycle has already been executed and the issueToExecuteQueue has already advanced this time. And assume the issueToExecuteLatency is 1, to catch up, the increment should be made at access(-1) instead of access(0). Otherwise I found it could increase the actual op_latency of the instructions to execute by 1 cycle and potentially put the simulated CPU into a permanent idle state.
Signed-off by: Ali Saidi <saidi@eecs.umich.edu> |
5314:e902f12a3af1 |
02-Jan-2008 |
Steve Reinhardt <stever@gmail.com> |
Add functional PrintReq command for memory-system debugging. |
5261:faf87a7e3ef8 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add thread id to misc. reg functions |
5259:74ef5093154f |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add microPC stuff back in. got deleted on changeset propagation somehow. |
5258:fcccd87d5178 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
put the flattenIndex stuff back in O3 AND put fatal() back in faults |
5250:42577371ff31 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
Get MIPS simple regression working. Take out unnecessary functions "setShadowSet", "CacheOp" |
5249:49d44a466496 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
branch merge |
5236:0050ad4fb3ef |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement a page table walker. |
5235:f07f46843886 |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the micropc available through the thread context objects. This is necessary for fault handlers that branch to non-zero micro PCs. |
5222:bb733a878f85 |
13-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
Add in files from merge-bare-iron, get them compiling in FS and SE mode |
5215:68f719ce5496 |
06-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Remove unneeded variable. |
5192:582e583f8e7e |
31-Oct-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Traceflags: Add SCons function to create a traceflag instead of having one file with them all. |
5110:4a6ab0f8cf33 |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the cpuid parameter get set in SE mode as well. |
5108:3b59ba14a7f3 |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the cpus check the pc event queues in SE mode. |
5104:cb14dda4d8fc |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make sure the system parameter gets set in the cpu builders. Other parameters need to be fixed as well. |
5100:7a0180040755 |
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Rename cycles() function to ticks() |
5099:8ff1345b3ae4 |
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Update statistics to use cycles properly instead of ticks |
5082:82dd253231c8 |
19-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put in the foundation for x87 stack based fp registers. |
5034:6186ef720dd4 |
30-Aug-2007 |
Miles Kaufmann <milesck@eecs.umich.edu> |
params: Deprecate old-style constructors; update most SimObject constructors.
SimObjects not yet updated: - Process and subclasses - BaseCPU and subclasses
The SimObject(const std::string &name) constructor was removed. Subclasses that still rely on that behavior must call the parent initializer as : SimObject(makeParams(name)) |
5018:21795007349e |
27-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head. |
5012:c0a28154d002 |
27-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head |
4997:e7380529bd2d |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Address Translation: Make SE mode use an actual TLB/MMU for translation like FS. |
4991:7e3bb2eabbbf |
13-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Set up the predicted npc and nnpc for a fault carrying noop so that it doesn't cause a false branch mispredict. |
4988:5b26eba4283f |
13-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Move the "translate" member functions back into the base o3 class. |
4986:b7c82ad6b3ef |
24-Aug-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Mem: Make errors in the memory system be responses, not requests. Fixes cache handling of error responses. |
4985:9f577f468009 |
21-Aug-2007 |
Kevin Lim <ktlim@umich.edu> |
o3: Fix for retry ID bug. It should be cleared prior to the call to recvRetry. Add extra DPRINTF statement for clearer debugging output. |
4928:951bd17db218 |
29-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge Gabe's changes from head. |
4918:3214e3694fb2 |
27-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Merge python and x86 changes with cache branch |
4909:f3b84a9b5c5a |
23-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix WriteReq/StoreCondReq setting in O3. |
4895:d36959284fbc |
15-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix up a bunch of multilevel coherence issues. Atomic mode seems to work. Timing is closer but not there yet. |
4878:5b747482d2d8 |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Make CPU models use new LoadLockedReq/StoreCondReq commands. |
4873:b135f6e6adfe |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Event descriptions should not end in "event" (they function as adjectives not nouns) |
4870:fcc39d001154 |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Get rid of Packet result field. Error responses are now encoded in cmd field. |
4800:910dde7af74f |
30-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fix problem with tracer not being initialized. |
4776:8c8407243a2c |
28-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Turn the instruction tracing code into pluggable sim objects. These need to be refined a little still and given parameters. |
4772:f08370a81812 |
27-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix argument register indexing. Code was assuming that all argument registers followed in order from ArgumentReg0. There is now an ArgumentReg array which is indexed to find the right index. There is a constant, NumArgumentRegs, which can be used to protect against using an invalid ArgumentReg. |
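A minimal sketch of the lookup-table approach described above. The array and constant names come from the commit message, but the register index values are placeholders chosen for illustration, and `argumentRegIndex` is a hypothetical helper.

```cpp
#include <cassert>

// Number of argument registers; the value 6 is an assumption for this sketch.
constexpr int NumArgumentRegs = 6;

// Placeholder register indices. The point is that they are not consecutive,
// so "ArgumentReg0 + n" would give the wrong answer.
constexpr int ArgumentReg[NumArgumentRegs] = {7, 6, 2, 1, 8, 9};

// Hypothetical helper: map the n-th argument to its register index,
// guarding against out-of-range requests with NumArgumentRegs.
inline int argumentRegIndex(int n)
{
    assert(n >= 0 && n < NumArgumentRegs);
    return ArgumentReg[n];
}
```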
4762:c94e103c83ad |
24-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Major changes to how SimObjects are created and initialized. Almost all creation and initialization now happens in python. Parameter objects are generated and initialized by python. The .ini file is now solely for debugging purposes and is not used in construction of the objects in any way. |
4673:833d4a116810 |
28-Jun-2007 |
Korey Sewell <ksewell@umich.edu> |
o3cpu build for mips |
4656:dbfa364feec8 |
21-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-micro
src/cpu/o3/fetch_impl.hh: hand merge |
4654:225cc048edfa |
20-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fix compiler errors. |
4653:19f884e6a48b |
19-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into doughnut.hpl.hp.com:/home/gblack/newmem-o3-micro
src/cpu/base_dyn_inst_impl.hh: src/cpu/o3/fetch_impl.hh: Hand merge |
4652:cead97b41680 |
12-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure all addresses used in syscalls are truncated to 32 bits. Actually -all- arguments are truncated to 32 bits, but we should be able to get away with it. |
4650:bb9977571ff4 |
09-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into doughnut.mwconnections.com:/home/gblack/newmem-o3-micro |
4644:4e77ab0671e8 |
23-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/wexford/x/gblack/m5/newmem-o3-spec |
4642:d7b2de2d72f1 |
22-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make the floating point zero register special handling only apply for ALPHA. |
4638:e181f5b0ebca |
15-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make an inner loop which pulls microops out of macroops. These aren't checked for control flow because we can pull out microops until we run out of buffer. This prevents microops from being interpreted as branches because the pc doesn't become npc. |
4637:d3adce1577fd |
15-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add extra constructors to Alpha and MIPS |
4636:afc8da9f526e |
14-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add support for microcode and pull out the special branch delay slot handling. Branch delay slots need to be squashed on a mispredict as well because the nnpc they saw was incorrect. |
4632:be5b8f67b8fb |
13-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Remove most of the special handling for delay slots since they have to be squashed anyway on a mispredict. This is because the NNPC value they saw when executing was incorrect. |
4598:56adf2e778a8 |
20-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Don't do checker stuff if the checker is not defined |
4597:063f25d13229 |
20-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Make sure all parameters have default values if they're supposed to and make sure parameters have the right type. Also make sure that any object that should be an intermediate type has the right options set. |
4593:16b19397172c |
19-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make branches work by repopulating the predecoder every time through. This is probably fine as far as the predecoder goes, but the simple cpu might want to not refetch something it already has. That reintroduces the self modifying code problem though. |
4564:d1fb13424616 |
13-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Separate the pc-pc and the pc of the incoming bytes, and get rid of the "moreBytes" which just takes a MachInst.
src/arch/x86/predecoder.cc: Separate the pc-pc and the pc of the incoming bytes, and get rid of the "moreBytes" which just takes a MachInst. Also make the "opSize" field describe the number of bytes and not the log of the number of bytes. |
4551:c131b771a066 |
10-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Use the right type |
4517:626afdfa6ec9 |
01-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Fix typo so m5.fast will compile |
4513:ad010b9fb1dc |
01-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
don't generate trace data unless tracing is on |
4497:17e34dbcc8b3 |
30-May-2007 |
Nathan Binkert <binkertn@umich.edu> |
Fix cut-n-pasto to make the path correct |
4486:aaeb03a8a6e1 |
27-May-2007 |
Nathan Binkert <binkertn@umich.edu> |
Move SimObject python files alongside the C++ and fix the SConscript files so that only the objects that are actually available in a given build are compiled in. Remove a bunch of files that aren't used anymore. |
4475:fb185cc1c845 |
22-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Change getDeviceAddressRanges to use bool for snoop arg. |
4406:46f15e4eb062 |
26-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove extra delete that was causing segfault. |
4405:57af43e114b5 |
26-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove unnecessary check. |
4395:9acb011a6c35 |
21-Apr-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fixes for solaris compile |
4392:271b73b42e34 |
22-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Use proper cycles for IPC and CPI equations.
src/cpu/o3/cpu.cc: Use proper cycles for these equations. |
4357:f8b2da607906 |
09-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/head |
4352:52f11aaf7d19 |
08-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Take into account that the flattened integer register space is a different size than the architected one. Also fixed some asserts. |
4350:c3f402102507 |
07-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Get the "hard" SPARC instructions working in o3. I don't like that the IsStoreConditional flag needs to be set for them because they aren't store conditional instructions, and I should fix the format code which is not handling the opt_flags correctly. |
4345:a95454b0e835 |
09-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix bug when blocking due to no free registers. |
4332:548ef28989b8 |
04-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-spec |
4331:e53c3a1aedad |
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Updates for other ISA cpu_builders. |
4329:52057dbec096 |
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.
src/cpu/o3/alpha/cpu_impl.hh: Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions. |
4326:a9277254c1e4 |
03-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Made the "data" field of store queue entries into a character array. It's sized to match an IntReg which was what it used to be, but we might want to make it something architecture independent. All data is now endian converted before entering the store queue entries which simplifies store to load forwarding in "trans endian" simulations, and makes twin memory ops work.
src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: fixed twin memory operations. |
4319:b8eae8c6afcc |
03-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix a memory leak. Hopefully this fixes the longer running benchmarks. |
4318:eb4241362a80 |
02-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove/comment out DPRINTFs that were causing a segfault.
The removed ones were unnecessary. The commented out ones could be useful in the future, should this problem get fixed. See flyspray task #243.
src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob_impl.hh: Remove/comment out DPRINTFs that were causing a segfault. |
4317:99838c26f7be |
02-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix up SPARC's CPU builder to match changes to Alpha's CPU builder. |
4302:c45514c856b0 |
29-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Update code so that the O3 CPU can handle not initially having anything hooked up to its ports. This fixes the segfault Ali recently found when using sampling.
src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: Update code so that the O3 CPU can handle not initially having anything hooked up to its ports. |
4288:1fc3aa7ad095 |
25-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Update for new trace data behavior. |
4284:c8800319ed0c |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2
src/cpu/base_dyn_inst.hh: Hand merge. Line is no longer needed because it's handled in the ISA. |
4217:4c966fec2324 |
13-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix segfault when peer owner attempts to use functional port |
4202:f7a05daec670 |
11-Mar-2007 |
Nathan Binkert <binkertn@umich.edu> |
Rework the way SCons recurses into subdirectories, making it automatic. The point is that now a subdirectory can be added to the build process just by creating a SConscript file in it. The process has two passes. On the first pass, all subdirs of the root of the tree are searched for SConsopts files. These files contain any command line options that ought to be added for a particular subdirectory. On the second pass, all subdirs of the src directory are searched for SConscript files. These files describe how to build any given subdirectory. I have added a Source() function. Any file (relative to the directory in which the SConscript resides) passed to that function is added to the build. Clean up everything to take advantage of Source(). |
4192:7accc6365bb9 |
09-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts(). 2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc: Move common code of switching over peers to base CPU. src/cpu/base.hh: Move common code of switching over peers to BaseCPU. src/cpu/o3/cpu.cc: Add in function that updates thread context's ports. Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code. src/cpu/o3/cpu.hh: Include function to update thread context's memory ports. src/cpu/o3/lsq.hh: Add function to dcache port that will update the memory ports upon getting a new peer. Also include a function that will tell the CPU to update those memory ports. src/cpu/o3/lsq_impl.hh: Add function that will update the memory ports upon getting a new peer. src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Add function that will update thread context's memory ports upon getting a new peer. Also use the new BaseCPU's take over from function. src/cpu/simple/atomic.hh: Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer. src/cpu/simple/timing.hh: Add function that will update thread context's memory ports upon getting a new peer. src/mem/port.hh: Make setPeer virtual so that other classes can override it. |
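The first fix relies on an ordinary virtual-override pattern, sketched below. The class names are hypothetical, and a counter stands in for the real `connectMemPorts()` call so the sketch stays self-contained.

```cpp
// Hypothetical base port: setPeer() is virtual so subclasses can hook it.
struct PortSketch
{
    PortSketch *peer = nullptr;
    virtual ~PortSketch() = default;
    virtual void setPeer(PortSketch *p) { peer = p; }
};

// Hypothetical CPU data-cache port: does the normal setPeer() work, then
// triggers the reconnect step (counted here in place of connectMemPorts()).
struct DcachePortSketch : PortSketch
{
    int reconnects = 0;

    void setPeer(PortSketch *p) override
    {
        PortSketch::setPeer(p);
        ++reconnects;  // stands in for cpu->connectMemPorts()
    }
};
```

Because the call is dispatched virtually, code that only holds a base-class reference still triggers the reconnect, and it happens only when the peer actually changes hands through `setPeer()`.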
4185:42c0395a03f9 |
07-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
I missed a couple of WithEffects, this should do it |
4182:5b2c0d266107 |
14-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make the predecoder an object with its own switched header file. Start adding predecoding functionality to x86.
src/arch/SConscript: src/arch/alpha/utility.hh: src/arch/mips/utility.hh: src/arch/sparc/utility.hh: src/cpu/base.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/static_inst.hh: src/arch/alpha/predecoder.hh: src/arch/mips/predecoder.hh: src/arch/sparc/predecoder.hh: Make the predecoder an object with its own switched header file. |
4181:6edaeff44647 |
13-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Replaced makeExtMI with predecode. Removed the getOpcode function from StaticInst which only made sense for Alpha. Started implementing the x86 predecoder. |
4172:141705d83494 |
07-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
*MiscReg->*MiscRegNoEffect, *MiscRegWithEffect->*MiscReg |
4167:ce5d0f62f13b |
06-Mar-2007 |
Nathan Binkert <binkertn@umich.edu> |
Move all of the parameters of the Root SimObject so they are directly configured by python. Move stuff from root.(cc|hh) to core.(cc|hh) since it really belongs there now. In the process, simplify how ticks are used in the python code. |
4149:3da926f8ea75 |
05-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Added an x86 dyninst |
4046:ef34b290091e |
10-Feb-2007 |
Nathan Binkert <binkertn@umich.edu> |
Clean up tracing stuff more, get rid of the trace log since it's not all that useful. Fix a few bugs with python/C++ integration. |
4035:f80ad98b2304 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Updates for commit. 1. Move interrupt handling to a separate function to clean up main commit() function a bit. Also gate the function call off properly based on whether or not there are outstanding interrupts, and the system is not in PAL mode. 2. Better handling of updating instruction's status bits. Instructions are not marked "atCommit" until other stages view it (pushed off to IEW/IQ), and they have been properly handled (faults). 3. Don't consider the ROB "empty" for the purpose of other stages until the ROB is empty, all stores have written back, and there were no store commits this cycle. The last is necessary in case a store committed, in which case it would look like all stores have written back but in actuality have not.
src/cpu/o3/commit.hh: Slightly modify how interrupts are handled. Also include some extra bools to keep track of state properly. src/cpu/o3/commit_impl.hh: Slightly modify how interrupts are handled. Also include some extra bools to keep track of state.
General correctness updates, most specifically for when commit broadcasts to other stages that the ROB is empty. |
4033:7bb1223f9645 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Handle status bits a little better, as well as non-speculative instructions.
src/cpu/o3/iew_impl.hh: Allow for slightly more flexible handling of non-speculative instructions. They can be other classes now, such as loads or stores.
Also be sure to clear the state associated with squashes that are not used. i.e. if a squash due to a memory ordering violation happens on the same cycle as an older branch squashing, clear the state associated with the memory ordering violation.
Lastly don't consider uncached loads to officially be "at commit" until IEW receives the signal back from commit about the load. src/cpu/o3/inst_queue_impl.hh: Don't consider non-speculative instructions to be "at commit" until the IQ has received a signal from commit about the instruction. This prevents non-speculative instructions from being issued too early. src/cpu/o3/mem_dep_unit_impl.hh: Clear instruction's ability to issue if it's replayed. |
4032:8b987a6a2afc |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Requests are handled more properly now. They assume the memory system takes control of the request upon sending out an access. 2. load-load ordering is maintained.
src/cpu/base_dyn_inst.hh: Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.
Also include some functions to allow certain status bits to be cleared. src/cpu/base_dyn_inst_impl.hh: Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out. src/cpu/o3/fetch_impl.hh: General correctness fixes. retryPkt is not necessarily always set, so handle it properly. Also consider the cache unblocked only when recvRetry is called. src/cpu/o3/lsq_unit.hh: Handle requests a little more correctly. Now that the requests aren't pointed to by the DynInst, be sure to delete the request if it's not being used by the memory system.
Also be sure to not store-load forward from an uncacheable store. src/cpu/o3/lsq_unit_impl.hh: Check to make sure load-load ordering was maintained.
Also handle requests a little more correctly. |
4030:4046b2213995 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
A couple of minor fixes. 1. Set CPU ID in all modes for the O3 CPU. 2. Use nextCycle() function to prevent phase drift in O3 CPU. 3. Remove assertion in rename map that is no longer true.
src/cpu/o3/alpha/cpu_builder.cc: Allow for CPU id in all modes, not just full system. Also include a parameter that was left out by accident. src/cpu/o3/alpha/cpu_impl.hh: Set the CPU ID properly. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Use nextCycle() function so that the CPU does not get out of phase when starting up from quiesces. src/cpu/o3/rename_map.cc: Remove assertion that is no longer true. tests/configs/o3-timing.py: Set CPU's id to 0. |
4022:c422464ca16e |
07-Feb-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Make memory commands dense again to avoid cache stat table explosion. Created MemCmd class to wrap enum and provide handy methods to check attributes, convert to string/int, etc. |
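The idea of wrapping a dense enum in a small class can be sketched as below. The command list and method names are illustrative assumptions, not the actual `MemCmd` interface. Since the enum values are consecutive from zero, they can index a fixed-size table directly.

```cpp
#include <cassert>
#include <string>

class MemCmdSketch
{
  public:
    // Dense (consecutive) values, so NumCommands is also the table size.
    enum Command { ReadReq, ReadResp, WriteReq, WriteResp, NumCommands };

    MemCmdSketch(Command c) : cmd(c) {}

    // Attribute checks (hypothetical names).
    bool isRead() const { return cmd == ReadReq || cmd == ReadResp; }
    bool isResponse() const { return cmd == ReadResp || cmd == WriteResp; }

    // Conversions to int and string.
    int toInt() const { return static_cast<int>(cmd); }
    std::string toString() const
    {
        static const char *const names[NumCommands] = {
            "ReadReq", "ReadResp", "WriteReq", "WriteResp"
        };
        assert(cmd >= 0 && cmd < NumCommands);
        return names[cmd];
    }

  private:
    Command cmd;
};
```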
3984:8f1bb70a4abf |
29-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
A minor hack to get branch prediction to behave like before on Alpha. |
3983:87619a68b7ba |
29-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed a warning about an unused variable. |
3980:9bcb2a2e9bb8 |
27-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/isa_traits.hh: src/arch/sparc/system.cc: Hand Merge |
3970:d54945bab95d |
03-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem |
3969:77957f66c1d5 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get non-delay slot ISAs (Alpha) working again, and pulling some debug output out of ifdefs. |
3968:0a08763926a1 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Phased out DelaySlotInfo. |
3967:1f1dff08a596 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Some fixes for decode stage branches without delay slots. This will need some work to be compatible with delay slots too. Also changed some direct variable uses to use an accessor function. |
3966:e589d0a642f5 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure the value of PC is actually updated now that the instruction target isn't set explicitly. |
3965:b4cab77371ed |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Implement a stub nnpc for alpha that is read only as npc+4. |
3962:18329efc47b8 |
20-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get MIPS_SE to compile. |
3961:42374ae36922 |
20-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get ALPHA_FS and ALPHA_SE to compile again. |
3958:58d09260d073 |
18-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix a place where the wrong width parameter was used, and set the nextNPC correctly on memory squashes. |
3957:37329de528a9 |
18-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure you only handle branch delay slots specially when there actually was a branch. |
3953:300d526414e6 |
17-Dec-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Convert Alpha (and finish converting MIPS) to new InstObjParam interface.
src/arch/alpha/isa/branch.isa: src/arch/alpha/isa/fp.isa: src/arch/alpha/isa/int.isa: src/arch/alpha/isa/main.isa: src/arch/alpha/isa/mem.isa: src/arch/alpha/isa/pal.isa: src/arch/mips/isa/formats/mem.isa: src/arch/mips/isa/formats/util.isa: Get rid of CodeBlock calls to adapt to new InstObjParam interface. src/arch/isa_parser.py: Check template code for operands (in addition to snippets). src/cpu/o3/alpha/dyn_inst.hh: Add (read|write)MiscRegOperand calls to Alpha DynInst. |
3949:b6664282d899 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/arch/isa_parser.py: src/arch/sparc/isa/formats/mem/basicmem.isa: src/arch/sparc/isa/formats/mem/blockmem.isa: src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/miscregfile.cc: src/arch/sparc/miscregfile.hh: src/cpu/o3/iew_impl.hh: Hand Merge |
3923:a8ce86366fd3 |
26-Jan-2007 |
Lisa Hsu <hsul@eecs.umich.edu> |
eliminate cpu checkInterrupts bool, it is redundant and unnecessary. |
3884:cc52005408ef |
30-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up previous commit to proper logic.
src/cpu/o3/commit_impl.hh: Oops, changed the logic a little bit. Fix it up to how it used to be. |
3876:127c71cfe21a |
26-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove some #if FULL_SYSTEMs so MP stuff works even in SE mode. |
3870:fc7a16797788 |
22-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
style |
3867:807483cfab77 |
21-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
don't use (*activeThreads).begin(), use activeThreads->blah(). Also don't call (*activeThreads).end() over and over. Just call activeThreads->end() once and save the result. Make sure we always check that there are elements in the list before we grab the first one. |
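The three habits named above can be sketched in one place: the arrow operator instead of `(*ptr).`, `end()` fetched once, and an emptiness check before taking the first element. The helper names are hypothetical, and the thread list is modeled as a plain `std::list<unsigned>`.

```cpp
#include <list>

// Returns the first active thread ID, or -1 when the list is empty.
inline int firstActiveThread(const std::list<unsigned> *activeThreads)
{
    if (activeThreads->empty())
        return -1;
    return static_cast<int>(activeThreads->front());
}

// Counts active threads with an ID of at least minTid.
inline unsigned countActiveAtOrAbove(const std::list<unsigned> *activeThreads,
                                     unsigned minTid)
{
    unsigned count = 0;
    std::list<unsigned>::const_iterator it = activeThreads->begin();
    // Call end() once and reuse the result for every loop test.
    const std::list<unsigned>::const_iterator end = activeThreads->end();
    for (; it != end; ++it) {
        if (*it >= minTid)
            ++count;
    }
    return count;
}
```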
3859:9278f759e55c |
21-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
<scold> Make sure that variables are always initialized! </scold> |
3846:a0fe3210ce53 |
15-Dec-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
little fixes i noticed while searching for reason for address range issues (but these weren't the cause of the problem).
RangeSize as a function takes a start address, and a SIZE, and will make the range (start, start+size-1) for you.
src/cpu/memtest/memtest.hh: src/cpu/o3/fetch.hh: src/cpu/o3/lsq.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/simple/atomic.hh: src/cpu/simple/timing.hh: Fix RangeSize arguments src/dev/alpha/tsunami_cchip.cc: src/dev/alpha/tsunami_io.cc: src/dev/alpha/tsunami_pchip.cc: src/dev/baddev.cc: pioSize indicates SIZE, not a mask |
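The start-plus-size convention described above can be sketched as follows. `AddrRangeSketch` and `rangeSize` are hypothetical stand-ins for the real range type and helper. The point is only that the second argument is a size, and the inclusive end comes out as `start + size - 1`.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;  // stand-in for the simulator's address type

struct AddrRangeSketch
{
    Addr start;
    Addr end;  // inclusive
};

// Build the inclusive range [start, start + size - 1] from a start and a SIZE.
inline AddrRangeSketch rangeSize(Addr start, Addr size)
{
    assert(size > 0);
    return AddrRangeSketch{start, start + size - 1};
}
```

Passing a mask or an end address as the second argument would silently produce a range of the wrong extent, which is the class of mistake the changeset corrects.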
3803:031d9d1b3924 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Switch the endianness of data that's forwarded. This is the same sort of problem that was happening when stores went all the way to memory and back. |
3802:e8f55dfb0f56 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make fetch detect when a branch is happening, rather than trying to compute when. |
3800:31469c190b22 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Don't have "predict" set the predicted target of the instruction. Do that explicitly when you use predict. |
3798:ec59feae527b |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Add in capability to return to unblocking after a squash. This is needed because if you don't squash -all- the instructions, you need to keep clearing out whatever is left in the skid buffer. |
3797:9b58fa5ccaf5 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure endian conversion is done on the memory data when it's just set to an existing buffer. |
3796:9cb1eaf3a461 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make the decoder use the new setup in the dyninsts for branch prediction. |
3795:60ecc96c3cee |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Made branch delay slots get squashed, and passed back an NPC and NNPC to start fetching from. |
3792:dae368e56d0e |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Changes to the isa_parser and affected files to fix an indexing problem with split execute instructions and miscregs aliasing with integer registers.
src/arch/isa_parser.py: Rearranged things so that classes with more than one execute function treat operands properly. 1. Eliminated the CodeBlock class 2. Created a SubOperandList 3. Redefined how InstObjParams is constructed
To define an InstObjParam, you can either pass in a single code literal which will be named "code", or you can pass in a dictionary of code snippets which will be substituted into the Templates. In order to get this to work, there is a new restriction that each template has only one function in it. These changes should only affect memory instructions which have regular and split execute functions.
Also changed the MiscRegs so that they use the instruction's srcReg and destReg arrays. src/arch/sparc/isa/formats/basic.isa: src/arch/sparc/isa/formats/branch.isa: src/arch/sparc/isa/formats/integerop.isa: src/arch/sparc/isa/formats/mem/basicmem.isa: src/arch/sparc/isa/formats/mem/blockmem.isa: src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/isa/formats/nop.isa: src/arch/sparc/isa/formats/priv.isa: src/arch/sparc/isa/formats/trap.isa: Rearranged to work with new InstObjParam scheme. src/cpu/o3/sparc/dyn_inst.hh: Added functions to access the miscregs using the indexes from the instruction's srcReg and destReg arrays. Also changed the names of the other accessors so that they have the suffix "Operand" if they use those arrays. src/cpu/simple/base.hh: Added functions to access the miscregs using the indexes from the instruction's srcReg and destReg arrays. |
3791:f1783bae1afe |
12-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem/ into zower.eecs.umich.edu:/eecshome/m5/newmem |
3789:9ce219516b5d |
07-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Compilation fixes |
3788:5c804ea5cc48 |
07-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix for squashing during a serializing instruction. |
3785:e863df7f4630 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Use the renamed register index, rather than the flattened one. |
3784:edc6cff4cbc1 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of some typedefs and moved the tlbs into the base o3 cpu. |
3783:cd831e0ab049 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Use the setSyscallReturn defined in arch rather than duplicating it here. |
3782:6a52c6c1b8b4 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Moved the RegIdx arrays to the base dyninst. |
3781:b00795985f07 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of some typedefs, moved the tlbs to the base o3 cpu, and called the architecture defined setSyscallReturn function instead of a duplicate copy.
src/cpu/o3/alpha/cpu.hh: Got rid of some typedefs, and moved the tlbs to the base o3 cpu. src/cpu/o3/alpha/thread_context.hh: src/cpu/o3/cpu.cc: Moved the tlbs to the base o3 cpu. |
3778:ac52cbef744c |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/cpu/o3/commit_impl.hh: Hand Merge |
3777:2a232a230370 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added a DPRINTF to print out the actual value pulled from memory. |
3776:4f88e76d8ebe |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Flattening and syscallReturn fixes
src/cpu/o3/thread_context_impl.hh: Use flattened indices src/cpu/simple_thread.hh: Use flattened indices, and pass a thread context to setSyscallReturn rather than a register file. src/cpu/thread_context.hh: The SyscallReturn class is no longer in arch/syscallreturn.hh |
3775:ced38affb6b1 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Don't panic, but this needs to be fixed. |
3774:13180c61fe86 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make syscalls flatten their register indices, and also call into the ISA's setSyscallReturn function rather than having a duplicated one. |
3773:61c53465193d |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change rename to rename the flattened register index instead of the architectural one. |
3772:71cccab4eff8 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added in endianness conversion on memory accesses as the data goes out. This will break the checker! |
3771:808a4c19cf34 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change how optional delay slot instructions are detected and squashed. |
3770:422aa205500a |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of some typedefs which were hardly used, and move some stuff back here that shouldn't be in the architecture specific DynInst classes. |
3760:a4fadb8ef046 |
24-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Initial changes to get O3 working with SPARC
src/arch/sparc/process.cc: MachineBytes doesn't exist any more. src/arch/sparc/regfile.cc: Add in the miscRegFile for good measure. src/cpu/o3/isa_specific.hh: Add in a section for SPARC src/cpu/o3/sparc/cpu.cc: src/cpu/o3/sparc/cpu.hh: src/cpu/o3/sparc/cpu_builder.cc: src/cpu/o3/sparc/cpu_impl.hh: src/cpu/o3/sparc/dyn_inst.cc: src/cpu/o3/sparc/dyn_inst.hh: src/cpu/o3/sparc/dyn_inst_impl.hh: src/cpu/o3/sparc/impl.hh: src/cpu/o3/sparc/params.hh: src/cpu/o3/sparc/thread_context.cc: src/cpu/o3/sparc/thread_context.hh: Sparc version of this file. |
3735:86a7cf4dcc11 |
12-Dec-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Rename the StaticInst-based (read|set)(Int|Float)Reg methods to (read|set)(Int|Float)RegOperand to distinguish from non-StaticInst version. |
3732:e84a6e9ebd3d |
12-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Allow for multiple redirects to happen on a single cycle (only the one for the oldest instruction is passed on to commit).
This fixes a minor bug when multiple FU completions come back out of order (due to the order in which the FUs are freed up), and the oldest redirect isn't recorded properly. The eon benchmark should run now.
src/cpu/o3/iew_impl.hh: Allow for multiple redirects to happen on a single cycle (only the one for the oldest instruction is passed on to commit). |
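A minimal sketch of the idea in this entry: several redirects may arrive in one cycle, in any order, and only the one belonging to the oldest instruction is kept. The `Redirect` and `RedirectTracker` types are hypothetical, and the sketch assumes that a smaller sequence number means an older instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical redirect record; seqNum is assumed to grow with program order.
struct Redirect
{
    std::uint64_t seqNum;
    std::uint64_t targetPC;
};

// Keeps only the redirect of the oldest instruction seen this cycle.
class RedirectTracker
{
    bool valid = false;
    Redirect oldest{};

  public:
    // Record a redirect; completions may arrive out of program order.
    void record(const Redirect &r)
    {
        if (!valid || r.seqNum < oldest.seqNum) {
            oldest = r;
            valid = true;
        }
    }

    bool hasRedirect() const { return valid; }
    const Redirect &get() const { return oldest; }

    // Call at the end of the cycle, after the redirect has been passed on.
    void clear() { valid = false; }
};
```

Comparing against the stored sequence number, rather than keeping the first or last redirect seen, is what makes the result independent of the order in which functional units complete.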
3731:4cd483eb6f16 |
11-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up in case a req hasn't yet been generated for this instruction (if there was a fault prior to translation). |
3730:6ccb47795cd5 |
11-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for fetch to use the icache's block size to generate proper access size. |
3708:b174ae14f007 |
06-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for MIPS_SE/m5.fast compile. |
3698:0aa0884a9040 |
02-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for MIPS_SE compiling. Regressions seem to work, but Korey should make sure these changes (commit especially) work okay.
src/cpu/o3/commit_impl.hh: src/cpu/o3/fetch_impl.hh: Fixes for MIPS_SE compile. |
3686:fa8d8b90cd8a |
29-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change the connecting of the physPort and virtPort to the memory object below the CPU to happen every time activateContext is called. The overhead is probably a little higher than necessary, but allows these connections to properly be made when there are CPUs that are inactive until they are switched in.
Right now this introduces a minor memory leak as old physPorts and virtPorts are not deleted when new ones are created. A flyspray task has been created for this issue. It can not be resolved until we determine how the bus will handle giving out ID's to functional ports that may be deleted.
src/cpu/o3/cpu.cc: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Change the setup of the physPort and virtPort to instead happen every time the CPU has a context activated. This is a little high overhead, but keeps it working correctly when the CPU does not have a physical memory attached to it until it switches in (like the case of switch CPUs). src/cpu/o3/thread_context.hh: Change function from being called at init() to just being called whenever the memory ports need to be connected. src/cpu/o3/thread_context_impl.hh: Update this to not delete the port if it's the same as the virtPort. src/cpu/thread_context.hh: Change function from being called at init() to whenever the memory ports need to be connected. src/cpu/thread_state.cc: Instead of initializing the ports, simply connect them, deleting any old ports that might exist. This allows these functions to be called multiple times. src/cpu/thread_state.hh: Ports are no longer initialized, but rather connected at context activation time. |
3675:dc883b610345 |
19-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Update Virtual and Physical ports.
src/cpu/o3/alpha/cpu_impl.hh: Handle the PhysicalPort and VirtualPort in the ThreadState. src/cpu/o3/cpu.cc: Initialize the thread context. src/cpu/o3/thread_context.hh: Add new function to initialize thread context. src/cpu/o3/thread_context_impl.hh: Use code now put into function. src/cpu/simple_thread.cc: Move code to ThreadState and use the new helper function. src/cpu/simple_thread.hh: Remove init() in this derived class; use init() from ThreadState base class. src/cpu/thread_state.cc: Move setting up of Physical and Virtual ports here. Change getMemFuncPort() to connectToMemFunc(), which connects a port to a functional port of the memory object below the CPU. src/cpu/thread_state.hh: Update functions. |
3661:efc80a01aeb6 |
14-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make CPUs capable of having a phase shift |
3647:8121d4503cbc |
13-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make CPU models signal to update the snoop ranges |
3640:3a2f7b451641 |
13-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
More interrupt reworking. |
3639:251dfe00c03d |
13-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change warn to DPRINTF. |
3636:bc107a8b4e31 |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for regression failure.
src/cpu/o3/fetch_impl.hh: Fetch needs to make sure it isn't waiting on an Icache access. |
3635:8f3b67d2accd |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:./local/clean/tmp/test-regress into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix |
3634:7e9abbddf9da |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for non-FS compile. |
3633:524f2aadbc89 |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to support new interrupt processing and removal of PcPAL.
src/arch/alpha/interrupts.hh: No need for this now that the ThreadContext is being used to set these IPRs in interrupts. Also split up the interrupt checking from the updating of the IPL and interrupt summary. src/arch/alpha/tlb.cc: Check the PC for whether or not it's in PAL mode, not the addr. src/cpu/o3/alpha/cpu.hh: Split up getting the interrupt from actually processing the interrupt. src/cpu/o3/alpha/cpu_impl.hh: Split up the processing of interrupts. src/cpu/o3/commit_impl.hh: Update for ISA-oriented interrupt changes. src/cpu/o3/fetch_impl.hh: Fix broken if statement from PcPAL updates, and properly populate the request fields.
Also more debugging output. src/cpu/ozone/cpu_impl.hh: Updates for ISA-oriented interrupt stuff. src/cpu/ozone/front_end_impl.hh: Populate request fields properly. src/cpu/simple/base.cc: Update for interrupt stuff. |
3617:384e3b1eae06 |
11-Nov-2006 |
Nathan Binkert <binkertn@umich.edu> |
Get rid of the ParamContext for pseudo instructions and move the parameters to the BaseCPU object. |
3615:ea748987af03 |
11-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
The Lock_Flag_DepTag went away earlier, and using TheISA gives the false impression that this code is ISA independent. |
3594:e401993e0cbb |
10-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem |
3577:605c370622b1 |
08-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Move the check to see if you're in user mode into the isa directory. |
3565:6ad587fb7dfd |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Put kernel_stats back into arch. |
3554:0ec75c89bd8b |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of a stray blank line. |
3548:85e64c82c522 |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Moved the switched version of kernel_stats.hh back to kern, and moved the base kernel_stats to base_kernel_stats |
3536:89aa06409e4d |
06-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Remote GDB support has been changed to use inheritance. Alpha should work, but isn't tested. Other architectures will not. |
3521:0b0b3551def0 |
03-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of "inPalMode". Some places are still effectively checking if they are in PAL mode, however. |
3520:4f4a2054fd85 |
03-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Add a new file which describes an ISA's interrupt handling mechanism. It records when interrupts are requested, and returns an interrupt to execute if the |
3512:cefe7f965104 |
09-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Draining fixes.
src/cpu/o3/cpu.cc: Handle draining properly when CPU isn't actually being used. src/cpu/simple/atomic.cc: Be sure to set status properly when draining. src/mem/bus.cc: Fix for draining. |
3500:8d5e32b3bc2e |
07-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Initialize mem dep unit properly.
src/cpu/o3/mem_dep_unit_impl.hh: Initialize mem dep unit properly, add debug output. |
3492:20b28fd2cab5 |
05-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Initialize pointer to NULL.
src/cpu/o3/lsq_unit_impl.hh: Be sure to initialize pointer to NULL. |
3484:9b7ac1654430 |
02-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Use ISA specific makeExtMI.
src/arch/alpha/utility.hh: For now makeExtMI will be specific to the ISA. |
3479:4fbcaa81d105 |
01-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem/ into zeep.eecs.umich.edu:/home/gblack/m5/newmemmemops |
3473:852a0bb230da |
10-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change up some warnings to DPRINTFs. |
3468:cf23ad1ceef2 |
01-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Adjustments for the AlphaTLB changing to AlphaISA::TLB and changing register file functions to not take faults |
3456:94ba6265a8cf |
31-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Missed a few instances of this function. |
3454:26850ac19a39 |
31-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Move IntrFlag into the MiscRegFile and get rid of specialized accessor functions. |
3411:07ea0d74b798 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem |
3402:db60546818d0 |
31-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a parameter for the MemObject.
configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more. src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter). src/cpu/checker/cpu.hh: Remove MemObject parameter. src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner. src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter. src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter. |
3383:8105c3e566ab |
20-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3376:ed8179dd13da |
16-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3349:fec4a86fa212 |
20-Oct-2006 |
Nathan Binkert <binkertn@umich.edu> |
Use PacketPtr everywhere |
3348:11f6ef023158 |
20-Oct-2006 |
Nathan Binkert <binkertn@umich.edu> |
refactor code for the packet, get rid of packet_impl.hh and call it packet_access.hh and fix the #includes so things compile right. |
3339:d1b3ec71baa4 |
19-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Small changes: ?? doesn't compile in warn statements. Should have been false, where I had a true.
src/cpu/o3/lsq_impl.hh: Apparently you can't have ?? in a warn statement (Something about trigraphs) src/mem/cache/cache_impl.hh: Forgot to signal atomic mode in snoopProbe |
3327:b2a5cde9ea77 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix fetch to stop fetching upon encountering a fault in SE mode. Also change warning to a DPRINTF. |
3326:d9cc6bae9d77 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Add in support for LL/SC in the O3 CPU. Needs to be fully tested.
src/cpu/base_dyn_inst.hh: Extend BaseDynInst a little bit so it can be used as a TC as well (specifically for ll/sc code). src/cpu/base_dyn_inst_impl.hh: Add variable to track if the result of the instruction should be recorded. src/cpu/o3/alpha/cpu_impl.hh: Clear lock flag upon hwrei. src/cpu/o3/lsq_unit.hh: Use ISA specified handling of locked reads. src/cpu/o3/lsq_unit_impl.hh: Use ISA specified handling of locked writes. |
3319:1ec49a9bfaa3 |
18-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
only do this assert after you know you're not switched out or idle. |
3310:21adbb41a37e |
17-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for uni-coherence in timing mode for FS. Still a bug in atomic uni-coherence in FS.
src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Make CPU models handle coherence requests src/mem/cache/base_cache.cc: Properly signal coherence CSHRs src/mem/cache/coherence/uni_coherence.cc: Only deallocate once |
3300:393d1801068a |
13-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix assertion. I haven't tested it fully (I can't reproduce Lisa's error) but I believe it should fix what she's running into (which was definitely a bug).
src/cpu/o3/fetch_impl.hh: Move assertion to area where it should really always be true. Sometimes you might recvRetry and not necessarily be blocked (if there was a squash). |
3267:d3db53c60988 |
12-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3230:e86a03911728 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/cpu/simple/timing.hh: tests/configs/o3-timing-mp.py: Hand merge. |
3229:cfb4b2250d26 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Comment out code that messed up SMT (but will be needed eventually).
src/cpu/o3/cpu.cc: Comment out resetting CPU structures for now. This can be updated to work in the future. |
3228:f47f69e61ded |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Be sure to delete packet and sender state if the cache is blocked.
src/cpu/o3/lsq_unit.hh: Be sure to delete data if the cache is blocked. |
3227:fe19356d6f88 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix caches plus sampling switch over.
src/cpu/o3/cpu.cc: Fix up caches plus sampling switch over. |
3226:de4981baa276 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix outstanding bug (FS#158).
src/cpu/o3/cpu.cc: Extra debugging, fix a bug brought up on bug tracker. |
3221:669a04468c0d |
08-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to O3 CPU. It should now work in FS mode, although sampling still has a bug.
src/cpu/o3/commit_impl.hh: Fixes for compile and sampling. src/cpu/o3/cpu.cc: Deallocate and activate threads properly. Also hopefully fix being able to use caches while switching over. src/cpu/o3/cpu.hh: Fixes for deallocating and activating threads. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: Handle getting back a BadAddress result from the access. src/cpu/o3/iew_impl.hh: More debug output. src/cpu/o3/lsq_unit_impl.hh: Fixup store conditional handling (still a bit of a hack, but works now).
Also handle getting back a BadAddress result from the access. src/cpu/o3/thread_context_impl.hh: Deallocate context now records if the context should be fully removed. |
3192:f3e215dda3f6 |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Have cpus send snoop ranges |
3184:8edaf4539e05 |
08-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for functional path.
If the cpu needs to update any state when it gets a functional write (LSQ??) then that code needs to be written.
src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: CPUs can receive functional accesses; they need to determine if they need to do anything with them. src/mem/bus.cc: src/mem/bus.hh: Make the functional path do the correct type of snoop |
3172:2c84db071850 |
08-Oct-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Replace tests of LOCKED/UNCACHEABLE flags with isLocked()/isUncacheable(). |
3160:4d7fc8d7ef23 |
02-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem
src/cpu/ozone/cpu_impl.hh: Hand merged |
3126:756092c6383c |
02-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to fix merge issues and bring almost everything up to working speed. Ozone CPU remains untested, but everything else compiles and runs.
src/arch/alpha/isa_traits.hh: This got changed to the wrong version by accident. src/cpu/base.cc: Fix up progress event to not schedule itself if the interval is set to 0. src/cpu/base.hh: Fix up the CPU Progress Event to not print itself if it's set to 0. Also remove stats_reset_inst (something I added to m5 but isn't necessary here). src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: Remove float variable of instResult; it's always held within the double part now. src/cpu/checker/cpu_impl.hh: Use thread and not cpuXC. src/cpu/o3/alpha/cpu_builder.cc: src/cpu/o3/checker_builder.cc: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu_builder.cc: src/python/m5/objects/BaseCPU.py: Remove stats_reset_inst. src/cpu/o3/commit_impl.hh: src/cpu/ozone/lw_back_end_impl.hh: Get TC, not XCProxy. src/cpu/o3/cpu.cc: Switch out updates from the version of m5 I have. Also remove serialize code that got added twice. src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/thread_state.hh: Remove code that was added twice. src/cpu/o3/lsq_unit.hh: Add back in stats that got lost in the merge. src/cpu/o3/lsq_unit_impl.hh: Use proper method to get flags. Also wake CPU if we're coming back from a cache miss. src/cpu/o3/thread_context_impl.hh: src/cpu/o3/thread_state.hh: Support profiling. src/cpu/ozone/cpu.hh: Update to use proper typename. src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/dyn_inst_impl.hh: Updates for newmem. src/cpu/ozone/lw_lsq_impl.hh: Get flags correctly. src/cpu/ozone/thread_state.hh: Reorder constructor initialization, use tc. src/sim/pseudo_inst.cc: Allow for loading of symbol file. Be sure to use ThreadContext and not ExecContext. |
3125:febd811bccc6 |
30-Sep-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:./local/clean/o3-merge/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
configs/boot/micro_memlat.rcS: configs/boot/micro_tlblat.rcS: src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa_traits.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/params.hh: src/cpu/o3/checker_builder.cc: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/base.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.hh: src/dev/ide_disk.cc: src/python/m5/objects/O3CPU.py: src/python/m5/objects/Root.py: src/python/m5/objects/System.py: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/system.hh: util/m5/m5.c: Hand merge. |
3120:e49afeaf79e9 |
30-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Changed makeExtMI to take a ThreadContext instead of a pc. |
3112:76b70de314b6 |
15-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem |
3093:b09c33e66bce |
31-Aug-2006 |
Korey Sewell <ksewell@umich.edu> |
add ISA_HAS_DELAY_SLOT directive instead of "#if THE_ISA == ALPHA_ISA" throughout CPU models
src/arch/alpha/isa_traits.hh: src/arch/mips/isa_traits.hh: src/arch/sparc/isa_traits.hh: define 'ISA_HAS_DELAY_SLOT' src/cpu/base_dyn_inst.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/simple/base.cc: use ISA_HAS_DELAY_SLOT instead of THE_ISA == ALPHA_ISA |
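The switch from an ISA-name test to a feature macro can be sketched as below. This is an illustration under assumptions: the value `1` stands in for what a delay-slot ISA's `isa_traits.hh` would define, and `instsKeptAfterTakenBranch` is a hypothetical helper, not a function from the CPU models.

```cpp
#include <cassert>

// Assumption: each ISA's isa_traits.hh defines this to 0 or 1.
// The fallback value here only makes the sketch compile on its own.
#ifndef ISA_HAS_DELAY_SLOT
#define ISA_HAS_DELAY_SLOT 1
#endif

// Number of instructions following a taken branch that must still execute.
// The code asks about the feature (a delay slot) rather than naming an ISA.
constexpr unsigned instsKeptAfterTakenBranch()
{
#if ISA_HAS_DELAY_SLOT
    return 1;   // the delay-slot instruction survives the squash
#else
    return 0;   // everything after the branch is squashed
#endif
}
```

Testing for the feature means a new ISA only has to set the macro in its own traits header, without touching every conditional in the CPU models.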
3070:0ca43be10749 |
03-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix up the parameters to getInstRecord |
3014:b4309193255a |
16-Aug-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for Kevin's O3 model to work with the blocking caches.
src/cpu/o3/fetch_impl.hh: Fix ordering so dereference works src/cpu/o3/lsq_impl.hh: Check to make sure we didn't squash already src/cpu/o3/lsq_unit.hh: Fix for counting squashed retries in the WB count src/cpu/o3/lsq_unit_impl.hh: Make sure to set retryID for stores, and clear it appropriately |
2986:99640058db70 |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Some touchup to the reorganized includes and "using" directives. |
2980:eab855f06b79 |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Cleaned up include files and got rid of many using directives in header files. |
2978:199dcea84fc4 |
11-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Started to add support for O3 for sparc. |
2972:f84c6c5309ce |
11-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Pushed most of constants.hh back into isa_traits.hh and regfile.hh and created a separate file for the syscallreturn class. |
2965:82703e01285a |
26-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
MIPS ISA runs 'hello world' in O3CPU ...
src/arch/mips/isa/base.isa: special case syscall disassembly... maybe give own instruction class? src/arch/mips/isa/decoder.isa: add 'IsSerializeAfter' flag for syscall src/cpu/o3/commit.hh: Add skidBuffer to commit src/cpu/o3/commit_impl.hh: Use skidbuffer in MIPS ISA src/cpu/o3/fetch_impl.hh: Print name out when there is a fault src/cpu/o3/mips/cpu_impl.hh: change comment |
2946:015472193926 |
05-Jul-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem.head |
2943:eb2b70e6116b |
18-Jul-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge m5.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem |
2935:d1223a6c9156 |
23-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
This changeset gets the MIPS ISA pretty much working in the O3CPU. It builds, runs, and gets very very close to completing the hello world successfully but there are some minor quirks to iron out. Who would've known a DELAY SLOT introduces that much complexity?! arrgh!
Anyways, a lot of this stuff had to do with my project at MIPS and me needing to know how I was going to get this working for the MIPS ISA. So I figured I would try to touch it up and throw it in here (I hate to introduce non-completely working components... )
src/arch/alpha/isa/mem.isa: spacing src/arch/mips/faults.cc: src/arch/mips/faults.hh: Gabe really authored this src/arch/mips/isa/decoder.isa: add StoreConditional Flag to instruction src/arch/mips/isa/formats/basic.isa: Steven really did this file src/arch/mips/isa/formats/branch.isa: fix bug for uncond/cond control src/arch/mips/isa/formats/mem.isa: Adjust O3CPU memory access to use new memory model interface. src/arch/mips/isa/formats/util.isa: update LoadStoreBase template src/arch/mips/isa_traits.cc: update SERIALIZE partially src/arch/mips/process.cc: src/arch/mips/process.hh: no need for this for NOW. ASID/Virtual addressing handles it src/arch/mips/regfile/misc_regfile.hh: add in clear() function and comments for future usage of special misc. regs src/cpu/base_dyn_inst.hh: add in nextNPC variable and supporting functions.
add isCondDelaySlot function
Update predTaken and mispredicted functions src/cpu/base_dyn_inst_impl.hh: init nextNPC src/cpu/o3/SConscript: add MIPS files to compile src/cpu/o3/alpha/thread_context.hh: no need for my name on this file src/cpu/o3/bpred_unit_impl.hh: Update RAS appropriately for MIPS src/cpu/o3/comm.hh: add some extra communication variables to aid in handling the delay slots src/cpu/o3/commit.hh: minor name fix for nextNPC functions. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: Fix necessary variables and functions for squashes with delay slots src/cpu/o3/cpu.cc: Update function interface ...
adjust removeInstsNotInROB function to recognize delay slots insts src/cpu/o3/cpu.hh: update removeInstsNotInROB src/cpu/o3/decode.hh: declare necessary variables for handling delay slot src/cpu/o3/dyn_inst.hh: Add in MipsDynInst src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/rename.hh: declare necessary variables and adjust functions for handling delay slot src/cpu/o3/inst_queue.hh: src/cpu/simple/base.cc: no need for my name here src/cpu/o3/isa_specific.hh: add in MIPS files src/cpu/o3/scoreboard.hh: don't include alpha specific isa traits! src/cpu/o3/thread_context.hh: no need for my name here, i just rearranged where the file goes src/cpu/static_inst.hh: add isCondDelaySlot function src/cpu/o3/mips/cpu.cc: src/cpu/o3/mips/cpu.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/mips/dyn_inst.cc: src/cpu/o3/mips/dyn_inst.hh: src/cpu/o3/mips/dyn_inst_impl.hh: src/cpu/o3/mips/impl.hh: src/cpu/o3/mips/params.hh: src/cpu/o3/mips/thread_context.cc: src/cpu/o3/mips/thread_context.hh: MIPS file for O3CPU...mirrors ALPHA definition |
2927:62f1518ae800 |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
O3CPU fixes.
src/cpu/o3/lsq_unit.hh: LSQ needs to decrement the WB counter if the load is going to be replayed. src/cpu/o3/lsq_unit_impl.hh: LSQ needs to decrement the WB counter if the load is squashed. |
2926:48f2f450cbf6 |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Some minor compiling fixes.
src/cpu/o3/iew.hh: Non-debug compile fixes. src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: Merge fix. |
2923:db8a876258df |
14-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
configs/test/fs.py: configs/test/test.py: SCCS merged |
2918:20cdaf201249 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Serialization changes to make O3CPU consistent with the other models.
src/cpu/o3/commit_impl.hh: Always set instruction. This is necessary for serialization as the instruction is also serialized. src/cpu/o3/cpu.cc: Change serialization so it matches other CPU's output. Also fix up some indexing. |
2911:854ee6cd377e |
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
forgot tid |
2910:7eb6f817e267 |
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
For now, halt context is the same as deallocating. suspend context will now take the thread off the activeThread list.
src/arch/mips/isa_traits.cc: add in copy MiscRegs unimplemented function |
2907:7b0ababb4166 |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Move Dcache port creation from LSQUnit to LSQ in order to support Ron's recent changes, and using the O3CPU in SMT mode.
src/cpu/o3/lsq.hh: Update to have LSQ work with only one dcache port for all LSQ Units. LSQ has the dcache port, and the LSQ Units must tell the LSQ if the cache has become blocked. src/cpu/o3/lsq_impl.hh: Updates to have the LSQ work with only one dcache port for all LSQUnits. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Update for LSQ to create dcache port instead of LSQUnits. Now LSQUnits are given the dcache port from the LSQ, and also must check the LSQ if the cache is blocked prior to accessing the cache. |
2906:3d65b80fdb11 |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for bug when squashing and then fetching. Now fetch checks if the cache data is valid. |
2905:62879b0282eb |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Update for changes to draining. |
2900:7cccbae04d02 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem
src/cpu/o3/fetch_impl.hh: Hand merge. |
2894:a83675362809 |
11-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix ordering issue with squashed Icache Fetches and Static data in packet.
Now hello world works with 2 levels of cache with O3 CPU(multiple outstanding requests).
src/cpu/o3/fetch_impl.hh: Fix ordering issue with squashed Icache Fetches and Static data in packet. |
2893:58c423134221 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Track the PC of the cache data stored in fetch so it doesn't access memory multiple times if information is already in fetch. |
2886:2fdb9976b0a3 |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2880:a48d5059cd35 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2877:4b56debc25d1 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Minor fix for SMT Hello Worlds to finish correctly. Still, there is a problem with the LSQ and indexing out of range in the buffer. I haven't nailed down the fix yet, but it's coming ...
src/cpu/o3/commit_impl.hh: add space to DPRINT src/cpu/o3/cpu.cc: add newline to DPRINT src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: Each thread needs its own squashedSeqNum for the case where both are squashing at the same time, so they don't overwrite each other's squash number. |
2876:a862ab9f93f8 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2875:9b6f6b75b187 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix so that O3CPU doesn't segfault on exit. Major thing was to not execute commit if there are no active threads in CPU.
src/cpu/o3/alpha/thread_context.hh: call deallocate instead of deallocateContext src/cpu/o3/commit_impl.hh: don't run commit stage if there are no instructions src/cpu/o3/cpu.cc: add deallocate event, deactivateThread function, and edit deallocateContext. src/cpu/o3/cpu.hh: add deallocate event and add optional delay to deallocateContext src/cpu/o3/thread_context.hh: optional delay for deallocate src/cpu/o3/thread_context_impl.hh: edit DPRINTFs to say Thread Context instead of Alpha TC src/cpu/thread_context.hh: optional delay src/sim/syscall_emul.hh: name stuff |
2874:5389a28b80fb |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Some minor cleanups.
src/cpu/SConscript: Change the error message to be slightly nicer. src/cpu/o3/commit.hh: Remove old code. src/cpu/o3/commit_impl.hh: Remove old unused code. |
2873:1377a68cd00e |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Add parameters for backwards and forwards sizes for time buffers.
src/base/timebuf.hh: Add a function to return the size of the time buffer. |
2871:7ed5c9ef3eb6 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support Ron's changes for hooking up ports.
src/cpu/checker/cpu.hh: Now that BaseCPU is a MemObject, the checker must define this function. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: Implement getPort function so the connector can connect the ports properly. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: The connector handles connecting the ports now. src/python/m5/objects/O3CPU.py: Add ports to the parameters. |
2870:e81b23c19e5a |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for bug when draining and a memory access is outstanding. |
2867:cc92d58a3210 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Switch out fixes for CPUs.
src/cpu/o3/cpu.cc: Fix up keeping proper state when switched out and drained. src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Keep track of the event we use to schedule fetch initially and upon resume. We may have to cancel the event if the CPU is switched out. |
2864:eab7ff8f6d72 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support serializing and unserializing in the O3 CPU. Also a few small fixes for draining/switching CPUs.
src/cpu/o3/commit_impl.hh: Fix to clear drainPending variable on call to resume. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Support serializing and unserializing in the O3 CPU. src/cpu/o3/lsq_impl.hh: Be sure to say we have no stores to write back if the active thread list is empty. src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: Slightly change how SimpleThread is used to copy from other ThreadContexts. |
2863:2592e056dc5c |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix the O3CPU to support the multi-pass method for checking if the system has fully drained.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Return a value so that the CPU can instantly return from draining if the pipeline is already drained. src/cpu/o3/cpu.cc: Use values returned from pipeline stages so that the CPU can instantly return from draining if the pipeline is already drained. |
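The early-return idea in changeset 2863 can be sketched roughly as below. This is only an illustration under assumptions: the `Stage` struct, its `instsInFlight` and `drainPending` fields, and the `drainPipeline` function are hypothetical names, not the O3CPU code. The commit message only states that each pipeline stage returns a value so that the CPU can return immediately when the pipeline is already drained.

```cpp
#include <vector>

// Hypothetical pipeline stage; the field names are assumptions.
struct Stage
{
    unsigned instsInFlight = 0;
    bool drainPending = false;

    // Returns true if the stage is already drained. Otherwise it records
    // that a drain is pending and returns false.
    bool drain()
    {
        if (instsInFlight == 0)
            return true;
        drainPending = true;
        return false;
    }
};

// Returns true if every stage was already drained, so the caller does
// not need to wait for a later drain notification.
bool drainPipeline(std::vector<Stage> &stages)
{
    bool allDrained = true;
    for (Stage &stage : stages) {
        // Call drain() on every stage, even after one reports it is busy.
        if (!stage.drain())
            allDrained = false;
    }
    return allDrained;
}
```

A caller in this sketch would finish the drain request at once when `drainPipeline` returns true, and otherwise wait for the busy stages to empty.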
2862:7bc3562e6405 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Various serialization changes to make it possible for the O3CPU to checkpoint.
src/arch/alpha/regfile.hh: Define serialize/unserialize functions on MiscRegFile itself. src/cpu/o3/regfile.hh: Remove old commented code. src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: Push common serialization code to ThreadState level. Also allow the SimpleThread to be used for checkpointing by other models. src/cpu/thread_state.cc: src/cpu/thread_state.hh: Move common serialization code into ThreadState. |
2852:7fc1b748dd81 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2850:0b4a6b4c9b8a |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Had to add this because for some reason gcc wasn't recognizing "THE_ISA == ALPHA_ISA"... weird but OK |
2849:c285bf8ffb4a |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2848:f29a4a5c4d66 |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Use O3DynInst in cpu_models.py and in static_inst_exec_sigs.hh instead of a specific ISA dyn. inst.
src/cpu/cpu_models.py: Use O3DynInst src/cpu/o3/dyn_inst.hh: declare O3DynInst here based off of ISA ... this must be updated for each ISA. src/cpu/static_inst.hh: take out O3 forward declarations here and include header file to keep this file clean |
2847:6b19f07d9666 |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
more steps toward O3 SMT
src/arch/mips/isa/formats/fp.isa: Adjust for newmem src/cpu/cpu_models.py: Use O3DynInst instead of convoluted way src/cpu/o3/alpha/impl.hh: take out O3DynInst typedef here ... src/cpu/o3/cpu.cc: open up the SMT functions in the O3CPU src/cpu/static_inst.hh: Add O3DynInst src/cpu/o3/dyn_inst.hh: Use to get ISA-specific O3DynInst |
2845:18e6dde158f0 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem |
2843:19c4c6c2b5b1 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support for draining, and the new method of switching out. Now switching out happens after the pipeline has been drained, deferring the three way handshake to the normal drain mechanism. The calls of switchOut() and takeOverFrom() both take action immediately.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Support for draining, new method of switching out. |
2840:227f7c4f8c81 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove sampler and serializer. Now they are handled through C++ interacting with Python.
src/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/sim/pseudo_inst.cc: Remove sampler. src/sim/sim_object.cc: Remove serializer. |
2837:10ae172449b3 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up some merge problems.
src/base/traceflags.py: Remove BaseCPU traceflag. src/cpu/o3/alpha/params.hh: Move non-Alpha specific parameters out of this params class. src/cpu/o3/params.hh: Move non-Alpha specific params into this params class. |
2836:c8f549058964 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
src/base/traceflags.py: src/cpu/SConscript: Hand merge. src/cpu/o3/alpha/params.hh: Hand merge. This needs to get changed. |
2834:c8342a71404b |
03-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix for FS O3CPU compile ... missing forward class declaration/header file after files got split for ISA-independence
src/cpu/o3/alpha/thread_context.hh: Use 'this' when accessing cpu src/cpu/o3/cpu.hh: add numActiveThreads function src/cpu/o3/thread_context.hh: forward class declarations src/cpu/o3/thread_context_impl.hh: add quiesce event header file src/cpu/thread_context.hh: add exit() function to thread context (read comments in file) src/sim/syscall_emul.cc: adjust exitFunc syscall |
2832:c990b002e0be |
02-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
typo ... change 'single_thread' to 'round_robin_policy' |
2831:0a42b294727c |
02-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix default SMT configuration in O3CPU (i.e. fetch policy, workloads/numThreads)
Edit Test3 for newmem
src/base/traceflags.py: Add O3CPU flag src/cpu/base.cc: for some reason adding a BaseCPU flag doesn't work so just go back to old way... src/cpu/o3/alpha/cpu_builder.cc: Determine number of threads by workload size instead of solely by parameter.
Default SMT fetch policy to RoundRobin if it's not specified in Config file src/cpu/o3/commit.hh: only use nextNPC for !ALPHA src/cpu/o3/commit_impl.hh: add FetchTrapPending as condition for commit src/cpu/o3/cpu.cc: panic if active threads is more than Impl::MaxThreads src/cpu/o3/fetch.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: name stuff src/cpu/o3/fetch_impl.hh: fatal if try to use SMT branch count, that's unimplemented right now src/python/m5/config.py: make it clearer that a parameter is not valid within a configuration class |
2829:f354c00bba05 |
01-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
fix cpu builder to build the correct name...
add activateThread event and functions
src/cpu/o3/alpha/cpu_builder.cc: Have CPU builder build a DerivO3CPU not a DerivAlphaO3CPU src/cpu/o3/cpu.cc: add activateThread Event
add activateThread function
adjust activateContext to schedule a thread to activate within the CPU instead of activating thread right away. Activating right away leads to stages trying to use threads that aren't ready yet, wasting execution time & possibly performance. src/cpu/o3/cpu.hh: add activateThread Event
add activateThread function
add schedule/deschedule activate thread event |
2828:6f7429218c08 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2820:7fde0b0f8f78 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Add some different parameters. The main change is that the writeback count is now limited so that it doesn't overflow the buffer.
src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: Add in dispatchWidth, wbWidth, wbDepth parameters. wbDepth is the number of cycles of wbWidth instructions that can be buffered. src/cpu/o3/iew.hh: Include separate parameter for dispatch width. Also limit the number of outstanding writebacks so the writeback buffer isn't overflowed. The IQ must make sure with the IEW stage that it can issue instructions prior to issuing. src/cpu/o3/iew_impl.hh: Include separate parameter for dispatch width. Also limit the number of outstanding writebacks so the writeback buffer isn't overflowed. src/cpu/o3/inst_queue_impl.hh: IQ needs to check with the IEW to make sure it can issue instructions, and increments the IEW wb counter each time there is an outstanding instruction that will writeback. src/cpu/o3/lsq_unit_impl.hh: Be sure to decrement the writeback counter if there's a squashed load that returned. src/python/m5/objects/AlphaO3CPU.py: Change the parameters to include dispatch width, writeback width, and writeback depth. |
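The writeback limit described in changeset 2820 can be sketched roughly as follows. This is a minimal illustration, not the actual IEW code: the class name `WritebackLimiter` and its methods are hypothetical. Only the parameters `wbWidth` and `wbDepth`, and the idea of incrementing a counter on issue and decrementing it on writeback or when a squashed load returns, come from the commit message.

```cpp
#include <cassert>

// Hypothetical sketch: caps the number of outstanding writebacks at
// wbDepth cycles' worth of wbWidth instructions.
class WritebackLimiter
{
  public:
    WritebackLimiter(unsigned wbWidth, unsigned wbDepth)
        : wbMax(wbWidth * wbDepth), wbOutstanding(0) {}

    // The IQ would check this before issuing an instruction.
    bool canIssue() const { return wbOutstanding < wbMax; }

    // Called when an instruction that will write back is issued.
    void increment() { assert(canIssue()); ++wbOutstanding; }

    // Called when the instruction writes back, or when a squashed
    // load returns.
    void decrement() { assert(wbOutstanding > 0); --wbOutstanding; }

    unsigned outstanding() const { return wbOutstanding; }

  private:
    const unsigned wbMax;
    unsigned wbOutstanding;
};
```

With `wbWidth = 2` and `wbDepth = 1`, for instance, this sketch allows two outstanding writebacks before `canIssue()` reports false.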
2818:a2b6429690b6 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
now O3CPU is totally independent of the ISA... all alpha specific stuff is in the cpu/o3/alpha directory
src/cpu/o3/alpha/cpu.cc: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/impl.hh: filenames src/cpu/o3/alpha/thread_context.hh: public src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use O3CPUImpl ... not Alpha src/cpu/o3/checker_builder.cc: filename |
2817:273f7fb94f83 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Make O3CPU model independent of the ISA
Use O3CPU when building instead of AlphaO3CPU.
I could use some better python magic in the cpu_models.py file!
AUTHORS: add middle initial SConstruct: change from AlphaO3CPU to O3CPU src/cpu/SConscript: edits to build O3CPU instead of AlphaO3CPU src/cpu/cpu_models.py: change substitution template to use proper CPU EXEC CONTEXT For O3CPU Model...
Actually, some Python expertise could be used here. The 'env' variable is not passed to this file, so I had to parse through the ARGV to find the ISA... src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use isa_specific.hh src/sim/process.cc: only init NextNPC if not ALPHA src/cpu/o3/alpha/cpu.cc: alphao3cpu impl src/cpu/o3/alpha/cpu.hh: move AlphaTC to its own file src/cpu/o3/alpha/cpu_impl.hh: Move AlphaTC to its own file ... src/cpu/o3/alpha/dyn_inst.cc: src/cpu/o3/alpha/dyn_inst.hh: src/cpu/o3/alpha/dyn_inst_impl.hh: include paths src/cpu/o3/alpha/impl.hh: include paths, set default MaxThreads to 2 instead of 4 src/cpu/o3/alpha/params.hh: set Alpha Specific Params here src/python/m5/objects/O3CPU.py: add O3CPU class src/cpu/o3/SConscript: include isa-specific build files src/cpu/o3/alpha/thread_context.cc: NEW HOME of AlphaTC src/cpu/o3/alpha/thread_context.hh: new home of AlphaTC src/cpu/o3/isa_specific.hh: includes ISA specific files src/cpu/o3/params.hh: base o3 params src/cpu/o3/thread_context.hh: base o3 thread context src/cpu/o3/thread_context_impl.hh: base o3 thread context impl |
2808:a88ea76f6738 |
27-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Make full CPU handle SE faults |
2794:0dd6cb8820e1 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker related updates.
src/cpu/o3/cpu.cc: Updates to make sure the checker is compiled in if enabled and also to include it only when it's used. |
2791:7b2a7e21909b |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Change ThreadState constructor ordering to match the rest of the ThreadStates. |
2790:2f8e9762bee9 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Misc fixes.
src/cpu/o3/alpha_dyn_inst_impl.hh: Consolidate these calls into one. src/cpu/o3/commit_impl.hh: Include checker only if it's being used. src/cpu/o3/fetch_impl.hh: Do not deallocate request if it's a squashed response that was received. src/cpu/o3/lsq_unit.hh: Add in comment. src/cpu/o3/lsq_unit_impl.hh: Only include checker if it's being used. |
2783:381a5413b55a |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor updates.
src/cpu/o3/alpha_cpu.hh: Fix #define in header. util/rundiff: Fix file comments to be more correct. util/tracediff: Update comments to be more correct. |
2765:2962455d1c0a |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Split off instantiation into separate CC files for each of the models. This makes it easier to be able to specify only certain CPU models.
src/cpu/SConscript: Split off instantiations into separate CC files. This makes it easier to split them per CPU model. src/cpu/base_dyn_inst_impl.hh: Move instantiations out of impl.hh file and into a cc file. src/cpu/checker/cpu_impl.hh: Move instantiations over to .cc files inside each CPU's directory. Makes it easier to only use what's actually included. src/cpu/o3/bpred_unit.cc: Pull Ozone instantiations out of this .cc file; put them into the ozone's CC file. src/cpu/o3/checker_builder.cc: Instantiate Checker for O3 CPU. src/cpu/ozone/checker_builder.cc: Instantiate Checker for Ozone CPU. |
2757:58e3a66e72f7 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2756:7bf0d6481df9 |
15-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Initial changes to allow DetailedCPU to work with other architectures (i.e. Sparc & MIPS)
Still need to add some code to fetch & commit stages
src/cpu/o3/commit.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Add nextNPC read & set functions src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: Add nextNPC |
2736:98dcdc08884d |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Reorganization to move FuncUnit, FUDesc, and OpDesc out of the encumbered directory and into the normal cpu directory.
src/SConscript: Split off FuncUnits from old FUPool so I'm not including encumbered code. This was all written by Steve Raasch so it's safe to include in the main tree. src/cpu/o3/fu_pool.cc: Include the func unit file that's not in the encumbered directory. |
2734:af0d50755df7 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Miscellaneous minor fixes.
src/cpu/checker/cpu.cc: Add in comment. src/cpu/cpuevent.hh: Fix up comment. src/cpu/o3/bpred_unit.cc: Comment out Ozone instantiations. src/cpu/o3/dep_graph.hh: Include destructor. |
2733:e0eac8fc5774 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.
First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.
Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.
SConstruct: Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled. configs/test/test.py: Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick. src/base/traceflags.py: Add in O3CPU trace flag. src/cpu/SConscript: Rename AlphaFullCPU to AlphaO3CPU.
Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs are also being included. src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU. src/cpu/cpu_models.py: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model. src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/thread_state.hh: src/python/m5/objects/AlphaO3CPU.py: Rename FullCPU to O3CPU to differentiate from old FullCPU model. src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Rename FullCPU to O3CPU to differentiate from old FullCPU model. Also #ifdef the checker code so it doesn't need to be included if it's not selected. |
2732:d2443ce353d2 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker updates.
src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: Updates for checker. Output more informative messages on error. Rename some functions. Add in option to warn (and not exit) on load results being incorrect. src/cpu/checker/cpu_builder.cc: src/cpu/checker/o3_cpu_builder.cc: Add in parameter to warn (and not exit) on load result errors. src/cpu/o3/commit_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Renamed checker function. |
2731:822b96578fba |
14-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor code cleanup of BaseDynInst.
src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Minor code cleanup by putting several bools into a bitset instead. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob_impl.hh: Changed around some things in BaseDynInst. |
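The cleanup in changeset 2731, replacing several bools with a bitset, can be illustrated with the sketch below. The flag names and the `DynInstStatus` class are hypothetical and chosen for illustration; they are not the actual BaseDynInst members. Only the technique (one `std::bitset` indexed by an enum in place of separate bool fields) comes from the commit message.

```cpp
#include <bitset>

// Hypothetical status flags; the real list in BaseDynInst differs.
enum Status
{
    Completed,
    Squashed,
    CanCommit,
    NumStatus  // Must stay last: gives the size of the bitset.
};

class DynInstStatus
{
  public:
    void setCompleted() { status.set(Completed); }
    bool isCompleted() const { return status[Completed]; }

    void setSquashed() { status.set(Squashed); }
    bool isSquashed() const { return status[Squashed]; }

    void setCanCommit() { status.set(CanCommit); }
    void clearCanCommit() { status.reset(CanCommit); }
    bool readyToCommit() const { return status[CanCommit]; }

  private:
    // One bit per flag instead of one bool member per flag.
    std::bitset<NumStatus> status;
};
```

Adding a flag in this layout means adding one enumerator before `NumStatus` plus its accessors, with no new data member.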
2727:91e17c7ee622 |
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor updates for stats.
src/cpu/o3/commit_impl.hh: src/cpu/o3/fetch.hh: Update stats comments. src/cpu/o3/fetch_impl.hh: Differentiate stats. src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: Update for stats. src/cpu/o3/lsq.hh: LSQ now has stats. src/cpu/o3/lsq_impl.hh: Register stats of all LSQ units. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Add in stats. |
2722:610b13e19da0 |
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Compile fix. |
2720:695250d6fa42 |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge fixes to make full system compile and run.
src/arch/alpha/linux/system.cc: src/cpu/o3/alpha_cpu_impl.hh: src/sim/system.cc: Merge fixes. |
2708:c4157b162e7b |
09-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge vm1.(none):/home/stever/bk/newmem into vm1.(none):/home/stever/bk/newmem-py
src/python/m5/__init__.py: src/sim/syscall_emul.cc: Hand merge. |
2703:638e5b90f4c6 |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix output messages.
src/cpu/o3/decode_impl.hh: src/cpu/o3/rename_impl.hh: Fix output message. |
2702:8a3ee279559b |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Clean up/shift some code around.
src/cpu/base_dyn_inst.cc: Clean up some code and update. src/cpu/base_dyn_inst.hh: Clean up some code and update with more descriptive function names. src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: Remove unused parameters. src/cpu/o3/commit_impl.hh: Remove unused parameters, also set squashCounter directly to the counted number of squashes. src/cpu/o3/fetch_impl.hh: Update for function name changes. src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: Remove unused parameter, move some code into a function. |
2699:c255fef3daaa |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two minor fixes.
src/cpu/o3/lsq_unit_impl.hh: Missed this name change. src/cpu/thread_state.cc: Fix for stats. |
2698:d5f35d41e017 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Removing of old code and adding in new comments.
src/cpu/base_dyn_inst.cc: Clean up old functions, comments. src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/ozone/lsq_unit.hh: src/cpu/ozone/lsq_unit_impl.hh: Remove old commented code. src/cpu/o3/fetch.hh: Remove old commented code, add in comments. src/cpu/o3/inst_queue_impl.hh: Move comment to better place. src/cpu/o3/lsq_unit.hh: Remove old commented code, add in new comments. src/cpu/o3/lsq_unit_impl.hh: Remove old commented code, rename variable. |
2696:30b38e36ff54 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Allow for fetch to retry access if the sendTiming call fails. |
2694:879ca5098a90 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove obsolete stuff.
src/cpu/o3/alpha_cpu.hh: Remove functions no longer used for reading and writing. |
2693:18c6be231eb1 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for some outstanding issues in the LSQ. It should now be able to retry. It should also be able to handle LL/SC (through hacks) for the UP case.
src/cpu/o3/lsq_unit.hh: Handle being able to retry (untested but hopefully very close to working).
Handle lock flag for LL/SC hack. Hopefully the memory system will add in LL/SC soon.
Better output message. src/cpu/o3/lsq_unit_impl.hh: Handle being able to retry (untested but should be very close to working).
Make SC's work (hopefully) while the memory system doesn't have a LL/SC implementation. |
2692:e5b7553eff69 |
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Tell checker that an instruction is completed once it does the access to memory. As long as the checker does not access memory to verify the store's data (currently impossible in the O3 model), this will work fine.
src/cpu/o3/lsq_unit_impl.hh: Tell checker that an instruction is completed once it does the access to memory. |
2690:f4337c0d9e6f |
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Get O3 CPU mostly working in full system, and fix an FP bug that showed up.
It still does not yet handle retries.
src/cpu/base_dyn_inst.hh: Get working in full-system mode and fix some FP bugs. src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/thread_state.hh: src/cpu/thread_state.hh: Get working in full system. src/cpu/checker/o3_cpu_builder.cc: Checker does not take a MemObject as a simobj parameter. src/cpu/o3/alpha_dyn_inst.hh: Fix up float regs. src/cpu/o3/regfile.hh: Fix up an fp error, print out more useful output messages. |
2689:dbf969c18a65 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Update copyright. |
2683:d6b72bb2ed97 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Reorganization/renaming of CPUExecContext. Now it is called SimpleThread in order to clear up the confusion due to the many ExecContexts. It also derives from a common ThreadState object, which holds various state common to threads across CPU models.
Following with the previous check-in, ExecContext now refers only to the interface provided to the ISA in order to access CPU state. ThreadContext refers to the interface provided to all objects outside the CPU in order to access thread state. SimpleThread provides all thread state and the interface to access it, and is suitable for simple execution models such as the SimpleCPU.
src/SConscript: Include thread state file. src/arch/alpha/ev5.cc: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/cpu.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: Rename CPUExecContext to SimpleThread. src/cpu/base_dyn_inst.hh: Make thread member variables protected.. src/cpu/o3/alpha_cpu.hh: src/cpu/o3/cpu.hh: Make various members of ThreadState protected. src/cpu/o3/alpha_cpu_impl.hh: Push generation of TranslatingPort into the CPU itself. Make various members of ThreadState protected. src/cpu/o3/thread_state.hh: Pull a lot of common code into the base ThreadState class. src/cpu/ozone/thread_state.hh: Rename CPUExecContext to SimpleThread, move a lot of common code into base ThreadState class. src/cpu/thread_state.hh: Push a lot of common code into base ThreadState class. This goes along with renaming CPUExecContext to SimpleThread, and making it derive from ThreadState. src/cpu/simple_thread.cc: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. This helps push a lot of common code/state into a single class that can be used by all CPUs. src/cpu/simple_thread.hh: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. src/kern/system_events.cc: Rename cpu_exec_context to thread_context. src/sim/process.hh: Remove unused forward declaration. |
2681:6885b69f4075 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Clear misc regs at startup.
src/arch/alpha/regfile.hh: Define clear functions on the individual reg files. src/cpu/o3/regfile.hh: Be sure to clear the misc reg file at startup. |
2680:246e7104f744 |
06-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Change ExecContext to ThreadContext. This is being renamed to differentiate between the interface used by objects outside of the CPU, and the interface used by the ISA. ThreadContext is used by objects outside of the CPU and is specifically defined in thread_context.hh. ExecContext is more implicit, and is defined by files such as base_dyn_inst.hh or cpu/simple/base.hh.
Further renames/reorganization will be coming shortly; what is currently CPUExecContext (the old ExecContext from m5) will be renamed to SimpleThread or something similar.
src/arch/alpha/arguments.cc: src/arch/alpha/arguments.hh: src/arch/alpha/ev5.cc: src/arch/alpha/faults.cc: src/arch/alpha/faults.hh: src/arch/alpha/freebsd/system.cc: src/arch/alpha/freebsd/system.hh: src/arch/alpha/isa/branch.isa: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa/main.isa: src/arch/alpha/linux/process.cc: src/arch/alpha/linux/system.cc: src/arch/alpha/linux/system.hh: src/arch/alpha/linux/threadinfo.hh: src/arch/alpha/process.cc: src/arch/alpha/regfile.hh: src/arch/alpha/stacktrace.cc: src/arch/alpha/stacktrace.hh: src/arch/alpha/tlb.cc: src/arch/alpha/tlb.hh: src/arch/alpha/tru64/process.cc: src/arch/alpha/tru64/system.cc: src/arch/alpha/tru64/system.hh: src/arch/alpha/utility.hh: src/arch/alpha/vtophys.cc: src/arch/alpha/vtophys.hh: src/arch/mips/faults.cc: src/arch/mips/faults.hh: src/arch/mips/isa_traits.cc: src/arch/mips/isa_traits.hh: src/arch/mips/linux/process.cc: src/arch/mips/process.cc: src/arch/mips/regfile/float_regfile.hh: src/arch/mips/regfile/int_regfile.hh: src/arch/mips/regfile/misc_regfile.hh: src/arch/mips/regfile/regfile.hh: src/arch/mips/stacktrace.hh: src/arch/sparc/faults.cc: src/arch/sparc/faults.hh: src/arch/sparc/isa_traits.hh: src/arch/sparc/linux/process.cc: src/arch/sparc/linux/process.hh: src/arch/sparc/process.cc: src/arch/sparc/regfile.hh: src/arch/sparc/solaris/process.cc: src/arch/sparc/stacktrace.hh: src/arch/sparc/ua2005.cc: src/arch/sparc/utility.hh: src/arch/sparc/vtophys.cc: src/arch/sparc/vtophys.hh: src/base/remote_gdb.cc: src/base/remote_gdb.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/cpuevent.cc: src/cpu/cpuevent.hh: src/cpu/exetrace.hh: src/cpu/intr_control.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: 
src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/pc_event.cc: src/cpu/pc_event.hh: src/cpu/profile.cc: src/cpu/profile.hh: src/cpu/quiesce_event.cc: src/cpu/quiesce_event.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/static_inst.cc: src/cpu/static_inst.hh: src/cpu/thread_state.hh: src/dev/alpha_console.cc: src/dev/ns_gige.cc: src/dev/sinic.cc: src/dev/tsunami_cchip.cc: src/kern/kernel_stats.cc: src/kern/kernel_stats.hh: src/kern/linux/events.cc: src/kern/linux/events.hh: src/kern/system_events.cc: src/kern/system_events.hh: src/kern/tru64/dump_mbuf.cc: src/kern/tru64/tru64.hh: src/kern/tru64/tru64_events.cc: src/kern/tru64/tru64_events.hh: src/mem/vport.cc: src/mem/vport.hh: src/sim/faults.cc: src/sim/faults.hh: src/sim/process.cc: src/sim/process.hh: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/syscall_emul.cc: src/sim/syscall_emul.hh: src/sim/system.cc: src/cpu/thread_context.hh: src/sim/system.hh: src/sim/vptr.hh: Change ExecContext to ThreadContext. |
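The split described in the 2680 entry (an explicitly declared ThreadContext for objects outside the CPU, and a more implicit ExecContext for ISA code) can be illustrated with a rough analogy. The sketch below is Python, not gem5 code, and every class, method, and function name in it (`ToyThreadContext`, `ToyThread`, `ToyDynInst`, `execute_add`, `dump_reg`) is hypothetical. It only shows the general pattern: one interface is a declared base class, the other is just the set of methods that instruction code happens to call.

```python
from abc import ABC, abstractmethod


class ToyThreadContext(ABC):
    """Explicit interface for code outside the CPU (hypothetical methods)."""

    @abstractmethod
    def read_int_reg(self, idx: int) -> int: ...

    @abstractmethod
    def set_int_reg(self, idx: int, value: int) -> None: ...


class ToyThread(ToyThreadContext):
    """A concrete thread that holds architectural register state."""

    def __init__(self, num_regs: int = 32) -> None:
        self._regs = [0] * num_regs

    def read_int_reg(self, idx: int) -> int:
        return self._regs[idx]

    def set_int_reg(self, idx: int, value: int) -> None:
        self._regs[idx] = value


class ToyDynInst:
    """Implicit execution context: no shared base class is declared.

    Instruction code only relies on the two operand methods below.
    """

    def __init__(self, thread: ToyThread) -> None:
        self._thread = thread

    def read_int_operand(self, idx: int) -> int:
        return self._thread.read_int_reg(idx)

    def set_int_result(self, idx: int, value: int) -> None:
        self._thread.set_int_reg(idx, value)


def execute_add(xc, dest: int, src1: int, src2: int) -> None:
    """ISA-side code: accepts any object that provides the operand methods."""
    xc.set_int_result(dest, xc.read_int_operand(src1) + xc.read_int_operand(src2))


def dump_reg(tc: ToyThreadContext, idx: int) -> str:
    """External code (for example a debugger) uses the explicit interface."""
    return f"r{idx}={tc.read_int_reg(idx)}"


thread = ToyThread()
thread.set_int_reg(1, 2)
thread.set_int_reg(2, 3)
execute_add(ToyDynInst(thread), dest=3, src1=1, src2=2)
print(dump_reg(thread, 3))
```

In the C++ sources the implicit side would presumably be resolved at compile time rather than by duck typing; the analogy is only meant to show why the two interfaces carry different names.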
2679:737e9f158843 |
06-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix checker to work in newmem in SE mode.
src/cpu/o3/fetch_impl.hh: Give the checker a pointer to the icachePort. src/cpu/o3/lsq_unit_impl.hh: Give the checker a pointer to the dcachePort. src/mem/request.hh: Allow checking for the scResult being valid prior to accessing it. |
2678:1f86b91dc3bb |
05-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get new CPU model working for simple test case. The CPU does not yet support retrying accesses.
src/cpu/base_dyn_inst.cc: Delete the allocated data in destructor. src/cpu/base_dyn_inst.hh: Only copy the addresses if the translation succeeded. src/cpu/o3/alpha_cpu.hh: Return actual translating port. Don't panic on setNextNPC() as it's always called, regardless of the architecture, when the process initializes. src/cpu/o3/alpha_cpu_impl.hh: Pass in memobject to the thread state in SE mode. src/cpu/o3/commit_impl.hh: Initialize all variables. src/cpu/o3/decode_impl.hh: Handle early resolution of branches properly. src/cpu/o3/fetch.hh: Switch structure back to requests. src/cpu/o3/fetch_impl.hh: Initialize all variables, create/delete requests properly. src/cpu/o3/lsq_unit.hh: Include sender state along with the packet. Also include a more generic writeback event that's only used for stores forwarding data to loads. src/cpu/o3/lsq_unit_impl.hh: Redo writeback code to support the response path of the memory system. src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/mem_dep_unit_impl.hh: Wrap variables in #ifdefs. src/cpu/o3/store_set.cc: Include to get panic() function. src/cpu/o3/thread_state.hh: Create with MemObject as well. src/cpu/thread_state.hh: Have a translating port in the thread state object. src/python/m5/objects/AlphaFullCPU.py: Mem parameter no longer needed. |
2674:6d4afef73a20 |
04-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:/z/ktlim2/clean/m5-o3 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
src/cpu/checker/o3_cpu_builder.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: Hand merge. |
2670:9107b8bd08cd |
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zizzer.eecs.umich.edu:/.automount/zamp/z/ktlim2/clean/newmem |
2669:f2b336e89d2a |
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.
Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.
src/SConscript: Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future. src/base/traceflags.py: Include new trace flags, fix up merge mess-up. src/cpu/SConscript: Include the base_dyn_inst.cc as one of the sources. Don't compile the Ozone CPU for now. src/cpu/base.cc: Remove an extra } from the merge. src/cpu/base_dyn_inst.cc: Fixes to make compiling work. Don't instantiate the OzoneCPU for now. src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/btb.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/sat_counter.hh: src/cpu/op_class.hh: src/cpu/ozone/cpu.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/checker/o3_cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/mem/request.hh: src/cpu/o3/fu_pool.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/dyn_inst.cc: src/cpu/ozone/dyn_inst.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/ozone_impl.hh: src/cpu/ozone/thread_state.hh: Fixes to get compiling to work. src/cpu/o3/alpha_cpu.hh: Fixes to get compiling to work. Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs. src/cpu/o3/alpha_dyn_inst_impl.hh: Fixes to get compiling to work. Pass in the packet to the completeAcc function. Fix up syscall function. |
2667:fe64b8353b1c |
09-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Move main control from C++ into Python. User script now invokes initialization and simulation loop after building configuration. These functions are exported from C++ to Python using SWIG.
SConstruct: Set up SWIG builder & scanner. Set up symlinking of source files into build directory (by not disabling the default behavior). configs/test/test.py: Rewrite to use new script-driven interface. Include a sample option. src/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). Add SWIG-generated main_wrap.cc to source list. src/arch/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/trace/opt_cpu.cc: src/cpu/trace/trace_cpu.cc: src/sim/pseudo_inst.cc: src/sim/root.cc: src/sim/serialize.cc: src/sim/syscall_emul.cc: SimExit() is now exitSimLoop(). src/cpu/base.cc: SimExitEvent is now SimLoopExitEvent. src/python/SConscript: Add SWIG build command for main.i. Use python/m5 in build dir as source for zip archive... easy now with file duplication enabled. src/python/m5/__init__.py: - Move copyright notice back to C++ so we can print it right away, even for interactive sessions. - Get rid of argument parsing code; just provide default option descriptors for user script to call optparse with. - Don't clutter m5 namespace by sucking in all of m5.config and m5.objects. - Move instantiate() function here from config.py. src/python/m5/config.py: - Move instantiate() function to __init__.py. - Param.Foo deferred type lookups must use m5.objects namespace now (not m5). 
src/python/m5/objects/AlphaConsole.py: src/python/m5/objects/AlphaFullCPU.py: src/python/m5/objects/AlphaTLB.py: src/python/m5/objects/BadDevice.py: src/python/m5/objects/BaseCPU.py: src/python/m5/objects/BaseCache.py: src/python/m5/objects/Bridge.py: src/python/m5/objects/Bus.py: src/python/m5/objects/CoherenceProtocol.py: src/python/m5/objects/Device.py: src/python/m5/objects/DiskImage.py: src/python/m5/objects/Ethernet.py: src/python/m5/objects/Ide.py: src/python/m5/objects/IntrControl.py: src/python/m5/objects/MemObject.py: src/python/m5/objects/MemTest.py: src/python/m5/objects/Pci.py: src/python/m5/objects/PhysicalMemory.py: src/python/m5/objects/Platform.py: src/python/m5/objects/Process.py: src/python/m5/objects/Repl.py: src/python/m5/objects/Root.py: src/python/m5/objects/SimConsole.py: src/python/m5/objects/SimpleDisk.py: src/python/m5/objects/System.py: src/python/m5/objects/Tsunami.py: src/python/m5/objects/Uart.py: Fix up imports (m5 namespace no longer includes m5.config). src/sim/eventq.cc: src/sim/eventq.hh: Support for Python-called simulate() function: - Use IsExitEvent flag to signal events that want to exit the simulation loop gracefully (instead of calling exit() to terminate the process). - Modify interface to hand exit event object back to caller so it can be inspected for cause. src/sim/host.hh: Add MaxTick constant. src/sim/main.cc: Move copyright notice back to C++ so we can print it right away, even for interactive sessions. Use PYTHONPATH environment var to set module path (instead of clunky code injection method). Move main control from here into Python: - Separate initialization code and simulation loop into separate functions callable from Python. - Make Python interpreter invocation more pure (more like directly invoking interpreter). Add -i and -p flags (only options on binary itself; other options processed by Python). Import readline package when using interactive mode. 
src/sim/sim_events.cc: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). src/sim/sim_events.hh: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). Get rid of a few unused constructors. src/sim/sim_exit.hh: SimExit() is now exitSimLoop(). Get rid of unused functions. Add comments. |
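The exit-event behavior described in the 2667 entry for src/sim/eventq.cc can be sketched roughly as follows. This is a minimal Python illustration, not the gem5 implementation: the `Event` and `EventQueue` classes, their fields, and the value of `MAX_TICK` are hypothetical stand-ins. Only the idea is taken from the commit message: an event flagged as an exit event ends the simulation loop gracefully, and the event object is handed back to the caller so the cause can be inspected.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

MAX_TICK = 2**63 - 1  # assumed value; stands in for the MaxTick constant


@dataclass(order=True)
class Event:
    when: int  # tick at which the event fires; the only field used for ordering
    cause: str = field(compare=False, default="")
    is_exit_event: bool = field(compare=False, default=False)


class EventQueue:
    def __init__(self) -> None:
        self._heap: List[Event] = []
        self.cur_tick = 0

    def schedule(self, event: Event) -> None:
        heapq.heappush(self._heap, event)

    def simulate(self, num_ticks: int = MAX_TICK) -> Optional[Event]:
        """Service events in tick order until an exit event fires.

        Returns the exit event so the caller can inspect its cause, or
        None if the queue drains or the tick limit is reached first.
        """
        limit = self.cur_tick + num_ticks
        while self._heap and self._heap[0].when <= limit:
            event = heapq.heappop(self._heap)
            self.cur_tick = event.when
            if event.is_exit_event:
                return event  # leave the loop without terminating the process
        return None


queue = EventQueue()
queue.schedule(Event(when=10, cause="ordinary work"))
queue.schedule(Event(when=20, cause="user requested exit", is_exit_event=True))
queue.schedule(Event(when=30, cause="later work"))

exit_event = queue.simulate()
print(exit_event.cause, "at tick", queue.cur_tick)
```

Because the loop returns instead of calling exit(), a driving script can look at the returned event and decide whether to call `simulate()` again, which is the kind of control the move to a script-driven interface is meant to allow.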
2665:a124942bacb8 |
31-May-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Updated Authors from bk prs info |
2654:9559cfa91b9d |
30-May-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem
SConstruct: src/SConscript: src/arch/SConscript: src/arch/alpha/faults.cc: src/arch/alpha/tlb.cc: src/base/traceflags.py: src/cpu/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.cc: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/exec_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/regfile.hh: src/cpu/ozone/cpu.hh: src/cpu/simple/base.cc: src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/2bit_local_pred.hh: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_dyn_inst.cc: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/bpred_unit.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/btb.cc: src/cpu/o3/btb.hh: src/cpu/o3/comm.hh: src/cpu/o3/commit.cc: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu_policy.hh: src/cpu/o3/decode.cc: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.cc: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.cc: src/cpu/o3/free_list.hh: src/cpu/o3/iew.cc: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.cc: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/mem_dep_unit.hh: src/cpu/o3/mem_dep_unit_impl.hh: src/cpu/o3/ras.cc: src/cpu/o3/ras.hh: src/cpu/o3/rename.cc: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rename_map.cc: src/cpu/o3/rename_map.hh: src/cpu/o3/rob.cc: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/sat_counter.cc: src/cpu/o3/sat_counter.hh: src/cpu/o3/store_set.cc: src/cpu/o3/store_set.hh: src/cpu/o3/tournament_pred.cc: src/cpu/o3/tournament_pred.hh: Hand merges. |
2632:1bb2f91485ea |
22-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
New directory structure: - simulator source now in 'src' subdirectory - imported files from 'ext' repository - support building in arbitrary places, including outside of the source tree. See comment at top of SConstruct file for more details. Regression tests are temporarily disabled; that system needs more extensive revisions.
SConstruct: Update for new directory structure. Modify to support build trees that are not subdirectories of the source tree. See comment at top of file for more details. Regression tests are temporarily disabled. src/arch/SConscript: src/arch/isa_parser.py: src/python/SConscript: Update for new directory structure. |