14298:e1c8c253ce95 |
13-Sep-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Fix checker cpu instantiation
This change uses the params as instantiated from the default constructor to create the checker cpu. If any of these parameters are invalid for the checker cpu, the simulation will exit with a warning.
Change-Id: I0e58ed096c9ea5f413f2e9b64d8d184d9b0fc84e Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21079 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14297:b4519e586f5e |
10-Sep-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu, mem: Changing AtomicOpFunctor* for unique_ptr<AtomicOpFunctor>
This change modifies the way we move the AtomicOpFunctor* through gem5 in order to maintain proper ownership of the object and ensure its destruction when it is no longer used.
Doing so also fixes a memory leak in Request.hh, where we were assigning a new AtomicOpFunctor* without destroying the previous one.
This change creates a new type AtomicOpFunctor_ptr as a std::unique_ptr<AtomicOpFunctor> and moves its ownership as needed, except for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
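The ownership argument in the entry above can be illustrated with a minimal sketch. All names here (Functor, AddOne, FunctorPtr, Req, liveAfterReplace) are hypothetical stand-ins, not the gem5 classes; the sketch only shows how assigning into a std::unique_ptr destroys the previous object, which is the kind of leak the commit message describes for a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an atomic-op functor. It counts live
// instances so that a leak would be observable.
struct Functor
{
    static int live;
    Functor() { ++live; }
    virtual ~Functor() { --live; }
    virtual void operator()(int &v) = 0;
};
int Functor::live = 0;

struct AddOne : Functor
{
    void operator()(int &v) override { ++v; }
};

// Owning pointer alias, analogous in spirit to the alias named in the commit.
using FunctorPtr = std::unique_ptr<Functor>;

// Hypothetical request type that owns its functor.
class Req
{
    FunctorPtr op;

  public:
    // Taking the pointer by value transfers ownership to the request.
    // Assigning over an existing functor destroys the old one.
    void setOp(FunctorPtr p) { op = std::move(p); }

    // Non-owning access for the one place that needs to call the functor.
    Functor *getOp() const { return op.get(); }
};

// Replaces the functor once, then lets the request go out of scope.
// Returns the number of functors still alive afterwards.
int liveAfterReplace()
{
    {
        Req r;
        r.setOp(std::make_unique<AddOne>());
        r.setOp(std::make_unique<AddOne>()); // first functor destroyed here
        assert(Functor::live == 1);
    }
    return Functor::live;
}
```

With a raw pointer member, the second setOp call would overwrite the first pointer without deleting it, and liveAfterReplace would not return zero.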
14219:64ff727176ba |
27-Aug-2019 |
Ciro Santilli <ciro.santilli@arm.com> |
cpu: reset byte_enable across writeMem calls
data_write_req byteEnable which is used in ARM SVE partial writes was not being zeroed between writes.
As a result, non-SVE memory write instructions such as STP that followed SVE memory write instructions could still have the write mask active.
This could lead to wrong simulation behaviour, and to an assertion failure:
src/mem/packet.hh:1211: void Packet::writeData(uint8_t*) const: Assertion `req->getByteEnable().size() == getSize()' failed.
Change-Id: I74b5a82675e9923b0ffdf2c1dd9afb00c91cb204 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20448 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
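The failure mode described in the entry above can be sketched as follows. This is a simplified model under stated assumptions: WriteReq, Cpu, writeMem and maskIsConsistent are hypothetical names, an empty mask is taken to mean "all bytes enabled", and the buggy path is modelled as "only overwrite the mask when a new one is supplied".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified write request.
// An empty byteEnable vector means every byte is written.
struct WriteReq
{
    std::size_t size = 0;
    std::vector<bool> byteEnable;
};

// Mirrors the condition in the quoted assertion: a non-empty mask
// must have exactly one entry per byte of the access.
bool maskIsConsistent(const WriteReq &req)
{
    return req.byteEnable.empty() || req.byteEnable.size() == req.size;
}

// Hypothetical CPU model that reuses one request object across writes.
struct Cpu
{
    WriteReq req;
    bool resetMask; // true models the fixed behaviour

    explicit Cpu(bool reset) : resetMask(reset) {}

    const WriteReq &writeMem(std::size_t size, const std::vector<bool> &mask)
    {
        req.size = size;
        // Buggy variant: a write without a mask leaves the old mask in place.
        if (resetMask || !mask.empty())
            req.byteEnable = mask;
        return req;
    }
};
```

In the buggy variant a 4-byte masked write followed by a 16-byte unmasked write leaves a 4-entry mask on a 16-byte request, which is the size mismatch the assertion reports.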
14207:2e03de47c687 |
26-Jun-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Convert traffic gen to use new stats
Change-Id: Ife690a137c2dcfb6bcc8b22df996c84f0d231618 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19370 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
14198:9c2f67392409 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Make get(Data|Inst)Port return a Port and not a MasterPort.
No caller uses any of the MasterPort specific properties of these functions' return values, so we can instead return a reference to the base Port class. This makes it possible for the data and inst ports to be of any port type, not just gem5 style MasterPorts. This makes life simpler for, for example, systemc based CPUs which might have TLM ports.
It also makes it possible for any two CPUs which have compatible ports to be switched between, as long as the ports they use support being unbound. Unfortunately that does not include TLM or systemc ports which are bound permanently.
Change-Id: I98fce5a16d2ef1af051238e929dd96d57a4ac838 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20240 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
14197:26cca0c29be6 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu, mem: Add new getSendFunctional method to the base CPU.
This returns a sendFunctional delegate reference which can be used to send functional accesses directly, or more likely when constructing a PortProxy subclass. In those cases only the functional capabilities of those ports are needed so there's no reason to require a full port which supports all three protocols. Also, this removes the last remaining use of get(Data|Inst)Port which relies on those returning a port which supports the gem5 protocols, except the default implementations of this new function. If a CPU doesn't have traditional gem5 style ports, it can override this function to do whatever other behavior is necessary and return its real ports through get(Data|Inst)Port.
Change-Id: Ide4da81e3bc679662cd85902ba6bd537cce54a53 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20237 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
14195:c5efdb3319aa |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move the instruction port into o3's fetch stage.
That's where it's used, and that avoids having to pass it around using the top level getInstPort accessor.
Change-Id: I489a3f3239b3116292f3dcd78a3945fb468c6311 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20239 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
14194:967b9c450b04 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move O3's data port into the LSQ.
That's where it's used, and putting it there avoids having to pass around the port using the top level getDataPort function.
Change-Id: I0dea25d0c5f4bb3f58a6574a8f2b2d242784caf2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20238 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
14192:595a4358b844 |
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu, dev, mem: Use the new Port methods.
Use getPeer, takeOverFrom, and << to simplify the use of ports in some areas.
Change-Id: Idfbda27411b5d6b742f5e4927894302ea6d6a53d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20235 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14184:11ac1337c5e2 |
16-Aug-2019 |
Gabe Black <gabeblack@google.com> |
mem: Move ruby protocols into a directory called ruby_protocol.
Now that the gem5 protocols are split out, it would be nice to put them in their own protocol directory. It's also confusing to have files called *_protocol which are not in the protocol directory.
Change-Id: I7475ee111630050a2421816dfd290921baab9f71 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20230 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14176:c6c06f180cb9 |
23-Jul-2019 |
Ciro Santilli <ciro.santilli@arm.com> |
arch-arm, cpu: fix ARM ubsan build on GCC 7.4.0
In src/cpu/reg_class.hh, numPinnedWrites was unset because the constructors were not well factored out.
Change-Id: Ib2fc8d34a1adf5c48826d257a31dd24dfa64a08a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20048 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14147:638fe1150005 |
07-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Pull more arch specialization to the top of BaseCPU.py.
This simplifies the logic of the CPU python class, and brings us ever so slightly closer to factoring hardcoded ISA behavior out of non-ISA specific components.
Change-Id: I7e4511dd4e6076f5c214be5af2a0e33af0142563 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19889 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14145:066ba9040e5e |
08-Aug-2019 |
Gabe Black <gabeblack@google.com> |
x86: Move some fixed or dummy config information into X86LocalApic.py.
The X86 local APIC doesn't actually use the pio_addr set in the config and instead computes what address it will respond to based on the initial ID of the CPU it's attached to. gem5's BasicPioDevice, which the X86LocalApic class inherits from, does not provide a default value for that parameter and will complain if *something* isn't set. The value used, 0x2000000000000000, is a dummy value which is the base of the region of the physical address space set aside for messages to local APICs from the CPU and from other local APICs.
Also, the clock for the local APIC's timer is defined to be the bus clock. The assumption seems to be that this has a 16:1 ratio with the CPU clock, and I vaguely remember finding that that was more or less unofficially true, even if it isn't necessarily stringently defined to be that.
Since we were already just assuming that that ratio was correct and always setting up the local APIC's clock that way, we can do that in the X86LocalApic class definition and remove some special x86 specific setup that we'd otherwise need for the x86 version of the Interrupt class. If that's not correct, it can still be overridden somewhere else in the config.
Change-Id: I50e84f899f44b1191c2ad79d05803b44f07001f9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19968 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14136:67b0ce25b683 |
05-Aug-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu-o3: fix atomic instructions non-speculative
Fix a problem with O3 and AMO instructions. In the initial stages an AMO instruction is considered a type of non-speculative store. After the instruction has been committed, during the squash step, the acquire_release version of the AMO operation is considered speculative; that difference results in an assertion failure. This fix ensures that AMO instructions are always considered non-speculative, both during the early stages and during squash/removal of the instruction.
Change-Id: Ia0c5fbb9dc44a9991337b57eb759b1ed08e4149e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19815 Maintainer: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14135:a0affe46d00a |
26-Jul-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu-o3: added _amo_op parameter in o3 LSQ
Fix bug with AMO (or RMW) instructions where the amo_op variable is not being propagated to the LSQ request.
Change-Id: I60c59641d9b497051376f638e27f3c4cc361f615 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19814 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> |
14112:fc1d29d3f09a |
04-Feb-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu: Fix the type of the effective mem request size
A memory request size can be larger than 255 bytes (e.g. SVE with 2048-bit vector length) which could cause overflow in the 'uint8_t effSize' variable.
Change-Id: I77e0d02a49ea7f81cacfa5be7e4ae40434af3109 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19175 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
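The arithmetic behind the entry above is short enough to check directly. The helper names bytesFor and asNarrow are hypothetical; the sketch only shows that a 2048-bit access needs 256 bytes, one more than an 8-bit unsigned integer can represent, so the stored size wraps to zero.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed for an access of the given width in bits.
constexpr unsigned bytesFor(unsigned bits) { return bits / 8; }

// Storing a size in an 8-bit unsigned variable reduces it modulo 256.
constexpr std::uint8_t asNarrow(unsigned size)
{
    return static_cast<std::uint8_t>(size);
}

static_assert(bytesFor(2048) == 256, "2048 bits is 256 bytes");
static_assert(asNarrow(bytesFor(2048)) == 0, "256 wraps to 0 in 8 bits");
static_assert(asNarrow(255) == 255, "255 is the largest size that fits");
```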
14111:14c05f862590 |
15-Nov-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Fix too strict assert condition in writeback()
The assert() in the LSQ writeback() only allowed ReExec faults. However, a SplitRequest which completed the translation in PartialFault state (i.e. any but the very first cacheline translation failed) may end up here. The assert() condition is extended accordingly.
The patch also removes the superfluous/unused Complete/Squashed states from the LSQ request. (The completion of the request is recorded in the flags still.)
Change-Id: Ie575f4d3b4d5295585828ad8c7d3f4c7c1fe15d0 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19174 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
14105:969b4e972b07 |
27-Feb-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu: Add first-/non-faulting load support to Minor and O3
Some architectures allow masking faults of memory load instructions in some specific circumstances (e.g. first-faulting and non-faulting loads in Arm SVE). This patch adds support for such loads in the Minor and O3 CPU models.
Change-Id: I264a81a078f049127779aa834e89f0e693ba0bea Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14085:0075b0d29d55 |
28-Jun-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: isDrained renamed to isCpuDrained
CPU models inheriting from BaseCPU implement a draining checker called isDrained. This hides the base Drainable::isDrained method and might confuse the reader. This patch renames it to isCpuDrained in order to avoid any ambiguity.
Change-Id: Ie5221da6a4673432c2403996e42d451cae960bbf Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19468 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
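The name hiding mentioned in the entry above is a general C++ effect and can be sketched in a few lines. The types below are hypothetical stand-ins and make no claim about how the real Drainable or BaseCPU classes are declared; the return values are chosen only so the two functions can be told apart.

```cpp
#include <cassert>

// Hypothetical base class with a framework-level query.
struct Drainable
{
    bool isDrained() const { return false; }
};

// A derived class declaring the same name hides the base version:
// which function runs depends on the static type used for the call.
struct HidingCpu : Drainable
{
    bool isDrained() const { return true; }
};

// With a distinct name both queries stay visible and unambiguous.
struct RenamedCpu : Drainable
{
    bool isCpuDrained() const { return true; }
};
```

Calling isDrained() on a HidingCpu object and on the same object viewed as a Drainable gives different answers, which is the ambiguity the rename removes.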
14083:057fe59ed45a |
12-Jul-2019 |
Pouya Fotouhi <Pouya.Fotouhi@amd.com> |
cpu-o3: Set packet data type for IPR read
This change sets the packet data type to static for IPR reads. The problem was caused by change (e13d6dc9c0d7a4ae0215f1ee6793eb32570c5169), and has been reported a few times on the mailing list.
Change-Id: I0f02c20a16824e220df876e9e552bbc1c9636f95 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19449 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14081:f99ed78e5263 |
12-Jun-2019 |
Javier Bueno Hedo <javier.bueno@metempsy.com> |
cpu: Added the Multiperspective Perceptron Predictor with TAGE (8KB and 64KB)
Described by the following article: Jiménez, D. "Multiperspective perceptron predictor with TAGE." Championship Branch Prediction (CBP-5) (2016).
Change-Id: Ica3c121a4c94657d9015573085040e8a1984b069 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19188 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> |
14080:4472576445e7 |
15-Feb-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Reset fault status for mem access in pushRequest
Always reset the fault status before translation is initiated in pushRequest() in the LSQ. This avoids a problem when a strictly ordered load needs to be re-executed multiple times: if the translation is delayed at one of those attempts, then the internal panicFault (from the previous execution attempt) can get fired at commit.
Change-Id: I0c22b2f7afd6e2cb00bc359a4a01042efd2d01d2 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19388 Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14055:7c0185348b9b |
29-Jan-2019 |
Tiago Muck <tiago.muck@arm.com> |
cpu: Additional TrafficGen stats
Additional stats to keep track of read/write latencies and throughput.
Change-Id: I7684cd33cf68fffdef4ca9c3a6db360a0f531c18 Signed-off-by: Tiago Muck <tiago.muck@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18418 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14054:01ad1bff9630 |
28-Jan-2019 |
Tiago Muck <tiago.muck@arm.com> |
cpu: Limit TrafficGen outstanding reqs
Parameter to limit the number of requests waiting for a response.
Change-Id: I6cf9e8782a06ae978fb66f7c4278f4c9e9980c79 Signed-off-by: Tiago Muck <tiago.muck@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18417 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14053:9267d4f16524 |
28-Jan-2019 |
Tiago Muck <tiago.muck@arm.com> |
cpu: TrafficGen as BaseCPU
TrafficGen has additional attributes to behave like a BaseCPU. Python scripts that expect sim. objects derived from BaseCPU can now be used with TrafficGen without additional modifications.
Change-Id: Iee848b2ba0ac1851c487b1003da9bd96253d291a Signed-off-by: Tiago Muck <tiago.muck@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18416 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14034:937e704c6807 |
13-Feb-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added the Multiperspective Perceptron Predictor (8KB and 64KB)
Described by the following article: Jiménez, D. "Multiperspective perceptron predictor." Championship Branch Prediction (CBP-5) (2016).
Change-Id: Iaa68ead7696e0b6ba05b4417d0322e8053e10d30 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/15495 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14030:a58e14bf581c |
08-Feb-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Increase LSQ buffer sizes to match max vector length
Change-Id: I5890c7cfa147125ce3389001f85d56d4b5a9911d Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13525 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
14027:91889263c6d1 |
24-Jan-2019 |
Tiago Muck <tiago.muck@arm.com> |
cpu: Fix rescheduling of progress check events
noRequestEvent needs to be rescheduled on recvRetry, otherwise the timeout may be triggered even though packets are eventually being sent. noResponseEvent scheduling is also fixed: this timeout should not be active when we are not expecting a response.
Change-Id: If9edb75b5b803caf9f99bf41ea3948b15a3f3d71 Signed-off-by: Tiago Muck <tiago.muck@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18793 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14025:3a133070aa2e |
26-Feb-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu-o3: Add support for pinned writes
This patch adds support for pinning registers for a certain number of consecutive writes. This is only relevant for timing CPU models (functional-only models are unaffected), and it is primarily needed to provide a realistic execution model for micro-coded operations whose microops can write to non-overlapping portions of a destination register, e.g. vector gather loads. In those cases, this mechanism can disable renaming for a sequence of consecutive writes, thus making the resulting execution more efficient: allocating a new physical register for each microop would introduce a read-modify-write chain of dependencies, while with these modifications the microops can write back in parallel.
Please note that this new feature is only leveraged by O3CPU for the time being.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I07eb5fdbd1fa0b748c9bdc1174d9f330fda34f81 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13520 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14024:abe47b13653d |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, sim: Merge getMemProxy and getVirtProxy.
These two functions were performing the same function but had two different names for historical reasons. This change merges them together, keeping the getVirtProxy name to be consistent with the getPhysProxy method used to get a non-translating proxy port.
Change-Id: Idd83c6b899f9343795075b030ccbc723a79e52a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18581 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14023:40eb7ed47e61 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Store the translating proxy with the same pointer in SE or FS mode.
Only one is active at a time, so they can share the same pointer.
Change-Id: Ie4ae1f0ffbf9448f6730f9c7d072bc85d6d423da Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18580 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> |
14022:a7cdc33dab35 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
cpu, sim: Return PortProxy &s from all the proxy accessors.
This is a step towards merging the accessors for SE and FS modes.
Change-Id: I76818ab88b97097ac363e243be9cc1911b283090 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18579 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
14016:265e8272c728 |
25-May-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
cpu: Added correct return type for ROB::countInsts
- return size_t (unsigned) according to the .size() return type
- fixed typo in doc (source of warning with some compilers)
Change-Id: I48ee2e317cf41011a6fcb5ca45aef67e75329bfa Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18948 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14003:2b48980363fe |
23-May-2019 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu: Remove assert causing issues with x86 Linux boot
Change-Id: I5e0b189ced0dd59ac6dbbb2c498c068e132b9b93 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18910 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13981:577196ddd040 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, dev, mem, sim: Remove #if 0-ed out code.
This code will be preserved through version control, but otherwise creates clutter and will rot in place since it's never compiled.
Change-Id: Id265f6deac445116843956ea5cf1210d8127274e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18608 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13966:3189413c5894 |
01-Mar-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
Revert "cpu: fix how a thread starts up in MinorCPU"
This reverts commit 02dafc5498750d9734ba8f2a1608a846f90b71d1. The commit was part of a patchset which broke MinorCPU regressions (switcheroo)
Change-Id: I0a8098fc71abe5838014e587dbe372b258d8aa9f Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18604 Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13965:347e04956cfe |
01-Mar-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
Revert "cpu: stop scheduling suspended threads in MinorCPU"
This reverts commit 6a6668bbc4b038b98eb3ee64ffb034719316afd9. The commit was part of a patchset which broke MinorCPU regressions (switcheroo)
Change-Id: I3c16a6478ba44b9d27cdd3d64a710a356999df05 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18603 Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13964:e4dbd156a640 |
01-Mar-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
Revert "cpu: fix branching when thread is suspended in MinorCPU"
This reverts commit e437086341712f1435db655b3527ea29b3311f4e. The commit was part of a patchset which broke MinorCPU regressions (switcheroo)
Change-Id: Ib8482034c2402008ccfa552325a8eb31e731b619 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18602 Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13960:e1ab93677110 |
10-Apr-2019 |
Daniel <odanrc@yahoo.com.br> |
base: Move SatCounter to base directory
Saturating counters are used by many objects, not only the cpu predictors. Therefore, move the class to the base folder so that it can be more easily used.
Change-Id: I26f799324bdd8720ab8834c72a2002149cee777c Signed-off-by: Daniel <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17993 Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13959:ea907b02c800 |
05-Apr-2019 |
Daniel <odanrc@yahoo.com.br> |
cpu: Revamp saturating counters
Revamp the SatCounter class, improving comments, implementing increment, decrement and read operators to solve an old todo, and adding missing error checking.
Change-Id: Ia057c423c90652ebd966b6b91a3471b17800f933 Signed-off-by: Daniel <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17992 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
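For readers unfamiliar with the data structure, a saturating counter with increment, decrement and read operators might look like the sketch below. SatCounterSketch is a hypothetical class written for illustration; it is not the gem5 SatCounter and its interface is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal n-bit saturating counter (1 to 8 bits).
class SatCounterSketch
{
    std::uint8_t maxVal;
    std::uint8_t counter;

  public:
    explicit SatCounterSketch(unsigned bits, std::uint8_t initial = 0)
        : maxVal(static_cast<std::uint8_t>((1u << bits) - 1)),
          counter(initial)
    {
        // Basic error checking on the constructor arguments.
        assert(bits >= 1 && bits <= 8);
        assert(initial <= maxVal);
    }

    // Increment, stopping at the maximum instead of wrapping around.
    SatCounterSketch &operator++()
    {
        if (counter < maxVal)
            ++counter;
        return *this;
    }

    // Decrement, stopping at zero instead of wrapping around.
    SatCounterSketch &operator--()
    {
        if (counter > 0)
            --counter;
        return *this;
    }

    // Read the current value.
    operator std::uint8_t() const { return counter; }
};
```

A 2-bit counter built this way stays at 3 no matter how many further increments it receives, which is the property branch predictors rely on.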
13957:25e9c77a8a99 |
06-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Make the indirect predictor into a SimObject
Change-Id: Ice6549773def7d3e944fae450d4a079bc351e2ba Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/15319 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13954:2f400a5f2627 |
07-Jul-2017 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu,mem: Add support for partial loads/stores and wide mem. accesses
This changeset adds support for partial (or masked) loads/stores, i.e. loads/stores that can disable accesses to individual bytes within the target address range. In addition, this changeset extends the code to crack memory accesses across most CPU models (TimingSimpleCPU still TBD), so that arbitrarily wide memory accesses are supported. These changes are required for supporting ISAs with wide vectors.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com> - Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
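A partial (masked) store as described in the entry above can be modelled with a per-byte enable vector. The function maskedWrite and its conventions (an empty mask means a full write) are assumptions made for this sketch, not the simulator's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy only the enabled bytes of src into dst.
// An empty byteEnable vector is treated as "write every byte".
void maskedWrite(std::vector<std::uint8_t> &dst,
                 const std::vector<std::uint8_t> &src,
                 const std::vector<bool> &byteEnable)
{
    assert(dst.size() == src.size());
    assert(byteEnable.empty() || byteEnable.size() == src.size());

    for (std::size_t i = 0; i < src.size(); ++i) {
        if (byteEnable.empty() || byteEnable[i])
            dst[i] = src[i];
    }
}
```

Bytes whose enable bit is false keep their previous contents, so a store can update a non-contiguous subset of the target range in a single access.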
13953:43ae8a30ec1f |
23-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu: Add a memory access predicate
This changeset introduces a new predicate to guard memory accesses. The most immediate use for this is to allow proper handling of predicated-false vector contiguous loads and predicated-false micro-ops of vector gather loads (added in separate changesets).
Change-Id: Ice6894fe150faec2f2f7ab796a00c99ac843810a Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17991 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
13910:d5deee7b4279 |
28-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: alpha: Delete all occurrences of the simPalCheck function.
This is now handled within the ISA description.
Change-Id: Ie409bb46d102e59d4eb41408d9196fe235626d32 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18434 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13908:6ab98c626b06 |
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Remove hwrei from the generic interfaces.
This mechanism is specific to Alpha and doesn't belong sprinkled around the CPU's generic mechanisms.
Change-Id: I87904d1a08df2b03eb770205e2c4b94db25201a1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18432 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13905:5cf30883255c |
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Track kernel stats using the base ISA agnostic type.
Then cast to the ISA specific type when necessary. This removes (mostly) an ISA specific aspect to some of the interfaces. The ISA specific version of the kernel stats still needs to be constructed and stored in a few places which means that kernel_stats.hh still needs to be a switching arch header, for instance.
In the future, I'd like to make the kernel its own object like the Process objects in SE mode, and then it would be able to instantiate and maintain its own stats.
Change-Id: I8309d49019124f6bea1482aaea5b5b34e8c97433 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18429 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13900:d4bcfecd871e |
28-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Get rid of the (read|set)RegOtherThread methods.
These are implemented by MIPS internally now.
Change-Id: If7465e1666e51e1314968efb56a5a814e62ee2d1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18436 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13898:bef1a83c6504 |
27-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Include debug flags regardless of whether the ISA is null.
Whether debug flags are available has no interaction with what the ISA is.
Change-Id: I71d9204f948618831796e6c7a4c16bbebfb1a4fb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18428 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13892:0182a0601f66 |
22-Apr-2019 |
Gabe Black <gabeblack@google.com> |
mem: Minimize the use of MemObject.
MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
13875:656d633621fa |
23-Apr-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
cpu,mem: missing override specifier
Change-Id: I731d3ef021596450ac307461f215760a148bb28a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18348 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13865:cca49fc49c57 |
13-Apr-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Eliminate the ProxyThreadContext class.
Replace it with direct inheritance from the ThreadContext class in the SimpleThread class which was the only place it was used.
Also take the opportunity to use some specialized types instead of ints, etc., add some consts, and fix some style issues.
Change-Id: I5d2cfa87b20dc43615e33e6755c9d016564e9c0e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18048 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13843:2d8dfe55d22a |
29-Mar-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: O3 switchFreeList checking VecElems instead of FloatRegs
Vector elements should be checked instead of floats since those are the ones mapped to the vector registers.
Change-Id: I36088ab90e63720d846fcf5b43360da105b6c736 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17850 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13836:dc081ef41ab8 |
28-Feb-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Correctly account for executed instructions in simple cpus
Change-Id: I53f34b2d9db6e4d2e03dde42a970764bb2a5e701 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17730 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13831:4fba790d88be |
06-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Removed inconsistency in O3* debug msgs
Made the DEBUG message format consistent to allow easier parsing. Fixed the sn/tid parameter types. Removed some annoying newlines.
Change-Id: I4761c49fc12b874a7d8b46779475b606865cad4b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17248 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13830:b5d6aa6c0e99 |
25-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
arch-mips: added missing override specifier (o3)
Change-Id: Ic538825a2964fd62def672b933a83067a15bd12a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17648 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13818:f0126488ef9e |
26-Mar-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added a probe to notify the address of retired instructions
A probe is added to notify the address of each retired instruction.
Change-Id: Iefc1b09d74b3aa0aa5773b17ba637bf51f5a59c9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17632 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13810:f50e3b82df73 |
01-Mar-2019 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed the indirect branch predictor GHR handling
The internal indirect predictor global history was not being updated properly, resulting in higher than expected miss rates
Also added a parameter to set the size of the indirect predictor GHR
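To illustrate what a configurable-size global history register involves, here is a minimal sketch of a shift-register update. The class name, the `bits` parameter, and the one-bit-per-branch encoding are assumptions made for this example, not gem5's actual implementation.

```python
class GlobalHistory:
    """Fixed-width global history register (illustrative only)."""

    def __init__(self, bits=13):
        self.bits = bits                  # hypothetical size parameter
        self.mask = (1 << bits) - 1
        self.value = 0

    def update(self, taken):
        # Shift in the newest outcome and drop the oldest bit.
        self.value = ((self.value << 1) | int(bool(taken))) & self.mask
        return self.value


ghr = GlobalHistory(bits=4)
for outcome in (True, False, True, True, True):
    ghr.update(outcome)
```

If the update is skipped for some branches, the register no longer reflects the real path, which is one way a predictor can end up with a higher miss rate than expected.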
Change-Id: Ibc797816974cba6719da65122801e8919559a003 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reported-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/16928 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andrea Mondelli <Andrea.Mondelli@ucf.edu> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13799:15badf7874ee |
19-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: missing override specifier
Missing specifier of overridden virtual function declared in sim_object.hh
Removed redundant "virtual" keyword
Change-Id: I42aa3349b537c9e62607bce20cf1b3aabdb99bf2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17468 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13787:f6bba838e2ff |
21-Mar-2019 |
Ryan Gambord <gambordr@oregonstate.edu> |
cpu-kvm: Added informative error message
PerfKvmCounter::attach fails if the user doesn't have privileges to make the perf_event_open syscall. This is the default privilege setting since kernel 4.6. I've seen some users in the mailing list resort to running as root; changing the perf_event_paranoid setting is an alternative.
Change-Id: I2bc6f76abb6e97bf34b408a611f64b1910f50a43 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17508 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13784:1941dc118243 |
07-Mar-2019 |
Gabe Black <gabeblack@google.com> |
arch, cpu, dev, gpu, mem, sim, python: start using getPort.
Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary.
Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13762:36d5a1d9f5e6 |
01-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
cpu: Refactor of Physical Register implementation
The implementation of the PhyRegId class is shared between multiple cpu models. The o3/misc.hh should only be included in o3 models.
This patch removes the dependencies between different model implementations, making it possible to add new O3-like CPU models.
Change-Id: Ibb812517043befe75c48fab3ce9605a0d272870b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/16908 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13759:9941fca869a9 |
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch-arm,cpu: Add initial support for Arm SVE
This changeset adds initial support for the Arm Scalable Vector Extension (SVE) by implementing:
- support for most data-processing instructions (no loads/stores yet);
- basic system-level support.
Additional authors:
- Javier Setoain <javier.setoain@arm.com>
- Gabor Dozsa <gabor.dozsa@arm.com>
- Giacomo Travaglini <giacomo.travaglini@arm.com>
Thanks to Pau Cabre for his contribution of bugfixes.
Change-Id: I1808b5ff55b401777eeb9b99c9a1129e0d527709 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13515 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13741:d994984b842a |
22-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
mem-cache: alias to mem::getMasterPort in TLB class
TLB::getMasterPort() is used to obtain the PageWalkMasterPort if present and hides the BaseTLB::getMasterPort().
The TLB::getMasterPort() is renamed according to the expected behavior.
Change-Id: If4f61189094a706d59805cd10f4f814e5830eda8 Reviewed-on: https://gem5-review.googlesource.com/c/16648 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13734:a57152849a55 |
11-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Segmentation Fault during O3PipeView execution
During the O3PipeView execution, a potentially invalid iterator is used to update the instruction storeTick field.
If the store_idx iterator is the first() of the StoreQueue, the corresponding instruction is removed from the queue, leaving the iterator invalid and not usable in the TRACING_ON block.
This patch uses the store_inst variable to access (and update) the instruction tick, instead of the (potentially) invalid iterator.
Change-Id: I671052ef282b9048e5239da8629b89e8afa86bf0 Reviewed-on: https://gem5-review.googlesource.com/c/16322 Maintainer: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13733:e0829ba19545 |
26-Feb-2019 |
Srikant Bharadwaj <srikant.bharadwaj@amd.com> |
cpu: Fix indirect branch history updates
Recent changes to the indirect branch predictor interface access non-existent buffers even when the indirect predictor is not in use.
Change-Id: I0df9ac4d5f6f3cb63e4d1bd36949c27f7611eef6 Reviewed-on: https://gem5-review.googlesource.com/c/16668 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13711:e796a82c5154 |
27-Jan-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Fix param -> int conversion issues
Python 3 doesn't convert params to integers automatically in range(). Add __index__ to CheckedInt to enable implicit conversions again. Add explicit conversions where necessary.
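A minimal sketch of the mechanism described above. The `CheckedInt` class here is a simplified stand-in written for this example, not gem5's real parameter class; only the use of `__index__` is taken from the description.

```python
class CheckedInt:
    """Simplified stand-in for a parameter type wrapping an integer."""

    def __init__(self, value):
        self.value = int(value)

    def __index__(self):
        # Lets Python 3 use the object wherever a true integer is required.
        return self.value


num_cpus = CheckedInt(3)

# Implicit conversion: range() calls __index__ on its argument.
cpu_ids = list(range(num_cpus))

# Explicit conversion, for call sites that cannot rely on __index__.
explicit_ids = list(range(num_cpus.value))
```

Without `__index__`, the first `range()` call would raise a `TypeError` under Python 3.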
Change-Id: I2de6c9906d3bb7616f12ada6728b9e4b1928511c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/16000 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13710:5ba1d8066ef0 |
25-Jun-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Add cache read ports limit to LSQ
This change introduces cache read ports to limit the number of per-cycle loads. Previously only the number of per-cycle stores could be limited.
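The effect of a per-cycle port limit can be sketched as follows. The function and parameter names are hypothetical and the model ignores everything except the port count; it is not how the O3 LSQ is written.

```python
def schedule_loads(pending_loads, cache_read_ports):
    """Return a list of cycles, each holding the loads issued in that cycle."""
    if cache_read_ports < 1:
        raise ValueError("need at least one cache read port")
    cycles = []
    # At most cache_read_ports loads may access the cache per cycle.
    for start in range(0, len(pending_loads), cache_read_ports):
        cycles.append(pending_loads[start:start + cache_read_ports])
    return cycles


issued = schedule_loads(["ld0", "ld1", "ld2", "ld3", "ld4"], cache_read_ports=2)
```

With two ports, five ready loads need three cycles instead of one.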
Change-Id: I39bbd984056c5a696725ee2db462a55b2079e2d4 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13517 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13709:dd6b7ac5801f |
26-Jan-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Make iterator handling Python 3 compatible
Many functions that used to return lists (e.g., dict.items()) now return iterators and their iterator counterparts (e.g., dict.iteritems()) have been removed. Switch calls to the Python 2.7 iterator methods to use the Python 3 equivalent and add explicit list conversions where necessary.
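The kinds of changes this involves can be sketched as below. The dictionary contents are made up for the example.

```python
stats = {"ipc": 1.2, "misses": 40, "hits": 960}

# Python 3: items() returns a view; iteritems() no longer exists.
total_accesses = sum(v for k, v in stats.items() if k != "ipc")

# A view cannot be indexed, so convert explicitly when a list is needed.
first_key = list(stats.keys())[0]

# Copy the keys before mutating the dict while iterating over it.
for key in list(stats):
    if key.startswith("m"):
        del stats[key]
```

The explicit `list(...)` calls are only needed where list behaviour (indexing, a stable snapshot) is actually relied on.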
Change-Id: I0c18114955af8f4932d81fb689a0adb939dafaba Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15992 Reviewed-by: Juha Jäykkä <juha.jaykka@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13693:85fa3a41014b |
14-Feb-2019 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu: Add ISA* getter in Thread interface
This patch adds an ISA* getter to the TC interface.
Change-Id: Ib8ddc5d8fdd44e782f50a2ad15878a6bcf931e58 Reviewed-on: https://gem5-review.googlesource.com/c/16462 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13688:5bb3bf2f2559 |
15-Feb-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix fast build broken due to unused variable
This fixes fast build for commit 25dc765889d948693995cfa622f001aa94b5364b (fast build is stripping out assertions)
Change-Id: I9536ad58a3d85990b16a1f8c2515f6bf5d3acf71 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/16463 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13685:bb3377c81303 |
29-Jan-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added 8KB and 64KB TAGE-SC-L branch predictor
The original paper of the branch predictor can be found here: http://www.jilp.org/cbp2016/paper/AndreSeznecLimited.pdf
Change-Id: I684863752407685adaacedebb699205c3559c528 Reviewed-on: https://gem5-review.googlesource.com/c/14855 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13665:9c7fe3811b88 |
25-Jan-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Don't assume SimObjects live in the global namespace
The importer in Python 3 doesn't like the way we import SimObjects from the global namespace. Convert the existing SimObject declarations to import from m5.objects. As a side-effect, this makes these files consistent with configuration files.
Change-Id: I11153502b430822130722839e1fa767b82a027aa Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15981 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> |
13654:dc3878f03a0c |
06-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Proposal for changing the indirect branch predictor interface
Now the indirect branch predictor handles its own GHR instead of getting the one from the direction predictor.
Also, now the commit method of the indirect predictor is called for every pending branch on an update, as the indirect predictors may want to update their internal structures/histories with the information of each branch.
Change-Id: I7053fbea42a53960a3bc1ba32912cc99c160511e Reviewed-on: https://gem5-review.googlesource.com/c/15318 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13652:45d94ac03a27 |
22-Jan-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: support atomic memory request type with AtomicOpFunctor
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU, MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to the memory system.
An atomic memory instruction is treated as a special store instruction in all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is simply sent to the L1 dcache.
In MinorCPU, an AMO request bypasses the store buffer and waits for any conflicting store request(s) currently in the store buffer to retire before the AMO request is sent to the cache. AMO requests are not buffered in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it is delivered to the cache only after all previous stores are issued to the cache. Data forwarding between an outstanding AMO in the store buffer and a subsequent load is not allowed since the AMO request does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert enough memory fences as micro-ops around an atomic instruction to enforce a correct order of memory instructions with respect to its memory consistency model. Without extra memory fences, this implementation can allow AMOs and other memory instructions that do not conflict (i.e., do not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within a cache line boundary since the cache for now is not able to execute an operation on two different cache lines in one single step. Therefore, ISAs like x86 that require multi-cache-line atomic instructions need to either use a pair of locking load and unlocking store or change the cache implementation to guarantee the atomicity of an atomic instruction.
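The idea of a request carrying a functor that the cache applies in place, together with the single-cache-line assumption, can be sketched as follows. `AtomicAdd`, `execute_amo`, the 4-byte access size, and the 64-byte line size are all hypothetical choices for this example and do not reflect gem5's AtomicOpFunctor interface.

```python
class AtomicAdd:
    """Hypothetical functor: adds an operand to the value held in memory."""

    def __init__(self, operand):
        self.operand = operand

    def __call__(self, old_value):
        return (old_value + self.operand) & 0xFFFFFFFF


def execute_amo(memory, addr, functor, line_size=64):
    """Apply functor to the 4-byte word at addr and return the old value."""
    size = 4
    # Mirror the stated assumption: the access must stay within one line.
    if addr // line_size != (addr + size - 1) // line_size:
        raise ValueError("atomic access crosses a cache line boundary")
    old = int.from_bytes(memory[addr:addr + size], "little")
    memory[addr:addr + size] = functor(old).to_bytes(size, "little")
    return old


mem = bytearray(128)
mem[8:12] = (41).to_bytes(4, "little")
previous = execute_amo(mem, 8, AtomicAdd(1))
```

Because the new value only exists after the functor runs at the memory side, an earlier stage has nothing valid to forward to a later load, which matches the forwarding restriction described for DerivO3CPU.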
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a Reviewed-on: https://gem5-review.googlesource.com/c/8188 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13647:7a9b7c0373b1 |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fix how branching is handled when a thread is suspended in MinorCPU
When a thread is suspended, all instructions after the suspension need to be discarded since the thread will take a different execution stream when it wakes up.
To do that, in MinorCPU, whenever a thread gets suspended, we change the current execution stream by updating the current branch with BranchData::SuspendThread reason.
Change-Id: I7cdcda22c1cf6e8ac8db8800b7d9ec052433fdf3 Reviewed-on: https://gem5-review.googlesource.com/c/9626 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@gmail.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13646:626670cc6da4 |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: stop scheduling suspended threads in all stages of MinorCPU
This patch makes suspended threads non-schedulable in Fetch1, Fetch2, Decode and Execute stages in MinorCPU.
Change-Id: Ie79857e13b7b782d9c58c32310993a132b609cf9 Reviewed-on: https://gem5-review.googlesource.com/c/9625 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@gmail.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13644:6180ee72e061 |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
sim,cpu: make exit_group halt all threads in a group
When a thread calls exit_group, in addition to halting the thread itself, it needs to halt all other threads in its group (i.e., threads sharing the same thread group ID). This patch enables threads to do that.
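A toy model of the behaviour, assuming a flat list of thread records. The `Thread` dataclass and its field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    tgid: int            # thread group ID shared by threads of one process
    halted: bool = False


def exit_group(threads, caller):
    """Halt the caller and every other thread sharing its thread group ID."""
    for thread in threads:
        if thread.tgid == caller.tgid:
            thread.halted = True


threads = [Thread(100, 100), Thread(101, 100), Thread(200, 200)]
exit_group(threads, threads[1])
```

Threads in other groups (here `tid` 200) keep running.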
Change-Id: Ib2e158fb27cf98843f177a64a2d643b1bbc94d03 Reviewed-on: https://gem5-review.googlesource.com/c/9623 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13641:648f3106ebdf |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fixed how O3 CPU executes an exit system call
When a thread executed an exit syscall in SE mode, the thread context was removed immediately in the same cycle, which left inflight squash operations and trap event incomplete. The problem happened when a new thread was assigned to the CPU later. The new thread started with some incomplete transactions of the previous thread (e.g., squashing). This problem could cause incorrect execution flow for the new thread (i.e., pc was not reset properly at the exit point), deadlock (i.e., some stage-to-stage signals were not reset) and incorrect rename map between logical and physical registers.
This patch adds a new state called 'Halting' to the thread context and defers removing thread context from a CPU until a trap event initiated by an exit syscall execution is processed. This patch also makes sure that the removal of a thread context happens after all inflight transactions of the to-be-removed thread in the pipeline complete.
Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8 Reviewed-on: https://gem5-review.googlesource.com/c/8184 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13632:483aaa00c69c |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: fix how a thread starts up in MinorCPU
When a thread is activated by another thread calling a clone system call, the child thread's context is initialized in the middle of the clone system call and before the context is fully initialized. Therefore, the child thread starts fetching from an uninitialized PC, which could lead to a page fault.
This patch adds a pipeline wakeup event that is scheduled later in the cycle when the thread is activated. This event ensures that the first fetch only happens after the thread context is fully initialized (e.g., in case of clone syscall, it is when the parent thread copies its context over to the child thread).
When a thread first starts or wakes up, input queue to the Fetch2 stage needs to be drained since the execution flow is likely to change and previously fetched instructions in the queue may no longer be in the correct flow. This patch dumps/drains all inputs in the input queue of a thread context in the Fetch2 stage when the associated thread wakes up.
Change-Id: Iad970638e435858b7289cd471158cc0afdbbb0e5 Reviewed-on: https://gem5-review.googlesource.com/c/8182 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Brandon Potter <Brandon.Potter@amd.com> |
13628:332f730a1855 |
04-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: added missing override specifier
Added missing specifier for various virtual functions.
Change-Id: I4783e92d78789a9ae182fad79aadceafb00b2458 Reviewed-on: https://gem5-review.googlesource.com/c/16103 Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13627:ec1395943cd2 |
28-Jan-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Made the Loop Predictor a SimObject
The Loop Predictor implementation is now a SimObject so that other branch predictors can easily use it (including LTAGE, which is now using it). It has also been updated with the latest available loop predictor implementation from Andre Seznec:
http://www.irisa.fr/alf/downloads/seznec/TAGE-GSC-IMLI.tar
Change-Id: I60ad079a2c49b00a1f84d5cfd3611631883a4b57 Reviewed-on: https://gem5-review.googlesource.com/c/15775 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13626:d6a6358aa6db |
05-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Made TAGE a SimObject that can be used by other predictors
The TAGE implementation is now a SimObject so that other branch predictors can easily use it. It has also been updated with the latest available TAGE implementation from Andre Seznec:
http://www.irisa.fr/alf/downloads/seznec/TAGE-GSC-IMLI.tar
Change-Id: I2251b8b2d7f94124f9955f52b917dc3b064f090e Reviewed-on: https://gem5-review.googlesource.com/c/15317 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13622:ba31c2a23eca |
21-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu, arch: Replace the CCReg type with RegVal.
Most architectures weren't using the CCReg type, and in x86 and arm it was already a uint64_t.
Change-Id: I0b3d5e690e6b31db6f2627f449c89bde0f6750a6 Reviewed-on: https://gem5-review.googlesource.com/c/14515 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13611:c8b7847b4171 |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Rename *FloatRegBits* to *FloatReg*.
Now that there's no plain FloatReg, there's no reason to distinguish FloatRegBits with a special suffix since it's the only way to read or write FP registers.
Change-Id: I3a60168c1d4302aed55223ea8e37b421f21efded Reviewed-on: https://gem5-review.googlesource.com/c/14460 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13610:5d5404ac6288 |
16-Oct-2018 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
arch,cpu: Add vector predicate registers
Latest-gen. vector/SIMD extensions, including the Arm Scalable Vector Extension (SVE), introduce the notion of a predicate register file. This changeset adds this feature across architectures and CPU models.
Change-Id: Iebcadbad89c0a582ff8b1b70de353305db603946 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13715 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13601:f5c84915eb7f |
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu, arch, arch-arm: Wire unused VecElem code in the O3 model
VecElem code had been introduced in order to simulate change of renaming for vector registers. Most of the work is happening on the rename_map switchRenameMode. Change of renaming can happen after a squash in the pipeline. This patch is also changing the interface to the ISA part so that a PCState is used instead of ISA in order to check if rename mode has changed.
Change-Id: I8af795d771b958e0a0d459abfeceff5f16b4b5d4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15601 |
13600:f39b1083ac84 |
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: O3 rename using the flatIndex instead of index
This patch is replacing the RegId::index with RegId::flatIndex so that it provides a valid register number when used by a VecElem register.
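Why a flattened index is needed for vector elements can be shown with a small sketch. The formula and the elements-per-register constant below are assumptions chosen for illustration; the actual RegId::flatIndex computation may differ.

```python
NUM_ELEMS_PER_VEC_REG = 4    # hypothetical value for the example


def flat_index(reg_idx, elem_idx, elems_per_reg=NUM_ELEMS_PER_VEC_REG):
    """Map (vector register, element) to one unique element number."""
    if not 0 <= elem_idx < elems_per_reg:
        raise ValueError("element index out of range")
    return reg_idx * elems_per_reg + elem_idx


# Two elements of the same vector register share the register index (2)
# but must be renamed independently, so they need distinct numbers.
first = flat_index(2, 0)
last = flat_index(2, 3)
```

Using the plain register index would make both elements collide on the same rename-map entry.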
Change-Id: I5b000abb9457cd325c2a3021e772a75ea33d8a4c Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15600 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
13598:39220222740c |
04-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix VecElemClass bugs in cpu models
This patch is:
* Adding a missing VecElemClass entry
* Fixing assertion in rename map which was checking the number of free vector registers rather than free vector element registers
* Fixing assertion in read/setVecElemOperand APIs.
* Using the right register index in SimpleThread
* Using VecElem instead of VecReg on O3 readArchVecElem
Change-Id: I265320dcbe35eb47075991301dfc99333c5190c4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15598 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13597:6b0f8e9cdeb5 |
10-Jan-2019 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Add VecElem entries in MinorCPU Scoreboard
This patch is:
* Increasing the number of bits in the Scoreboard so that it is keeping track of VecElemClass dependencies.
* Fixing VecElemClass entry in the scoreboard table so that it correctly uses flatIndex rather than index.
Change-Id: Ie4877e5fe410b1437447adebbe289602a443f7c0 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15597 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
13590:d7e018859709 |
13-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu-o3: O3 LSQ Generalisation
This patch does a large modification of the LSQ in the O3 model. The main goal of the patch is to remove the 'an operation can be served with one or two memory requests' assumption that is present in the LSQ and the instruction with the req, reqLow, reqHigh triplet, and generalising it to operations that can be addressed with one request, and operations that require many requests, embodied in the SingleDataRequest and the SplitDataRequest.
This modification has been done mimicking the minor model to an extent, shifting the responsibilities of dealing with VtoP translation and tracking the status and resources from the DynInst to the LSQ via the LSQRequest. The LSQRequest models the information concerning the operation, handles the creation of fragments for translation and request as well as assembling/splitting the data accordingly.
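The move from "one or two requests" to "one or many" can be pictured with a sketch that splits an access into cache-line-sized fragments. The function name, the 64-byte line size, and the tuple representation are assumptions for this example, not the LSQRequest code.

```python
def split_request(addr, size, line_size=64):
    """Split [addr, addr + size) into fragments that each stay in one line."""
    if size <= 0:
        raise ValueError("size must be positive")
    fragments = []
    end = addr + size
    while addr < end:
        line_end = (addr // line_size + 1) * line_size
        frag_end = min(end, line_end)
        fragments.append((addr, frag_end - addr))   # (start, length)
        addr = frag_end
    return fragments


single = split_request(0x10, 8)      # fits in one line: one fragment
split = split_request(60, 136)       # wide vector access: many fragments
```

An access yielding one fragment corresponds to the single-request case, and anything longer to the split case with an arbitrary number of fragments.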
With these modifications, the implementation of vector ISAs, particularly on the memory side, becomes richer, as the new model permits a dissociation of ISA characteristics such as vector length from the microarchitectural characteristics that govern how contiguous loads execute, allowing exploration of different LSQ to DL1 bus widths to understand the tradeoffs in complexity and performance.
Part of the complexity introduced stems from the fact that gem5 keeps a large amount of metadata, in particular for memory operations. When an instruction is squashed while some operation such as a TLB lookup or cache access is ongoing, and the relevant structure later tells the LSQ that the operation is over, the LSQ tries to access data that should have died when the instruction was squashed, leading to asserts, panics, or memory corruption. To ensure correct behaviour, each LSQRequest relies on assessing who its owner is, and self-destructs if it detects that its owner is done with the request and there will be no subsequent action. For example, if an instruction is squashed while the TLB is doing a walk to serve the translation, then when the translation is served by the TLB, the LSQRequest detects that the instruction was squashed and, as the translation is done and no one else expects to access its information, it self-destructs. Destroying the LSQRequest earlier would lead to wrong behaviour, as the TLB walk may access some of its fields.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13516 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13582:989577bf6abc |
18-Oct-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Stop passing around misc registers by reference.
These values are all basic integers (specifically uint64_t now), and so passing them by const & is actually less efficient since there's an extra level of indirection and an extra value, and the same sized value (a 64 bit pointer vs. a 64 bit int) is being passed around.
Change-Id: Ie9956b8dc4c225068ab1afaba233ec2b42b76da3 Reviewed-on: https://gem5-review.googlesource.com/c/13626 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13563:68c171235dc5 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtCommitPolicy a Param.ScopedEnum
The smtCommitPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
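The difference between a string-plus-parser setting and an enumerated one can be sketched with Python's standard `enum` module as an analogy. This is not gem5's Param.ScopedEnum, and the member names are hypothetical since the description only says there are three values.

```python
from enum import Enum


class CommitPolicy(Enum):
    # Hypothetical member names; only the count of three comes from the text.
    FIRST = "First"
    SECOND = "Second"
    THIRD = "Third"


def parse_policy(text):
    """String-based style: the value is only validated when it is parsed."""
    try:
        return CommitPolicy(text)
    except ValueError:
        raise ValueError("unknown commit policy: %r" % (text,)) from None


# Enum-based style: a misspelt member name fails at the point of use.
policy = CommitPolicy.SECOND
same_policy = parse_policy("Second")
```

With an enumerated parameter the set of legal values is part of the type, so a separate parser function is no longer needed.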
Change-Id: I3625f2c08a1ae0c3b0dce7a641c6ae1ce3fd79a5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15400 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13562:8fe39a3fc056 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtROBPolicy a Param.ScopedEnum
The smtROBPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Ie104d055dbbc6e44997ae0c1470de714239be5a3 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15399 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13561:523608bb180c |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtIQPolicy a Param.ScopedEnum
The smtIQPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Ieecf0a19427dd250b0d5ae3d531ab46a37326ae5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15398 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13560:f8732494c155 |
24-Dec-2018 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtLSQPolicy a Param.ScopedEnum
The smtLSQPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: I82041b88bd914c5dc660058d9e3998e3114e7c35 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15397 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13559:e9983a972327 |
03-Jan-2019 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtFetchPolicy a Param.ScopedEnum
The smtFetchPolicy is a parameter in the o3 cpu that can have 5 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: Iafb4b4b27587541185ea912e5ed581bce09695f5 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15396 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13557:fc33e6048b25 |
13-Oct-2018 |
Gabe Black <gabeblack@google.com> |
cpu: dev: sim: gpu-compute: Banish some ISA specific register types.
These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are some remaining types, specifically the vector registers and the CCReg. I'm less familiar with these new types of registers, and so will look at getting rid of them at some later time.
Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b Reviewed-on: https://gem5-review.googlesource.com/c/13624 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13546:6cd6d7b19498 |
12-Dec-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix usage of setArchVecElem
setArchVecElem should create a VecElemClass RegId, not a VecRegClass one. Initializing a VecRegClass RegId with three arguments makes it panic.
Change-Id: I6c398d67305bfe7bea12cb02edd4f4c3a202e69a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15655 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13501:ce73744918e7 |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Stop using unions to store FP registers.
These are now accessed only as integer values.
Change-Id: I21ae6537ebbcbaa02890384194ee1ce001c092bb Reviewed-on: https://gem5-review.googlesource.com/c/14458 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13500:6e0a2a7c6d8c |
19-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, cpu: Remove float type accessors.
Use the binary accessors instead.
Change-Id: Iff1877e92c79df02b3d13635391a8c2f025776a2 Reviewed-on: https://gem5-review.googlesource.com/c/14457 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
13494:ed4ed5351b16 |
01-Dec-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed typos in parameter/stats descriptions
Change-Id: I7b3274a3e37128da35f497da150af08343e97ee6 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14795 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13493:91ae6168ef27 |
23-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Added parameters to enable/disable features in LTAGE
They are for the following features in the LTAGE loop predictor:
- Hashing for calculating the loop table entry
- Add direction information
- Add speculative iteration number information
Change-Id: I395f4526163ee0d0229d1e87cde2bb046f1dd43a Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14597 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Louis Delhez <ldelhez@ucla.edu> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13492:3679580cd1e7 |
10-Dec-2018 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
cpu-o3: Fix bug in LSQUnit(uint32_t, uint32_t) ctor
Change 9af1214 added a new ctor to the LSQUnit, however there is a typo/bug because it sizes the SQEntries member variable to lqEntries + 1, as opposed to sqEntries + 1. This change corrects the issue by using sqEntries.
Change-Id: I19dfaa5c0e335bd7b84343a92034147d7c5d914e Reviewed-on: https://gem5-review.googlesource.com/c/15015 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13475:5189e2334f1a |
28-Nov-2018 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
base, sim: Add missing destructors
Derived classes with virtual functions need to define a virtual destructor or a protected destructor, otherwise calling the base class destructor has undefined behavior. This change adds a virtual destructor in the base class.
Change-Id: I1c855aa56dff6585ff99b9147bdb4eb9729a0a53 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/14815 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13472:7ceacede4f1e |
01-Mar-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Change raw pointers to STL Containers
This patch changes two members from being raw pointers to being STL containers. The reason, other than cleanliness and arguably OO best practices, is that containers have more introspection capabilities than naked pointers do, as the size is known.
Using STL containers adds little overhead and eases the automation of process during debugging (gdb).
Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13126 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13455:56e25a5f9603 |
22-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Added new stats to TAGE and LTAGE branch predictors
They are basically used to tell which component of the predictor is providing the prediction and whether it is correct or wrong.
Change-Id: I7b3db66535f159091f1b37d70c2d942d50b20fb2 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14535 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13454:19a5b4fb1f1f |
19-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: split LTAGE implementation into a base TAGE and a derived LTAGE
The new derived LTAGE is equivalent to the original LTAGE implementation. The default values of the TAGE branch predictor match the settings of the 8C-TAGE configuration described in https://www.jilp.org/vol8/v8paper1.pdf
Change-Id: I8323adbfd5c9a45db23cfff234218280e639f9ed Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14435 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13453:4a7a060ea26e |
10-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu,arch-arm: Initialise data members
A value that is not initialized holds a bogus value that shows up when using some debug flags, which makes the usage of tracediff a bit more challenging.
In addition, while debugging with other techniques, it introduces the problem of understanding if the value of a field is 'intended' or just an effect of the lack of initialisation.
Change-Id: Ied88caa77479c6f1d5166d80d1a1a057503cb106 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13125 Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13449:2f7efa89c58b |
26-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, mem: Replace assert(0 or false with panic.
Neither assert(0) nor assert(false) give any hint as to why control getting to them is bad, and their more descriptive versions, assert(0 && "description") and assert(false && "description"), jury rig assert to add an error message when the utility function panic() already does that directly with better formatting options.
This change replaces that flavor of call to assert with panic, except in the actual code which processes the formatting that panic uses (to avoid infinitely recurring error handling), and in some *.sm files since I don't know what rules those have to follow and don't want to accidentally break them.
Change-Id: I8addfbfaf77eaed94ec8191f2ae4efb477cefdd0 Reviewed-on: https://gem5-review.googlesource.com/c/14636 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13444:26f81be73cb7 |
17-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Made LTAGE parameters configurable
This includes TAGE tag sizes, TAGE table sizes, U counters reset period, loop predictor associativity, path history size, the USE_ALT_ON_NA size and the WITHLOOP size
Change-Id: I935823f0a5794f5d55b744263798897a813dc1bd Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14417 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13443:a111cb197897 |
17-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed useful counter handling in LTAGE
Increased to 2 bits of useful counter per TAGE entry as described in the LTAGE paper (and made the size configurable)
Changed how the useful counters are incremented/decremented as described in the LTAGE paper
Change-Id: I8c692cc7c180d29897cb77781681ff498a1d16c8 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14215 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13442:5314c50529a5 |
11-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixes on the loop predictor part of LTAGE
Fixed the following fields of the loop predictor entries as described in the LTAGE paper:
- Age counter (it was 3 bits and it should be 8 bits)
- Tag (it was 16 bits and it should be 14 bits). Also, it sometimes used int variables and sometimes uint16_t, leading to wrong behaviour
- Confidence counter (it was 2 bits in some parts of the code and 3 bits in others. It should be 2 bits)
- Iteration counters (they were 16 bits and they should be 14 bits)
All the new sizes are now configurable.
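A minimal sketch of what configurable field widths might look like is given below. The default widths (8, 14, 2, 14) are the ones listed above; the class and attribute names are hypothetical and are not taken from the gem5 sources.

```python
class SaturatingCounter:
    """Unsigned counter that saturates at 0 and at 2**bits - 1."""

    def __init__(self, bits, value=0):
        self.max_value = (1 << bits) - 1
        self.value = min(value, self.max_value)

    def increment(self):
        if self.value < self.max_value:
            self.value += 1

    def decrement(self):
        if self.value > 0:
            self.value -= 1


class LoopEntrySketch:
    """Hypothetical loop predictor entry with configurable widths."""

    def __init__(self, tag, age_bits=8, tag_bits=14, conf_bits=2,
                 iter_bits=14):
        # Masking the tag in one place avoids mixing int and uint16_t style
        # comparisons that disagree on the width.
        self.tag = tag & ((1 << tag_bits) - 1)
        self.age = SaturatingCounter(age_bits)
        self.confidence = SaturatingCounter(conf_bits)
        self.current_iter = SaturatingCounter(iter_bits)
```

Keeping each width in a single parameter means every part of the code agrees on how many bits a field has.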
Change-Id: I8884c7454c1e510b65160eb4d5749d3259d34096 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14216 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13433:fd8c49bea81f |
09-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fix LTAGE max number of allocations on update
The LTAGE paper states that only one TAGE entry can be allocated when updating
Change-Id: I6cfb4d80ce835e93d4bf5099ef88a7d425abaddd Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14195 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13432:6ce67b7e6e44 |
07-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
configs: Added an option for choosing branch predictor type
Added the parameter "--bp-type" to set the branch predictor type. Added the parameter "--list-bp-types" to list all the available branch predictor types.
Change-Id: Ia6aae90c784aef359b6d8233c8383cd7a871aca1 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14015 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13429:a1e199fd8122 |
06-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the usage of const DynInstPtr
Summary: Usage of const DynInstPtr& when possible and introduction of move operators to RefCountingPtr.
In many places, scoped references to dynamic instructions do a copy of the DynInstPtr when a reference would do. This is detrimental to performance. On top of that, in case there is a need for reference tracking for debugging, the redundant copies make the process much more painful than it already is.
Also, from the theoretical point of view, a function/method that defines a convenience name to access an instruction should not be considered an owner of the data, i.e., doing a copy and not a reference is not justified.
On a related topic, C++11 introduces move semantics, and those are useful when, for example, there is a class modelling a HW structure that contains a list, and has a getHeadOfList function, to prevent doing a copy to an internal variable -> update pointer, remove from the list -> update pointer, return value making a copy to the assigned variable -> update pointer, destroy the returned value -> update pointer.
Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13105 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13420:5cb2b90e1cb5 |
08-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed ratio of pred to hyst bits for LTAGE Bimodal
The LTAGE paper states 1 hyst bit shared for 4 pred bits. Made this ratio configurable, using 4 by default. Also changed the Bimodal structure to use two std::vector<bool> (one for pred and one for hyst bits).
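The shared-hysteresis layout can be sketched as two boolean arrays, where one hysteresis bit serves a group of prediction bits. The class below is an illustrative assumption (names, initial values, and the 2-bit counter update are chosen for the example), not the gem5 implementation.

```python
class BimodalSketch:
    """Bimodal table with one hysteresis bit per 2**log_ratio pred bits."""

    def __init__(self, num_entries, log_ratio=2):
        self.log_ratio = log_ratio                       # 2 -> ratio of 4
        self.pred = [False] * num_entries                # prediction bits
        self.hyst = [True] * (num_entries >> log_ratio)  # shared bits

    def predict(self, idx):
        return self.pred[idx]

    def update(self, idx, taken):
        h = idx >> self.log_ratio
        # Treat (pred, hyst) as a 2-bit saturating counter.
        counter = (int(self.pred[idx]) << 1) | int(self.hyst[h])
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
        self.pred[idx] = bool(counter >> 1)
        self.hyst[h] = bool(counter & 1)
```

Because neighbouring entries share a hysteresis bit, an update to one entry can change how quickly its neighbours flip, which is the storage saving the ratio trades for.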
Change-Id: I6793e8e358be01b75b8fd181ddad50f259862d79 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14120 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13413:b84a7c832ead |
07-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed PC shifting on LTAGE branch predictor
The PC needs to be shifted according to the instShiftAmt parameter
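A small sketch of why the shift matters when indexing a predictor table: with fixed-size instructions the low PC bits never change, so using them in the index wastes most of the table. The function name and the `log_table_size` parameter are assumptions for the example; `inst_shift_amt` mirrors the instShiftAmt parameter mentioned above.

```python
def table_index(pc, inst_shift_amt, log_table_size):
    """Drop the always-zero low PC bits, then keep log_table_size bits."""
    return (pc >> inst_shift_amt) & ((1 << log_table_size) - 1)


# Four consecutive 4-byte instructions.
pcs = [0x1000, 0x1004, 0x1008, 0x100C]
shifted = [table_index(pc, 2, 4) for pc in pcs]    # consecutive entries
unshifted = [table_index(pc, 0, 4) for pc in pcs]  # every fourth entry
```

With the shift the four instructions map to adjacent entries; without it they land on every fourth entry of the 16-entry table.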
Change-Id: I272619c093695b56cf7f8ff7163e3b5d23205d16 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14035 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13172:b816da4c5e9f |
14-May-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix MinorCPU executing Crypto Instructions
Crypto instruction classes added to the MinorDefaultFloatSimdFU.
Change-Id: I0cd4aa422bec74285595312a8cf01f5f425a82cd Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13251 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13169:eb3b2bea4231 |
08-May-2018 |
Matt Horsnell <matt.horsnell@arm.com> |
arch-arm: AArch32 Crypto AES
This patch implements the AArch32 AES instructions from the Crypto extension.
Change-Id: I51e6deda748b0c26135bcfe9d0c7128f3af91f3d Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Matt Horsnell <matt.horsnell@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13248 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13168:4965381c122d |
11-Apr-2018 |
Matt Horsnell <matt.horsnell@arm.com> |
arch-arm: AArch32 Crypto SHA
This patch implements the AArch32 secure hashing instructions from the Crypto extension.
Change-Id: Iaba8424ab71800228a9aff039d93f5c35ee7d8e5 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13247 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13113:687a2b956f7b |
01-Oct-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix typo in header guard for Noncaching cpu
Change-Id: If8ec5f5f49e99d4989658273723b943dd8df84c6 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/13144 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13020:b5f05a988593 |
07-Sep-2018 |
Earl Ou <shunhsingou@google.com> |
Fix SConstruct for asan build
Sometimes it's easier to debug gem5 built with ASan enabled. This CL fixes some build errors when using --with-asan.
Bug: None Test: ./scripts/build_gem5 --with-asan --with-ubsan build/ARM/gem5.debug
Change-Id: Iaaaaebc3f25749e11f97bf454ddd0153b3de56e7 Reviewed-on: https://gem5-review.googlesource.com/12511 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13012:5fbc6b9c64bc |
15-Mar-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Replace the fastmem with a new CPU model
The AtomicSimpleCPU used to be able to access memory directly to speed up simulation if no caches are used. This is fine as long as no switching between CPU models is required. In order to switch to a new CPU model that requires caches, we currently need to checkpoint the system and restore it into a new configuration. The new 'atomic_noncaching' memory mode provides a solution that avoids this issue since caches are bypassed in this mode. This changeset removes the old fastmem option from the AtomicSimpleCPU and introduces a new CPU, NonCachingSimpleCPU, which derives from the AtomicSimpleCPU.
The NonCachingSimpleCPU uses the same mechanism that the AtomicSimpleCPU used for accessing memory when fastmem was enabled.
This changeset also introduces a new switcheroo test that tests switching between a NonCachingSimpleCPU and a TimingSimpleCPU with caches.
Change-Id: If01893f9b37528b14f530c11ce6f53c097582c21 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/12419 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12919:ddba3d442656 |
20-Jul-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Stream/SubstreamID support in TrafficGen
This patch adds support for generating memory requests that set the StreamID/SubstreamID field, so that it is possible to emulate devices attached to an external IOMMU/SMMU with a traffic generator.
Change-Id: Iea068de581ae7125a9d49314124a08c045c75b49 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/12188 |
12918:123729464862 |
24-Jul-2018 |
Michiel W. van Tol <Michiel.VanTol@arm.com> |
cpu: Turn BaseTrafficGen numSuppressed into a stat
This is changing numSuppressed from being a warn only variable into a Stat so that it is visible at the end of simulation.
Change-Id: I934782e796c898bfc0e773cc88c597a68e403272 Reviewed-on: https://gem5-review.googlesource.com/11849 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12895:16e3712d8189 |
02-Aug-2018 |
Jason Lowe-Power <jason@lowepower.com> |
misc: Appease GCC 8
GCC 8 adds a number of new warnings to -Wall which generate errors.
- Fix memset to 0 for structs by adding casts.
- Fix cast with const when the const was ignored.
- Fix catching a polymorphic type by value.
We now compile with GCC 8!
Change-Id: Iab70ce11190eee67608fc25c0bedff170152b153 Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/11949 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
12892:796175b0e2dc |
15-Feb-2018 |
Brandon Potter <brandon.potter@amd.com> |
scons,ruby: do not generate unnecessary files
Do not generate garnet tester file or Ruby debug headers without a Ruby protocol (i.e. PROTOCOL=None). It makes no sense to include these files into the build when there will be no protocol to utilize them.
Change-Id: I8db4dd532f60008217a10c88a2e089f85df9d104 Reviewed-on: https://gem5-review.googlesource.com/8381 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12858:07c81183e089 |
19-Jul-2018 |
Bradley Wang <radwang@ucdavis.edu> |
cpu: Add hash functionality for RegId class
Having a hash function defined within the header will allow all classes using RegId to use the class as a key in an STL unordered_map.
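The same idea, making a register identifier usable as a hash-map key, can be sketched in Python, where a frozen dataclass provides equality and hashing together. `RegIdSketch` and its two fields are hypothetical and only loosely mirror RegId.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegIdSketch:
    """Hypothetical register id: a register class plus an index."""
    reg_class: str
    index: int


# frozen=True (with the default eq=True) generates __hash__, so instances
# that compare equal also hash equal and can be used as dict keys.
last_writer = {RegIdSketch("Int", 3): "inst_17"}
```

A lookup with a freshly constructed, equal identifier then finds the stored entry, which is the property an unordered_map key needs.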
Change-Id: I32fd302a087c74e844dcbfce93fef9d0ed98d6bf Signed-off-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/11870 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12857:6fc1b2a47d76 |
19-Jul-2018 |
Bradley Wang <radwang@ucdavis.edu> |
cpu: Removed unnecessary file reg_class_impl.hh
Previously, reg_class_impl.hh was added in order to prevent a cyclic dependency between it and the_isa.hh (See http://reviews.gem5.org/r/3754). It was determined that this was not necessary. The two files had almost entirely the same includes, and the current test-suite including multiple gcc and clang compilers on both MacOS and Linux successfully built the library with all functionality moved into the reg_class.hh file.
Change-Id: I0319e187b9eb280726a003951bb1ce315ffe17f5 Signed-off-by: Bradley Wang <radwang@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/11869 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12845:19729e2e70d8 |
19-Jul-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Warn when (un)serializing a traffic generator
When checkpointing a system with a traffic generator, a warning is produced so that the user is reminded serialization does not keep all the traffic generator internal state.
Change-Id: I3c49c912c9ff3a4208f55b2da0a88fc694147280 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11831 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12844:c934a1338314 |
18-Jul-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Allow creation of traffic gen from generic SimObjects
This patch allows a traffic generator to be instantiated from a generic SimObject, so that linking to a BaseTrafficGen is no longer mandatory. This permits SimObjects other than a BaseTrafficGen to instantiate generators and to manually specify the MasterID they will use when generating memory requests.
Change-Id: Ic286cfa49fd9c9707e6f12a4ea19993dd3006b2b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11789 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12833:4566a9331697 |
20-Jan-2018 |
Hanhwi Jang <jang.hanhwi@gmail.com> |
cpu-o3: Missing freeing the heads of DepGraph in IQ squashing
Free the squashed instructions' heads of DepGraph in IQ squashing.
In a system with a large register file (e.g., 2048), the number of DynInsts hits the hardcoded limit (1500). This is caused by not freeing the heads of the DepGraph in the IQ. The IQ only clears out the heads when instructions reach the writeback stage. If an instruction is squashed before the writeback stage, its head of the dependency graph, which holds the instruction's DynInstPtr, would not be cleared out. This prevents freeing the DynInst of the squashed instruction even after it is committed.
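The leak described above can be imitated in a few lines: as long as the dependency graph head keeps a strong reference, the instruction object cannot be reclaimed. The classes and the helper below are hypothetical, and the sketch relies on CPython reclaiming unreferenced objects, observed through a weak reference.

```python
import gc
import weakref


class DynInstSketch:
    """Hypothetical dynamic instruction object."""


class DepGraphSketch:
    """Per-register list heads that hold the producing instruction."""

    def __init__(self, num_regs):
        self.heads = [None] * num_regs

    def set_producer(self, reg, inst):
        self.heads[reg] = inst

    def squash(self, reg, clear_head):
        # The fix corresponds to clear_head=True: drop the reference held
        # by the head when the producer is squashed.
        if clear_head:
            self.heads[reg] = None


def instruction_survives_squash(clear_head):
    graph = DepGraphSketch(num_regs=4)
    inst = DynInstSketch()
    probe = weakref.ref(inst)
    graph.set_producer(1, inst)
    graph.squash(1, clear_head)
    del inst              # the pipeline drops its own reference
    gc.collect()
    return probe() is not None
```

Calling the helper with `clear_head=False` reports that the object is still alive, while `clear_head=True` lets it be reclaimed.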
Change-Id: I05b3db93cb6ad8960183d7ae765149c7f292e5b3 Reviewed-on: https://gem5-review.googlesource.com/7481 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12813:2c023816bec9 |
27-Apr-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Add a Python-enabled traffic generator
The current traffic generator relies on a configuration file that describes a small machine to generate stimuli. This configuration file is usually generated by the gem5 Python configuration. This creates an unnecessary and fragile step.
This changeset introduces a Python-based trace module. When instantiated, the module exposes a start method that takes an iterable object as a parameter (e.g., a generator). The iterable object is expected to represent a list of generators that will be run one after the other. For example:
    system.tgen = PyTrafficGen()
    m5.instantiate()

    def trace():
        yield system.tgen.createIdle(1000)
        yield system.tgen.createExit(0)

    system.tgen.start(trace())
Change-Id: I58e60ca517e86c197859f4daaa67750066abdc1c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11518 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12812:8f14879aebe1 |
02-May-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Support trace termination in BaseTrafficGen
Make the BaseTrafficGen handle cases where getNextPacket() can't find a new packet and returns NULL. In that case, assume the generator has run out of packets and switch to the next generator.
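The switching behaviour can be sketched with plain Python objects: each generator hands out packets until it returns None, at which point the driver moves on to the next one. `ListGenSketch`, `get_next_packet` and `run_trace` are illustrative names and not the BaseTrafficGen API.

```python
class ListGenSketch:
    """Hypothetical generator that replays a fixed list of packets."""

    def __init__(self, packets):
        self._packets = list(packets)

    def get_next_packet(self):
        # None signals that this generator has run out of packets.
        return self._packets.pop(0) if self._packets else None


def run_trace(generators):
    """Drain each generator in turn; None means 'switch to the next one'."""
    sent = []
    for gen in generators:
        while True:
            pkt = gen.get_next_packet()
            if pkt is None:
                break
            sent.append(pkt)
    return sent


def trace():
    yield ListGenSketch(["a", "b"])
    yield ListGenSketch([])        # empty generator is skipped immediately
    yield ListGenSketch(["c"])
```

Passing `trace()` to `run_trace` drains the three generators in order, and the empty one contributes nothing.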
Change-Id: I5ca6ead550005812fb849ed9ce6b5007a65ddfa7 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11517 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12811:269967d5b4e4 |
26-Apr-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Unify error handling for address generators
Unify error handling and create factory methods for address generators.
Change-Id: Ic3ab705e1bb58affd498a7db176536ebc721b904 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11516 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12810:485ca1c27812 |
26-Apr-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Split the traffic generator into two classes
The traffic generator currently assumes that it is always driven from a configuration file. Split it into a base class (BaseTrafficGen) that handles basic packet generation and a derived class that implements the config handling (TrafficGen).
Change-Id: I9407f04c40ad7e40a263c8d1ef29d37ff8e6f1b4 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11515 |
12804:f47e75dce5c6 |
26-Apr-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Remove redundant protobuf includes
Change-Id: Ic34b94b3a2ea951bc023cfce2d09ce304a602e41 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11512 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12784:08091a9f1c7a |
21-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Fix bug introduced by RequestPtr type change
Missing buffer allocation in mwaitAtomic.
Change-Id: Ifccb6df2427df8b0daac5ee6a99e5cca0b20825e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11469 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12769:f9c0d0a09dac |
02-Apr-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: Prevent suspended TimingSimple CPUs from fetching next instructions
In the TimingSimpleCPU model, when a CPU is suspended by a syscall (e.g., futex(FUTEX_WAIT)), the CPU waits for another CPU to wake it up (e.g., with a FUTEX_WAKE operation). While it stays Idle, the suspended CPU should not try to fetch the next instructions after the syscall.
This patch adds a status check before a CPU schedules a fetch event after a fault is handled.
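The guard amounts to checking the CPU status before scheduling the next fetch. The sketch below is an assumption-laden model: the `Status` enum, the class, and the fetch counter are hypothetical and do not reflect the TimingSimpleCPU code.

```python
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    IDLE = auto()


class TimingCpuSketch:
    """Hypothetical CPU model that counts scheduled fetch events."""

    def __init__(self):
        self.status = Status.RUNNING
        self.fetches_scheduled = 0

    def suspend(self):
        # For example, a futex wait issued through a syscall.
        self.status = Status.IDLE

    def wake(self):
        self.status = Status.RUNNING

    def advance_after_fault(self):
        # The status check: a suspended CPU must not fetch until woken up.
        if self.status is Status.IDLE:
            return
        self.fetches_scheduled += 1
```

While the CPU is idle, `advance_after_fault` leaves the fetch count unchanged; fetching resumes after `wake`.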
Change-Id: I0cc953135686c9b35afe94942aa1d0b245ec60a2 Reviewed-on: https://gem5-review.googlesource.com/8181 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Brandon Potter <Brandon.Potter@amd.com> |
12768:9a299ec956ac |
12-Feb-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: add a new instruction type 'Atomic'
This patch adds a new flag named 'Atomic' to support ISA implementations that use AtomicOpFunctor to handle atomic instructions instead of a pair of locking load and unlocking store.
Change-Id: I1fbee6e54432396cb49dfc59ad9006b75812d115 Reviewed-on: https://gem5-review.googlesource.com/8187 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12759:ab260678199a |
08-Jun-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu-minor: Remove redundant thread startup call
Don't call startup() twice on each of the threads.
Change-Id: Ibe3d1f25c4fdff291ee310abb9bcad3b184bab20 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11037 |
12749:223c83ed9979 |
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
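As a rough illustration of what the RequestPtr change in 12749 buys, the sketch below binds the alias to `std::shared_ptr`. The `Request` and `Packet` structs here are hypothetical stand-ins with only the fields needed for the example, not the gem5 classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for a memory request.
struct Request
{
    std::uint64_t addr;
    unsigned size;
};

// The alias now names a smart pointer instead of Request*.
using RequestPtr = std::shared_ptr<Request>;

// Hypothetical consumer that keeps the request alive while it needs it.
struct Packet
{
    RequestPtr req;
    explicit Packet(RequestPtr r) : req(std::move(r)) { assert(req); }
};

// Returns the number of owners while a Packet also holds the request.
long
ownersWhilePacketAlive()
{
    RequestPtr req = std::make_shared<Request>(Request{0x1000, 8});
    Packet pkt(req);         // shares ownership, no manual delete needed
    return req.use_count();  // counts 'req' and 'pkt.req'
}
```

The request is freed when the last owner goes away, which is how this kind of alias avoids both leaks and dangling pointers.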
12748:ae5ce8e42de7 |
03-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Substitute pointer to Request with aliased RequestPtr
Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparatory patch for when RequestPtr will be typedefed to a smart pointer to Request rather than a raw pointer to Request.
Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12710:c2939b3ba4ba |
15-Feb-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: Avoid unnecessary dynamic_pointer_cast in atomic model
In the atomic model a dynamic_pointer_cast is performed at every tick to check if the fault is a SyscallRetryFault. This was happening even when there was no generated fault.
Change-Id: I7f4afeffffdf4f988230e05286602d8d9a919c6c Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10101 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
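The idea behind 12710 can be sketched as a cheap null check placed in front of the cast. The fault types below are hypothetical stand-ins, and `needsSyscallRetry` is an illustrative helper name rather than a function from the atomic model.

```cpp
#include <memory>

// Hypothetical fault hierarchy for illustration only.
struct FaultBase { virtual ~FaultBase() = default; };
struct SyscallRetryFault : FaultBase {};
struct OtherFault : FaultBase {};

using Fault = std::shared_ptr<FaultBase>;
const Fault NoFault = nullptr;

// True only if 'fault' is a SyscallRetryFault.
bool
needsSyscallRetry(const Fault &fault)
{
    // Common case: no fault was generated, so skip the RTTI lookup.
    if (fault == NoFault)
        return false;
    return std::dynamic_pointer_cast<SyscallRetryFault>(fault) != nullptr;
}
```

Since most ticks produce no fault, the pointer comparison handles the hot path and the dynamic cast only runs when there is something to inspect.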
12680:91f4d6668b4f |
04-Apr-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
sim,cpu,mem,arch: Introduced MasterInfo data structure
With this patch a gem5 System will store more info about its Masters. While it was previously keeping track of the Master name and Master ID only, it is now adding a per-Master pointer to the SimObject related to the Master. This will make it possible for a client to query a System for a Master using either the master's name or the master's pointer.
Change-Id: I8b97d328a65cd06f329e2cdd3679451c17d2b8f6 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/9781 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12622:91cce46512f2 |
27-Mar-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Remove ExtMachInst typedefs from the O3 CPU model.
These typedefs aren't used, and they expose ISA specific types outside the ISA implementations.
Change-Id: I64b9cec18d6f92765eebbdf8c8f1de15c0deba34 Reviewed-on: https://gem5-review.googlesource.com/9404 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
12621:982f22db6230 |
27-Mar-2018 |
Gabe Black <gabeblack@google.com> |
arch: cpu: Make the ExtMachInst type a template argument in InstMap.
This doesn't completely hide the ISA specific ExtMachInst type inside the ISAs since it still gets applied in arch/generic, but it at least pulls it into the arch directory.
Change-Id: Ic2188d59696530d7ecafdff0785d71867182701d Reviewed-on: https://gem5-review.googlesource.com/9403 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
12619:00be589dbe16 |
27-Mar-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Stop extracting inst_flags from the machInst.
The instruction representation is already encoded in the trace protobuf, so there's no reason to encode a part of it again. This is especially true since this supposedly generic code is extracting the first 8 bits of the machInst, a totally arbitrary set of bits for most ISAs. If certain bits within a machine instruction are actually relevant, the consumer of the trace should be able to interpret the instruction bytes which are already there and extract the same bits within the context of whatever ISA they're appropriate for.
Change-Id: Idaebe6a110d7d4812c3d7c434582d5a9470bcec1 Reviewed-on: https://gem5-review.googlesource.com/9401 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
12615:ccdc49c36ad3 |
25-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Use the new asBytes function in the protobuf inst tracer.
Use this function to get the binary representation of the instruction rather than referencing the ExtMachInst typed machInst member of the StaticInst directly. ExtMachInst is an ISA specific type and can't always be straightforwardly squished into a 32 bit integer.
Change-Id: Ic1f74d6d86eb779016677ae45c022939ce3e2b9f Reviewed-on: https://gem5-review.googlesource.com/7563 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
12614:0bc465e1f5fb |
24-Jan-2018 |
Gabe Black <gabeblack@google.com> |
arch: Add a virtual asBytes function to the StaticInst class.
This function takes a pointer to a buffer and the current size of the buffer as a pass by reference argument. If the size of the buffer is sufficient, the function stores a binary representation of itself (generally the ISA defined instruction encoding) in the buffer, and sets the size argument to how much space it used. This could be used by ISAs which have two instruction sizes (ARM and thumb, for example). If the buffer size isn't sufficient, then the size parameter should be set to what size is required, and then the function should return without modifying the buffer.
The buffer itself should be aligned to the same standard as memory returned by new, specifically "The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated...". This will avoid having to memcpy buffers to avoid unaligned accesses.
To standardize the representation of the data, it should be stored in the buffer as little endian. Since most hosts (including ARM and x86 hosts) will be little endian, this will almost always be a no-op.
Change-Id: I2f31aa0b4f9c0126b44f47a881c2901243279bd6 Reviewed-on: https://gem5-review.googlesource.com/7562 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> |
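The buffer contract described for asBytes in 12614 can be sketched as follows. `ToyInst` and its fixed 32-bit encoding are assumptions made for the example, and the exact signature is inferred from the description rather than copied from StaticInst.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical instruction with a fixed 32-bit encoding.
struct ToyInst
{
    std::uint32_t encoding;

    // On entry 'size' is the buffer capacity in bytes. On exit it is the
    // number of bytes written, or the number required if the buffer was
    // too small (in which case the buffer is left untouched).
    void
    asBytes(void *buf, std::size_t &size) const
    {
        const std::size_t needed = sizeof(encoding);
        if (size < needed) {
            size = needed;
            return;
        }
        auto *out = static_cast<std::uint8_t *>(buf);
        // Store least significant byte first, i.e. little endian,
        // independent of the host byte order.
        for (std::size_t i = 0; i < needed; ++i)
            out[i] = static_cast<std::uint8_t>(encoding >> (8 * i));
        size = needed;
    }
};
```

A caller can therefore probe with a small buffer, read back the required size, and retry with a buffer of that size.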
12612:a64e6b723e5f |
27-Jul-2017 |
Jason Lowe-Power <jason@lowepower.com> |
ruby: Make sure addresses print in hex
Added fix in the invalid transition panic and various places in ruby random tester.
Change-Id: I879264da58369faf7de49d1a28b2da1cb935ef0a Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/8941 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12563:8d59ed22ae79 |
06-Mar-2018 |
Gabe Black <gabeblack@google.com> |
scons: Switch from the print statement to the print function.
Starting with version 3, scons imposes using the print function instead of the print statement in code it processes. To get things building again, this change moves all python code within gem5 to use the function version. Another change by another author separately made this same change to the site_tools and site_init.py files.
Change-Id: I2de7dc3b1be756baad6f60574c47c8b7e80ea3b0 Reviewed-on: https://gem5-review.googlesource.com/8761 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12537:aeff8f3d80c9 |
13-Feb-2018 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu-o3: Don't add non-speculative mem barriers to the IQ twice
There are cases where the IEW adds a non-speculative instruction to the IQ twice. This can happen if an instruction is flagged as IsMemBarrier and IsNonSpeculative. Avoid adding non-speculative instructions in the IEW to the IQ by checking if it has been added already.
Change-Id: Ifcff676a451b57b2406ce00ed8dae19ed399515f Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Javier Setoain <javier.setoain@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/8374 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12489:76d7f5f55f40 |
08-Nov-2017 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
cpu: MinorCPU handling IsSquashAfter flag
MinorCPU was not handling IsSquashAfter flagged instructions. The behaviour was to force a branch (hence enforcing refetching) for SerializeAfter instructions only. This has now been extended to SquashAfter in order to correctly support ISB barrier instruction behaviour.
Change-Id: Ie525b23350b0de121372d3b98b433e36b097d5c4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5702 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12470:d5152049f316 |
22-Jan-2016 |
Glenn Bergmans <glenn.bergmans@arm.com> |
arm: DT autogeneration - Generate cpus node
Equips cpu models with a method to generate the cpu node.
Note: even though official documentation requires that CPU ids start counting from 0 in every cluster, gem5 requires a globally unique cpu_id.
Change-Id: Ida3e17af3124a68ef7dbf2449cd034dfc3ec39df Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5963 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12440:e028053ee1fc |
30-Nov-2017 |
Xiaoyu Ma <xiaoyuma@google.com> |
sim: Allow passing a user-defined L2XBar to addTwoLevelCacheHierarchy().
Before this CL, the addTwoLevelCacheHierarchy() function uses the default L2XBar class as the interconnect between CPU L1 caches and L2. This CL allows passing a user-defined bus to overwrite the default L2XBar by adding an optional argument to the function.
Change-Id: I917657272fd4924ee0bed882a226851afba26847 Reviewed-on: https://gem5-review.googlesource.com/7364 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12434:715d029898f4 |
08-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Make the CPU's TLB parameter a BaseTLB.
This is instead of the architecture specific version.
Change-Id: I906ec16eee1f65f0e9b9c24b401430f9ea01637b Reviewed-on: https://gem5-review.googlesource.com/7349 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12429:beefb9f5f551 |
09-Jan-2018 |
BKP <brandon.potter@amd.com> |
style: change C/C++ source permissions to noexec
Several files in the repository were tracked with execute permissions even though the files are just normal C/C++ files (and the one .isa).
Change-Id: I976b096acab4a1fc74c5699ef1f9b222c1e635c2 Reviewed-on: https://gem5-review.googlesource.com/7241 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12427:b0611f1ad833 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
alpha,arm,mips,power,riscv,sparc,x86,cpu: Get rid of ISA_HAS_DELAY_SLOT.
This constant is, first, a #define, and second only used in one place.
In that one place, it appears that the code it guards is no longer necessary in general. It was originally written to avoid refetching a block of data that you're still in, even if you've moved slightly farther in it because you're skipping the next instruction due to an annulled branch delay slot. In reality however, in SPARC, the one ISA I'm aware of which has this sort of branching behavior, the PC state object will correctly determine that no branch is happening in these cases. Code lower down in the loop will then recompute where fetching should continue based on the next PC, automatically skipping the annulled branch slot without misinterpreting the gap as a branch.
This change therefore also removes this block of code.
Change-Id: I820ebc9df10aeb4fcb69c12f6a784e9ec616743c Reviewed-on: https://gem5-review.googlesource.com/6821 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12422:9d6162c8c1de |
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Use the NotAnInst flag to avoid passing an inst to fetch faults.
When a fault happens in fetch in O3, a dummy inst is created to carry the fault through the pipeline to commit, but conceptually there isn't actually any instruction since we failed to fetch one.
This change marks the dummy instruction as NotAnInst, and when any such instruction gets to commit, the fault object associated with it is invoked and passed a null static inst pointer instead of a pointer to the dummy inst.
Change-Id: I18d993083406deb625402e06af4ba0d4772ca5a3 Reviewed-on: https://gem5-review.googlesource.com/7124 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12421:871001341442 |
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu: Add a NotAnInst flag to the BaseDynInst class.
This flag means that the instruction isn't an actual instruction, it's just a placeholder to carry a fault down a pipeline, for instance.
Change-Id: I1cc12b068662dbd3d3b089c9941a07b6e88b57e3 Reviewed-on: https://gem5-review.googlesource.com/7123 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12420:f5c80f4ed41f |
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
cpu, power: Get rid of the remnants of the EA computation insts.
Get rid of some remnants of a system which was intended to separate address computation into its own instruction object.
Change-Id: I23f9ffd70fcb89a8ea5bbb934507fb00da9a0b7f Reviewed-on: https://gem5-review.googlesource.com/7122 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> |
12406:86bde4a026b5 |
22-Dec-2017 |
Gabe Black <gabeblack@google.com> |
arch,cpu: "virtualize" the TLB interface.
CPUs have historically instantiated the architecture specific version of the TLBs to avoid a virtual function call, making them a little bit more dependent on what the current ISA is. Some simple performance measurement, the x86 twolf regression on the atomic CPU, shows that there isn't actually any performance benefit, and if anything the simulator goes slightly faster (although still within margin of error) when the TLB functions are virtual.
This change switches everything outside of the architectures themselves to use the generic BaseTLB type, and then inside the ISA for them to cast that to their architecture specific type to call into architecture specific interfaces.
The ARM TLB needed the most adjustment since it was using non-standard translation function signatures. Specifically, they all took an extra "type" parameter which defaulted to normal, and translateTiming returned a Fault. translateTiming actually doesn't need to return a Fault because everywhere that consumed it just stored it into a structure which it then deleted(?), and the fault is stored in the Translation object when the translation is done.
A little more work is needed to fully obviate the arch/tlb.hh header, so the TheISA::TLB type is still visible outside of the ISAs. Specifically, the TlbEntry type is used in the generic PageTable which lives in src/mem.
Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575 Reviewed-on: https://gem5-review.googlesource.com/6921 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
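The split described in 12406, generic code holding a base TLB pointer and ISA code casting it back, might look roughly like this. The class names, the `translate` and `flushAll` members and the fixed offset are all hypothetical; the real BaseTLB interface is richer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generic interface seen by CPU models.
struct BaseTLB
{
    virtual ~BaseTLB() = default;
    virtual std::uint64_t translate(std::uint64_t vaddr) const = 0;
};

// Hypothetical architecture-specific TLB with an extra, ISA-only method.
struct ToyArchTLB : BaseTLB
{
    std::uint64_t offset = 0x1000;
    int flushes = 0;

    std::uint64_t
    translate(std::uint64_t vaddr) const override
    {
        return vaddr + offset;
    }

    void flushAll() { ++flushes; }
};

// Generic CPU code: only the virtual interface is used.
std::uint64_t
cpuTranslate(const BaseTLB &tlb, std::uint64_t vaddr)
{
    return tlb.translate(vaddr);
}

// ISA code: cast back to the concrete type to reach ISA-only methods.
void
isaFlush(BaseTLB *tlb)
{
    auto *arch = dynamic_cast<ToyArchTLB *>(tlb);
    assert(arch);
    arch->flushAll();
}
```

The cost is one virtual call per translation, which the measurement quoted in the commit message suggests is within the margin of error.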
12405:01736aa058a5 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu: Use the generic nop static inst instead of decoding the arch version.
This removes a dependence on the ISA.
Change-Id: I01013bc70558f0831327213912bcac11258066a6 Reviewed-on: https://gem5-review.googlesource.com/6824 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12404:fe5af2331a48 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu: Add a pointer to a generic Nop StaticInst.
This can be used whenever generic code needs a filler instruction that doesn't do anything.
Change-Id: Ib245d3e880a951e229eb315a09ecc7c47e6ae00f Reviewed-on: https://gem5-review.googlesource.com/6823 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12400:e467cdf48c76 |
20-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu: Fix exit_gen.cc which used misc.hh instead of logging.hh.
Change-Id: I868021a01eb3e7902a4d64283bdfaa93c6d9f964 Reviewed-on: https://gem5-review.googlesource.com/6822 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12397:a6d362560825 |
01-Aug-2017 |
Riken Gohil <Riken.Gohil@arm.com> |
cpu-tester: Added ExitGen to TrafficGen
Added the ExitGen to the TrafficGenerator which allows an EXIT state to be added to the TrafficGen configuration file. Entering this state will cause the simulation to exit immediately. Please note that if multiple TrafficGen instances have an EXIT state, the first of these to be encountered will cause the simulation to terminate.
Change-Id: Ieea51f05ffb780771f007787a2b119f79143d0c1 Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5723 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12396:3d04ea44fafb |
12-Jul-2017 |
Riken Gohil <Riken.Gohil@arm.com> |
cpu-tester: Refactoring traffic generators into separate files.
Change-Id: I2372a0a88e276dcb0c06c3d0a789e010cfba8013 Reviewed-by: Matteo Andreozzi <matteo.andreozzi@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5722 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
12392:e0dbdf30a2a5 |
13-Dec-2017 |
Jason Lowe-Power <jason@lowepower.com> |
misc: Updates for gcc7.2 for x86
GCC 7.2 is much stricter than previous GCC versions. The following changes are needed:
* There is now a warning if there is an implicit fallthrough between two case statements. C++17 adds the [[fallthrough]]; declaration. However, to support non C++17 standards (i.e., C++11), we use M5_FALLTHROUGH. M5_FALLTHROUGH checks for a [[fallthrough]]-compliant C++17 compiler and if that doesn't exist, it defaults to nothing (no older compilers generate warnings).
* The above resulted in a couple of bugs that were found. This is noted in the review request on gerrit.
* throw() for dynamic exception specification is deprecated
* There were a couple of new uninitialized variable warnings
* Can no longer perform bitwise operations on a bool.
* Must now include <functional> for std::function
* Compiler bug for void* lambda. Changed to auto as work around. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82878
Change-Id: I5d4c782a4e133fa4cdb119e35d9aff68c6e2958e Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/5802 Reviewed-by: Gabe Black <gabeblack@google.com> |
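A fallthrough macro of the kind described in 12392 could be written along these lines. `SKETCH_FALLTHROUGH` is a hypothetical name and this is only one plausible definition; the actual M5_FALLTHROUGH macro may be defined differently.

```cpp
// Hypothetical macro: expands to [[fallthrough]] when the compiler
// reports support for it, and to nothing otherwise.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(fallthrough)
#    define SKETCH_FALLTHROUGH [[fallthrough]]
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH
#endif

// Level 2 deliberately also receives level 1's contribution.
int
weight(int level)
{
    int total = 0;
    switch (level) {
      case 2:
        total += 10;
        SKETCH_FALLTHROUGH;  // marks the fallthrough as intentional
      case 1:
        total += 1;
        break;
      default:
        break;
    }
    return total;
}
```

When the macro expands to nothing, the trailing semicolon is just an empty statement, so the same source compiles on older compilers.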
12386:2bf5fb25a5f1 |
13-Dec-2017 |
Gabe Black <gabeblack@google.com> |
arm,sparc,x86,base,cpu,sim: Replace the Twin(32|64)_t types with.
Replace them with std::array<>s.
Change-Id: I76624c87a1cd9b21c386a96147a18de92b8a8a34 Reviewed-on: https://gem5-review.googlesource.com/6602 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12385:288c62455dde |
13-Dec-2017 |
Gabe Black <gabeblack@google.com> |
cpu,alpha,mips,power,riscv,sparc: Get rid of eaComp and memAccInst.
Neither of these were used, particularly memAccInst.
Change-Id: I4ac9e44cf624e5de42519d586d7b699f08a2cdfc Reviewed-on: https://gem5-review.googlesource.com/6601 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
12372:fd63af762679 |
06-Dec-2017 |
Matt Sinclair <mattdsinclair@gmail.com> |
x86,misc: add additional info on faulting X86 instruction, fetched PC
Print faulting instruction for unmapped address panic in faults.cc and print extra info about corresponding fetched PC in base.cc.
Change-Id: Id9e15d3e88df2ad6b809fb3cf9f6ae97e9e97e0f Reviewed-on: https://gem5-review.googlesource.com/6461 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
12355:568ec3a0c614 |
07-Feb-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Add support for CMOs in the cpu models
Cache maintenance operations go through the write channel of the cpu. This change makes sure that the cpu does not try to fill in the packet with data.
Change-Id: Ic83205bb1cda7967636d88f15adcb475eb38d158 Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5055 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12334:e0ab29a34764 |
30-Nov-2017 |
Gabe Black <gabeblack@google.com> |
misc: Rename misc.(hh|cc) to logging.(hh|cc)
These files aren't a collection of miscellaneous stuff, they're the definition of the Logger interface, and a few utility macros for calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1 Reviewed-on: https://gem5-review.googlesource.com/6226 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
12325:48e41e644187 |
24-Nov-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Don't override ISA if provided by user
The BaseCPU.createThreads() method currently overrides the BaseCPU.isa parameter. This is sometimes undesirable. Change the behavior so that the default value for the isa parameter is the empty list and teach createThreads() to only override the ISA if none has been specified.
Change-Id: I2ac5535e55fc57057e294d3c6a93088b33bf7b84 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jack Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-on: https://gem5-review.googlesource.com/6121 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12324:6142a2fec8d9 |
16-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
cpu-minor: Add missing instruction stats
Change-Id: I811b552989caf3601ac65a128dbee6b7bb405d7f Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Updated to use IsVector instruction flag. ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5732 Reviewed-by: Gabe Black <gabeblack@google.com> |
12319:db37ad4d5395 |
23-Nov-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu-o3: Add missing vector stat initializers
All of the O3 vector stats added by 'arch: ISA parser additions of vector registers' are currently missing their stat initializers. Add the missing stat initialization to InstructionQueue::regStats.
Change-Id: Idc4b8e2824120a2542d8a604340a1b41bde6aa28 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/6101 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12284:b91c036913da |
20-Jul-2017 |
Jose Marinho <jose.marinho@arm.com> |
cpu, cpu, sim: move Cycle probe update
Move the code responsible for performing the actual probe point notify into BaseCPU. Use BaseCPU activateContext and suspendContext to keep track of sleep cycles. Create a probe point (ppActiveCycles) that does not count cycles where the processor was asleep. Rename ppCycles to ppAllCycles to reflect its nature.
Change-Id: I1907ddd07d0ff9f2ef22cc9f61f5f46c630c9d66 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5762 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
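The distinction between the two probe points in 12284 can be sketched as two counters driven from the activate and suspend hooks. `CycleTracker` and its members are hypothetical; only the idea of "all cycles" versus "cycles while not asleep" comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping behind an all-cycles and an active-cycles probe.
class CycleTracker
{
    std::uint64_t lastUpdate = 0;
    std::uint64_t allCycles = 0;
    std::uint64_t activeCycles = 0;
    bool asleep = false;

    // Account for the cycles elapsed since the previous update.
    void
    update(std::uint64_t now)
    {
        assert(now >= lastUpdate);
        const std::uint64_t delta = now - lastUpdate;
        allCycles += delta;
        if (!asleep)
            activeCycles += delta;
        lastUpdate = now;
    }

  public:
    void suspendContext(std::uint64_t now) { update(now); asleep = true; }
    void activateContext(std::uint64_t now) { update(now); asleep = false; }
    void sample(std::uint64_t now) { update(now); }

    std::uint64_t all() const { return allCycles; }
    std::uint64_t active() const { return activeCycles; }
};
```

Doing the accounting in one place on every state change is what lets the active count skip the interval between a suspend and the next activate.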
12279:48bca1fee7a0 |
09-Oct-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Prevent cpu from suspending if it is already draining
Suspending the current thread context while draining due to a quiesce pseudo instruction (for example a wfi instruction) could deadlock the cpu and prevent it from successfully draining. This change ensures that the cpu is not draining before suspending the thread context.
Change-Id: I7c019847f5a870d4bc9ce2b19936bc3dc45e5fd7 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5881 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12277:e6455b421c4b |
19-Oct-2017 |
Jose Marinho <jose.marinho@arm.com> |
cpu: Make automatic transition to OFF optional
Add the power_gating_on_idle option to control whether a core automatically enters the power gated state. The default behaviour is to transition to clock gated when idle, but not to power gated. When this option is set to true, the core automatically transitions to the power gated state after a configurable latency.
Change-Id: Ida98c7fc532de4140d0e511c25613769b47b3702 Reviewed-on: https://gem5-review.googlesource.com/5741 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
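The effect of the power_gating_on_idle option in 12277 can be summarised by a small decision function. The `IdlePolicy` struct, the `stateAfterIdle` helper and the default latency of 300 cycles are assumptions for illustration; the real model schedules events rather than computing the state from an idle duration.

```cpp
#include <cstdint>

enum class PwrState { On, ClkGated, Off };

// Hypothetical container for the two knobs discussed above.
struct IdlePolicy
{
    bool powerGatingOnIdle = false;        // mirrors power_gating_on_idle
    std::uint64_t pwrGatingLatency = 300;  // assumed value, in cycles
};

// Power state a core would be in after 'idleCycles' cycles of idleness.
PwrState
stateAfterIdle(const IdlePolicy &policy, std::uint64_t idleCycles)
{
    if (idleCycles == 0)
        return PwrState::On;
    if (policy.powerGatingOnIdle && idleCycles >= policy.pwrGatingLatency)
        return PwrState::Off;
    // Default behaviour: idle cores are clock gated but never power gated.
    return PwrState::ClkGated;
}
```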
12276:22c220be30c5 |
16-Mar-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
pwr: Adds logic to enter power gating for the cpu model
If the CPU has been clock gated for a sufficient amount of time (configurable via pwrGatingLatency), the CPU will go into the OFF power state. This does not model hardware, just behaviour.
Change-Id: Ib3681d1ffa6ad25eba60f47b4020325f63472d43 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3969 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12255:9ef9176e4bb2 |
21-Sep-2017 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu, probe: Fix elastic trace register dependency
Change-Id: I017852eac183fac3f914fdb96d7e72a56ea9d682 Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5121 Reviewed-by: Matthias Jung <jungma@eit.uni-kl.de> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12224:5c4d885d3507 |
02-Oct-2017 |
Jason Lowe-Power <jason@lowepower.com> |
cpu-o3: Add M5_VAR_USED to variable
Fixes compile error for gem5.fast on CLANG due to unused variable.
Change-Id: Iabe777a27d75ee8bfa7b214fff577aed3c7582c7 Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/4980 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12217:0a16f4c03c02 |
27-Jul-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Check predication before the SQ size for a debug print
The size of the store entry in the LSQ is used to indicate a fault in the execution of the store. At the same time, a store that is predicated false will also have 0 size in the corresponding store queue entry. This changeset ensures that we check if the store was predicated false before checking the size field. This way we avoid printing stores as faulting when they are only predicated false.
Change-Id: Ie07982197bd73d7b44d26a3257d54ecb103a952a Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4821 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
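The ordering fix in 12217 amounts to testing the predicate flag before interpreting a zero size. The `StoreEntry` struct and `describe` function below are hypothetical and only mirror the two fields the commit message talks about.

```cpp
#include <string>

// Hypothetical view of a store queue entry.
struct StoreEntry
{
    bool predicatedFalse;
    unsigned size;
};

// Label used by a debug print for this entry.
std::string
describe(const StoreEntry &entry)
{
    // Check predication first: a predicated-false store also has size 0.
    if (entry.predicatedFalse)
        return "predicated false";
    if (entry.size == 0)
        return "faulting";
    return "normal";
}
```

With the checks in the opposite order, every predicated-false store would be reported as faulting.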
12216:70bb3ae0fbfc |
25-Jul-2017 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Avoid early checker verification for store conditionals
The O3CPU allows stores to commit before they are completed and as soon as they enter the store queue. This is the reason why stores are verified by the checker CPU, separately, once they complete and after they are sent to the memory.
Store conditionals, on the other hand, have an additional writeback stage in the pipeline as they return their result to a register, similarly to loads. This is the reason why they do not commit before they receive a response from the memory. This allows store conditionals to be verified by the checker CPU as soon as they commit in the same way as all other non-store instructions.
At the same time, the presence of a checker CPU should not require changes to the way we handle instructions. This change removes explicit calls to:
* incorrectly set the extra data of the request to 0 (a subsequent call to completeAcc already does this without making any ISA assumptions about the return value of the failed store conditional)
* complete failing store conditionals
Change-Id: If21d70b21caa55b35e9fdcc50f254c590465d3c3 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4820 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12181:2150eff234c1 |
25-Aug-2017 |
Gabe Black <gabeblack@google.com> |
stats: Get rid of some kernel stats related cruft.
The kernel stat mechanism should really be refactored and moved somewhere else, but in the mean time there's some old cruft that can be cleared away.
Change-Id: I21e725de590dda0d20bf3bc675bbe976c7b1bd86 Reviewed-on: https://gem5-review.googlesource.com/4600 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12180:72159e1f6701 |
24-Aug-2017 |
Rico Amslinger <rico.amslinger@informatik.uni-augsburg.de> |
cpu: Fix bi-mode branch predictor thresholds
When different sizes were set for the choice and global saturation counter (e.g. ex5_big), the threshold calculation used the wrong size. Thus the branch predictor always predicted "not taken" for choice > global.
Change-Id: I076549ff1482e2280cef24a0d16b7bb2122d4110 Reviewed-on: https://gem5-review.googlesource.com/4560 Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
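To see why mixing up the counter widths in 12180 pins the prediction to "not taken", consider a midpoint threshold derived from the counter width. The formula and all names below are assumptions for illustration and may not match the bi-mode predictor source exactly.

```cpp
#include <cassert>

// Midpoint of an n-bit saturating counter: values strictly above it
// are treated as "taken".
constexpr unsigned
ctrThreshold(unsigned bits)
{
    return (1u << (bits - 1)) - 1;
}

// Hypothetical configuration with different counter widths.
struct BiModeSizes
{
    unsigned choiceCtrBits;
    unsigned globalCtrBits;
};

// Correct: the global counter is compared against its own width.
bool
predictsTaken(const BiModeSizes &s, unsigned globalCtr)
{
    assert(s.globalCtrBits >= 1 && s.globalCtrBits < 32);
    return globalCtr > ctrThreshold(s.globalCtrBits);
}

// Buggy variant: the threshold is derived from the choice counter width.
bool
buggyPredictsTaken(const BiModeSizes &s, unsigned globalCtr)
{
    assert(s.choiceCtrBits >= 1 && s.choiceCtrBits < 32);
    return globalCtr > ctrThreshold(s.choiceCtrBits);
}
```

With 3 choice bits and 2 global bits, the buggy threshold is 3, which a 2-bit counter (maximum value 3) can never exceed, so the buggy variant never predicts taken.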
12179:432a44667130 |
01-Sep-2017 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu-minor: Fix for addr range coverage calculation
Coverage was wrongly set to PartialAddrRangeCoverage in the case of disjoint adjacent ranges
Change-Id: I29aaf5145e6cdcf5f0b8f4e009d57ee57bd4c944 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/4640 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
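The case fixed in 12179, two ranges that touch but do not overlap, is easy to get wrong with inclusive bounds. The sketch below uses half-open ranges; the `Coverage` enum and the `coverage` function are hypothetical and only loosely modelled on the names in the commit message.

```cpp
#include <cassert>
#include <cstdint>

enum class Coverage { None, Partial, Full };

// How much of [bStart, bStart + bSize) is covered by
// [aStart, aStart + aSize). Both ranges are half-open.
Coverage
coverage(std::uint64_t aStart, std::uint64_t aSize,
         std::uint64_t bStart, std::uint64_t bSize)
{
    assert(aSize > 0 && bSize > 0);
    const std::uint64_t aEnd = aStart + aSize;  // one past the last byte
    const std::uint64_t bEnd = bStart + bSize;

    // Disjoint, including the adjacent case where one range ends
    // exactly where the other begins.
    if (aEnd <= bStart || bEnd <= aStart)
        return Coverage::None;
    if (aStart <= bStart && bEnd <= aEnd)
        return Coverage::Full;
    return Coverage::Partial;
}
```

Using `<=` against the one-past-the-end address is what keeps adjacent ranges such as [0, 8) and [8, 16) from being classified as partially overlapping.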
12171:b11b56bba18f |
28-Aug-2017 |
Matthias Hille <matthiashille8@gmail.com> |
cpu-o3: fix data pkt initialization for split load
When a split load hits a memory region where IPRs are mapped, the Writeback event scheduled for it carried a data packet that was not correctly initialized, which caused an assertion to fire when the Writeback event was processed.
Change-Id: I71a4e291f0086f7468d7e8124a0a8f098088972f Signed-off-by: Matthias Hille <matthiashille8@gmail.com> Reported-by: Matthias Hille <matthiashille8@gmail.com> Reviewed-on: https://gem5-review.googlesource.com/4620 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12155:5dc92ea01323 |
27-Jul-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm: Add a helper method to access device event queues
The VM's event queue is normally used for devices in multi-core KVM mode. Add a helper method, BaseKvmCPU::deviceEventQueue(), to access this queue. This makes the intention of code migrating to device event queues clearer.
Change-Id: Ifb10f553a6d7445c8d562f658cf9d0b1f4c577ff Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4287 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12154:9a9bc3c1b788 |
20-Jul-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu, kvm: Fix deadlock issue when resuming a drained system
The KVM CPU sometimes needs to access devices when drain() is called. This typically happens on ARM when synchronizing devices that use the system register interface. When called from drain(), the event queue isn't locked since drain is called from the outside when the simulator isn't servicing any events. In such cases, performing a migration to the device's queue will unlock a mutex that isn't locked. This typically results in a deadlock when resuming the system since the lock will be in an undefined state.
Change-Id: Ibdcc2e034e916a929124f297e72aae306cf66728 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4286 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12144:3f2976f87529 |
18-Jul-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Add missing rename of vector registers in the O3 CPU
The introduction of a new vector register class broke rename in the O3 CPU due to an unhandled register class in DefaultRename<Impl>::renameSrcRegs(). This patch adds the necessary handling to avoid a panic when the vector register file is used.
Change-Id: Ie380ab35ec4a151db15402f25b25b58931ee0581 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4140 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12143:e48005f585f2 |
06-Apr-2017 |
Anouk Van Laer <anouk.vanlaer@arm.com> |
cpu,o3: Fixed checkpointing bug occurring in the o3 CPU
Checkpointing a system with out-of-order CPUs might get stuck if one of the CPUs has been put to sleep. The quiesce instruction cannot be drained, so checkpointing never finishes.
This commit resolves that by activating all suspended thread contexts when draining the system.
Change-Id: I817ab1672b4ead777bd8e12a0445829481c46fdc Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3970 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12129:879f7ad9e246 |
28-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
testers: Refactor some Event subclasses to lambdas
Change-Id: I897b6162a827216b7bad74d955c0e50e06a5a3ec Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3926 Maintainer: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12128:75e1a5bed42e |
27-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
kvm, mem: Refactor some Event subclasses into lambdas
Change-Id: Ifafdcf4692d58a17f90e66ff8de8fa3e146c34bb Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3924 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12127:4207df055b0d |
28-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu: Refactor some Event subclasses to lambdas
Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3923 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12122:20512f6810d7 |
28-Jun-2017 |
Jose Marinho <jose.marinho@arm.com> |
cpu, sim: Add param to force CPUs to wait for GDB
By setting the BaseCPU parameter wait_for_dbg_connection, the GDB server blocks during initialisation waiting for the remote debugger to connect before starting the simulated CPU.
Change-Id: I4d62c68ce9adf69344bccbb44f66e30b33715a1c [ Update info message to include remote GDB port, rename param. ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3963 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12111:ec02ad5ff091 |
24-Apr-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm, arm: Don't forward IRQ/FIQ when using the kernel's GIC
The BaseArmKvmCPU is responsible for forwarding the IRQ and FIQ signals from gem5's simulated GIC to KVM. However, these signals shouldn't be used when the in-kernel GIC emulator is used.
Instead of delivering the interrupts to the guest, we should just ignore them since any such pending interrupts are likely to be an artifact of CPU switching or incorrect draining.
Change-Id: I083b72639384272157f92f44a6606bdf0be7413c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3660 |
12110:c24ee249b8ba |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
arch: ISA parser additions of vector registers
Reiley's update :) of the isa parser definitions. My addition of the vector element operand concept for the ISA parser. Nathanael's modification creating a hierarchy between vector registers and its constituencies to the isa parser.
Some fixes/updates on top to consider instructions as vectors instead of floating when they use the VectorRF. Some counters added to all the models to keep faithful counts.
Change-Id: Id8f162a525240dfd7ba884c5a4d9fa69f4050101 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2706 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12109:f29e9c5418aa |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Added interface for vector reg file
This patch adds some more functionality to the cpu model and the arch to interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts with calls to get/set full vectors, underlying microarchitectural elements or lanes. Those are meant to interface with the vector register file. All classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy numPhysVecRegs parameter values for architectures that do not implement vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues and CC reg free list initialisation ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2705 |
12107:998b4c54ee51 |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Result refactoring
The Result union used to collect the result of an instruction is now a class of its own, with its constructor, and explicit casting methods for cleanliness.
This is also a stepping stone to have vector registers, and instructions that produce a vector register as output.
Change-Id: I6f40c11cb5e835d8b11f7804a4e967aff18025b9 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2703 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12106:7784fac1b159 |
05-Apr-2017 |
Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Simplify the rename interface and use RegId
With the hierarchical RegId there are a lot of functions that are redundant now.
The idea behind the simplification is that instead of having the regId, telling which kind of register read/write/rename/lookup/etc. and then the function panic_if'ing if the regId is not of the appropriate type, we provide an interface that decides what kind of register to read depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2702 |
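The interface idea in revision 12106 (one entry point that dispatches on the register class carried by the identifier) can be sketched in a few lines. The class and method names below are hypothetical stand-ins, not gem5's C++ API.

```python
from dataclasses import dataclass
from enum import Enum


class RegClass(Enum):
    INT = "int"
    FLOAT = "float"


@dataclass(frozen=True)
class RegId:
    """Structural register identifier: a class plus an index in that class."""
    reg_class: RegClass
    index: int


class RegFile:
    """One read/write pair; the bank is selected by the RegId's class."""

    def __init__(self, num_int, num_float):
        self._banks = {
            RegClass.INT: [0] * num_int,
            RegClass.FLOAT: [0.0] * num_float,
        }

    def read(self, reg):
        return self._banks[reg.reg_class][reg.index]

    def write(self, reg, value):
        self._banks[reg.reg_class][reg.index] = value
```

With this shape, callers no longer pick a per-class function and rely on a runtime check to catch a mismatched identifier.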
12105:742d80361989 |
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
cpu: Physical register structural + flat indexing
Mimic the changes done on the architectural register indexes on the physical register indexes. This is specific to the O3 model. The structure, called PhysRegId, contains a register class, a register index and a flat register index. The flat register index is kept because it is useful in some cases where the type of register is not important (dependency graph and scoreboard for example). Instead of directly using the structure, most of the code is working with a const PhysRegId* (typedef to PhysRegIdPtr). The actual PhysRegId objects are stored in the regFile.
Change-Id: Ic879a3cc608aa2f34e2168280faac1846de77667 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2701 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
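The structure described in revision 12105 can be modelled roughly as below. This is a sketch under assumptions: the field names and the consecutive flat numbering across classes are illustrative choices, and the scoreboard is a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysRegId:
    reg_class: str    # register class, e.g. "int" or "float"
    index: int        # index within that class
    flat_index: int   # index across all classes


def build_phys_regs(sizes):
    """sizes maps class name to register count, in allocation order.
    Flat indices are assigned consecutively across classes (assumed)."""
    regs = []
    for reg_class, count in sizes.items():
        for index in range(count):
            regs.append(PhysRegId(reg_class, index, len(regs)))
    return regs


regs = build_phys_regs({"int": 3, "float": 2})

# A class-agnostic structure such as a scoreboard only needs the flat index.
scoreboard = [False] * len(regs)
first_float = next(r for r in regs if r.reg_class == "float")
scoreboard[first_float.flat_index] = True
```

The per-class index stays available for register-file access, while the flat index serves structures that do not care about the register type.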
12104:edd63f9c6184 |
05-Apr-2017 |
Nathanael Premillieu <nathanael.premillieu@arm.com> |
arch, cpu: Architectural Register structural indexing
Replace the unified register mapping with a structure associating a class and an index. It is now much easier to know which class of register the index is referring to. Also, when adding a new class there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2700 |
12085:de78ea63e0ca |
07-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu, gpu-compute: Replace EventWrapper use with EventFunctionWrapper
Change-Id: Idd5992463bcf9154f823b82461070d1f1842cea3 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3746 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
12022:256a709054f3 |
09-Apr-2017 |
Alec Roelke <ar4jc@virginia.edu> |
cpu: fix problem with forwarding and locked load
If a (regular) store is followed closely enough by a locked load that overlaps, the LSQ will forward the store's data to the locked load and never tell the cache about the locked load. As a result, the cache will not lock the address and all future store-conditional requests on that address will fail. This patch fixes that by preventing forwarding if the memory request is a locked load and adding another case to the LSQ forwarding logic that delays the locked load request if a store in the LSQ contains all or part of the data that is requested.
[Merge second and last if blocks because their bodies are the same.]
Change-Id: I895cc2b9570035267bdf6ae3fdc8a09049969841 Reviewed-on: https://gem5-review.googlesource.com/2400 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
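The forwarding rule from revision 12022 can be approximated by a small decision function. This is a simplified sketch with hypothetical names: it assumes stores are given as (address, size) pairs, and it treats a partial overlap for a regular load as a delay, which is a modelling choice rather than something the commit message states.

```python
def overlaps(a_addr, a_size, b_addr, b_size):
    """True if the two byte ranges share at least one byte."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def covers(s_addr, s_size, l_addr, l_size):
    """True if the store range contains the whole load range."""
    return s_addr <= l_addr and l_addr + l_size <= s_addr + s_size


def load_action(load_addr, load_size, is_locked, older_stores):
    """Return 'forward', 'delay', or 'send_to_cache' for a load."""
    for s_addr, s_size in older_stores:
        if not overlaps(s_addr, s_size, load_addr, load_size):
            continue
        if is_locked:
            # Never forward to a locked load: the cache must see the
            # request so that it can lock the address.
            return "delay"
        if covers(s_addr, s_size, load_addr, load_size):
            return "forward"
        return "delay"  # partial overlap (assumed handling)
    return "send_to_cache"
```

The key point is the early return for locked loads, which guarantees that the request eventually reaches the cache instead of being satisfied from the store queue.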
11988:665cd5f8b52b |
27-Feb-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Use PyBind11 instead of SWIG for Python wrappers
Use the PyBind11 wrapping infrastructure instead of SWIG to generate wrappers for functionality that needs to be exported to Python. This has several benefits:
* PyBind11 can be redistributed with gem5, which means that we have full control of the version used. This avoids a large number of hard-to-debug SWIG issues we have seen in the past.
* PyBind11 doesn't rely on a custom C++ parser, instead it relies on wrappers being explicitly declared in C++. This leads to slightly more boiler-plate code in manually created wrappers, but doesn't increase the overall code size. A big benefit is that this avoids strange compilation errors when SWIG doesn't understand modern language features.
* Unlike SWIG, there is no risk that the wrapper code incorporates incorrect type casts (this has happened on numerous occasions in the past) since these will result in compile-time errors.
As a part of this change, the mechanism to define exported methods has been redesigned slightly. New methods can be exported either by declaring them in the SimObject declaration and decorating them with the cxxMethod decorator or by adding an instance of PyBindMethod/PyBindProperty to the cxx_exports class variable. The decorator has the added benefit of making it possible to add a docstring and naming the method's parameters.
The new wrappers have the following known issues:
* Global events can't be memory managed correctly. This was the case in SWIG as well.
Change-Id: I88c5a95b6cf6c32fa9e1ad31dfc08b2e8199a763 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Andrew Bardsley <andrew.bardsley@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2231 Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Pierre-Yves Péneau <pierre-yves.peneau@lirmm.fr> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
11943:0a924b294735 |
27-Jan-2017 |
Curtis Dunham <Curtis.Dunham@arm.com> |
arm, kvm: implement GIC state transfer
This also allows checkpointing of a Kvm GIC via the Pl390 model.
Change-Id: Ic85d81cfefad630617491b732398f5e6a5f34c0b Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2444 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Weiping Liao <weipingliao@google.com> |
11918:38a88569ba4d |
16-Aug-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Print progress messages in Trace CPU
This change adds the ability to print a message at intervals of committed instruction count to indicate progress in the trace replay.
Change-Id: I8363502354c42bfc52936d2627986598b63a5797 Reviewed-by: Rekai Gonzalez Alberquilla <rekai.gonzalezalberquilla@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2321 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
11886:43b882cada33 |
27-Feb-2017 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [PATCH 15/22] add clone/execve for threading and multiprocess simulations
Modifies the clone system call and adds execve system call. Requires allowing processes to steal thread contexts from other processes in the same system object and the ability to detach pieces of process state (such as MemState) to allow dynamic sharing. |
11877:5ea85692a53e |
20-Jul-2015 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [patch 13/22] add system call retry capability
This changeset adds functionality that allows system calls to retry without affecting thread context state such as the program counter or register values for the associated thread context (when system calls return with a retry fault).
This functionality is needed to solve problems with blocking system calls in multi-process or multi-threaded simulations where information is passed between processes/threads. Blocking system calls can cause deadlock because the simulator itself is single threaded. There is only a single thread servicing the event queue which can cause deadlock if the thread hits a blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer sharing model. The processes can use file descriptors and the read and write calls to pass information to one another. If the consumer calls the blocking read system call before the producer has produced anything, the call will block the event queue (while executing the system call instruction) and deadlock the simulation.
The solution implemented in this changeset is to recognize that the system calls will block and then generate a special retry fault. The fault will be sent back up through the function call chain until it is exposed to the cpu model's pipeline where the fault becomes visible. The fault will trigger the cpu model to replay the instruction at a future tick where the call has a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a non-blocking poll (from inside the system call implementation) and checking for events. When events show up during the poll, it signifies that the call would not have blocked and the syscall is allowed to proceed (calling an underlying host system call if necessary). If no events are returned from the poll, we generate the fault and try the instruction for the thread context at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event queue, but it is not used by default and needs work. Even if the event queue were completely multi-threaded, meaning that there is a hardware thread on the host servicing a single simulator thread context with a 1:1 mapping between them, it's still possible to run into deadlock due to the event queue barriers on quantum boundaries. The solution of replaying at a later tick is the simplest solution and solves the problem generally. |
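The retry mechanism described in revision 11877 can be illustrated with a toy single-threaded event loop. This is a sketch only: the `RETRY` marker, the tick arithmetic, and the function names are hypothetical, and a deque stands in for the file descriptor shared by producer and consumer.

```python
from collections import deque

RETRY = object()  # stand-in for the retry fault


def try_read(pipe):
    """Non-blocking read: return data if present, else the RETRY marker."""
    return pipe.popleft() if pipe else RETRY


def simulate(producer_tick, retry_interval, max_tick=1000):
    """Run one producer and one consumer on a single event loop.
    Returns (tick of successful read, number of attempts, data)."""
    pipe = deque()
    next_consumer_tick = 0
    attempts = 0
    for tick in range(max_tick + 1):
        if tick == producer_tick:
            pipe.append("data")
        if tick == next_consumer_tick:
            attempts += 1
            result = try_read(pipe)
            if result is not RETRY:
                return tick, attempts, result
            # Replay the "instruction" later instead of blocking the loop.
            next_consumer_tick = tick + retry_interval
    raise RuntimeError("consumer never completed")
```

Because the consumer never blocks, the loop keeps advancing and the producer gets its turn; a blocking read at tick 0 would have stalled the only thread servicing events.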
11839:dd6df2e47c14 |
14-Feb-2017 |
Curtis Dunham <Curtis.Dunham@arm.com> |
sim, kvm: make KvmVM a System parameter
A KVM VM is typically a child of the System object already, but for solving future issues with configuration graph resolution, the most logical way to keep track of this object is for it to be an actual parameter of the System object.
Change-Id: I965ded22203ff8667db9ca02de0042ff1c772220 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11800:54436a1784dc |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 3/22] reduce include dependencies in some headers
Used cppclean to help identify useless includes and removed them. This involved erroneously included headers, but also cases where forward declarations could have been used rather than a full include. |
11793:ef606668d247 |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes |
11787:af41594e9b3c |
02-Jan-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Remove redundant export_method_cxx_predecls
The headers declared in export_method_cxx_predecls are redundant since a SimObject's main header is automatically included.
Change-Id: Ied9e84630b36960e54efe91d16f8c66fba7e0da0 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-by: Joe Gross <joseph.gross@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
11784:00fd5dce5e7e |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: implement an L-TAGE branch predictor
This patch implements an L-TAGE predictor, based on André Seznec's code available from CBP-2 (http://hpca23.cse.tamu.edu/taco/camino/cbp2/cbp-src/realistic-seznec.h).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11783:f94c14fd6561 |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: disallow speculative update of branch predictor tables (o3)
The Minor and o3 cpu models share the branch prediction code. Minor relies on the BPredUnit::squash() function to update the branch predictor tables on a branch misprediction. This is fine because Minor executes in-order, so the update is on the correct path. However, this causes the branch predictor to be updated on out-of-order branch mispredictions when using the o3 model, which should not be the case.
This patch guards against speculative update of the branch prediction tables. On a branch misprediction, BPredUnit::squash() calls BPredUnit::update(..., squashed = true). The underlying branch predictor tests against the value of squashed. If it is true, it restores any speculatively updated internal state it might have (e.g., global/local branch history), then returns. If false, it updates its prediction tables. Previously, existing predictors did not test against the "squashed" parameter.
To accommodate this change, the Minor model must now call BPredUnit::squash() then BPredUnit::update(..., squashed = false) on branch mispredictions. Before, calling BPredUnit::squash() performed the prediction tables update.
The effect is a slight MPKI improvement when using the o3 model. A further patch should perform the same modifications for the indirect target predictor and BTB (less critical).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
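The `squashed` guard from revision 11783 can be sketched with a toy predictor. This is an illustrative model, not gem5's BPredUnit: the class name, the single 2-bit counter, and the way history is repaired with the actual outcome are all assumptions.

```python
class ToyPredictor:
    def __init__(self):
        self.counter = 1   # 2-bit saturating counter, starts weakly not-taken
        self.history = 0   # speculatively updated global history (8 bits)

    def predict(self):
        """Predict and speculatively push the prediction into the history.
        Returns the prediction and the history to restore on a squash."""
        taken = self.counter >= 2
        saved_history = self.history
        self.history = ((self.history << 1) | int(taken)) & 0xFF
        return taken, saved_history

    def update(self, taken, saved_history, squashed):
        if squashed:
            # Misprediction path: repair speculative state only and
            # leave the prediction table untouched.
            self.history = ((saved_history << 1) | int(taken)) & 0xFF
            return
        # Non-speculative path: train the prediction table.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
```

In this model an in-order caller would follow a squash with `update(..., squashed=False)` to train the table, which mirrors the extra call the commit message describes for Minor.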
11782:c2e1ead33662 |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: correct comments in tournament branch predictor
The tournament predictor is presented as doing speculative update of the global history and non-speculative update of the local history used to generate the branch prediction. However, the code does speculative update of both histories.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11781:1ae84c76066b |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: Resolve targets of predicted 'taken' decode for O3
The target of taken conditional direct branches does not need to be resolved in IEW: the target can be computed at decode, usually using the decoded instruction word and the PC.
The higher-than-necessary penalty is taken only on conditional branches that are predicted taken but miss in the BTB. Thus, this is mostly inconsequential on IPC if the BTB is big/associative enough (fewer capacity/conflict misses). Nonetheless, what gem5 simulates is not representative of how conditional branch targets can be handled.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11780:9af039ea0c1e |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3
cachePorts currently constrains the number of store packets written to the D-Cache each cycle, but loads currently affect this variable. This leads to unexpected congestion (e.g., setting cachePorts to a realistic 1 will in fact allow a store to WB only if no loads have accessed the D-Cache this cycle). In the absence of arbitration, this patch decouples how many loads can be done per cycle from how many stores can be done per cycle.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
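The congestion described in revision 11780 can be reproduced with a toy per-cycle budget. This is a sketch under assumptions: both classes are hypothetical, and the decoupled variant simply leaves loads unconstrained by the store budget, which may differ from how the patch limits loads.

```python
class SharedPorts:
    """Behaviour described as problematic: loads and stores share one budget."""

    def __init__(self, cache_ports):
        self.cache_ports = cache_ports
        self.used = 0

    def access(self, kind):
        if self.used >= self.cache_ports:
            return False
        self.used += 1
        return True


class StoreOnlyPorts:
    """Decoupled behaviour (assumed): only stores consume the store budget."""

    def __init__(self, cache_ports):
        self.cache_ports = cache_ports
        self.stores_used = 0

    def access(self, kind):
        if kind == "load":
            return True  # loads are not charged against this budget
        if self.stores_used >= self.cache_ports:
            return False
        self.stores_used += 1
        return True


def run_cycle(ports, accesses):
    """Return which accesses were accepted in one cycle, in order."""
    return [ports.access(kind) for kind in accesses]


shared_result = run_cycle(SharedPorts(1), ["load", "store"])
split_result = run_cycle(StoreOnlyPorts(1), ["load", "store"])
```

With a budget of 1, the shared model rejects the store once a load has used the port, while the decoupled model lets the store write back in the same cycle.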
11743:68aac9f965e3 |
05-Dec-2016 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu: Change traffic generators to use different values for writes
Previously all traffic generators would use the same value for write requests. With this change traffic generators use their master id as the payload of write requests making them more useful for the memchecker.
Change-Id: Id1a6b8f02853789b108ef6003f4c32ab929bb123 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
11723:0596db108c53 |
30-Nov-2016 |
Alec Roelke <ar4jc@virginia.edu> |
arch: [Patch 1/5] Added RISC-V base instruction set RV64I
First of five patches adding RISC-V to GEM5. This patch introduces the base 64-bit ISA (RV64I) in src/arch/riscv for use with syscall emulation. The multiply, floating point, and atomic memory instructions will be added in additional patches, as well as support for more detailed CPU models. The loader is also modified to be able to parse RISC-V ELF files, and a "Hello world!" example for RISC-V is added to test-progs.
Patch 2 will implement the multiply extension, RV64M; patch 3 will implement the floating point (single- and double-precision) extensions, RV64FD; patch 4 will implement the atomic memory instructions, RV64A, and patch 5 will add support for timing, minor, and detailed CPU models that is missing from the first four patches (such as handling locked memory).
[Removed several unused parameters and imports from RiscvInterrupts.py, RiscvISA.py, and RiscvSystem.py.] [Fixed copyright information in RISC-V files copied from elsewhere that had ARM licenses attached.] [Reorganized instruction definitions in decoder.isa so that they are sorted by opcode in preparation for the addition of ISA extensions M, A, F, D.] [Fixed formatting of several files, removed some variables and instructions that were missed when moving them to other patches, fixed RISC-V Foundation copyright attribution, and fixed history of files copied from other architectures using hg copy.] [Fixed indentation of switch cases in isa.cc.] [Reorganized syscall descriptions in linux/process.cc to remove large number of repeated unimplemented system calls and added implementations to functions that have received them since process.cc was first created.] [Fixed spacing for some copyright attributions.] [Replaced the rest of the file copies using hg copy.] [Fixed style check errors and corrected unaligned memory accesses.] [Fix some minor formatting mistakes.] Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com> |
11721:b0853929e223 |
30-Nov-2016 |
Jason Lowe-Power <jason@lowepower.com> |
cpu: Remove branch predictor function predictInOrder
This function was used by the now-defunct InOrderCPU model. Since this model is no longer in gem5, this function was not called from anywhere in the code. |
11683:f1e198a028be |
15-Oct-2016 |
Fernando Endo <fernando.endo2@gmail.com> |
cpu, arm: Distinguish Float* and SimdFloat*, create FloatMem* opClass
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which distinguishes writes to the INT and FP register banks. Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72, where the "latency" of FMADD is 3 if the next instruction is a FMADD and has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11661:2bc3962f59fe |
06-Oct-2016 |
Tushar Krishna <tushar@ece.gatech.edu> |
ruby: rename networktest to garnet_synthetic_traffic
networktest is essentially a collection of synthetic traffic patterns for the network. The protocol name and the tester having the same name led to multiple python configuration files with the same name, adding confusion. This patch renames networktest to garnet_synthetic_traffic, and also adds more synthetic traffic patterns. |
11650:fe601d7bd955 |
22-Sep-2016 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the O3 CPU Drain
The drain did not wait until stages were ready again. Therefore, as a result of messages in the TimeBuffer being drained, the state after the drain was not consistent and asserts fired in some places when the draining happened after a stage got blocked, but before the notification arrived to the previous stages.
Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11633:40c951e58c2b |
15-Sep-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Support exit when any one Trace CPU completes replay
This change adds a Trace CPU param to exit simulation early, i.e. when the first (any one) trace execution is complete. With this change the user gets a choice to configure exit as either when the last CPU finishes (default) or first CPU finishes replay. Configuring an early exit enables simulating and measuring stats strictly when memory-system resources are being stressed by all Trace CPUs.
Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
11632:a96d6787b385 |
15-Sep-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Adjust for trace offset and fix stats
This change subtracts the time offset present in the trace from all the event times when nodes and requests are sent so that the replay starts immediately when the simulation starts. This makes the stats accurate when the time offset in traces is large, for example when traces are generated in the middle of a workload execution. It also solves the problem of unnecessary DRAM refresh events that would keep occurring during the large time offset before even a single request is replayed into the system.
Change-Id: Ie0898842615def867ffd5c219948386d952af7f7 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
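The offset adjustment in revision 11632 amounts to rebasing event times. A minimal sketch follows, assuming the offset is the earliest tick in the trace and using a hypothetical helper name.

```python
def rebase_ticks(event_ticks):
    """Shift all event times so that the earliest one lands at tick 0."""
    if not event_ticks:
        return []
    offset = min(event_ticks)  # assumed definition of the trace offset
    return [tick - offset for tick in event_ticks]
```

For a trace captured mid-workload, the relative spacing between events is preserved while the idle period before the first event disappears.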
11631:6d147afa8fc6 |
15-Sep-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
cpu: Add frequency scaling to the Trace CPU
This change adds a simple feature to scale the frequency of the Trace CPU.
The compute delays in the input traces provide timing. This change adds a frequency multiplier parameter to the Trace CPU set to 1.0 by default. The compute delay is manipulated to effectively achieve the frequency at which the nodes become ready and thus scale the frequency of the Trace CPU.
Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> |
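One plausible reading of the scaling in revision 11631 is that each compute delay is divided by the multiplier. The sketch below shows that reading only; the function name and the rounding to whole ticks are assumptions, and the commit message does not spell out the exact arithmetic.

```python
def scale_comp_delay(comp_delay_ticks, freq_multiplier=1.0):
    """Scale a trace compute delay for a CPU running freq_multiplier
    times faster than the one that produced the trace (assumed rule)."""
    if freq_multiplier <= 0:
        raise ValueError("freq_multiplier must be positive")
    return round(comp_delay_ticks / freq_multiplier)
```

With the default multiplier of 1.0 the delays are unchanged, so existing traces replay with their recorded timing.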
11629:22f08c96bf7f |
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
kvm: Support timing accesses for KVM cpu
This patch enables timing accesses for KVM cpu. A new state, RunningMMIOPending, is added to indicate that there are outstanding timing requests generated by KVM in the system. KVM's tick() is disabled and the simulation does not enter into KVM until all outstanding timing requests have completed. The main motivation for this is to allow KVM CPU to perform MMIO in Ruby, since Ruby does not support atomic accesses. |
11627:fe32a5238754 |
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of quiesceTick() and quiesce() functions on ThreadContext. This patch also gets rid of the FS guards, since suspending a CPU is also a valid operation for SE mode. |
11614:29606f000389 |
22-Aug-2016 |
David Hashe <david.j.hashe@gmail.com> |
cpu, mem, sim: Change how KVM maps memory
Only map memories into the KVM guest address space that are marked as usable by KVM. Create BackingStoreEntry class containing flags for is_conf_reported, in_addr_map, and kvm_map. |
11612:985d9b9a68bf |
14-Aug-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Add missing override in Minor's exec context
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11611:818913b8ce80 |
14-Aug-2016 |
Reiley Jeapaul <Reiley.Jeyapaul@arm.com> |
cpu: Fixed clang errors. Added 'override' keyword for virtual functions.
Change-Id: Ic37311443ca11ee6d95bceffea599e054e7aa110 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11608:6319a1125f1c |
14-Aug-2016 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu, arch: fix the type used for the request flags
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11568:91e95eb78191 |
21-Jul-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix Minor SMT WFI/drain interaction issues
The behavior of WFI is to cause minor to cease evaluating pipeline logic until an interrupt is observed, however a user may wish to drain the system while a core is sleeping due to a WFI. This patch makes WFI drain. If an actual drain occurs during a WFI, the CPU is already drained and will immediately be ready for swapping, checkpointing, etc. This should not negatively impact performance as WFI instructions are 'stream-changing' (treated like unpredicted branches), so all remaining instructions are wrong-path and will be squashed rapidly.
Change-Id: I63833d5acb53d8dde78f9f0c9611de0ece385e45 |
11567:560d7fbbddd1 |
21-Jul-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add SMT support to MinorCPU
This patch adds SMT support to the MinorCPU. Currently RoundRobin or Random thread scheduling are supported.
Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3 |
11540:582b379f6d4f |
20-Jun-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
mem: Resolve TrafficGen trace relative to the config
The traffic generator currently resolves relative trace paths relative to gem5's current working directory. This can lead to surprising results for relative paths where the expectation would normally be that they are resolved relative to the configuration file. This changeset implements config-relative trace file lookups. The old behavior is kept as a fallback for configs that expect that behavior.
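The lookup order described above (config-relative first, working directory as fallback) might look like the sketch below. The function and its parameters are hypothetical and are not the TrafficGen API; the `exists` hook is only there so the logic can be exercised without touching the file system.

```python
import os

# Hypothetical helper, not part of gem5.
def resolve_trace_path(trace_path, config_dir, cwd, exists=os.path.exists):
    """Resolve a trace path relative to the config, falling back to the cwd."""
    if os.path.isabs(trace_path):
        return trace_path
    candidate = os.path.join(config_dir, trace_path)
    if exists(candidate):
        return candidate
    # Old behaviour, kept as a fallback for configs that rely on it.
    return os.path.join(cwd, trace_path)
```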
Change-Id: I1bda4e16725842666ffc37dcb6838c23a6ff138c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> |
11526:5b81895e5d5e |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
pwr: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b |
11523:81332eb10367 |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
stats: Fixing regStats function for some SimObjects
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!).
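The failure mode can be illustrated with a small sketch. The classes and stat names below are hypothetical stand-ins rather than gem5 code: if the derived class did not call the parent's method, the base-class stat would never be registered.

```python
# Hypothetical classes illustrating the pattern, not gem5's SimObject.
class BaseObject:
    def __init__(self):
        self.stats = []

    def reg_stats(self):
        # Stats added to the base class are registered here.
        self.stats.append("base.generic")


class DerivedCpu(BaseObject):
    def reg_stats(self):
        super().reg_stats()  # the call that was missing in many objects
        self.stats.append("cpu.numCycles")
```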
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11522:348411ec525a |
06-Jun-2016 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
sim: Call regStats of base-class as well
We want to extend the stats of objects hierarchically and thus it is necessary to register the statistics of the base-class(es), as well. For now, these are empty, but generic stats will be added there.
Patch originally provided by Akash Bagdia at ARM Ltd. |
11499:16ceeed96e1c |
27-May-2016 |
Ilias Vougioukas <Ilias.Vougioukas@ARM.com> |
cpu: fix lastStopped unserialisation
MinorCPU fix for corrupt numCycles when resuming from a previous simulation. --- src/cpu/minor/cpu.cc | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) |
11491:6ffc99023568 |
26-May-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Add a basic progress check to the TrafficGen
This patch adds a progress check to the TrafficGen so that it is easier to detect deadlock scenarios where the generator gets stuck waiting for a retry, and makes no further progress.
Change-Id: Ifb8779ad0939f52c0518d0e867bac73f99b82e2b Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com> |
11435:0f1b46dde3fa |
07-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
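Because ContextIDs are allocated sequentially per thread, the thread can be recovered as an offset from the CPU's base ContextID. A minimal sketch, with hypothetical names:

```python
# Hypothetical helper; assumes sequential ContextIDs per CPU as described above.
def thread_from_context(context_id, base_context_id, num_threads):
    """Recover the thread id from a ContextID and the CPU's base ContextID."""
    tid = context_id - base_context_id
    if not 0 <= tid < num_threads:
        raise ValueError("ContextID does not belong to this CPU")
    return tid
```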
This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit. |
11434:b5aed9d2d54e |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Implement per-thread GHRs
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so.
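As a rough illustration of per-thread indexing, the sketch below keeps one history register per thread so that one thread's branch outcomes do not pollute another's. The class is hypothetical and much simpler than a real predictor.

```python
# Hypothetical sketch of per-thread global history registers.
class PerThreadGHR:
    def __init__(self, num_threads, history_bits):
        self.mask = (1 << history_bits) - 1
        self.ghr = [0] * num_threads  # one history register per thread

    def update(self, tid, taken):
        """Shift the outcome of a branch into the history of thread `tid` only."""
        self.ghr[tid] = ((self.ghr[tid] << 1) | int(taken)) & self.mask
```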
This is a re-spin of fb51231 after the revert (bd1c6789). |
11433:72b075cdc336 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add an indirect branch target predictor
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209
This is a re-spin of fb9d142 after the revert (bd1c6789). |
11432:4209ec56e923 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix BTB threading oversight
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in one thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch. |
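To illustrate the BTB fix above: folding the thread id into the index lets the same branch PC map to different entries for different threads. The XOR hash and shift amount below are assumptions made for this sketch and are not taken from the patch.

```python
# Hypothetical sketch; the shift amount and the XOR hash are assumptions.
def btb_index(pc, tid, num_entries, inst_shift=2):
    """Pick a BTB entry from the branch PC and the thread id."""
    if num_entries & (num_entries - 1) != 0:
        raise ValueError("num_entries must be a power of two")
    return ((pc >> inst_shift) ^ tid) & (num_entries - 1)
```

Without the thread id in the hash, every thread would compete for one entry and the thread id check would turn all but one lookup into a miss.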
11430:bd1c6789c33f |
07-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert to 74c1e6513bd0 (sim: Thermal support for Linux) |
11429:cf5af0cc3be4 |
06-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11428:20264eb69fbf |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu. |
11427:fb512311295e |
05-Apr-2016 |
Curtis Dunham <Curtis.Dunham@arm.com> |
cpu: Implement per-thread GHRs
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so. |
11426:fb9d14204674 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add an indirect branch target predictor
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209 |
11425:e24d92c62860 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix BTB threading oversight
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in one thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch. |
11423:831c7f2f9e39 |
09-Dec-2014 |
Akash Bagdia <akash.bagdia@ARM.com> |
power: Low-power idle power state for idle CPUs
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle. |
11422:4f749e00b667 |
18-Nov-2014 |
Akash Bagdia <akash.bagdia@ARM.com> |
power: Add power states to ClockedObject
Add 4 power states to the ClockedObject and provide the necessary access functions to check and update the power state. The default power state is UNDEFINED; it is the responsibility of the respective simulation model to provide the startup state and any other logic for state change.
Add a stat for the number of transitions. Add a distribution of the time spent in the clock-gated state. Add a power state residency stat.
Add dump call back function to allow stats update of distribution and residency stats. |
11419:9c7b55faea5d |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add instruction opclass histogram to minor |
11415:d6c8016a9a03 |
05-Apr-2016 |
Geoffrey Blake <Geoffrey.Blake@arm.com> |
cpu: Query CPU for inst executed from Python
This patch adds the ability for the simulator to query the number of instructions a CPU has executed so far per hw-thread. This can be used to enable more flexible periodic events such as taking checkpoints starting 1s into simulation and X instructions thereafter. |
11399:3f805b5c48ae |
30-Mar-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm: Add an option to force context sync on kvm entry/exit
This changeset adds an option to force the kvm-based CPUs to always synchronize the gem5 thread context representation on entry/exit into the kernel. This is very useful for debugging. Unfortunately, it is also the only way to get reliable register contents when using remote gdb functionality. The long-term solution for the latter would be to implement a kvm-specific thread context.
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-by: Alexandru Dutu <alexandru.dutu@amd.com> |
11393:48b748cc6497 |
20-Mar-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: warn if TrafficGen is suppressing a large number of packets
Add a basic warning for every 10000 suppressed packets to alert the user. |
11365:83c3e117464e |
05-May-2015 |
Rekai Gonzalez Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
cpu: Change literal integer constants to meaningful labels
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those FUs are busy at the moment" when requesting for a FU and replying. This patch introduces new constants NoCapableFU and NoFreeFU respectively.
In addition, the condition (idx == -2 || idx != -1) is equivalent to (idx != -1), so this patch also simplifies that. |
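The simplification in the fu_pool change above can be checked exhaustively over a small range. The constant values -1 and -2 come from the commit message; the Python names are illustrative stand-ins for NoCapableFU and NoFreeFU.

```python
NO_CAPABLE_FU = -1  # "no such FU"
NO_FREE_FU = -2     # "all those FUs are busy at the moment"

def old_condition(idx):
    return idx == -2 or idx != -1

def new_condition(idx):
    return idx != NO_CAPABLE_FU

# idx == -2 already implies idx != -1, so the first clause is redundant.
equivalent = all(old_condition(i) == new_condition(i) for i in range(-10, 11))
```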
11363:f3f72c0ab03e |
27-Nov-2015 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Shutdown KVM and disconnect performance counters on fork
We can't/shouldn't use KVM after a fork since the child and parent probably point to the same VM. Knowing the exact effects of this is hard, but they are likely to be messy. We also disconnect the performance counters attached to the guest. This works around what seems to be a kernel bug where spurious SIGIOs get delivered to the forked child process.
Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> [andreas.sandberg@arm.com: Fatal if entering KVM in child process ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11359:b0b976a1ceda |
27-Nov-2015 |
Andreas Sandberg <andreas@sandberg.pp.se> |
base: Add support for changing output directories
This changeset adds support for changing the simulator output directory. This can be useful when the simulation goes through several stages (e.g., a warming phase, a simulation phase, and a verification phase) since it allows the output from each stage to be located in a different directory. Relocation is done by calling core.setOutputDir() from Python or simout.setOutputDirectory() from C++.
This change affects several parts of the design of the gem5's output subsystem. First, files returned by an OutputDirectory instance (e.g., simout) are of the type OutputStream instead of a std::ostream. This allows us to do some more book keeping and control re-opening of files when the output directory is changed. Second, new subdirectories are OutputDirectory instances, which should be used to create files in that sub-directory.
Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11357:6668387fa488 |
10-Aug-2015 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
mem, cpu: Add assertions to snoop invalidation logic
This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades(). |
11356:a80884911971 |
19-Jul-2015 |
Krishnendra Nathella <krinat01@arm.com> |
cpu: Fix LLSC atomic CPU wakeup
Writes to locked memory addresses (LLSC) did not wake up the locking CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop was checking if the incoming packet was an invalidation (isInvalidate) and only then handled a locked snoop. But writes are seen instead of invalidates when running without caches (fast-forward configurations). As a simple fix, handleLockedSnoop is now also called when the incoming snoop packet is a write.
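The changed condition amounts to the predicate below. This is a sketch with hypothetical names; the real check operates on packet objects inside recvAtomicSnoop.

```python
# Hypothetical predicate mirroring the fix described above.
def should_handle_locked_snoop(is_invalidate, is_write):
    """Before the fix only invalidations qualified; writes now qualify too."""
    return is_invalidate or is_write
```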
11351:bfc1285c61eb |
24-Feb-2016 |
Matteo Andreozzi <Matteo.Andreozzi@arm.com> |
cpu: TraceGen fix for tick frequency check
Bug fix for check on protobuf file frequency being different than global frequency.
The ASCII encoder script is also fixed, and the example trace used in the regressions is updated. |
11347:faf5195f6ca7 |
23-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Add missing override to appease clang
Make clang happy...again. |
11341:bda2c39fd9fd |
15-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Add missing overrides to appease clang
Since the last round of fixes a few new issues have snuck in. We should consider switching the regression runs to clang. |
11331:cd5c48db28e6 |
10-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be forwarded from the memory side to the CPU side. Instead of having a parameter, the cache now looks at the port connected on the CPU side, and if it is a snooping port, then snoops are forwarded. This is less error prone, and there are fewer parameters to worry about.
The patch also tidies up the CPU classes to ensure that their I-side port is not snooping by removing overrides to the snoop request handler, such that snoop requests will panic via the default MasterPort implement |
11325:67cc559d513a |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: eliminate explicit boolean comparisons
Result of running 'hg m5style --skip-all --fix-control -a' to get rid of '== true' comparisons, plus trivial manual edits to get rid of '== false'/'== False' comparisons.
Left a couple of explicit comparisons in where they didn't seem unreasonable: invalid boolean comparison in src/arch/mips/interrupts.cc:155 >> DPRINTF(Interrupt, "Interrupts OnCpuTimerINterrupt(tc) == true\n");<< invalid boolean comparison in src/unittest/unittest.hh:110 >> "EXPECT_FALSE(" #expr ")", (expr) == false)<< |
11321:02e930db812d |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'. |
11320:42ecb523c64a |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: remove trailing whitespace
Result of running 'hg m5style --skip-all --fix-white -a'. |
11303:f694764d656d |
17-Jan-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu, arch: add initiateMemRead() to ExecContext interface
For historical reasons, the ExecContext interface had a single function, readMem(), that did two different things depending on whether the ExecContext supported atomic memory mode (i.e., AtomicSimpleCPU) or timing memory mode (all the other models). In the former case, it actually performed a memory read; in the latter case, it merely initiated a read access, and the read completion did not happen until later when a response packet arrived from the memory system.
This led to some confusing things, including timing accesses being required to provide a pointer for the return data even though that pointer was only used in atomic mode.
This patch splits this interface, adding a new initiateMemRead() function to the ExecContext interface to replace the timing-mode use of readMem().
For consistency and clarity, the readMemTiming() helper function in the ISA definitions is renamed to initiateMemRead() as well. For x86, where the access size is passed in explicitly, we can also get rid of the data parameter at this level. For other ISAs, where the access size is determined from the type of the data parameter, we have to keep the parameter for that purpose. |
11302:bce9037689b0 |
17-Jan-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the data doesn't arrive until the access completes and a response packet is received from the memory system. Thus there's no need to provide a data pointer; its existence is historical.
Getting this pointer out of this internal o3 interface sets the stage for similar cleanup in the ExecContext interface. Also found that we were pointlessly setting the contents at this pointer on a store forward (the useful memcpy happens just a few lines below the deleted one). |
11294:a368064a2ab5 |
11-Jan-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Enable -Wextra by default
Make best use of the compiler, and enable -Wextra as well as -Wall. There are a few issues that had to be resolved, but they are all trivial. |
11284:b3926db25371 |
31-Dec-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. |
11266:452e10b868ea |
20-Jul-2015 |
Brad Beckmann <Brad.Beckmann@amd.com> |
ruby: more flexible ruby tester support
This patch allows the ruby random tester to use ruby ports that may only support instr or data requests. This patch is similar to a previous changeset (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets. This current patch implements the support in a more straight-forward way. Since retries are now tested when running the ruby random tester, this patch splits up the retry and drain check behavior so that RubyPort children, such as the GPUCoalescer, can perform those operations correctly without having to duplicate code. Finally, the patch also includes better DPRINTFs for debugging the tester. |
11253:daf9f91b11e9 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Support virtual addr in elastic traces
This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay. |
11252:18bb597fc40c |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Create record type enum for elastic traces
This patch replaces the booleans that specified the elastic trace record type with an enum type. The source of change is the proto message for elastic trace where the enum is introduced. The struct definitions in the elastic trace probe listener as well as the Trace CPU replace the booleans with the proto message enum.
The patch does not impact functionality, but traces are not compatible with previous version. This is preparation for adding new types of records in subsequent patches. |
11249:0733a1c08600 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
cpu: Add TraceCPU to playback elastic traces
This patch defines a TraceCPU that replays traces generated using the elastic trace probe attached to the O3 CPU model. The elastic trace is an execution trace annotated with data dependencies and ordering dependencies. The TraceCPU also replays the fixed-timestamp instruction fetch trace that is generated by the elastic trace probe.
The TraceCPU inherits from BaseCPU as a result of which some methods need to be defined. It has two port subclasses inherited from MasterPort for instruction and data ports. It issues the memory requests deducing the timing from the trace and without performing real execution of micro-ops. As soon as the last dependency for an instruction is complete, its computational delay, also provided in the input trace is added. The dependency-free nodes are maintained in a list, called 'ReadyList', ordered by ready time. Instructions which depend on load stall until the responses for read requests are received thus achieving elastic replay. If the dependency is not found when adding a new node, it is assumed complete. Thus, if this node is found to be completely dependency-free its issue time is calculated and it is added to the ready list immediately. This is encapsulated in the subclass ElasticDataGen.
If ready nodes are issued in an unconstrained way there can be more nodes outstanding which results in divergence in timing compared to the O3CPU. Therefore, the Trace CPU also models hardware resources. A sub-class to model hardware resources is added which contains the maximum sizes of load buffer, store buffer and ROB. If resources are not available, the node is not issued. The 'depFreeQueue' structure holds nodes that are pending issue.
The ROB size is arguably the most important of the resource limits modelled in the Trace CPU. The ROB occupancy is estimated using the newly added field 'robNum'. The ROB number is used rather than the sequence number because the sequence number is at times much higher due to squashing, and trace replay is focused on correct-path modelling.
A map called 'inFlightNodes' is added to track nodes that are in the readyList as well as load nodes that have been executed (and thus removed from the readyList) but are not yet complete. The readyList determines which node to execute next and when, while inFlightNodes is used for resource modelling. The oldest ROB number is updated when any node occupies the ROB or when an entry in the ROB is released. The ROB occupancy is equal to the difference between the ROB number of the newly dependency-free node and the oldest ROB number in flight.
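The occupancy estimate can be written down directly. The sketch below uses hypothetical names, and the strict comparison against the ROB size is an assumption rather than something stated in the commit message.

```python
# Hypothetical sketch of the ROB occupancy estimate described above.
def rob_occupancy(new_rob_num, in_flight_rob_nums):
    """Difference between the new node's ROB number and the oldest one in flight."""
    if not in_flight_rob_nums:
        return 0
    return new_rob_num - min(in_flight_rob_nums)

def rob_has_space(new_rob_num, in_flight_rob_nums, rob_size):
    # Assumed policy: issue only while the estimated occupancy is below the ROB size.
    return rob_occupancy(new_rob_num, in_flight_rob_nums) < rob_size
```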
If no node depends on a non load/store node then there is no reason to track it in the dependency graph. We filter out such nodes but count them and add a weight field to the subsequent node that we do include in the trace. The weight field is used to model ROB occupancy during replay.
The depFreeQueue is chosen to be FIFO so that child nodes which are in program order get pushed into it in that order and thus issued in program order, like in the O3CPU. This is also why the container of dependents is changed to a sequential one, from std::set to std::vector. We only check the head of the depFreeQueue as nodes are issued in order, and blocking on the head models that better than looping over the entire queue. An alternative choice would be to inspect the top N pending nodes, where N is the issue width. This is left for the future as the timing correlation looks good as it is.
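Blocking on the head of a FIFO can be sketched as below. The function and the `has_resources` callback are hypothetical; the point is only that a blocked head node stops younger nodes from issuing.

```python
from collections import deque

# Hypothetical sketch of in-order issue from a FIFO of dependency-free nodes.
def issue_pending(dep_free_queue, has_resources):
    """Issue nodes from the head of the queue until the head cannot issue."""
    issued = []
    while dep_free_queue and has_resources(dep_free_queue[0]):
        issued.append(dep_free_queue.popleft())
    return issued
```

In this model node 3 would stay queued behind a blocked node 2 even if resources for node 3 were available.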
At the start of an execution event, first we attempt to issue such pending nodes by checking if appropriate resources have become available. If yes, we compute the execute tick with respect to the time then. Then we proceed to complete nodes from the readyList.
When a read response is received, sometimes a dependency on it that was supposed to be released when it was issued is still not released. This occurs because the dependent gets added to the graph after the read was sent. So the check is made less strict and the dependency is marked complete on read response instead of insisting that it should have been removed on read sent.
There is a check for requests spanning two cache lines as this condition triggers an assert fail in the L1 cache. If it does then truncate the size to access only until the end of that line and ignore the remainder. Strictly-ordered requests are skipped and the dependencies on such requests are handled by simply marking them complete immediately.
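The truncation of requests that span two cache lines can be sketched as follows. The function name and the 64-byte default line size are assumptions for illustration.

```python
# Hypothetical sketch; the default line size is an assumption.
def truncate_to_line(addr, size, line_size=64):
    """Clip a request so it does not extend past the end of its cache line."""
    line_end = (addr // line_size + 1) * line_size
    return min(size, line_end - addr)
```

An 8-byte access at address 60 is clipped to the 4 bytes left in its line, and the remainder is ignored.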
The simulated seconds can be calculated as the difference between the final_tick stat and the tickOffset stat. A CountedExitEvent that contains a static int belonging to the Trace CPU class as a down counter is used to implement multi Trace CPU simulation exit. |
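The simulated-time calculation for the TraceCPU above is a simple difference of two stats. In the sketch below the function name is hypothetical and the default of 10**12 ticks per second is an assumption about the simulation frequency; adjust it to match the actual configuration.

```python
# Hypothetical helper; ticks_per_second is an assumed simulation frequency.
def simulated_seconds(final_tick, tick_offset, ticks_per_second=10**12):
    """Simulated time of the replay: (final_tick - tickOffset) in seconds."""
    return (final_tick - tick_offset) / ticks_per_second
```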
11247:76f75db08e09 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
proto, probe: Add elastic trace probe to o3 cpu
The elastic trace is a type of probe listener and listens to probe points in multiple stages of the O3CPU. The notify method is called on a probe point typically when an instruction successfully progresses through that stage.
As different listener methods mapped to the different probe points execute, relevant information about the instruction, e.g. timestamps and register accesses, is captured and stored in temporary InstExecInfo class objects. When the instruction progresses through the commit stage, the timing and the dependency information about the instruction is finalised and encapsulated in a struct called TraceInfo. TraceInfo objects are collected in a list instead of writing them out to the trace file one at a time. This is required as the trace is processed in chunks to evaluate order dependencies and computational delay in case an instruction does not have any register dependencies. By this we achieve a simpler algorithm during replay because every record in the trace can be hooked onto a record in its past. The instruction dependency trace is written out as a protobuf format file. A second trace containing fetch requests at absolute timestamps is written to a separate protobuf format file.
If the instruction is not executed then it is not added to the trace. The code checks if the instruction had a fault, if it was predicated false and thus previous register values were restored, or if it was a load/store that did not have a request (e.g. when the size of the request is zero). In all these cases the instruction is set as executed by the Execute stage and is picked up by the commit probe listener. But a request is not issued and registers are not written. So practically, skipping these should not hurt the dependency modelling.
If squashing results in squashing younger instructions, it may happen that the squash probe discards the inst and removes it from the temporary store but execute stage deals with the instruction in the next cycle which results in the execute probe seeing this inst as 'new' inst. A sequence number of the last processed trace record is used to trap these cases and not add to the temporary store.
The elastic instruction trace and fetch request trace can be read in and played back by the TraceCPU. |
11246:93d2a1526103 |
07-Dec-2015 |
Radhika Jagtap <radhika.jagtap@ARM.com> |
probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.
A probe point is added in the Fetch stage for probing when a fetch request is sent. Notify is fired on the probe point when a request is sent successfully in the first attempt as well as on a retry attempt.
Probe points are added in the IEW stage when an instruction begins to execute and when execution is complete. These points can be used for monitoring the execution time of an instruction.
Probe points are added in the Rename stage to probe renaming of source and destination registers and when there is squashing. These probe points can be used to track register dependencies and remove when there is squashing.
A probe point for squashing is added in Commit to probe squashed instructions. |
11243:f876d08c7b21 |
04-Dec-2015 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: fix uninitialized variable which may cause assertion failure
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to NULL (I got the assertion failure several times without the fix).
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
11225:9bc552f9e4b0 |
22-Nov-2015 |
Nathanael Premillieu <nathananel.premillieu@arm.com> |
cpu: Fix base FP and CC register index in o3 insertThread()
Note that the method is not used, and could possibly be deleted. |
11222:c6461e8dfc0a |
22-Nov-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix memory leak in traffic generator
In cases where we discard the packet, make sure to also delete it and the associated request. |
11221:2fb745f69681 |
20-Nov-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Enforce 1 interrupt controller per thread
Consider it a fatal configuration error if the number of interrupt controllers doesn't match the number of threads in an SMT configuration. |
11213:f0c7b76cadab |
16-Nov-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3: drop unused statistic wbPenalized and wbPenalizedRate |
11169:44b5c183c3cd |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication.
As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). |
11168:f98eb2da15a4 |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. |
11165:d90aec9435bd |
09-Oct-2015 |
Rekai Gonzalez Alberquilla <Rekai.GonzalezAlberquilla@arm.com> |
isa: Add parameter to pick different decoder inside ISA
The decoder is responsible for splitting instructions into micro operations (uops). Given that different micro architectures may split operations differently, this patch allows specifying which micro architecture each isa implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done by making the decoding calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time. |
11162:63d53fd63269 |
06-Oct-2015 |
Steve Reinhardt <steve.reinhardt@amd.com> |
sim: add ExecMacro to Exec* compound debug flags
Really should have been there in the first place, IMO. Makes debugging x86 execution a lot easier. |
11153:20bbfe5b2b86 |
30-Sep-2015 |
Curtis Dunham <Curtis.Dunham@arm.com> |
base: remove Trace::enabled flag
The DTRACE() macro tests both Trace::enabled and the specific flag. This change uses the same administrative interface for enabling/disabling tracing, but masks the SimpleFlags settings directly. This eliminates a load for every DTRACE() test, e.g. DPRINTF. |
11151:ca4ea9b5c052 |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu,isa,mem: Add per-thread wakeup logic
Changes wakeup functionality so that only specific threads on SMT capable cpus are woken. |
11150:a8a64cca231b |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
isa,cpu: Add support for FS SMT Interrupts
Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems. |
11148:1bc3d93c7eaa |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add per-thread monitors
Adds per-thread address monitors to support FullSystem SMT. |
11147:cc8d6e99cf46 |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
config,cpu: Add SMT support to Atomic and Timing CPUs
Adds SMT support to the "simple" CPU models so that they can be used with other SMT-supported CPUs. Example usage: this enables the TimingSimpleCPU to be used to warm up caches before swapping to detailed mode with the in-order or out-of-order based CPU models. |
11146:0fd6976303bc |
30-Sep-2015 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Change thread assignments for heterogenous SMT
Trying to run an SE system with varying threads per core (SMT cores + Non-SMT cores) caused failures due to the CPU id assignment logic. The comment about thread assignment (worrying about core 0 not having tid 0) seems not to be valid given that our configuration scripts initialize them in order.
This removes that constraint so a heterogeneously threaded system can work. |
11098:8e96720a382c |
15-Sep-2015 |
Andrew Lukefahr <lukefahr@umich.edu> |
cpu: pred: Local Predictor Reset in Tournament Predictor
When a branch gets squashed, its speculative branch predictor state should get rolled back in squash(). However, only the globalHistory state was being rolled back. This patch adds (at least some) support for rolling back the local predictor state also.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
11097:da477ae38907 |
15-Sep-2015 |
Hongil Yoon <ongal@cs.wisc.edu> |
cpu, o3: consider split requests for LSQ checksnoop operations
This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions.
The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
11061:25b53a7195f7 |
29-Aug-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: eliminate type uint64 and int64
These types are being replaced with uint64_t and int64_t. |
11056:842f56345a42 |
21-Aug-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Reflect that packet address and size are always valid
This patch simplifies the packet, and removes the possibility of creating a packet without a valid address and/or size. Under no circumstances are these fields set at a later point, and thus they really have to be provided at construction time.
The patch also fixes a case where the MinorCPU creates a packet without a valid address and size, only to later delete it. |
11050:65fc1db5d795 |
21-Aug-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Move invldPid constant from Request to BaseCPU
A more natural home for this constant. |
11049:dfb0aa3f0649 |
19-Aug-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: reverts to changeset: bf82f1f7b040 |
11031:3815437cb231 |
14-Aug-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: eliminate type uint64 and int64
These types are being replaced with uint64_t and int64_t. |
11025:4872dbdea907 |
14-Aug-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: replace Address by Addr
This patch eliminates the type Address defined by the ruby memory system. This memory system would now use the type Addr that is in use by the rest of the system. |
11017:6ec228f6c143 |
11-Aug-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: drop some redundant includes |
11005:e7f403b6b76f |
07-Aug-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
base: Declare a type for context IDs
Context IDs used to be declared ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs. |
10960:b51a2a09ac7d |
20-Jul-2015 |
David Hashe <david.hashe@amd.com> |
cpu: Fixed a bug on where to fetch the next instruction from
Figure out if the next instruction to fetch comes from the micro-op ROM or not. Otherwise, wrong instructions may be fetched. |
10950:d262e02c26b3 |
31-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Update debug message from Fetch1 isDrained() in Minor
Fix a spurious %s and include the state of the Fetch1 stage in the debug printout. |
10949:7fc527ab626a |
31-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Fix Minor drain issues when switched out
The Minor CPU currently doesn't drain properly when it is switched out. This happens because Fetch 1 expects to be in the FetchHalted state when it is drained. However, because the CPU is switched out, it is stuck in the FetchWaitingForPC state. Fix this by ignoring drain requests and returning DrainState::Drained from MinorCPU::drain() if the CPU is switched out. This is always safe since a switched out CPU, by definition, doesn't have any instructions in flight. |
10946:6f10e35b57d1 |
30-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Only activate thread 0 in Minor if the CPU is active
Minor currently activates thread 0 in startup() to work around an issue where activateContext() is called from LiveProcess before the process entry point is known. When activateContext() is called, Minor creates a branch instruction to the process's entry point. The first time it is called, the branch points to an undefined location (0). The call in startup() updates the branch to point to the actual entry point.
When instantiating a switched out Minor CPU, it still tries to activate thread 0. This is clearly incorrect since a switched out CPU can't have any active threads. This changeset adds a check to ensure that the thread is active before reactivating it. |
10945:369861e3d5af |
30-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
cpu: Fix drain issues in the Minor CPU
The drain refactor patches introduced a couple of bugs in the way Minor handles draining. This patch fixes an incorrect assert and a case of infinite recursion when the CPU signals drain done. |
10936:93890720a932 |
30-Jul-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix issue identified by UBSan |
10935:acd48ddd725f |
28-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
revert 5af8f40d8f2c |
10934:5af8f40d8f2c |
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: implements vector registers
This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now. |
10933:e1309937d313 |
26-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: slight correction to indentation in rename_impl.hh |
10920:58fbfddff18d |
10-Jul-2015 |
Brandon Potter <brandon.potter@amd.com> |
ruby: replace global g_abs_controls with per-RubySystem var
This is another step in the process of removing global variables from Ruby to enable multiple RubySystem instances in a single simulation.
The list of abstract controllers is per-RubySystem and should be represented that way, rather than as a global.
Since this is the last remaining Ruby global variable, the src/mem/ruby/Common/Global.* files are also removed. |
10913:38dbdeea7f1f |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining.
This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error.
Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects. |
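Note: the protocol described in this changeset can be sketched roughly as follows. This is a simplified standalone model and not the gem5 implementation; ToyDrainManager, ToyDrainable, ToyQueue, requestDrain() and the inFlight counter are hypothetical names introduced for illustration. Only the DrainState values, drain() taking no parameters, and signalDrainDone() come from the message above.

```cpp
#include <cassert>

// The three states named in the commit message.
enum class DrainState { Running, Draining, Drained };

// Toy manager: only counts how many objects are still draining.
struct ToyDrainManager
{
    int pending = 0;
};

class ToyDrainable
{
  public:
    explicit ToyDrainable(ToyDrainManager &m) : mgr(m) {}
    virtual ~ToyDrainable() = default;

    // Entry point used by the "simulator": asks the object to drain and
    // records the state it reports.
    DrainState requestDrain()
    {
        state = drain();
        assert(state != DrainState::Running); // Running is an error here
        if (state == DrainState::Draining)
            ++mgr.pending;
        return state;
    }

    DrainState drainState() const { return state; }

  protected:
    // No parameters, returns a DrainState rather than an integer.
    virtual DrainState drain() = 0;

    // Safe to call at any time: the manager is only notified if a drain
    // was actually requested and is still in progress.
    void signalDrainDone()
    {
        if (state != DrainState::Draining)
            return;
        state = DrainState::Drained;
        --mgr.pending;
    }

  private:
    ToyDrainManager &mgr;
    DrainState state = DrainState::Running;
};

// A simple object that is drained once nothing is in flight.
class ToyQueue : public ToyDrainable
{
  public:
    using ToyDrainable::ToyDrainable;

    int inFlight = 0;

    void complete()
    {
        if (inFlight > 0 && --inFlight == 0)
            signalDrainDone(); // no need to check for a pending drain
    }

  protected:
    DrainState drain() override
    {
        return inFlight == 0 ? DrainState::Drained : DrainState::Draining;
    }
};
```

In this sketch the object never has to ask whether a drain is pending before calling signalDrainDone(), which is the simplification for simple objects that the message refers to.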
10910:32f3d1c454ec |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario. |
10905:a6ca6831e775 |
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor the serialization base class
Objects that can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section.
* Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections).
* The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects.
* Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. |
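Note: the scoped-section idea from the list above can be illustrated with a small RAII sketch. This is not the gem5 ScopedCheckpointSection; ToyCheckpoint, ToyScopedSection, the dotted path format and the "[section] key=value" output are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy checkpoint writer that tracks the current (possibly nested) section.
class ToyCheckpoint
{
  public:
    // Fully qualified name of the current section, e.g. "cpu.tlb".
    std::string path() const
    {
        std::string p;
        for (const auto &s : sections) {
            if (!p.empty())
                p += '.';
            p += s;
        }
        return p;
    }

    // Record a key/value pair inside the current section.
    void write(const std::string &key, const std::string &value)
    {
        lines.push_back("[" + path() + "] " + key + "=" + value);
    }

    std::vector<std::string> lines;

  private:
    friend class ToyScopedSection;
    std::vector<std::string> sections;
};

// Enters a subsection on construction and leaves it on destruction, so
// all writes during the helper's lifetime land inside that subsection.
class ToyScopedSection
{
  public:
    ToyScopedSection(ToyCheckpoint &c, const std::string &name) : cp(c)
    {
        cp.sections.push_back(name);
    }
    ~ToyScopedSection() { cp.sections.pop_back(); }

    ToyScopedSection(const ToyScopedSection &) = delete;
    ToyScopedSection &operator=(const ToyScopedSection &) = delete;

  private:
    ToyCheckpoint &cp;
};

// Small self-check: nested helpers produce nested section names.
inline void checkScopedSection()
{
    ToyCheckpoint cp;
    {
        ToyScopedSection outer(cp, "cpu");
        ToyScopedSection inner(cp, "tlb");
        assert(cp.path() == "cpu.tlb");
    }
    assert(cp.path().empty());
}
```

Because the section is closed in the destructor, an early return cannot leave the writer in the wrong section, which is the main reason to prefer this over manual section name generation.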
10897:a90d22342aa5 |
04-Jul-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3: correct the number of cc registers in rename map |
10860:cba0f26038b4 |
01-Jun-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm, arm: Add support for aarch64
This changeset adds support for aarch64 in kvm. The CPU module supports both checkpointing and online CPU model switching as long as no devices are simulated by the host kernel. It currently has the following limitations:
* The system register based generic timer can only be simulated by the host kernel. Workaround: Use a memory mapped timer instead to simulate the timer in gem5.
* Simulating devices (e.g., the generic timer) in the host kernel requires that the host kernel also simulates the GIC.
* ID registers in the host and in gem5 must match for switching between simulated CPUs and KVM. This is particularly important for ID registers describing memory system capabilities (e.g., ASID size, physical address size).
* Switching between a virtualized CPU and a simulated CPU is currently not supported if in-kernel device emulation is used. This could be worked around by adding support for switching to the gem5 (e.g., the KvmGic) side of the device models. A simpler workaround is to avoid in-kernel device models altogether. |
10859:0ba6f47025d1 |
01-Jun-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm, arm, dev: Add an in-kernel GIC implementation
This changeset adds a GIC implementation that uses the kernel's built-in support for simulating the interrupt controller. Since there is currently no support for state transfer between gem5 and the kernel, the device model does not support serialization and CPU switching (which would require switching to a gem5-simulated GIC). |
10858:6734ec272816 |
01-Jun-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm: Handle inst events at the current instruction count
There are cases (particularly when attaching GDB) when instruction events are scheduled at the current instruction tick. This used to trigger an assertion error in kvm. This changeset adds a check for this condition and forces KVM to do a quick entry that completes any pending IO operations, but does not execute any new instructions, before servicing the event. We could check if we need to enter KVM at all, but forcing a quick entry makes the code slightly cleaner and does not hurt correctness (performance is hardly an issue in these cases). |
10857:d2d5212578db |
01-Jun-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm, arm: Move ARM-specific files to arch/arm/kvm/
This changeset moves the ARM-specific KVM CPU implementation to arch/arm/kvm/. This change is expected to keep the source tree somewhat cleaner as we start adding support for ARMv8 and KVM in-kernel interrupt controller simulation. |
10851:a50657a4f0c1 |
26-May-2015 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix a bug in counting issued instructions in MinorCPU
The MinorCPU would count bubbles in Execute::issue as part of the num_insts_issued and so sometimes reach the instruction issue limit incorrectly.
Fixed by checking for a bubble in one new place. |
10843:adec1cf1c300 |
23-May-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm: Fix dumping code for large registers
The register dumping code in kvm tries to print the bytes in large registers (128 bits and larger) instead of printing them as hex. This changeset fixes that. |
10842:0c0506575409 |
23-May-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm, x86: Guard x86-specific APIs in KvmVM
Protect x86-specific APIs in KvmVM with compile-time guards to avoid breaking ARM builds. |
10835:d4b162a57400 |
15-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Appease gcc 5.1
Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by bitwise AND.
2. Somehow the compiler also gets confused and warns about NoopMachInst being unused (removing it causes compilation errors though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes false positives for the array-bounds check. For now, switch to std::array. Potentially we could disable the warning for newer gcc versions, but switching to std::array is probably a good move in any case. |
10824:308771bd2647 |
05-May-2015 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:
* Speculation: CPU models are not allowed to issue speculative loads.
* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.
Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior. |
10821:581fb2484bd6 |
05-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Snoop into caches on uncacheable accesses
This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block. |
10814:46b6043bd32c |
05-May-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Work around gcc 4.9 issues with Num_OpClasses
This patch fixes a recent issue with gcc 4.9 (and possibly more) being convinced that indices outside the array bounds are used when initialising the FUPool members. |
10807:dac26eb4cb64 |
29-Apr-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: replace issueLatency with bool pipelined
Currently, each op class has a parameter issueLat that denotes the cycles after which another op of the same class can be issued. As of now, this latency can either be one cycle (fully pipelined) or same as execution latency of the op (not at all pipelined). The fact that issueLat is a parameter of type Cycles makes one believe that it can be set to any value. To avoid the confusion, the parameter is being renamed as 'pipelined' with type boolean. If set to true, the op would execute in a fully pipelined fashion. Otherwise, it would execute in an unpipelined fashion. |
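Note: the effect of the boolean can be sketched as below. This is an illustrative model only; ToyOpDesc, nextIssueCycle() and the plain integer Cycles alias are hypothetical and do not reflect the actual o3 functional unit code.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a cycle count; an assumption for this sketch.
using Cycles = std::uint64_t;

// Hypothetical per-op-class description: execution latency plus the
// boolean that replaces the old issue latency parameter.
struct ToyOpDesc
{
    Cycles opLat;
    bool pipelined;
};

// Earliest cycle at which another op of the same class can issue:
// one cycle later if fully pipelined, otherwise only after the full
// execution latency has elapsed.
inline Cycles nextIssueCycle(const ToyOpDesc &op, Cycles issuedAt)
{
    return issuedAt + (op.pipelined ? Cycles(1) : op.opLat);
}

// Small self-check covering both settings.
inline void checkNextIssueCycle()
{
    assert(nextIssueCycle(ToyOpDesc{4, true}, 10) == 11);
    assert(nextIssueCycle(ToyOpDesc{4, false}, 10) == 14);
}
```

The two branches correspond to the only two values the old issue latency could meaningfully take, which is why a boolean is sufficient.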
10806:b9410e821c41 |
29-Apr-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: single cycle default div microop latency on x86
This patch sets the default latency of the division microop to a single cycle on x86. This is because the division instructions DIV and IDIV have been implemented as loops of div microops, where each microop computes a single bit of the quotient. |
10797:855cafd64da1 |
22-Apr-2015 |
Brandon Potter <brandon.potter@amd.com> |
cpu: remove conditional check (count > 0) on o3 IQ squashes
The o3 cpu instruction queue model uses the count variable to track the number of unissued instructions in the queue. Previously, the squash method used this variable to avoid executing the doSquash method when there were no unissued instructions in the pipeline. A corner case problem exists when only issued instructions exist in the pipeline and a squash occurs; the doSquash code is not invoked and subsequently does not clean up state properly. |
10790:378b344385a8 |
20-Apr-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Remove the InOrderCPU from the tree
This patch takes the final step in removing the InOrderCPU from the tree. Rest in peace.
The MinorCPU is now used to model an in-order microarchitecture, and long term the MinorCPU will eventually be renamed InOrderCPU. |
10786:ee82c2c30421 |
14-Apr-2015 |
Malek Musleh <malek.musleh@gmail.com> |
config, cpu: fix progress interval for switched CPUs
This patch ensures that the CPU progress Event is triggered for the new set of switched_cpus that get scheduled (e.g. during fast-forwarding). It also avoids printing the interval state if the cpu is currently switched out.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10785:f56c10663a01 |
13-Apr-2015 |
Dibakar Gope <gope@wisc.edu> |
cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10774:68d688cbe26c |
03-Apr-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
cpu: fix system total instructions accounting
The totalInstructions counter is only incremented when the whole instruction is committed and not on every microop. It was incorrectly reset in atomic and timing cpus.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10761:c7e392e343eb |
26-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix InstPBTrace inheritance
This patch fixes an issue that prevented gem5 from being built with C++ config and without Python. |
10760:8f5993cfa916 |
23-Mar-2015 |
Steve Reinhardt <steve.reinhardt@amd.com> |
mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW
Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now. |
10746:2e65cd110a97 |
19-Mar-2015 |
Wendy Elsasser <wendy.elsasser@arm.com> |
cpu: Fix TrafficGen message format
Fix erroneous message format for fatal error. Previously, code did not have type indicator (% instead of %d).
Also removed redundant fatal check.
Ran modified sweep.py with in range and out of range values to test. |
10739:4cfe55719da5 |
11-Feb-2015 |
Steve Reinhardt <steve.reinhardt@amd.com> |
mem: restructure Packet cmd initialization a bit more
Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects.
Also replaced the code in the Minor model that was still doing it the ad-hoc way.
This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249. |
10734:cbed6a2cbc35 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: another assert instead of check |
10733:705aca3c1240 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: Remove unused code in iew, add assert instead. |
10732:60482901c996 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: commit: mark pipeline delay variable as consts |
10731:17c5d36dfdac |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove unused stat variables. |
10730:11cb85883e6a |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: combine if with same condition |
10729:41c93a3c1051 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove member variable squashCounter
The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call. |
10728:0fd6a08a7332 |
09-Mar-2015 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove unused function annotateMemoryUnits() |
10720:67b3e74de9ae |
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Move crossbar default latencies to subclasses
This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios.
Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden. |
10717:4f8c1bd6fdb8 |
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
arm: Share a port for the two table walker objects
This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent.
By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests. |
10715:ced453290507 |
02-Mar-2015 |
Rekai <Rekai.GonzalezAlberquilla@arm.com> |
cpu: o3 register renaming request handling improved
Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not. |
10713:eddb533708cb |
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing issue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fix the previously seen deadlocks. |
10711:7f67a8d786a2 |
02-Mar-2015 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
cpu: Add a PC-value to the traffic generator requests
Have the traffic generator add its masterID as the PC address to the requests. That way, prefetchers (and other components) that use a PC for request classification will see per-tester streams of requests. This enables us to test strided prefetchers with the memchecker, too. |
10704:63810213a687 |
16-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: TrafficGen sinks snoops without complaining
To be able to use the TrafficGen in a system with caches we need to allow it to sink incoming snoop requests. By default the master port panics, so silently ignore any snoops. |
10698:829adc48e175 |
16-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Make readMiscRegNoEffect const throughout
Finally took the plunge and made this apply to all ISAs, not just ARM. |
10695:ef2c71a5f02e |
16-Feb-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: add support for outputting a protobuf formatted CPU trace
Doesn't support x86 due to static instruction representation. |
10688:22452667fd5c |
11-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Tidy up the MemTest and make false sharing more obvious
The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed.
The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work.
In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing. |
10687:276da6265ab8 |
11-Feb-2015 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
sim: Move the BaseTLB to src/arch/generic/
The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. |
10683:94901e131a7f |
06-Feb-2015 |
Alexandru Dutu <alexandru.dutu@amd.com> |
cpu: Idle CPU status logic revised
This patch sets the CPU status to idle when the last active thread gets suspended. |
10669:aae98c1cf4a0 |
03-Feb-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Ensure timing CPU sinks response before sending new request
This patch changes how the timing CPU deals with processing responses, always scheduling an event, even if it is for the current tick. This helps to avoid situations where a new request shows up before a response is finished in the crossbar, and also is more in line with any realistic behaviour. |
10666:3c42be107634 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
arm: always set the IsFirstMicroop flag
While the IsFirstMicroop flag exists it was only occasionally used in the ARM instructions that gem5 microOps and therefore couldn't be relied on to be correct. |
10665:aef704eaedd2 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
sim: Clean up InstRecord
Track memory size and flags as well as add some comments and consts. |
10664:61a0b02aa800 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Remove all notion that we know when the cpu is misspeculating.
We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do. |
10663:fae54a666162 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Put all CPU instruction tracers in a single file |
10662:c3fd4c020e49 |
25-Jan-2015 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: remove legion tracer
If someone wants to debug with legion again they can restore the code from the repository, but no need to have it hang around indefinitely. |
10653:e3fc6bc7f97e |
22-Jan-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Clean up Request initialisation
This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we called setVirt instead use the appropriate constructor. |
10651:333350e4e334 |
20-Jan-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
cpu: commit probe notification on every microop or macroop
The ppCommit should notify the attached listener every time the cpu commits a microop or non microcoded instruction. The listener can then decide whether it will process only the last microop (eg. SimPoint probe).
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10647:899c0e7e85f1 |
20-Jan-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix retry bug in MinorCPU LSQ |
10643:43e80296995d |
10-Jan-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
cpu: fix RetiredStores probe point
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10634:c5a2c5ef6e68 |
03-Jan-2015 |
Andrew Lukefahr <lukefahr@umich.edu> |
minor: fixed LSQ MasterPortID
Minor was reporting the data cache access as ".inst" accesses. This just switches the MasterPortID to dataMasterPortId.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10605:8fc6e7a835d1 |
10-Dec-2014 |
Gabe Black <gabeblack@google.com> |
Let other objects set up memory like regions in a KVM VM. |
10596:1eec33d2fc52 |
05-Dec-2014 |
Gabe Black <gabeblack@google.com> |
cpu: Only check for PC events on instruction boundaries.
Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing. |
10581:7c4f1d0a8cff |
02-Dec-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix retries on barrier/store in Minor's store buffer
This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is prematurely counted as having been issued, leading to the store buffer idling.
LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed.
For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To achieve this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared.
Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry. |
10580:953d7b741619 |
02-Dec-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix memoryIssueLimit checking in Minor
This patch fixes the checking of the number of memory instructions issued per cycle in the Minor CPU. |
10575:a8d612fa170b |
02-Dec-2014 |
Marco Elver <Marco.Elver@ARM.com> |
cpu, o3: Ignored invalidate causing same-address load reordering
In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response.
If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5.
Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering.
A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed.
Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults. |
10573:3b405d11d6dc |
02-Dec-2014 |
Stephan Diestelhorst <stephan.diestelhorst@arm.com> |
cpu: Move packet deallocation to recvTimingResp in the O3 CPU
Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases. |
10566:c99c8d2a7c31 |
02-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Assume all dynamic packet data is array allocated
This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic was in the Ruby testers.
The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks.
As the last part of the patch, it renames dataDynamicArray to dataDynamic. |
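The "always array allocated" rule above can be illustrated with a minimal sketch. PacketSketch is a hypothetical stand-in, not the gem5 Packet class; only the names dataDynamic and getConstPtr are borrowed from the commit messages, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a packet that may own its payload.
class PacketSketch
{
  public:
    PacketSketch() = default;
    PacketSketch(const PacketSketch &) = delete;
    PacketSketch &operator=(const PacketSketch &) = delete;
    ~PacketSketch() { release(); }

    // Take ownership of a buffer that must have come from new[].
    void dataDynamic(std::uint8_t *p, std::size_t n)
    {
        release();
        data = p;
        size = n;
    }

    // Read-only view of the payload; callers assume it is non-null.
    const std::uint8_t *getConstPtr() const
    {
        assert(data != nullptr);
        return data;
    }

    std::size_t getSize() const { return size; }

  private:
    void release()
    {
        delete[] data;  // always the array form, never plain delete
        data = nullptr;
        size = 0;
    }

    std::uint8_t *data = nullptr;
    std::size_t size = 0;
};
```

With a single ownership convention there is no need for a flag that records which form of delete to use, which is presumably why ARRAY_DATA could be dropped.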
10563:755b18321206 |
02-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add const getters for write packet data
This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used.
The patch also removes the unused isReadWrite function. |
10562:b99fdc295c34 |
02-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Remove null-check bypassing in Packet::getPtr
This patch removes the parameter that enables bypassing the null check in the Packet::getPtr method. A number of call sites assume the value to be non-null.
The one odd case is the RubyTester, which issues zero-sized prefetches(!), and despite being reads they had no valid data pointer. This is now fixed, but the size oddity remains (unless anyone objects or has any good suggestions).
Finally, in the Ruby Sequencer, appropriate checks are made for flush packets as they have no valid data pointer. |
10553:c1ad57c53a36 |
23-Nov-2014 |
Alexandru Dutu <alexandru.dutu@amd.com> |
kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode. |
10537:47fe87b0cf97 |
14-Nov-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arm: Fixes based on UBSan and static analysis
Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base.
Most of the fixes simply ensure proper initialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code. |
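A sign-extension helper that stays in unsigned arithmetic might look like the sketch below. The name signExtend and the restriction to widths below 64 are assumptions made for illustration; this is not the gem5 helper itself.

```cpp
#include <cstdint>

// Sign-extend the low N bits of val using unsigned arithmetic only,
// so no negative value is ever shifted.
template <int N>
inline std::uint64_t
signExtend(std::uint64_t val)
{
    static_assert(N > 0 && N < 64, "N must be in [1, 63]");
    const std::uint64_t sign_bit = std::uint64_t(1) << (N - 1);
    const std::uint64_t mask = (std::uint64_t(1) << N) - 1;
    val &= mask;                         // drop anything above bit N-1
    return (val ^ sign_bit) - sign_bit;  // wraps modulo 2^64, well defined
}
```

Because the result is a uint64_t, later left shifts in calling code operate on an unsigned value and avoid the undefined behaviour the commit mentions.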
10533:d1dce0b728b6 |
12-Nov-2014 |
Ali Saidi <ali.saidi@arm.com> |
arm: Fix timing wakeup with LLSC |
10529:05b5a6cf3521 |
06-Nov-2014 |
Marc Orr <morr@cs.wisc.edu> |
x86 isa: This patch attempts an implementation at mwait.
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction).
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
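The four mwait steps can be sketched as a small state machine. MonitorSketch, its method names, and the 64-byte line size are all assumptions for illustration; the sketch also leaves out the cache fill that the real mwait performs.

```cpp
#include <cstdint>

// Hypothetical model of the monitor/mwait handshake for one cpu.
class MonitorSketch
{
  public:
    // Step 1: arm the monitor on the cache line containing addr.
    void monitor(std::uint64_t addr)
    {
        line = addr & ~(lineBytes - 1);
        armed = true;
    }

    // Steps 2-3: with an armed monitor, the cpu goes to sleep.
    void mwait()
    {
        if (armed)
            asleep = true;
    }

    // Step 4: an eviction of the monitored line wakes the cpu.
    void onEviction(std::uint64_t addr)
    {
        if (armed && (addr & ~(lineBytes - 1)) == line) {
            armed = false;
            asleep = false;
        }
    }

    bool isAsleep() const { return asleep; }

  private:
    static constexpr std::uint64_t lineBytes = 64;  // assumed line size
    std::uint64_t line = 0;
    bool armed = false;
    bool asleep = false;
};
```

Evictions of unrelated lines leave the cpu asleep; only an eviction that falls in the monitored line wakes it.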
10527:d0c2ba70dc12 |
06-Nov-2014 |
Andrew Lukefahr <lukefahr@umich.edu> |
cpu: Minor Draining Bug
Fixes a bug where Minor drains in the midst of committing a conditional store.
While committing a conditional store, lastCommitWasEndOfMacroop is true (from the previous instruction) as we still haven't finished the conditional store. If a drain occurs before the cache response, Minor would check just lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch, which increases the streamSeqNum. This caused the conditional store to be squashed when the memory responded and it completed. However, to the memory the store succeeded, while to the instruction sequence it never occurred.
In the case of an LLSC, the instruction sequence will replay the squashed STREX, which will fail as the cache is no longer in LLSC. Then the instruction sequence will loop back to a LDREX, which receives the updated (incorrect) value.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10511:e57f5bffc553 |
30-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add writeback modeling for drain functionality
It is possible for the O3 CPU to consider itself drained and later have a squashed instruction perform a writeback. This patch re-adds tracking of in-flight instructions to prevent falsely signaling a drained event. |
10510:7e54a9a9f6b2 |
30-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add drain check functionality to IEW
IEW did not check the instQueue and memDepUnit to ensure they were drained. This caused issues when drainSanityCheck() did check those structures after asserting IEW was drained. |
10505:38c7a9ea7729 |
30-Oct-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Add support to checker for CACHE_BLOCK_ZERO commands.
The checker didn't know how to properly validate these new commands. |
10504:58d5d471b598 |
30-Oct-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix barrier push to store buffer when full bug in Minor
This patch fixes a bug where a completing load or store which is also a barrier can push a barrier into the store buffer without first checking that there is a free slot.
The bug was not fatal but would print a warning that the store buffer was full when inserting. |
10487:5914229e6b16 |
20-Oct-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: corrects base FP and CC register index in removeThread() |
10474:799c8ee4ecba |
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared". |
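The replacement of "new" with "make_shared" could look roughly like this. FaultBase, PageFaultSketch and translate are hypothetical names for illustration, and a null pointer is assumed to mean "no fault".

```cpp
#include <cstdint>
#include <memory>
#include <string>

struct FaultBase
{
    virtual ~FaultBase() = default;
    virtual std::string name() const = 0;
};

struct PageFaultSketch : FaultBase
{
    explicit PageFaultSketch(std::uint64_t a) : addr(a) {}
    std::string name() const override { return "page fault"; }
    std::uint64_t addr;
};

using Fault = std::shared_ptr<FaultBase>;

// Before: return new PageFaultSketch(addr);  (ad-hoc ref-counted pointer)
// After:  make_shared allocates the object and its control block together.
inline Fault
translate(std::uint64_t addr, bool mapped)
{
    if (mapped)
        return nullptr;  // assumed convention for "no fault"
    return std::make_shared<PageFaultSketch>(addr);
}
```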
10473:4cbe53150053 |
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
o3: Use shared_ptr for MemDepEntry
This patch transitions the o3 MemDepEntry from the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no changes in behaviour, and the code modifications are mainly replacing "new" with "make_shared". |
10464:2a0fe8bca031 |
16-Oct-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Probe points for basic PMU stats
This changeset adds probe points that can be used to implement PMU counters for CPU stats. The following probes are supported:
* BaseCPU::ppCycles / Cycles
* BaseCPU::ppRetiredInsts / RetiredInsts
* BaseCPU::ppRetiredLoads / RetiredLoads
* BaseCPU::ppRetiredStores / RetiredStores
* BaseCPU::ppRetiredBranches / RetiredBranches |
10462:e975e8afba8b |
16-Oct-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Add branch predictor PMU probe points
This changeset adds probe points that can be used to implement PMU counters for branch predictor stats. The following probes are supported:
* BPRedUnit::ppBranches / Branches
* BPRedUnit::ppMisses / Misses |
10450:933cc91f63e1 |
11-Oct-2014 |
Andrew Lukefahr <lukefahr@umich.edu> |
cpu: Fix o3 SMT IQCount bug
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10426:cba563d00376 |
09-Oct-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Remove Ozone CPU from the source tree
The Ozone CPU is now very much out of date and completely non-functional, with no one actively working on restoring it. It is a source of confusion for new users who attempt to use it before realizing its current state. RIP |
10417:710ee116eb68 |
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer. |
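The cost being avoided can be seen with any reference-counting pointer. The sketch below uses std::shared_ptr as a stand-in for the gem5 pointer type; StaticInstSketch and both function names are hypothetical.

```cpp
#include <memory>

struct StaticInstSketch { int opcode = 0; };
using InstPtr = std::shared_ptr<StaticInstSketch>;

// By value: the copy bumps the reference count on entry and
// drops it again when the parameter is destroyed.
inline long
refsSeenByValue(InstPtr inst)
{
    return inst.use_count();
}

// By const reference: no copy is made, so the count is untouched.
inline long
refsSeenByConstRef(const InstPtr &inst)
{
    return inst.use_count();
}
```

Called on a pointer with a single owner, the by-value version observes a count of two inside the function and the const-reference version observes one.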
10416:dd64a2984966 |
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Address issues related to gcc 4.9.1
Fix a few minor issues to please gcc 4.9.1. Removing the '-fuse-linker-plugin' flag means no libraries are part of the LTO process, but hopefully this is an acceptable loss, as the flag causes issues on a lot of systems (only certain combinations of gcc, ld and ar work). |
10412:6400a2ab4e22 |
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Fix a bunch of minor issues identified by static analysis
Add some missing initialisation, and fix a handful of benign resource leaks (including some false positives). |
10408:a59c189de383 |
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Remove unused deallocateContext calls
The call paths for de-scheduling a thread are halt() and suspend(), from the thread context. There is no call to deallocateContext() in general, though some CPUs chose to define it. This patch removes the function from BaseCPU and the cores which do not require it. |
10407:a9023811bf9e |
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate
activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemingly arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed. |
10405:7a618c07e663 |
20-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus.
As part of this renaming, the code has also been cleaned up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code are due to the delay variables in the packet. |
10392:0100f00a229e |
20-Sep-2014 |
Wendy Elsasser <wendy.elsasser@arm.com> |
cpu: Update DRAM traffic gen
Add new DRAM_ROTATE mode to traffic generator. This mode will generate DRAM traffic that rotates across banks per rank, command types, and ranks per channel.
The looping order is illustrated below:
for (ranks per channel)
    for (command types)
        for (banks per rank)
            // Generate DRAM Command Series
This patch also adds the read percentage as an input argument to the DRAM sweep script. If the simulated read percentage is 0 or 100, the middle for loop does not generate additional commands. This loop is used only when the read percentage is set to 50, in which case the middle loop will toggle between read and write commands.
Modified sweep.py script, which generates DRAM traffic. Added input arguments and support for new DRAM_ROTATE mode. The script now has input arguments for:
1) Read percentage
2) Number of ranks
3) Address mapping
4) Traffic generator mode (DRAM or DRAM_ROTATE)
The default values are: 100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode
For the DRAM traffic mode, added multi-rank support. |
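The DRAM_ROTATE looping order can be sketched as follows. DramCmdSketch and rotateOrder are hypothetical names, the sketch only handles the three read percentages the message discusses (0, 50, 100), and issuing reads before writes when toggling is an assumption.

```cpp
#include <stdexcept>
#include <vector>

struct DramCmdSketch
{
    unsigned rank;
    bool isRead;
    unsigned bank;
};

// Enumerate commands in the order: ranks, then command types, then banks.
inline std::vector<DramCmdSketch>
rotateOrder(unsigned ranksPerChannel, unsigned banksPerRank,
            unsigned readPercent)
{
    if (readPercent != 0 && readPercent != 50 && readPercent != 100)
        throw std::invalid_argument("sketch handles 0, 50 or 100 only");

    // 0 -> writes only, 100 -> reads only, 50 -> toggle between both.
    std::vector<bool> cmdTypes;
    if (readPercent >= 50)
        cmdTypes.push_back(true);   // read
    if (readPercent <= 50)
        cmdTypes.push_back(false);  // write

    std::vector<DramCmdSketch> order;
    for (unsigned rank = 0; rank < ranksPerChannel; ++rank)
        for (bool isRead : cmdTypes)
            for (unsigned bank = 0; bank < banksPerRank; ++bank)
                order.push_back({rank, isRead, bank});
    return order;
}
```

With a read percentage of 0 or 100 the middle loop runs once per rank, so it adds no extra commands; at 50 it doubles the series by visiting every bank once per command type.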
10386:c81407818741 |
20-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
base: Clean up redundant string functions and use C++11
This patch does a bit of housekeeping on the string helper functions and relies on the C++11 standard library where possible. It also does away with our custom string hash as an implementation is already part of the standard library. |
10383:b31580e27d1f |
20-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add ExecFlags debug flag
Adds a debug flag to print out the flags an instruction is tagged with. |
10381:ab8b8601b6ff |
20-Sep-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: use probes infrastructure to do simpoint profiling
Instead of having code embedded in cpu model to do simpoint profiling use the probes infrastructure to do it. |
10379:c00f6d7e2681 |
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unnecessary copies being created, contributing to the increment/decrement of the reference counters. |
10378:a3e23d599e11 |
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Use a deque in o3 rename instruction queue
Switch from a list to a data structure with better data layout. |
10376:28c63d075e0c |
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Use safe_cast when assumptions are made about return value
This patch changes two dynamic_cast to safe_cast as we assume the return value is not NULL (without checking). |
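The idea of a cast that encodes the "never NULL" assumption can be sketched as below. checkedCast is a hypothetical helper written for illustration and is not the gem5 safe_cast implementation; BaseSketch and DerivedSketch are likewise made up.

```cpp
#include <cassert>

// Cast that documents, and in debug builds checks, the assumption
// that the dynamic type really is the requested one.
template <class To, class From>
inline To
checkedCast(From *ptr)
{
    To result = dynamic_cast<To>(ptr);
    assert(result != nullptr);
    return result;
}

struct BaseSketch { virtual ~BaseSketch() = default; };
struct DerivedSketch : BaseSketch { int value = 7; };
```

A bare dynamic_cast whose result is dereferenced without a check fails far from the cause; a helper of this kind fails at the cast site instead.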
10368:a7cb233caa7b |
12-Sep-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix memory access in Minor not setting parent Request flags
This patch fixes cases where uncacheable/memory type flags are not set correctly on a memory op which is split in the LSQ. Without this patch, request->request is freely used to check flags where the flags should actually come from the accumulation of request fragment flags.
This patch also fixes a bug where an uncacheable access which passes through tryToSendRequest more than once can increment LSQ::numAccessesInMemorySystem more than once. |
10367:bf52480abd01 |
12-Sep-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
style: Fix line continuation, especially in debug messages
This patch closes a number of space gaps in debug messages caused by the incorrect use of line continuation within strings. (There's also one consistency change to a similar, but correct, use of line continuation) |
10366:128c1ed03f4e |
12-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
minor: Fix typo in DPRINTF for Minor branch prediction |
10363:c870b43d2ba6 |
09-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Only iterate over possible threads on the o3 cpu
Some places in O3 always iterated over "Impl::MaxThreads" even if a CPU had fewer threads. This removes a few of those instances. |
10360:919c02740209 |
09-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Fix a number of uninitialised variables and members
Static analysis unearthed a bunch of uninitialised variables and members, and this patch addresses the problem. In all cases these omissions seem benign in the end, but at least fixing them means fewer false positives next time round. |
10348:c91b23c72d5e |
03-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
base: Use the global Mersenne twister throughout
This patch tidies up random number generation to ensure that it is done consistently throughout the code base. In essence this involves a clean-up of Ruby, and some code simplifications in the traffic generator.
As part of this patch a bunch of skewed distributions (off-by-one etc) have been fixed.
Note that a single global random number generator is used, and that the object instantiation order will impact the behaviour (the sequence of numbers will be unaffected, but if module A calls random before module B then they would obviously see a different outcome). The dependency on the instantiation order is true in any case due to the execution-model of gem5, so we leave it as is. Also note that the global random generator is not thread safe at this point.
Regressions using the memtest, TrafficGen or any Ruby tester are affected and will be updated accordingly. |
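The note on instantiation order can be illustrated with a shared Mersenne twister. The sketch below is an illustration only: firstDraws is a hypothetical function, and a local std::mt19937_64 stands in for the single global generator.

```cpp
#include <cstdint>
#include <random>
#include <utility>

// Returns {first value seen by module A, first value seen by module B}
// when both draw from one shared generator.
inline std::pair<std::uint64_t, std::uint64_t>
firstDraws(bool aGoesFirst, std::uint64_t seed)
{
    std::mt19937_64 sharedRng(seed);  // stand-in for the global generator
    std::uint64_t a = 0;
    std::uint64_t b = 0;
    if (aGoesFirst) {
        a = sharedRng();
        b = sharedRng();
    } else {
        b = sharedRng();
        a = sharedRng();
    }
    return {a, b};
}
```

The generator emits the same sequence for a given seed either way; swapping the order only swaps which module receives which value.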
10342:711eb0e64249 |
13-May-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
mem: Refactor assignment of Packet types
Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function. |
10340:40d24a672351 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 drain bug
For X86, the o3 CPU would get stuck with the commit stage not being drained if an interrupt arrived while drain was pending. isDrained() makes sure that pcState.microPC() == 0, thus ensuring that we are at an instruction boundary. However, when we take an interrupt we execute:
pcState.upc(romMicroPC(entry));
pcState.nupc(romMicroPC(entry) + 1);
tc->pcState(pcState);
As a result, the MicroPC is no longer zero. This patch ensures the drain is delayed until no interrupts are present. Once draining, non-synchronous interrupts are deferred until after the switch. |
10338:8bee5f4edb92 |
29-Apr-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
arm: use condition code registers for ARM ISA
Analogous to ee049bf (for x86). Requires a bump of the checkpoint version and corresponding upgrader code to move the condition code register values to the new register file. |
10335:1b627a6ddac0 |
03-Sep-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: fix bimodal predictor to use correct global history reg
A small bug in the bimodal predictor caused significant degradation in performance on some benchmarks. This was caused by using the wrong globalHistoryReg during the update phase. This patch fixes the bug and brings the performance to normal level. |
10333:6be8945d226b |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked.
Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that.
Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. |
10332:1ba825974ee6 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 quiesce fetch bug
O3 is supposed to stop fetching instructions once a quiesce is encountered. However due to a bug, it would continue fetching instructions from the current fetch buffer. This is because of a break statement that only broke out of the first of 2 nested loops. It should have broken out of both. |
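The class of bug described above is easy to reproduce in isolation. In this sketch the outer loop is assumed to walk the instructions in the current fetch buffer and the inner loop their micro-ops; the type and function names are hypothetical and the real fetch code is more involved.

```cpp
#include <vector>

// Outer vector: instructions in the current fetch buffer.
// Inner vector: their micro-ops; true marks a quiesce.
using FetchBufferSketch = std::vector<std::vector<bool>>;

// Buggy shape: break leaves only the inner loop, so fetch carries on
// with the next instruction in the same buffer.
inline unsigned
fetchedBuggy(const FetchBufferSketch &buffer)
{
    unsigned fetched = 0;
    for (const auto &inst : buffer) {
        for (bool isQuiesce : inst) {
            ++fetched;
            if (isQuiesce)
                break;
        }
    }
    return fetched;
}

// Fixed shape: a flag carries the stop request out to the outer loop.
inline unsigned
fetchedFixed(const FetchBufferSketch &buffer)
{
    unsigned fetched = 0;
    bool quiesced = false;
    for (const auto &inst : buffer) {
        for (bool isQuiesce : inst) {
            ++fetched;
            if (isQuiesce) {
                quiesced = true;
                break;
            }
        }
        if (quiesced)
            break;
    }
    return fetched;
}
```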
10331:ed05298e8566 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode.
This is because it maintained an independent priority list of threads from the active thread list. This priority list could become stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads.
Additionally the fetch queue is now forcibly flushed of instructions from the de-scheduled thread.
Relevant output:
24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list
24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1] |
10330:f54586c894e3 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix incorrect speculative branch predictor behavior
When a branch mispredicted gem5 would squash all history after and including the mispredicted branch. However, the mispredicted branch is still speculative and its history is required to rollback state if another, older, branch mispredicts. This leads to things like RAS corruption. |
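The corrected squash rule (drop only history younger than the mispredicted branch, keep the branch itself) can be sketched like this. HistoryEntrySketch and squashYoungerThan are hypothetical names, and keeping the youngest entry at the front of the deque is an assumption.

```cpp
#include <cstdint>
#include <deque>

struct HistoryEntrySketch
{
    std::uint64_t seqNum;
    bool predTaken;
};

// Remove entries strictly younger than the mispredicted branch.
// The mispredicted branch's own entry stays, because an older branch
// may still mispredict and need it to roll back state.
inline void
squashYoungerThan(std::deque<HistoryEntrySketch> &history,
                  std::uint64_t mispredictedSeqNum)
{
    while (!history.empty() &&
           history.front().seqNum > mispredictedSeqNum) {
        history.pop_front();
    }
}
```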
10329:12e3be8203a5 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add a fetch queue to the o3 cpu
This patch adds a fetch queue that sits between fetch and decode to the o3 cpu. This effectively decouples fetch from decode stalls allowing it to be more aggressive, running further ahead in the instruction stream. |
10328:867b536a68be |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix o3 front-end pipeline interlock behavior
The o3 pipeline interlock/stall logic is incorrect. o3 unnecessarily stalled fetch and decode due to later stages in the pipeline. In general, a stage should usually only consider if it is stalled by the adjacent, downstream stage. Forcing stalls due to later stages creates bubbles in the pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename) on a branch mispredict while the ROB is being serially walked to update the RAT (robSquashing). It should have stalled only at rename. |
10327:5b6279635c49 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Change writeback modeling for outstanding instructions
As highlighted on the mailing list, gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit(). |
10319:4207f9bfcceb |
03-Sep-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
arch, cpu: Factor out the ExecContext into a proper base class
We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models.
The main argument for using one set of ISA code per CPU model has always been performance as this avoids indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%. |
10309:ccb1801742a1 |
01-Sep-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
mem: change the namespace Message to ProtoMessage
The namespace Message conflicts with the Message data type used extensively in Ruby. Since Ruby is being moved to the same Master/Slave ports based configuration style as the rest of gem5, this conflict needs to be resolved. Hence, the namespace is being renamed to ProtoMessage. |
10302:0e9e99e6369a |
01-Sep-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: eliminate type Time
There is another type Time in src/base class which results in a conflict. |
10281:c7187ee80868 |
13-Aug-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
scons: Build the branch predictor for all CPUs
The branch predictor is normally only built when a CPU that uses a branch predictor is built. The list of CPUs is currently incomplete as the simple CPUs support branch predictors (for warming, branch stats, etc). In practice, all CPU models now use branch predictors, so this changeset removes the CPU model check and replaces it with a check for the NULL ISA. |
10276:4cbfdcdb2144 |
13-Aug-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Don't forward declare RefCountingPtr
RefCountingPtr is sometimes forward declared to avoid having to include refcnt.hh. This does not work since we typically return instances of RefCountingPtr rather than references to instances. The only reason this currently works is that we include refcnt.hh in cprintf.hh, which "leaks" the header to most other source files. This changeset replaces such forward declarations with an include of refcnt.hh. |
10273:6e6557085eb7 |
13-Aug-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Modernise the branch predictor (STL and C++11)
This patch does some minor housekeeping of the branch predictor by adopting STL containers, and shifting some iterators to use range-based for loops.
The predictor history is also changed from a list to a deque as we never do insertion/deletion other than at the front and back. |
10266:d4090f0cab30 |
10-Aug-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Ensure the traffic generator suppresses non-memory packets
This patch adds a check to ensure that packets which are not going to a memory range are suppressed in the traffic generator. Thus, if a trace is collected in full-system, the packets destined for devices are not played back. |
10259:ebb376f73dd2 |
23-Jul-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark      | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt)  | simple | o3     | minor
               | timing | timing | timing
---------------+--------+--------+--------
10.linux-boot  |    169 |   1883 |   1075
10.mcf         |    117 |    967 |    491
20.parser      |    668 |   6315 |   3146
30.eon         |    542 |   3413 |   2414
40.perlbmk     |   2339 |  20905 |  11532
50.vortex      |    122 |   1094 |    588
60.bzip2       |   2045 |  18061 |   9662
70.twolf       |    207 |   2736 |   1036 |
10244:d2deb51a4abf |
30-Jun-2014 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: implement a bi-mode branch predictor |
10240:15f822e9410a |
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: make dispatch LSQ full check more selective
Dispatch should not check LSQ size/LSQ stall for non load/store instructions.
This work was done while Binh was an intern at AMD Research. |
10239:592f0bb6bd6f |
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa.
This work was done while Binh was an intern at AMD Research. |
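The difference between a combined and a split queue-full check can be sketched as below. LsqStateSketch and both function names are hypothetical; the real rename and dispatch stages track more state than free-entry counts.

```cpp
struct LsqStateSketch
{
    unsigned freeLoadEntries;
    unsigned freeStoreEntries;
};

// Combined check: either queue being full stalls every instruction.
inline bool
stallsCombined(const LsqStateSketch &lsq)
{
    return lsq.freeLoadEntries == 0 || lsq.freeStoreEntries == 0;
}

// Split check: a load looks only at the load queue, a store only at
// the store queue, and other instructions at neither.
inline bool
stallsSplit(const LsqStateSketch &lsq, bool isLoad, bool isStore)
{
    return (isLoad && lsq.freeLoadEntries == 0) ||
           (isStore && lsq.freeStoreEntries == 0);
}
```

With a full store queue and free load entries, the combined check stalls a load that the split check lets through.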
10231:cb2e6950956d |
31-May-2014 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: eliminate equality tests with true and false
Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'.
It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up.
Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code. |
10225:01df075d9f93 |
23-May-2014 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: o3: remove stat totalCommittedInsts
This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded individually. The total would now be generated by summing these individual counts. |
10202:0f00b9e7305a |
09-May-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Useful getters for ActivityRecorder
Add some useful getters to ActivityRecorder |
10201:30a20d2072c1 |
09-May-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Add flag name printing to StaticInst
This patch adds the member function StaticInst::printFlags to allow all of an instruction's flags to be printed without using the individual is... member functions or resorting to exposing the 'flags' vector.
It also replaces the enum definition StaticInst::Flags with a Python-generated enumeration and adds to the enum generation mechanism in src/python/m5/params.py to allow Enums to be placed in namespaces other than Enums or, alternatively, in wrapper structs allowing them to be inherited by other classes (so populating that class's name-space with the enumeration element names). |
10200:1ab8753de4d8 |
09-May-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Timebuf const accessors
Add const accessors for timebuf elements. |
10194:e6d2e8083d9c |
09-May-2014 |
Geoffrey Blake <Geoffrey.Blake@arm.com> |
arch, arm: Preserve TLB bootUncacheability when switching CPUs
The ARM TLBs have a bootUncacheability flag used to make some loads and stores become uncacheable when booting in FS mode. Later the flag is cleared to let those loads and stores operate as normal. When doing a takeOverFrom(), this flag's state is not preserved and is momentarily reset until the CPSR is touched. On single core runs this is a non-issue. On multi-core runs this can lead to crashes on the O3 CPU model from the following series of events:
1) takeOverFrom executed to switch from Atomic -> O3
2) All bootUncacheability flags are reset to true
3) Core2 tries to execute a load covered by bootUncacheability, it is flagged as uncacheable
4) Core2's load needs to replay due to a pipeline flush
5) Core1 does an action on CPSR
6) The handling code for CPSR then checks all other cores to determine if bootUncacheability can be set to false
7) Asynchronously set bootUncacheability on all cores to false
8) Core2 replays load previously set as uncacheable and notices it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs to preserve flag values when switching from atomic -> detailed. |
10193:d717abc806aa |
09-May-2014 |
Curtis Dunham <Curtis.Dunham@arm.com> |
cpu: add more instruction mix statistics
For the o3, add an instruction mix (OpClass) histogram at commit (stats are already collected at issue). For the simple CPUs, add a histogram of executed instructions. |
10190:fb83d025d1c3 |
09-May-2014 |
Akash Bagdia <akash.bagdia@arm.com> |
cpu, arm: Allow the specification of a socket field
Allow the specification of a socket ID for every core that is reflected in the MPIDR field in ARM systems. This allows studying multi-socket / cluster systems with ARM CPUs. |
10177:402b0e25c41b |
23-Apr-2014 |
Mitchell Hayenga <Mitchell.Hayenga@ARM.com> |
cpu: Fix setTranslateLatency() bug for squashed instructions
setTranslateLatency could sometimes improperly access a deleted request packet after an instruction was squashed. |
10175:e639ff917d2e |
01-Apr-2014 |
Mitch Hayenga <Mitch.Hayenga@ARM.com> |
cpu: Fix case where o3 lsq could print out uninitialized data
In the O3 LSQ, data read/written is printed out in DPRINTFs. The data field is treated as a null-terminated character string, but it is not encoded this way. This patch removes that possibility by removing the data part of the print. |
10172:790a214be1f4 |
23-Apr-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: Add O3 CPU width checks
O3CPU has a compile-time maximum width set in o3/impl.hh, but checking the configuration against this limit was not implemented anywhere except for fetch. Configuring a wider pipe than the limit can silently cause various issues during the simulation. This patch adds the proper checking in the constructor of the various pipeline stages. |
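A minimal sketch of the kind of constructor-time check changeset 10172 describes, written in Python for brevity. MAX_WIDTH, the stage names, and check_width are hypothetical stand-ins; they are not the limit defined in o3/impl.hh or gem5's actual parameter names.

```python
# Hypothetical stand-in for the compile-time maximum width in o3/impl.hh.
MAX_WIDTH = 8


def check_width(stage, width, max_width=MAX_WIDTH):
    """Reject a configured stage width that exceeds the supported maximum."""
    if width > max_width:
        raise ValueError(
            f"{stage} width {width} exceeds the supported maximum of {max_width}"
        )
    return width


# Each pipeline stage validates its own width when it is constructed,
# so a too-wide configuration fails loudly instead of misbehaving silently.
config = {"fetch": 8, "decode": 8, "rename": 8, "commit": 8}
for stage, width in config.items():
    check_width(stage, width)
```

Failing in the constructor surfaces the misconfiguration before any simulation state exists, which is the point of the patch.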
10164:2d2c60bda8b2 |
19-Apr-2014 |
Faissal Sleiman <sleimanf@umich.edu> |
o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries() should be thread-specific.
In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList.
In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of.
In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it).
In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10157:5c2ecad1a3c9 |
09-Apr-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm, x86: Add initial support for multicore simulation
Simulating an SMP or multicore system requires devices to be shared between multiple KVM vCPUs. This means that locking is required when accessing devices. This changeset adds the necessary locking to allow devices to execute correctly. It is implemented by temporarily migrating the KVM CPU to the VM's (and devices') event queue when handling MMIO. Similarly, the VM migrates to the interrupt controller's event queue when delivering an interrupt.
The support for fast-forwarding of multicore simulations added by this changeset assumes that all devices in a system are simulated in the same thread and each vCPU has its own thread. Special care must be taken to ensure that devices living under the CPU in the object hierarchy (e.g., the interrupt controller) do not inherit the parent CPU's thread and are assigned to the device thread. The KvmVM object is assumed to live in the same thread as the other devices in the system. |
10149:45a67d84fd4a |
25-Mar-2014 |
Marco Elver <marco.elver@ed.ac.uk> |
cpu: o3: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO.
Prior to this patch, the following case could be violated:
P0          | P1          ;
MOV [x],$1  | MOV EAX,[y] ;
MOV [y],$1  | MOV EBX,[x] ;
exists (1:EAX=1 /\ 1:EBX=0) [is a violation]
The problem was found using litmus [http://diy.inria.fr].
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
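To illustrate why the outcome in the litmus test of changeset 10149 counts as a violation, the sketch below enumerates every interleaving of the two threads that keeps each thread in program order. It is a simplified model, not the litmus tool: it assumes both stores write the value 1 (as the exists clause suggests), that memory starts at zero, and that neither store-store nor load-load order is relaxed, which is the ordering the patch restores. Store buffering is not modelled.

```python
from itertools import combinations

# Program-order operations of the two processors in the litmus test.
P0 = [("store", "x"), ("store", "y")]
P1 = [("load", "y", "EAX"), ("load", "x", "EBX")]


def outcomes():
    """Return the set of (EAX, EBX) results over all in-order interleavings."""
    results = set()
    total = len(P0) + len(P1)
    # Choose which slots of the global order are taken by P0's operations.
    for p0_slots in combinations(range(total), len(P0)):
        mem = {"x": 0, "y": 0}
        regs = {}
        i0 = i1 = 0
        for slot in range(total):
            if slot in p0_slots:
                _, addr = P0[i0]
                mem[addr] = 1  # assumed store value
                i0 += 1
            else:
                _, addr, reg = P1[i1]
                regs[reg] = mem[addr]
                i1 += 1
        results.add((regs["EAX"], regs["EBX"]))
    return results


print(sorted(outcomes()))  # [(0, 0), (0, 1), (1, 1)]
```

The pair (1, 0) never appears, so observing EAX=1 and EBX=0 means a load was effectively reordered with respect to another load.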
10138:0e40c53fe85c |
23-Mar-2014 |
Neha Agarwal <neha.agarwal@arm.com> |
cpu: DRAM Traffic Generator
This patch adds a new 'DRAM' mode to the existing traffic generator, tailored to generate specific requests to DRAM based on required hit length (stride size) and bank utilization. It is an add-on to the Random mode.
The basic idea is to control how many successive packets target the same page, and how many banks are being used in parallel. This gives a two-dimensional space that stresses different aspects of the DRAM timing.
The configuration file needed to use this patch has to be changed as follows (with reference to Random mode and the LPDDR3 memory type):
'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0' -> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1'
The last 5 parameters to be added are: <stride size (bytes), page size (bytes), number of banks available in DRAM, number of banks to be utilized, address mapping scheme>
The address mapping information is used to get the stride address stream of the specified size and to know where to find the bank bits. The configuration file has a parameter where '0'-> RoCoRaBaCh, '1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the generator currently assumes a single channel and a single rank. This is to avoid overwhelming the traffic generator with information about the memory organisation. |
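As a sketch of how the DRAM-specific tail of such a STATE line from changeset 10138 could be read, the snippet below names the trailing values in the order the commit message lists them. The field names and parse_dram_state are hypothetical; the real parser lives in the traffic generator's C++ code and is not reproduced here.

```python
# DRAM-specific fields, in the order given in the commit message.
# The Python names are illustrative only.
DRAM_FIELDS = (
    "stride_size",     # bytes
    "page_size",       # bytes
    "num_banks_dram",  # banks available in the DRAM
    "num_banks_util",  # banks to be utilized
    "addr_mapping",    # 0 -> RoCoRaBaCh, 1 -> RoRaBaCoCh/RoRaBaChCo
)


def parse_dram_state(line):
    """Extract the trailing DRAM-mode parameters from a STATE line."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "STATE" or tokens[3] != "DRAM":
        raise ValueError("not a DRAM STATE line")
    extras = [int(t) for t in tokens[-len(DRAM_FIELDS):]]
    return dict(zip(DRAM_FIELDS, extras))


line = "STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1"
print(parse_dram_state(line))
```

For the example line this yields a 96-byte stride within 1024-byte pages, with 6 of 8 banks in use and mapping scheme 1.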
10128:013bba88efab |
23-Mar-2014 |
Stan Czerniawski <stan.czerniawski@arm.com> |
cpu: Add basic check to TrafficGen initial state
Prevent incomplete configuration of TrafficGen class from causing segmentation faults. If an 'INIT' line is not present in the configuration file then the currState variable will remain uninitialized which may result in a crash. |
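The check in changeset 10128 can be pictured roughly as follows. This is a sketch only: find_initial_state is a hypothetical helper, and the assumption that an INIT line has the form "INIT <state id>" is not stated in the commit message.

```python
def find_initial_state(config_text):
    """Return the state id named on the INIT line, or raise if there is none."""
    for raw in config_text.splitlines():
        tokens = raw.split()
        # Assumed line format: INIT <state id>
        if len(tokens) >= 2 and tokens[0] == "INIT":
            return int(tokens[1])
    # Refuse to continue with an uninitialized current state.
    raise ValueError("traffic generator configuration has no INIT line")


good = "STATE 0 100 IDLE\nINIT 0\n"
print(find_initial_state(good))  # 0
```

Raising an error at load time replaces the undefined behaviour that an uninitialized state variable would otherwise cause later.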
10114:bd83b4f6a12e |
16-Mar-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Clean up signal handling
KVM used to use two signals, one for instruction count exits and one for timer exits. There is really no need to distinguish between the two since they only trigger exits from KVM. This changeset unifies and renames the signals and adds a method, kick(), that can be used to raise the control signal in the vCPU thread. It also removes the early timer warning since we do not normally see if the signal was delivered. |
10113:f02b907bb9e8 |
16-Mar-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: x86: Adjust PC to remove the CS segment base address
gem5 seems to store the PC as RIP+CS_BASE. This is not what KVM expects, so we need to subtract CS_BASE prior to transferring the PC into KVM. This changeset adds the necessary PC manipulation and refactors thread context updates slightly to avoid reading registers multiple times from KVM. |
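The adjustment in changeset 10113 is plain arithmetic, sketched below. The function names and the example values are hypothetical; only the RIP+CS_BASE relationship comes from the commit message.

```python
def pc_for_kvm(gem5_pc, cs_base):
    """Convert a PC stored as RIP + CS_BASE into the RIP value KVM expects."""
    return gem5_pc - cs_base


def pc_for_gem5(kvm_rip, cs_base):
    """Inverse conversion when copying state back from KVM."""
    return kvm_rip + cs_base


# Illustrative values only.
cs_base = 0xF0000
gem5_pc = 0xFFFF0
print(hex(pc_for_kvm(gem5_pc, cs_base)))  # 0xfff0
```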
10112:1a2f64842044 |
16-Mar-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: x86: Add support for x86 INIT and STARTUP handling
This changeset adds support for INIT and STARTUP IPI handling. We currently handle both of these interrupts in gem5 and transfer the state to KVM. Since we do not have a BIOS loaded, we pretend that the INIT interrupt suspends the CPU after reset. |
10111:fd90d9e55e5c |
12-Mar-2014 |
Paul Rosenfeld <dramninjas@gmail.com> |
alpha: Small removal of dead comments/code from alpha ISA
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10110:580b47334a97 |
07-Mar-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Make CPU and ThreadContext getters const
This patch merely tidies up the CPU and ThreadContext getters by making them const where appropriate. |
10104:ff709c429b7b |
07-Mar-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
scons: Fixes uninitialized warnings issued by clang
Small fixes to appease recent clang versions. |
10099:fbfb38d33a0a |
03-Mar-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: x86: Always assume segments to be usable
When transferring segment registers into kvm, we need to find the value of the unusable bit. We used to assume that this could be inferred from the selector since segments are generally unusable if their selector is 0. This assumption breaks in some weird corner cases. Instead, we just assume that segments are always usable. This is what qemu does so it should work. |
10098:484f50943e13 |
03-Mar-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Initialize signal handlers from startupThread()
Signal handlers in KVM are controlled per thread and should be initialized from the thread that is going to execute the CPU. This changeset moves the initialization call from startup() to startupThread(). |
10095:e8001be2e86e |
02-Mar-2014 |
Christopher Torng <clt67@cornell.edu> |
cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU
A copyRegs() function is added to MIPS utilities to copy architectural state from the old CPU to the new CPU during fast-forwarding. This addition alone enables fast-forwarding for the o3 cpu model running MIPS.
The patch also adds takeOverFrom() and drainResume() functions to the InOrderCPU to enable it to take over from another CPU. This change enables fast-forwarding for the inorder cpu model running MIPS, but not for Alpha.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10073:2360411a16be |
20-Feb-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add support for multi-system simulation
The introduction of parallel event queues added most of the support needed to run multiple VMs (systems) within the same gem5 instance. This changeset fixes up signal delivery so that KVM's control signals are delivered to the thread that executes the CPU's event queue. Specifically:
* Timers and counters are now initialized from a separate method (startupThread) that is scheduled as the first event in the thread-specific event queue. This ensures that they are initialized from the thread that is going to execute the CPU's event queue and enables signal delivery to the right thread when exiting from KVM.
* The POSIX-timer-based KVM timer (used to force exits from KVM) has been updated to deliver signals to the thread that's executing KVM instead of the process (thread is undefined in that case). This assumes that the timer is instantiated from the thread that is going to execute the KVM vCPU.
* Signal masking is now done using pthread_sigmask instead of sigprocmask. The behavior of the latter is undefined in threaded applications.
* Since signal masks can be inherited, make sure to actively unmask the control signals when setting up the KVM signal mask.
There are currently no facilities to multiplex between multiple KVM CPUs in the same event queue, so we are limited to configurations where there is only one KVM CPU per event queue. In practice, this means that multi-system configurations can be simulated, but not multiple CPUs in a shared-memory configuration. |
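The per-thread masking that changeset 10073 relies on can be sketched with Python's signal.pthread_sigmask, which wraps the same POSIX call (Unix only). CONTROL_SIGNAL and the helper names are hypothetical; the actual signal numbers used by the KVM CPU are not given in the commit message.

```python
import signal

# Hypothetical choice of control signal for illustration.
CONTROL_SIGNAL = signal.SIGUSR1


def block_control_signal():
    """Block the control signal in the calling thread; return the old mask."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, {CONTROL_SIGNAL})


def unblock_control_signal():
    """Actively unblock the signal, in case a blocked mask was inherited."""
    return signal.pthread_sigmask(signal.SIG_UNBLOCK, {CONTROL_SIGNAL})


def current_mask():
    """Blocking an empty set changes nothing and returns the current mask."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, set())


block_control_signal()
print(CONTROL_SIGNAL in current_mask())  # True
unblock_control_signal()
print(CONTROL_SIGNAL in current_mask())  # False
```

Because the mask is a per-thread property, each call affects only the thread that makes it, which is what makes this interface well defined in a threaded process.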
10061:3b0d0c988ed6 |
09-Feb-2014 |
Andreas Sandberg <andreas@sandberg.pp.se> |
cpu: simple: Add support for using branch predictors
This changeset adds branch predictor support to the BaseSimpleCPU. The simple CPUs normally don't need a branch predictor; however, there are at least two cases where it can be desirable:
1) A simple CPU can be used to warm the branch predictor of an O3 CPU before switching to the slower O3 model.
2) The simple CPU can be used as a quick way of evaluating/debugging new branch predictors since it exposes branch predictor statistics.
Limitations: * Since the simple CPU doesn't speculate, only one instruction will be active in the branch predictor at a time (i.e., the branch predictor will never see speculative branches).
* The outcome of a branch prediction does not affect the performance of the simple CPU. |
10051:6157b07daac7 |
29-Jan-2014 |
Xiangyu Dong <rioshering@gmail.com> |
cpu: fix bug when TrafficGen deschedules event
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10037:5cac77888310 |
24-Jan-2014 |
ARM gem5 Developers |
arm: Add support for ARMv8 (AArch64 & AArch32)
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64 kernel you are restricted to AArch64 user-mode binaries. This will be addressed in a later patch.
Note: Virtualization is only supported in AArch32 mode. This will also be fixed in a later patch.
Contributors: Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation) Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation) Mbou Eyole (AArch64 NEON, validation) Ali Saidi (AArch64 Linux support, code integration, validation) Edmund Grimley-Evans (AArch64 FP) William Wang (AArch64 Linux support) Rene De Jong (AArch64 Linux support, performance opt.) Matt Horsnell (AArch64 MP, validation) Matt Evans (device models, code integration, validation) Chris Adeniyi-Jones (AArch64 syscall-emulation) Prakash Ramrakhyani (validation) Dam Sunwoo (validation) Chander Sudanthi (validation) Stephan Diestelhorst (validation) Andreas Hansson (code integration, performance opt.) Eric Van Hensbergen (performance opt.) Gabe Black |
10034:f2ce7114b137 |
24-Jan-2014 |
Geoffrey Blake <Geoffrey.Blake@arm.com> |
checker: CheckerCPU handling of MiscRegs was incorrect
The CheckerCPU model in pre-v8 code was not checking the updates to miscellaneous registers because some methods for setting misc regs were not instrumented. The v8 patches exposed this by calling the instrumented misc reg update methods and then invoking the checker before the main CPU had updated its misc regs, leading to false positives about register mismatches. This patch fixes the non-instrumented misc reg update methods and places calls to the checker in the proper places in the O3 model. |
10033:21c14a2b2117 |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the other ISAs this is currently a null-implementation.
Additionally, a system variable is added to all the ISA objects. |
10032:5a7852a013d4 |
24-Jan-2014 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
cpu: Add support for Memory+Barrier instruction types in O3 cpu. |
10031:79d034cd6ba3 |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Add support for instructions that zero cache lines. |
10030:b531e328342d |
24-Jan-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Add CPU support for generating wake up events when LLSC addresses are snooped.
This patch adds support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support. |
10024:fc10e1f9f124 |
24-Jan-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump. |
10023:91faf6649de0 |
24-Jan-2014 |
Matt Horsnell <matt.horsnell@ARM.com> |
base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model.
What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py
What is happening under the hood: * every SimObject has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and then regProbeListeners are called on each SimObject. This hooks up the probe point notify calls with the listeners.
FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful.
What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction)
What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event.
What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener.
What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes.
Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats.
Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting.
What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%). |
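The observer relationship described in changeset 10023 can be sketched as below. This is an illustration of the pattern only: ProbePoint, InstCountListener, and TraceListener are hypothetical Python classes, not gem5's C++ probe classes or its Python bindings.

```python
class ProbePoint:
    """Notifies every registered listener when an event occurs."""

    def __init__(self, name):
        self.name = name
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def notify(self, arg):
        # Listeners only observe the argument; they are not meant to alter it.
        for listener in self.listeners:
            listener.notify(arg)


class InstCountListener:
    """Gathers an instruction-class distribution."""

    def __init__(self):
        self.counts = {}

    def notify(self, inst_class):
        self.counts[inst_class] = self.counts.get(inst_class, 0) + 1


class TraceListener:
    """Records one trace line per committed instruction."""

    def __init__(self):
        self.lines = []

    def notify(self, inst_class):
        self.lines.append(f"commit: {inst_class}")


# One probe point at "commit", two independent analysis modules (1:N).
commit_probe = ProbePoint("commit")
counter = InstCountListener()
tracer = TraceListener()
commit_probe.add_listener(counter)
commit_probe.add_listener(tracer)

for inst in ["load", "add", "load"]:
    commit_probe.notify(inst)

print(counter.counts)  # {'load': 2, 'add': 1}
```

The functional code contains only the notify call; the tracing and statistics logic lives entirely in the listeners, which is the separation the commit message argues for.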
10020:2f33cb012383 |
24-Jan-2014 |
Matt Horsnell <matt.horsnell@ARM.com> |
mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request. |
10017:c75015bbbd78 |
24-Jan-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Relax check on squashed non-speculative instructions
This patch relaxes the check performed when squashing non-speculative instructions, as it caused problems with loads that were marked ready, and then stalled on a blocked cache. The assertion is now allowing memory references to be non-faulting. |
10015:0d1467be20eb |
24-Jan-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: remove faulty simpoint basic block inst count assertion
This patch removes an assertion in the simpoint profiling code that asserts that a previously-seen basic block has the exact same number of instructions executed as before. This can be false if the basic block generates aborts or takes interrupts at different locations within the basic block. The basic block profiling is not affected significantly as these events are rare in general. |
9992:6e39e3641dd8 |
03-Dec-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
cpu: call BaseCPU startup() function in o3 cpu |
9986:7cab06691984 |
15-Oct-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Set the perf exclude_host attribute if available
The performance counting framework in Linux 3.2 and onwards supports an attribute to exclude events generated by the host when running KVM. Setting this attribute allows us to get more reliable measurements of the guest machine. For example, on a highly loaded system, the instruction counts from the guest can be severely distorted by the host kernel (e.g., by page fault handlers).
This changeset introduces a check for the attribute and enables it in the KVM CPU if present. |
9984:94b8d1af6c81 |
26-Nov-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Remove the unused hostFreq member from BaseKvmCPU |
9983:2cce74fe359e |
25-Nov-2013 |
Steve Reinhardt <stever@gmail.com>, Nilay Vaish <nilay@cs.wisc.edu>, Ali Saidi <Ali.Saidi@ARM.com> |
sim: simulate with multiple threads and event queues
This patch adds support for simulating with multiple threads, each of which operates on an event queue. Each sim object specifies which eventq it would like to be on. A custom barrier implementation is added, which the eventqs use to synchronize.
The patch was tested in two different configurations: 1. ruby_network_test.py: in this simulation L1 cache controllers receive requests from the cpu. The requests are replied to immediately without any communication taking place with any other level. 2. twosys-tsunami-simple-atomic: this configuration simulates a client-server system in which the two systems are connected by an ethernet link.
We still lack the ability to communicate using message buffers or ports. But other things like simulation start and end, synchronizing after every quantum are working.
Committed by: Nilay Vaish |
9982:b2bfc23f932c |
15-Nov-2013 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: allow the fetch buffer to be smaller than a cache line
The current implementation of the fetch buffer in the o3 cpu is only allowed to be the size of a cache line. Some architectures, e.g., ARM, have fetch buffers smaller than a cache line, see slide 22 at: http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf
This patch allows the fetch buffer to be set to values smaller than a cache line. |
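A sketch of what a smaller fetch buffer implies for address alignment in changeset 9982. The helper names are hypothetical, and the constraints checked here (power-of-two size, no larger than the cache line, dividing it evenly) are assumptions made for the illustration rather than rules quoted from the patch.

```python
def check_fetch_buffer(fetch_buffer_size, cache_line_size):
    """Validate the assumed constraints on the fetch buffer size."""
    if fetch_buffer_size <= 0 or fetch_buffer_size & (fetch_buffer_size - 1):
        raise ValueError("fetch buffer size must be a power of two")
    if fetch_buffer_size > cache_line_size:
        raise ValueError("fetch buffer cannot exceed the cache line size")
    if cache_line_size % fetch_buffer_size:
        raise ValueError("cache line size must be a multiple of the buffer size")


def fetch_buffer_base(pc, fetch_buffer_size):
    """Align a PC down to the start of the fetch buffer that contains it."""
    return pc & ~(fetch_buffer_size - 1)


check_fetch_buffer(16, 64)
print(hex(fetch_buffer_base(0x1234, 16)))  # 0x1230
print(hex(fetch_buffer_base(0x1234, 64)))  # 0x1200
```

With a 16-byte buffer the same PC maps to a narrower aligned window than with a full 64-byte line, so fewer bytes are available per fetch.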
9981:44ef5ed3aee0 |
15-Nov-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix Checker register index use
This patch fixes an issue in the checker CPU register indexing. The code will not even compile using LTO as deep inlining causes the used index to be outside the array bounds. |
9954:72a72649a156 |
31-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Construct ROB with cpu params struct instead of each variable
Most other structures/stages get passed the cpu params struct. |
9948:6cbe5c9d0ebb |
31-Oct-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Fix O3 issue with load+barrier instructions.
Fix a problem in the O3 CPU for instructions that are both memory loads and memory barriers (e.g. load acquire) and that target uncacheable memory. This combination can confuse the commit stage into committing an instruction that hasn't executed and received its value yet. At the same time refactor the code slightly to remove duplication between two of the cases. |
9944:4ff1c5c6dcbc |
17-Oct-2013 |
Matt Horsnell <matt.horsnell@ARM.com> |
cpu: add consistent guarding to *_impl.hh files. |
9938:d3b7970e1b33 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Removing an unused variable in rename |
9937:49a534f54e72 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Change IEW DPRINTF to use IEW debug flag
IEW DPRINTF uses Decode debug flag, which appears to be a copying error. This patch changes this to the IEW Debug flag. |
9936:f00546aff354 |
17-Oct-2013 |
Faissal Sleiman <Faissal.Sleiman@arm.com> |
cpu: Put in assertions to check for maximum supported LQ/SQ size
LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to 256 entries (including the sentinel entry). Sending packets to memory with a higher index than 255 truncates the index, such that the response matches the wrong entry. For instance, this can result in a deadlock if a store completion does not clear the head entry. |
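The truncation described in changeset 9936 is easy to reproduce. The sketch below mimics storing a queue index in an 8-bit unsigned field; to_uint8 and check_queue_size are hypothetical helpers, not gem5 functions.

```python
UINT8_LIMIT = 1 << 8  # an 8-bit index distinguishes 256 entries


def to_uint8(index):
    """Mimic storing an index in a uint8_t: only the low 8 bits survive."""
    return index & 0xFF


def check_queue_size(entries):
    """Reject queue sizes (sentinel included) that an 8-bit index cannot address."""
    if entries > UINT8_LIMIT:
        raise ValueError(f"{entries} entries cannot be indexed by a uint8_t")
    return entries


print(to_uint8(255))  # 255
print(to_uint8(256))  # 0, collides with entry 0
print(to_uint8(300))  # 44
```

A response tagged with the truncated index is matched against the wrong entry, which is how the deadlock mentioned in the commit message can arise.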
9932:2efeed2cef09 |
17-Oct-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Fix O3 uncacheable load that is replayed but misses the TLB
This change fixes an issue in the O3 CPU where an uncacheable instruction is attempted to be executed before it reaches the head of the ROB. It is determined to be uncacheable, and is replayed, but a PanicFault is attached to the instruction to make sure that it is properly executed before committing. If the TLB entry it was using is replaced in the intervening time, the TLB returns a delayed translation when the load is replayed at the head of the ROB; however, the LSQ code can't differentiate between the old fault and the new one. If the translation isn't complete it can't be faulting, so clear the fault. |
9925:7840e90aff6c |
16-Oct-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Fix latency calculation of IPR accesses
When handling IPR accesses in doMMIOAccess, the KVM CPU used clockEdge() to convert between cycles and ticks. This is incorrect since doMMIOAccess is supposed to return a latency in ticks rather than when the access is done. This changeset fixes this issue by returning clockPeriod() * ipr_delay instead. |
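The distinction behind changeset 9925 is between an absolute point in time and a duration. The sketch below uses a simplified model of a clocked object; CLOCK_PERIOD, clock_edge, and latency_ticks are hypothetical and only approximate what gem5's clockEdge() and clockPeriod() compute.

```python
CLOCK_PERIOD = 500  # ticks per cycle, illustrative value


def clock_edge(cur_tick, cycles, period=CLOCK_PERIOD):
    """Absolute tick of the edge `cycles` cycles after the next clock edge."""
    next_edge = -(-cur_tick // period) * period  # round up to an edge
    return next_edge + cycles * period


def latency_ticks(cycles, period=CLOCK_PERIOD):
    """Duration of `cycles` cycles in ticks, independent of the current time."""
    return cycles * period


cur_tick = 10_250
ipr_delay = 3
print(clock_edge(cur_tick, ipr_delay))  # 12000, a point in time
print(latency_ticks(ipr_delay))         # 1500, a latency
```

Returning the first value where the second is expected would overstate the latency by roughly the current simulation time.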
9921:ee049bfce978 |
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
arch/x86: add support for explicit CC register file
Convert condition code registers from being specialized ("pseudo") integer registers to using the recently added CC register class.
Nilay Vaish also contributed to this patch. |
9920:028e4da64b42 |
15-Oct-2013 |
Yasuko Eckert <yasuko.eckert@amd.com> |
cpu: add a condition-code register class
Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though. |
9919:803903a8dac1 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up rename map and free list
Restructured rename map and free list to clean up some extraneous code and separate out common code that can be reused across different register classes (int and fp at this point). Both components now consist of a set of Simple* objects that are stand-alone rename map & free list for each class, plus a Unified* object that presents a unified interface across all register classes and then redirects accesses to the appropriate Simple* object as needed.
Moved free list initialization to PhysRegFile to better isolate knowledge of physical register index mappings to that class (and remove the need to pass a number of parameters to the free list constructor).
Causes a small change to these stats: cpu.rename.int_rename_lookups cpu.rename.fp_rename_lookups because they are now categorized on a per-operand basis rather than a per-instruction basis. That is, an instruction with mixed fp/int/misc operand types will have each operand categorized independently, where previously the lookup was categorized based on the instruction type. |
9918:2c7219e2d999 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: rename *_DepTag constants to *_Reg_Base
Make these names more meaningful.
Specifically, made these substitutions:
s/FP_Base_DepTag/FP_Reg_Base/g; s/Ctrl_Base_DepTag/Misc_Reg_Base/g; s/Max_DepTag/Max_Reg_Index/g; |
9916:9c3a4595cce9 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up scoreboard object
It had a bunch of fields (and associated constructor parameters) that it didn't really use, and the array initialization was needlessly verbose.
Also just hardwired the getReg() method to always return true for misc regs, rather than having an array of bits that we always kept marked as ready. |
9915:d9e3ad574162 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/o3: clean up physical register file
No need for PhysRegFile to be a template class, or have a pointer back to the CPU. Also made some methods for checking the physical register type (int vs. float) based on the phys reg index, which will come in handy later. |
9914:30bb185d0902 |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu/inorder: merge register class enums
The previous patch introduced a RegClass enum to clean up register classification. The inorder model already had an equivalent enum (RegType) that was used internally. This patch replaces RegType with RegClass to get rid of the now-redundant code. |
9913:7f43babfde6a |
15-Oct-2013 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: clean up architectural register classification
Move from a poorly documented scheme where the mapping of unified architectural register indices to register classes is hardcoded all over to one where there's an enum for the register classes and a function that encapsulates the mapping. |
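The scheme introduced by changeset 9913, an enum of register classes plus one function that encapsulates the index mapping, might look like the sketch below. The base values and the name reg_idx_to_class are hypothetical and chosen only for illustration.

```python
from enum import Enum


class RegClass(Enum):
    INT = 0
    FLOAT = 1
    MISC = 2


# Hypothetical layout of the unified architectural register index space.
FP_REG_BASE = 32
MISC_REG_BASE = 64
MAX_REG_INDEX = 128


def reg_idx_to_class(idx):
    """Map a unified index to (register class, index within that class)."""
    if not 0 <= idx < MAX_REG_INDEX:
        raise IndexError(f"register index {idx} out of range")
    if idx < FP_REG_BASE:
        return RegClass.INT, idx
    if idx < MISC_REG_BASE:
        return RegClass.FLOAT, idx - FP_REG_BASE
    return RegClass.MISC, idx - MISC_REG_BASE


print(reg_idx_to_class(40))  # (<RegClass.FLOAT: 1>, 8)
```

Keeping the boundary comparisons in a single function means that adding another register class later only touches this one mapping.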
9904:e672a39fd426 |
03-Oct-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Service events in the instruction event queues
This changeset adds calls to service the instruction event queues that accidentally went missing from commit [0063c7dd18ec]. The original commit only included the code needed to schedule instruction stops from KVM and missed the functionality to actually service the events. |
9892:0063c7dd18ec |
30-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add support for thread-specific instruction events
Instruction events are currently ignored when executing in KVM. This changeset adds support for triggering KVM exits based on instruction counts using hardware performance counters. Depending on the underlying performance counter implementation, there might be some inaccuracies due to instructions being counted in the host kernel when entering/exiting KVM.
Due to limitations/bugs in Linux's performance counter interface, we can't reliably change the period of an overflow counter. We work around this issue by detaching and reattaching the counter if we need to reconfigure it. |
9890:2bad3d5120e5 |
30-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: FPU synchronization support on x86
This changeset adds support for synchronizing the FPU and SIMD state of a virtual x86 CPU with gem5. It supports both the XSave API and the KVM_(GET|SET)_FPU kernel API. The XSave interface can be disabled using the useXSave parameter (in case of kernel issues). Unfortunately, the KVM_(GET|SET)_FPU interface seems to be buggy in some kernels (specifically, the MXCSR register isn't always synchronized), which means that it might not be possible to synchronize MXCSR on old kernels without the XSave interface.
This changeset depends on the __float80 type in gcc and might not build using llvm. |
9886:a74065ea10ff |
30-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: x86: Fix segment registers to make them VMX compatible
There are cases when the segment registers in gem5 are not compatible with VMX. This changeset works around all known such issues. Specifically:
* The accessed bits in CS, SS, DS, ES, FS, GS are forced to 1. * The busy bit in TR is forced to 1. * The protection level of SS is forced to the same protection level as CS. The difference /seems/ to be caused by a bug in gem5's x86 implementation. |
9884:d1a5e147e72d |
24-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add x86 segment register verification to help debugging |
9883:7e0dff1c165b |
24-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Initial x86 support
This changeset adds support for KVM on x86. Full support is split across a number of commits since some features are relatively complex. This changeset includes support for:
* Integer state synchronization (including segment regs) * CPUID (gem5's CPUID values are inserted into KVM) * x86 legacy IO (remapped and handled by gem5's memory system) * Memory mapped IO * PCI * MSRs * State dumping
Most of the functionality is fairly straightforward. There are some quirks to support PCI enumerations since this is done in the TLB(!) in the simulated CPUs. We currently replicate some of that code.
Unlike the ARM implementation, the x86 implementation of the virtual CPU does not use the cycles hardware counter. KVM on x86 simulates the time stamp counter (TSC) in the kernel. If we just measure host cycles using perfevent, we might end up measuring a slightly different number of cycles. If we don't get the cycle accounting right, we might end up rewinding the TSC, with all kinds of chaos as a result.
An additional feature of the KVM CPU on x86 is extended state dumping. This enables Python scripts controlling the simulator to request dumping of a subset of the processor state. The following methods are currently supported:
* dumpFpuRegs * dumpIntRegs * dumpSpecRegs * dumpDebugRegs * dumpXCRs * dumpXSave * dumpVCpuEvents * dumpMSRs
Known limitations: * M5 ops are currently not supported. * FPU synchronization is not supported (only affects CPU switching).
Both of the limitations will be addressed in separate commits. |
9882:372d3611c693 |
19-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Correctly handle the return value from handleIpr(Read|Write)
The KVM base class incorrectly assumed that handleIprRead and handleIprWrite both return ticks. This is not the case, instead they return cycles. This changeset converts the returned cycles to ticks when handling IPR accesses. |
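The unit conversion described above can be sketched as follows. This is a minimal illustration, not gem5 code: the function name `cycles_to_ticks` and the example clock period are assumptions made for the sketch.

```python
def cycles_to_ticks(cycles: int, clock_period_ticks: int) -> int:
    """Convert a latency given in CPU cycles to simulator ticks.

    Hypothetical helper: a cycle lasts `clock_period_ticks` ticks, so the
    conversion is a plain multiplication.
    """
    if cycles < 0:
        raise ValueError("cycle count must be non-negative")
    if clock_period_ticks <= 0:
        raise ValueError("clock period must be positive")
    return cycles * clock_period_ticks


# Assumed example values: a 500-tick clock period and a 3-cycle IPR access.
ipr_latency_cycles = 3
delay_ticks = cycles_to_ticks(ipr_latency_cycles, 500)
```

Treating the returned cycle count as ticks would under-report the delay by a factor of the clock period, which is the kind of mismatch the commit corrects.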
9881:638e865d70c6 |
19-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Fix a case where the run timers weren't armed properly
There is a possibility that the timespec used to arm a timer becomes zero if the number of ticks used when arming a timer is close to the resolution of the timer. Due to the semantics of POSIX timers, this actually disarms the timer. This changeset fixes this issue by eliminating the rounding error (we always round away from zero now). It also reuses the minimum number of cycles, which were previously only used for cycle-based timers, to calculate a more useful resolution. |
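A rough sketch of the rounding issue, assuming a tick-to-nanosecond conversion with an illustrative tick frequency. The names `ticks_to_timespec`, `ticks_per_second`, and `min_ns` are hypothetical and only model the idea of rounding away from zero so that the resulting interval never becomes the all-zero value that disarms a POSIX timer.

```python
NS_PER_SECOND = 10**9


def ticks_to_timespec(ticks: int, ticks_per_second: int = 10**12, min_ns: int = 1):
    """Return (seconds, nanoseconds) for a positive tick count.

    Uses ceiling division (round away from zero) and clamps the result to
    at least `min_ns` nanoseconds, so the timespec is never (0, 0).
    """
    if ticks <= 0:
        raise ValueError("tick count must be positive")
    # Ceiling division: truncation would turn small tick counts into 0 ns.
    ns = (ticks * NS_PER_SECOND + ticks_per_second - 1) // ticks_per_second
    ns = max(ns, min_ns)
    return divmod(ns, NS_PER_SECOND)
```

With truncating division, a single tick at the assumed frequency would convert to zero nanoseconds; with the ceiling division above it converts to one nanosecond, or to `min_ns` if a coarser resolution is requested.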
9868:44a67004d6b4 |
11-Sep-2013 |
Joel Hestness <jthestness@gmail.com> |
cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of its object, but it would only initialize numThreads LSQUnits as specified by the user. This had the effect of leaving some LSQUnits uninitialized when the number of threads was less than MaxThreads, and when adding statistics to the LSQUnit that must be initialized, this caused the stats initialization check to fail. By dynamically instantiating LSQUnits, they are all initialized and this prevents uninitialized LSQUnits from floating around during runtime. |
9850:87d6b41749e9 |
04-Sep-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any ISA. The main purpose is to enable work around the interconnect and memory system without having to build any CPU models or device models.
The regress script is updated to include the NULL ISA target. Currently no regressions make use of it, but all the testers could (and perhaps should) transition to it. |
9849:603e2ed487f3 |
04-Sep-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or out-of-order CPU as one of the available CPU models and therefore should not be used in the BaseCPU. This patch moves the parameter to the relevant CPU classes. |
9848:a733a8eb6363 |
04-Sep-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Header clean up for NOISA resurrection
This patch is a first step to getting NOISA working again. A number of redundant includes make life more difficult than it has to be and this patch simply removes them. There are also some redundant forward declarations removed. |
9840:c562aa658a6f |
20-Aug-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix timing CPU isDrained comment formatting
This patch fixes up the comment formatting for isDrained in the timing CPU. |
9837:13a21202375d |
19-Aug-2013 |
Lena Olson <lena@cs.wisc,edu> |
cpu: Accurately count idle cycles for simple cpu
Added a couple of missing updates to the notIdleFraction stat. Without these, it sometimes gives a (not) idle fraction that is greater than 1 or less than 0. |
9834:dcd0f0091854 |
19-Aug-2013 |
Sascha Bischoff <sascha.bischoff@arm.com> |
cpu: Fix TrafficGen trace playback
This patch addresses an issue with trace playback in the TrafficGen where the trace was reset but the header was not read from the trace when a captured trace was played back for a second time. This resulted in parsing errors as the expected message was not found in the trace file.
The header check is moved to an init function which is called by the constructor and when the trace is reset. This ensures that the trace header is read each time the trace is replayed.
This patch also addresses a small formatting issue in a panic. |
9830:5995f4d33a11 |
19-Aug-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix timing CPU drain check
This patch modifies the SimpleTimingCPU drain check to also consider the fetch event. Previously, there was an assumption that there is never a fetch event scheduled if the CPU is not executing microcode. However, when a context is activated, a fetch event is scheduled, and microPC() is zero. |
9822:7f7cbcece75a |
19-Aug-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix a bug in the O3 CPU introduced by the cache line patch
This patch fixes a bug in the O3 fetch stage that was introduced when the cache line size was moved to the system. By mistake, the initialisation and resetting of the fetch stage was merged and put in the constructor. The resetting is now re-added where it should be. |
9817:2492d7ccda7e |
19-Jul-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
cpu: Remove unused getBranchPred() method from BaseCPU
Remove unused virtual getBranchPred() method from BaseCPU as it is not implemented by any of the CPU models. It used to always return NULL. |
9814:7ad2b0186a32 |
18-Jul-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets the cache line size on the system level.
Previously the size was set per cache, and communicated through the interconnect. There were plenty of checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used.
A follow-on patch updates the configuration scripts accordingly. |
9809:c94060248b3e |
15-Jul-2013 |
Umesh Bhaskar <umesh.b2006@gmail.com> |
debug : Fixes the issue wherein Debug symbols were not getting dumped into trace files for SE mode |
9793:6e6cefc1db1f |
27-Jun-2013 |
Akash Bagdia <akash.bagdia@arm.com> |
sim: Add the notion of clock domains to all ClockedObjects
This patch adds the notion of source- and derived-clock domains to the ClockedObjects. As such, all clock information is moved to the clock domain, and the ClockedObjects are grouped into domains.
The clock domains are either source domains, with a specific clock period, or derived domains that have a parent domain and a divider (potentially chained). For a piece of logic that runs at a derived clock (a ratio of the clock its parent is running at) the necessary derived clock domain is created from its corresponding parent clock domain. For now, the derived clock domain only supports a divider, thus ensuring a lower speed compared to its parent. Multiplier functionality implies PLL logic that has not been modelled yet (create a separate clock instead).
The clock domains should be used as a mechanism to provide a controllable clock source that affects the clock of every clocked object lying beneath it. The clock of the domain can (in a future patch) be controlled by a handler responsible for dynamic frequency scaling of the respective clock domains.
All the config scripts have been retro-fitted with clock domains. For the System a default SrcClockDomain is created. For CPUs that run at a different speed than the system, there is a separate clock domain created. This domain incorporates the CPU and the associated caches. As before, Ruby runs under its own clock domain.
The clock period of all domains is pre-computed, such that no virtual functions or multiplications are needed when calling clockPeriod. Instead, the clock period is pre-computed when any changes occur. For this to be possible, each clock domain tracks its children. |
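The source/derived relationship and the pre-computed period can be modelled with a small sketch. The class names mirror the commit message, but the methods (`clock_period`, `set_period`, `update`) and the example periods are assumptions for illustration and do not reproduce the gem5 classes.

```python
class SrcClockDomain:
    """Toy source domain with an explicit clock period (in ticks)."""

    def __init__(self, period: int):
        self._period = period
        self._children = []  # derived domains that depend on this one

    def add_child(self, child) -> None:
        self._children.append(child)

    def clock_period(self) -> int:
        return self._period  # plain lookup, nothing computed here

    def set_period(self, period: int) -> None:
        self._period = period
        for child in self._children:
            child.update()  # push the change down the tree


class DerivedClockDomain:
    """Toy derived domain: parent period multiplied by an integer divider."""

    def __init__(self, parent, divider: int):
        if divider < 1:
            raise ValueError("only dividers (slower clocks) are supported")
        self._parent = parent
        self._divider = divider
        self._children = []
        parent.add_child(self)
        self.update()

    def add_child(self, child) -> None:
        self._children.append(child)

    def update(self) -> None:
        # Recompute once when something changes, then notify the children.
        self._period = self._parent.clock_period() * self._divider
        for child in self._children:
            child.update()

    def clock_period(self) -> int:
        return self._period


system_clk = SrcClockDomain(1000)            # assumed period in ticks
half_rate = DerivedClockDomain(system_clk, 2)
quarter_rate = DerivedClockDomain(half_rate, 2)  # chained dividers
```

Because each domain tracks its children, changing the source period updates every dependent period once, and reading a period afterwards is a simple attribute lookup.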
9788:5558ee8dd7d9 |
27-Jun-2013 |
Akash Bagdia <akash.bagdia@arm.com> |
config: Remove redundant explicit setting of default clocks
This patch removes the explicit setting of the clock period for certain instances of CoherentBus, NonCoherentBus and IOCache where the specified clock is same as the default value of the system clock. As all the values used are the defaults, there are no performance changes. There are similar cases where the toL2Bus is set to use the parent CPU clock which is already the default behaviour.
The main motivation for these simplifications is to ease the introduction of clock domains. |
9783:8d327ffdba62 |
27-Jun-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Consider instructions waiting for FU completion in draining
This patch changes the IEW drain check to include the FU pool as there can be instructions that are "stored" in FU completion events and thus not covered by the existing checks. With this patch, we simply include a check to see if all the FUs are considered non-busy in the next tick.
Without this patch, the pc-switcheroo-full regression fails after minor changes to the cache timing (aligning to clock edge). |
9760:9db8a438608c |
18-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Use the address finalization code in the TLB
Reuse the address finalization code in the TLB instead of replicating it when handling MMIO. This patch also adds support for injecting memory mapped IPR requests into the memory system. |
9755:9df73385c878 |
11-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add more VM stats
This changeset adds the following stats to KVM: * numVMHalfEntries: Number of entries into KVM to finalize pending IO operations without executing guest instructions. These typically happen as a result of a drain where the guest must finalize some operations before the guest state is consistent. * numExitSignal: Number of VM exits that have been triggered by a signal. These usually happen as a result of the timer that limits the time spent in KVM. |
9754:91fbf7b7e933 |
11-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Separate host frequency from simulated CPU frequency
We used to use the KVM CPU's clock to specify the host frequency. This was not ideal for several reasons, one being that the clock parameter of a CPU determines the frequency of some of the components connected to the CPU. This changeset adds a separate hostFreq parameter that should be used to specify the host frequency until we add code to autodetect it. The hostFactor should still be used to specify the conversion factor between the host performance and that of the simulated system. |
9753:b9a742cdd75a |
11-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Don't handle IO and execute in the same tick
We currently execute instructions in the guest and then handle any IO request right after we break out of the virtualized environment. This has the effect of executing IO requests in the exact same tick as the first instruction in the sequence that was just run. There seem to be cases where this simplification upsets some timing-sensitive devices.
This changeset splits execute and IO (and other services) across multiple ticks. This is implemented by adding a separate RunningService state to the CPU state machine. When a VM requires service, it enters into this state and pending IO is then serviced in the future instead of immediately. The delay between getting the request and servicing it depends on the number of cycles executed in the guest, which allows other components to catch up with the CPU. |
9752:a152d7f114b8 |
11-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Maintain a local instruction counter and update totalNumInsts
Update the system's totalNumInst counter when exiting from KVM and maintain an internal absolute instruction count instead of relying on the one from perf. |
9749:cffb82b745cf |
11-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
cpu: Add support for scheduling multiple inst/load stop events
Currently, the only way to get a CPU to stop after a fixed number of instructions/loads is to set a property on the CPU that causes a SimLoopExitEvent to be scheduled when the CPU is constructed. This is clearly not ideal in cases where the simulation script wants the CPU to stop at multiple instruction counts (e.g., SimPoint generation).
This changeset adds the methods scheduleInstStop() and scheduleLoadStop() to the BaseCPU. These methods are exported to Python and are designed to be used from the simulation script. By using these methods instead of the old properties, a simulation script can schedule a stop at any point during simulation or schedule multiple stops. The number of instructions specified when scheduling a stop is relative to the current point of execution. |
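The relative-count semantics can be illustrated with a toy model. This is not the gem5 API: `InstStopScheduler`, `schedule_inst_stop`, and `run` are hypothetical names used only to show how several stops, each given relative to the current point of execution, map onto absolute instruction counts.

```python
import heapq


class InstStopScheduler:
    """Toy model of multiple instruction-count stop events."""

    def __init__(self):
        self.committed = 0   # instructions executed so far
        self._stops = []     # min-heap of (absolute count, cause)

    def schedule_inst_stop(self, insts: int, cause: str) -> None:
        """Schedule a stop `insts` instructions after the current point."""
        if insts <= 0:
            raise ValueError("instruction count must be positive")
        heapq.heappush(self._stops, (self.committed + insts, cause))

    def run(self):
        """Advance to the next scheduled stop and return its cause."""
        if not self._stops:
            return None
        target, cause = heapq.heappop(self._stops)
        self.committed = target
        return cause
```

A script following this pattern can schedule a stop, run until it triggers, and then schedule the next one relative to where execution currently is, which is the workflow the commit message mentions for SimPoint generation.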
9735:fb040456eb46 |
03-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Allow architectures to override the cycle accounting mechanism
Some architectures have special registers in the guest that can be used to do cycle accounting. This is generally preferable since it prevents the guest from seeing a non-monotonic clock. This changeset adds a virtual method, getHostCycles(), that the architecture-specific code can override to implement this functionality. The default implementation uses the hwCycles counter. |
9734:749e3799e532 |
03-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add handling of EAGAIN when creating timers
timer_create can apparently return -1 and set errno to EAGAIN if the kernel suffered a temporary failure when allocating a timer. This happens from time to time, so we need to handle it. |
9732:7b86c22356ab |
02-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add a call to thread->startup() in startup()
It is now required to initialize the thread context by calling startup() on it. Failing to do so currently causes the decoder in x86-based CPUs to get very confused when restoring from checkpoints. |
9723:01856e32fda1 |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Prune the stale TraceCPU
This patch prunes the TraceCPU as the code is stale and the functionality that it provided can now be achieved with the TrafficGen using its trace playback mode.
The TraceCPU was able to play back pre-recorded memory traces of a few different formats, and to achieve this level of flexibility with the TrafficGen, use the util/encode_packet_trace (with suitable modifications) to create a protobuf trace off-line. |
9722:2ddec848b8e8 |
30-May-2013 |
Sascha Bischoff <sascha.bischoff@arm.com> |
cpu: Check that minimum TrafficGen period is less than max period
Add a check which ensures that the minimum period for the LINEAR and RANDOM traffic generator states is less than or equal to the maximum period. If the minimum period is greater than the maximum period a fatal is triggered. |
9721:dd486672c9d0 |
30-May-2013 |
Sascha Bischoff <sascha.bischoff@arm.com> |
cpu: Fix bug when reading in TrafficGen state transitions
This patch fixes a bug with the traffic generator which occurred when reading in the state transitions from the configuration file. Previously, the size of the vector which stored the transitions was used to get the size of the transitions matrix, rather than using the number of states. Therefore, if there were more transitions than states, i.e. some transitions have a probability of less than 1, then the traffic generator would fatal when trying to check the transitions.
This issue has been addressed by using the number of input states, rather than the number of transitions. |
9720:090935b1b797 |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Add request elasticity to the traffic generator
This patch adds an optional request elasticity to the traffic generator, effectively compensating for it in the case of the linear and random generators, and adding it in the case of the trace generator. The accounting is left with the top-level traffic generator, and the individual generators do the necessary math as part of determining the next packet tick.
Note that in the linear and random generators we have to compensate for the blocked time to not be elastic, i.e. without this patch the aforementioned generators will slow down in the case of back-pressure. |
9719:b67ea6252629 |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Block traffic generator when requests have to retry
This patch changes the queued port for a conventional master port and stalls the traffic generator when requests are not immediately accepted. This is a first step to allowing elasticity in the injection of requests.
The patch also adds stats for the sent packets and retries, and slightly changes how the nextPacketTick and getNextPacket interact. The advancing of the trace is now moved to getNextPacket and nextPacketTick is only responsible for answering the question when the next packet should be sent. |
9718:1cfcc2960e9f |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Move traffic generator sending out of generator states
This patch moves the responsibility for sending packets out of the generator states and leaves it with the top-level traffic generator. The main aim of this patch is to enable a transition to non-queued ports, i.e. with send/retry flow control, and to do so it is much more convenient to not wrap the port interactions and instead leave it all local to the traffic generator.
The generator states now only govern when they are ready to send something new, and the generation of the packets to send. They thus have no knowledge of the port that is used. |
9717:dd2e46b239c1 |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fold together the StateGraph and the TrafficGen
This patch simplifies the object hierarchy of the traffic generator by getting rid of the StateGraph class and folding this functionality into the traffic generator itself.
The main goal of this patch is to facilitate upcoming changes by reducing the number of affected layers. |
9704:dd6a9d314e40 |
30-May-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Make hash struct instead of class to please clang
This patch changes the type of the hash function for BasicBlockRanges to match the original definition of the templatized type. Without this, clang raises a warning and combined with the "-Werror" flag this causes compilation to fail. |
9691:b1be1df904c9 |
14-May-2013 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: remove local/globalHistoryBits params from branch pred
having separate params for the local/globalHistoryBits and the local/globalPredictorSize can lead to inconsistencies if they are not carefully set. this patch derives the number of bits necessary to index into the local/global predictors based on their size.
the value of the localHistoryTableSize for the ARM O3 CPU has been increased to 1024 from 64, which is more accurate for an A15 based on some correlation against A15 hardware. |
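Deriving the index width from a table size amounts to taking a base-2 logarithm, as in this sketch. The `index_bits` helper is hypothetical, and the requirement that the size be a power of two is an assumption of the sketch rather than something the commit message states.

```python
def index_bits(predictor_size: int) -> int:
    """Return the number of bits needed to index `predictor_size` entries.

    Assumes the size is a power of two, so the result is exactly log2(size).
    """
    is_power_of_two = predictor_size > 0 and predictor_size & (predictor_size - 1) == 0
    if not is_power_of_two:
        raise ValueError("predictor size must be a power of two")
    return predictor_size.bit_length() - 1


# Sizes mentioned in the commit message for the ARM O3 CPU.
old_bits = index_bits(64)
new_bits = index_bits(1024)
```

Computing the bit count from the size leaves a single parameter to configure, which removes the possibility of the two values disagreeing.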
9690:8055cd04be78 |
14-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add support for disabling coalesced MMIO
Add the option useCoalescedMMIO to the BaseKvmCPU. The default behavior is to disable coalesced MMIO since this hasn't been heavily tested. |
9689:a1ea7e67a9d9 |
14-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Dump state before panic in KVM exit handlers |
9688:cce7dd32aed3 |
14-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Fix the memory interface used by KVM
The CpuPort class was removed before the KVM patches were committed, which means that the KVM interface currently doesn't compile. This changeset adds the BaseKvmCPU::KVMCpuPort class which derives from MasterPort. This class is used on the data and instruction ports instead of the old CpuPort. |
9684:00dca8a9b560 |
01-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add a stat counting number of instructions executed
This changeset adds a 'numInsts' stat to the KVM-based CPU. It also cleans up the variable names in kvmRun to make the distinction between host cycles and estimated simulated cycles clearer. As a bonus feature, it also fixes a warning (unreferenced variable) when compiling in fast mode. |
9683:2c52e4537e6c |
01-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Add checkpoint debug print
Add a debug print (when the Checkpoint debug flag is set) on serialize and unserialize. Additionally, dump the KVM state before serializing. The KVM state isn't dumped after unserializing since the state is loaded lazily on the next KVM entry. |
9682:c4d3b62b3fcf |
01-May-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
kvm: Make MMIO requests uncacheable
Device accesses are normally uncacheable. This change probably doesn't make any difference since we normally disable caching when KVM is active. However, there might be devices that check this, so we'd better enable this flag to be safe. |
9675:e7798df2f0a7 |
23-Apr-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix TraceGen flag initialisation
This patch ensures the flags are always initialised. |
9667:b34619c4961b |
22-Apr-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Use request flags in trace playback
This patch changes the TraceGen such that it uses the optional request flags from the protobuf trace if they are present. |
9666:74aca4cb081e |
22-Apr-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Make the generators usable outside the TrafficGen module
This patch enables the use of the generator behaviours outside the TrafficGen module. This is useful e.g. to allow packet replay modes for other devices in the system without having to replace them with a TrafficGen in the configuration files.
This change also enables more specific behaviours to be composed as specific modules, e.g. BaseBandModem can use a number of generators and have application-specific parameters based around a specific set of generators. |
9660:5ca6098b9560 |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
kvm: Add support for pseudo-ops on ARM
This changeset adds support for m5 pseudo-ops when running in kvm-mode. Unfortunately, we can't trap the normal gem5 co-processor entry in KVM (it doesn't seem to be possible to trap accesses to non-existing co-processors). We therefore use BZJ instructions to cause a trap from virtualized mode into gem5. The BZJ instruction becomes a normal branch to the gem5 fallback code when running in simulated mode, which means that this patch does not need to change the ARM ISA-specific code.
Note: This requires a patched host kernel. |
9658:da0fb0e05277 |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
kvm: Add support for state dumping on ARM |
9657:0e15490aad4f |
22-Apr-2013 |
Andreas Sandberg <andreas.sandberg@arm.com> |
kvm: Add basic support for ARM
Architecture specific limitations: * LPAE is currently not supported by gem5. We therefore panic if LPAE is enabled when returning to gem5. * The co-processor based interface to the architected timer is unsupported. We can't support this due to limitations in the KVM API on ARM. * M5 ops are currently not supported. This requires either a kernel hack or a memory mapped device that handles the guest<->m5 interface. |
9655:78c9adc85718 |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
kvm: Add experimental support for a perf-based execution timer
Add support for using the CPU cycle counter instead of a normal POSIX timer to generate timed exits to gem5. This should, in theory, provide better resolution when requesting timer signals.
The perf-based timer requires a fairly recent kernel since it requires a working PERF_EVENT_IOC_PERIOD ioctl. This ioctl has existed in the kernel for a long time, but it used to be completely broken due to an inverted match when the kernel copied things from user space. Additionally, the ioctl does not change the sample period correctly on all kernel versions which implement it. It is currently only known to work reliably on kernel version 3.7 and above on ARM. |
9652:553ad940c9db |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
kvm: Avoid synchronizing the TC on every KVM exit
Reduce the number of KVM->TC synchronizations by overloading the getContext() method and only request an update when the TC is requested as opposed to every time KVM returns to gem5. |
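The on-demand synchronization can be sketched with a staleness flag. The class and method names below (`LazyThreadContext`, `on_vm_exit`, `get_context`) are hypothetical and only model the idea of deferring the state copy until the context is actually requested.

```python
class LazyThreadContext:
    """Toy model: copy hardware state only when the context is requested."""

    def __init__(self, read_hw_state):
        self._read_hw_state = read_hw_state  # callable returning fresh state
        self._state = None
        self._stale = True

    def on_vm_exit(self) -> None:
        # Returning from the VM only marks the cached copy as out of date.
        self._stale = True

    def get_context(self):
        # The expensive synchronization happens here, and at most once per
        # VM exit no matter how many times the context is requested.
        if self._stale:
            self._state = self._read_hw_state()
            self._stale = False
        return self._state
```

If the simulator exits and re-enters the VM many times without anything inspecting the context, this pattern performs no synchronization at all for those exits.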
9651:f551c8ad12a5 |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
kvm: Basic support for hardware virtualized CPUs
This changeset introduces the architecture independent parts required to support KVM-accelerated CPUs. It introduces two new simulation objects:
KvmVM -- The KVM VM is a component shared between all CPUs in a shared memory domain. It is typically instantiated as a child of the system object in the simulation hierarchy. It provides access to KVM VM specific interfaces.
BaseKvmCPU -- Abstract base class for all KVM-based CPUs. Architecture dependent CPU implementations inherit from this class and implement the following methods:
* updateKvmState() -- Update the architecture-dependent KVM state from the gem5 thread context associated with the CPU.
* updateThreadContext() -- Update the thread context from the architecture-dependent KVM state.
* dump() -- Dump the KVM state (optional).
In order to deliver interrupts to the guest, CPU implementations typically override the tick() method and check for, and deliver, interrupts prior to entering KVM.
Hardware-virtualized CPUs currently have the following limitations: * SE mode is not supported. * PC events are not supported. * Timing statistics are currently very limited. The current approach simply scales the host cycles with a user-configurable factor. * The simulated system must not contain any caches. * Since cycle counts are approximate, there is no way to request an exact number of cycles (or instructions) to be executed by the CPU. * Hardware virtualized CPUs and gem5 CPUs must not execute at the same time in the same simulator instance. * Only single-CPU systems can be simulated. * Remote GDB connections to the guest system are not supported.
Additionally, m5ops require an architecture specific interface and might not be supported. |
9650:d79319eb68d5 |
22-Apr-2013 |
Timothy M. Jones <timothy.jones@arm.com> |
cpu: Let python scripts obtain the number of instructions executed |
9649:c717bd5e0a1d |
22-Apr-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
arm: Enable support for triggering a sim panic on kernel panics
Add the options 'panic_on_panic' and 'panic_on_oops' to the LinuxArmSystem SimObject. When these options are enabled, the simulator panics when the guest kernel panics or oopses. Enable panic on panic and panic on oops in ARM-based test cases. |
9648:f10eb34e3e38 |
22-Apr-2013 |
Dam Sunwoo <dam.sunwoo@arm.com> |
sim: separate nextCycle() and clockEdge() in clockedObjects
Previously, nextCycle() could return the *current* cycle if the current tick was already aligned with the clock edge. This behavior is not only confusing (not quite what the function name implies), but also caused problems in the drainResume() function. When exiting/re-entering the sim loop (e.g., to take checkpoints), the CPUs will drain and resume. Due to the previous behavior of nextCycle(), the CPU tick events were being rescheduled in the same ticks that were already processed before draining. This caused divergence from runs that did not exit/re-enter the sim loop. (Initially a cycle difference, but a significant impact later on.)
This patch separates out the two behaviors (nextCycle() and clockEdge()), uses nextCycle() in drainResume, and uses clockEdge() everywhere else. Nothing (other than name) should change except for the drainResume timing. |
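The difference between the two behaviors can be shown with a short sketch. The free functions `clock_edge` and `next_cycle` take the current tick and clock period as explicit arguments, which is an assumption made to keep the example self-contained; they model the semantics described above rather than the gem5 member functions.

```python
def clock_edge(cur_tick: int, period: int) -> int:
    """Return the first clock edge at or after `cur_tick`."""
    return ((cur_tick + period - 1) // period) * period


def next_cycle(cur_tick: int, period: int) -> int:
    """Return the first clock edge strictly after `cur_tick`."""
    edge = clock_edge(cur_tick, period)
    if edge == cur_tick:
        # Already aligned: the current edge has been used, so move on.
        edge += period
    return edge
```

The two functions differ only when the current tick is already aligned with an edge. That is the case that matters on resume, where rescheduling at the current tick would repeat a tick that was processed before draining.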
9647:5b6b315472e7 |
22-Apr-2013 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: generate SimPoint basic block vector profiles
This patch is based on http://reviews.m5sim.org/r/1474/ originally written by Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout folder) based on start and end addresses of basic blocks.
Some comments to the original patch are addressed and hooks are added to create and resume from checkpoints based on instruction counts dictated by external SimPoint analysis tools.
SimPoint creation/resuming options will be implemented as a separate patch. |
9644:07352f119e48 |
22-Apr-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for a icache access. |
9624:43bd6562745e |
29-Mar-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
o3cpu: commit: changes interrupt handling
Currently the commit stage keeps a local copy of the interrupt object. Since the interrupt is usually handled several cycles after the commit stage becomes aware of it, it is possible that the local copy of the interrupt object may not be the interrupt that is actually handled. It is possible that another interrupt occurred in the interval between interrupt detection and interrupt handling.
This patch creates a copy of the interrupt just before the interrupt is handled. The local copy is ignored. |
9608:e2b6b86fda03 |
26-Mar-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Remove CpuPort and use MasterPort in the CPU classes
This patch changes the port in the CPU classes to use MasterPort instead of the derived CpuPort. The functions of the CpuPort are now distributed across the relevant subclasses. The port accessor functions (getInstPort and getDataPort) now return a MasterPort instead of a CpuPort. This simplifies creating derivative CPUs that do not use the CpuPort. |
9592:e63321487404 |
20-Mar-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Avoid including inorder TLBUnit to avoid gcc LTO bug
This patch comments out the inclusion of the inorder TLBUnit which is only used in the 9-stage pipeline. With the TLBUnit present, gcc >= 4.6 in combination with LTO ends up throwing away the definition of the TLBUnit destructor, and consequently fails to link. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53808 for more details about the bug, and http://gcc.gnu.org/ml/gcc/2012-06/msg00397.html for the discussion thread that also touches on similar issues seen with clang. |
9584:1a21964b7227 |
12-Mar-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
cpu: Fix state transition bug in the traffic generator
The traffic generator used to incorrectly determine the next state when state 0 had a non-zero probability. Due to the way the next transition was determined, state 0 could never be entered other than as an initial state. This changeset updates the transition() method to correctly handle such cases and cases where the transition matrix is a 1x1 matrix. |
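The entry above concerns how the next state is drawn from one row of the transition matrix. The following is a minimal Python sketch of one correct way to make that draw, not the gem5 implementation: the function name `next_state`, the list-of-rows matrix layout, and the injectable `rng` are all assumptions made for illustration.

```python
import random


def next_state(current, trans_matrix, rng=random.random):
    """Draw the next state from row `current` of a row-stochastic matrix."""
    row = trans_matrix[current]
    p = rng()  # uniform sample in [0, 1)
    cumulative = 0.0
    last_reachable = current
    for state, prob in enumerate(row):
        if prob > 0.0:
            last_reachable = state
        cumulative += prob
        # Strict '<' keeps zero-probability states unreachable while still
        # letting state 0 be chosen whenever its probability is non-zero.
        if p < cumulative:
            return state
    # Only reached if rounding leaves the row sum slightly below p.
    return last_reachable
```

With a 1x1 matrix `[[1.0]]` the loop returns state 0 on its first iteration, which covers the second corner case the entry mentions.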
9574:5bb4346cbfa7 |
04-Mar-2013 |
Ali Saidi <saidi@eecs.umich.edu> |
cpu: fix a switching issue with the o3 cpu.
This change fixes the switcheroo test that broke earlier this month. The code that was checking for the pipeline being blocked wasn't checking for a pending translation, only for an icache access. |
9557:8666e81607a6 |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Fix warnings issued by clang 3.2svn (XCode 4.6)
This patch fixes the warnings that clang3.2svn emits due to the "-Wall" flag. There is one case of an uninitialised value in the ARM neon ISA description, and then a whole range of unused private fields that are pruned. |
9554:406fbcf60223 |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Add warning for missing declarations
This patch enables warnings for missing declarations. To avoid issues with SWIG-generated code, the warning is only applied to non-SWIG code. |
9550:e0e2c8f83d08 |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Fix up numerous warnings about name shadowing
This patch addresses the most important name shadowing warnings (as produced when using gcc/clang with -Wshadow). There are many locations where constructor parameters and function parameters shadow local variables, but these are left unchanged. |
9544:1a075d9bc1bc |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
x86: Move APIC clock divider to Python
This patch moves the 16x APIC clock divider to the Python code to avoid the post-instantiation modifications to the clock. The x86 APIC was the only object setting the clock after creation time and this required some custom functionality and configuration. With this patch, the clock multiplier is moved to the Python code and the objects are instantiated with the appropriate clock. |
9542:683991c46ac8 |
19-Feb-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add predecessor to SenderState base class
This patch adds a predecessor field to the SenderState base class to make the process of linking them up more uniform, and enable a traversal of the stack without knowing the specific type of the subclasses.
There are a number of simplifications done as part of changing the SenderState, particularly in the RubyTest. |
9534:5de6389b72f7 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Document exec trace flags |
9532:01f0fac41c84 |
15-Feb-2013 |
Geoffrey Blake <geoffrey.blake@arm.com> |
cpu: Avoid duplicate entries in tracking structures for writes to misc regs
setMiscReg currently makes a new entry for each write to a misc reg without checking for duplicates. This can trigger the assert if an instruction gets replayed and writes to the same misc regs multiple times. This fix prevents duplicate entries and instead updates the value. |
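The update-instead-of-append behaviour described above can be sketched in a few lines of Python. This is an illustration only: the class `MiscRegWriteLog`, its `entries` list, and the method name `set_misc_reg` are hypothetical stand-ins, not the gem5 tracking structures.

```python
class MiscRegWriteLog:
    """Toy stand-in for a per-instruction record of pending misc reg writes."""

    def __init__(self):
        # Each entry is [reg_index, value], kept in first-write order.
        self.entries = []

    def set_misc_reg(self, reg_index, value):
        # If the register was already written (e.g. by a replayed
        # instruction), overwrite the stored value instead of appending.
        for entry in self.entries:
            if entry[0] == reg_index:
                entry[1] = value
                return
        self.entries.append([reg_index, value])
```

Because a replay only overwrites an existing entry, the number of entries is bounded by the number of distinct registers written, which is the property a size assert on such a structure would rely on.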
9531:1114ead790eb |
15-Feb-2013 |
Geoffrey Blake <geoffrey.blake@arm.com> |
cpu: Fix rename mis-handling serializing instructions when resource constrained
The rename can mis-handle serializing instructions (i.e. strex) if it gets into a resource constrained situation and the serializing instruction has to be placed on the skid buffer to handle blocking. In this situation the instruction informs the pipeline it is serializing and logs that the next instruction must be serialized, but since we are blocking the pipeline defers this action to place the serializing instruction and incoming instructions into the skid buffer. When resuming from blocking, rename will pull the serializing instruction from the skid buffer and the current logic will see this as the "next" instruction that has to be serialized and because of flags set on the serializing instruction, it passes through the pipeline stage as normal and resets rename to non-serializing. This causes instructions to follow the serializing inst incorrectly and eventually leads to an error in the pipeline. To fix this rename should check first if it has to block before checking for serializing instructions. |
9527:68154bc0e0ea |
15-Feb-2013 |
Matt Horsnell <Matt.Horsnell@arm.com> |
o3: fix tick used for renaming and issue with range selection
Fixes the tick used from rename:
- previously this gathered the tick on leaving rename which was always 1 less than the dispatch. This conflated the decode ticks when back pressure built in the pipeline.
- now picks up tick on entry.
Added --store_completions flag:
- will additionally display the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ blocking)
Allows selection by tick range (previously this caused an infinite loop) |
9524:d6ffa982a68b |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
sim: Add a system-global option to bypass caches
Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virtualized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore. |
9523:b8c8437f71d9 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Refactor memory system checks
CPUs need to test that the memory system is in the right mode in two places, when the CPU is initialized (unless it's switched out) and on a drainResume(). This led to some code duplication in the CPU models. This changeset introduces the verifyMemoryMode() method which is called by BaseCPU::init() if the CPU isn't switched out. The individual CPU models are responsible for calling this method when resuming from a drain as this code is CPU model specific. |
9519:bed1c3244425 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Make checker CPUs inherit from CheckerCPU in the Python hierarchy
Checker CPUs currently don't inherit from the CheckerCPU in the Python object hierarchy. This has two consequences:
* It makes CPU model discovery from the Python world somewhat complicated as there is no way of testing if a CPU is a checker.
* Parameters are duplicated in the checker configuration specification.
This changeset makes all checker CPUs inherit from the base checker CPU class. |
9518:8faae62af8c3 |
15-Feb-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Add CPU metadata on the Python classes
The configuration scripts currently hard-code the requirements of each CPU. This is clearly not optimal as it makes writing new configuration scripts painful and adding new CPU models requires existing scripts to be updated. This patch adds the following class methods to the base CPU and all relevant CPUs:
* memory_mode -- Return a string describing the current memory mode (invalid/atomic/timing).
* require_caches -- Does the CPU model require caches?
* support_take_over -- Does the CPU support CPU handover? |
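A rough Python sketch of how class-level metadata like this lets a configuration script query a CPU model instead of hard-coding its requirements. The class names `ToyBaseCPU`, `ToyAtomicCPU`, and `ToyTimingCPU`, the values they return, and the helper `describe` are hypothetical; only the three method names come from the entry above.

```python
class ToyBaseCPU:
    """Hypothetical base class carrying the three metadata class methods."""

    @classmethod
    def memory_mode(cls):
        return "invalid"

    @classmethod
    def require_caches(cls):
        return False

    @classmethod
    def support_take_over(cls):
        return False


class ToyAtomicCPU(ToyBaseCPU):
    @classmethod
    def memory_mode(cls):
        return "atomic"

    @classmethod
    def support_take_over(cls):
        return True


class ToyTimingCPU(ToyBaseCPU):
    @classmethod
    def memory_mode(cls):
        return "timing"

    @classmethod
    def require_caches(cls):
        return True

    @classmethod
    def support_take_over(cls):
        return True


def describe(cpu_class):
    """Collect what a config script needs without naming any model."""
    return {
        "mem_mode": cpu_class.memory_mode(),
        "caches": cpu_class.require_caches(),
        "switchable": cpu_class.support_take_over(),
    }
```

A script written against `describe` keeps working when a new model is added, since the model itself states its requirements.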
9516:8bb2deb544a5 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: include set in o3/commit_impl.
While the majority of compilers seemed to pick up set from elsewhere, one version of gcc 4.7 complains, so explicitly add it. |
9514:40e2bf800921 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: fix case with o3 cpu blocking and unblocking decode in cycle
Fix a case in the O3 CPU where the decode stage blocks and unblocks in a single cycle, sending both signals to fetch, which causes an assert or worse. The previous check could never work since the status was set to Blocked before a test for the status being Unblocking was executed. |
9513:690357ffbce2 |
15-Feb-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: Fix a livelock in the o3 cpu.
Check if an instruction just enabled interrupts and we've previously had an interrupt pending that was not handled because interrupts were subsequently disabled before the pipeline reached a place to handle the interrupt. In that case squash now to make sure the interrupt is handled. |
9480:d059f8a95a42 |
24-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu>, Timothy Jones <timothy.jones@cl.cam.ac.uk> |
branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories to src/cpu/pred. This allows sharing the branch predictor across different cpu models.
This patch was originally posted by Timothy Jones in July 2010 but never made it to the repository. |
9479:f9e76b1eb79a |
22-Jan-2013 |
Andrea Pellegrini <andrea.pellegrini@gmail.com> |
o3 cpu: fix zero reg problem
There was an issue w/ the rename logic, which would assign a previous physical register to the ZeroReg architectural register in x86. This issue was giving problems for instructions squashed in threads w/ ID different from 0, sometimes allowing non-mispredicted instructions to obtain a value different from zero when reading the zeroReg. |
9478:ba80f7d4f452 |
22-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching of cpus. The problem is that decoder for the old thread context holds state that is not taken over by the new decoder.
This patch adds a takeOverFrom() function to Decoder class in each ISA. Except for x86, functions in other ISAs are blank. For x86, the function copies state from the old decoder to the new decoder. |
9476:4a14ff47b8e3 |
19-Jan-2013 |
Joel Hestness <hestness@cs.wisc.edu> |
O3 IEW: Make incrWb and decrWb clearer
Move the increment/decrement of wbOutstanding outside of the comparison in incrWb and decrWb in the IEW. This also fixes a compiler bug with gcc 4.4.7, which incorrectly optimizes "-- ==" as "-=". |
9475:736909f5c13b |
17-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: remove calls to g_system_ptr->getTime()
This patch further removes calls to g_system_ptr->getTime() wherever other clocked objects are available for providing current time. |
9462:116396961ad1 |
12-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
base simple cpu: removes commented out code about cache ops |
9461:67a6ba6604c8 |
12-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made changes to the code which resulted in decoder not getting initialized correctly when the state was restored from a checkpoint.
This patch adds a startup function to each ISA object. For x86, this function sets the required state in the decoder. For other ISAs, the function is empty right now. |
9448:569d1e8f74e4 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify the serialization code for all of the CPU models
Clean up the serialization code for the simple CPUs and the O3 CPU. The CPU-specific code has been replaced with a (un)serializeThread that serializes the thread state / context of a specific thread. Assuming that the CPU-specific thread state uses the base thread state serialization code, this allows us to restore a checkpoint with any of the CPU models. |
9446:644f2a2c9bfc |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Flush TLBs on switchOut()
This changeset inserts a TLB flush in BaseCPU::switchOut to prevent stale translations when doing repeated switching. Additionally, the TLB flushing functionality is exported to the Python to make debugging of switching/checkpointing easier.
A simulation script will typically use the TLB flushing functionality to generate a reference trace. The following sequence can be used to simulate a handover (this depends on how drain is implemented, but is generally the case) between identically configured CPU models:
    m5.drain(test_sys)
    [ cpu.flushTLBs() for cpu in test_sys.cpu ]
    m5.resume(test_sys)
The generated trace should normally be identical to a trace generated when switching between identically configured CPU models or checkpointing and resuming. |
9444:ab47fe7f03f0 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust.
Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty. |
9443:0cb3209bc5c7 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Make sure that a drained atomic CPU isn't executing ucode
Currently, the atomic CPU can be in the middle of a microcode sequence when it is drained. This leads to two problems:
* When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode.
* Since curMacroStaticInst is populated when executing microcode, repeated switching between CPUs executing microcode leads to incorrect execution.
After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset fixes a bug where multiple switches to the same atomic CPU sometimes corrupt the target state because of dangling pointers to the currently executing microinstruction.
Note: This changeset moves tick event descheduling from switchOut() to drain(), which makes timing consistent between just draining a system and draining /and/ switching between two atomic CPUs. This makes debugging quite a lot easier (execution traces get the same timing), but the latency of the last instruction before a drain will not be accounted for correctly (it will always be 1 cycle).
Note 2: This changeset removes the so_state variable, the locked variable, and the tickEvent from checkpoints since none of them contain state that needs to be preserved across checkpoints. The so_state is made redundant because we don't use the drain state variable anymore, the locked variable should never be set when the system is drained, and the tick event isn't scheduled. |
9442:36967173340c |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Make sure that a drained timing CPU isn't executing ucode
Currently, the timing CPU can be in the middle of a microcode sequence or multicycle (stayAtPC is true) instruction when it is drained. This leads to two problems:
* When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode.
* If stayAtPC is true we might execute half of an instruction twice when restoring a checkpoint or switching CPUs, which leads to an incorrect execution.
After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset also fixes a bug where the timing CPU sometimes switches out while stayAtPC is true, which corrupts the target state after a CPU switch or checkpoint.
Note: This changeset removes the so_state variable from checkpoints since the drain state isn't used anymore. |
9441:1133617844c8 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix broken thread context handover
The thread context handover code used to break when multiple handovers were performed during the same quiesce period. Previously, the thread contexts would assign the TC pointer in the old quiesce event to the new TC. This obviously broke in cases where multiple switches were performed within the same quiesce period, in which case the TC pointer in the quiesce event would point to an old CPU.
The new implementation deschedules pending quiesce events in the old TC and schedules a new quiesce event in the new TC. The code has been refactored to remove most of the code duplication. |
9440:fdc91cab5760 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix O3 LSQ debug dumping constness and formatting |
9437:8088e94a9de0 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix broken squashAfter implementation in O3 CPU
Commit can currently both commit and squash in the same cycle. This confuses other stages since the signals coming from the commit stage can only signal either a squash or a commit in a cycle. This changeset changes the behavior of squashAfter so that it commits all instructions, including the instruction that requested the squash, in the first cycle and then starts to squash in the next cycle. |
9436:4a0223da4924 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
o3 cpu: Remove unused variables |
9433:34971d2e0019 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rename defer_registration->switched_out
The defer_registration parameter is used to prevent a CPU from initializing at startup, leaving it in the "switched out" mode. The name of this parameter (and the help string) is confusing. This patch renames it to switched_out, which should be more descriptive. |
9432:f902aa5773a8 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Remove unused params.hh header file in inorder CPU |
9430:a113f27b68bd |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Introduce sanity checks when switching between CPUs
This patch introduces the following sanity checks when switching between CPUs:
* Check that the set of new and old CPUs do not overlap. Having an overlap between the set of new CPUs and the set of old CPUs is currently not supported. Doing such a switch used to result in the following assertion error: BaseCPU::takeOverFrom(BaseCPU*): \ Assertion `!new_itb_port->isConnected()' failed.
* Check that all new CPUs are in the switched out state.
* Check that all old CPUs are in the switched in state. |
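The three checks listed above can be sketched as a small Python validation routine. This is a hedged illustration, not the gem5 code: `ToyCPU`, its `switched_out` attribute, the function `check_switch`, and the use of `ValueError` are all assumptions, and the input is assumed to be a list of (old, new) pairs.

```python
class ToyCPU:
    """Hypothetical CPU object exposing only its switched-out state."""

    def __init__(self, name, switched_out):
        self.name = name
        self.switched_out = switched_out


def check_switch(cpu_pairs):
    """Validate a list of (old_cpu, new_cpu) pairs before switching."""
    old_cpus = [old for old, _ in cpu_pairs]
    new_cpus = [new for _, new in cpu_pairs]

    # 1. The sets of old and new CPUs must not share any object.
    if any(old is new for old in old_cpus for new in new_cpus):
        raise ValueError("old and new CPU sets overlap")

    # 2. Every new CPU must currently be switched out.
    for new in new_cpus:
        if not new.switched_out:
            raise ValueError(f"new CPU {new.name} is already switched in")

    # 3. Every old CPU must currently be switched in.
    for old in old_cpus:
        if old.switched_out:
            raise ValueError(f"old CPU {old.name} is already switched out")
```

Raising a descriptive error up front replaces the late assertion failure quoted in the entry with a message that names the offending CPU.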
9429:7c787b8030c6 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Correctly call parent on switchOut() and takeOverFrom()
This patch cleans up the CPU switching functionality by making sure that CPU models consistently call the parent on switchOut() and takeOverFrom(). This has the following implications that might alter current functionality:
* The call to BaseCPU::switchOut() in the O3 CPU is moved from signalDrained() (!) to switchOut().
* A call to BaseSimpleCPU::switchOut() is introduced in the simple CPUs. |
9428:029dfe6324d3 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Unify SimpleCPU and O3 CPU serialization code
The O3 CPU used to copy its thread context to a SimpleThread in order to do serialization. This was a bit of a hack involving two static SimpleThread instances and a magic constructor that was only used by the O3 CPU.
This patch moves the ThreadContext serialization code into two global procedures that, in addition to the normal serialization parameters, take a ThreadContext reference as a parameter. This allows us to reuse the serialization code in all ThreadContext implementations. |
9427:ddf45c1d54d4 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is called before initState() or unserialize(). This causes the pipeline to be initialized from an incorrect thread context. This doesn't currently lead to correctness problems as instructions fetched from the incorrect start PC will be squashed a few cycles after initialization.
This patch will affect the regressions since the O3 CPU now issues its first instruction fetch to the correct PC instead of 0x0. |
9426:0548b3e9734d |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Implement a flat register interface in thread contexts
Some architectures map registers differently depending on their mode of operations. There is currently no architecture independent way of accessing all registers. This patch introduces a flat register interface to the ThreadContext class. This interface is useful, for example, when serializing or copying thread contexts. |
9425:a24092160ec7 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
arch: Move the ISA object to a separate section
After making the ISA an independent SimObject, it is serialized automatically by the Python world. Previously, this just resulted in an empty ISA section. This patch moves the contents of the ISA to that section and removes the explicit ISA serialization from the thread contexts, which makes it behave like a normal SimObject during serialization.
Note: This patch breaks checkpoint backwards compatibility! Use the cpt_upgrader.py utility to upgrade old checkpoints to the new format. |
9424:d631aac65246 |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Check that the memory system is in the correct mode
This patch adds checks to all CPU models to make sure that the memory system is in the correct mode at startup and when resuming after a drain. Previously, we only checked that the memory system was in the right mode when resuming. This is inadequate since this is a configuration error that should be detected at startup as well as when resuming. Additionally, since the check was done using an assert, it wasn't performed when NDEBUG was set (e.g., the fast target). |
9403:af9066bc088c |
07-Jan-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Share the send functionality between traffic generators
This patch moves the packet creating and sending to a member function in the shared base class to avoid code duplication. |
9402:f6e3c60f04e5 |
07-Jan-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Add support for protobuf input for the trace generator
This patch adds support for reading input traces encoded using protobuf according to what is done in the CommMonitor.
A follow-up patch adds a Python script that can be used to convert the previously used ASCII traces to protobuf equivalents. The appropriate regression input is updated as part of this patch. |
9400:b4a3d0953757 |
07-Jan-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Encapsulate traffic generator input in a stream
This patch encapsulates the traffic generator input in a stream class such that the parsing is not visible to the trace generator. The change takes us one step closer to using protobuf-based input traces for the trace replay.
The functionality of the current input stream is identical to what it was, and the ASCII format remains the same for now. |
9391:8f24dcb13b85 |
07-Jan-2013 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Fix the traffic gen read percentage
This patch fixes the computation that determines whether to perform a read or a write such that the two corner cases (0 and 100) are both more efficient and handled correctly. |
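One way to make the read/write decision handle both corner cases exactly is to short-circuit them before drawing a random number. The sketch below is an assumption-laden illustration rather than the gem5 code: the name `is_read`, the injectable `draw` callable, and the choice of drawing an integer in 0..99 are all hypothetical.

```python
import random


def is_read(read_percent, draw=lambda: random.randrange(100)):
    """Return True for a read, False for a write."""
    if not 0 <= read_percent <= 100:
        raise ValueError("read_percent must be within 0..100")
    # Corner cases: no random draw is needed, and the answer is exact.
    if read_percent == 0:
        return False
    if read_percent == 100:
        return True
    # draw() yields an integer in 0..99, so exactly read_percent of the
    # 100 possible outcomes select a read.
    return draw() < read_percent
```

Skipping the draw at 0 and 100 is what makes those cases cheaper as well as correct.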
9384:877293183bdf |
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
arch: Make the ISA class inherit from SimObject
The ISA class stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU. |
9383:55fa95053ee8 |
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
o3: Fix issue with LLSC ordering and speculation
This patch unlocks the cpu-local monitor when the CPU sees a snoop to a locked address. Previously we relied on the cache to handle the locking for us, however some users on the gem5 mailing list reported a case where the cpu speculatively executes an ll operation after a pending sc operation in the pipeline and that makes the cache monitor valid. This should handle that case by invalidating the local monitor. |
9382:1c97b57d5169 |
07-Jan-2013 |
Ali Saidi <Ali.Saidi@ARM.com> |
cpu: rename the misleading inSyscall to noSquashFromTC
inSyscall was originally created because during handling of a syscall in SE mode the threadcontext had to be updated. However, in many places this is used in FS mode (e.g. fault handlers) and the name doesn't make much sense. The boolean actually stops gem5 from squashing speculative and non-committed state when a write to a threadcontext happens, so rename the variable to something more appropriate. |
9377:6f294e7a93d1 |
04-Jan-2013 |
Gabe Black <gblack@eecs.umich.edu> |
Decoder: Remove the thread context get/set from the decoder.
This interface is no longer used, and getting rid of it simplifies the decoders and code that sets up the decoders. The thread context had been used to read architectural state which was used to contextualize the instruction memory as it came in. That was changed so that the state is now sent to the decoders to keep locally if/when it changes. That's significantly more efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9365:644be05ee7c2 |
11-Dec-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: modify the directed tester to read/write streams
The directed tester supports generating only read or only write accesses. The patch modifies the tester to support streams that have both read and write accesses. |
9360:515891d9057a |
06-Dec-2012 |
Erik Tomusk <E.Tomusk@sms.ed.ac.uk> |
TournamentBP: Fix some bugs with table sizes and counters
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled. globalHistoryBits controls how much history is kept, global and choice predictor sizes control how much of that history is used when accessing predictor tables. This way, global and choice predictors can actually be different sizes, and it is no longer possible to walk off the predictor arrays and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating counters, so that taken/not taken decisions are correct even when the predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
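The per-predictor thresholds mentioned in the TournamentBP entry above can be illustrated with a small Python sketch. It assumes the common convention that an n-bit saturating counter predicts taken when it is in the upper half of its range; that convention, and the names `taken_threshold` and `predicts_taken`, are assumptions for illustration rather than a description of the gem5 code.

```python
def taken_threshold(counter_bits):
    """Largest counter value that still predicts not-taken."""
    if counter_bits < 1:
        raise ValueError("counter needs at least one bit")
    # Midpoint rule: for n bits the range is 0 .. 2**n - 1 and the
    # lower half ends at 2**(n - 1) - 1.
    return (1 << (counter_bits - 1)) - 1


def predicts_taken(counter, counter_bits):
    """Interpret a saturating counter of the given width."""
    return counter > taken_threshold(counter_bits)
```

Deriving the threshold from each table's own counter width is what keeps the decision correct when, say, the local counters are 3 bits wide and the choice counters are 2 bits wide: a value of 2 means taken for the latter but not for the former.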
9359:63bd743e1acb |
06-Dec-2012 |
Malek Musleh <malek.musleh@gmail.com> |
inorder cpu: add missing DPRINTF argument
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9358:aa761458ddcb |
06-Dec-2012 |
Nathanael Premillieu <nathanael.premillieu@irisa.fr> |
o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9342:6fec8f26e56d |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Move the draining interface into a separate base class
This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager. |
9341:a0eff1e9c773 |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
cpu: O3 add a header declaring the DerivO3CPU
SWIG needs a complete declaration of all wrapped objects. This patch adds a header file with the DerivO3CPU class and includes it in the SWIG interface. |
9340:40f8c6a8f38d |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
cpu: Add header files for checker CPUs
In order to create reliable SWIG wrappers, we need to include the declaration of the wrapped class in the SWIG file. Previously, we didn't expose the declaration of checker CPUs. This patch adds header files for such CPUs and includes them in the SWIG wrapper. |
9338:97b4a2be1e5b |
02-Nov-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Include object header files in SWIG interfaces
When casting objects in the generated SWIG interfaces, SWIG uses classical C-style casts ( (Foo *)bar; ). In some cases, this can degenerate into the equivalent of a reinterpret_cast (mainly if only a forward declaration of the type is available). This usually works for most compilers, but it is known to break if multiple inheritance is used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject definitions, which should be used to specify a header to include in the SWIG interface. The header should include the declaration of the wrapped object. We currently don't enforce the use of the header attribute, but a warning will be generated for objects that do not use it. |
9332:ae2a5329ce96 |
02-Nov-2012 |
Dam Sunwoo <dam.sunwoo@arm.com> |
ARM: dump stats and process info on context switches
This patch enables dumping statistics and Linux process information on context switch boundaries (__switch_to() calls) that are used for Streamline integration (a graphical statistics viewer from ARM). |
9327:07a22ace275d |
02-Nov-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
o3: Fix a couple of issues with the local predictor.
Fix some issues with the local predictor and the way it's indexed. |
9301:1e8d01c15a77 |
15-Oct-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
memtest: move check on outstanding requests
The Memtest tester allows for only one request to be outstanding for a particular physical address. The check has been written separately for reads and writes. This patch moves the check earlier than its current position so that it need not be written separately for reads and writes. |
9294:8fb03b13de02 |
15-Oct-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Add protocol-agnostic ports in the port hierarchy
This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocol-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default. |
9290:90dd57ca9a7e |
15-Oct-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Fix: Address a few minor issues identified by cppcheck
This patch addresses a number of smaller issues identified by the code inspection utility cppcheck. There are a number of identified leaks in the arm/linux/system.cc (although the function only gets called once so it is not a major problem), a few deletes in dev/x86/i8042.cc that were not array deletes, and sprintfs where the character array had one element less than needed. In the IIC tags there was a function allocating an array of longs which is in fact never used. |
9284:f4ff625eae56 |
15-Oct-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Regression: Use CPU clock and 32-byte width for L1-L2 bus
This patch changes the CoherentBus between the L1s and L2 to use the CPU clock and also four times the width compared to the default bus. The parameters are not intended to fit every single scenario, but rather serve as a better starting point than what we previously had.
Note that the scripts that do not use the addTwoLevelCacheHiearchy are not affected by this change.
A separate patch will update the stats. |
9260:9ca8345d24c4 |
25-Sep-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Pack the comm structures a bit better to reduce their size. |
9258:baa17ba80e06 |
25-Sep-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Squash outstanding walks when instructions are squashed. |
9254:f1b35c618252 |
25-Sep-2012 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
sim: Move CPU-specific methods from SimObject to the BaseCPU class |
9252:f350fac86d0f |
25-Sep-2012 |
Djordje Kovacevic <djordje.kovacevic@arm.com> |
CPU: Add abandoned instructions to O3 Pipe Viewer |
9241:6cfb9a7acb1b |
21-Sep-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
TrafficGen: Add a basic traffic generator
This patch adds a traffic generator to the code base. The generator is aimed to be used as a black box model to create appropriate use-cases and benchmarks for the memory system, and in particular the interconnect and the memory controller.
The traffic generator is a master module, where the actual behaviour is captured in a state-transition graph where each state generates some sort of traffic. By constructing a graph it is possible to create very elaborate scenarios from basic generators. Currently the set of generators includes idling, linear address sweeps, random address sequences and playback of traces (recording will be done by the Communication Monitor in a follow-up patch). At the moment the graph and the states are described in an ad-hoc line-based format, and in the future this should be aligned with our use of e.g. the Google protobufs. Similarly for the traces, the format is currently a simplistic ad-hoc line-based format that merely serves as a starting point.
In addition to being used as a black-box model for system components, the traffic generator is also useful for creating test cases and regressions for the interconnect and memory system. In future patches we will use the traffic generator to create DRAM test cases for the controller model.
The patch following this one adds a basic regressions which also contains an example configuration script and trace file for playback. |
9235:5aa4896ed55a |
19-Sep-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
AddrRange: Transition from Range<T> to AddrRange
This patch takes the final plunge and transitions from the templated Range class to the more specific AddrRange. In doing so it changes the obvious Range<Addr> to AddrRange, and also bumps the range_map to be AddrRangeMap.
In addition to the obvious changes, including the removal of redundant includes, this patch also does some house keeping in preparing for the introduction of address interleaving support in the ranges. The Range class is also stripped of all the functionality that is never used. |
9220:37e6eb40cf91 |
12-Sep-2012 |
Joel Hestness <hestness@cs.wisc.edu> |
Base CPU: Initialize profileEvent to NULL
The profileEvent pointer is tested against NULL in various places, but it is not initialized unless running in full-system mode. In SE mode, this can result in segmentation faults when profileEvent default initializes to something other than NULL. |
9218:7e9e34d4203b |
12-Sep-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
stats: remove duplicate instruction stats from the commit stage
these stats are duplicates of insts/opsCommitted, cause confusion, and are poorly named. |
9208:2451e60d4555 |
11-Sep-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Use uint8_t instead of uint8 everywhere |
9194:149a32e42697 |
07-Sep-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Get rid of incorrect assert in RAS. |
9184:a1a8f137b796 |
07-Sep-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The main affected modules are the CPUs and the Ruby caches. There are definitely plenty more places that are affected, but this patch serves as a starting point to making the transition.
An important part of this patch is to actually enable parameters to be specified as Param.Cycles which involves some changes to params.py. |
9180:ee8d7a51651d |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Add a Cycles wrapper class and use where applicable
This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though.
The changes done as part of this patch mostly follow directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles.
In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes. |
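As a rough illustration of the wrapper idea described in the entry above, the following is a minimal sketch and not the class added by the patch. The member names, the operator set, and the cyclesToTicks helper are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tick type; not necessarily gem5's definition.
typedef uint64_t Tick;

// Minimal wrapper that keeps relative cycle counts distinct from raw integers.
class Cycles
{
  private:
    uint64_t c;

  public:
    // Explicit, so a plain integer is never silently treated as a cycle count.
    explicit Cycles(uint64_t _c = 0) : c(_c) {}

    // Explicit access to the underlying count.
    uint64_t value() const { return c; }

    Cycles operator+(const Cycles &rhs) const { return Cycles(c + rhs.c); }
    Cycles &operator++() { ++c; return *this; }
    bool operator<(const Cycles &rhs) const { return c < rhs.c; }
    bool operator==(const Cycles &rhs) const { return c == rhs.c; }
};

// Hypothetical helper: convert a relative cycle count to ticks for a period.
inline Tick cyclesToTicks(Cycles cycles, Tick period)
{
    return cycles.value() * period;
}
```

With such a type, a function taking a Cycles argument cannot be called with a Tick by accident, which is the kind of unit confusion the entry is concerned with.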
9179:666bc9df1e49 |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Rework clocks to avoid tick-to-cycle transformations
This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future.
In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch.
Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well. |
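The clock update idea in the entry above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code from the patch: the class name, the member names, and the choice to pass the current tick as an argument are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t Tick;

// Hypothetical clocked object; names are illustrative only.
class ClockedSketch
{
  private:
    Tick period;     // clock period in ticks, must be non-zero
    Tick tick;       // tick of the cached clock edge
    uint64_t cycle;  // cycle count corresponding to `tick`

    // Advance the cached edge to the first clock edge at or after curTick.
    void update(Tick curTick)
    {
        if (tick >= curTick)
            return;
        // Common case: one period has elapsed, so no division is needed.
        tick += period;
        ++cycle;
        if (tick >= curTick)
            return;
        // Rare case: many periods elapsed, fall back to a single division.
        uint64_t elapsed = (curTick - tick + period - 1) / period;
        cycle += elapsed;
        tick += elapsed * period;
    }

  public:
    explicit ClockedSketch(Tick _period) : period(_period), tick(0), cycle(0)
    {
        assert(period > 0);
    }

    // Tick of the edge `cycles` cycles after the first edge at or after curTick.
    Tick clockEdge(Tick curTick, uint64_t cycles = 0)
    {
        update(curTick);
        return tick + cycles * period;
    }

    // Cycle count of the first edge at or after curTick.
    uint64_t curCycle(Tick curTick)
    {
        update(curTick);
        return cycle;
    }
};
```

The point of the design is that an object which is active every cycle only ever takes the cheap path in update(), and the division is paid only after a long idle period.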
9178:6a0ff1770e6e |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Stricter port bind/unbind semantics
This patch tightens up the semantics around port binding and checks that the ports that are being bound are currently not connected, and similarly connected before unbind is called.
The patch consequently also changes the order of the unbind and bind for the switching of CPUs to ensure that the rules are adhered to. Previously the ports would be "over-written" without any check.
There are no changes in behaviour due to this patch, and the only place where the unbind functionality is used is in the CPU. |
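A minimal sketch of the stricter bind/unbind rules described in the entry above, assuming a hypothetical PortSketch class. An exception stands in for whatever error mechanism the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical port type; real ports carry far more state.
class PortSketch
{
  private:
    PortSketch *peer;

  public:
    PortSketch() : peer(NULL) {}

    bool isConnected() const { return peer != NULL; }

    // Binding requires that neither side is currently connected.
    void bind(PortSketch &other)
    {
        if (isConnected() || other.isConnected())
            throw std::logic_error("bind on an already connected port");
        peer = &other;
        other.peer = this;
    }

    // Unbinding requires an existing connection.
    void unbind()
    {
        if (!isConnected())
            throw std::logic_error("unbind on an unconnected port");
        peer->peer = NULL;
        peer = NULL;
    }
};
```

Under these rules a CPU switch has to unbind the old CPU's port before binding the new one, which matches the reordering mentioned in the entry.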
9176:6807aa361e80 |
28-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Checker: Fix checker CPU ports
This patch updates how the checker CPU handles the ports such that the regressions will once again run without causing a panic.
A minor amount of tidying up was also done as part of this patch. |
9171:ae88ecf37145 |
27-Aug-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Remove RubyEventQueue
This patch removes RubyEventQueue. Consumer objects now rely on RubySystem or themselves for scheduling events. |
9165:f9e3dac185ba |
22-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Packet: Remove NACKs from packet and its use in endpoints
This patch removes the NACK from the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occurred in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up. |
9161:e353c178fb36 |
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Remove overloaded function_trace_start parameter
This patch removes the overloading of the parameter, which seems both redundant, and possibly incorrect.
The inorder CPU is particularly interesting as it uses a different name for the parameter, and never makes any use of it internally. |
9158:d152d34a4adf |
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Make Tick unsigned and remove UTick
This patch makes the Tick unsigned and removes the UTick typedef. The ticks should never be negative, and there was only one major issue with removing it, caused by the o3 CPU using a -1 as an initial value.
The patch has no impact on any regressions. |
9157:e0bad9d7bbd6 |
21-Aug-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Clock: Move the clock and related functions to ClockedObject
This patch moves the clock of the CPU, bus, and numerous devices to the new class ClockedObject, that sits in between the SimObject and MemObject in the class hierarchy. Although there are currently a fair number of MemObjects that do not make use of the clock, they potentially should do so, e.g. the caches should at some point have the same clock as the CPU, potentially with a 1:n ratio. This patch does not introduce any new clock objects or object hierarchies (clusters, clock domains etc), but is still a step in the direction of having a more structured approach to clock domains.
The most contentious part of this patch is the serialisation of clocks that some of the modules (but not all) did previously. This serialisation should not be needed as the clock is set through the parameters even when restoring from the checkpoint. In other words, the state is "stored" in the Python code that creates the modules.
The nextCycle methods are also simplified and the clock phase parameter of the CPU is removed (this could be part of a clock object once they are introduced). |
9152:86c0e6ca5e7c |
15-Aug-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs
This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements.
This is an incremental fix as there are still a few bugs/mem leaks with the switchout code. Particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation. |
9144:c773314f7098 |
06-Aug-2012 |
Steve Reinhardt <steve.reinhardt@amd.com> |
process: add progName() virtual function
This replaces a (potentially uninitialized) string field with a virtual function so that we can have a safe interface without requiring changes to the eio code. |
9132:c8d4b0595448 |
27-Jul-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
checker: make checker cpu id match its host's cpu id
when using the checker I ran into problems where an instruction reading the cpu id register failed because the ids did not match, and hence, the result of the instruction did not match. this patch ensures that the ids match so this instruction does not fail. this problem only seemed to manifest itself when multiple cores were in the system, either multi-core, or extra switched-out cores present in the system. |
9108:ad76a669e9d9 |
11-Jul-2012 |
Brad Beckmann <Brad.Beckmann@amd.com> |
ruby: remove the cpu assumptions for the random tester |
9101:d39368c6f502 |
11-Jul-2012 |
Brad Beckmann <Brad.Beckmann@amd.com> |
cpu: added assertions to ensure the correct proxies are used |
9095:0e6bd7082fac |
09-Jul-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Align port names in C++ and Python
This patch is a first step to align the port names used in the Python world and the C++ world. Ultimately it serves to make the use of config.json together with output from the simulation easier, including post-processing of statistics.
Most notably, the CPU, cache, and bus is addressed in this patch, and there might be other ports that should be updated accordingly. The dash name separator has also been replaced with a "." which is what is used to concatenate the names in python, and a separation is made between the master and slave port in the bus. |
9087:b5a084a6159b |
09-Jul-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Move retry from port base class to Master/SlavePort
This patch is the last part of moving all protocol-related functionality out of the Port base class. All the send/recv functions are already moved, and the retry (which still governs all the timing transport functions) is the only part that remained in the base class.
The only point where this currently causes a bit of inconvenience is in the bus where the retry list is global and holds Port pointers (not Master/SlavePort). This is about to change with the split into a request/response bus and will soon be removed anyway.
The patch has no impact on any regressions. |
9086:496304c8017d |
09-Jul-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Fix: Address a few benign memory leaks
This patch is the result of static analysis identifying a number of memory leaks. The leaks are all benign as they are a result of not deallocating memory in the destructor. The fix still has value as it removes false positives in the static analysis. |
9078:6222624550e7 |
29-Jun-2012 |
Nathanael Premillieu <npremill@irisa.fr> |
O3: Track if the RAS has been pushed or not to pop the RAS if necessary.
Add new flag (named pushedRAS) in the PredictorHistory structure. This flag tracks whether the RAS has been pushed or not during a prediction. Then, in the squash function it is used to pop the RAS if necessary. |
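The pushedRAS mechanism described in the entry above can be sketched like this. Only the names pushedRAS and PredictorHistory come from the entry; the RASSketch class and its methods are hypothetical and far simpler than a real return address stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint64_t Addr;

// Per-prediction history record, reduced to the one flag of interest.
struct PredictorHistory
{
    bool pushedRAS;  // set when this prediction pushed a return address
    PredictorHistory() : pushedRAS(false) {}
};

// Hypothetical return address stack.
class RASSketch
{
  private:
    std::vector<Addr> stack;

  public:
    // Predicting a call: push the return address and record that we did so.
    void predictCall(Addr returnAddr, PredictorHistory &hist)
    {
        stack.push_back(returnAddr);
        hist.pushedRAS = true;
    }

    // Squashing a prediction: pop only if that prediction pushed an entry.
    void squash(const PredictorHistory &hist)
    {
        if (hist.pushedRAS && !stack.empty())
            stack.pop_back();
    }

    std::size_t depth() const { return stack.size(); }
};
```

Squashing a prediction that never pushed leaves the stack untouched, so the stack stays consistent with the surviving instruction stream.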
9066:35ac3a6f8ee0 |
08-Jun-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Timing CPU: Remove a redundant port pointer
This patch is trivial and merely prunes a pointer that was never set or used. |
9058:cc47e11ccec1 |
05-Jun-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: Don't init simple and inorder CPUs if they are deferred.
initCPU() will be called to initialize switched out CPUs for the simple and inorder CPU models. this patch prevents those CPUs from being initialized because they should get their state from the active CPU when it is switched out. |
9057:f5ee56466b91 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
ISA: Back-out NoopMachInst as a StaticInstPtr change. |
9046:a1104cc13db2 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Clean up the O3 structures and try to pack them a bit better.
DynInst is extremely large; the hope is that this re-organization will put the most used members close to each other. |
9044:904ddeecc653 |
05-Jun-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
sim: Remove FastAlloc
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe. After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc when running twolf for ARM. |
9040:cdfe09f9bdee |
04-Jun-2012 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Turn the ExtMachInst NoopMachinst into the StaticInstPtr NoopStaticInst.
This eliminates a use of the ExtMachInst type outside of the ISAs. |
9036:6385cf85bf12 |
31-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Bus: Split the bus into a non-coherent and coherent bus
This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses.
A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses.
A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect.
The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses.
A bit of minor tidying up has also been done. |
9031:32ecc0217c5e |
30-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Packet: Unify the use of PortID in packet and port
This patch removes the Packet::NodeID typedef and unifies it with the Port::PortId. The src and dest fields in the packet are used to hold a port id (e.g. in the bus), and thus the two should actually be the same.
The typedef PortID is now global (in base/types.hh) and aligned with the ThreadID in terms of capitalisation and naming of the InvalidPortID constant.
Before this patch, two flags were used for valid destination and source, rather than relying on a named value (InvalidPortID), and this is now redundant, as the src and dest field themselves are sufficient to tell whether the current value is a valid port identifier or not. Consequently, the VALID_SRC and VALID_DST are removed.
As part of the cleaning up, a number of int parameters and local variables are updated to use PortID.
Note that Ruby still has its own NodeID typedef. Furthermore, the MemObject getMaster/SlavePort still has an int idx parameter with a default value of -1 which should eventually change to PortID idx = InvalidPortID. |
9024:5851586f399c |
26-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU: Generalize and split out the components of the decode cache.
This will allow it to be specialized by the ISAs. The existing caching scheme is provided by the BasicDecodeCache in the GenericISA namespace and is built from the generalized components. |
9023:e9201a7bce59 |
26-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Merge the predecoder and decoder.
These classes are always used together, and merging them will give the ISAs more flexibility in how they cache things and manage the process. |
9022:bb25e7646c41 |
25-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Make the decode function part of the ISA's decoder. |
9021:736048daf279 |
25-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Simplify the implementation of the decode cache.
Also reorganize it to make it more amenable to being rearranged later. |
9020:14321ce30881 |
25-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Decode: Make the Decoder class defined per ISA. |
8991:69fad6658160 |
10-May-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
gem5: fix some iterator use and erase bugs |
8990:5d80de4bbf96 |
10-May-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
gem5: fix a number of use after free issues |
8975:7f36d4436074 |
01-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses.
For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the appropriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not allow snoop requests to be rejected or stalled. All these assumptions are now explicitly part of the port interface itself. |
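To make the request/response split in the entry above concrete, here is a minimal sketch. The function names sendTimingReq, recvTimingReq, sendTimingResp and recvTimingResp come from the entry. The classes, their fields, and the respond() helper are assumptions for this example, and snoops and retries are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical packet with just enough state to tell requests from responses.
struct PacketSketch
{
    bool response;
    PacketSketch() : response(false) {}
    bool isRequest() const { return !response; }
    bool isResponse() const { return response; }
    void makeResponse() { response = true; }
};

class SlaveSketch;

// A master sends requests and receives responses.
class MasterSketch
{
  public:
    SlaveSketch *peer;
    int responsesSeen;

    MasterSketch() : peer(NULL), responsesSeen(0) {}

    bool sendTimingReq(PacketSketch *pkt);

    bool recvTimingResp(PacketSketch *pkt)
    {
        (void)pkt;
        ++responsesSeen;
        return true;
    }
};

// A slave receives requests and sends responses.
class SlaveSketch
{
  public:
    MasterSketch *peer;
    PacketSketch *pending;

    SlaveSketch() : peer(NULL), pending(NULL) {}

    bool recvTimingReq(PacketSketch *pkt)
    {
        pending = pkt;
        return true;
    }

    bool sendTimingResp(PacketSketch *pkt)
    {
        // The packet-type check lives in the send function.
        assert(pkt->isResponse());
        return peer->recvTimingResp(pkt);
    }

    // Hypothetical helper: answer the pending request at a later point.
    void respond()
    {
        PacketSketch *pkt = pending;
        pending = NULL;
        pkt->makeResponse();
        sendTimingResp(pkt);
    }
};

inline bool MasterSketch::sendTimingReq(PacketSketch *pkt)
{
    // The packet-type check lives in the send function.
    assert(pkt->isRequest());
    return peer->recvTimingReq(pkt);
}
```

Because each direction has its own function, neither side needs to branch on pkt->isRequest() to decide what kind of access it is handling.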
8965:1ebd7c856abc |
25-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add the PortId type and a corresponding id field to Port
This patch introduces the PortId type, moves the definition of INVALID_PORT_ID to the Port class, and also gives every port an id to reflect the fact that each element in a vector port has an identifier/index.
Previously the bus and Ruby testers (and potentially other users of the vector ports) added the id field in their port subclasses, and now this functionality is always present as it is moved to the base class. |
8955:bbceb6297329 |
15-Apr-2012 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Tidy up some formatting and a DPRINTF in the simple CPU base class.
Put the { on the same line as the if and put a space between the if and the open paren. Also, use the # format modifier which puts a 0x in front of hex values automatically. If the ExtMachInst type isn't integral and actually prints something more complicated, the # falls away harmlessly and we aren't left with a phantom 0x followed by a bunch of unrelated text. |
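The effect of the # format modifier mentioned in the entry above can be seen with the standard C library. This sketch uses std::snprintf rather than the simulator's own printing functions, so it only illustrates the integral case. The formatHex helper is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format an unsigned value as hex using the '#' flag.
std::string formatHex(unsigned value)
{
    char buf[32];
    // "%#x" prepends "0x" to non-zero values; zero is printed as plain "0".
    std::snprintf(buf, sizeof(buf), "%#x", value);
    return std::string(buf);
}
```

For example, formatHex(255) yields "0xff", without the format string having to spell out the prefix.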
8950:a6830d615eff |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Ruby: Use MasterPort base-class pointers where possible
This patch simplifies future patches by changing the pointer type used in a number of the Ruby testers to use MasterPort instead of using a derived CpuPort class. There is no reason for using the more specialised pointers, and there is no longer a need to do any casting.
With the latest changes to the tester, organising ports as readers and writers, things got a bit more complicated, and the "type" now had to be removed to be able to fall back to using MasterPort rather than CpuPort. |
8949:3fa1ee293096 |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Remove the Broadcast destination from the packet
This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple masters (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field.
Modules that connect multiplexing components, such as caches and bridges store any existing source and destination field in the sender state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class.
In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing. |
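The source/destination handling described in the entry above can be sketched as follows. All names here (RoutedPacket, NoPort, busForwardRequest, busRouteResponse) are hypothetical, and the sender-state stack used by caches and bridges is not modelled.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "no port recorded".
const int NoPort = -1;

struct RoutedPacket
{
    int src;
    int dest;
    bool response;

    // A freshly created request has neither a valid source nor destination.
    RoutedPacket() : src(NoPort), dest(NoPort), response(false) {}

    // At the final slave: the recorded source becomes the destination.
    void makeResponse()
    {
        response = true;
        dest = src;
        src = NoPort;
    }
};

// A multiplexing component (e.g. a bus) stamps requests with the ingress port.
inline void busForwardRequest(RoutedPacket &pkt, int ingressPort)
{
    assert(!pkt.response);
    pkt.src = ingressPort;
}

// Responses are routed back on the port named by the destination field.
inline int busRouteResponse(const RoutedPacket &pkt)
{
    assert(pkt.response && pkt.dest != NoPort);
    return pkt.dest;
}
```

Requests are routed by address alone, so no broadcast destination is needed; only the response path relies on the destination field.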
8948:e95ee70f876c |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separate snoop request/responses from normal memory request/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all use the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal requests and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with requests being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses.
The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions. |
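The split between pure virtual normal accessors and panicking snoop defaults, as described in the entry above, can be sketched like this. The class names are hypothetical, only the atomic variant is shown, and an exception stands in for panic.

```cpp
#include <cassert>
#include <stdexcept>

struct Pkt {};

// Normal accesses must be implemented by every slave.
class SlavePortSketch
{
  public:
    virtual ~SlavePortSketch() {}
    virtual int recvAtomic(Pkt &pkt) = 0;
};

// Snoops have a default that fails, so non-coherent masters can ignore them.
class MasterPortSketch
{
  public:
    virtual ~MasterPortSketch() {}
    virtual int recvAtomicSnoop(Pkt &)
    {
        // Stand-in for panic in this sketch.
        throw std::logic_error("snoop received by a non-coherent master");
    }
};

// A non-coherent master inherits the failing default.
class NonCoherentMaster : public MasterPortSketch {};

// A coherent master overrides the snoop handler.
class CoherentMaster : public MasterPortSketch
{
  public:
    virtual int recvAtomicSnoop(Pkt &) { return 0; }
};
```

A non-coherent port therefore needs no snoop boilerplate, while a snoop that is misrouted to it fails immediately rather than being silently dropped.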
8946:fb6c89334b86 |
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
clang/gcc: Fix compilation issues with clang 3.0 and gcc 4.6
This patch addresses a number of minor issues that cause problems when compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it avoids using the deprecated ext/hash_map and instead uses unordered_map (and similarly so for the hash_set). To make use of the new STL containers, g++ and clang have to be invoked with "-std=c++0x", and this is now added for all gcc versions >= 4.6, and for clang >= 3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1 unordered_map to avoid the deprecation warning.
The addition of c++0x in turn causes a few problems, as the compiler is more stringent and adds a number of new warnings. Below, the most important issues are enumerated:
1) the use of namespaces is more strict, e.g. for isnan, and all headers opening the entire namespace std are now fixed.
2) another issue caused by the more stringent compiler is the narrowing of the embedded python, which used to be a char array, and is now unsigned char since there were values larger than 128.
3) a particularly odd issue that arose with the new c++0x behaviour is found in range.hh, where the operator< causes gcc to complain about the template type parsing (the "<" is interpreted as the beginning of a template argument), and the problem seems to be related to the begin/end members introduced for the range-type iteration, which is a new feature in c++11.
As a minor update, this patch also fixes the build flags for the clang debug target that used to be shared with gcc and incorrectly use "-ggdb". |
8941:a47fd7c2d44e |
06-Apr-2012 |
Brad Beckmann <Brad.Beckmann@amd.com> |
rubytest: remove spurious printf |
8932:1b2c17565ac8 |
06-Apr-2012 |
Brad Beckmann <Brad.Beckmann@amd.com> |
rubytest: separated read and write ports.
This patch allows the ruby tester to support protocols where the i-cache and d-cache are managed by separate controllers. |
8931:7a1dfb191e3f |
06-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Enable multiple distributed generalized memories
This patch removes the assumption of having a single instance of PhysicalMemory, and enables a distributed memory where the individual memories in the system are each responsible for a single contiguous address range.
All memories inherit from an AbstractMemory that encompasses the basic behaviour of a random access memory, and provides untimed access methods. What was previously called PhysicalMemory is now SimpleMemory, and a subclass of AbstractMemory. All future types of memory controllers should inherit from AbstractMemory.
To enable e.g. the atomic CPU and RubyPort to access the now distributed memory, the system has a wrapper class, called PhysicalMemory that is aware of all the memories in the system and their associated address ranges. This class thus acts as an infinitely-fast bus and performs address decoding for these "shortcut" accesses. Each memory can specify that it should not be part of the global address map (used e.g. by the functional memories by some testers). Moreover, each memory can be configured to be reported to the OS configuration table, useful for populating ATAG structures, and any potential ACPI tables.
Checkpointing support currently assumes that all memories have the same size and organisation when creating and resuming from the checkpoint. A future patch will enable a more flexible re-organisation. |
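The address decoding that the PhysicalMemory wrapper performs for "shortcut" accesses can be pictured with the minimal sketch below. It assumes each memory owns one contiguous, non-overlapping range. The names AddrRangeSketch and MemoryMapSketch are hypothetical and are not gem5's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for one contiguous address range (end is inclusive).
struct AddrRangeSketch {
    uint64_t start;
    uint64_t end;
};

// Hypothetical wrapper that knows every memory and the range it serves.
class MemoryMapSketch {
  public:
    // Register a memory; reject inverted or overlapping ranges.
    bool addMemory(const std::string &name, AddrRangeSketch r) {
        if (r.start > r.end)
            return false;
        auto next = ranges.upper_bound(r.start);
        // The following range must begin after the new one ends.
        if (next != ranges.end() && next->first <= r.end)
            return false;
        // The preceding range must end before the new one begins.
        if (next != ranges.begin() &&
            std::prev(next)->second.first >= r.start)
            return false;
        ranges[r.start] = std::make_pair(r.end, name);
        return true;
    }

    // Return the name of the memory owning addr, or "" if unmapped.
    std::string decode(uint64_t addr) const {
        auto it = ranges.upper_bound(addr);
        if (it == ranges.begin())
            return std::string();
        --it;  // last range whose start is <= addr
        return addr <= it->second.first ? it->second.second : std::string();
    }

  private:
    // Keyed by range start; value is (inclusive end, memory name).
    std::map<uint64_t, std::pair<uint64_t, std::string>> ranges;
};
```

Keying an ordered map by range start keeps the lookup logarithmic in the number of memories. This is a design choice of the sketch, not something the commit message specifies.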
8930:f51b4b4f0d5e |
05-Apr-2012 |
Tushar Krishna <tushar@csail.mit.edu> |
NetworkTest: remove unnecessary memory allocation |
8926:570b44fe6e04 |
03-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Atomic: Remove the physmem_port and access memory directly
This patch removes the physmem_port from the Atomic CPU and instead uses the system pointer to access the physmem when using the fastmem option. The system already keeps track of the physmem and the valid memory address ranges, and with this patch we merely make use of that existing functionality. As a result of this change, the overloaded getMasterPort in the Atomic CPU can be removed, thus unifying the CPUs. |
8922:17f037ad8918 |
30-Mar-2012 |
William Wang <william.wang@arm.com> |
MEM: Introduce the master/slave port sub-classes in C++
This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add assumptions and enforce compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specialisations are to come in later patches.
The getPort function is now getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again. |
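A rough sketch of the split described in this entry follows: snooping queries live on masters only, address-range queries on slaves only, and the getters return references. Every class name carries a "Sketch" suffix to mark it as hypothetical, and an exception stands in for gem5's fatal call. The real interfaces are considerably richer.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using RangeSketch = std::pair<uint64_t, uint64_t>;  // (start, inclusive end)

class PortSketch {
  public:
    explicit PortSketch(std::string n) : portName(std::move(n)) {}
    virtual ~PortSketch() = default;
    const std::string &name() const { return portName; }
  private:
    std::string portName;
};

// Only masters answer the snooping query.
class MasterPortSketch : public PortSketch {
  public:
    using PortSketch::PortSketch;
    virtual bool isSnooping() const { return false; }
};

// Only slaves report the address ranges they respond to.
class SlavePortSketch : public PortSketch {
  public:
    using PortSketch::PortSketch;
    virtual std::vector<RangeSketch> getAddrRanges() const = 0;
};

class MemObjectSketch {
  public:
    virtual ~MemObjectSketch() = default;
    // Defaults fail loudly; an exception stands in for fatal() here.
    virtual MasterPortSketch &getMasterPort(const std::string &if_name) {
        throw std::runtime_error("no master port named " + if_name);
    }
    virtual SlavePortSketch &getSlavePort(const std::string &if_name) {
        throw std::runtime_error("no slave port named " + if_name);
    }
};

// Hypothetical memory exposing a single slave port called "port".
class SimpleMemSketch : public MemObjectSketch {
  public:
    SimpleMemSketch(uint64_t start, uint64_t end) : port("port", start, end) {}
    SlavePortSketch &getSlavePort(const std::string &if_name) override {
        if (if_name != "port")
            return MemObjectSketch::getSlavePort(if_name);
        return port;
    }
  private:
    class MemPort : public SlavePortSketch {
      public:
        MemPort(const std::string &n, uint64_t s, uint64_t e)
            : SlavePortSketch(n), range(s, e) {}
        std::vector<RangeSketch> getAddrRanges() const override {
            return std::vector<RangeSketch>(1, range);
        }
      private:
        RangeSketch range;
    };
    MemPort port;  // a member, not a pointer: NULL is never valid
};
```

Because the getters return references, a caller cannot receive a null port. A request for a port that does not exist ends in the failing default instead.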
8921:e53972f72165 |
30-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Unify initMemProxies across CPUs and simulation modes
This patch unifies where initMemProxies is called, in the init() method of each BaseCPU subclass, before TheISA::initCPU is called. Moreover, it also ensures that initMemProxies is called in both full-system and syscall-emulation mode, thus unifying also across the modes. An additional check is added in the ThreadState to ensure that initMemProxies is only called once. |
8913:8b223e308b08 |
22-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Scons: Remove Werror=False in SConscript files
This patch removes the overriding of "-Werror" in a handful of cases. The code compiles with gcc 4.6.3 and clang 3.0 without any warnings, and thus without any errors. There are no functional changes introduced by this patch. In the future, rather than bypassing "-Werror", address the warnings. |
8907:26256a3e8fa4 |
21-Mar-2012 |
Andrew Lukefahr <lukefahr@umich.edu> |
O3: Fix sizing of decode to rename skid buffer. |
8905:f6faef9f888d |
21-Mar-2012 |
Brian Grayson <b.grayson@samsung.com> |
O3: Fix size of skid buffer between fetch and decode when widths are different |
8902:75b524b64c28 |
19-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
gcc: Clean-up of non-C++0x compliant code, first steps
This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeeded by a comma, namespaces closed by a curly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts). |
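To illustrate the typeof replacement mentioned in this entry, here is a minimal sketch of swapping a GNU-extension macro for a function template. The macro in the comment and the name maxOfSketch are invented for illustration; the commit message does not say which helpers were actually rewritten.

```cpp
#include <cassert>

// A GNU-only form that -pedantic-errors rejects might have looked like:
//   #define MAX_OF(a, b) \
//       ({ typeof(a) a_ = (a); typeof(b) b_ = (b); a_ > b_ ? a_ : b_; })

// Standard replacement: the compiler deduces T, so typeof is not needed.
template <typename T>
inline T maxOfSketch(const T &a, const T &b)
{
    return a > b ? a : b;
}
```

A call such as maxOfSketch(3, 7) deduces T as int. Unlike the macro, the template requires both arguments to have the same type.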
8901:bba76d164f9e |
19-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
clang: Fix recently introduced clang compilation errors
This patch makes the code compile with clang 2.9 and 3.0 again by making two very minor changes. First, it maintains a strict typing in the forward declaration of the BaseCPUParams. Second, it adds a FullSystemInt flag of the type unsigned int next to the boolean FullSystem flag. The FullSystemInt variable can be used in decode-statements (expands to switch statements) in the instruction decoder. |
8895:ad5f1f128faf |
11-Mar-2012 |
Brian Grayson <b.grayson@samsung.com> |
O3: Add fatal when fetchWidth > Impl::MaxWidth. |
8890:9cf2327b7f5d |
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3/Ozone: Eliminate dead code counting software prefetch insts
Eliminates dead code in the O3 and Ozone CPU models that counted software prefetch instructions separately for the ALPHA ISA only. |
8888:befcf4d79fc1 |
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Add function stubs to non-ARM ISA source to compile with CheckerCPU
Making the CheckerCPU a runtime option requires the code to be compatible with ISAs other than ARM. This patch adds the appropriate function stubs to allow compilation. |
8887:20ea02da9c53 |
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable
Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes. |
8877:82ab797f8384 |
02-Mar-2012 |
Steve Reinhardt <steve.reinhardt@amd.com> |
DynInst: get rid of dead MyHash code.
Not sure what this was ever used for, but it doesn't seem used anymore. |
8876:44f8e7bb7fdf |
02-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Check that the interrupt controller is created when needed
This patch adds a creation-time check to the CPU to ensure that the interrupt controller is created for the cases where it is needed, i.e. if the CPU is not being switched in later and not a checker CPU.
The patch also adds the "createInterruptController" call to a number of the regression scripts. |
8863:50ce4deacda9 |
01-Mar-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Fix switching of CPUs This patch prevents creation of interrupt controller for cpus that will be switched in later |
8854:04d1736a5098 |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Ruby: Simplify tester ports by not using SimpleTimingPort
This patch simplifies the master ports used by RubyDirectedTester and RubyTester by avoiding the use of SimpleTimingPort. Neither tester made any use of the functionality offered by SimpleTimingPort besides a trivial implementation of recvFunctional (only snoops) and recvRangeChange (not relevant since there is only one master).
The patch does not change or add any functionality, it merely makes the introduction of a master/slave port easier (in a future patch). |
8853:0216ed80991b |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Move all read/write blob functions from Port to PortProxy
This patch moves the readBlob/writeBlob/memsetBlob from the Port class to the PortProxy class, thus making a clear separation of the basic port functionality (recv/send functional/atomic/timing), and the higher-level functional accessors available on the port proxies.
There are only a few places in the code base where the blob functions were used on ports, and they are all for peeking into the memory system without making a normal memory access (in the memtest, and the malta and tsunami pchip). The memtest also exemplifies how easy it is to create a non-translating proxy if desired. The malta and tsunami pchip used a slave port to perform a functional read, and this is now changed to rely on the physProxy of the system (to which they already have a pointer). |
8852:c744483edfcf |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Make port proxies use references rather than pointers
This patch adds a clearer design intent to all objects that would not be complete without a port proxy by making the proxies members rather than dynamically allocated. In essence, if NULL would not be a valid value for the proxy, then we avoid using a pointer to make this clear.
The same approach is used for the methods using these proxies, such as loadSections, that now use references rather than pointers to better reflect the fact that NULL would not be an acceptable value (in fact the code would break and that is how this patch started out).
Overall the concept of "using a reference to express unconditional composition where a NULL pointer is never valid" could be done on a much broader scale throughout the code base, but for now it is only done in the locations affected by the proxies. |
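The idea of expressing unconditional composition with members and references can be sketched as follows. PortProxySketch, SystemSketch and loadSectionSketch are hypothetical simplifications that write into a plain byte vector, not into the simulated memory system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical proxy offering a functional write into some backing store.
class PortProxySketch {
  public:
    explicit PortProxySketch(std::vector<uint8_t> &backing) : mem(backing) {}

    // Returns false if the blob does not fit in the backing store.
    bool writeBlob(std::size_t addr, const uint8_t *data, std::size_t len) {
        if (addr > mem.size() || len > mem.size() - addr)
            return false;
        for (std::size_t i = 0; i < len; ++i)
            mem[addr + i] = data[i];
        return true;
    }
  private:
    std::vector<uint8_t> &mem;
};

// Takes the proxy by reference: a missing proxy cannot be expressed.
inline bool loadSectionSketch(PortProxySketch &proxy, std::size_t addr,
                              const std::vector<uint8_t> &bytes)
{
    return proxy.writeBlob(addr, bytes.data(), bytes.size());
}

// The proxy is a member, so the system is never without one.
class SystemSketch {
  public:
    explicit SystemSketch(std::size_t size)
        : memory(size, 0), physProxy(memory) {}
    PortProxySketch &getPhysProxy() { return physProxy; }
    uint8_t peek(std::size_t addr) const { return memory.at(addr); }
  private:
    std::vector<uint8_t> memory;   // declared first, so constructed first
    PortProxySketch physProxy;
};
```

With this shape there is no code path on which the loader has to test the proxy for NULL, which is the design intent the entry describes.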
8851:7e966326ef5b |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Move port creation to the memory object(s) construction
This patch moves all port creation from the getPort method to be consistently done in the MemObject's constructor. This is possible thanks to the Swig interface passing the length of the vector ports. Previously there was a mix of: 1) creating the ports as members (at object construction time) and using getPort for the name resolution, or 2) dynamically creating the ports in the getPort call. This is now uniform. Furthermore, objects that would not be complete without a port have these ports as members rather than having pointers to dynamically allocated ports.
This patch also enables an elaboration-time enumeration of all the ports in the system which can be used to determine the masterId. |
8850:ed91b534ed04 |
24-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Round-two unifying instr/data CPU ports across models
This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps in simplifying the code, making assumptions more explicit, and further easing future patches related to the CPU ports.
The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports. |
8843:7d3ac6813147 |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BPred: Fix RAS to handle predicated call/return instructions.
Change RAS to fix issues with predicated call/return instructions. Handled all cases in the life of a predicated call and return instruction. |
8842:a02932e2e73d |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BP: Fix several Branch Predictor issues. 1. Updates the Branch Predictor correctly to the state just after a mispredicted branch, if a squash occurs. 2. If a BTB does not find an entry, the branch is predicted not taken. The global history is modified to correctly reflect this prediction. 3. Local history is now updated at the fetch stage instead of execute stage. 4. In the Update stage of the branch predictor the local predictors are now correctly updated according to the state of local history during fetch stage.
This patch also improves performance by as much as 17% on some benchmarks |
8839:eeb293859255 |
13-Feb-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Introduce the master/slave port roles in the Python classes
This patch classifies all ports in Python as either Master or Slave and enforces a binding of master to slave. Conceptually, a master (such as a CPU or DMA port) issues requests, and receives responses, and conversely, a slave (such as a memory or a PIO device) receives requests and sends back responses. Currently there is no differentiation between coherent and non-coherent masters and slaves.
The classification as master/slave also involves splitting the dual role port of the bus into a master and slave port and updating all the system assembly scripts to use the appropriate port. Similarly, the interrupt devices have to have their int_port split into a master and slave port. The intdev and its children have minimal changes to facilitate the extra port.
Note that this patch does not enforce any port typing in the C++ world, it merely ensures that the Python objects have a notion of the port roles and are connected in an appropriate manner. This check is carried out when two ports are connected, e.g. bus.master = memory.port. The following patches will make use of the classifications and specialise the C++ ports into masters and slaves. |
8834:21e8d54ecf07 |
12-Feb-2012 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: add separate stats for insts/ops both globally and per cpu model |
8832:247fee427324 |
12-Feb-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
mem: Add a master ID to each request object.
This change adds a master id to each request object which can be used to identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python. |
8824:a42647b4a6b6 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Improve handling of delayed commit flag The delayed commit flag is used in conjunction with interrupt pending flag to figure out whether or not fetch stage should get more instructions. This patch clears this flag when instructions are squashed. Also, in case an interrupt is pending, currently it is not possible to access the instruction cache. This patch allows accessing the cache in case this flag is set. |
8823:ae411fcf4935 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Strengthen condition for handling interrupts The condition for handling interrupts is to check whether or not the cpu's instruction list is empty. As observed, this can lead to cases in which even though the instruction list is empty, interrupts are handled when they should not be. The condition is being strengthened so that interrupts get handled only when the last committed microop did not have IsDelayedCommit set. |
8822:e7ae13867098 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU: Provide the squashing instruction This patch adds a function to the ROB that will get the squashing instruction from the ROB's list of instructions. This squashing instruction is used for figuring out the macroop from which the fetch stage should fetch the microops. Further, a check has been added that if the instructions are to be fetched from the cache maintained by the fetch stage, then the data in the cache should be valid and the PC of the thread being fetched from is same as the address of the cache block. |
8821:bba1a976c293 |
10-Feb-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 Fetch: Check if PC is pointing to Microcode ROM |
8820:f39690f70bab |
10-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Record the system pointer all the time for the simple CPU.
This pointer was only being stored in code that came from SE mode. The system pointer is always meaningful and available, so it should always be stored. |
8818:8f354c5a1634 |
07-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Checker: Access workload element 0 only if there is an element 0. |
8817:c36441eed919 |
07-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Turn off arch/faults.hh
Because there are no longer architecture independent but specialized functions in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA no longer needs to be able to include them through the switching header file arch/faults.hh. By removing that header file (arch/faults.hh), the potential interface between ISA code and non ISA code is narrowed. |
8809:bb10807da889 |
01-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head, hopefully the last time for this batch. |
8808:8af87554ad7e |
31-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with main repository. |
8807:35e77c938919 |
29-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Yet another merge with the main repository. |
8806:669e93d79ed9 |
29-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Implement Ali's review feedback.
Try to decrease indentation, and remove some redundant FullSystem checks. |
8799:dac1e33e07b0 |
28-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repo. |
8798:adaa92be9037 |
16-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge yet again with the main repository. |
8797:3202eb01e01e |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Another merge with the main repository. |
8796:a2ae5c378d0a |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repository again. |
8795:0909f8ed7aa0 |
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with main repository. |
8794:e2ac2b7164dd |
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of includes of config/full_system.hh. |
8793:5f25086326ac |
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in the CPU directory. |
8784:05fb20d7064b |
02-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in sim. |
8780:89e0822462a1 |
01-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of uses of FULL_SYSTEM in Alpha. |
8779:2a590c51adb1 |
01-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Expose the same methods on the CPUs in SE and FS modes. |
8777:dd43f1c9fa0a |
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make the functions available from the TC consistent between SE and FS. |
8767:e575781f71b8 |
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs. |
8766:b0773af78423 |
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build the base process class in FS. |
8764:e4660687c49f |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Include getMemPort in FS. |
8761:20322354b80b |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build/expose vport in SE mode. |
8756:cce8cf3906ca |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
ARM: Turn on the page table walker on ARM in SE mode. |
8754:0996451df6de |
16-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make physPort and getPhysPort available in SE mode. |
8752:28e899b7dee3 |
13-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Turn on the page table walker in SE mode. |
8745:575cab0db076 |
09-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build the Interrupt objects in SE mode. |
8739:925f15f96322 |
30-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Build the devices in SE mode. |
8737:770ccf3af571 |
31-Jan-2012 |
Koan-Sin Tan <koansin.tan@gmail.com> |
clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. |
8735:dd20a8139788 |
31-Jan-2012 |
Andreas Hansson <andreas.hanson@arm.com> |
Thread: Use inherited baseCpu rather than cpu in SimpleThread
This patch is a trivial simplification, removing the cpu pointer from SimpleThread and relying on the baseCpu pointer in ThreadState. The patch does not add or change any functionality, it merely cleans up the code. |
8733:64a7bf8fa56c |
31-Jan-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5
Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification. |
8730:0a742249f76b |
30-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Clean-up of Functional/Virtual/TranslatingPort remnants
This patch cleans up forward declarations and a member-function prototype that still referred to the old FunctionalPort, VirtualPort and TranslatingPort. There is no change in functionality. |
8727:b3995530319f |
28-Jan-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
O3 CPU LSQ: Implement TSO This patch makes O3's LSQ maintain total order between stores. Essentially only the store at the head of the store buffer is allowed to be in flight. Only after that store completes, the next store is issued to the memory system. By default, the x86 architecture will have TSO. |
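The store ordering rule in this entry, where only the store at the head of the buffer may be in flight, can be sketched as below. StoreSketch and TsoStoreBufferSketch are hypothetical names, and the sketch ignores everything else a real LSQ does (forwarding, squashing, retries).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

struct StoreSketch {
    uint64_t addr;
    uint64_t data;
};

// Hypothetical store buffer that keeps stores in total order.
class TsoStoreBufferSketch {
  public:
    void push(const StoreSketch &s) { stores.push_back(s); }

    // Issue the head store, but only if no store is already in flight.
    bool tryIssue(StoreSketch &out) {
        if (inFlight || stores.empty())
            return false;
        out = stores.front();
        inFlight = true;
        return true;
    }

    // Called when the memory system acknowledges the in-flight store.
    void complete() {
        if (!inFlight)
            return;
        stores.pop_front();
        inFlight = false;
    }

    std::size_t size() const { return stores.size(); }

  private:
    std::deque<StoreSketch> stores;
    bool inFlight = false;
};
```

The second store cannot be issued until complete() has been called for the first, which is what serialises stores to the memory system in this sketch.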
8711:c7e14f52c682 |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate queries for snooping and address ranges
This patch simplifies the address-range determination mechanism and also unifies the naming across ports and devices. It further splits the queries for determining if a port is snooping and what address ranges it responds to (aiming towards a separation of cache-maintenance ports and pure memory-mapped ports). Default behaviours are such that most ports do not have to define isSnooping, and master ports need not implement getAddrRanges. |
8708:7ccbdea0fa12 |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Simplify ports by removing EventManager
This patch removes the inheritance of EventManager from the ports and moves all responsibility for event queues to the owner. Eventually the event manager should be the interface block, which could either be the structural owner or a subblock like a LSQ in the O3 CPU for example. |
8707:489489c67fd9 |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data ports from specialised subclasses to the base CPU (to the largest degree possible). Ultimately it serves to make the CPU(s) have a well-defined interface to the memory sub-system. |
8706:b1838faf3bcc |
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy.
The following replacements are made: FunctionalPort > PortProxy; TranslatingPort > SETranslatingPortProxy; VirtualPort > FSTranslatingPortProxy. |
8698:f348cf78072c |
12-Jan-2012 |
Maximilien Breughe <maximilien.breughe@elis.ugent.be> |
inorder: MDU deadlock fix |
8674:a9476951e3a2 |
10-Jan-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
DPRINTF: Improve some dprintf messages. |
8670:aae12ce9f34c |
09-Jan-2012 |
Anders Handler <s052838@student.dtu.dk> |
CPU: Remove Alpha-specific PC alignment check. |
8665:e75d9251f7e6 |
09-Jan-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Remove some asserts that no longer seem to be valid. |
8662:d4548b381e87 |
09-Jan-2012 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Add support of function tracing with O3 CPU. |
8655:e4001326a5ba |
09-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MAC: Make gem5 compile and run on MacOSX 10.7.2
Adaptations to make gem5 compile and run on OSX 10.7.2, with a stock gcc 4.2.1 and the remaining dependencies from macports, i.e. python 2.7.2, swig 2.0.4, mercurial 2.0. The changes include an adaptation of the SConstruct to handle non-library linker flags, and Darwin-specific code to find the memory usage of gem5. A number of Ruby files relied on ambiguous uint (without the 32 suffix) which caused compilation errors. |
8641:4d3ecac1abec |
13-Dec-2011 |
Nathan Binkert <nate@binkert.org> |
gcc: fix unused variable warnings from GCC 4.6.1 |
8634:8390f2d80227 |
01-Dec-2011 |
Chris Emmons <chris.emmons@arm.com> |
Output: Add hierarchical output support and cleanup existing codebase. |
8631:8c038d4cd210 |
01-Dec-2011 |
Chander Sudanthi <chander.sudanthi@arm.com> |
O3: Remove hardcoded tgts_per_mshr in O3CPU.py.
There are two lines in O3CPU.py that set the dcache and icache tgts_per_mshr to 20, ignoring any pre-configured value of tgts_per_mshr. This patch removes these hardcoded lines from O3CPU.py and sets the default L1 cache mshr targets to 20. |
8629:e3cb8e20a9b4 |
01-Dec-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Add support for having a TLB cache. |
8627:86358c187837 |
01-Dec-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Add stat that counts how many cycles the O3 cpu was quiesced. |
8608:02d7ac5fb855 |
03-Nov-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Remove some unused typedefs This patch removes some of the unused typedefs. It also moves some of the typedefs from Global.hh to TypeDefines.hh. The patch also eliminates the file NodeID.hh. |
8607:5fb918115c07 |
31-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
GCC: Get everything working with gcc 4.6.1.
And by "everything" I mean all the quick regressions. |
8592:30a97c4198df |
27-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Tidy up some DPRINTFs in the LSQ. |
8591:8f23aeaf6a91 |
27-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Replace calls to genMachineCheckFault with M5PanicFault. |
8587:acce52081b45 |
26-Sep-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
LSQ: Moved a couple of lines to enable O3 + Ruby This patch makes O3 CPU work along with the Ruby memory model. Ruby overwrites the senderState pointer with another pointer. The pointer is restored only when Ruby gets done with the packet. LSQ makes use of senderState just after sendTiming() returns. But the dynamic_cast returns a NULL pointer since Ruby's senderState pointer is from a different class. Storing the senderState pointer before calling sendTiming() does away with the problem. |
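The ordering problem fixed in this entry can be reproduced in miniature. In the sketch below, a stand-in send function overwrites the packet's senderState, as the entry says Ruby does, so the caller captures its own state before sending. All type and function names are hypothetical.

```cpp
#include <cassert>

struct SenderStateSketch {
    virtual ~SenderStateSketch() = default;
};

// State the LSQ attaches to a packet.
struct LsqStateSketch : SenderStateSketch {
    explicit LsqStateSketch(int i) : idx(i) {}
    int idx;
    bool issued = false;
};

// State a different component attaches while it owns the packet.
struct RubyStateSketch : SenderStateSketch {};

struct PacketSketch {
    SenderStateSketch *senderState = nullptr;
};

// Stand-in for the send call: the receiver replaces senderState and
// would only restore it once it is done with the packet.
inline bool sendTimingSketch(PacketSketch &pkt, RubyStateSketch &rubyState)
{
    pkt.senderState = &rubyState;
    return true;
}

inline LsqStateSketch *issueStoreSketch(PacketSketch &pkt,
                                        RubyStateSketch &rubyState)
{
    // Capture our own state BEFORE sending; afterwards the cast would
    // see the receiver's state and yield a null pointer.
    LsqStateSketch *state = dynamic_cast<LsqStateSketch *>(pkt.senderState);
    if (sendTimingSketch(pkt, rubyState) && state)
        state->issued = true;
    return state;
}
```

After the call, casting pkt.senderState back to LsqStateSketch yields a null pointer, which is the failure mode the entry describes for code that reads the pointer after sending.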
8581:56f97760eadd |
22-Sep-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
event: minor cleanup Initialize flags via the Event constructor instead of calling setFlags() in the body of the derived class's constructor. I forget exactly why, but this made life easier when implementing multi-queue support.
Also rename Event::getFlags() to isFlagSet() to better match common usage, and get rid of some unused Event methods. |
8557:f44572edfba3 |
19-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Syscall: Make the syscall function available in both SE and FS modes.
In FS mode the syscall function will panic, but the interface will be consistent and code which calls syscall can be compiled in. This will allow, for instance, instructions that use syscall to be built unconditionally but then not returned by the decoder. |
8545:a3992291e230 |
13-Sep-2011 |
Ali Saidi <saidi@eecs.umich.edu> |
LSQ: Only trigger a memory violation with a load/load if the value changes.
Only create a memory ordering violation when the value could have changed between two subsequent loads, instead of just when loads go out-of-order to the same address. While not very common in the case of Alpha, with an architecture with a hardware table walker this can happen reasonably frequently because a translation will miss and start a table walk and before the CPU re-schedules the faulting instruction another one will pass it to the same address (or cache block depending on the dependency checking).
This patch has been tested with a couple of self-checking hand crafted programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%). |
8542:7230ff0738e3 |
09-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
StaticInst: Merge StaticInst and StaticInstBase.
Having two StaticInst classes, one nominally ISA independent and the other ISA dependent, has not been historically useful and makes the StaticInst class more complicated than it needs to be. This change merges StaticInstBase into StaticInst. |
8541:27aaee8ec7cc |
09-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Decode: Pull instruction decoding out of the StaticInst class into its own.
This change pulls the instruction decoding machinery (including caches) out of the StaticInst class and puts it into its own class. This has a few intrinsic benefits. First, the StaticInst code, which has gotten to be quite large, gets simpler. Second, the code that handles decode caching is now separated out into its own component and can be looked at in isolation, making it easier to understand. I took the opportunity to restructure the code a bit which will hopefully also help.
Beyond that, this change also lays some ground work for each ISA to have its own, potentially stateful decode object. We'd be able to include less contextualizing information in the ExtMachInst objects since that context would be applied at the decoder. Also, the decoder could "know" ahead of time that all the instructions it's going to see are going to be, for instance, 64 bit mode, and it will have one less thing to check when it decodes them. Because the decode caching mechanism has been separated out, it's now possible to have multiple caches which correspond to different types of decoding context. Having one cache for each element of the cross product of different configurations may become prohibitive, so it may be desirable to clear out the cache when relatively static state changes and not to have one for each setting.
Because the decode function is no longer universally accessible as a static member of the StaticInst class, a new function was added to the ThreadContexts that returns the applicable decode object. |
8519:ef35ce2bd73f |
19-Aug-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
LSQ: Set store predictor to periodically clear itself as recommended in the storesets paper.
This patch improves performance by as much as 10% on some spec benchmarks. |
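As a loose illustration of a predictor table that clears itself periodically, consider the sketch below. It assumes the period is counted in predictor lookups and uses a single flat table. Both are simplifications chosen for the example, and StoreSetSketch is a hypothetical name rather than the O3 store-set implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical dependence predictor: PCs that once conflicted share a set id.
class StoreSetSketch {
  public:
    // clearPeriod is the number of lookups between clears; 0 disables it.
    explicit StoreSetSketch(uint64_t clearPeriod) : period(clearPeriod) {}

    // Remember that this load and store conflicted.
    void recordViolation(uint64_t loadPC, uint64_t storePC) {
        table[loadPC] = nextId;
        table[storePC] = nextId;
        ++nextId;
    }

    // Predict whether the load should wait for the store.
    bool sameSet(uint64_t loadPC, uint64_t storePC) {
        tick();
        auto l = table.find(loadPC);
        auto s = table.find(storePC);
        return l != table.end() && s != table.end() &&
               l->second == s->second;
    }

  private:
    // Forget all learned dependences once per period.
    void tick() {
        if (period != 0 && ++accesses >= period) {
            table.clear();
            accesses = 0;
        }
    }

    std::unordered_map<uint64_t, uint64_t> table;
    uint64_t period;
    uint64_t accesses = 0;
    uint64_t nextId = 0;
};
```

Clearing lets stale dependences expire, so a load that conflicted with a store once is not held back indefinitely.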
8518:9c87727099ce |
19-Aug-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
Fix bugs due to interaction between SEV instructions and O3 pipeline
SEV instructions were originally implemented to cause asynchronous squashes via the generateTCSquash() function in the O3 pipeline when updating the SEV_MAILBOX miscReg. This caused race conditions between CPUs in an MP system that would lead to a pipeline either going inactive indefinitely or not being able to commit squashed instructions. Fixed SEV instructions to behave like interrupts and cause synchronous squashes inside the pipeline, eliminating the race conditions. Also fixed up the semantics of the WFE instruction to behave as documented in the ARMv7 ISA description to not sleep if SEV_MAILBOX=1 or unmasked interrupts are pending. |
8516:a9c0d2ab490a |
19-Aug-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
LSQ: Add some better dprintfs for storeset predictor. |
8515:12420b96b364 |
19-Aug-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
LSQ: Fix a few issues with the storeset predictor.
Two issues are fixed in this patch: 1. The load and store pc passed to the predictor are passed in reverse order. 2. The flag indicating that a barrier is inflight was never cleared when the barrier was squashed instead of committed. This made all load insts dependent on a non-existent barrier in-flight. |
8513:f4272aa61e74 |
19-Aug-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Squash the violator and younger instructions instead of all insts.
Change the way instructions are squashed on memory ordering violations to squash the violator and younger instructions, not all instructions that are younger than the instruction they violated (no reason to throw away valid work). |
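The difference between the two squash points can be sketched with plain sequence numbers. `squashFrom` and the sample numbers are illustrative assumptions, not the O3 squash interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using SeqNum = std::uint64_t;

// Return the in-flight sequence numbers that survive a squash which
// removes `firstSquashed` and everything younger than it.
// `inflight` is assumed to be sorted oldest-first.
std::vector<SeqNum>
squashFrom(const std::vector<SeqNum> &inflight, SeqNum firstSquashed)
{
    assert(std::is_sorted(inflight.begin(), inflight.end()));
    std::vector<SeqNum> kept;
    for (SeqNum sn : inflight) {
        if (sn < firstSquashed)
            kept.push_back(sn);
    }
    return kept;
}
```

With a store at sequence number 10 and a violating load at 13, squashing everything younger than the store (`squashFrom(insts, 11)`) discards 11 and 12 as well, whereas squashing from the violator (`squashFrom(insts, 13)`) keeps that work.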
8507:889818c58eff |
16-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
InOrder: Make cache_unit.hh include hashmap.hh explicitly, not transitively. |
8506:5a9c6f49f882 |
16-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Make lsq_unit.hh include arch/isa_traits.hh directly, not transitively. |
8503:479b186a4652 |
14-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: When squashing, restore the macroop that should be used for fetching. |
8502:f1fc7102c970 |
14-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Add a pointer to the macroop for a microop in the dyninst. |
8499:e5f14b00c0ae |
13-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: At the end of an instruction, force fetchAddr to something sensible.
It's possible (though until now very unlikely) for fetchAddr to get out of sync with the actual PC of the current instruction. This change forcefully resets fetchAddr at the end of every instruction. |
8495:6ee3a2359fcb |
09-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Stop using the current macroop no matter why you're leaving it.
Until now, the only reason a macroop would be left was because it ended at a microop marked as the last microop. In O3 with branch prediction, it's possible for the branch predictor to have entries which originally came from different instructions which happened to have the same RIP. This could theoretically happen in many ways, but it was encountered specifically when different programs in different address spaces ran one after the other in X86_FS.
What would happen in that case was that the macroop would continue to be looped over and microops fetched from it until it reached the last microop even though the macropc had moved out from under it. If things lined up properly, this could mean that the end bytes of an instruction actually fell into the instruction sized block of memory after the one in the predecoder. The fetch loop implicitly assumes that the last instruction sized chunk of memory processed was the last one needed for the instruction it just finished executing. It would then tell the predecoder to move to an offset within the bytes it was given that is larger than those bytes, and that would trip an assert in the x86 predecoder.
This change fixes this problem by making fetch stop processing the current macroop if the address it should be fetching from changed when the PC is updated. That happens when the last microop was reached because the instruction handled it properly, and it also catches the case where the branch predictor makes fetch do a macro level branch when it shouldn't.
The check of isLastMicroop is retained because otherwise, a macroop that branches back to itself would act like a single, long macroop instead of multiple instances of the same microop. There may be situations (which may turn out to be purely hypothetical) where that matters.
This also fixes a relatively minor issue where the curMacroop variable would be set to NULL immediately after seeing that a microop was the last one before curMacroop was used to build the dyninst. The traceData structure would have a NULL pointer to the macroop for that microop. |
8493:0eca041a8c06 |
09-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: When waiting to handle an interrupt, let everything drain out.
Before this change, the commit stage would wait until the ROB and store queue were empty before recognizing an interrupt. The fetch stage would stop generating instructions at an appropriate point, so commit would then wait until a valid time to interrupt the instruction stream. Instructions might be in flight after fetch but not in the ROB or store queue (in rename, for instance), so this change makes commit wait until all in flight instructions are finished. |
8492:1ad244a20877 |
08-Aug-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
BuildEnv: Eliminate RUBY as build environment variable This patch replaces RUBY with PROTOCOL in all the SConscript files as the environment variable that decides whether or not certain components of the simulator are compiled. |
8491:606cf2660887 |
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of the unused addToRemoveList function. |
8489:2e12a633d269 |
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Let squashed and deferred instructions issue.
Let squashed and deferred instructions issue so they don't accumulate and clog up the CPU. |
8487:c7982323e834 |
07-Aug-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix uninitialized variable in the tournament branch predictor. |
8486:c4e77a9563f5 |
07-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Translation: Use a pointer type as the template argument.
This allows regular pointers and reference counted pointers without having to use any shim structures or other tricks. |
8484:3c641509bf3e |
02-Aug-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of the raw ExtMachInst constructor on DynInsts.
This constructor assumes that the ExtMachInst can be decoded directly into a StaticInst that's useful to execute. With the advent of microcoded instructions that's no longer true. |
8481:818aea9960f5 |
31-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Implement memory mapped IPRs for O3. |
8479:e68b1ad09c6b |
31-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix corner case squashing into the microcode ROM.
When fetching from the microcode ROM, if the PC is set so that it isn't in the cache block that's been fetched the CPU will get stuck. The fetch stage notices that it's in the ROM so it doesn't try to fetch from the current PC. It then later notices that it's outside of the current cache block so it skips generating instructions expecting to continue once the right bytes have been fetched. This change lets the fetch stage attempt to generate instructions, and only checks if the bytes it's going to use are valid if it's really going to use them. |
8471:18e560ba1539 |
15-Jul-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Create a pipeline activity viewer for the O3 CPU model.
Implemented a pipeline activity viewer as a python script (util/o3-pipeview.py) and modified O3 code base to support an extra trace flag (O3PipeView) for generating traces to be used as inputs by the tool. |
8463:7a48916a32a8 |
10-Jul-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
Branch predictor: Fixes the tournament branch predictor.
Branch predictor could not predict a branch in a nested loop because: 1. The global history was not updated after a mispredict squash. 2. The global history was updated in the fetch stage. The choice predictors that were updated used the changed global history. This is incorrect, as it incorporates the state of global history after the branch is encountered. Fixed update to choice predictor using the global history state before the branch happened. 3. The global predictor table was also updated using the global history state before the branch happened as above.
Additionally, parameters to initialize ctr and history size were reversed. |
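The history handling in the three points above can be sketched with a single global-history table. This is a simplified illustration under stated assumptions: `GlobalPredictor`, `Snapshot`, and the 2-bit counter layout are hypothetical and omit the local and choice tables of a real tournament predictor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class GlobalPredictor
{
  public:
    // History captured at prediction time, before the branch is shifted in.
    struct Snapshot
    {
        unsigned history;
        bool predTaken;
    };

    explicit GlobalPredictor(unsigned historyBits)
        : mask((1u << historyBits) - 1), history(0),
          counters(1u << historyBits, 1)
    {
        assert(historyBits > 0 && historyBits < 16);
    }

    // Predict, then speculatively shift the prediction into the history.
    Snapshot
    predict()
    {
        Snapshot s{history, counters[history] >= 2};
        history = ((history << 1) | (s.predTaken ? 1u : 0u)) & mask;
        return s;
    }

    // Train using the pre-branch history from the snapshot, and repair
    // the global history if the prediction turned out to be wrong.
    void
    update(const Snapshot &s, bool taken)
    {
        std::uint8_t &ctr = counters[s.history];
        if (taken && ctr < 3)
            ++ctr;
        if (!taken && ctr > 0)
            --ctr;
        if (taken != s.predTaken)
            history = ((s.history << 1) | (taken ? 1u : 0u)) & mask;
    }

    unsigned currentHistory() const { return history; }

  private:
    unsigned mask;
    unsigned history;
    std::vector<std::uint8_t> counters;  // 2-bit saturating counters
};
```

Indexing `counters` with `s.history` rather than the live `history` is the key point: the live value already contains the outcome of the branch being trained.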
8462:80492ae5148e |
10-Jul-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix up pipelining icache accesses in fetch stage to function properly
Fixed up the patch from Yasuko Watanabe that enabled pipelining of fetch accesses to icache to work with recent changes to the main repository. Also added the ability for the fetch stage to delay issuing the fault carrying nop when a pipeline fetch causes a fault and no fetch bandwidth is available until the next cycle. |
8460:3893d9d2c6c2 |
10-Jul-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Make sure fetch doesn't go off into the weeds during speculation. |
8444:56de1f9320df |
03-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
ExecContext: Rename the readBytes/writeBytes functions to readMem and writeMem.
readBytes and writeBytes had the word "bytes" in their names because they accessed blobs of bytes. This distinguished them from the read and write functions which handled higher level data types. Because those functions don't exist any more, this change renames readBytes and writeBytes to more general names, readMem and writeMem, which reflect the fact that they are how you read and write memory. This also makes their names more consistent with the register reading/writing functions, although those are still read and set for some reason. |
8443:530ff1bc8d70 |
03-Jul-2011 |
Gabe Black <gblack@eecs.umich.edu> |
ExecContext: Get rid of the now unused read/write templated functions. |
8436:5648986156db |
30-Jun-2011 |
Brad Beckmann <Brad.Beckmann@amd.com>, Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Add support for functional accesses This patch provides functional access support in Ruby. Currently only the M5Port of RubyPort supports functional accesses. Support for functional accesses through the PioPort will be added as a separate patch. |
8425:832af946c848 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: gem5.opt compile variable name typo. |
8424:d9f54de93703 |
20-Jun-2011 |
Gabe Black <gblack@eecs.umich.edu> |
InOrder: Fix a compile error. |
8419:17b2781e1482 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: clear reg. dep entry after removing from list this will safeguard future code from trying to remove from the list twice. That code wouldn't break but would waste time. |
8418:a6aacf190f14 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: se: squash after syscalls |
8417:61f7e127f9a0 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cleanup dprintfs in cache unit |
8416:f4a37a07b97c |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: SE mode TLB faults handle them like we do in FS mode, by blocking the TLB until the fault is handled by the fault->invoke() |
8415:5db2bac0a900 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder:tracing: fix fault tracing bug |
8414:97571750fadf |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: se compile fixes |
8413:b52a89442a56 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add necessary debug flag header files |
8411:80aa16801996 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: clear fetchbuffer on traps implement clearfetchbufferfunction extend predecoder to use multiple threads and clear those on trap |
8410:b5d3e3d05173 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: use separate float-reg bits function in dyninst this will make sure we get the correct view of a FP register |
8409:fa2370a92498 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: use trapPending flag to manage traps |
8408:0cce97fe6390 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder/dtb: make sure DTB translates the correct address The DTB expects the correct PC in the ThreadContext, but what if the memory accesses are speculative? Shouldn't we send along the requestor's PC to the translate functions? |
8407:1edf525495b1 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: handle serializing instructions including IPR accesses and store-conditionals. This class of instructions will not execute correctly in a superscalar machine |
8404:79cba855342c |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: don't handle multiple faults on same cycle if a faulting instruction reaches an execution unit, then ignore it and pass it through the pipeline.
Once we recognize the fault in the graduation unit, don't allow a second fault to creep in on the same cycle. |
8403:fe5fcf6271e9 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: register ports for FS mode handle "snoop" port registration as well as functional port setup for FS mode |
8402:c6449ed4cbe4 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: check for interrupts each tick use a dummy instruction to facilitate the squash after the interrupts trap |
8401:2e9141200f78 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: explicit fault check Before graduating an instruction, explicitly check fault by making the fault check its own separate command that can be put on an instruction schedule. |
8400:8d26dc2d92b2 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: squash and trap behind a tlb fault |
8399:bf2054aef42b |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: stall stores on store conditionals & compare/swaps |
8397:7cd61d925338 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: make InOrder CPU FS compilable/visible make syscall a SE mode only functionality copy over basic FS functions (hwrei) to make FS compile |
8396:08ace5acd0c3 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove memdep tracking for default pipeline speculative load/store pipelines can reenable this |
8395:11bf5bef5a6c |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: fetchBuffer tracking calculate blocks in use for the fetch buffer to figure out how many total blocks are pending |
8394:dd3e52966c26 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: redefine DynInst FP result type Sharing the FP value w/the integer values was giving inconsistent results esp. when there is a 32-bit integer register matched w/a 64-bit float value |
8393:76dd3a85e4ae |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: treat SE mode syscalls as a trapping instruction define a syscallContext to schedule the syscall and then use syscall() to actually perform the action |
8392:8d523e8d4165 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: bug in mdu segfault was caused by squashed multiply that's in the process of an event. use isProcessing flag to handle this and cleanup the MDU code |
8391:8fff826090b1 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: optionally track faulting instructions |
8390:467f34a4dfd8 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cleanup events in resource pool remove events in the resource pool that can be called from the CPU event, since the CPU event is scheduled at the same time as the resource pool event. ---- Also, match the resPool event function names to the cpu event function names ---- |
8389:9ccf5354e3a4 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: don't stall after stores once a ST is sent off, it's OK to keep processing, however it's a little more complicated to handle the packet acknowledging the store is completed |
8388:26a0b8a1ecb8 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: don't stall after stores once a ST is sent off, it's OK to keep processing, however it's a little more complicated to handle the packet acknowledging the store is completed |
8387:852687db50eb |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove decode squash also, cleanup comments for gem5.fast compilation |
8386:b0a7c7b7748a |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: support for compare and swap insts don't treat read() and write() fields as mut. exclusive |
8385:440835b0179a |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: branch predictor update only update BTB on a taken branch and update branch predictor w/pcstate from instruction --- only pay attention to branch predictor updates if the inst. is in fact a branch |
8384:1f215de12d15 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: priority for grad/squash events define separate priority resource pool squash and graduate events |
8383:10b13dcd6bb8 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove stalls on trap squash |
8382:a2396560f01c |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: no dep. tracking for zero reg this causes forwarding a bad register value |
8381:5dbee14a7363 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
imported patch recoverPCfromTrap |
8380:e0da7b3c3254 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
imported patch squash_from_next_stage |
8379:89e38e3bbdaa |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add flatDestReg member to dyninst use it in reg. dep. tracking |
8378:3bb902f4e99a |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update event priorities don't use offset to calculate this but rather an enum that can be updated |
8377:cd9dd7f8125f |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: implement trap handling |
8376:40d1b3f5bee6 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cleanup intercomm. structs/squash info |
8375:b085f409d89c |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: use setupSquash for misspeculation implement a clean interface to handle branch misprediction and eventually all pipeline flushing |
8373:a4e999395e15 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: DynInst handling of stores for big-endian ISAs The DynInst was not performing the host-to-guest translation which ended up breaking stores for SPARC |
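A minimal sketch of the host-to-guest conversion a store needs for a big-endian guest such as SPARC. `storeBigEndian32` is a hypothetical helper written for this illustration; it is not the conversion routine gem5 uses.

```cpp
#include <cassert>
#include <cstdint>

// Write a 32-bit value into guest memory in big-endian byte order,
// independent of the host's own endianness.
void
storeBigEndian32(std::uint32_t value, std::uint8_t *buf)
{
    assert(buf != nullptr);
    buf[0] = static_cast<std::uint8_t>(value >> 24);  // most significant
    buf[1] = static_cast<std::uint8_t>(value >> 16);
    buf[2] = static_cast<std::uint8_t>(value >> 8);
    buf[3] = static_cast<std::uint8_t>(value);        // least significant
}
```

Copying the host representation byte for byte would only produce this layout on a big-endian host, which is why skipping the translation breaks stores when the host is little-endian.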
8372:ee898bed2872 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: make marking of dest. regs an explicit request formerly, this was implicit when you accessed the execution unit or the use-def unit but it's better that this just be something that a user can specify. |
8371:19a930f946ce |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: simplify handling of split accesses |
8370:af963f55b04e |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: addtl functionality for inst. skeds add find and end functions for inst. schedules that can search by stage number |
8369:85428189024a |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: register file stats keep stats for int/float reg file usage instead of aggregating across reg file types |
8368:8a4747be8490 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: scheduling for nonspec insts make handling of speculative and nonspeculative insts more explicit |
8367:e506f0b8ca51 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: find register dependencies "lazily" Architectures like SPARC need to read the window pointer in order to figure out its register dependence. However, this may not get updated until after an instruction gets executed, so now we lazily detect the register dependence in the EXE stage (execution unit or use_def). This makes sure we get the mapping after the most current change. |
8366:fc44dde6bbc9 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: assert on macro-ops provide a sanity check for someone coding a new architecture |
8365:715806745e00 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: handle faults at writeback stage call trap function when a fault is received |
8364:67081684c03e |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: ISA-zero reg handling ignore writes to the ISA zero register |
8363:6a15522216c3 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update support for branch delay slots |
8362:89c713a5aa1d |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: inst. iterator cleanup get rid of accessing iterators (for instructions) by reference |
8360:212e2449ee80 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update bpred code clean up control flow to make it easier to understand |
8359:95168d713bc9 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add types for dependency checks |
8358:636adb85b6bd |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: use flattenIdx for reg indexing - also use "threadId()" instead of readTid() everywhere - this will help support more complex ISA indexing |
8357:2fcd223a253b |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
simple-thread: give a name() function for debugging w/the SimpleThread object |
8356:a1f59a213b35 |
19-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: use m5_hash_map for skedCache since we don't care whether the cache of instruction schedules is sorted or not, the hash map should be faster |
8346:ce8b9a250021 |
10-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
o3: missing newlines on some dprintfs |
8342:77d12d8f7971 |
09-Jun-2011 |
Korey Sewell <ksewell@umich.edu> |
sparc: compilation fixes for inorder Add a few constants and functions that the InOrder model wants for SPARC. * * * sparc: add eaComp function InOrder separates the address generation from the actual access so give Sparc that functionality * * * sparc: add control flags for branches branch predictors and other cpu model functions need to know specific information about branches, so add the necessary flags here |
8338:4d1005f78496 |
07-Jun-2011 |
Gabe Black <gblack@eecs.umich.edu> |
gcc 4.0: Add some virtual destructors to make gcc 4.0 happy. |
8335:9228e00459d4 |
02-Jun-2011 |
Nathan Binkert <nate@binkert.org> |
scons: rename TraceFlags to DebugFlags |
8316:6fd588813142 |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix offset calculation into storeQueue buffer for store->load forwarding
Calculation of offset to copy from storeQueue[idx].data structure for store-to-load forwarding fixed to be the difference in bytes between store and load virtual addresses. The previous method could cause a load to index into the buffer at the wrong location. |
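The offset computation can be sketched as follows. `forwardFromStore` and its parameters are hypothetical names for this illustration, and the sketch assumes the store fully covers the load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the bytes a load needs out of an older store's data buffer.
// The offset into the buffer is the byte distance between the load's
// and the store's virtual addresses.
void
forwardFromStore(const std::uint8_t *storeData, std::uint64_t storeVaddr,
                 unsigned storeSize, std::uint64_t loadVaddr,
                 unsigned loadSize, std::uint8_t *loadData)
{
    // Precondition: the load lies entirely inside the store.
    assert(loadVaddr >= storeVaddr);
    assert(loadVaddr + loadSize <= storeVaddr + storeSize);

    const std::uint64_t offset = loadVaddr - storeVaddr;
    std::memcpy(loadData, storeData + offset, loadSize);
}
```

For an 8-byte store at 0x1000 and a 2-byte load at 0x1004, the offset is 4, so the load receives bytes 4 and 5 of the store data.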
8315:6173b87e7652 |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix issue w/wbOutstanding being decremented multiple times on blocked cache.
If a split load fails on a blocked cache wbOutstanding can be decremented twice if the first part of the split load succeeds and the second part fails. Condition the decrementing on not having completed the first part of the load. |
8314:13ac7b9939ef |
23-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix issue with interrupts/faults occurring in the middle of a macro-op
This patch fixes two problems with the O3 cpu model. The first is an issue with an instruction fetch causing a fault on the next address while the current macro-op is being issued. This happens when the micro-ops exceed the fetch bandwidth and then, on the next cycle, the fetch stage attempts to issue a request to the next line while it still has micro-ops to issue. If the next line faults, a fault is attached to a micro-op in the currently executing macro-op rather than a "nop" from the next instruction block. This leads to an instruction incorrectly faulting on fetch when it had no reason to fault.
A similar problem occurs with interrupts. When an interrupt occurs the fetch stage nominally stops issuing instructions immediately. This is incorrect in the case of a macro-op as the current location might not be interruptible. |
8300:eb279d6e08a2 |
13-May-2011 |
Chander Sudanthi <chander.sudanthi@arm.com> |
Trace: Allow printing ASIDs and selectively tracing based on user/kernel code.
Debug flags are ExecUser, ExecKernel, and ExecAsid. ExecUser and ExecKernel are set by default when Exec is specified. Use minus sign with ExecUser or ExecKernel to remove user or kernel tracing respectively. |
8298:3c1296738e34 |
13-May-2011 |
Geoffrey Blake <geoffrey.blake@arm.com> |
O3: Fix an issue with a load & branch instruction and mem dep squashing
Instructions that load an address and are control instructions can execute down the wrong path if they were predicted correctly and then instructions following them are squashed. If an instruction is a memory and control op use the predicted address for the next PC instead of just advancing the PC. Without this change NPC is used for the next instruction, but predPC is used to verify that the branch was successful so the wrong path is silently executed. |
8294:44f8c2507d85 |
09-May-2011 |
Nathan Binkert <nate@binkert.org> |
work around gcc 4.5 warning |
8293:db269e704d07 |
07-May-2011 |
Tushar Krishna <tushar@csail.mit.edu> |
NetworkTest: added sim_cycles parameter to the network tester.
The network tester terminates after injecting for sim_cycles (default=1000), instead of having to explicitly pass --maxticks from the command line as before. If fixed_pkts is enabled, the tester only injects maxpackets number of packets, else it keeps injecting till sim_cycles. The tester also works with zero command line arguments now. |
8277:bfaab04cb292 |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Add some useful debug message to the timing simple cpu. |
8276:66bb0d8ae8bf |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Fix a case where timing simple cpu faults can nest.
If we fault, change the state to faulting so that we don't fault again in the same cycle. |
8275:8c88a94c2f4f |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Remove assertion for case that is actually handled in code.
If a nonspeculative instruction has a fault it might not be in the nonSpecInsts map. |
8272:82057507f2f9 |
04-May-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix a small corner case with the lsq hazard detection logic. |
8247:acf4b902c02e |
20-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
stats: one more name violation |
8240:38befb82b2c9 |
19-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
stats: rename stats so they can be used as python expressions |
8232:b28d06a175be |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help |
8231:51cf7f3cf9ac |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
debug: create a Debug namespace |
8230:845c8eb5ac49 |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: fix up code after sorting |
8229:78bf55f23338 |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: sort all includes |
8208:45331a355c38 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. |
8205:7ecbffb674aa |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Cleanup implementation of ITSTATE and put important code in PCState.
Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. |
8201:89221928d131 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Remove references to memory copy operations |
8199:3d6c08c877a9 |
04-Apr-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Tighten memory order violation checking to 16 bytes.
The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. |
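The arithmetic behind the granularity change can be checked with a short sketch. The helper names are hypothetical; only the shift amounts of 8 and 4 come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes covered by one checking block for a given shift.
inline unsigned
granularityBytes(unsigned shift)
{
    assert(shift < 32);
    return 1u << shift;
}

// Two addresses are treated as conflicting if they fall in the same
// block after discarding the low `shift` bits.
inline bool
sameCheckBlock(std::uint64_t a, std::uint64_t b, unsigned shift)
{
    assert(shift < 64);
    return (a >> shift) == (b >> shift);
}
```

A shift of 8 gives 256-byte blocks, so 0x1000 and 0x1080 would be flagged as a possible violation; with a shift of 4 the blocks are 16 bytes and the same pair is treated as independent.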
8190:8c68155aac00 |
31-Mar-2011 |
Lisa Hsu <Lisa.Hsu@amd.com> |
Ruby: have the rubytester pass contextId to Ruby. |
8184:a8d64545cda6 |
28-Mar-2011 |
Somayeh Sardashti <somayeh@cs.wisc.edu> |
This patch supports cache flushing in MOESI_hammer |
8181:f789b9aac5f4 |
26-Mar-2011 |
Korey Sewell <ksewell@umich.edu> |
mips: cleanup ISA-specific code *** (1): get rid of expandForMT function MIPS is the only ISA that cares about having a piece of ISA state integrate multiple threads so add constants for MIPS and relieve the other ISAs from having to define this. Also, InOrder was the only core that was actively calling this function * * * (2): get rid of corespecific type The CoreSpecific type was used as a proxy to pass in HW specific params to a MIPS CPU, but since MIPS FS hasn't been touched for a while, it makes sense to not force every other ISA to use CoreSpecific as well; use a special reset function to set it. That probably should go in a PowerOn reset fault anyway. |
8175:ec1eecca2f8f |
22-Mar-2011 |
Tushar Krishna <tushar@csail.mit.edu> |
This patch fixes a build error in networktest.cc that occurs with gcc4.2 |
8171:19444b1f092c |
21-Mar-2011 |
Tushar Krishna <tushar@csail.mit.edu> |
This patch adds the network tester for simple and garnet networks. The tester code is in testers/networktest. The tester can be invoked by configs/example/ruby_network_test.py. A dummy coherence protocol called Network_test is also added for network-only simulations and testing. The protocol takes in messages from the tester and just pushes them into the network in the appropriate vnet, without storing any state. |
8164:b043c0efa024 |
19-Mar-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Convert AccessModeType to RubyAccessMode This patch converts AccessModeType to RubyAccessMode so that both the protocol dependent and independent code uses the same access mode. |
8148:93982cb5044c |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix subtle bug in LDM.
If the instruction faults mid-op the base register shouldn't be written back. |
8143:b0b94a7b7c1f |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Detect and skip udelay() functions in linux kernel.
This change speeds up booting, especially in MP cases, by not executing udelay() on the core but instead skipping ahead the amount of time that is being delayed. |
8138:f08692f2932e |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Send instruction back to fetch on squash to seed predecoder correctly. |
8137:48371b9fb929 |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Cleanup the commitInfo comm struct.
Get rid of unused members and use base types rather than derived values where possible to limit amount of state. |
8134:b01a51ff05fa |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
Mem: Fix issue with dirty block being lost when entire block transferred to non-cache.
This change fixes the problem for all the cases we actively use. If you want to try more creative I/O device attachments (E.g. sharing an L2), this won't work. You would need another level of caching between the I/O device and the cache (which you actually need anyway with our current code to make sure writes propagate). This is required so that you can mark the cache in between as top level and it won't try to send ownership of a block to the I/O device. Asserts have been added that should catch any issues. |
8133:9f704aa10eb4 |
17-Mar-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix unaligned stores when cache blocked
Without this change a store can be issued to the cache multiple times. If this case occurs when the l1 cache is out of mshrs (and thus blocked) the processor will never make forward progress because each cycle it will send a single request using the recently freed mshr and not complete the multipart store. This will continue forever. |
8105:906864dd0937 |
02-Mar-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Spelling: Fix a spelling error by changing mmaped to mmapped.
There may not be a formally correct spelling for the past tense of mmap, but mmapped is the spelling Google doesn't try to autocorrect. This makes sense because it mirrors the past tense of map->mapped and not the past tense of cape->caped. |
8090:722a0d28ee83 |
25-Feb-2011 |
Nilay Vaish <nilay@cs.wisc.edu> |
Ruby: Make DataBlock.hh independent of RubySystem This patch changes DataBlock.hh so that it is not dependent on RubySystem. This dependence seems unnecessary. All those functions that depend on RubySystem have been moved to the DataBlock.cc file. |
8089:4a59661d3fd1 |
25-Feb-2011 |
Timothy M. Jones <timothy.jones@cl.cam.ac.uk> |
O3CPU: Fix iqCount and lsqCount SMT fetch policies. Fixes two of the SMT fetch policies in O3CPU that were returning the count of instructions in the IQ or LSQ rather than the thread ID to fetch from. |
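The class of bug described in this entry (returning a queue count where a thread ID is expected) can be illustrated with a minimal sketch. It assumes the policy favours the thread with the fewest queued instructions; `ThreadID`, `pickByIqCount`, and the `iqCounts` vector are hypothetical names for illustration, not the O3 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ThreadID = int;  // hypothetical alias for this sketch

// Return the ID of the thread with the fewest instructions in the IQ.
// The value returned is the index (thread ID), not the count itself.
ThreadID pickByIqCount(const std::vector<unsigned> &iqCounts)
{
    assert(!iqCounts.empty());
    std::size_t best = 0;
    for (std::size_t tid = 1; tid < iqCounts.size(); ++tid) {
        if (iqCounts[tid] < iqCounts[best])
            best = tid;
    }
    return static_cast<ThreadID>(best);
}
```

Returning `iqCounts[best]` here instead of `best` would reproduce the mistake the commit describes.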
8081:2dfacb598d6d |
23-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: InstSeqNum bug
Because int and not InstSeqNum was used in a couple of places, you can overflow the int type and thus get weird bugs when the sequence number is negative (or some weird value). |
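A small sketch of why the narrower type is a problem, assuming `InstSeqNum` is a 64-bit unsigned type (an assumption of this example; the helper names are hypothetical as well):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using InstSeqNum = std::uint64_t;  // assumed width for this sketch

// True when the sequence number no longer fits in a plain int, i.e. when
// storing it in an int would corrupt the value.
bool overflowsInt(InstSeqNum seq)
{
    return seq > static_cast<InstSeqNum>(std::numeric_limits<int>::max());
}

// Advance a sequence number while staying in the wide type.
InstSeqNum nextSeqNum(InstSeqNum cur)
{
    assert(cur != std::numeric_limits<InstSeqNum>::max());
    return cur + 1;
}
```

Keeping every variable that holds a sequence number in the wide type avoids the truncation entirely.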
8080:f35852d5788f |
23-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: dyn inst initialization
Remove constructors that weren't being used (it just gets confusing). Use an initialization list for all the variables instead of relying on the initVars() function. |
8079:969e03f6c2ea |
23-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cache packet handling
- use a pointer to CacheReqPacket instead of PacketPtr so correct destructors get called on packet deletion
- make sure to delete the packet if the cache blocks the sendTiming request or for some reason we don't use the packet
- don't overwrite memory requests since in the worst case an instruction will be replaying a request so no need to keep allocating a new request
- we don't use retryPkt so delete it
- fetch code was split out already, so just assert that this is a memory reference inst. and that the staticInst is available |
8073:e154b9b8e366 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: When a prefetch causes a fault, don't record it in the inst |
8071:7bf6fccab013 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: If there is an outstanding table walk don't let the inst queue sleep.
If there is an outstanding table walk and no other activity in the CPU it can go to sleep and never wake up. This change makes the instruction queue always active if the CPU is waiting for a store to translate.
If Gabe changes the way this code works then the below should be removed as indicated by the todo. |
8068:749581c26e71 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Do something for ISB, DSB, DMB |
8067:21f14583aa6a |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Fix bug that let two table walks occur in parallel. |
8064:5b111ae7e7d4 |
23-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix bug when a squash occurs right before TLB miss returns.
In this case we need to throw away the TLB miss, not assume it was the one we were waiting for. |
8053:e6ce478c05d3 |
22-Feb-2011 |
Brad Beckmann <Brad.Beckmann@amd.com> |
m5: merged in hammer fix |
8046:3ae037a196a2 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add names and slot #s to res. dprints |
8045:8b869a22e2f8 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: ignore nops in execution unit |
8044:a8dc5e12ee36 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update graduation unit
Make sure instructions are able to commit before writing back to the RF. Do not commit more than 1 non-speculative instruction per cycle. |
8043:5485da0578d1 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: recognize isSerializeAfter flag
Keep track of when an instruction needs the execution behind it to be serialized. Without this, in SE mode instructions can execute behind a system call exit(). |
8042:a03f0e3b41c5 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update default thread size(=1)
A lot of structures get allocated based off that MaxThreads parameter, so this is an effort to not abuse it. |
8041:6f67329c0091 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: don't overuse getLatency()
Resources don't need to call getLatency() because the latency is already a member in the class. If there is some type of special case where different instructions impose a different latency inside a resource, then we can revisit this and add getLatency() back in. |
8040:bcb70863827d |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update max. resource bandwidths
Each resource has a certain # of requests it can take per cycle. Update the #s here to be more realistic based off of the pipeline width and whether the resource needs to be accessed on multiple cycles. |
8039:2c841ed4355e |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cleanup in destructors
Clean up hanging pointers and other cruft in the destructors. |
8038:5a0ba3f96300 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: fix cache/fetch unit memory leaks
--- need to delete the cache request's data on clearRequest() now that we are recycling requests
--- fetch unit needs to deallocate the fetch buffer blocks when they are replaced or squashed. |
8037:de10174cd496 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove events for zero-cycle resources
If a resource has a zero cycle latency (e.g. RegFile write), then don't allocate an event for it to use. |
8036:0349aeab71bf |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: update pipeline interface for handling finished resource reqs
Formerly, to free up bandwidth in a resource, we could just change the pointer in that resource, but at the same time the pipeline stages had visibility to see what happened to a resource request. Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw away the request too early or the pipeline stage gets bad information. Instead, mark when a request is done with the resource altogether and then let the pipeline stage call back to the resource that it's time to free up the bandwidth for more instructions.
*** interface notes ***
- When an instruction completes and is done in a resource for that cycle, call done()
- When an instruction fails and is done with a resource for that cycle, call done(false)
- When an instruction completes, but isn't finished with a resource, call completed()
- When an instruction fails, but isn't finished with a resource, call completed(false)
* * *
inorder: tlbmiss wakeup bug fix |
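The four calls in the interface notes can be sketched as two flags on a request. This is a simplified illustration, not the gem5 class: `ResReqSketch`, `isDone()`, `isCompleted()`, and `release()` are hypothetical names, and only `done()` and `completed()` come from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified resource request; not the gem5 class.
class ResReqSketch
{
  public:
    // Finished with the resource this cycle; ok=false means the access failed.
    void done(bool ok = true)
    {
        succeeded = ok;
        finished = true;
    }

    // Outcome is known for this cycle, but the request still occupies
    // the resource.
    void completed(bool ok = true)
    {
        succeeded = ok;
        finished = false;
    }

    bool isDone() const { return finished; }
    bool isCompleted() const { return succeeded; }

    // Called back by the pipeline stage once it has inspected the outcome,
    // so the slot can be reused by another instruction.
    void release()
    {
        assert(finished);
        finished = false;
        succeeded = false;
    }

  private:
    bool finished = false;
    bool succeeded = false;
};
```

The point of the split is that the stage reads the outcome first and only then tells the resource to free the slot, so a recycled request is never inspected after it has been reused.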
8035:de6e9c30ad87 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove request map, use request vector
Take away all instances of reqMap in the code and make all references use the built-in request vectors inside of each resource. The request map was dynamically allocating a request per instruction. The request vector just allocates N requests during instantiation, and then the surrounding code is fixed up to reuse those N requests.
*** setRequest() and clearRequest() are the new accessors needed to define a new request in a resource |
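A minimal sketch of the preallocated request vector with a valid bit, as described in this entry and the ones that follow it. The class layout and the signatures of `setRequest()` and `clearRequest()` are assumptions made for illustration; only the two accessor names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a resource request.
struct ResourceRequest
{
    bool valid = false;  // set while the slot is in use
    int instId = -1;     // illustrative payload
};

// Hypothetical resource that allocates N request slots once and reuses them.
class Resource
{
  public:
    explicit Resource(std::size_t bandwidth) : reqs(bandwidth) {}

    // Claim a free slot; returns its index, or -1 if all slots are busy.
    int setRequest(int instId)
    {
        for (std::size_t i = 0; i < reqs.size(); ++i) {
            if (!reqs[i].valid) {
                reqs[i].valid = true;
                reqs[i].instId = instId;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Release a slot so a later instruction can reuse it.
    void clearRequest(int slot)
    {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < reqs.size());
        reqs[slot].valid = false;
        reqs[slot].instId = -1;
    }

  private:
    std::vector<ResourceRequest> reqs;  // sized once, at construction
};
```

No allocation happens after construction; instructions contend for the fixed set of slots instead.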
8034:143fa8eed0e5 |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add valid bit for resource requests
This will allow us to reuse resource requests within a resource instead of always dynamically allocating. |
8033:6d4e6c81c22b |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove reqRemoveList
We are going to be getting away from creating new resource requests for every instruction, so there is no more need to keep track of a reqRemoveList and clean it up every tick. |
8032:d536bebb511d |
18-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: initialize res. req. vectors based on resource bandwidth
First change in an optimization that will stop InOrder from allocating new memory for every instruction's request to a resource. This gets expensive since every instruction needs to access ~10 requests before graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model and increased simulation performance. |
8031:96bde0910197 |
16-Feb-2011 |
Nathan Binkert <nate@binkert.org> |
merge alpha system files into tree |
7963:6d955240bb62 |
13-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fetch from the microcode ROM when needed. |
7962:404170ece9a4 |
13-Feb-2011 |
Ali Saidi <saidi@eecs.umich.edu> |
O3: Fix GCC 4.2.4 complaint |
7959:7d2a5b524339 |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: clean up the old way of inst. scheduling
Remove remnants of the old way of instruction scheduling, which dynamically allocated a new resource schedule for every instruction. |
7958:9c040d644df1 |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: utilize cached skeds in pipeline
Allow the pipeline and resources to use the cached instruction schedule and resource sked iterator. |
7957:646800970f2f |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: define iterator for resource schedules
Resource skeds are divided into two parts: front end (all insts) and back end (inst. specific). Each of those is implemented as a separate list, so this iterator wraps around the traditional list iterator so that an instruction can walk its schedule but seamlessly transfer from front end to back end when necessary. |
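An iterator that walks one list and then continues into a second can be sketched as follows. This is an illustration under simplifying assumptions: schedule entries are plain `int`s, and `SkedIterator` and `done()` are hypothetical names rather than the gem5 interface.

```cpp
#include <cassert>
#include <list>

// Hypothetical sketch: walk a front-end list, then continue into a
// back-end list without the caller noticing the boundary.
class SkedIterator
{
  public:
    SkedIterator(const std::list<int> &front, const std::list<int> &back)
        : frontEnd(&front), backEnd(&back), cur(front.begin()),
          inFront(true)
    {
        skipToBackIfNeeded();
    }

    // True once both lists have been fully walked.
    bool done() const { return !inFront && cur == backEnd->end(); }

    int operator*() const
    {
        assert(!done());
        return *cur;
    }

    SkedIterator &operator++()
    {
        assert(!done());
        ++cur;
        skipToBackIfNeeded();
        return *this;
    }

  private:
    // Hop from the end of the front-end list to the start of the back end.
    void skipToBackIfNeeded()
    {
        if (inFront && cur == frontEnd->end()) {
            cur = backEnd->begin();
            inFront = false;
        }
    }

    const std::list<int> *frontEnd;
    const std::list<int> *backEnd;
    std::list<int>::const_iterator cur;
    bool inFront;
};
```

Because the front-end list is shared by all instructions, only the back-end list needs to differ per instruction type.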
7956:e42de3ac9f52 |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: stage scheduler for front/back end schedule creation
Add a stage scheduler class to replace InstStage in pipeline_traits.cc. Use that class to define a default front-end resource schedule that all instructions will follow. This will also replace the back end schedule in pipeline_traits.cc. The reason for adding this is so that we can cache instruction schedules in the future instead of calling the same function over and over again, as well as constantly dynamically allocating memory on every instruction to try to figure out its schedule. |
7955:682275767ec3 |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cache instruction schedules
First step in an optimization to not dynamically allocate an instruction schedule for every instruction but rather use cached schedules. |
7954:1da241308c1b |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: comments for resource sked class |
7953:fb53bf178ba7 |
12-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove unused file
inst_buffer file isn't used, so remove it. |
7947:6d07db809a81 |
11-Feb-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Fix pipeline restart when a table walk completes in the fetch stage.
When a table walk is initiated by the fetch stage, the CPU can potentially move to the idle state and never wake up.
The fetch stage must call cpu->wakeCPU() when a translation completes (in finishTranslation()). |
7945:32758425de8c |
11-Feb-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
SimpleCPU: Fix a case where a DTLB fault redirects fetch and an I-side walk occurs.
This change fixes an issue where a DTLB fault occurs and redirects fetch to handle the fault and the ITLB requires a walk which delays translation. In this case the status of the cpu isn't updated appropriately, and an additional instruction fetch occurs. Eventually this hits an assert as multiple instruction fetches are occurring in the system and when the second one returns the processor is in the wrong state.
Some asserts below are removed: one was always true (typo), and after initiateAcc() the processor could be in any valid state when a d-side fault occurs. |
7944:1daf51f62013 |
11-Feb-2011 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) rely on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished.
Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept in the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution. |
7914:eee5bb0fb8ea |
07-Feb-2011 |
Brad Beckmann <Brad.Beckmann@amd.com> |
m5: added work completed monitoring support |
7911:267e1e16e51b |
07-Feb-2011 |
Joel Hestness <hestness@cs.utexas.edu> |
TimingSimpleCPU: split data sender state fix
In sendSplitData, keep a pointer to the senderState that may be updated after the call to handle*Packet. This way, if the receiver updates the packet senderState, it can still be accessed in sendSplitData. |
7897:d9e8b1fd1a9f |
07-Feb-2011 |
Joel Hestness <hestness@cs.utexas.edu> |
mcpat: Adds McPAT performance counters
Updated patches from Rick Strong's set that modify performance counters for McPAT |
7889:6fa135943891 |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: fault handling
Maintain all information about an instruction's fault in the DynInst object rather than any cpu-request object. Also, if there is a fault during the execution stage then just save the fault inside the instruction and trap once the instruction tries to graduate. |
7888:1ffb11f8e8bb |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: pcstate and delay slots bug
Not-taken delay slots were not being advanced correctly to pc+8, so for those ISAs we 'advance()' the pcstate one more time for the desired effect. |
7887:87a6f2ed585a |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: add a fetch buffer to fetch unit
Give the fetch unit its own parameterizable fetch buffer to read from. It is very inefficient (architecturally and in simulation) to continually fetch at the granularity of the word size. As expected, the number of fetch memory requests drops dramatically. |
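Why a fetch buffer cuts the number of memory requests can be seen in a small sketch. It assumes a single-block buffer and sequential fetches; `FetchBuffer`, `fetch()`, and `requests()` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-block fetch buffer that counts the memory requests
// it has to issue.
class FetchBuffer
{
  public:
    explicit FetchBuffer(std::uint64_t blockBytes) : blockSize(blockBytes)
    {
        assert(blockBytes > 0);
    }

    // Fetch the word at addr; memory is only accessed when the containing
    // block is not already buffered.
    void fetch(std::uint64_t addr)
    {
        std::uint64_t block = addr / blockSize;
        if (!valid || block != bufferedBlock) {
            bufferedBlock = block;
            valid = true;
            ++memRequests;
        }
    }

    unsigned requests() const { return memRequests; }

  private:
    std::uint64_t blockSize;
    std::uint64_t bufferedBlock = 0;
    bool valid = false;
    unsigned memRequests = 0;
};
```

With a 64-byte buffer, 32 sequential 4-byte fetches touch memory twice, where fetching at word granularity would issue 32 requests.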
7886:fa81553d67ea |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: overload find-req fn
No need to have a separate function name findSplitRequest; just overload the function. |
7885:303293e1517f |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: implement separate fetch unit
Instead of having one cache-unit class be responsible for both data and code accesses, separate the code that is just for fetch into its own class derived from the original base class. This makes the code easier to manage and makes it easier to handle future cases of special fetch handling. |
7884:6e810a479c3e |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: cache port blocking
Set the request to false when the cache port blocks so we don't deadlock. Also, comment out the outstanding address list sanity check for now. |
7883:ba84e1da98a7 |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: stage width as a python parameter
Allow the user to specify how many instructions a pipeline stage can process on any given cycle (stageWidth, i.e. bandwidth) by setting the parameter through the python interface rather than recompiling the code after changing the *.cc file. (We always had the parameter there, but still used the static 'ThePipeline::StageWidth' instead.)
- Since StageWidth is now dynamically defined, change the interstage communication structure to use a vector and get rid of the array and array handling index (toNextStageIndex), since we can just make calls to the list for the same information |
7882:9b559768152b |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: multi-issue branch resolution
Only execute (resolve) one branch per cycle because handling more than one is a little more complicated. |
7881:87f4fd9a2760 |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: pipe. stage inst. buffering
Use skidbuffer as the only location for instructions between stages. Before, we had the insts queue from the prior stage and the skidbuffer for the current stage, but that gets confusing, and this consolidation helps when handling squash cases. |
7880:c9286580867a |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: change skidBuffer to list instead of queue
Manage insertion and deletion like a queue, but we will need access to internal elements for future changes. Currently, skidbuffer manages any instruction that was in a stage but could not complete processing; however, we will want to manage all blocked instructions (from prev stage and from cur. stage) in just one buffer. |
7879:3b4d595397fb |
04-Feb-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: activity tracking bug
Previous code was marking CPU activity on almost every cycle due to a bug in tracking the status of pipeline stages. This prevented the CPU from sleeping on long latency stalls and increased simulation time. |
7878:d3e6ebcccabf |
04-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Fault: Rename sim/fault.hh to fault_fwd.hh to distinguish it from faults.hh. |
7876:189b9b258779 |
03-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Config: Keep track of uncached and cached ports separately.
This makes sure that the address ranges requested for caches and uncached ports don't conflict with each other, and that accesses which are always uncached (message signaled interrupts for instance) don't waste time passing through caches. |
7875:4afd05b9485e |
03-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix a style bug in O3. |
7868:6029008db669 |
01-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Add L1 caches for the TLB walkers.
Small L1 caches are connected to the TLB walkers when caches are used. This allows them to participate in the coherence protocol properly. |
7857:b2c7e56572a4 |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fix some variable length instruction issues with the O3 CPU and ARM ISA. |
7856:d25827665112 |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Don't test misprediction on load instructions until executed. |
7855:c0be563517da |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Keep around the last committed instruction and use for squashing.
Without this change 0 is always used for the youngest sequence number if a squash occurred and the ROB was empty (e.g. an instruction is marked serializeAfter or a fetch stall prevents other instructions from issuing). Using 0, there is a race to rename where an instruction that committed the same cycle as the squashing instruction can have its renamed state undone by the squash using sequence number 0. |
7854:3c6783497976 |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Don't try to scoreboard misc registers.
I'm not positive this is the correct fix, but it's working right now. Either we need to do something like this, prevent the misc reg from being renamed at all, or there is something else going on. We need to find the root cause as to why this is only a problem sometimes. |
7852:07ba4754ae0a |
18-Jan-2011 |
Matt.Horsnell <Matt.Horsnell@arm.com> |
O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf. |
7851:bb38f0c47ade |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fix mispredicts from non control instructions. The squash inside the fetch unit should not attempt to remove them from the branch predictor as non-control instructions are not pushed into the predictor. |
7850:02450f4443ce |
18-Jan-2011 |
Matt Horsnell <Matt.Horsnell@arm.com> |
O3: Fixes the way prefetches are handled inside the iew unit.
This patch prevents the prefetch being added to the instCommit queue twice. |
7849:2290428b5f04 |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Support timing translations for O3 CPU fetch. |
7848:cc5e64f8423f |
18-Jan-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Add support for moving predicated false dest operands from sources. |
7847:0c6613ad8f18 |
18-Jan-2011 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.
When this condition occurs the cpu should restart the fetch stage to fetch from the original execution path. Fault handling in the commit stage is cleaned up a little bit so the control flow is simpler. Finally, if an instruction is being used to carry a fault it isn't executed, so the fault propagates appropriately. |
7833:c6bc8fe81e79 |
12-Jan-2011 |
Korey Sewell <ksewell@umich.edu> |
inorder: fix RUBY_FS build
The current code was using an incorrect dummy instruction in the interrupts function. |
7823:dac01f14f20f |
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Replace curTick global variable with accessor functions. This step makes it easy to replace the accessor functions (which still access a global variable) with ones that access per-thread curTick values. |
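The accessor pattern this entry describes might look roughly like the sketch below. The `sketch` namespace, the `curTickValue` variable, and the monotonicity check are assumptions of this example; the point is only that callers go through functions, so the storage behind them can later become per-thread without touching call sites.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

namespace sketch
{

// Backing storage; still a single global in this sketch.
static Tick curTickValue = 0;

// Read accessor: callers no longer touch the variable directly.
inline Tick curTick() { return curTickValue; }

// Write accessor; the monotonicity check is an assumption of this sketch.
inline void curTick(Tick newVal)
{
    assert(newVal >= curTickValue);
    curTickValue = newVal;
}

} // namespace sketch
```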
7818:5d3dad7a1b36 |
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
inorder: replace schedEvent() code with reschedule(). There were several copies of similar functions that looked like they all replicated reschedule(), so I replaced them with direct calls. Keeping this separate from the previous cset since there may be some subtle functional differences if the code ever reschedules an event that is scheduled but not squashed (though none were detected in the regressions). |
7817:94fdc8111d7b |
08-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
inorder: get rid of references to mainEventQueue. Events need to be scheduled on the queue assigned to the SimObject, not on the global queue (which should be going away). Also cleaned up a number of redundant expressions that made the code unnecessarily verbose. |
7813:7338bc628489 |
03-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Move sched_list.hh and timebuf.hh from src/base to src/cpu. These files really aren't general enough to belong in src/base. This patch doesn't reorder include lines, leaving them unsorted in many cases, but Nate's magic script will fix that up shortly. |
7811:a8fc35183c10 |
03-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Make commenting on close namespace brackets consistent.
Ran all the source files through 'perl -pi' with this script:
s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|;
s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|;
s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;
Also did a little manual editing on some of the arch/*/isa_traits.hh files and src/SConscript. |
7805:f249937228b5 |
23-Dec-2010 |
Nilay Vaish<nilay@cs.wisc.edu> |
This patch removes the WARN_* and ERROR_* from src/mem/ruby/common/Debug.hh file. These statements have been replaced with warn(), panic() and fatal() defined in src/base/misc.hh |
7804:42f343470ee3 |
22-Dec-2010 |
Steve Reinhardt <steve.reinhardt@amd.com> |
memtest: delete some crufty dead code |
7799:5d0f62927d75 |
20-Dec-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Style: Replace some tabs with spaces. |
7786:bafa8a197088 |
07-Dec-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Allow a store entry to store up to 16 bytes (instead of TheISA::IntReg).
The store queue doesn't need to be ISA specific and architectures can frequently store more than an int register's worth of data. 128 bits seems more common, but even 256 bits may be appropriate. Pretty much anything less than a cache line size is buildable. |
7784:e7649570ff3a |
07-Dec-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Support squashing all state after special instruction
For SPARC, ASIs are added to the ExtMachInst. If the ASI is changed, simply marking the instruction as Serializing isn't enough because that only stops rename. This provides a mechanism to squash all the instructions and refetch them. |
7783:9b880b40ac10 |
07-Dec-2010 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
O3: Make all instructions that write a misc. register not perform the write until commit.
ARM instructions updating cumulative flags (ARM FP exceptions and saturation flags) are not serialized.
Added aliases for ARM FP exceptions and saturation flags in FPSCR. Removed write accesses to the FP condition codes for most ARM VFP instructions: only VCMP and VCMPE instructions update the FP condition codes. Removed a potential cause of seg. faults in the O3 model for NEON memory macro-ops (ARM). |
7782:9b87755cb699 |
07-Dec-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Support SWAP and predicated loads/store in ARM. |
7781:a9f9eed35b18 |
07-Dec-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Support switchover with hardware table walkers |
7780:42da07116e12 |
01-Dec-2010 |
Nilay Vaish <nilay@cs.wisc.edu> |
ruby: Converted old ruby debug calls to M5 debug calls
This patch developed by Nilay Vaish converts all the old GEMS-style ruby debug calls to the appropriate M5 debug calls. |
7776:865e37d507c7 |
23-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Loosen an assert for x86 and connect the APIC ports when caches are used. |
7768:cdb18c1b51ea |
19-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
SCons: Support building without an ISA |
7767:bf5377d8f5c1 |
18-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Fix fp destination register flattening, and index offset adjusting.
This change makes O3 flatten floating point destination registers, and also fixes misc register flattening so that it's correctly repositioned relative to the resized regions for integer and floating point indices.
It also fixes some overly long lines. |
7764:03efcdc3421f |
15-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Make O3 support variable length instructions. |
7763:ff2213d13e58 |
15-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: reset architectural state by calling clear() |
7760:e93e7e0caae1 |
15-Nov-2010 |
Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> |
CPU/ARM: Add SIMD op classes to CPU models and ARM ISA. |
7758:28a677d7cb51 |
15-Nov-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: prevent a squash when completeAcc() modifies misc reg through TC.
This happens on ARM instructions when they update the IT state bits. Code and associated comment was copied from execute() and initiateAcc() methods |
7756:846fb3ffe0dc |
15-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
SCons: Cleanup SCons output during compile |
7745:434b5dfb87d9 |
15-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Fix bug when a split transaction is issued to a faster cache
In the case of a split transaction and a cache that is faster than a CPU we could get two responses before next_tick expires. Add an event that is scheduled in this case and return false rather than asserting. |
7725:00ea9430643b |
08-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM/Alpha/Cpu: Change prefetches to be more like normal loads.
This change modifies the way prefetches work. They are now like normal loads that don't write back a register. Previously prefetches were supposed to call prefetch() on the execution context, so they executed with execute() methods instead of initiateAcc()/completeAcc(). The prefetch() methods for all the CPUs are blank, meaning that they get executed, but don't actually do anything.
On Alpha dead cache copy code was removed and prefetches are now normal ops. They count as executed operations, but still don't do anything and IsMemRef is no longer set on them.
On ARM IsDataPrefetch or IsInstructionPreftech is now set on all prefetch instructions. The timing simple CPU doesn't try to do anything special for prefetches now and they execute with the normal memory code path. |
7724:ba11187e2582 |
08-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Make all ARM uops delayed commit. |
7723:ee4ac00d0774 |
08-Nov-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
sim: Use forward declarations for ports.
Virtual ports need TLB data, which means anything touching a file in the arch directory rebuilds any file that includes system.hh, which is everything. |
7720:65d338a8dba4 |
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARM's mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extent MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes is defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs, however it needs to, for use in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
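The PC type described above can be sketched in a few lines. This is a simplified illustration, not the classes from the "generic" ISA directory: `PCStateSketch`, `ContextSketch`, the `set()` method, and the fixed 4-byte instruction size are assumptions, while `pcState()`, `instAddr()`, `nextInstAddr()`, and `microPC()` are the accessor names given in the text.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical PC state for an ISA with fixed-size instructions and microcode.
class PCStateSketch
{
  public:
    explicit PCStateSketch(Addr instBytes) : instSize(instBytes)
    {
        assert(instBytes > 0);
        set(0);
    }

    // Reset to a particular pc: npc = pc + instruction size, upc = 0, nupc = 1.
    void set(Addr val)
    {
        curPC = val;
        nextPC = val + instSize;
        curUPC = 0;
        nextUPC = 1;
    }

    // Read-only accessors named after the ones in the text above.
    Addr instAddr() const { return curPC; }
    Addr nextInstAddr() const { return nextPC; }
    unsigned microPC() const { return curUPC; }

    bool operator==(const PCStateSketch &o) const
    {
        return curPC == o.curPC && nextPC == o.nextPC &&
               curUPC == o.curUPC && nextUPC == o.nextUPC;
    }

  private:
    Addr instSize;
    Addr curPC;
    Addr nextPC;
    unsigned curUPC;
    unsigned nextUPC;
};

// Hypothetical context: one accessor name, reading or writing depending on
// whether an argument is passed.
class ContextSketch
{
  public:
    const PCStateSketch &pcState() const { return pcs; }
    void pcState(const PCStateSketch &val) { pcs = val; }

  private:
    PCStateSketch pcs{4};  // assumes 4-byte instructions
};
```

A caller that wants to change the PC reads the whole state into a local, modifies it, and writes it back with a single call, which is the access pattern the performance argument above relies on.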
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons: it looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder was adjusted so that it always stores the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There was no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. |
7717:f166f8bd8818 |
24-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Get rid of a bunch of commented out lines. |
7699:addb847910d2 |
04-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Alpha: Fix Alpha NumMiscArchRegs constant.
Also add asserts in O3's Scoreboard class to catch bad indexes. |
7691:358c00c482f7 |
30-Sep-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU/Cache: Fix some errors exposed by valgrind |
7684:ce48527a3edb |
20-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Fix O3 and possible InOrder segfaults in FS. |
7680:f4eda002333b |
14-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Trim unnecessary includes from some common files.
This reduces the scope of those includes and makes it less likely for there to be a dependency loop. This also moves the hashing functions associated with ExtMachInst objects to be with the ExtMachInst definitions and out of utility.hh. |
7679:f26cc2c68b48 |
14-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of the now unnecessary getInst/setInst family of functions.
This code is no longer needed because of the preceding change which adds a StaticInstPtr parameter to the fault's invoke method, obviating the only use for this pair of functions. |
7678:f19b6a3a8cec |
13-Sep-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.
Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense. |
7676:92274350b953 |
10-Sep-2010 |
Nathan Binkert <nate@binkert.org> |
style: fix sorting of includes and whitespace in some files |
7664:487916d36377 |
31-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of the unused ev5_trap function on the simple and checker CPUs. |
7657:4552d30af57f |
26-Aug-2010 |
Steve Reinhardt <steve.reinhardt@amd.com> |
memtest: fix/cleanup functional access testing Don't assert that the response packet is marked as a response since it won't always be so for functional accesses.
Also cleanup code to refer to functional accesses rather than "probes" (old terminology), and mention in the DPRINTF which type of access we're doing. |
7655:8bce423f2075 |
25-Aug-2010 |
Ali Saidi <ali.saidi@arm.com> |
CPU: Print out traces for faulting inst when the flag ExecFaulting is set |
7649:a6a6177a5ffa |
25-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: Fixed register flattening logic (FP_Base_DepTag was set too low)
When decoding an srs instruction, an invalid mode encoding returns an invalid instruction. This can happen when garbage instructions are fetched from a mispredicted path |
7632:acf43d6bbc18 |
24-Aug-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
testers: move testers to a new directory
This patch moves the testers to a new subdirectory under src/cpu and includes the necessary fixes to work with latest m5 initialization patches. |
7627:3b0c4b819651 |
23-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Get rid of old, unused utility functions cluttering up the ISAs. |
7619:0a32de653c10 |
23-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the constants for StaticInst flags visible outside the class. |
7616:1a0ab2308bbe |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Skipping mem-order violation check for uncachable loads. Uncachable load is not executed until it reaches the head of the ROB, hence cannot cause one. |
7615:50f6494d9b55 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: Improve printing of uop disassembly. |
7601:bf0aa77f8908 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
CPU: Print out flatten-out register index as with IntRegs/FloatRegs traceflag |
7600:eff7f79f7dfd |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
CPU: Make Exec trace to print predication result (if false) for memory instructions |
7599:f6bbf266f2c8 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM: mark msr/mrs instructions as SerializeBefore/After. Since miscellaneous registers bypass wakeup logic, force serialization to resolve data dependencies through them * * * ARM: adding non-speculative/serialize flags for instructions that change CPSR |
7598:c0ae58952ed0 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
O3: Handle loads when the destination is the PC. For loads where the PC is the destination, check if the load was mispredicted again when the value being loaded returns from memory |
7597:063f160e8b50 |
23-Aug-2010 |
Min Kyu Jeong <minkyu.jeong@arm.com> |
ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate. This allows the CPU to handle predicated-false instructions accordingly. This particular patch makes loads that are predicated-false go straight to the commit stage, not waiting for return of the data that was never requested since it was predicated-false. |
7577:056a88043835 |
23-Aug-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Set a default value when readBytes faults.
This was being done in read(), but if readBytes was called directly it wouldn't happen. Also, instead of setting the memory blob being read to -1 which would (I believe) require using memset with -1 as a parameter, this now uses bzero. It's hoped that its more specialized behavior will make it slightly faster. |
7568:f895258c9121 |
20-Aug-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
ruby: Fixed minor bug in ruby test for setting the request type |
7553:fcdd99057b8a |
20-Aug-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
ruby: Resurrected Ruby's deterministic tests
Added the request series and invalidate deterministic tests as new cpu models and removed the no longer needed ruby tests |
7544:90c5eb6a5e66 |
20-Aug-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
memtest: Memtester support for DMA
This patch adds DMA testing to the Memtester and inherits many changes from Polina's old tester_dma_extension patch. Since Ruby does not work in atomic mode, the atomic mode options are removed. |
7522:84fd1726290d |
14-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Inorder: Fix compilation of m5.fast.
printMemData is only used in DPRINTFs. If those are removed by compiling m5.fast, that function is unused, gcc generates a warning, that gets turned into an error, and the build fails. This change surrounds the function definition with #if TRACING_ON so it only gets compiled in if the DPRINTFs do too. |
7521:3c48b2b3cb83 |
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head. |
7520:67c670459d01 |
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add readBytes and writeBytes functions to the exec contexts. |
7519:28f052c55332 |
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
InOrder: Clean up some DPRINTFs that print data sent to/from the cache. |
7518:917208416d2a |
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Tidy up endianness handling for mmapped "IPR"s. |
7516:cfbbc9178e7a |
12-Aug-2010 |
Joel Hestness <hestness@cs.utexas.edu> |
TimingSimpleCPU: fix NO_ACCESS memory op handling
When a request is NO_ACCESS (x86 CDA microinstruction), the memory op doesn't go to the cache, so TimingSimpleCPU::completeDataAccess needs to handle the case where the current status of the CPU is Running and not DcacheWaitResponse or DTBWaitResponse |
7511:bd104adbf04d |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
LSQ Unit: After deleting part of a split request, set it to NULL so that it isn't accidentally deleted again later (causing a segmentation fault). |
7509:3bd51d6ac9ef |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: Fix a bug where stores in the cpu were never marked as split. |
7507:b1ac6773e83d |
22-Jul-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: O3's tick event gets squashed when it is switched out. When repeatedly switching between O3 and another CPU, O3's tick event might still be scheduled in the event queue (as squashed). Therefore, check for a squashed tick event as well as a non-scheduled event when taking over from another CPU and deal with it accordingly. |
7485:b285acfcd797 |
28-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove another debug stat |
7484:5044bb906d5a |
26-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: remove debugging stat m5 doesnt do stats specific to binary and this resource request stat is probably only useful for people who really know the ins/outs of the model anyway |
7482:cd42a32dc8b6 |
25-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: Return Address Stack bug the nextPC was getting sent to the branch predictor not the current PC, so the RAS was returning the wrong PC and mispredicting everything. |
7481:b10c66a125f6 |
25-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: resource scheduling backend replace priority queue with vector of lists(1 list per stage) and place inside a class so that we have more control of when an instruction uses a particular schedule entry ... also, this is the 1st step toward making the InOrderCPU fully parameterizable. See the wiki for details on this process |
7478:69d054e9e61c |
24-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: cleanup virtual functions remove the annotation 'virtual' from function declaration that isnt being derived from |
7477:97bb8e7068d3 |
24-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: enforce 78-character rule |
7476:207e034f6bb2 |
24-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: exe_unit_stats for resolved branches |
7475:957eb55da9de |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: squash from memory stall this applies to multithreading models which would like to squash a thread on memory stall |
7473:0466c41f63cd |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: record load/store trace data |
7472:4d26f7b5815c |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: update branch predictor - use InOrderBPred instead of Resource for DPRINTFs - account for DELAY SLOT in updating RAS and in squashing - don't let squashed instructions update the predictor - the BTB needs to use the ASID not the TID to work for multithreaded programs - add stats for BTB hits |
7471:128cce9f92bc |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder-stats: add instruction type stats also, remove inst-req stats as default.good for debugging but in terms of pure processor stats they aren't useful |
7470:e3311623d9f0 |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: stall signal handling remove stall only when necessary add debugging printfs |
7469:88cc2dc9472c |
23-Jun-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: tick scheduling use nextCycle to calculate ticks after addition |
7467:91994f36de7f |
22-Jun-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3ThreadContext: When taking over from a previous context, only assert that the system pointers match in Full System mode. |
7460:41550bb10e08 |
15-Jun-2010 |
Nathan Binkert <nate@binkert.org> |
stats: get rid of the never-really-used event stuff |
7455:586f99bf0dc4 |
11-Jun-2010 |
Nathan Binkert <nate@binkert.org> |
ruby: get rid of the Map class |
7454:3a3e8e8cce1b |
11-Jun-2010 |
Nathan Binkert <nate@binkert.org> |
ruby: get rid of Vector and use STL add a couple of helper functions to base for deleting all pointers in a container and outputting containers to a stream |
7445:dfd04ffc1773 |
03-Jun-2010 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Minor remote GDB cleanup. Expand the help text on the --remote-gdb-port option so people know you can use it to disable remote gdb without reading the source code, and thus don't waste any time trying to add a separate option to do that. Clean up some gdb-related cruft I found while looking for where one would add a gdb disable option, before I found the comment that told me that I didn't need to do that. |
7408:ee6949c5bb5b |
02-Jun-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ARM: Implement support for the IT instruction and the ITSTATE bits of CPSR. |
7404:bfc74724914e |
02-Jun-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Implement the ARM TLB/Tablewalker. Needs performance improvements. |
7400:f6c9b27c4dbe |
02-Jun-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Implement ARM CPU interrupts |
7349:8b4564729c81 |
02-Jun-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
ARM: Move PC mode bits around so they can be used for exectrace |
7341:95404ec156de |
02-Jun-2010 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU: Make the FloatRegs trace flag do something. |
7338:0d6c08d25fe7 |
02-Jun-2010 |
Ali Saidi <Ali.Saidi@ARM.com> |
CPU: Reset fetch offset after an exception |
7100:3467916569e3 |
02-Jun-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ARM: Make the predecoder handle Thumb instructions. |
7082:070529b41c1e |
13-May-2010 |
Maximilien Breughe <Maximilien.Breughe@elis.ugent.be> |
BPRED: Fixed the threshold bug in the tournament predictor.
Suppose the saturating counters of a branch predictor contain n bits. When the counter is between 0 and (2^(n-1) - 1), boundaries included, the branch is predicted as not taken. When the counter is between 2^(n-1) and (2^n - 1), boundaries included, the branch is predicted as taken. |
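The threshold rule in the entry above can be sketched with a small n-bit saturating counter. SatCounterSketch and its member names are hypothetical and are not the predictor's actual counter class; the sketch assumes 1 <= n < 32.

```cpp
#include <cstdint>

// Hypothetical n-bit saturating counter illustrating the threshold rule:
// values 0 .. 2^(n-1) - 1 predict not taken, 2^(n-1) .. 2^n - 1 predict taken.
class SatCounterSketch
{
    unsigned bits;
    unsigned maxVal;
    unsigned count;

  public:
    // Assumes 1 <= n < 32 and initial <= 2^n - 1.
    explicit SatCounterSketch(unsigned n, unsigned initial = 0)
        : bits(n), maxVal((1u << n) - 1), count(initial)
    {}

    // Saturate at the top and bottom instead of wrapping around.
    void increment() { if (count < maxVal) ++count; }
    void decrement() { if (count > 0) --count; }

    unsigned value() const { return count; }

    // Taken exactly when the counter is in the upper half of its range.
    bool predictTaken() const { return count >= (1u << (bits - 1)); }
};
```

With n = 2 the counter ranges over 0..3, so values 0 and 1 predict not taken and values 2 and 3 predict taken.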
7064:586b0e3a12b3 |
15-Apr-2010 |
Nathan Binkert <nate@binkert.org> |
tick: rename Clock namespace to SimClock |
7061:c9b1a0ed2311 |
10-Apr-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: timing for inst forwarding when insts execute, they mark the time they finish to be used for subsequent insts that may need forwarding of data. However, the regdepmap was using the wrong value to index into the destination operands of the instruction to be forwarded. Thus, in some cases, we are checking to see if the 3rd destination register for an instruction is executed at a certain time, when there is only 1 dest. register valid. Thus, we get a bad, uninitialized time value that will stall forwarding causing performance loss but still the correct execution. |
7056:b66b558578bd |
02-Apr-2010 |
Nathan Binkert <nate@binkert.org> |
ruby: get rid of gems_common/util.hh and .cc and use stuff in src/base |
7055:4e24742201d7 |
02-Apr-2010 |
Nathan Binkert <nate@binkert.org> |
ruby: get "using namespace" out of headers In addition to obvious changes, this required a slight change to the slicc grammar to allow types with :: in them. Otherwise slicc barfs on std::string which we need for the headers that slicc generates. |
7053:28cb3b80435c |
29-Mar-2010 |
Nathan Binkert <nate@binkert.org> |
style: cleanup the Ruby Tester |
7050:4524f8f80973 |
27-Mar-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: write-hints bug fix make sure to only read 1 src reg. for write-hint and any other similar 'store' instruction. Reading the source reg when it's not necessary can cause the simulator to read from uninitialized values |
7049:a06e95c99294 |
24-Mar-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
CPU: Added comments to address translation classes. |
7046:d21d575a6f99 |
23-Mar-2010 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: get rid of uncached access "events" These recordEvent() calls could cause crashes since they access the req pointer after it's potentially been deleted during a failed translation call. (Similar problem to the traceData bug fixed in the previous cset.)
Moving them above the translation call (as was done recently in cset 8b2b8e5e7d35) avoids the crash but doesn't work, since at that point we don't know if the access is uncached or not.
It's not clear why these calls are there, and no one seems to use them, so we'll just delete them. If they are needed, they should be moved to somewhere that's guaranteed to be after the translation completes but before the request is possibly deleted, e.g., in finishTranslation(). |
7045:e21fe6a62b1c |
23-Mar-2010 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: fix exec tracing memory corruption bug Accessing traceData (to call setAddress() and/or setData()) after initiating a timing translation was causing crashes, since a failed translation could delete the traceData object before returning.
It turns out that there was never a need to access traceData after initiating the translation, as the traced data was always available earlier; this ordering was merely historical. Furthermore, traceData->setAddress() and traceData->setData() were being called both from the CPU model and the ISA definition, often redundantly.
This patch standardizes all setAddress and setData calls for memory instructions to be in the CPU models and not in the ISA definition. It also moves those calls above the translation calls to eliminate the crashes. |
7038:5ae66d5f5ca2 |
22-Mar-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: import name for addtl. bpred stats |
7037:c207d418514e |
22-Mar-2010 |
Maximilien.Breughe@elis.ugent.be |
inorder: fix squash bug in branch predictor |
7036:7739d67ca64f |
22-Mar-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: fix address list bug |
7016:8b2b8e5e7d35 |
22-Mar-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
TimingSimpleCPU: Fixed uncacheable request read bug
Previously the recording of an uncached read occurred after the request was possibly deleted within the translateTiming function. |
7002:48a19d52d939 |
10-Mar-2010 |
Nathan Binkert <nate@binkert.org> |
ruby: get rid of std-includes.hh Do not use "using namespace std;" in headers Include header files as needed |
6994:c6951099a1cb |
26-Feb-2010 |
Nathan Binkert <nate@binkert.org> |
cpu_models: get rid of cpu_models.py and move the stuff into SCons |
6975:862a31349d43 |
20-Feb-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
BaseDynInst: Preserve the faults returned from read and write.
When implementing timing address translations instead of atomic, I forgot to preserve the faults that are returned from the read and write calls. This patch reinstates them. |
6974:4d4903a3e7c5 |
12-Feb-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3CPU: Split loads and stores that cross cache line boundaries.
When each load or store is sent to the LSQ, we check whether it will cross a cache line boundary and, if so, split it in two. This creates two TLB translations and two memory requests. Care has to be taken if the first packet of a split load is sent but the second blocks the cache. Similarly, for a store, if the first packet cannot be sent, we must store the second one somewhere to retry later.
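The boundary check and split described above might look roughly like the following. The helper names crossesLine and splitAccess and the Piece struct are hypothetical and not taken from the LSQ code; the sketch assumes a power-of-two block size and an access no larger than one cache line.

```cpp
#include <cstdint>
#include <utility>

using Addr = std::uint64_t;

// True if the access [addr, addr + size) touches two cache lines.
// Assumes size > 0 and blockSize is a power of two.
inline bool
crossesLine(Addr addr, unsigned size, unsigned blockSize)
{
    const Addr mask = ~static_cast<Addr>(blockSize - 1);
    return (addr & mask) != ((addr + size - 1) & mask);
}

// One half of a split access.
struct Piece
{
    Addr addr;
    unsigned size;
};

// Split an access at the cache line boundary. Only meaningful when
// crossesLine() is true and size <= blockSize.
inline std::pair<Piece, Piece>
splitAccess(Addr addr, unsigned size, unsigned blockSize)
{
    const Addr mask = ~static_cast<Addr>(blockSize - 1);
    const Addr secondAddr = (addr + size - 1) & mask;
    const unsigned firstSize = static_cast<unsigned>(secondAddr - addr);
    return {Piece{addr, firstSize}, Piece{secondAddr, size - firstSize}};
}
```

Each piece would then get its own translation and memory request, which is why the sender state has to track two packets.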
This modifies the LSQSenderState class to record both packets in a split load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA to indicate whether unaligned memory accesses are allowed. This is used throughout the changed code so that compiler can optimise away code dealing with split requests for ISAs that don't need them. |
6973:a123bd350935 |
12-Feb-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
BaseDynInst: Make the TLB translation timing instead of atomic.
This initiates a timing translation and passes the read or write on to the processor before waiting for it to finish. Once the translation is finished, the instruction's state is updated via the 'finish' function. A new DataTranslation class is created to handle this.
The idea is taken from the implementation of timing translations in TimingSimpleCPU by Gabe Black. This patch also separates out the timing translations from this CPU and uses the new DataTranslation class. |
6960:2b656c4a5770 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: double delete inst bug Make sure that instructions are not dereferenced/deleted twice by marking that they are on the remove list |
6959:7b99564233cd |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: inst count mgmt |
6958:de51ab31b456 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: implement split stores |
6957:88555fd4d220 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: implement split loads |
6956:31ae0245e4ac |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add activity stats |
6955:632ad41ac489 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: object cleanup in destructors |
6954:6327f5071027 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: use per-thread dummy insts/reqs |
6953:f44ba2f42b5c |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add execution unit stats |
6952:df2a5f076618 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: recvRetry bug fix - on certain retry requests you can get an assertion failure - fix by allowing the request to literally "Retry" itself if it wasnt successful before, and then block any requests through cache port while waiting for the cache to be made available for access |
6951:c450ad6c82f4 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder-stats: add prereq to basic stat only show requests processed when the resource is actually in use |
6950:96b33f6f9b7d |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: ctxt switch stats - m5 line enforcement on use_def.cc,hh |
6949:16d59c27bd01 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: pipeline stage stats add idle/run/utilization stats for each pipeline stage |
6948:7eb151d3881f |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: enforce stage bandwidth each stage keeps track of insts_processed on a per_thread basis but we should be keeping that on a total basis inorder to enforce stage width limits |
6947:862f3d824be7 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: set thread status' set Active/Suspended/Halted status for threads. useful for system when determining if/when to exit simulation |
6946:e350ae2a5018 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add/remove halt/deallocate context respectively Halt is called from the exit() system call while deallocate is unused. So to clear up things, just use halt and remove deallocate. |
6945:8ae78a9733b0 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: track last branch committed when threads are switching in/out the CPU, we need to keep track of special cases like branches. Add appropriate variables in ThreadState to track this and then use these variables when updating pc after context switch |
6944:0b8c6a579218 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add updatePC event to resPool this will be used for when a thread comes back from a cache miss, it needs to update the PCs because the inst might have been a branch or delayslot in which the next PC isn't always a straight addition |
6943:17801e070302 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: ready thread wakeup allow a thread to wakeup and be activated after it has been in suspended state and another thread is switched out. Need to give pipeline stages a "activateThread" function so that they can get to their suspended instruction when the time is right. |
6942:6e0d37136836 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add threadmodel flag this prints out messages relative to what threading model is being used (smt, switch-on-miss, single, etc.) |
6941:4eb9b3a25a6c |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: mem. mgmt. update update address List and address Map to take into account multiple threads |
6940:c7e00670d83e |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: suspend in respool give resources their own specific activity to do for a "suspend" event instead of defaulting to deactivating the thread for a suspend thread event. This really matters for the fetch sequence unit which wants to remove the thread from fetching while other units want to ignore a thread suspension. If you deactivate a thread in a resource then you may lose some of the allotted bandwidth that the thread is taking up... |
6939:4dacc68fb1f3 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: fetch thread bug dont check total # of threads but instead all active threads |
6938:ac9a5e69ba31 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: ready/suspend status fns update/add in the use of isThreadReady & isThreadSuspended functions.Check in activateThread what list a thread is on so it can be managed accordingly. |
6937:444dc5183fb4 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder-cleanup: remove unused thread functions |
6936:099ca6d9fc03 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: activate thread on cache miss -Support ability to activate next ready thread after a cache miss through the activateNextReadyContext/Thread() functions -To support this a "readyList" of thread ids is added -After a cache miss, thread will suspend and then call activateNextReadyThread |
6935:d807273f17c0 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add event priority offset allow for events to schedule themselves later if desired. this is important because of cases like where you need to activate a thread only after the previous thread has been deactivated. The ordering there has to be enforced |
6934:edf3d0c7a485 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: squash on memory stall add code to recognize memory stalls in resources and the pipeline as well as squash a thread if there is a stall and we are in the switch on cache miss model |
6933:fe210e4ce76d |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: add insts to cpu event some events are going to need instruction data when they process, so just include the instruction in the event construction |
6932:02562dac0416 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: switch out buffer add buffer for instructions to switch out to in a pipeline stage can't squash the instruction and remove the pipeline so we kind of need to 'suspend' an instruction at the stage while the memory stall resolves for the switch on cache miss model |
6931:9b37243a6568 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: dont allow early loads - loads were happening on same cycle as the address was generated which is slightly unrealistic. Instead, force address generation to be on separate cycle from load initiation - also, mark the stages in a more traditional way (F-D-X-M-W) |
6930:6d7f25432d1c |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
configs/inorder: add options for switch-on-miss to inorder cpu |
6929:0cf7d56ab5d7 |
31-Jan-2010 |
Korey Sewell <ksewell@umich.edu> |
inorder: init internal debug cpu counters - cpuEventNum - resReqCount |
6899:f8057af86bf7 |
29-Jan-2010 |
Brad Beckmann <Brad.Beckmann@amd.com> |
ruby: added the GEMS ruby tester |
6816:6f8efbef2300 |
12-Jan-2010 |
Lisa Hsu <Lisa.Hsu@amd.com> |
since totalInstructions() is impl'ed by all the cpus, make it an abstract base class. |
6775:db802ee94eb6 |
18-Nov-2009 |
Brad Beckmann <Brad.Beckmann@amd.com> |
m5: Fixed bug in atomic cpu destructor |
6739:48d10ba361c9 |
11-Nov-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Mem: Eliminate the NO_FAULT request flag. |
6712:b95abe00dd9d |
04-Nov-2009 |
Nathan Binkert <nate@binkert.org> |
build: fix compile problems pointed out by gcc 4.4 |
6711:c79d72abdbe5 |
04-Nov-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: get rid of unused physmem pointer |
6691:cd68b6ecd68d |
27-Oct-2009 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
POWER: Add support for the Power ISA
This adds support for the 32-bit, big endian Power ISA. This supports both integer and floating point instructions based on the Power ISA Book I v2.06. |
6678:34191eea18c1 |
17-Oct-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Fix compilation. |
6677:b741b3e7164b |
15-Oct-2009 |
Brad Beckmann <Brad.Beckmann@amd.com> |
fixed MC146818 checkpointing bug and added isa serialization calls to simple_thread |
6671:71b42be12ccd |
01-Oct-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-debug: print out workload |
6670:81e6aa93bc6a |
29-Sep-2009 |
Lisa Hsu <hsul@eecs.umich.edu> |
commit Soumyaroop's bug catch about max_insts_all_threads |
6667:8b5bc1a777bc |
26-Sep-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
O3: Add flag to control whether faulting instructions are traced. When enabled, faulting instructions appear in the trace twice (once when they fault and again when they're re-executed). This flag is set by the Exec compound flag for backwards compatibility. |
6664:4df6f4bd36cd |
26-Sep-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
O3: Mark fetch stage as active if it faults. Otherwise if the rest of the pipeline is idle then fault will never propagate to commit to be handled, causing CPU to deadlock. |
6663:0c35aaa631ea |
25-Sep-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-debug: fix cpu tick debug message |
6658:f4de76601762 |
23-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh |
6654:4c84e771cca7 |
22-Sep-2009 |
Nathan Binkert <nate@binkert.org> |
python: Move more code into m5.util and allow SCons to use that code. Get rid of misc.py and just stick misc things in __init__.py. Move utility functions out of SCons files and into m5.util. Move utility type stuff from m5/__init__.py to m5/util/__init__.py. Remove buildEnv from m5 and allow access only from m5.defines. Rename AddToPath to addToPath while we're moving it to m5.util. Rename read_command to readCommand while we're moving it. Rename compare_versions to compareVersions while we're moving it. |
6649:b4907f87b96a |
17-Sep-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-mdu: multiplier latency fix. mdu was working incorrectly for 4+ latency due to incorrectly assuming multiply was finished the next stage |
6643:0f7957bb4450 |
16-Sep-2009 |
sroy@cse.usf.edu |
inorder-smt: remove hardcoded values allows for the 2T hello world example to work in inorder model |
6637:cd671122f09c |
15-Sep-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-alpha-fs: edit inorder model to compile FS mode |
6629:dad8671f8769 |
01-Sep-2009 |
pdudnik@gmail.com |
SCons fix to always make MemTest object |
6623:f7abbfd5a79f |
23-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Atomic CPU: Respect the NO_ACCESS request flag. |
6429:7ed8937e375a |
02-Aug-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Fix setting of INST_FETCH flag for O3 CPU. It's still broken in inorder. Also enhance DPRINTFs in cache and physical memory so we can see more easily whether it's getting set or not. |
6418:4836ec6b73a1 |
29-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU: Make the simple CPU handle the IntRegs trace flag. |
6409:6eaa041d043e |
27-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ARM: Make native trace print out what instruction caused an error. |
6387:70172be3f986 |
25-Jul-2009 |
Korey Sewell <ksewell@umich.edu> |
o3-smt: enforce numThreads parameter for SMT SE mode |
6365:a3037fa327a0 |
20-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Separate out native trace into ISA (in)dependent code and SimObjects. |
6331:d947798df4a1 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of the unused get(Data|Inst)Asid and (inst|data)Asid functions. |
6329:5d8b91875859 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add a registers.hh file as an ISA switched header. This file is for register indices, Num* constants, and register types. copyRegs and copyMiscRegs were moved to utility.hh and utility.cc. |
6326:008930a4ace5 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Eliminate the ISA defined RegFile class. |
6324:a535b2232c08 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Move the PCs out of the ISAs and into the CPUs. |
6323:fd0f91f067d2 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ARM, Simple CPU: Fix an index and add assert checks. |
6316:51f3026d4cbb |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Eliminate the ISA defined integer register file. |
6315:c7295a4826d5 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Eliminate the ISA defined floating point register file. |
6314:781969fbeca9 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Get rid of the float register width parameter. |
6313:95f69a436c82 |
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU. |
6227:a17798f2a52c |
05-Jun-2009 |
Nathan Binkert <nate@binkert.org> |
types: clean up types, especially signed vs unsigned |
6226:f1076450ab2b |
05-Jun-2009 |
Nathan Binkert <nate@binkert.org> |
move: put predictor includes and cc files into the same place |
6221:58a3c04e6344 |
26-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: add a type for thread IDs and try to use it everywhere |
6216:2f4020838149 |
17-May-2009 |
Nathan Binkert <nate@binkert.org> |
includes: sort includes again |
6214:1ec0ec8933ae |
17-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: Move stuff for global types into src/base/types.hh |
6199:1c6a17f46228 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
cpus: add InOrderCPU to default build regressions need this so they build the model |
6193:50668b97c086 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-resources: delete events make sure unrecognized events in the resource pool are deleted and also delete resource events in destructor |
6192:6cd5f0282d8a |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-tlb-cunit: merge the TLB as implicit to any memory access TLBUnit no longer used and we also get rid of memAccSize and memAccFlags functions added to ISA and StaticInst since TLB is not a separate resource to acquire. Instead, TLB access is done before any read/write to memory and the result is checked before it's sent out to memory. * * * |
6191:2afc0eae6099 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-tlb: squash insts in TLB correctly. TLB had a bug where if it was stalled and waiting, it would not squash all instructions older than squashed instruction correctly * * * |
6190:55e837d741fa |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-faults: ignore unalign translation faults for prefetches |
6189:a5334d8c6683 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-stc: update interface to handle store conditionals |
6188:bfb323a1c559 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-float: Fix storage of FP results. inorder was incorrectly storing FP values and confusing the integer/fp storage view of floating point operations. A big issue was trying to infer when we were doing a single or double precision access, because this lets you know the size of the value to store (32 or 64 bits). This isn't exactly straightforward since alpha uses all 64-bit regs while mips/sparc use a dual-reg view. By getting this value from the actual floating point register file, the model can figure out what it needs to store |
6187:95db3316a14b |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-fetch: update model to use predecoder |
6186:761e0f61a167 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-mem: clean up allocation/deletion of requests/packets * * * |
6185:9925b3e83e06 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-mem: skeleton support for prefetch/writehints |
6184:c947586b3d9e |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-o3: allow both to compile together. Allow InOrder and O3CPU to be compiled at the same time: need to make branch prediction files shared by both models |
6183:a008609e0abc |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-unified-tlb: use unified TLB instead of old TLB model |
6182:f51edf04e4a1 |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-miscregs: Fix indexing for misc. reg operands and update result-types for better tracing of these types of values |
6181:19fedb1e5ded |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder/alpha-isa: create eaComp object visible to StaticInst through ISA Remove subinstructions eaComp/memAcc since unused in CPU Models. Instead, create eaComp that is visible from StaticInst object. Gives InOrder model capability of generating address without actually initiating access * * * |
6180:1a8950d566ff |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-bpred: edits to handle non-delay-slot ISAs Changes so that InOrder can work for a non-delay-slot ISA like Alpha. Typically, changes have to do with handling misspeculated branches at different points in pipeline |
6179:83693f4b79fd |
12-May-2009 |
Korey Sewell <ksewell@umich.edu> |
inorder-alpha-port: initial inorder support of ALPHA Edit AlphaISA to support the inorder model. Mostly alternate constructor functions and also a few skeleton multithreaded support functions * * * Remove namespace from header file. Causes compiler issues that are hard to find * * * Separate the TLB from the CPU and allow it to live in the TLBUnit resource. Give CPU accessor functions for access and also bind at construction time * * * Expose memory access size and flags through instruction object (temporarily memAccSize and memFlags to get TLB stuff working.) |
6174:7e5c7412ac89 |
05-May-2009 |
Korey Sewell <ksewell@umich.edu> |
cpus: fix cpu progress event this was double scheduling itself (once in constructor and once in cpu code). also add support for stopping / starting progress events through repeatEvent flag and also changing the interval of the progress event as well |
6144:e330f7bc22ef |
05-May-2009 |
Korey Sewell <ksewell@umich.edu> |
cpus: fix cpu progress event this was double scheduling itself (once in constructor and once in cpu code). also add support for stopping / starting progress events through repeatEvent flag and also changing the interval of the progress event as well |
6116:a5a97b04d796 |
21-Apr-2009 |
Nathan Binkert <nate@binkert.org> |
arm: Unify the ARM tlb. We forgot about this when we did the rest. This code compiles, but there are no tests still |
6105:a27c0934de24 |
20-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
request: rename INST_READ to INST_FETCH. |
6102:7fbf97dc6540 |
20-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Mem: Change isLlsc to isLLSC. |
6078:aae5ac55c749 |
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPUs: Make the atomic CPU support locked memory accesses. |
6076:e141cc7896ce |
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Memory: Rename LOCKED for load locked store conditional to LLSC. |
6043:19852407f5c9 |
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: If the simple CPU is already idle, just return from suspendContext, don't assert. |
6036:f0841ee466a5 |
18-Apr-2009 |
Korey Sewell <ksewell@umich.edu> |
o3-delay-slot-bpred: fix decode stage handling of uncdtl. branches. decode stage was not setting the predicted PC correctly or passing that information back to fetch correctly |
6034:fc2e234b4404 |
17-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3, inorder: fix FS bug due to initializing ThreadState to Halted. For some reason o3 FS init() only called initCPU if the thread state was Suspended, which was no longer the case. There's no apparent reason to check, so I whacked the test completely rather than changing the check to Halted. The inorder init() was also updated to be symmetric, though the previous code was just a fancy no-op. |
6033:f1a9f7f6e7c6 |
16-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: handle fetch with no active threads correctly. This situation can arise now on the first fetch cycle after the last active thread is halted. It seems easy enough to deal with when it happens rather than trying to avoid it. |
6032:e5c792a67b3d |
16-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
o3: fix {read,set}ArchFloatReg* functions. Register indices were not being calculated properly. |
6031:be16ad28822f |
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
ThreadState: initialize status to Halted in constructor. This provides a common initial status for all threads independent of CPU model (unlike the prior situation where CPUs initialized threads to inconsistent states). This mostly matters for SE mode; in FS mode, ISA-specific startupCPU() methods generally handle boot-time initialization of thread contexts (since the right thing to do is ISA-dependent). |
6029:007c36616f47 |
15-Apr-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Get rid of the Unallocated thread context state. Basically merge it in with Halted. Also had to get rid of a few other functions that called ThreadContext::deallocate(), including: - InOrderCPU's setThreadRescheduleCondition. - ThreadContext::exit(). This function was there to avoid terminating simulation when one thread out of a multi-thread workload exits, but we need to find a better (non-cpu-centric) way. |
6023:47b4fcb10c11 |
09-Apr-2009 |
Nathan Binkert <nate@binkert.org> |
tlb: More fixing of unified TLB |
6022:410194bb3049 |
09-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
tlb: Don't separate the TLB classes into an instruction TLB and a data TLB |
6020:0647c8b31a99 |
06-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Merge ARM into the head. ARM will compile but may not actually work. |
6012:47748a3b6ecf |
12-Mar-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: fix minor endian issue with trace output (no functional change) |
6005:1dc178e53487 |
07-Mar-2009 |
Nathan Binkert <nate@binkert.org> |
stats: fix duplicate statistics names. This generally requires providing a more meaningful name() function for a class. |
5999:3cf8e71257e0 |
05-Mar-2009 |
Nathan Binkert <nate@binkert.org> |
stats: Fix all stats usages to deal with template fixes |
5991:3ca926101a5c |
05-Mar-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Get rid of 'using namespace' declarations in headers. |
5989:4ed2100efa84 |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
InOrderCPU: Clean up Constructors to initialize variables correctly (i.e. in a way for the compiler to play *nice*) |
5988:38e32429b739 |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
Give each resource in InOrder its own TraceFlag instead of just standard 'Resource' flag |
5987:a87880065b42 |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
Remove unused functions/comments cluttering up the code. |
5986:a6d07755d34f |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
make handling of interstage buffers (i.e. StageQueues) more consistent: (1) number from 0-n, not 1-n+1, (2) always check nextStageValid before a stageNum+1 and prevStageValid for a stageNum-1 reference, (3) add skidSize() to get StageQueue size for all threads |
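The indexing convention in this changeset can be illustrated with a minimal Python sketch. It is not the InOrder code: the function `neighbor_stages` is hypothetical and only mirrors the idea of zero-based stage numbers with a validity check before any stageNum+1 or stageNum-1 reference.

```python
def neighbor_stages(stage_num, num_stages):
    """Return (prev, next) stage indices for a zero-based pipeline.

    None marks a neighbor that does not exist, i.e. the case where the
    corresponding "valid" check (hypothetical names below) fails.
    """
    prev_stage_valid = stage_num - 1 >= 0
    next_stage_valid = stage_num + 1 < num_stages
    prev_stage = stage_num - 1 if prev_stage_valid else None
    next_stage = stage_num + 1 if next_stage_valid else None
    return prev_stage, next_stage


if __name__ == "__main__":
    for stage in range(5):
        print(stage, neighbor_stages(stage, 5))
```

Numbering from 0 means the first stage has no previous neighbor and stage n-1 has no next neighbor, so both ends need the check.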
5985:b4e30b30f695 |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
InOrder didn't have all its params set to a default value, which is now required for M5 objects; Also, a # of values need to be reset to 0 (or the appropriate value) before we assume they are OK for use. |
5984:4842a7d78634 |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
Give TimeBuffer an ID that can be set. Necessary because InOrder uses generic stages so w/o an ID there is no way to differentiate buffers when debugging |
5983:2a2c2403ee5b |
04-Mar-2009 |
Korey Sewell <ksewell@umich.edu> |
use numCycles instead of simTicks to determine CPI stat in InOrder |
5982:de47df436ace |
04-Mar-2009 |
Steve Reinhardt <stever@gmail.com> |
O3: Make numThreads error message more helpful. |
5958:2d9737bf3c2f |
27-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Processes: Make getting and setting system call arguments part of a process object. |
5953:899ecfbce5af |
26-Feb-2009 |
Ali Saidi <saidi@eecs.umich.edu> |
CPA: Add code to automatically record function symbols as CPU executes. |
5947:3305e17db621 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Only look up the nearest symbol in the kernel if you're actually in kernel code. |
5922:28bcb158eaae |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add a flag to identify a read barrier to the static inst class. |
5914:c92d57f579b1 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Don't fetch when executing a macroop. If the CPL changes mid macroop, the end of the instruction might not be privileged enough to execute the beginning. |
5894:8091ac99341a |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Implement translateTiming which defers to translateAtomic, and convert the timing simple CPU to use it. |
5891:73084c6bb183 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Replace the translate functions in the TLBs with translateAtomic. |
5890:bdef71accd68 |
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of translate... functions from various interface classes. |
5882:5a047c3f3795 |
23-Feb-2009 |
Nathan Binkert <nate@binkert.org> |
debug: Move debug_break into src/base |
5880:6fd7648e1b8d |
20-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
Remove unnecessary building of FreeList/RenameMap in InOrder. Clean-up comments and O3 extensions InOrder Thread Context |
5875:d82be3235ab4 |
16-Feb-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Fixes to get prefetching working again. Apparently we broke it with the cache rewrite and never noticed. Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part of these changes (and for inspiring me to work on the rest). Some other overdue cleanup on the prefetch code too. |
5870:5645632d594c |
11-Feb-2009 |
Nathan Binkert <nate@binkert.org> |
style |
5869:acbe11bbfe68 |
10-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
Configs: Add support for the InOrder CPU model |
5868:09ab46bfa914 |
10-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
InOrder: Import new inorder CPU model from MIPS. This model currently only works in MIPS_SE mode, so it will take some effort to clean it up and make it generally useful. Hopefully people are willing to help make that happen! |
5866:303e409d88d9 |
10-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
ExeTrace: Allow subclasses of the tracer to define their own prefix to dump |
5865:54ed46881217 |
10-Feb-2009 |
Korey Sewell <ksewell@umich.edu> |
CPU: Prepare CPU models for the new in-order CPU model. Some new functions and forward declarations are necessary to make things work |
5849:6496f11d80da |
01-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Don't always reset the micro pc on faults. Let the faults handle it. |
5835:4b6af0ca4565 |
01-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make sure the predecoder is cleared out for interrupts. |
5821:2831ae658bfc |
30-Jan-2009 |
Ali Saidi <saidi@eecs.umich.edu> |
Config: Cause a fatal() when a parameter without a default value isn't set(FS #315). |
5810:606de5b3d116 |
25-Jan-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add a setCPU function to the interrupt objects. |
5807:57f9f8b8e62f |
24-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
cpu: provide a wakeup mechanism that can be used to pull CPUs out of sleep. Make interrupts use the new wakeup method, and pull all of the interrupt stuff into the cpu base class so that only the wakeup code needs to be updated. I tried to make wakeup, wakeCPU, and the various other mechanisms for waking and sleeping a little more sane, but I couldn't understand why the statistics were changing the way they were. Maybe we'll try again some day. |
5804:34fe9bbc6705 |
21-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
o3cpu: give a name to the activity recorder for better tracing |
5803:aae3d7089925 |
19-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
thread_context: move getSystemPtr so SE mode can get to it. There was really no reason that it should be FS only. |
5793:321f79ddb500 |
13-Jan-2009 |
Nathan Binkert <nate@binkert.org> |
SCons: centralize the Dir() workaround for newer versions of scons. Scons bug id: 2006 M5 Bug id: 308 |
5791:3d417492668d |
12-Jan-2009 |
Richard Strong <rstrong@cs.ucsd.edu> |
This fix addresses an ill-formed if statement that fails to compile. The fix was the simple addition of another set of parentheses to ensure the correct condition resolution. |
5784:8a28646c4bc2 |
07-Jan-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Tracing: Make tracing aware of macro and micro ops. |
5780:50c9d48de3ca |
17-Dec-2008 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Make Alpha pseudo-insts available from SE mode. |
5779:c0d731772342 |
17-Dec-2008 |
Gabe Black <gblack@eecs.umich.edu> |
SPARC: Truncate syscall args and return values appropriately. |
5769:e53bdd0e4bf1 |
06-Dec-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: use the flags data structure |
5744:342cbc20a188 |
14-Nov-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Refactor read/write in the simple timing CPU. |
5737:f43dbc09fad3 |
10-Nov-2008 |
Clint Smullen <cws3k@cs.virginia.edu> |
O3CPU: Make the instcount debugging stuff per-cpu. This is to prevent the assertion from firing if you have a large multicore. Also make sure that it's not compiled in when NDEBUG is defined |
5736:426510e758ad |
10-Nov-2008 |
Nathan Binkert <nate@binkert.org> |
mem: update stuff for changes to Packet and Request |
5728:9574f561dfa2 |
10-Nov-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make unaligned accesses work in the timing simple CPU. |
5726:17157c5f7e15 |
10-Nov-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the timing simple CPU handle variable length instructions. |
5718:323cfbfec1a4 |
05-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Right now a single thread cpu 1 could get assigned context Id != 1, depending on the order in which it's registered with the system. To make them match, here is a little change. |
5715:e8c1d4e669a7 |
04-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
get rid of all instances of readTid() and getThreadNum(). Unify and eliminate redundancies with threadId() as their replacement. |
5714:76abee886def |
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate. |
5713:993c7952b930 |
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Make it so that all thread contexts are registered with the System, even in SE. Process still keeps track of the tc's it owns, but registration occurs with the System, this eases the way for system-wide context Ids based on registration. |
5712:199d31b47f7b |
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered across the subclasses. generally make it so that member data is _cpuId and accessor functions are cpuId(). The ID val comes from the python (default -1 if none provided), and if it is -1, the index of cpuList will be given. this has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch. |
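The ID fallback described in this changeset can be sketched as follows. This is a minimal Python illustration rather than the M5 code: the function `assign_cpu_ids` and the plain list standing in for cpuList are hypothetical.

```python
def assign_cpu_ids(requested_ids):
    """Keep explicitly configured IDs; replace the default -1 with the
    CPU's position in the list (standing in for the cpuList index)."""
    return [index if cpu_id == -1 else cpu_id
            for index, cpu_id in enumerate(requested_ids)]


if __name__ == "__main__":
    # Three CPUs left at the default, one with an explicit ID.
    print(assign_cpu_ids([-1, -1, 7, -1]))
```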
5710:b44dd45bd604 |
27-Oct-2008 |
Clint Smullen <cws3k@cs.virginia.edu> |
CPU: The API change to EventWrapper did not get propagated to the entirety of TimingSimpleCPU. The constructor no longer schedules an event at construction and the implicit conversion between int and bool was allowing the old code to compile without warning.
Signed-off By: Ali Saidi |
5707:da86e00f87a0 |
23-Oct-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
s/cpu_id/cpuId in o3 (to be consistent and match style), also fix some typos in comments. |
5704:98224505352a |
21-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
style: Use the correct m5 style for things relating to interrupts. |
5702:bf84e2fa05f7 |
20-Oct-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
O3CPU: Undo Gabe's changes to remove hwrei and simpalcheck from O3 CPU. Removing hwrei causes the instruction after the hwrei to be fetched before the ITB/DTB_CM register is updated in a call pal call sys and thus the translation fails because the user is attempting to access a super page address.
Minimally, it seems as though some sort of fetch stall or refetch after a hwrei is required. I think this works currently because the hwrei uses the exec context interface, and the o3 stalls when that occurs.
Additionally, these changes don't update the LOCK register and probably break ll/sc. Both o3 changes were removed since a great deal of manual patching would be required to only remove the hwrei change. |
5694:de7a82f58985 |
13-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Explain why some code is commented out. |
5677:c8479d55206c |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the MicroPC type 16 bit. |
5669:cbac62a59686 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Don't fetch in the simple CPU if you're in the ROM. |
5668:5b5a9f4203d1 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of old RegContext code. |
5665:433182bf55c1 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the highest order bit in the micro pc determine if it's combinational or from the ROM. |
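A rough Python sketch of the encoding this changeset describes. The 16-bit width is an assumption borrowed from the separate "Make the MicroPC type 16 bit" changeset, the choice that a set bit means "ROM" is also an assumption, and the names `is_rom_micro_pc` and `micro_pc_offset` are hypothetical.

```python
MICRO_PC_BITS = 16                  # assumed width of the micro PC
ROM_BIT = 1 << (MICRO_PC_BITS - 1)  # highest-order bit


def is_rom_micro_pc(upc):
    """True if the micro PC points into the microcode ROM (assumed polarity)."""
    return bool(upc & ROM_BIT)


def micro_pc_offset(upc):
    """Micro PC with the selector bit masked off."""
    return upc & (ROM_BIT - 1)


if __name__ == "__main__":
    for upc in (0x0003, 0x8003):
        print(hex(upc), is_rom_micro_pc(upc), micro_pc_offset(upc))
```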
5664:3b3756efad89 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Create a microcode ROM object in the CPU which is defined by the ISA. |
5658:55f9947891fb |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix the ordering of special physical address ranges. |
5651:7f0c8006c3d7 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make APICs communicate through the memory system. |
5648:e8abda6e0980 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the local APIC accessible through the memory system directly, and make the timer work. |
5647:b06b49498c79 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object. |
5646:0a488a147fb8 |
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the get_vec function. |
5645:0d35ed236aa1 |
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add a getInterruptController function |
5640:c811ced9efc1 |
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the simPalCheck function. |
5639:67cc7f0427e7 |
11-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Eliminate the hwrei function. |
5610:0e1e9c186769 |
10-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
SimObjects: Clean up handling of C++ namespaces. Make them easier to express by only having the cxx_type parameter which has the full namespace name, and drop the cxx_namespace thing. Add support for multiple levels of namespace. |
5606:6da7a58b0bc8 |
09-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: convert all usage of events to use the new API. For now, there is still a single global event queue, but this is necessary for making the steps towards a parallelized m5. |
5597:e2983d751be4 |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 IMPL class so it isn't split out by ISA. |
5596:cdc8893c649e |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 dynamic instruction class so it isn't split out by ISA. |
5595:6ebdae3f619b |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Generalize the O3 CPU object so it isn't split out by ISA. |
5592:6e0569faeeef |
09-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Fix where setMicroPC was being called instead of setNextMicroPC. |
5570:13592d41f290 |
28-Sep-2008 |
Nathan Binkert <nate@binkert.org> |
gcc: Add extra parens to quell warnings. Even though we're not incorrect about operator precedence, let's add some parens in some particularly confusing places to placate GCC 4.3 so that we don't have to turn the warning off. Agreed that this is a bit of a pain for those users who get the order of operations correct, but it is likely to prevent bugs in certain cases. |
5557:03c186e416aa |
26-Sep-2008 |
Kevin Lim <ktlim@umich.edu> |
O3CPU: Fix thread writeback logic. Fix the logic in the LSQ that determines if there are any stores to write back. In the commit stage, check for thread specific writebacks instead of just any writeback. |
5556:c9f52fae6b37 |
26-Sep-2008 |
Kevin Lim <ktlim@umich.edu> |
O3CPU: Add a hack to ensure that nextPC is set correctly after syscalls. Just check CPU's nextPC before and after syscall and if it changes, update this instruction's nextPC because the syscall must have changed the nextPC. |
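The before/after comparison described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the O3 code: the classes `Cpu` and `Inst` and the function `execute_syscall` are hypothetical stand-ins.

```python
class Cpu:
    def __init__(self, next_pc):
        self.next_pc = next_pc


class Inst:
    def __init__(self, next_pc):
        self.next_pc = next_pc


def execute_syscall(cpu, inst, syscall):
    """Run a syscall and copy the CPU's nextPC into the instruction
    only if the syscall changed it."""
    next_pc_before = cpu.next_pc
    syscall(cpu)
    if cpu.next_pc != next_pc_before:
        inst.next_pc = cpu.next_pc


if __name__ == "__main__":
    cpu, inst = Cpu(0x1004), Inst(0x1004)

    def redirecting_syscall(c):
        c.next_pc = 0x2000  # the syscall redirects control flow

    execute_syscall(cpu, inst, redirecting_syscall)
    print(hex(inst.next_pc))
```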
5553:de0fa35df4cb |
22-Sep-2008 |
Nathan Binkert <nate@binkert.org> |
gcc: Version 4.3 is pretty anal about shadowing types, placate it. In the future, it would be nice to put the O3CPU into its own namespace so that we don't end up hardcoding pointers to the global namespace. |
5543:3af77710f397 |
10-Sep-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
style: Remove non-leading tabs everywhere they shouldn't be. Developers should configure their editors to not insert tabs |
5537:eaeed2bdf50d |
20-Aug-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Get rid of two more duplicated CPU params. |
5536:17c0c17726ff |
18-Aug-2008 |
Richard Strong <rstrong@hp.com> |
Changed BaseCPU::ProfileEvent's interval member to be of type Tick. This was done to be consistent with its python type of a latency. In addition, the multiple definitions of profile in the different cpu models caused problems for initialization of the interval value. If a child class's profile value was defined, the parent BaseCPU::ProfileEvent interval field would be initialized with a garbage value. The fix was to remove the multiple redefinitions of profile in the child CPU classes. |
5529:9ae69b9cd7fd |
11-Aug-2008 |
Nathan Binkert <nate@binkert.org> |
params: Convert the CPU objects to use the auto generated param structs. A whole bunch of stuff has been converted to use the new params stuff, but the CPU wasn't one of them. While we're at it, make some things a bit more stylish. Most of the work was done by Gabe, I just cleaned stuff up a bit more at the end. |
5523:6279e78a2df2 |
03-Aug-2008 |
Nathan Binkert <nate@binkert.org> |
sockets: Add a function to disable all listening sockets. When invoking several copies of m5 on the same machine at the same time, there can be a race for TCP ports for the terminal connections or remote gdb. Expose a function to disable those ports, and have the regression scripts disable them. There are some SimObjects that have no other function than to be used with ports (NativeTrace and EtherTap), so they will panic if the ports are disabled. |
5507:52bcc301b467 |
15-Jul-2008 |
Steve Reinhardt <stever@gmail.com> |
Use ReadResp instead of LoadLockedResp for LoadLockedReq responses. |
5499:8bfc7650c344 |
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Remove delVirtPort() and make getVirtPort() only return cached version. |
5497:89a6483d7047 |
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
Make the cached virtPort have a thread context so it can do everything that a newly created one can. |
5496:6899b894166f |
01-Jul-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
After a checkpoint (and thus a stats reset), the not_idle_fraction/notIdleFraction statistic is really wrong. The notIdleFraction statistic isn't updated when the statistics reset, probably because the cpu Status information was pulled into the atomic and timing cpus. This changeset pulls Status back into the BaseSimpleCPU object. Anyone care to comment on the odd naming of the Status instance? It shouldn't just be status because that is confusing with Port::Status, but _status seems a bit strange too. |
5494:85c8d296c1cb |
28-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Backed out changeset 94a7bb476fca: caused memory leak. |
5489:94a7bb476fca |
21-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Generate more useful error messages for unconnected ports. Force all non-default ports to provide a name and an owner in the constructor. |
5487:f0ac4112e128 |
18-Jun-2008 |
Nathan Binkert <nate@binkert.org> |
AtomicSimpleCPU: Separate data stalls from instruction stalls. Separate simulation of icache stalls and data stalls. |
5482:7fea9bcd84dd |
18-Jun-2008 |
Nathan Binkert <nate@binkert.org> |
ThreadState: Ensure that kernelStats is properly initialized |
5476:758c2413765a |
16-Jun-2008 |
Nathan Binkert <nate@binkert.org> |
port: Clean up default port setup and port switchover code. |
5408:703f1779cc89 |
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the simple cpu trace data for loads/stores. |
5400:fee00a595efc |
10-Apr-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
SCons: add comments to SConscript documenting bug workaround |
5398:9727ba4600de |
08-Apr-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
SCons: Manually specifying header only directories with Dir() works around the problem |
5386:5614618f4027 |
24-Mar-2008 |
Steve Reinhardt <stever@gmail.com> |
Don't FastAlloc MSHRs since we don't allocate them on the fly. |
5375:2bd02f12dc05 |
06-Mar-2008 |
Vilas Sridharan <vilas.sridharan@gmail.com> |
O3CPU: Don't call dumpInsts if DEBUG is not defined |
5364:66d1251b7ae6 |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Add comments in code to describe bug conditions. This should help if somebody gets to the bug fix before me (or someone else)... |
5363:c474cb7a2b9c |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix Load/Store Queue squashing after an SMT thread is removed by ensuring you are squashing from the current instruction # causing the thread exit. |
5362:0adba9a562c9 |
27-Feb-2008 |
Korey Sewell <ksewell@umich.edu> |
Fix offset in removeThread() function so that float registers start freeing up from the right point (#32 usually) instead of restarting at 0 and double-freeing.
Commented out assert line in free_list.hh that will check for when double-free condition goes bad. |
5358:e9acb84bbafb |
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
TLB: Make a TLB base class and put a virtual demapPage function in it. |
5348:7847a4bf9641 |
14-Feb-2008 |
Ali Saidi <saidi@eecs.umich.edu> |
CPU: move the PC Events code to a place where the code won't be executed multiple times if an instruction faults. |
5338:e75d02a09806 |
10-Feb-2008 |
Steve Reinhardt <stever@gmail.com> |
Fix #include lines for renamed cache files. |
5336:c7e21f4e5a2e |
06-Feb-2008 |
Stephen Hines <hines@cs.fsu.edu> |
Make the Event::description() a const function |
5335:69d45f5f21a2 |
05-Feb-2008 |
Stephen Hines <hines@cs.fsu.edu> |
Add base ARM code to M5 |
5327:3390941f0643 |
14-Jan-2008 |
Ke Meng <mengke97@hotmail.com> |
The reason is that the event is supposed to put the instructions ready to execute for next cycle. And the FUCompletion event has a lower priority than CPU tick event. It is called after the iew->tick() for current cycle has already been executed and the issueToExecuteQueue has already advanced this time. And assume the issueToExecuteLatency is 1, to catch up, the increment should be made at access(-1) instead of access(0). Otherwise I found it could increase the actual op_latency of the instructions to execute by 1 cycle and potentially put the simulated CPU into a permanent idle state.
Signed-off by: Ali Saidi <saidi@eecs.umich.edu> |
5319:13cb690ba6d6 |
02-Jan-2008 |
Steve Reinhardt <stever@gmail.com> |
Add ReadRespWithInvalidate to handle multi-level coherence situation where we defer a response to a read from a far-away cache A, then later defer a ReadExcl from a cache B on the same bus as us. We'll assert MemInhibit in both cases, but in the latter case MemInhibit will keep the invalidation from reaching cache A. This special response tells cache A that it gets the block to satisfy its read, but must immediately invalidate it. |
5315:30997e988446 |
02-Jan-2008 |
Steve Reinhardt <stever@gmail.com> |
Additional comments and helper functions for PrintReq. |
5314:e902f12a3af1 |
02-Jan-2008 |
Steve Reinhardt <stever@gmail.com> |
Add functional PrintReq command for memory-system debugging. |
5311:9ed42a2315ae |
18-Dec-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Checkpointing: Fix a bug in the simulation script when restoring without standard switch and change some ifs to work with the default port since every port is now connected to something. |
5310:4164e6bfcc8a |
16-Dec-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
CPU: Update where the simple cpus read their cpu id from the thread context to init() to make sure they read the right value. This fixes a bug with multi-processor full-system configurations. |
5281:61e396061986 |
21-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
imported patch pagewalker.patch |
5278:4c963dc4ab07 |
20-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU fix simple mistake in translateDataWriteAddr. |
5261:faf87a7e3ef8 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add thread id to misc. reg functions |
5260:9f412d1c6d8b |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add MicroPC functions back to thread context |
5259:74ef5093154f |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add microPC stuff back in. got deleted on changeset propagation somehow. |
5258:fcccd87d5178 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
put the flattenIndex stuff back in O3 AND put fatal() back in faults |
5252:c2804af3a7f4 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
add core specific parameter to BaseCPU params |
5250:42577371ff31 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
Get MIPS simple regression working. Take out unnecessary functions "setShadowSet", "CacheOp" |
5249:49d44a466496 |
15-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
branch merge |
5245:d94bb8af9f76 |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Separate out the page table walker into its own cc and hh. |
5237:6c819dbe8045 |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Work on the page table walker, TLB, and related faults. |
5236:0050ad4fb3ef |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement a page table walker. |
5235:f07f46843886 |
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the micropc available through the thread context objects. This is necessary for fault handlers that branch to non-zero micro PCs. |
5222:bb733a878f85 |
13-Nov-2007 |
Korey Sewell <ksewell@umich.edu> |
Add in files from merge-bare-iron, get them compiling in FS and SE mode |
5221:dba788e614fe |
08-Nov-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
TimingSimpleCPU: Add some DPRINTFs when the cpu suspends and resumes. |
5220:8bf8e82fda20 |
08-Nov-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
AtomicSimpleCPU: Refactor resume() code to have a cleaner control path. |
5218:9b99318ca70d |
08-Nov-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Interrupts: Inline some code and remove duplication. |
5217:bb810bb8ca2d |
08-Nov-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
CPU: Add function to explicitly compare thread contexts after copying. |
5215:68f719ce5496 |
06-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Remove unneeded variable. |
5202:ff56fa8c2091 |
31-Oct-2007 |
Steve Reinhardt <stever@gmail.com> |
String constant const-ness changes to placate g++ 4.2. Also some bug fixes in MIPS ISA uncovered by g++ warnings (Python string compares don't work in C++!). |
5192:582e583f8e7e |
31-Oct-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Traceflags: Add SCons function to create a traceflag instead of having one file with them all. |
5177:4307a768e10e |
22-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add functions to the "ExecContext"s that translate a given address. |
5169:bfd18d401251 |
18-Oct-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
CPU: Use the ThreadContext cpu id instead of the params cpu id in all cases. |
5126:d3cdea5e0fb3 |
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head. |
5120:b999773ab81f |
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Predecoder: Clear out predecoder state on an ITLB fault. |
5110:4a6ab0f8cf33 |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the cpuid parameter get set in SE mode as well. |
5108:3b59ba14a7f3 |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make the cpus check the pc event queues in SE mode. |
5104:cb14dda4d8fc |
02-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Make sure the system parameter gets set in the cpu builders. Other parameters need to be fixed as well. |
5103:391933804192 |
01-Oct-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
CPU: fix sparc_fs booting with SimpleTimingCPU. |
5101:8af5a6a6223d |
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Update stats for quiesced cycles |
5100:7a0180040755 |
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Rename cycles() function to ticks() |
5099:8ff1345b3ae4 |
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Update statistics to use cycles properly instead of ticks |
5088:2d5e28510f27 |
25-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
SPARC: Fix a stupid mistake which was breaking the SPARC regressions. |
5086:e7913ffb379d |
24-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get X86_FS to compile. |
5082:82dd253231c8 |
19-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put in the foundation for x87 stack based fp registers. |
5049:16a0724434b8 |
05-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86/StateTrace: Make m5 and statetrace track mmx and xmm registers, and actually compare xmm. |
5038:c996bb7f1a6d |
31-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get x86 to compile again after the simobject constructor change. |
5036:f9174c026b7f |
30-Aug-2007 |
Miles Kaufmann <milesck@eecs.umich.edu> |
Fix miscellaneous small typos. |
5034:6186ef720dd4 |
30-Aug-2007 |
Miles Kaufmann <milesck@eecs.umich.edu> |
params: Deprecate old-style constructors; update most SimObject constructors.
SimObjects not yet updated: - Process and subclasses - BaseCPU and subclasses
The SimObject(const std::string &name) constructor was removed. Subclasses that still rely on that behavior must call the parent initializer as : SimObject(makeParams(name)) |
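The migration pattern described in 5034:6186ef720dd4 can be sketched as follows. This is a minimal illustration only: `SimObjectParams`, its single `name` field, the by-value parameter passing, and the `LegacyDevice` subclass are hypothetical stand-ins, and the real M5 signatures may differ. Only the `SimObject(makeParams(name))` initializer form comes from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical parameter struct; the real one is generated by the build.
struct SimObjectParams {
    std::string name;
};

class SimObject {
  public:
    // New-style constructor: takes a params object instead of a bare name.
    explicit SimObject(const SimObjectParams &p) : params_(p) {}
    virtual ~SimObject() = default;

    const std::string &name() const { return params_.name; }

  protected:
    // Assumed helper: wraps a bare name in a params object so that
    // not-yet-converted subclasses can still construct their parent.
    static SimObjectParams makeParams(const std::string &name) {
        SimObjectParams p;
        p.name = name;
        return p;
    }

  private:
    SimObjectParams params_;
};

// A subclass that has not been converted yet keeps its name-only
// constructor and forwards through makeParams(), as the entry describes.
class LegacyDevice : public SimObject {
  public:
    explicit LegacyDevice(const std::string &name)
        : SimObject(makeParams(name)) {}
};
```

Under these assumptions, `LegacyDevice dev("cpu0");` builds the parent from a params object while callers keep passing only a name.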
5018:21795007349e |
27-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head. |
5012:c0a28154d002 |
27-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head |
5003:2eb7f972aabf |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
O3 CPU: Remove alignment check from dynamic instruction read/write functions. |
5001:31fda5c37c19 |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU: Don't trace instructions that fault. Otherwise they show up twice. |
4999:b46ae02966d5 |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU: Added code that will split requests that cross block boundaries into multiple memory accesses. |
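One way to picture the splitting described in 4999:b46ae02966d5 is the sketch below. It is not the Simple CPU code itself: `splitAccess`, the `Addr` alias, and the (address, size) pair representation are assumptions made for illustration, and the block size is assumed to be a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Addr = std::uint64_t;  // assumption: 64-bit simulated addresses

// Break an access of `size` bytes at `addr` into pieces that never cross
// a `blockSize` boundary. Returns (address, size) pairs in address order.
std::vector<std::pair<Addr, unsigned>>
splitAccess(Addr addr, unsigned size, unsigned blockSize)
{
    // The masking below only works for power-of-two block sizes.
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);

    std::vector<std::pair<Addr, unsigned>> pieces;
    while (size > 0) {
        // Offset of addr within its block, and bytes left in that block.
        unsigned offset = static_cast<unsigned>(addr & (blockSize - 1));
        unsigned chunk = std::min(size, blockSize - offset);
        pieces.emplace_back(addr, chunk);
        addr += chunk;
        size -= chunk;
    }
    return pieces;
}
```

For example, an 8-byte access at address 60 with 64-byte blocks yields two 4-byte pieces at 60 and 64, while an aligned 8-byte access at address 0 stays a single piece.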
4998:51a0f9f59cc5 |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Simple CPU: Make sure only instructions which complete without faulting are counted. |
4997:e7380529bd2d |
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Address Translation: Make SE mode use an actual TLB/MMU for translation like FS. |
4991:7e3bb2eabbbf |
13-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
O3: Set up the predicted npc and nnpc for a fault carrying noop so that it doesn't cause a false branch mispredict. |
4988:5b26eba4283f |
13-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Move the "translate" member functions back into the base o3 class. |
4986:b7c82ad6b3ef |
24-Aug-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Mem: Make errors in the memory system be responses, not requests. Fixes cache handling of error responses. |
4985:9f577f468009 |
21-Aug-2007 |
Kevin Lim <ktlim@umich.edu> |
o3: Fix for retry ID bug. It should be cleared prior to the call to recvRetry. Add extra DPRINTF statement for clearer debugging output. |
4968:f1c856d8c460 |
08-Aug-2007 |
Vincentius Robby <acolyte@umich.edu> |
Added fastmem option. Lets CPU accesses to physical memory bypass Bus. |
4963:ba55203d1bdc |
08-Aug-2007 |
Vincentius Robby <acolyte@umich.edu> |
Port, StaticInst: Revert unnecessary changes. |
4962:4e939f4629c3 |
08-Aug-2007 |
Vincentius Robby <acolyte@umich.edu> |
alpha: Make the TLB cache actually work. Improve MRU checking for StaticInst, Bus, TLB |
4956:fc30658d75de |
04-Aug-2007 |
Vincentius Robby <acolyte@umich.edu> |
StaticInst: Fix decode cache initialization. Cache functionality was negated. |
4950:f5f19784acf1 |
07-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make a microcode branch microop. Also some touch up for ruflag. |
4947:6052dece6776 |
04-Aug-2007 |
Nathan Binkert <nate@binkert.org> |
switching: turn on profiling after a switch if there's an event |
4940:23874ae87540 |
04-Aug-2007 |
Nathan Binkert <nate@binkert.org> |
SimpleCPU: Add some DPRINTFs |
4928:951bd17db218 |
29-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge Gabe's changes from head. |
4925:36fd2459422f |
28-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
AtomicSimpleCPU: fix inadvertent loss of endian conversion on read. |
4920:03b88702070e |
27-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
cache/memtest: fixes for functional accesses. |
4918:3214e3694fb2 |
27-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Merge python and x86 changes with cache branch |
4909:f3b84a9b5c5a |
23-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix WriteReq/StoreCondReq setting in O3. |
4898:117991fb7852 |
16-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix bug with timing snoop upcalls to MemTest object. |
4895:d36959284fbc |
15-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix up a bunch of multilevel coherence issues. Atomic mode seems to work. Timing is closer but not there yet. |
4893:3439144e474a |
15-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix problem with unset max_loads in memtest. Also make default 0, and make that mean run forever. |
4881:3e4b4f6ff9dd |
02-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Couple more minor bug fixes for FS timing mode.
src/cpu/simple/timing.cc: Fix another SC problem. src/mem/cache/cache_impl.hh: Forgot to call makeTimingResponse() on uncached timing responses. |
4880:4de4d072e977 |
02-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fix a couple LL/SC bugs that only affected timing mode.
src/cpu/simple/timing.cc: Fix swap/stq_c command bug. src/mem/packet.cc: Fix incorrect LoadLockedReq command response field. |
4878:5b747482d2d8 |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Make CPU models use new LoadLockedReq/StoreCondReq commands. |
4873:b135f6e6adfe |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Event descriptions should not end in "event" (they function as adjectives not nouns) |
4870:fcc39d001154 |
30-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Get rid of Packet result field. Error responses are now encoded in cmd field. |
4830:aad1410a2b79 |
01-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Reorganize the native tracing code. Ignore different values of rcx and r11 after a syscall until either the local or remote value changes. Also change the code's organization somewhat. |
4828:768d4cf6b0dc |
31-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add a flag to indicate an instruction triggers a syscall in SE mode. |
4800:910dde7af74f |
30-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fix problem with tracer not being initialized. |
4796:e938afbfc8cd |
29-Jul-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
BaseCPU: Get rid of some bad DPRINTFs. People should never put pointers in DPRINTFs; it messes up tracediffs. Plus these used the FullCPU trace flag, which is not right. |
4790:f71b033c83e1 |
29-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix register ordering. The correct order is unintuitively rax, rcx, rdx, rbx, etc, not rax, rbx, rcx, rdx. |
4776:8c8407243a2c |
28-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Turn the instruction tracing code into pluggable sim objects. These need to be refined a little still and given parameters. |
4772:f08370a81812 |
27-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix argument register indexing. Code was assuming that all argument registers followed in order from ArgumentReg0. There is now an ArgumentReg array which is indexed to find the right index. There is a constant, NumArgumentRegs, which can be used to protect against using an invalid ArgumentReg. |
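The lookup-table approach from 4772:f08370a81812 might look roughly like the sketch below. The names `ArgumentReg` and `NumArgumentRegs` come from the commit message; the enum, the `argumentRegIndex` helper, and the table contents are assumptions. The table here uses the System V AMD64 function-call order (rdi, rsi, rdx, rcx, r8, r9) purely as an example and is not claimed to match what M5 stores. The enum values follow the x86 encoding order noted in 4790:f71b033c83e1.

```cpp
#include <cassert>
#include <stdexcept>

// Register indices in x86 encoding order: rax, rcx, rdx, rbx, ...
enum IntRegIndex {
    RAX = 0, RCX = 1, RDX = 2, RBX = 3,
    RSP = 4, RBP = 5, RSI = 6, RDI = 7,
    R8 = 8, R9 = 9
};

const int NumArgumentRegs = 6;

// Illustrative contents only: the argument registers are not consecutive,
// so "ArgumentReg0 + n" would pick the wrong register.
const int ArgumentReg[NumArgumentRegs] = { RDI, RSI, RDX, RCX, R8, R9 };

// Hypothetical helper: map an argument number to a register index,
// using NumArgumentRegs to reject out-of-range requests.
inline int argumentRegIndex(int argNum)
{
    if (argNum < 0 || argNum >= NumArgumentRegs)
        throw std::out_of_range("argument is not passed in a register");
    return ArgumentReg[argNum];
}
```

With this table, argument 0 maps to index 7 (rdi) and argument 3 maps to index 1 (rcx), which a simple base-plus-offset computation could not express.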
4763:fef9a47b3732 |
24-Jul-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head. |
4762:c94e103c83ad |
24-Jul-2007 |
Nathan Binkert <nate@binkert.org> |
Major changes to how SimObjects are created and initialized. Almost all creation and initialization now happens in python. Parameter objects are generated and initialized by python. The .ini file is now solely for debugging purposes and is not used in construction of the objects in any way. |
4673:833d4a116810 |
28-Jun-2007 |
Korey Sewell <ksewell@umich.edu> |
o3cpu build for mips |
4661:44458219add1 |
22-Jun-2007 |
Korey Sewell <ksewell@umich.edu> |
mips import pt. 1
src/arch/mips/SConscript: "mips import pt.1". |
4660:8ba283606f48 |
23-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Minor fix plus new assertion to catch similar bugs.
src/cpu/memtest/memtest.cc: Need to set packet source field so that response from cache doesn't run into assertion failure when copying source to dest. src/mem/packet.hh: Copy source field when copying packets. Assert that source is valid before copying it to dest when turning packets around. |
4656:dbfa364feec8 |
21-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-micro
src/cpu/o3/fetch_impl.hh: hand merge |
4654:225cc048edfa |
20-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fix compiler errors. |
4653:19f884e6a48b |
19-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into doughnut.hpl.hp.com:/home/gblack/newmem-o3-micro
src/cpu/base_dyn_inst_impl.hh: src/cpu/o3/fetch_impl.hh: Hand merge |
4652:cead97b41680 |
12-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure all addresses used in syscalls are truncated to 32 bits. Actually -all- arguments are truncated to 32 bits, but we should be able to get away with it. |
4650:bb9977571ff4 |
09-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into doughnut.mwconnections.com:/home/gblack/newmem-o3-micro |
4644:4e77ab0671e8 |
23-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/wexford/x/gblack/m5/newmem-o3-spec |
4642:d7b2de2d72f1 |
22-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make the floating point zero register special handling only apply for ALPHA. |
4638:e181f5b0ebca |
15-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make an inner loop which pulls microops out of macroops. These aren't checked for control flow because we can pull out microops until we run out of buffer. This prevents microops from being interpreted as branches because the pc doesn't become npc. |
4637:d3adce1577fd |
15-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add extra constructors to Alpha and MIPS |
4636:afc8da9f526e |
14-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add support for microcode and pull out the special branch delay slot handling. Branch delay slots need to be squashed on a mispredict as well because the nnpc they saw was incorrect. |
4632:be5b8f67b8fb |
13-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Remove most of the special handling for delay slots since they have to be squashed anyway on a mispredict. This is because the NNPC value they saw when executing was incorrect. |
4628:17b3ce796176 |
21-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Getting closer...
configs/example/memtest.py: Add progress interval option. src/base/traceflags.py: Add MemTest flag. src/cpu/memtest/memtest.cc: Clean up tracing. src/cpu/memtest/memtest.hh: Get rid of unused code. |
4627:2766d5cfbd9d |
17-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge vm1.(none):/home/stever/bk/newmem-head into vm1.(none):/home/stever/bk/newmem-cache2
configs/example/memtest.py: Hand merge redundant changes. |
4626:ed8aacb19c03 |
17-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
More major reorg of cache. Seems to work for atomic mode now, timing mode still broken.
configs/example/memtest.py: Revamp options. src/cpu/memtest/memtest.cc: No need for memory initialization. No need to make atomic response... memory system should do that now. src/cpu/memtest/memtest.hh: MemTest really doesn't want to snoop. src/mem/bridge.cc: checkFunctional() cleanup. src/mem/bus.cc: src/mem/bus.hh: src/mem/cache/base_cache.cc: src/mem/cache/base_cache.hh: src/mem/cache/cache.cc: src/mem/cache/cache.hh: src/mem/cache/cache_blk.hh: src/mem/cache/cache_builder.cc: src/mem/cache/cache_impl.hh: src/mem/cache/coherence/coherence_protocol.cc: src/mem/cache/coherence/coherence_protocol.hh: src/mem/cache/coherence/simple_coherence.hh: src/mem/cache/miss/SConscript: src/mem/cache/miss/mshr.cc: src/mem/cache/miss/mshr.hh: src/mem/cache/miss/mshr_queue.cc: src/mem/cache/miss/mshr_queue.hh: src/mem/cache/prefetch/base_prefetcher.cc: src/mem/cache/tags/fa_lru.cc: src/mem/cache/tags/fa_lru.hh: src/mem/cache/tags/iic.cc: src/mem/cache/tags/iic.hh: src/mem/cache/tags/lru.cc: src/mem/cache/tags/lru.hh: src/mem/cache/tags/split.cc: src/mem/cache/tags/split.hh: src/mem/cache/tags/split_lifo.cc: src/mem/cache/tags/split_lifo.hh: src/mem/cache/tags/split_lru.cc: src/mem/cache/tags/split_lru.hh: src/mem/packet.cc: src/mem/packet.hh: src/mem/physical.cc: src/mem/physical.hh: src/mem/tport.cc: More major reorg. Seems to work for atomic mode now, timing mode still broken. |
4599:b3cdf938a853 |
20-Jun-2007 |
Vincentius Robby <acolyte@umich.edu> |
Removed "adding instead of dividing" trick. Caused slowdown in performance instead of speeding up.
src/cpu/base.cc: Removed "adding instead of dividing" trick. src/mem/bus.cc: Fixed spelling in comments. Removed "adding instead of dividing" trick. |
4598:56adf2e778a8 |
20-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Don't do checker stuff if the checker is not defined |
4597:063f25d13229 |
20-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Make sure all parameters have default values if they're supposed to and make sure parameters have the right type. Also make sure that any object that should be an intermediate type has the right options set. |
4594:25b6ff860bed |
19-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Missed an "offset" to get rid of. |
4593:16b19397172c |
19-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make branches work by repopulating the predecoder every time through. This is probably fine as far as the predecoder goes, but the simple cpu might want to not refetch something it already has. That reintroduces the self modifying code problem though. |
4584:81a11c930f2d |
18-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix bug in timing cpu. getTime() is the time the request was created, not the time it was responded to. In timing mode the time it was responded to is curTick. Doesn't change the results, but it does make implementation of nextCycle() more difficult |
4579:810e745682b1 |
16-Jun-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
memtest.cc: No need to initialize memory contents; should come up as 0.
src/cpu/memtest/memtest.cc: No need to initialize memory contents; should come up as 0. |
4572:5499df089a6c |
14-Jun-2007 |
Vincentius Robby <acolyte@umich.edu> |
Modified instruction decode method. Make code compatible with new decode method.
src/arch/alpha/remote_gdb.cc: src/cpu/base_dyn_inst_impl.hh: src/cpu/exetrace.cc: src/cpu/simple/base.cc: Make code compatible with new decode method. src/cpu/static_inst.cc: src/cpu/static_inst.hh: Modified instruction decode method. |
4565:94f0760f1b44 |
14-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
A fix for SPARC_FS compilation. |
4564:d1fb13424616 |
13-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Separate the pc-pc and the pc of the incoming bytes, and get rid of the "moreBytes" which just takes a MachInst.
src/arch/x86/predecoder.cc: Separate the pc-pc and the pc of the incoming bytes, and get rid of the "moreBytes" which just takes a MachInst. Also make the "opSize" field describe the number of bytes and not the log of the number of bytes. |
4556:00281bdfeb31 |
12-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Rename enum from OpType to OpClass so it's consistent with the real thing. Also rename the null case to something that can be a C++ symbol. |
4551:c131b771a066 |
10-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Use the right type |
4539:6eeeea62b7c4 |
12-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make microOp vs microop and macroOp vs macroop capitalization consistent.
src/arch/x86/isa/macroop.isa: Make microOp vs microop and macroOp vs macroop capitalization consistent. Also fill out the emulation environment handling a little more, and use an object to pass around output code. src/arch/x86/isa/microops/base.isa: Make microOp vs microop and macroOp vs macroop capitalization consistent. Also adjust python to C++ bool translation. |
4522:3043823ff963 |
04-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
don't be so aggressive with the tracing on #if |
4518:8380fb0a275a |
01-Jun-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Don't mask the pc because the Alpha predecoder needs it to set the PAL mode bit in the ExtMachInst. |
4517:626afdfa6ec9 |
01-Jun-2007 |
Nathan Binkert <binkertn@umich.edu> |
Fix typo so m5.fast will compile |
4515:a21985804844 |
01-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem
src/cpu/simple/base.cc: hand merge vincent/gabe/my changes to cast sizeof() to a 64bit int |
4514:ac9b34438d34 |
01-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
cast sizeof(MachInst) to Addr before generating a mask |
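The reason for the cast in 4514:ac9b34438d34 can be illustrated with the sketch below. It assumes a 64-bit `Addr`, a 4-byte `MachInst`, and a host where `sizeof` yields a 32-bit value (modelled here with an explicit `std::uint32_t`); the function names are hypothetical. If the mask is computed at 32-bit width and only then widened, its upper 32 bits are zero, so applying it clears the top half of a 64-bit fetch address.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;      // assumption: 64-bit simulated address
using MachInst = std::uint32_t;  // assumption: 4-byte instruction word

// Mask computed at 32-bit width, as on a host with a 32-bit size_t.
// ~(4 - 1) is 0xFFFFFFFC, which widens to 0x00000000FFFFFFFC.
inline Addr narrowMask()
{
    std::uint32_t size = sizeof(MachInst);
    return ~(size - 1);
}

// Mask computed after casting to Addr, so the complement covers all
// 64 bits: 0xFFFFFFFFFFFFFFFC.
inline Addr wideMask()
{
    return ~(static_cast<Addr>(sizeof(MachInst)) - 1);
}

// Align a fetch address down to an instruction boundary.
inline Addr alignFetch(Addr pc, Addr mask)
{
    return pc & mask;
}
```

With a pc of 0x1234567812345679, the wide mask gives 0x1234567812345678, while the narrow mask drops the upper half and gives 0x12345678.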
4513:ad010b9fb1dc |
01-Jun-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
don't generate trace data unless tracing is on |
4510:98339396410c |
01-Jun-2007 |
Vincentius Robby <acolyte@umich.edu> |
Minor error. Forgot to remove brackets for threadPC. |
4501:b5f473594687 |
31-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-x86
src/cpu/simple/base.cc: Hand merge |
4497:17e34dbcc8b3 |
30-May-2007 |
Nathan Binkert <binkertn@umich.edu> |
Fix cut-n-pasto to make the path correct |
4495:dbd2943590e6 |
31-May-2007 |
Vincentius Robby <acolyte@umich.edu> |
Assign traceData to be NULL at BaseSimpleCPU constructor. Initialize a temporary variable for thread->readPC() at setupFetchRequest() to reduce function calls. exec tracing isn't needed for m5.fast binaries. Moved MISCREG_GL, MISCREG_CWP, and MISCREG_TLB_DATA out of switch statement and use if blocks instead.
src/arch/sparc/miscregfile.cc: Moved MISCREG_GL, MISCREG_CWP, and MISCREG_TLB_DATA out of switch statement and use if blocks instead. src/cpu/simple/base.cc: Assign traceData to be NULL at BaseSimpleCPU constructor. Initialize a temporary variable for thread->readPC() at setupFetchRequest() to reduce function calls. exec tracing isn't needed for m5.fast binaries |
4488:400afb0dd42d |
28-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Remove unnecessary include of physical.hh. |
4486:aaeb03a8a6e1 |
27-May-2007 |
Nathan Binkert <binkertn@umich.edu> |
Move SimObject python files alongside the C++ and fix the SConscript files so that only the objects that are actually available in a given build are compiled in. Remove a bunch of files that aren't used anymore. |
4477:375b35072b58 |
22-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge vm1.(none):/home/stever/bk/newmem-head into vm1.(none):/home/stever/bk/newmem-cache2
src/mem/cache/base_cache.hh: Manual conflict resolution. |
4475:fb185cc1c845 |
22-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Change getDeviceAddressRanges to use bool for snoop arg. |
4474:6d666915bd7e |
22-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
memtest.hh: Fix description string. Minor whitespace cleanup.
src/cpu/memtest/memtest.hh: Fix description string. Minor whitespace cleanup. |
4471:4d86c4d096ad |
21-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Add new EventWrapper constructor that takes a Tick value and schedules the event immediately. |
4465:70123ac99284 |
18-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into doughnut.mwconnections.com:/home/gblack/m5/newmem-x86 |
4434:2ea7b6e0b78f |
09-May-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix the translating ports so it can add a page on a fault |
4433:4722c6787f69 |
07-May-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
The bridge never returns false when recvTiming() is called on its ports now; it always returns true and nacks the packet if there isn't sufficient buffer space. Fix the timing cpu to handle receiving a nacked packet.
src/cpu/simple/timing.cc: make the timing cpu handle receiving a nacked packet src/mem/bridge.cc: src/mem/bridge.hh: the bridge never returns false when recvTiming() is called on its ports now, it always returns true and nacks the packet if there isn't sufficient buffer space |
4406:46f15e4eb062 |
26-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove extra delete that was causing segfault. |
4405:57af43e114b5 |
26-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove unnecessary check. |
4400:619191b2f011 |
16-Apr-2007 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for splash, may conflict with Korey's SMT work and doesn't support O3 CPU yet.
src/cpu/simple/base.cc: CPUs should start as unallocated, not suspended src/cpu/simple_thread.cc: Wait for a thread to be assigned to activate the cpu src/kern/tru64/tru64.hh: When looking for an open cpu to assign threads, look for an unallocated one, not a suspended one. |
4395:9acb011a6c35 |
21-Apr-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fixes for solaris compile |
4392:271b73b42e34 |
22-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Use proper cycles for IPC and CPI equations.
src/cpu/o3/cpu.cc: Use proper cycles for these equations. |
4377:ca55a0b1990a |
18-May-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Changes to make simple cpu handle pcs appropriately for x86 |
4376:ecc6222371af |
11-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Use a computed mask to mask out the fetch address and not a hard coded one. |
4375:b89532cd1b7d |
11-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make the itlb set the PHYSICAL flag on a request when it translates it. This gets it out of the cpu. |
4373:6e1b7eeb5ba3 |
10-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Even if you don't want to fetch more bytes, make sure you handle a fault. |
4359:6b6cb2927594 |
09-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed a compile error. |
4357:f8b2da607906 |
09-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/head |
4352:52f11aaf7d19 |
08-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Take into account that the flattened integer register space is a different size than the architected one. Also fixed some asserts. |
4350:c3f402102507 |
07-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Get the "hard" SPARC instructions working in o3. I don't like that the IsStoreConditional flag needs to be set for them because they aren't store conditional instructions, and I should fix the format code which is not handling the opt_flags correctly. |
4345:a95454b0e835 |
09-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix bug when blocking due to no free registers. |
4332:548ef28989b8 |
04-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-spec |
4331:e53c3a1aedad |
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Updates for other ISA cpu_builders. |
4329:52057dbec096 |
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.
src/cpu/o3/alpha/cpu_impl.hh: Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions. |
4326:a9277254c1e4 |
03-Apr-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Made the "data" field of store queue entries into a character array. It's sized to match an IntReg which was what it used to be, but we might want to make it something architecture independent. All data is now endian converted before entering the store queue entries which simplifies store to load forwarding in "trans endian" simulations, and makes twin memory ops work.
src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: fixed twin memory operations. |
4319:b8eae8c6afcc |
03-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix a memory leak. Hopefully this fixes the longer running benchmarks. |
4318:eb4241362a80 |
02-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Remove/comment out DPRINTFs that were causing a segfault.
The removed ones were unnecessary. The commented out ones could be useful in the future, should this problem get fixed. See flyspray task #243.
src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob_impl.hh: Remove/comment out DPRINTFs that were causing a segfault. |
4317:99838c26f7be |
02-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Fix up SPARC's CPU builder to match changes to Alpha's CPU builder. |
4302:c45514c856b0 |
29-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Update code so that the O3 CPU can handle not initially having anything hooked up to its ports. This fixes the segfault Ali recently found when using sampling.
src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: Update code so that the O3 CPU can handle not initially having anything hooked up to its ports. |
4288:1fc3aa7ad095 |
25-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Update for new trace data behavior. |
4284:c8800319ed0c |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2
src/cpu/base_dyn_inst.hh: Hand merge. Line is no longer needed because it's handled in the ISA. |
4268:12a0b7558078 |
21-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zower.eecs.umich.edu:/home/gblack/m5/newmem-statetrace |
4266:0952dbfed63f |
18-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Compile fixes for SPARC_FS.
src/arch/alpha/predecoder.hh: src/arch/sparc/predecoder.hh: Put in a missing include src/cpu/exetrace.cc: Convert the legion lockstep stuff from makeExtMI to the predecoder object. |
4265:ab2fb6202751 |
21-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
The m5 side of statetrace. This is fairly ugly, but I don't want to lose it. |
4254:66a131ab3ff9 |
16-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fix ALPHA_FS compile. The MachInst -> StaticInstPtr constructor is no longer a conversion constructor because it caused ambiguous conversions when setting the pointer to NULL. |
4240:cde9d7751cce |
14-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-x86
src/arch/mips/utility.hh: src/arch/x86/SConscript: Hand merge |
4224:7e828583f2cb |
11-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make sttw and sttwa use the twin memory operations. |
4217:4c966fec2324 |
13-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix segfault when peer owner attempts to use functional port |
4216:c01745179a1f |
13-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix interrupting during a quiesce on sparc
src/arch/sparc/ua2005.cc: fix interrupting when quiesced. Since sticks correspond to instructions when not quiesced we need to check if we're suspended and interrupt at the guess time src/base/traceflags.py: add trace flag for Iob src/cpu/simple/base.cc: Use Quiesce instead of IPI trace flag src/dev/sparc/iob.cc: add some Dprintfs |
4212:0d50e6c98d13 |
12-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
remove the extern C around gdb helper functions. It's not needed for any new version of gdb to work and it causes at least mine to segfault |
4203:b5c2bb0b9cae |
12-Mar-2007 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix some of the memory leaks related to writebacks
src/cpu/memtest/memtest.cc: Add the [] to a delete to make it work correctly src/mem/cache/cache_impl.hh: Fix one of the memory leaks |
4202:f7a05daec670 |
11-Mar-2007 |
Nathan Binkert <binkertn@umich.edu> |
Rework the way SCons recurses into subdirectories, making it automatic. The point is that now a subdirectory can be added to the build process just by creating a SConscript file in it. The process has two passes. On the first pass, all subdirs of the root of the tree are searched for SConsopts files. These files contain any command line options that ought to be added for a particular subdirectory. On the second pass, all subdirs of the src directory are searched for SConscript files. These files describe how to build any given subdirectory. I have added a Source() function. Any file (relative to the directory in which the SConscript resides) passed to that function is added to the build. Clean up everything to take advantage of Source(). |
4200:f55b59fc848b |
10-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
I thought this code got deleted, but since it hasn't I've moved it to a place where it doesn't access freed memory. |
4192:7accc6365bb9 |
09-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts(). 2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc: Move common code of switching over peers to base CPU. src/cpu/base.hh: Move common code of switching over peers to BaseCPU. src/cpu/o3/cpu.cc: Add in function that updates thread context's ports. Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code. src/cpu/o3/cpu.hh: Include function to update thread context's memory ports. src/cpu/o3/lsq.hh: Add function to dcache port that will update the memory ports upon getting a new peer. Also include a function that will tell the CPU to update those memory ports. src/cpu/o3/lsq_impl.hh: Add function that will update the memory ports upon getting a new peer. src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Add function that will update thread context's memory ports upon getting a new peer. Also use the new BaseCPU's take over from function. src/cpu/simple/atomic.hh: Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer. src/cpu/simple/timing.hh: Add function that will update thread context's memory ports upon getting a new peer. src/mem/port.hh: Make setPeer virtual so that other classes can override it. |
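The first fix in the entry above (making setPeer() virtual and overriding it in the CPU's dcache port) can be sketched roughly as follows. This is a minimal illustration, not the code from the changeset: SketchCPU, its connectCalls counter, and the reduced Port class are hypothetical stand-ins, and only the names setPeer() and connectMemPorts() come from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the memory-system Port class.
class Port
{
  public:
    virtual ~Port() = default;

    // Virtual so that subclasses can react when the peer changes.
    virtual void setPeer(Port *port) { peer = port; }

    Port *getPeer() const { return peer; }

  private:
    Port *peer = nullptr;
};

// Hypothetical CPU that only counts how often its ports are reconnected.
class SketchCPU
{
  public:
    int connectCalls = 0;

    void connectMemPorts() { ++connectCalls; }
};

// The dcache port does the normal peer bookkeeping and then asks the CPU
// to refresh the thread context's memory ports.
class DcachePort : public Port
{
  public:
    explicit DcachePort(SketchCPU *owner) : cpu(owner) { assert(cpu); }

    void setPeer(Port *port) override
    {
        Port::setPeer(port);     // normal setPeer() behaviour
        cpu->connectMemPorts();  // runs only when the peer actually changes
    }

  private:
    SketchCPU *cpu;
};
```

With this shape, calling setPeer() through a base-class Port pointer still reaches the override, so connectMemPorts() is tied to peer changes rather than to every context activation.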
4190:5069dfa3d62e |
08-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
stop m5 from leaking like a sieve: don't create a new physPort/virtPort every time activateContext() is called; add the ability to tell a memory object to delete its reference to a port, and a method to have a port call deletePortRefs() on the port owner as well as delete its peer. Still need to stop calling connectMemPorts() every time activateContext() is called or we'll overflow the bus id and panic
src/cpu/thread_state.cc: if we have a (phys|virt)Port don't create a new one, have it delete its peer and then reuse it src/mem/bus.cc: src/mem/bus.hh: add ability to delete a port by using a hash_map instead of an array to store port ids add a function to do deleting src/mem/cache/cache.hh: src/mem/cache/cache_impl.hh: src/mem/mem_object.cc: src/mem/mem_object.hh: add a function to delete port references from a memory object src/mem/port.cc: src/mem/port.hh: add a removeConn function that tells the owner to delete any references to the port and then deletes its peer |
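The bus-side part of the entry above (storing ports in a hash map keyed by id so that a single port can be deleted) might look something like the sketch below. It assumes std::unordered_map in place of the hash_map the commit mentions, and SketchBus, BusPort, addPort(), and numPorts() are hypothetical names; only deletePortRefs() is taken from the message, and its signature here is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for a bus-side port object.
struct BusPort
{
    int id;
};

// Sketch of a bus that tracks its ports by id in a hash map, so one port
// can be removed without leaving a hole in a fixed array.
class SketchBus
{
  public:
    SketchBus() = default;
    SketchBus(const SketchBus &) = delete;
    SketchBus &operator=(const SketchBus &) = delete;

    ~SketchBus()
    {
        for (auto &entry : ports)
            delete entry.second;
    }

    // Allocate a new port and remember it under a fresh id.
    BusPort *addPort()
    {
        int id = nextId++;
        BusPort *port = new BusPort{id};
        ports[id] = port;
        return port;
    }

    // Drop the bus's reference to the port and free it.
    void deletePortRefs(BusPort *port)
    {
        auto it = ports.find(port->id);
        assert(it != ports.end());
        ports.erase(it);
        delete port;
    }

    std::size_t numPorts() const { return ports.size(); }

  private:
    std::unordered_map<int, BusPort *> ports;
    int nextId = 0;
};
```

Note that ids still grow monotonically in this sketch, which matches the message's warning that repeated reconnection would eventually overflow the bus id.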
4185:42c0395a03f9 |
07-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
I missed a couple of WithEffects, this should do it |
4182:5b2c0d266107 |
14-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make the predecoder an object with its own switched header file. Start adding predecoding functionality to x86.
src/arch/SConscript: src/arch/alpha/utility.hh: src/arch/mips/utility.hh: src/arch/sparc/utility.hh: src/cpu/base.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/static_inst.hh: src/arch/alpha/predecoder.hh: src/arch/mips/predecoder.hh: src/arch/sparc/predecoder.hh: Make the predecoder an object with its own switched header file. |
4181:6edaeff44647 |
13-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Replaced makeExtMI with predecode. Removed the getOpcode function from StaticInst which only made sense for Alpha. Started implementing the x86 predecoder. |
4178:136238c35e4e |
07-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add setData functions for the new Twin??_t types. |
4172:141705d83494 |
07-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
*MiscReg->*MiscRegNoEffect, *MiscRegWithEffect->*MiscReg |
4167:ce5d0f62f13b |
06-Mar-2007 |
Nathan Binkert <binkertn@umich.edu> |
Move all of the parameters of the Root SimObject so they are directly configured by python. Move stuff from root.(cc|hh) to core.(cc|hh) since it really belongs there now. In the process, simplify how ticks are used in the python code. |
4156:a4667c990e12 |
05-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Add x86 version of call to "decode" |
4149:3da926f8ea75 |
05-Mar-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Added an x86 dyninst |
4115:cc1d6df13c7d |
02-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
make ldtw(a) -- Twin 32 bit load work correctly -- by doing it the same way as the twin 64 bit loads
src/arch/isa_parser.py: src/arch/sparc/isa/decoder.isa: src/arch/sparc/isa/operands.isa: src/base/bigint.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: src/mem/packet_access.hh: make ldtw(a) Twin 32 bit load work correctly |
4111:65fffcb4fae9 |
28-Feb-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Make trap instructions always generate TrapInstruction Fault objects which call into the Process object to handle system calls. Refactored the Process objects, and move the handler code into its own file, and add some syscalls which are used in a natively compiled hello world. Software traps with trap number 3 (not syscall number 3) are supposed to cause the register windows to be flushed but are ignored right now. Finally, made uname for SPARC report a 2.6.12 kernel which is what m22-018.pool happens to be running. |
4103:785279436bdd |
03-Mar-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Implement Niagara I/O interface and rework interrupts
configs/common/FSConfig.py: Use binaries we've compiled instead of the ones that come with Legion src/arch/alpha/interrupts.hh: get rid of post(int int_type) and add a get_vec function that gets the interrupt vector for an interrupt number src/arch/sparc/asi.cc: Add AsiIsInterrupt() to AsiIsMmu() src/arch/sparc/faults.cc: src/arch/sparc/faults.hh: Add InterruptVector type src/arch/sparc/interrupts.hh: rework interrupts. They are no longer cleared when created... An I/O or ASI read/write needs to happen before they are cleared src/arch/sparc/isa_traits.hh: Add the "interrupt" trap types to isa traits src/arch/sparc/miscregfile.cc: add names for all the misc registers and possibly post an interrupt when TL is changed. src/arch/sparc/miscregfile.hh: Add a helper function to post an interrupt when pil < some set softint src/arch/sparc/regfile.cc: src/arch/sparc/regfile.hh: InterruptLevel shouldn't really live here, moved to interrupt.hh src/arch/sparc/tlb.cc: Add interrupt ASIs to TLB src/arch/sparc/ua2005.cc: Add checkSoftInt to check if a softint needs to be posted Check that a tickCompare isn't scheduled before scheduling one Post and clear interrupts on queue writes and what not src/base/bitfield.hh: Add a helper function to return the msb that is set src/cpu/base.cc: src/cpu/base.hh: get rid of post_interrupt(type) since it's no longer needed.
Add a way to see what interrupts are pending src/cpu/intr_control.cc: src/cpu/intr_control.hh: src/dev/alpha/tsunami_cchip.cc: src/python/m5/objects/IntrControl.py: Make IntrControl have a system pointer rather than using a cpu pointer to get one src/dev/sparc/SConscript: add iob to SConscript tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-atomic-dual/config.ini: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-atomic-dual/config.out: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-atomic/config.ini: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-atomic/config.out: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-timing-dual/config.ini: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-timing-dual/config.out: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-timing/config.ini: tests/quick/10.linux-boot/ref/alpha/linux/tsunami-simple-timing/config.out: tests/quick/80.netperf-stream/ref/alpha/linux/twosys-tsunami-simple-atomic/config.ini: tests/quick/80.netperf-stream/ref/alpha/linux/twosys-tsunami-simple-atomic/config.out: update config.ini/out for intrcntrl not having a cpu pointer anymore |
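The bitfield helper mentioned in the entry above ("return the msb that is set") is the kind of routine typically used to pick the highest pending interrupt level out of a bit mask. A minimal sketch of such a helper follows; the name findMsbSet and the choice to return 0 for an input of 0 are assumptions made for this illustration, not details confirmed by the changeset.

```cpp
#include <cassert>
#include <cstdint>

// Return the bit position (0 = least significant) of the most significant
// set bit in val. For val == 0 this sketch returns 0, so callers that need
// to tell "no bits set" apart from "bit 0 set" must check for zero first.
inline int
findMsbSet(uint64_t val)
{
    int msb = 0;
    // Shift right until nothing is left above bit 0, counting the shifts.
    while (val >>= 1)
        ++msb;
    return msb;
}
```

For example, a pending-interrupt mask of 0x12 (bits 1 and 4 set) yields 4, the higher of the two levels.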
4075:cc018a738853 |
18-Feb-2007 |
Nathan Binkert <binkertn@umich.edu> |
Give the progress event its own priority |
4074:f2c4afa8cd46 |
17-Feb-2007 |
Nathan Binkert <binkertn@umich.edu> |
Default to tracing being disabled in C++, it will be turned on in python. Fix the trace start code so it actually starts when it is supposed to. Make the Exec tracing stuff obey the trace enabled flag. |
4054:3d617b3be4fa |
13-Feb-2007 |
Nathan Binkert <binkertn@umich.edu> |
Merge all of the execution trace configuration stuff into the traceflags infrastructure. InstExec is now just Exec and all of the command line options are now trace options. |
4052:895ad21ffbf3 |
12-Feb-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
some forgotten commits |
4050:cf1daaef9109 |
12-Feb-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem
src/cpu/simple/atomic.cc: merge steve's changes in. |
4046:ef34b290091e |
10-Feb-2007 |
Nathan Binkert <binkertn@umich.edu> |
Clean up tracing stuff more, get rid of the trace log since it's not all that useful. Fix a few bugs with python/C++ integration. |
4040:eb894f3fc168 |
12-Feb-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
rename store conditional stuff as extra data so it can be used for conditional swaps as well Add support for a twin 64 bit int load Add Memory barrier and write barrier flags as appropriate Make atomic memory ops atomic
src/arch/alpha/isa/mem.isa: src/arch/alpha/locked_mem.hh: src/cpu/base_dyn_inst.hh: src/mem/cache/cache_blk.hh: src/mem/cache/cache_impl.hh: rename store conditional stuff as extra data so it can be used for conditional swaps as well src/arch/alpha/types.hh: src/arch/mips/types.hh: src/arch/sparc/types.hh: add a largest read data type for statically allocating read buffers in atomic simple cpu src/arch/isa_parser.py: Add support for a twin 64 bit int load src/arch/sparc/isa/decoder.isa: Make atomic memory ops atomic Add Memory barrier and write barrier flags as appropriate src/arch/sparc/isa/formats/mem/basicmem.isa: add post access code block and define a twinload format for twin loads src/arch/sparc/isa/formats/mem/blockmem.isa: remove old microcoded twin load code src/arch/sparc/isa/formats/mem/mem.isa: swap.isa replaces the code in loadstore.isa src/arch/sparc/isa/formats/mem/util.isa: add a post access code block src/arch/sparc/isa/includes.isa: need bigint.hh for Twin64_t src/arch/sparc/isa/operands.isa: add a twin 64 int type src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: add support for twinloads add support for swap and conditional swap instructions rename store conditional stuff as extra data so it can be used for conditional swaps as well src/mem/packet.cc: src/mem/packet.hh: Add support for atomic swap memory commands src/mem/packet_access.hh: Add endian conversion function for Twin64_t type src/mem/physical.cc: src/mem/physical.hh: src/mem/request.hh: Add support for atomic swap memory commands Rename sc code to extradata |
4035:f80ad98b2304 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Updates for commit. 1. Move interrupt handling to a separate function to clean up main commit() function a bit. Also gate the function call off properly based on whether or not there are outstanding interrupts, and the system is not in PAL mode. 2. Better handling of updating instruction's status bits. Instructions are not marked "atCommit" until other stages view it (pushed off to IEW/IQ), and they have been properly handled (faults). 3. Don't consider the ROB "empty" for the purpose of other stages until the ROB is empty, all stores have written back, and there were no store commits this cycle. The last is necessary in case a store committed, in which case it would look like all stores have written back but in actuality have not.
src/cpu/o3/commit.hh: Slightly modify how interrupts are handled. Also include some extra bools to keep track of state properly. src/cpu/o3/commit_impl.hh: Slightly modify how interrupts are handled. Also include some extra bools to keep track of state.
General correctness updates, most specifically for when commit broadcasts to other stages that the ROB is empty. |
4033:7bb1223f9645 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Handle status bits a little better, as well as non-speculative instructions.
src/cpu/o3/iew_impl.hh: Allow for slightly more flexible handling of non-speculative instructions. They can be other classes now, such as loads or stores.
Also be sure to clear the state associated with squashes that are not used. i.e. if a squash due to a memory ordering violation happens on the same cycle as an older branch squashing, clear the state associated with the memory ordering violation.
Lastly don't consider uncached loads to officially be "at commit" until IEW receives the signal back from commit about the load. src/cpu/o3/inst_queue_impl.hh: Don't consider non-speculative instructions to be "at commit" until the IQ has received a signal from commit about the instruction. This prevents non-speculative instructions from being issued too early. src/cpu/o3/mem_dep_unit_impl.hh: Clear instruction's ability to issue if it's replayed. |
4032:8b987a6a2afc |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Requests are handled more properly now. They assume the memory system takes control of the request upon sending out an access. 2. load-load ordering is maintained.
src/cpu/base_dyn_inst.hh: Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.
Also include some functions to allow certain status bits to be cleared. src/cpu/base_dyn_inst_impl.hh: Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out. src/cpu/o3/fetch_impl.hh: General correctness fixes. retryPkt is not necessarily always set, so handle it properly. Also consider the cache unblocked only when recvRetry is called. src/cpu/o3/lsq_unit.hh: Handle requests a little more correctly. Now that the requests aren't pointed to by the DynInst, be sure to delete the request if it's not being used by the memory system.
Also be sure to not store-load forward from an uncacheable store. src/cpu/o3/lsq_unit_impl.hh: Check to make sure load-load ordering was maintained.
Also handle requests a little more correctly. |
4031:bf191145b7c9 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Set progress_interval in terms of CPU cycles. |
4030:4046b2213995 |
23-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
A couple of minor fixes. 1. Set CPU ID in all modes for the O3 CPU. 2. Use nextCycle() function to prevent phase drift in O3 CPU. 3. Remove assertion in rename map that is no longer true.
src/cpu/o3/alpha/cpu_builder.cc: Allow for CPU id in all modes, not just full system. Also include a parameter that was left out by accident. src/cpu/o3/alpha/cpu_impl.hh: Set the CPU ID properly. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Use nextCycle() function so that the CPU does not get out of phase when starting up from quiesces. src/cpu/o3/rename_map.cc: Remove assertion that is no longer true. tests/configs/o3-timing.py: Set CPU's id to 0. |
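The phase-drift fix in the entry above relies on scheduling the CPU's next tick on a clock edge rather than at an arbitrary wake-up time. The sketch below shows the rounding idea only; the free-function form, the explicit now and period parameters, and the zero-phase assumption are all simplifications for illustration and are not the signature of the simulator's own nextCycle().

```cpp
#include <cassert>
#include <cstdint>

using Tick = uint64_t;

// Round now up to the next multiple of period, so a CPU waking from a
// quiesce at an arbitrary tick resumes aligned with its clock edges.
// A tick that already sits on an edge is returned unchanged.
inline Tick
nextCycle(Tick now, Tick period)
{
    assert(period > 0);
    Tick remainder = now % period;
    return remainder == 0 ? now : now + (period - remainder);
}
```

With a period of 500 ticks, a wake-up at tick 1250 is pushed to tick 1500, whereas naively scheduling at 1250 would leave every later cycle 250 ticks out of phase.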
4027:53292b42ee1c |
12-Feb-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Move store conditional result checking from SimpleAtomicCpu write function into Alpha ISA description. write now just generically returns a result value if the res pointer is non-null (which means we can only provide a res pointer if we expect a valid result value). |
4022:c422464ca16e |
07-Feb-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Make memory commands dense again to avoid cache stat table explosion. Created MemCmd class to wrap enum and provide handy methods to check attributes, convert to string/int, etc. |
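The idea in the entry above (a class wrapping a densely numbered enum, with attribute checks and string/int conversion) can be sketched as follows. The four commands, the attribute set, and the method names are illustrative assumptions; the actual MemCmd class defines its own command list and attributes.

```cpp
#include <cassert>
#include <string>

// Sketch of a command wrapper: the enum is dense (0..NUM_COMMANDS-1 with
// no gaps), so per-command statistics tables stay small, and attributes
// are looked up in a table indexed by the enum value.
class MemCmd
{
  public:
    enum Command { ReadReq, ReadResp, WriteReq, WriteResp, NUM_COMMANDS };

    MemCmd(Command c) : cmd(c) {}

    bool isRead() const { return info().read; }
    bool isWrite() const { return info().write; }
    bool isRequest() const { return info().request; }

    std::string toString() const { return info().name; }
    int toInt() const { return static_cast<int>(cmd); }

  private:
    struct Info
    {
        bool read;
        bool write;
        bool request;
        const char *name;
    };

    // One row per command, in enum order.
    const Info &info() const
    {
        static const Info table[NUM_COMMANDS] = {
            { true,  false, true,  "ReadReq"   },
            { true,  false, false, "ReadResp"  },
            { false, true,  true,  "WriteReq"  },
            { false, true,  false, "WriteResp" },
        };
        return table[cmd];
    }

    Command cmd;
};
```

Because the values are dense, an array of NUM_COMMANDS counters indexed by toInt() is enough for per-command statistics, which is the "table explosion" a sparse bit-flag encoding would cause.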
4011:e6899d7ca5b1 |
06-Feb-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
more fp fixes fix unaligned accesses in mmaped disk device
src/arch/sparc/isa/decoder.isa: get (ld|st)fsr ops working right. In reality the fp enable check needs to go higher up in the emitted code src/arch/sparc/isa/formats/basic.isa: move the cexec into the aexec field src/cpu/exetrace.cc: copy the exception state from legion when we get it wrong. We aren't going to get it right without an fp emulation layer src/dev/sparc/mm_disk.cc: src/dev/sparc/mm_disk.hh: fix unaligned accesses in the memory mapped disk device |
4008:ccad3906006a |
02-Feb-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix mostly floating point related
src/arch/sparc/floatregfile.cc: fix fp read/writing to registers... looking for suggestions on cleaner ways if anyone has them src/arch/sparc/isa/decoder.isa: fix some fp implementations src/arch/sparc/isa/formats/basic.isa: add new fp op class that zeroes cexec in fsr and sets rounding mode for the upcoming op src/arch/sparc/isa/includes.isa: include the appropriate header files for the rounding code src/arch/sparc/miscregfile.cc: print fsr out when it's read/written and the Sparc traceflag is on src/cpu/exetrace.cc: fix printing of float registers |
4001:5acecff20547 |
30-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
add fsr to the list of registers we are interested in |
4000:9bf49767a9e4 |
30-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Make SPARC checkpointing work
src/arch/sparc/floatregfile.cc: Fix serialization for fpreg src/arch/sparc/intregfile.cc: fix serialization for intreg src/arch/sparc/miscregfile.cc: fix serialization from miscreg src/arch/sparc/pagetable.cc: fix serialization for page table src/arch/sparc/regfile.cc: need to serialize nnpc src/arch/sparc/tlb.cc: write serialization code for tlb src/cpu/base.cc: provide a way to find the thread number a context is serialize the instruction counter src/cpu/base.hh: provide a way to find the thread number a context is and given a thread number find a context pointer src/cpu/cpuevent.hh: provide method to get thread context from a cpu event for serialization src/dev/sparc/t1000.cc: src/dev/sparc/t1000.hh: nothing to serialize in t1000 src/sim/serialize.cc: src/sim/serialize.hh: Make findObj() work (it hasn't since we did the python conversion stuff) |
3989:6ce62f2fdeb4 |
29-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix some oversights in moving windowing and ccr registers to int reg file |
3987:b9434f1d25fa |
29-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem |
3984:8f1bb70a4abf |
29-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
A minor hack to get branch prediction to behave like before on Alpha. |
3983:87619a68b7ba |
29-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed a warning about an unused variable. |
3980:9bcb2a2e9bb8 |
27-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/isa_traits.hh: src/arch/sparc/system.cc: Hand Merge |
3975:10fa2125f19e |
24-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem |
3972:2c65c89843c5 |
23-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmemo3
src/sim/byteswap.hh: Hand Merge |
3970:d54945bab95d |
03-Jan-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem |
3969:77957f66c1d5 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get non-delay slot ISAs (Alpha) working again, and pulling some debug output out of ifdefs. |
3968:0a08763926a1 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Phased out DelaySlotInfo. |
3967:1f1dff08a596 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Some fixes for decode stage branches without delay slots. This will need some work to be compatible with delay slots too. Also changed some direct variable uses to use an accessor function. |
3966:e589d0a642f5 |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure the value of PC is actually updated now that the instruction target isn't set explicitly. |
3965:b4cab77371ed |
28-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Implement a stub nnpc for alpha that is read only as npc+4. |
3962:18329efc47b8 |
20-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get MIPS_SE to compile. |
3961:42374ae36922 |
20-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get ALPHA_FS and ALPHA_SE to compile again. |
3960:1dca397b2bab |
20-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Initial work to make remote gdb available in SE mode. This is completely untested. |
3958:58d09260d073 |
18-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix a place where the wrong width parameter was used, and set the nextNPC correctly on memory squashes. |
3957:37329de528a9 |
18-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure you only handle branch delay slots specially when there actually was a branch. |
3953:300d526414e6 |
17-Dec-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Convert Alpha (and finish converting MIPS) to new InstObjParam interface.
src/arch/alpha/isa/branch.isa: src/arch/alpha/isa/fp.isa: src/arch/alpha/isa/int.isa: src/arch/alpha/isa/main.isa: src/arch/alpha/isa/mem.isa: src/arch/alpha/isa/pal.isa: src/arch/mips/isa/formats/mem.isa: src/arch/mips/isa/formats/util.isa: Get rid of CodeBlock calls to adapt to new InstObjParam interface. src/arch/isa_parser.py: Check template code for operands (in addition to snippets). src/cpu/o3/alpha/dyn_inst.hh: Add (read|write)MiscRegOperand calls to Alpha DynInst. |
3949:b6664282d899 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/arch/isa_parser.py: src/arch/sparc/isa/formats/mem/basicmem.isa: src/arch/sparc/isa/formats/mem/blockmem.isa: src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/miscregfile.cc: src/arch/sparc/miscregfile.hh: src/cpu/o3/iew_impl.hh: Hand Merge |
3945:255fad06ea71 |
28-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix comparing fp registers between legion and m5 make fp writes also chatty with the Sparc traceflag
src/arch/sparc/floatregfile.cc: make fp writes also chatty with the Sparc traceflag src/cpu/exetrace.cc: fix comparing fp registers between legion and m5 |
3940:b87f85bb4275 |
27-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
While I'm waiting for legion to run make m5 compile with a few more compilers
SConstruct: src/SConscript: Add flags for Intel CC while I'm at it src/base/compiler.hh: the _Pragma stuff needs to be called this way unless someone happens to have a cleaner way src/base/cprintf_formats.hh: add std:: where appropriate src/base/statistics.hh: use this->map since icc was getting confused about std::map vs the locally defined map src/cpu/static_inst.hh: Add some more dummy returns where needed src/mem/packet.hh: add more dummy returns where needed src/sim/host.hh: use limits to come up with max tick |
3935:ef6891f64dc8 |
26-Jan-2007 |
Lisa Hsu <hsul@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zed.eecs.umich.edu:/z/hsul/work/sparc/x86.m5 |
3931:de791fa53d04 |
26-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Make Sparc traceflag even more chatty some fixes to fp instructions to use the single precision registers if this is an fp op emit fp check code add fpregs to m5legion struct
src/arch/sparc/floatregfile.cc: Make Sparc traceflag even more chatty src/arch/sparc/isa/base.isa: add code to check if the fpu is enabled src/arch/sparc/isa/decoder.isa: some fixes to fp instructions to use the single precision registers fix smul again fix subc/subcc/subccc condition code setting src/arch/sparc/isa/formats/basic.isa: src/arch/sparc/isa/formats/mem/util.isa: if this is an fp op emit fp check code src/cpu/exetrace.cc: check fp regs as well as int regs src/cpu/m5legion_interface.h: add fpregs to m5legion struct |
3929:3640569369a5 |
25-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
fix smul and sdiv to sign extend, and handle overflow/underflow correctly Only allow writing/reading of 32 bits of Y Only allow writing/reading 32 bits of pc when pstate.am Put any loaded data on the first half of a micro-op in uReg0 so it can't overwrite the register we are using for address calculation only erase an entry from the lookup table if it's valid Put in a temporary check to make sure that lookup table and tlb array stay in sync if we are interrupted in the middle of a micro-op, reset the micropc/nexpc so we start on the first part of it when we come back
src/arch/sparc/isa/decoder.isa: fix smul and sdiv to sign extend, and handle overflow/underflow correctly Only allow writing/reading of 32 bits of Y Only allow writing/reading 32 bits of pc when pstate.am Put any loaded data on the first half of a micro-op in uReg0 so it can't overwrite the register we are using for address calculation src/arch/sparc/isa/formats/mem/blockmem.isa: Put any loaded data on the first half of a micro-op in uReg0 so it can't overwrite the register we are using for address calculation src/arch/sparc/isa/includes.isa: Use limits for 32bit underflow/overflow detection src/arch/sparc/tlb.cc: only erase an entry from the lookup table if it's valid Put in a temporary check to make sure that lookup table and tlb array stay in sync src/arch/sparc/tlb_map.hh: add a print function to dump the tlb lookup table src/cpu/simple/base.cc: if we are interrupted in the middle of a micro-op, reset the micropc/nexpc so we start on the first part of it when we come back |
3928:9486450f013f |
23-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
use pstate.am to mask off PC/NPC where it needs to be; check writability of tlb cache entry before using; update tagaccess in places I forgot to; move the tlb privileged test up since it is higher priority
src/arch/sparc/faults.cc: save only 32 bits of PC/NPC if Pstate.am is set src/arch/sparc/isa/decoder.isa: return only 32 bits of PC/NPC if Pstate.am is set increment cleanwin correctly src/arch/sparc/tlb.cc: check writability of cache entry update tagaccess in a few more places move the privileged test up since it is higher priority src/cpu/exetrace.cc: mask off upper bits of pc if pstate.am is set before comparing to legion |
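The PC/NPC masking described in the entry above amounts to keeping only the low 32 bits of an address whenever the address-mask bit of PSTATE is set. A minimal sketch, assuming a hypothetical helper name maskPc and a plain bool for the pstate.am bit:

```cpp
#include <cassert>
#include <cstdint>

// When the address-mask bit (pstate.am) is set, only the low 32 bits of
// the PC or NPC are visible; otherwise the full 64-bit value is used.
inline uint64_t
maskPc(uint64_t pc, bool addressMask)
{
    return addressMask ? (pc & 0xFFFFFFFFULL) : pc;
}
```

The same masking has to be applied on both sides of any comparison against a reference simulator, which is why the entry also touches the trace comparison code.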
3923:a8ce86366fd3 |
26-Jan-2007 |
Lisa Hsu <hsul@eecs.umich.edu> |
eliminate cpu checkInterrupts bool, it is redundant and unnecessary. |
3918:1f9a98d198e8 |
26-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
make our code a little more standards compliant; pretty close to compiling w/ Sun's compiler
briefly: add dummy return after panic()/fatal() split out flags by compiler vendor include cstring and cmath where appropriate use std namespace for string ops
SConstruct: Add code to detect compiler and choose cflags based on detected compiler Fix zlib check to work with suncc src/SConscript: split out flags by compiler vendor src/arch/sparc/isa/decoder.isa: use correct namespace for sqrt src/arch/sparc/isa/formats/basic.isa: add dummy return around panic src/arch/sparc/isa/formats/integerop.isa: use correct namespace for stringops src/arch/sparc/isa/includes.isa: include cstring and cmath where appropriate src/arch/sparc/isa_traits.hh: remove dangling comma src/arch/sparc/system.cc: dummy return to make sun cc front end happy src/arch/sparc/tlb.cc: src/base/compression/lzss_compression.cc: use std namespace for string ops src/arch/sparc/utility.hh: no reason to say something is unsigned unsigned int src/base/compression/null_compression.hh: dummy returns to for suncc front end src/base/cprintf.hh: use standard variadic argument syntax instead of gnuc specefic renaming src/base/hashmap.hh: don't need to define hash for suncc src/base/hostinfo.cc: need stdio.h for sprintf src/base/loader/object_file.cc: munmap is in std namespace not null src/base/misc.hh: use M5 generic noreturn macros use standard variadic macro __VA_ARGS__ src/base/pollevent.cc: we need file.h for file flags src/base/random.cc: mess with include files to make suncc happy src/base/remote_gdb.cc: malloc memory for function instead of having a non-constant in an array size src/base/statistics.hh: use std namespace for floor src/base/stats/text.cc: include math.h for rint (cmath won't work) src/base/time.cc: use suncc version of ctime_r src/base/time.hh: change macro to work with both gcc and suncc src/base/timebuf.hh: include cstring from memset and use std:: src/base/trace.hh: change variadic macros to be normal format src/cpu/SConscript: add dummy returns where appropriate src/cpu/activity.cc: include cstring for memset src/cpu/exetrace.hh: include cstring fro memcpy src/cpu/simple/base.hh: add dummy return for panic src/dev/baddev.cc: 
src/dev/pciconfigall.cc: src/dev/platform.cc: src/dev/sparc/t1000.cc: add dummy return where appropriate src/dev/ide_atareg.h: make define work for both gnuc and suncc src/dev/io_device.hh: add dummy returns where appropriate src/dev/pcidev.hh: src/mem/cache/cache_impl.hh: src/mem/cache/miss/blocking_buffer.cc: src/mem/cache/tags/lru.hh: src/mem/cache/tags/split.hh: src/mem/cache/tags/split_lifo.hh: src/mem/cache/tags/split_lru.hh: src/mem/dram.cc: src/mem/packet.cc: src/mem/port.cc: include cstring for string ops src/dev/sparc/mm_disk.cc: add dummy return where appropriate include cstring for string ops src/mem/cache/miss/blocking_buffer.hh: src/mem/port.hh: Add dummy return where appropriate src/mem/cache/tags/iic.cc: cast hastSets to double for log() call src/mem/physical.cc: cast pmemAddr to char* for munmap src/sim/byteswap.hh: make define work for suncc and gnuc |
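The "dummy return after panic()" pattern named in this changeset can be sketched as follows. This is an illustration only: `hypotheticalPanic` and `lookupLatency` are made-up names, not M5 functions, and the stand-in throws an exception where a real panic would abort the simulator.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for panic(): it never returns normally, but it is
// deliberately not annotated as noreturn, so a strict compiler front end
// cannot prove that control stops here.
void hypotheticalPanic(const std::string &msg)
{
    throw std::runtime_error(msg);
}

// Hypothetical helper that returns a value on its normal paths.
int lookupLatency(int kind)
{
    switch (kind) {
      case 0:
        return 1;
      case 1:
        return 4;
      default:
        hypotheticalPanic("unknown kind");
        // Dummy return: never reached, but it gives this path a return
        // statement for compilers that insist on one.
        return -1;
    }
}
```

The dummy value is never observed by callers; it exists only so that every control path of a non-void function ends in a return.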
3905:071838517e31 |
16-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
In the case that we generate a fault (e.g. a tlb miss) on a microcoded instruction set curMacroStaticInst to null This way we'll jump immediately to the handler |
3903:f005d99d790a |
16-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Fix legion lock code a bit so that if we jump out of a micro coded instruction (because of a fault on the first op) we don't lose sync with legion Only print TLB if there is a tlb difference |
3901:64319816e403 |
16-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Modify ISA and staticInst to support an IsFirstMicroOp flag Increment instruction count on first micro-op instead of last
src/arch/sparc/isa/decoder.isa: Implement a twin load for ASI_LDTX_P(0xe2) src/arch/sparc/isa/formats/mem/blockmem.isa: set the new flag IsFirstMicroOp when needed src/cpu/simple/atomic.cc: Increment instruction count on first micro-op instead of last (because if we take a fault on a micro coded instruction it should be counted twice according to legion) src/cpu/static_inst.hh: Add IsFirstMicroop flag to static insts |
3894:60a7b0a3602f |
08-Jan-2007 |
Lisa Hsu <hsul@eecs.umich.edu> |
the way i understand it, interrupt handling in m5 is a little bloated. the usage of the CPU->checkInterrupts bool is inconsistent, and i think it should eventually be phased out. For now, I've just assumed that CPU->checkInterrupts() is the way to fast path a CPU if you have no interrupts by having a simple bitfield in each ISA to determine whether interrupts are pending. getInterrupts has been mostly filled in.
src/arch/sparc/interrupts.hh: fill in how we do interrupts on sparc a little bit.
1) create a bitfield for interrupts, and check that in checkInterrupts() to fast path CPU. 2) fill in getInterrupts() a little bit.
also, update the bitfield access to be HPSTATE::hpriv, etc. src/arch/sparc/ua2005.cc: 1) update formatting 2) change the way interrupts are done to use the new way to tickle the CPU. src/cpu/base.cc: src/cpu/base.hh: overload the post_interrupt function for SPARC interrupts - which are only denoted by a single int value. |
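A minimal sketch of the bitfield fast path described in this changeset, under stated assumptions: the class name, the 64-level width, and the "highest level wins" priority rule are all hypothetical and are not taken from src/arch/sparc/interrupts.hh.

```cpp
#include <cstdint>

// Hypothetical per-ISA interrupt state; names are illustrative, not M5's.
class PendingInterrupts
{
    uint64_t pending = 0;   // one bit per interrupt level (assumed layout)

  public:
    void post(int level)  { pending |= (uint64_t(1) << level); }
    void clear(int level) { pending &= ~(uint64_t(1) << level); }

    // Fast path: one comparison tells the CPU whether there is any work.
    bool checkInterrupts() const { return pending != 0; }

    // Return the highest pending level, or -1 if nothing is pending.
    // Treating the highest level as most urgent is an assumption.
    int getInterrupt() const
    {
        for (int level = 63; level >= 0; --level)
            if (pending & (uint64_t(1) << level))
                return level;
        return -1;
    }
};
```

The point of the design is that a CPU with nothing pending pays for a single test per check rather than walking per-source state.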
3892:a08303ba86f8 |
08-Jan-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
change when legion-lock causes the simulation to die. It now happens after two consecutive differences, since we compare stuff at slightly different times: interrupts are seen the cycle before they happen in m5, so the pc gets changed early. |
3886:d55c97419444 |
03-Jan-2007 |
Nathan Binkert <binkertn@umich.edu> |
Formatting |
3884:cc52005408ef |
30-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up previous commit to proper logic.
src/cpu/o3/commit_impl.hh: Oops, changed the logic a little bit. Fix it up to how it used to be. |
3880:06fc2b8ca95f |
27-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Compare legion and m5 tlbs for differences Only print faults instructions that aren't traps or faulting loads
src/cpu/exetrace.cc: Compare the legion and m5 tlbs and printout any differences Only show differences if the instruction isn't a trap and isn't a memory operation that changes the trap level (a fault) src/cpu/m5legion_interface.h: update the m5<->legion interface to add tlb data |
3876:127c71cfe21a |
26-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove some #if FULL_SYSTEMs so MP stuff works even in SE mode. |
3870:fc7a16797788 |
22-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
style |
3867:807483cfab77 |
21-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
don't use (*activeThreads).begin(), use activeThreads->blah(). Also don't call (*activeThreads).end() over and over. Just call activeThreads->end() once and save the result. Make sure we always check that there are elements in the list before we grab the first one. |
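The iteration style this changeset asks for might look like the sketch below. The function names and the use of plain `unsigned` thread IDs are assumptions made for illustration; only the three habits (arrow syntax, a cached `end()`, an emptiness check before taking the first element) come from the commit message.

```cpp
#include <list>

// Sum the thread IDs in the list, using -> rather than (*ptr). and
// evaluating end() exactly once.
unsigned sumActiveThreads(const std::list<unsigned> *activeThreads)
{
    unsigned total = 0;
    std::list<unsigned>::const_iterator end = activeThreads->end();
    for (std::list<unsigned>::const_iterator it = activeThreads->begin();
         it != end; ++it) {
        total += *it;
    }
    return total;
}

// Return the first active thread, or -1 when the list is empty, so the
// caller never dereferences begin() on an empty list.
int firstActiveThread(const std::list<unsigned> *activeThreads)
{
    if (activeThreads->empty())
        return -1;
    return static_cast<int>(activeThreads->front());
}
```

Caching `end()` is only safe while the loop body does not erase from or insert into the list in a way that invalidates the saved iterator.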
3863:adf3ddd4bcde |
19-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
fix twinx loads a little bit bugfixes and demap implementation in tlb ignore some more differences for one cycle
src/arch/sparc/isa/formats/mem/blockmem.isa: twinx has 2 micro-ops src/arch/sparc/isa/formats/mem/util.isa: fix the fault check for twinx src/arch/sparc/tlb.cc: tlb bugfixes and write demapping code src/cpu/exetrace.cc: don't halt on a couple more instructions (ldx, stx) when things differ because of the way tlb faults are handled in legion. |
3859:9278f759e55c |
21-Dec-2006 |
Nathan Binkert <binkertn@umich.edu> |
<scold> Make sure that variables are always initalized! </scold> |
3846:a0fe3210ce53 |
15-Dec-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
little fixes i noticed while searching for reason for address range issues (but these weren't the cause of the problem).
RangeSize as a function takes a start address, and a SIZE, and will make the range (start, start+size-1) for you.
src/cpu/memtest/memtest.hh: src/cpu/o3/fetch.hh: src/cpu/o3/lsq.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/simple/atomic.hh: src/cpu/simple/timing.hh: Fix RangeSize arguments src/dev/alpha/tsunami_cchip.cc: src/dev/alpha/tsunami_io.cc: src/dev/alpha/tsunami_pchip.cc: src/dev/baddev.cc: pioSize indicates SIZE, not a mask |
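To make the start/size convention above concrete, here is a small sketch. `AddrRange` and `rangeSize` are hypothetical stand-ins written for this illustration; they are not M5's Range template or its RangeSize function, only the arithmetic (start, start+size-1) is taken from the commit message.

```cpp
#include <cstdint>

typedef uint64_t Addr;

// Hypothetical inclusive address range.
struct AddrRange
{
    Addr start;
    Addr end;   // inclusive last address
};

// Build a range from a start address and a SIZE in bytes. Passing an end
// address or a mask as the second argument is the mistake being fixed.
AddrRange rangeSize(Addr start, Addr size)
{
    AddrRange r = { start, start + size - 1 };
    return r;
}
```

For example, a device at 0x1000 with a 0x100-byte window covers 0x1000 through 0x10ff.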
3840:5f8deb240569 |
15-Dec-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
some small general fixes to make everything work nicely with other ISAs, now we can merge back with newmem. exetrace.cc: wrap this variable between FULL_SYSTEM #ifs mmaped_ipr.hh: fix for build miscregfile.cc: fixes for HPSTATE access during SE mode
src/arch/sparc/miscregfile.cc: fixes for HPSTATE access during SE mode src/arch/mips/mmaped_ipr.hh: fix for build src/cpu/exetrace.cc: wrap this variable between FULL_SYSTEM #ifs |
3832:49c95a73e29c |
12-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Fix bugs in tlbmap (and thus rangemap since the code is nearly identical) Deal with block initializing stores (by doing nothing, at some point we might want to do the write hint 64 like thing) Fix tcc instruction ignore in legion-lock stuff to be correct in all cases Have console interrupts warn rather than panicking until we figure out what to do with interrupts
src/arch/sparc/miscregfile.cc: src/arch/sparc/miscregfile.hh: add a magic miscreg which reads all the bits the tlb needs in one go src/arch/sparc/tlb.cc: initialized the context type and id to reasonable values and handle block init stores src/arch/sparc/tlb_map.hh: fix bug in tlb map code src/base/range_map.hh: fix bug in rangemap code and add range_multimap (these are probably useful for bus range stuff) src/cpu/exetrace.cc: fixup tcc ignore code to be correct src/dev/sparc/t1000.cc: make console interrupt stuff warn instead of panicking until we get interrupt stuff figured out src/unittest/rangemaptest.cc: fix up the rangemap unit test to catch the missing case |
3826:e35adf01a285 |
09-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Allocate the correct number of global registers Fix fault formatting and code for traps fix a couple of bugs in the decoder Cleanup/fix page table entry code Implement more mmaped iprs, fix numbered tlb insertion code, add function to dump tlb contents Don't panic if we differ from legion on a tcc instruction because of where legion prints its data and where we print our data
src/arch/sparc/faults.cc: Fix fault formatting and code for traps src/arch/sparc/intregfile.hh: allocate the correct number of global registers src/arch/sparc/isa/decoder.isa: fix a couple of bugs in the decoder: wrasi should write asi not ccr, done/retry should get hpstate from htstate src/arch/sparc/pagetable.hh: cleanup/fix page table code src/arch/sparc/tlb.cc: implement more mmaped iprs, fix numbered insertion code, add function to dump tlb contents src/arch/sparc/tlb.hh: add functions to write TagAccess register on tlb miss and to dump all tlb entries for debugging src/cpu/exetrace.cc: dump tlb entries on error, don't consider differences the cycle we take a trap to be bad. |
3825:9b5e6c4d3ecb |
07-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
get legion/m5 to first tlb miss fault
src/arch/sparc/asi.cc: src/arch/sparc/asi.hh: add sparc error asi src/arch/sparc/faults.cc: put a panic in if TL == MaxTL src/arch/sparc/isa/decoder.isa: Hpstate needs to be updated on a done too src/arch/sparc/miscregfile.cc: warn instead of panicking on fprs/fsr accesses src/arch/sparc/tlb.cc: add sparc error register code that just does nothing fix a couple of other tlb bugs src/arch/sparc/ua2005.cc: fix implementation of HPSTATE write src/cpu/exetrace.cc: let exetrace mess up a couple of times before dying src/python/m5/objects/T1000.py: add l2 error status register fake devices |
3817:7df12d77afc2 |
04-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
reorganize code to split off FS only misc regs with effect into their own file (reducing the number of if FULL_SYSTEM defines and includes) Protect other pieces of code so that sparc compiles SE again
src/arch/sparc/SConscript: Add ua2005.cc back into SConscript src/arch/sparc/miscregfile.hh: add functions that deal with priv registers so we don't have to have a bunch of if defs and other ugliness src/arch/sparc/mmaped_ipr.hh: wrap handleIpr* with if full_system so it compiles under se src/arch/sparc/ua2005.cc: reorganize edit fs only miscreg functions src/cpu/exetrace.cc: protect legion code so it doesn't try to compile under se |
3815:2a2d5311b66e |
04-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Legion actually writes to tl-1 in the data structure, so we need to compare correctly |
3814:33bd4ec9d66a |
04-Dec-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
More changes to get SPARC fs closer. Now at 1.2M cycles before difference
configs/common/FSConfig.py: separate the hypervisor memory and the guest0 memory. In reality we're going to need a better way to do this at some point. Perhaps auto generating the hv-desc image based on the specified config. src/arch/sparc/isa/decoder.isa: change reads/writes to the [hs]tick(cmpr) registers to use readmiscregwitheffect src/arch/sparc/miscregfile.cc: For Niagara stick and tick are aliased to one value (if we end up doing mps we might not want this). Use instruction count from cpu rather than cycles because that is what legion does we can change it back after we're done with legion src/base/bitfield.hh: add a new function mbits() that just masks off bits of interest but doesn't shift src/cpu/base.cc: src/cpu/base.hh: add instruction count to cpu src/cpu/exetrace.cc: src/cpu/m5legion_interface.h: compare instruction count between legion and m5 too src/cpu/simple/atomic.cc: change asserts of packet success to if panics wrapped with NDEBUG defines so we can get some more useful information when we have a bad address src/dev/isa_fake.cc: src/dev/isa_fake.hh: src/python/m5/objects/Device.py: expand isa fake a bit more having data for each size request, the ability to have writes update the data and to warn on accesses src/python/m5/objects/System.py: convert some tabs to spaces src/python/m5/objects/T1000.py: add more fake devices for each l1 bank and each memory controller |
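The difference between extracting a bit field and masking it in place, which is what the mbits() entry above describes, can be sketched as follows. The helper names `lowMask`, `bitsShifted`, and `maskedBits` are hypothetical and chosen for this illustration; they are not claimed to match the code in src/base/bitfield.hh.

```cpp
#include <cstdint>

// Mask with the low nbits bits set; nbits >= 64 is handled separately to
// avoid an undefined 64-bit shift.
inline uint64_t lowMask(int nbits)
{
    return nbits >= 64 ? ~uint64_t(0) : (uint64_t(1) << nbits) - 1;
}

// Extract bits [first:last] and shift them down so the field starts at bit 0.
inline uint64_t bitsShifted(uint64_t val, int first, int last)
{
    return (val >> last) & lowMask(first - last + 1);
}

// Keep bits [first:last] where they are and clear everything else (no shift).
inline uint64_t maskedBits(uint64_t val, int first, int last)
{
    return val & (lowMask(first + 1) & ~lowMask(last));
}
```

With `val = 0xABCD`, bits 11 down to 8 come out as `0xB` from the shifting helper and as `0xB00` from the masking helper.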
3807:1455bc719432 |
29-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem |
3806:65ae5388c059 |
29-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Add support for mmapped iprs to atomic cpu
src/arch/SConscript: add mmaped_ipr.hh to switch headers src/arch/sparc/asi.hh: make ASI_IMPLICT=0 so by default nothing needs to be done src/arch/sparc/miscregfile.hh: miscregfile no longer needs to include asi.hh src/arch/sparc/tlb.cc: src/arch/sparc/tlb.hh: implement panic instructions for mmaped ipr reads src/cpu/simple/atomic.cc: add check for mmaped iprs and handle them if it exists src/mem/request.hh: allocate space in the flags for mmaped iprs. Put it in the first 8 bits so that by default it's fast. Move the other flags up 8 bits |
3803:031d9d1b3924 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Switch the endianness of data that's forwarded. This is the same sort of problem that was happening when stores went all the way to memory and back. |
3802:e8f55dfb0f56 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make fetch detect when a branch is happening, rather than trying to compute when. |
3801:5ea378e2bccd |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Accidentally "cleaned" away the NPC parameter to the constructor. |
3800:31469c190b22 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Don't have "predict" set the predicted target of the instruction. Do that explicitly when you use predict. |
3798:ec59feae527b |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Add in capability to return to unblocking after a squash. This is needed because if you don't squash -all- the instructions, you need to keep clearing out whatever is left in the skid buffer. |
3797:9b58fa5ccaf5 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make sure endian conversion is done on the memory data when it's just set to an existing buffer. |
3796:9cb1eaf3a461 |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make the decoder use the new setup in the dyninsts for branch prediction. |
3795:60ecc96c3cee |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Made branch delay slots get squashed, and passed back an NPC and NNPC to start fetching from. |
3794:647d6bb9539a |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added a predicted NPC field, explicitly stored whether the instruction was predicted taken or not. |
3792:dae368e56d0e |
16-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Changes to the isa_parser and affected files to fix an indexing problem with split execute instructions and miscregs aliasing with integer registers.
src/arch/isa_parser.py: Rearranged things so that classes with more than one execute function treat operands properly. 1. Eliminated the CodeBlock class 2. Created a SubOperandList 3. Redefined how InstObjParams is constructed
To define an InstObjParam, you can either pass in a single code literal which will be named "code", or you can pass in a dictionary of code snippets which will be substituted into the Templates. In order to get this to work, there is a new restriction that each template has only one function in it. These changes should only affect memory instructions which have regular and split execute functions.
Also changed the MiscRegs so that they use the instruction's srcReg and destReg arrays. src/arch/sparc/isa/formats/basic.isa: src/arch/sparc/isa/formats/branch.isa: src/arch/sparc/isa/formats/integerop.isa: src/arch/sparc/isa/formats/mem/basicmem.isa: src/arch/sparc/isa/formats/mem/blockmem.isa: src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/isa/formats/nop.isa: src/arch/sparc/isa/formats/priv.isa: src/arch/sparc/isa/formats/trap.isa: Rearranged to work with new InstObjParam scheme. src/cpu/o3/sparc/dyn_inst.hh: Added functions to access the miscregs using the indexes from the instruction's srcReg and destReg arrays. Also changed the names of the other accessors so that they have the suffix "Operand" if they use those arrays. src/cpu/simple/base.hh: Added functions to access the miscregs using the indexes from the instruction's srcReg and destReg arrays. |
3791:f1783bae1afe |
12-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem/ into zower.eecs.umich.edu:/eecshome/m5/newmem |
3790:f9a7fc567aa3 |
07-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed to take into account the misc regs that became int regs. |
3789:9ce219516b5d |
07-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Compilation fixes |
3788:5c804ea5cc48 |
07-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix for squashing during a serializing instruction. |
3785:e863df7f4630 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Use the renamed register index, rather than the flattened one. |
3784:edc6cff4cbc1 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of some typedefs and moved the tlbs into the base o3 cpu. |
3783:cd831e0ab049 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Use the setSyscallReturn defined in arch rather than duplicating it here. |
3782:6a52c6c1b8b4 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Moved the RegIdx arrays to the base dyninst. |
3781:b00795985f07 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of some typedefs, moved the tlbs to the base o3 cpu, and called the architecture defined setSyscallReturn function instead of a duplicate copy.
src/cpu/o3/alpha/cpu.hh: Got rid of some typedefs, and moved the tlbs to the base o3 cpu. src/cpu/o3/alpha/thread_context.hh: src/cpu/o3/cpu.cc: Moved the tlbs to the base o3 cpu. |
3778:ac52cbef744c |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zower.eecs.umich.edu:/eecshome/m5/newmem
src/cpu/o3/commit_impl.hh: Hand Merge |
3777:2a232a230370 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added a DPRINTF to print out the actual value pulled from memory. |
3776:4f88e76d8ebe |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Flattening and syscallReturn fixes
src/cpu/o3/thread_context_impl.hh: Use flattened indices src/cpu/simple_thread.hh: Use flattened indices, and pass a thread context to setSyscallReturn rather than a register file. src/cpu/thread_context.hh: The SyscallReturn class is no longer in arch/syscallreturn.hh |
3775:ced38affb6b1 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Don't panic, but this needs to be fixed. |
3774:13180c61fe86 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Make syscalls flatten their register indices, and also call into the ISA's setSyscallReturn function rather than having a duplicated one. |
3773:61c53465193d |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change rename to rename the flattened register index instead of the architectural one. |
3772:71cccab4eff8 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added in endianness conversion on memory accesses as the data goes out. This will break the checker! |
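For readers unfamiliar with the operation, the core of an endianness conversion is a byte reversal like the sketch below. `swapBytes64` is a hypothetical helper written for illustration; the commit does not say which conversion routines the O3 code calls or when a swap is actually needed, which depends on guest and host byte order.

```cpp
#include <cstdint>

// Reverse the byte order of a 64-bit value, one byte per iteration.
inline uint64_t swapBytes64(uint64_t x)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | (x & 0xff);   // append the lowest byte of x
        x >>= 8;                               // move on to the next byte
    }
    return result;
}
```

Applying the swap twice returns the original value, which is why a conversion applied on the way out must be matched on the way back in.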
3771:808a4c19cf34 |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change how optional delay slot instructions are detected and squashed. |
3770:422aa205500a |
06-Dec-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Get rid of some typedefs which were hardly used, and move some stuff back here that shouldn't be in the architecture specific DynInst classes. |
3760:a4fadb8ef046 |
24-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Initial changes to get O3 working with SPARC
src/arch/sparc/process.cc: MachineBytes doesn't exist any more. src/arch/sparc/regfile.cc: Add in the miscRegFile for good measure. src/cpu/o3/isa_specific.hh: Add in a section for SPARC src/cpu/o3/sparc/cpu.cc: src/cpu/o3/sparc/cpu.hh: src/cpu/o3/sparc/cpu_builder.cc: src/cpu/o3/sparc/cpu_impl.hh: src/cpu/o3/sparc/dyn_inst.cc: src/cpu/o3/sparc/dyn_inst.hh: src/cpu/o3/sparc/dyn_inst_impl.hh: src/cpu/o3/sparc/impl.hh: src/cpu/o3/sparc/params.hh: src/cpu/o3/sparc/thread_context.cc: src/cpu/o3/sparc/thread_context.hh: Sparc version of this file. |
3754:2552dda24372 |
23-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Use the right constant. |
3748:35d3c2e37b58 |
20-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Add in checks of more Legion based state, and put in more sophisticated formatting functions. |
3743:2061715f68d1 |
16-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes for SPARC_FS
configs/common/FSConfig.py: Make a SPARC system create an IO bus. src/python/m5/objects/T1000.py: Create a T1000 platform src/arch/sparc/miscregfile.cc: Initialize the strand status register to the value legion provides. src/cpu/exetrace.cc: Truncate an ExtMachInst to a MachInst before comparing with Legion. |
3735:86a7cf4dcc11 |
12-Dec-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Rename the StaticInst-based (read|set)(Int|Float)Reg methods to (read|set)(Int|Float)RegOperand to distinguish from non-StaticInst version. |
3733:2e34561f1eba |
12-Dec-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Get rid of unused lock code. |
3732:e84a6e9ebd3d |
12-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Allow for multiple redirects to happen on a single cycle (only the one for the oldest instruction is passed on to commit).
This fixes a minor bug when multiple FU completions come back out of order (due to the order in which the FUs are freed up), and the oldest redirect isn't recorded properly. The eon benchmark should run now.
src/cpu/o3/iew_impl.hh: Allow for multiple redirects to happen on a single cycle (only the one for the oldest instruction is passed on to commit). |
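The "keep only the oldest redirect" rule above can be sketched as a small slot that is offered every redirect seen in a cycle. Everything here is hypothetical: the `RedirectSlot` type, its fields, and the assumption that a smaller sequence number means an older instruction are illustrative and are not taken from iew_impl.hh.

```cpp
#include <cstdint>

typedef uint64_t InstSeqNum;

// Hypothetical record of the one redirect that will be passed on to commit.
struct RedirectSlot
{
    bool valid;
    InstSeqNum seqNum;
    uint64_t targetPC;

    RedirectSlot() : valid(false), seqNum(0), targetPC(0) {}

    // Accept a redirect only if none is recorded yet or this one belongs to
    // an older instruction (assumed: smaller sequence number).
    void offer(InstSeqNum sn, uint64_t pc)
    {
        if (!valid || sn < seqNum) {
            valid = true;
            seqNum = sn;
            targetPC = pc;
        }
    }
};
```

Because the comparison is on sequence number rather than arrival order, the result is the same no matter which functional unit reports first.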
3731:4cd483eb6f16 |
11-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up in case a req hasn't yet been generated for this instruction (if there was a fault prior to translation). |
3730:6ccb47795cd5 |
11-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for fetch to use the icache's block size to generate proper access size. |
3708:b174ae14f007 |
06-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for MIPS_SE/m5.fast compile. |
3698:0aa0884a9040 |
02-Dec-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for MIPS_SE compiling. Regressions seem to work, but Korey should make sure these changes (commit especially) work okay.
src/cpu/o3/commit_impl.hh: src/cpu/o3/fetch_impl.hh: Fixes for MIPS_SE compile. |
3686:fa8d8b90cd8a |
29-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change the connecting of the physPort and virtPort to the memory object below the CPU to happen every time activateContext is called. The overhead is probably a little higher than necessary, but allows these connections to properly be made when there are CPUs that are inactive until they are switched in.
Right now this introduces a minor memory leak as old physPorts and virtPorts are not deleted when new ones are created. A flyspray task has been created for this issue. It can not be resolved until we determine how the bus will handle giving out ID's to functional ports that may be deleted.
src/cpu/o3/cpu.cc: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Change the setup of the physPort and virtPort to instead happen every time the CPU has a context activated. This is a little high overhead, but keeps it working correctly when the CPU does not have a physical memory attached to it until it switches in (like the case of switch CPUs). src/cpu/o3/thread_context.hh: Change function from being called at init() to just being called whenever the memory ports need to be connected. src/cpu/o3/thread_context_impl.hh: Update this to not delete the port if it's the same as the virtPort. src/cpu/thread_context.hh: Change function from being called at init() to whenever the memory ports need to be connected. src/cpu/thread_state.cc: Instead of initializing the ports, simply connect them, deleting any old ports that might exist. This allows these functions to be called multiple times. src/cpu/thread_state.hh: Ports are no longer initialized, but rather connected at context activation time. |
3675:dc883b610345 |
19-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Update Virtual and Physical ports.
src/cpu/o3/alpha/cpu_impl.hh: Handle the PhysicalPort and VirtualPort in the ThreadState. src/cpu/o3/cpu.cc: Initialize the thread context. src/cpu/o3/thread_context.hh: Add new function to initialize thread context. src/cpu/o3/thread_context_impl.hh: Use code now put into function. src/cpu/simple_thread.cc: Move code to ThreadState and use the new helper function. src/cpu/simple_thread.hh: Remove init() in this derived class; use init() from ThreadState base class. src/cpu/thread_state.cc: Move setting up of Physical and Virtual ports here. Change getMemFuncPort() to connectToMemFunc(), which connects a port to a functional port of the memory object below the CPU. src/cpu/thread_state.hh: Update functions. |
3673:34386ba8cb41 |
17-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make an initialization pass for the thread context and set the [phys,virt]Port correctly
src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Call the thread context initialization |
3667:1d57100f8bf0 |
14-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Merge zizzer:/bk/newmem into zazzer.eecs.umich.edu:/z/rdreslin/m5bk/newmemcleanest |
3661:efc80a01aeb6 |
14-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make CPUs capable of having a phase shift |
3658:f0a7030c6bd9 |
14-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Various fixes to delete packet and request a little better.
src/cpu/simple/timing.cc: Various updates for deleting requests more properly.
The major change is moving the deletion of the fetch request/packet to after the instruction has executed and completed. This should fix a few bugs because Ron's memory system didn't expect a call for a functional access while a timing access was being processed. |
3649:0569961a87fc |
13-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Changes needed for a bus from CPU->L1
src/cpu/simple/atomic.cc: Make the atomic cpu return 0 on snoops. |
3647:8121d4503cbc |
13-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make CPU models signal to update the snoop ranges |
3640:3a2f7b451641 |
13-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
More interrupt reworking. |
3639:251dfe00c03d |
13-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change warn to DPRINTF. |
3637:4c7735f477a1 |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix typo. |
3636:bc107a8b4e31 |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for regression failure.
src/cpu/o3/fetch_impl.hh: Fetch needs to make sure it isn't waiting on an Icache access. |
3635:8f3b67d2accd |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:./local/clean/tmp/test-regress into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix |
3634:7e9abbddf9da |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for non-FS compile. |
3633:524f2aadbc89 |
12-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to support new interrupt processing and removal of PcPAL.
src/arch/alpha/interrupts.hh: No need for this now that the ThreadContext is being used to set these IPRs in interrupts. Also split up the interrupt checking from the updating of the IPL and interrupt summary. src/arch/alpha/tlb.cc: Check the PC for whether or not it's in PAL mode, not the addr. src/cpu/o3/alpha/cpu.hh: Split up getting the interrupt from actually processing the interrupt. src/cpu/o3/alpha/cpu_impl.hh: Split up the processing of interrupts. src/cpu/o3/commit_impl.hh: Update for ISA-oriented interrupt changes. src/cpu/o3/fetch_impl.hh: Fix broken if statement from PcPAL updates, and properly populate the request fields.
Also more debugging output. src/cpu/ozone/cpu_impl.hh: Updates for ISA-oriented interrupt stuff. src/cpu/ozone/front_end_impl.hh: Populate request fields properly. src/cpu/simple/base.cc: Update for interrupt stuff. |
3617:384e3b1eae06 |
11-Nov-2006 |
Nathan Binkert <binkertn@umich.edu> |
Get rid of the ParamContext for pseudo instructions and move the parameters to the BaseCPU object. |
3615:ea748987af03 |
11-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
The Lock_Flag_DepTag went away earlier, and using TheISA gives the false impression that this code is ISA independent. |
3614:70e12b0fe41e |
11-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Certain header files should only be used in FS.
src/arch/alpha/faults.hh: Only use pagetable.hh in FS src/arch/alpha/pagetable.hh: pagetable.hh should only be included in FS, so protecting it internally should be unnecessary. src/cpu/exetrace.cc: Only use tlb.hh in FS |
3603:714467743f9b |
10-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
fix endian issues with condition codes use memcpy instead of bcopy s/u_int32_t/uint32_t/g fixup endian code to work with solaris hack to make sure htole() works... Nate, have a good idea to fix this?
src/arch/sparc/faults.cc: set the reset address to be 40 bits. Makes PC printing easier at least for now. src/arch/sparc/isa/base.isa: fix endian issues with condition codes src/arch/sparc/tlb.hh: add implemented physical address constants src/arch/sparc/utility.hh: add tlb.hh to utilities src/base/loader/raw_object.cc: add a symbol <filename>_start to the symbol table for binary files src/base/remote_gdb.cc: use memcpy instead of bcopy src/cpu/exetrace.cc: clean up printing a bit more src/cpu/m5legion_interface.h: add tons to the shared interface src/dev/ethertap.cc: s/u_int32_t/uint32_t/g src/dev/ide_atareg.h: fixup endian code to work with solaris src/dev/pcidev.cc: src/sim/param.hh: hack to make sure htole() works... |
3594:e401993e0cbb |
10-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem |
3588:e4ce301f8c7d |
10-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change exetrace code for working with my trace tool to use stream io rather than sprintf which was breaking on 64 bit hosts. |
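One way stream I/O sidesteps width problems is that `operator<<` is selected by the argument's type, so there is no format specifier that can disagree with the width of `long` on a given host. The sketch below shows that idea only; `formatReg` is a hypothetical helper, and the commit does not state which sprintf call failed or why.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Format a 64-bit value as zero-padded, 16-digit hex using stream I/O.
std::string formatReg(uint64_t value)
{
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0') << std::setw(16) << value;
    return out.str();
}
```

The same code produces the same text whether `uint64_t` is `unsigned long` or `unsigned long long` on the host.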
3584:8c3cdb2c001c |
09-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Get SPARC to the point that it starts running. Add ability to load the ROM bin files, cleanup lockstep printing a bit Since we don't have a platform yet, you need to comment out the default responder stuff in Bus.py to make it work.
SConstruct: Add TARGET_ISA to the list of environment variables that end up in the build_env for python configs/common/FSConfig.py: add a simple SPARC system to begin testing with, you'll need to change makeLinuxAlphaSystem to makeSparcSystem in fs.py for now src/SConscript: add a raw file object, at least until we get more info about how to compile openboot properly src/arch/sparc/system.cc: src/arch/sparc/system.hh: add parameters for ROM files (OBP/Reset/Hypervisor), a ROM, load files into ROM src/base/loader/object_file.cc: src/base/loader/object_file.hh: add option to try raw when nothing works src/cpu/exetrace.cc: cleanup lockstep printing a little bit src/cpu/m5legion_interface.h: change the instruction to be 32 bits because it is src/mem/physical.cc: fix assert that doesn't work if memory starts somewhere above 0 src/python/m5/objects/BaseCPU.py: Add if statement to choose between sparc tlbs and alpha tlbs src/python/m5/objects/System.py: Add a sparc system that sets the rom addresses correctly src/python/m5/params.py: add the ability to add Addr() together |
3581:42242aef2724 |
08-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem/ into zeep.eecs.umich.edu:/home/gblack/m5/newmemmemops |
3577:605c370622b1 |
08-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Move the check to see if you're in user mode into the isa directory. |
3570:aacc19068f25 |
08-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Put the ProcessInfo and StackTrace objects into the ISA namespaces. |
3565:6ad587fb7dfd |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Put kernel_stats back into arch. |
3562:91ca0152382f |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Only include kern/kernel_stats.hh if in full system. This was breaking MIPS_SE |
3555:5390475618b1 |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Added sim/host.hh for the Addr type. |
3554:0ec75c89bd8b |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of a stray blank line. |
3548:85e64c82c522 |
07-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Moved the switched version of kernel_stats.hh back to kern, and moved the base kernel_stats to base_kernel_stats |
3541:d74340b852f6 |
06-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem/ into zeep.eecs.umich.edu:/home/gblack/m5/newmemmemops
src/SConscript: SCCS merged |
3536:89aa06409e4d |
06-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Remote GDB support has been changed to use inheritance. Alpha should work, but isn't tested. Other architectures will not. |
3530:149c119b67a2 |
03-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
The tc needs to be protected instead of private so that the CpuEventWrapper can access it. |
3521:0b0b3551def0 |
03-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Got rid of "inPalMode". Some places are still effectively checking if they are in PAL mode, however. |
3520:4f4a2054fd85 |
03-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Add a new file which describes an ISA's interrupt handling mechanism. It records when interrupts are requested, and returns an interrupt to execute if the |
3512:cefe7f965104 |
09-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Draining fixes.
src/cpu/o3/cpu.cc: Handle draining properly when CPU isn't actually being used.
src/cpu/simple/atomic.cc: Be sure to set status properly when draining.
src/mem/bus.cc: Fix for draining. |
3506:99f86646ba5c |
07-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
add code to operate in lockstep with legion
src/python/m5/main.py: add option to operate in lockstep with legion |
3501:85f5edc51d97 |
07-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix compile error. |
3500:8d5e32b3bc2e |
07-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Initialize mem dep unit properly.
src/cpu/o3/mem_dep_unit_impl.hh: Initialize mem dep unit properly, add debug output. |
3495:884bf1f0c0c9 |
06-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Clean up clock phase drift code a bit.
src/cpu/base.cc: Move clock phase drift code to the base CPU so that any CPU model can use it.
src/cpu/base.hh: Added two functions to help get the next cycle the CPU should be scheduled.
src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Use the function now in BaseCPU. |
3492:20b28fd2cab5 |
05-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Initialize pointer to NULL.
src/cpu/o3/lsq_unit_impl.hh: Be sure to initialize pointer to NULL. |
3490:37a313c96683 |
02-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix |
3486:11b71489efd6 |
02-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
More proper handling of the ports.
src/cpu/simple_thread.cc: Fix up port handling to share code.
src/cpu/thread_state.cc: Separate code off into a function.
src/cpu/thread_state.hh: Make a separate function that will get the CPU's memory's functional port. |
3485:ac89a047e5b6 |
02-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove function that should have been deleted.
src/cpu/simple_thread.cc: This function should have been deleted from an earlier push.
src/cpu/simple_thread.hh: Delete this function; it's now in thread_state.hh/.cc. |
3484:9b7ac1654430 |
02-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Use ISA specific makeExtMI.
src/arch/alpha/utility.hh: For now makeExtMI will be specific to the ISA. |
3483:edede8473667 |
04-Nov-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
fixes so that M5 will compile under solaris
SConstruct: Add check to see if we need to include libsocket
src/arch/sparc/floatregfile.cc: src/arch/sparc/intregfile.cc: use memset rather than bzero and include the appropriate header file
src/base/pollevent.cc: If we're compiling under solaris we need sys/file.h
src/base/random.cc: src/base/random.hh: solaris doesn't have random(), so use rint with the correct rounding mode if we're compiling on solaris
src/base/stats/flags.hh: u_int32_t??
src/base/time.hh: grab the timersub() define from freebsd since it doesn't exist in solaris
src/cpu/inst_seq.hh: we don't need to include stdint here
src/sim/byteswap.hh: the method to detect endianness on Solaris is a little more complex... |
3479:4fbcaa81d105 |
01-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem/ into zeep.eecs.umich.edu:/home/gblack/m5/newmemmemops |
3476:0e26b5458236 |
31-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix
configs/example/fs.py: configs/example/se.py: src/mem/tport.hh: Hand merge. |
3473:852a0bb230da |
10-Nov-2006 |
Kevin Lim <ktlim@umich.edu> |
Change up some warnings to DPRINTFs. |
3468:cf23ad1ceef2 |
01-Nov-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Adjustments for the AlphaTLB changing to AlphaISA::TLB and changing register file functions to not take faults |
3456:94ba6265a8cf |
31-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Missed a few instances of this function. |
3454:26850ac19a39 |
31-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Move IntrFlag into the MiscRegFile and get rid of specialized accessor functions. |
3453:c3ce58882751 |
31-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Put the Alpha tlb stuff into the AlphaISA namespace, and give the classes more neutral names. |
3442:e52c3470e7ef |
29-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
An attempt to serialize the state of the micro code mechanism in the simple cpu.
src/cpu/simple/base.cc: Make a microcoded op start at the current micropc, rather than starting at 0.
src/cpu/thread_state.cc: Serialize the microPC and nextMicroPC |
3435:0830790f937c |
28-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
This one really needs to be arch/faults.hh |
3434:995544da4746 |
28-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Include the right version of faults.hh |
3432:0bd71e26a332 |
28-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
One last adjustment to get rid of skew in the simple atomic cpu. |
3431:2e6b4536e574 |
27-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
A more complete attempt to fix the clock skew. |
3430:5afdd6d7df69 |
27-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Potential fix to clock skew problem. |
3411:07ea0d74b798 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem |
3402:db60546818d0 |
31-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a parameter for the MemObject.
configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more.
src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter).
src/cpu/checker/cpu.hh: Remove MemObject parameter.
src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner.
src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter.
src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort.
src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter. |
3401:1df0cb879413 |
31-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Ports now have a pointer to the MemObject that owns it (can be NULL).
src/cpu/simple/atomic.hh: Port now takes in the MemObject that owns it.
src/cpu/simple/timing.hh: Port now takes in MemObject that owns it.
src/dev/io_device.cc: src/mem/bus.hh: Ports now take in the MemObject that owns it.
src/mem/cache/base_cache.cc: Ports now take in the MemObject that owns it.
src/mem/port.hh: src/mem/tport.hh: Ports now optionally take in the MemObject that owns it. |
3393:43e1a001a7ce |
23-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zed.eecs.umich.edu:/z/hsul/work/m5/newmem |
3392:82c6aac35063 |
23-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Minor compile fix. Not sure why this is broken. |
3387:8f146ac8248f |
23-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Don't let interrupts interrupt microcode at undesired points. |
3383:8105c3e566ab |
20-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3380:382e21bc32f3 |
18-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed up exetrace.cc to deal with microcode, and made floating point register numbers correlate to the numbers used in SPARC in m5 and statetrace.
src/cpu/exetrace.cc: Fixed up to deal with microcode, and to make floating point register numbers correlate to the numbers used in SPARC.
util/statetrace/arch/tracechild_sparc.cc: util/statetrace/arch/tracechild_sparc.hh: Make floating point register numbers correlate to the numbers used in SPARC. |
3376:ed8179dd13da |
16-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3368:3342dd3f5248 |
22-Oct-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Add Quiesce trace flag to track CPU quiesce/wakeup events. |
3352:8e940d22b2a8 |
20-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Merge zizzer:/bk/newmem into zazzer.eecs.umich.edu:/z/rdreslin/m5bk/newmemcleanest
src/mem/tport.cc: Merge PacketPtr changes |
3349:fec4a86fa212 |
20-Oct-2006 |
Nathan Binkert <binkertn@umich.edu> |
Use PacketPtr everywhere |
3348:11f6ef023158 |
20-Oct-2006 |
Nathan Binkert <binkertn@umich.edu> |
refactor code for the packet, get rid of packet_impl.hh and call it packet_access.hh and fix the #includes so things compile right. |
3342:19e716ad518e |
20-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Use fixPacket function everywhere. Fix fixPacket assert function. Stop timing port from forwarding the request if a response was found in its queue on a read.
src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/python/m5/objects/MemTest.py: Add parameter to configure what percentage of mem accesses are functional
src/mem/cache/base_cache.cc: src/mem/cache/cache_impl.hh: Use fixPacket function
src/mem/packet.cc: Fix an assert that was checking the wrong thing
src/mem/tport.cc: Properly detect if we need to do the access to the functional device |
3340:5b24f2c55fae |
19-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix memtester to use functional access, fix cache to work functionally now that we could test it.
src/cpu/memtest/memtest.cc: Fix memtest to do functional accesses
src/mem/cache/cache_impl.hh: Fix cache to handle functional accesses properly based on memtester changes. Still need to fix functional accesses in timing mode now that the memtester can test it. |
3339:d1b3ec71baa4 |
19-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Small changes: ?? doesn't compile in warn statements. Should have been false where I had a true.
src/cpu/o3/lsq_impl.hh: Apparently you can't have ?? in a warn statement (something about trigraphs)
src/mem/cache/cache_impl.hh: Forgot to signal atomic mode in snoopProbe |
3327:b2a5cde9ea77 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix fetch to stop fetching upon encountering a fault in SE mode. Also change warning to a DPRINTF. |
3326:d9cc6bae9d77 |
23-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Add in support for LL/SC in the O3 CPU. Needs to be fully tested.
src/cpu/base_dyn_inst.hh: Extend BaseDynInst a little bit so it can be used as a TC as well (specifically for ll/sc code).
src/cpu/base_dyn_inst_impl.hh: Add variable to track if the result of the instruction should be recorded.
src/cpu/o3/alpha/cpu_impl.hh: Clear lock flag upon hwrei.
src/cpu/o3/lsq_unit.hh: Use ISA specified handling of locked reads.
src/cpu/o3/lsq_unit_impl.hh: Use ISA specified handling of locked writes. |
3324:c75da9e726ff |
23-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
make this parallel to the other cpu types so that resume works correctly. |
3319:1ec49a9bfaa3 |
18-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
only do this assert after you know you're not switched out or idle. |
3310:21adbb41a37e |
17-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for uni-coherence in timing mode for FS. Still a bug in atomic uni-coherence in FS.
src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Make CPU models handle coherence requests
src/mem/cache/base_cache.cc: Properly signal coherence CSHRs
src/mem/cache/coherence/uni_coherence.cc: Only deallocate once |
3300:393d1801068a |
13-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix assertion. I haven't tested it fully (I can't reproduce Lisa's error) but I believe it should fix what she's running into (which was definitely a bug).
src/cpu/o3/fetch_impl.hh: Move assertion to area where it should really always be true. Sometimes you might recvRetry and not necessarily be blocked (if there was a squash). |
3297:f0855ab36ff5 |
12-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zed.eecs.umich.edu:/z/hsul/work/m5/newmem
src/cpu/simple/timing.cc: hand merge |
3283:0115db938bca |
12-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Another memleak in the memtester (need [] with the delete)
src/cpu/memtest/memtest.cc: Another memleak in the memtester |
3282:5772a2ab19b8 |
12-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix a memory leak in the memtester |
3280:91bfa4f79c53 |
16-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix up microcode support.
src/arch/sparc/isa/formats/blockmem.isa: Several small and medium bug fixes.
src/cpu/simple/base.cc: Fixed a few compiler errors and made sure the next micro pc is set to 1 to prevent the first microop from executing twice. Also fixed a fetching bug.
src/cpu/thread_state.cc: Made sure the microPC and nextMicroPC are initialized properly. |
3276:dc3cd126b479 |
15-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Started implementing microcode. |
3271:4a871cbe6d84 |
12-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
StaticInst support for microcode |
3267:d3db53c60988 |
12-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem |
3262:5f96609a30ef |
11-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
More cache fixes. Atomic coherence now works as well.
src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: Make Memtester able to test atomic as well
src/mem/bus.cc: src/mem/bus.hh: Handle atomic snoops properly for cache->cache transfers
src/mem/cache/cache_impl.hh: Debug output. Clean up memleak in atomic mode. Set hitLatency. Still need to send back reasonable number for atomic return value.
src/mem/packet.cc: Add command strings for new commands
src/python/m5/objects/MemTest.py: Add param to test atomic memory. |
3230:e86a03911728 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/cpu/simple/timing.hh: tests/configs/o3-timing-mp.py: Hand merge. |
3229:cfb4b2250d26 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Comment out code that messed up SMT (but will be needed eventually).
src/cpu/o3/cpu.cc: Comment out resetting CPU structures for now. This can be updated to work in the future. |
3228:f47f69e61ded |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Be sure to delete packet and sender state if the cache is blocked.
src/cpu/o3/lsq_unit.hh: Be sure to delete data if the cache is blocked. |
3227:fe19356d6f88 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix caches plus sampling switch over.
src/cpu/o3/cpu.cc: Fix up caches plus sampling switch over. |
3226:de4981baa276 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix outstanding bug (FS#158).
src/cpu/o3/cpu.cc: Extra debugging, fix a bug brought up on bug tracker. |
3225:9872d6c27222 |
09-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix checker bug.
src/cpu/checker/thread_context.hh: Checker's TC should only copy state, and not fully take over from the old context (prevents it from accidentally stealing the quiesce event). |
3222:19bd4dd3be83 |
08-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Record numCycles properly.
src/cpu/simple/timing.cc: Record numCycles stat properly.
src/cpu/simple/timing.hh: Extra variable to help record numCycles stat. |
3221:669a04468c0d |
08-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to O3 CPU. It should now work in FS mode, although sampling still has a bug.
src/cpu/o3/commit_impl.hh: Fixes for compile and sampling.
src/cpu/o3/cpu.cc: Deallocate and activate threads properly. Also hopefully fix being able to use caches while switching over.
src/cpu/o3/cpu.hh: Fixes for deallocating and activating threads.
src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: Handle getting back a BadAddress result from the access.
src/cpu/o3/iew_impl.hh: More debug output.
src/cpu/o3/lsq_unit_impl.hh: Fixup store conditional handling (still a bit of a hack, but works now). Also handle getting back a BadAddress result from the access.
src/cpu/o3/thread_context_impl.hh: Deallocate context now records if the context should be fully removed. |
3204:1ac62ef68c44 |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
One step closer to having NACKs work.
src/cpu/memtest/memtest.cc: Fix functional return path
src/cpu/memtest/memtest.hh: Add snoop ranges in
src/mem/cache/base_cache.cc: Properly signal NACKED
src/mem/cache/cache_impl.hh: Catch nacked packet and panic for now |
3201:7c3b18c01b0e |
11-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
some drain changes in timing (kevin's) and some memory mode assertion changes so that when you come out of resume, you only assert if you're really wrong.
src/cpu/simple/atomic.cc: memory mode assertion change so that it only goes off if it's supposed to.
src/cpu/simple/timing.cc: some drain changes (kevin's) and some changes to memoryMode assertions so that they don't go off when they're not supposed to. |
3192:f3e215dda3f6 |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Have cpus send snoop ranges |
3191:362184411b8a |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Put a check in so people know not to create more than 8 memtesters. |
3190:9cff20ad90fc |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Merge zizzer:/z/m5/Bitkeeper/newmem into zazzer.eecs.umich.edu:/z/rdreslin/m5bk/newmemcleanest |
3187:7eefad0aed11 |
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Update the Memtester, commit a config file/test for it.
src/cpu/SConscript: Add memtester to the compilation environment. Someone who knows this better should make the MemTest a cpu model parameter. For now attached with the build of o3 cpu.
src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: Update Memtest for new mem system
src/python/m5/objects/MemTest.py: Update memtest python description |
3184:8edaf4539e05 |
08-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for functional path.
If the cpu needs to update any state when it gets a functional write (LSQ??) then that code needs to be written.
src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: CPUs can receive functional accesses; they need to determine if they need to do anything with them.
src/mem/bus.cc: src/mem/bus.hh: Make the functional path do the correct type of snoop |
3177:3a2bc3fbae6e |
08-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
add in serialization of AtomicSimpleCPU _status. This is needed because right now unserializing breaks an assert since CPU status is not saved. Kev says that this will break uniform serialization across CPUs since each type of CPU has its own "status" enum set. So, the repercussions are that if you serialize in this CPU, you must first unserialize in this CPU before switching to something else you want.
src/cpu/simple/atomic.cc: add in serialization of AtomicSimpleCPU _status. Kev says that this will break uniform serialization across CPUs since each type of CPU has its own "status" enum set. So, the repercussions are that if you serialize in this CPU, you must first unserialize in this CPU before switching to something else you want. |
3172:2c84db071850 |
08-Oct-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Replace tests of LOCKED/UNCACHEABLE flags with isLocked()/isUncacheable(). |
3170:37fd1e73f836 |
08-Oct-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Implement Alpha LL/SC support for SimpleCPU (Atomic & Timing) and PhysicalMemory. *No* support for caches or O3CPU. Note that properly setting cpu_id on all CPUs is now required for correct operation.
src/arch/SConscript: src/base/traceflags.py: src/cpu/base.hh: src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: src/mem/physical.cc: src/mem/physical.hh: src/mem/request.hh: src/python/m5/objects/BaseCPU.py: tests/configs/simple-atomic.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: Implement Alpha LL/SC support for SimpleCPU (Atomic & Timing) and PhysicalMemory. *No* support for caches or O3CPU. |
3169:65bef767b5de |
08-Oct-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Rename some vars for clarity. |
3160:4d7fc8d7ef23 |
02-Oct-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into zeep.eecs.umich.edu:/home/gblack/m5/newmem
src/cpu/ozone/cpu_impl.hh: Hand merged |
3145:85467cafadbb |
06-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
checkpoint recovery was screwed up because a new section was created in the middle of another section and messed up unserializing. |
3144:b6e9e1811d71 |
06-Oct-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
there are two main thrusts of this changeset.
1) return the periodicity of checkpoints back into the code (i.e. make m5 checkpoint n m meaningful again).
2) to do this, I had to muck around with being able to repeatedly schedule a SimLoopExitEvent, which led to changes in how exit simloop events are handled to make this easier.
src/arch/alpha/isa/decoder.isa: src/mem/cache/cache_impl.hh: modify arg. order for new calling convention of exitSimLoop.
src/cpu/base.cc: src/sim/main.cc: src/sim/pseudo_inst.cc: src/sim/root.cc: now, instead of creating a new SimLoopExitEvent, call a wrapper schedExitSimLoop which handles all the default args.
src/sim/sim_events.cc: src/sim/sim_events.hh: src/sim/sim_exit.hh: add the periodicity of checkpointing back into the code. To facilitate this, there are now two wrappers (instead of just overloading exitSimLoop). exitSimLoop is only for exiting NOW (i.e. at curTick), while schedExitSimLoop schedules an exit event for the future. |
3129:d71f12a71a39 |
07-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to bring MemTest closer to working with newmem. Ron still needs to do the initial setup and configuration for it to work properly.
src/SConscript: Include MemTest for now. It's not complete but it compiles so it shouldn't mess anything else up. |
3126:756092c6383c |
02-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates to fix merge issues and bring almost everything up to working speed. Ozone CPU remains untested, but everything else compiles and runs.
src/arch/alpha/isa_traits.hh: This got changed to the wrong version by accident.
src/cpu/base.cc: Fix up progress event to not schedule itself if the interval is set to 0.
src/cpu/base.hh: Fix up the CPU Progress Event to not print itself if it's set to 0. Also remove stats_reset_inst (something I added to m5 but isn't necessary here).
src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: Remove float variable of instResult; it's always held within the double part now.
src/cpu/checker/cpu_impl.hh: Use thread and not cpuXC.
src/cpu/o3/alpha/cpu_builder.cc: src/cpu/o3/checker_builder.cc: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu_builder.cc: src/python/m5/objects/BaseCPU.py: Remove stats_reset_inst.
src/cpu/o3/commit_impl.hh: src/cpu/ozone/lw_back_end_impl.hh: Get TC, not XCProxy.
src/cpu/o3/cpu.cc: Switch out updates from the version of m5 I have. Also remove serialize code that got added twice.
src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/thread_state.hh: Remove code that was added twice.
src/cpu/o3/lsq_unit.hh: Add back in stats that got lost in the merge.
src/cpu/o3/lsq_unit_impl.hh: Use proper method to get flags. Also wake CPU if we're coming back from a cache miss.
src/cpu/o3/thread_context_impl.hh: src/cpu/o3/thread_state.hh: Support profiling.
src/cpu/ozone/cpu.hh: Update to use proper typename.
src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/dyn_inst_impl.hh: Updates for newmem.
src/cpu/ozone/lw_lsq_impl.hh: Get flags correctly.
src/cpu/ozone/thread_state.hh: Reorder constructor initialization, use tc.
src/sim/pseudo_inst.cc: Allow for loading of symbol file. Be sure to use ThreadContext and not ExecContext. |
3125:febd811bccc6 |
30-Sep-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:./local/clean/o3-merge/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
configs/boot/micro_memlat.rcS: configs/boot/micro_tlblat.rcS: src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa_traits.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/params.hh: src/cpu/o3/checker_builder.cc: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/base.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.hh: src/dev/ide_disk.cc: src/python/m5/objects/O3CPU.py: src/python/m5/objects/Root.py: src/python/m5/objects/System.py: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/system.hh: util/m5/m5.c: Hand merge. |
3121:b0bc3646a35d |
30-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixes to get the ozone cpu to compile. |
3120:e49afeaf79e9 |
30-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Changed makeExtMI to take a ThreadContext instead of a pc. |
3119:6c93a7460ecf |
02-Oct-2006 |
Kevin Lim <ktlim@umich.edu> |
Be sure to set progress interval. |
3112:76b70de314b6 |
15-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem |
3093:b09c33e66bce |
31-Aug-2006 |
Korey Sewell <ksewell@umich.edu> |
add ISA_HAS_DELAY_SLOT directive instead of "#if THE_ISA == ALPHA_ISA" throughout CPU models
src/arch/alpha/isa_traits.hh: src/arch/mips/isa_traits.hh: src/arch/sparc/isa_traits.hh: define 'ISA_HAS_DELAY_SLOT'
src/cpu/base_dyn_inst.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/simple/base.cc: use ISA_HAS_DELAY_SLOT instead of THE_ISA == ALPHA_ISA |
3070:0ca43be10749 |
03-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fix up the parameters to getInstRecord |
3067:a1308467bb03 |
03-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixing up parameters of getInstRecord |
3065:9bcb404a4a5b |
03-Sep-2006 |
Gabe Black <gblack@eecs.umich.edu> |
A quick fix to isolate the tracing code to SPARC |
3064:e907dd767a63 |
30-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Change the cpu pointer in the InstRecord object to a thread context pointer. |
3059:470bc7016218 |
29-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Extended the reg delta output. |
3014:b4309193255a |
16-Aug-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fixes for Kevin's O3 model to work with the blocking caches.
src/cpu/o3/fetch_impl.hh: Fix ordering so dereference works
src/cpu/o3/lsq_impl.hh: Check to make sure we didn't squash already
src/cpu/o3/lsq_unit.hh: Fix for counting squashed retries in the WB count
src/cpu/o3/lsq_unit_impl.hh: Make sure to set retryID for stores, and clear it appropriately |
2986:99640058db70 |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Some touchup to the reorganized includes and "using" directives. |
2985:c010893f23ae |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge zizzer.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem
src/cpu/static_inst.hh: SCCS merged |
2984:797622d7b311 |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Fixed ALPHA_FS by moving the remnants of isa_fullsys_traits.hh into arch/alpha/pagetable.hh and fixing up some includes |
2980:eab855f06b79 |
15-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Cleaned up include files and got rid of many using directives in header files. |
2978:199dcea84fc4 |
11-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Started to add support for O3 for sparc. |
2973:56dea3a9d279 |
11-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Started adding a system to output data after every instruction.
src/arch/alpha/regfile.hh: src/arch/mips/regfile/float_regfile.hh: src/arch/mips/regfile/int_regfile.hh: src/arch/mips/regfile/misc_regfile.hh: src/cpu/exetrace.hh: Added functions to start to support dumping register values once per cycle.
src/cpu/exetrace.cc: Added some code to support printing the value of registers after each cycle.
src/python/m5/main.py: Options to turn on output after every instruction. They are commented out. |
2972:f84c6c5309ce |
11-Aug-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Pushed most of constants.hh back into isa_traits.hh and regfile.hh and created a separate file for the syscallreturn class. |
2965:82703e01285a |
26-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
MIPS ISA runs 'hello world' in O3CPU ...
src/arch/mips/isa/base.isa: special case syscall disassembly... maybe give own instruction class?
src/arch/mips/isa/decoder.isa: add 'IsSerializeAfter' flag for syscall
src/cpu/o3/commit.hh: Add skidBuffer to commit
src/cpu/o3/commit_impl.hh: Use skidbuffer in MIPS ISA
src/cpu/o3/fetch_impl.hh: Print name out when there is a fault
src/cpu/o3/mips/cpu_impl.hh: change comment |
2948:ae26cf37957c |
20-Jul-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Enforce the timing cpu ticking at its clock rate. Add a max time option in seconds and a single system root clock be 1THz.
configs/test/fs.py: Add a max time option in seconds and a single system root clock be 1THz
src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Enforce the timing cpu ticking at its clock rate |
2946:015472193926 |
05-Jul-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem.head |
2943:eb2b70e6116b |
18-Jul-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge m5.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem |
2942:9b480d885f7a |
12-Jun-2006 |
Gabe Black <gblack@eecs.umich.edu> |
Merge m5.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmem
src/arch/sparc/regfile.hh: Hand Merge |
2935:d1223a6c9156 |
23-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
This changeset gets the MIPS ISA pretty much working in the O3CPU. It builds, runs, and gets very very close to completing the hello world successfully but there are some minor quirks to iron out. Who would've known a DELAY SLOT introduces that much complexity?! arrgh!
Anyways, a lot of this stuff had to do with my project at MIPS and me needing to know how I was going to get this working for the MIPS ISA. So I figured I would try to touch it up and throw it in here (I hate to introduce non-completely working components... )
src/arch/alpha/isa/mem.isa: spacing src/arch/mips/faults.cc: src/arch/mips/faults.hh: Gabe really authored this src/arch/mips/isa/decoder.isa: add StoreConditional Flag to instruction src/arch/mips/isa/formats/basic.isa: Steven really did this file src/arch/mips/isa/formats/branch.isa: fix bug for uncond/cond control src/arch/mips/isa/formats/mem.isa: Adjust O3CPU memory access to use new memory model interface. src/arch/mips/isa/formats/util.isa: update LoadStoreBase template src/arch/mips/isa_traits.cc: update SERIALIZE partially src/arch/mips/process.cc: src/arch/mips/process.hh: no need for this for NOW. ASID/Virtual addressing handles it src/arch/mips/regfile/misc_regfile.hh: add in clear() function and comments for future usage of special misc. regs src/cpu/base_dyn_inst.hh: add in nextNPC variable and supporting functions.
add isCondDelaySlot function
Update predTaken and mispredicted functions src/cpu/base_dyn_inst_impl.hh: init nextNPC src/cpu/o3/SConscript: add MIPS files to compile src/cpu/o3/alpha/thread_context.hh: no need for my name on this file src/cpu/o3/bpred_unit_impl.hh: Update RAS appropriately for MIPS src/cpu/o3/comm.hh: add some extra communication variables to aid in handling the delay slots src/cpu/o3/commit.hh: minor name fix for nextNPC functions. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: Fix necessary variables and functions for squashes with delay slots src/cpu/o3/cpu.cc: Update function interface ...
adjust removeInstsNotInROB function to recognize delay slot insts src/cpu/o3/cpu.hh: update removeInstsNotInROB src/cpu/o3/decode.hh: declare necessary variables for handling delay slot src/cpu/o3/dyn_inst.hh: Add in MipsDynInst src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/rename.hh: declare necessary variables and adjust functions for handling delay slot src/cpu/o3/inst_queue.hh: src/cpu/simple/base.cc: no need for my name here src/cpu/o3/isa_specific.hh: add in MIPS files src/cpu/o3/scoreboard.hh: don't include alpha specific isa traits! src/cpu/o3/thread_context.hh: no need for my name here, I just rearranged where the file goes src/cpu/static_inst.hh: add isCondDelaySlot function src/cpu/o3/mips/cpu.cc: src/cpu/o3/mips/cpu.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/mips/dyn_inst.cc: src/cpu/o3/mips/dyn_inst.hh: src/cpu/o3/mips/dyn_inst_impl.hh: src/cpu/o3/mips/impl.hh: src/cpu/o3/mips/params.hh: src/cpu/o3/mips/thread_context.cc: src/cpu/o3/mips/thread_context.hh: MIPS files for O3CPU... mirror the ALPHA definition |
2932:eba74420a01c |
21-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor functionality updates.
SConstruct: Include an option to specify the CPUs being tested. src/cpu/SConscript: Checker isn't SMT right now, so don't do SMT tests with the O3CPU if we're using the checker. src/python/m5/objects/O3CPU.py: Include default options. Unfortunately FullO3Config.py is still needed because it specifies which FUPool is being used. tests/SConscript: Several minor updates (sorry for one commit). Updated the copyright and fixed some m5 style issues. Also added the ability to specify which CPUs to run the tests on. |
2930:51a61690c402 |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor changes to reflect state used for regression stats.
src/cpu/checker/cpu.hh: Don't count checker's instructions towards total instructions committed. src/python/m5/objects/Root.py: Set default clock to 1 THz. |
2929:f986dc04e25f |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Put regression tests back into m5. They are located in the "tests" directory. The directory output and reference outputs have changed slightly. Now the directory is ALPHA_SE/test/<test>/<cpu_model>/, and for the reference stats <test>/ref/<arch>/<cpu_model>
Right now only non-SMT SE regression tests have been added back in. The rest are pending getting SMT working, and consolidating the FS configuration files.
Eventually support for different OSs can be added so you can specify which versions of the binary you want to run from one config file.
Note: mp-test1 doesn't have any reference stats because MP mode doesn't currently work. The test itself should probably work once the code is fixed.
SConstruct: Updates to allow for regression tests to work via the command line "scons build/ALPHA_SE/test/debug/quick" and such once again. src/cpu/SConscript: Keep a list of SMT supporting CPUs so that the regression tests can easily specify which CPUs to use if they are SMT only. |
2927:62f1518ae800 |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
O3CPU fixes.
src/cpu/o3/lsq_unit.hh: LSQ needs to decrement the WB counter if the load is going to be replayed. src/cpu/o3/lsq_unit_impl.hh: LSQ needs to decrement the WB counter if the load is squashed. |
2926:48f2f450cbf6 |
19-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Some minor compiling fixes.
src/cpu/o3/iew.hh: Non-debug compile fixes. src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: Merge fix. |
2923:db8a876258df |
14-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
configs/test/fs.py: configs/test/test.py: SCCS merged |
2921:e6bb350c3fa5 |
14-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix the CheckerCPU being included via python.
src/arch/SConscript: Fixes for including the CheckerCPU if it's specified via command line. Previously the env variable was actually being modified. src/cpu/SConscript: Copy the CPU_MODELS from the env, don't create a proxy to it. |
2918:20cdaf201249 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Serialization changes to make O3CPU consistent with the other models.
src/cpu/o3/commit_impl.hh: Always set instruction. This is necessary for serialization as the instruction is also serialized. src/cpu/o3/cpu.cc: Change serialization so it matches other CPU's output. Also fix up some indexing. |
2915:1f4d02556ac1 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Updates for serialization. As long as the tickEvent doesn't need to be serialized (I don't believe it does because we drain all CPUs prior to checkpointing), it should be feasible to start up from other CPU's checkpoints.
src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: src/cpu/simple/base.cc: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: Updates for serialization. |
2911:854ee6cd377e |
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
forgot tid |
2910:7eb6f817e267 |
14-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
For now, halt context is the same as deallocating. suspend context will now take the thread off the activeThread list.
src/arch/mips/isa_traits.cc: add in copy MiscRegs unimplemented function |
2907:7b0ababb4166 |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Move Dcache port creation from LSQUnit to LSQ in order to support Ron's recent changes, and using the O3CPU in SMT mode.
src/cpu/o3/lsq.hh: Update to have LSQ work with only one dcache port for all LSQ Units. LSQ has the dcache port, and the LSQ Units must tell the LSQ if the cache has become blocked. src/cpu/o3/lsq_impl.hh: Updates to have the LSQ work with only one dcache port for all LSQUnits. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Update for LSQ to create dcache port instead of LSQUnits. Now LSQUnits are given the dcache port from the LSQ, and also must check the LSQ if the cache is blocked prior to accessing the cache. |
2906:3d65b80fdb11 |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for a bug when squashing and then fetching. Now fetch checks if the cache data is valid. |
2905:62879b0282eb |
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Update for changes to draining. |
2901:f9a45473ab55 |
12-Jul-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Memory mode information is now contained in the system object. States are now running, draining, or drained. Memory state information moved into the system object. The system parameter is no longer fs-only for cpus. Implement drain() support in devices. Update for drain() call that returns the number of times drain_event->process() will be called.
Break O3 CPU! No sense in putting in a hack change that kevin is going to remove in a few minutes i imagine
src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: Since se mode has a system, allow access to it. Verify that the atomic cpu is connected to an atomic system on resume. src/cpu/simple/base.cc: Since se mode has a system, allow access to it. src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Update for new drain() call that returns number of times drain_event->process() will be called and memory state being moved into the system. Since se mode has a system, allow access to it. Verify that the timing cpu is connected to a timing system on resume. src/dev/ide_disk.cc: src/dev/io_device.cc: src/dev/io_device.hh: src/dev/ns_gige.cc: src/dev/ns_gige.hh: src/dev/pcidev.cc: src/dev/pcidev.hh: src/dev/sinic.cc: src/dev/sinic.hh: Implement drain() support in devices. src/python/m5/config.py: Allow drain to return number of times drain_event->process() will be called. Normally 0 or 1 but things like O3 cpu or devices with multiple ports may want to call it many times. src/python/m5/objects/BaseCPU.py: move system parameter out of fs to everyone. src/sim/sim_object.cc: src/sim/sim_object.hh: States are now running, draining, or drained. Memory state information moved into system object. src/sim/system.cc: src/sim/system.hh: memory mode information now contained in system object |
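The counting scheme this entry describes (each drain() call reports how many times it will later call drain_event->process()) can be sketched roughly as follows. This is only an illustration in Python: `CountedDrainEvent`, `set_count`, and `drain_all` are hypothetical names chosen for the sketch, not the m5 API, and the real mechanism is event driven rather than a plain function call.

```python
class CountedDrainEvent:
    """Hypothetical stand-in for the drain event.

    It is considered done once every announced process() call has arrived.
    """

    def __init__(self):
        self.count = 0
        self.done = False

    def set_count(self, count):
        # Total number of process() calls announced by all objects.
        self.count = count
        self.done = (count == 0)

    def process(self):
        # Called once by an object (or one of its ports) when it has drained.
        if self.count <= 0:
            raise RuntimeError("process() called more often than announced")
        self.count -= 1
        if self.count == 0:
            self.done = True


def drain_all(objects, event):
    """Ask every object to drain and sum the process() calls they announce.

    Each object's drain(event) returns 0 if it is already drained, 1 in the
    common case, or more for objects with several pipelines or ports.
    """
    pending = sum(obj.drain(event) for obj in objects)
    event.set_count(pending)
    return pending
```

With this shape, an object that is already drained contributes 0 and never touches the event, while an object with several ports can announce one call per port.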
2900:7cccbae04d02 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem
src/cpu/o3/fetch_impl.hh: Hand merge. |
2894:a83675362809 |
11-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix ordering issue with squashed Icache Fetches and Static data in packet.
Now hello world works with 2 levels of cache with the O3 CPU (multiple outstanding requests).
src/cpu/o3/fetch_impl.hh: Fix ordering issue with squashed Icache Fetches and Static data in packet. |
2893:58c423134221 |
12-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Track the PC of the cache data stored in fetch so it doesn't access memory multiple times if information is already in fetch. |
2887:c4d893b14e07 |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor fixes.
src/cpu/checker/thread_context.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: Change functions to match Korey's changes. src/cpu/ozone/lw_back_end.hh: Fix compile error. |
2886:2fdb9976b0a3 |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2881:5bcfebdbd31d |
10-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix cpu in full system to match SE. |
2880:a48d5059cd35 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2877:4b56debc25d1 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Minor fix for SMT Hello Worlds to finish correctly. Still, there is a problem with the LSQ and indexing out of range in the buffer. I haven't nailed down the fix yet, but it's coming ...
src/cpu/o3/commit_impl.hh: add space to DPRINT src/cpu/o3/cpu.cc: add newline to DPRINT src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: Each thread needs its own squashedSeqNum so that, when both are squashing at the same time, they don't write over each other's squash number. |
2876:a862ab9f93f8 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2875:9b6f6b75b187 |
07-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix so that O3CPU doesn't segfault on exit. Major thing was to not execute commit if there are no active threads in CPU.
src/cpu/o3/alpha/thread_context.hh: call deallocate instead of deallocateContext src/cpu/o3/commit_impl.hh: don't run commit stage if there are no instructions src/cpu/o3/cpu.cc: add deallocate event, deactivateThread function, and edit deallocateContext. src/cpu/o3/cpu.hh: add deallocate event and add optional delay to deallocateContext src/cpu/o3/thread_context.hh: optional delay for deallocate src/cpu/o3/thread_context_impl.hh: edit DPRINTFs to say Thread Context instead of Alpha TC src/cpu/thread_context.hh: optional delay src/sim/syscall_emul.hh: name stuff |
2874:5389a28b80fb |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Some minor cleanups.
src/cpu/SConscript: Change the error message to be slightly nicer. src/cpu/o3/commit.hh: Remove old code. src/cpu/o3/commit_impl.hh: Remove old unused code. |
2873:1377a68cd00e |
10-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Add parameters for backwards and forwards sizes for time buffers.
src/base/timebuf.hh: Add a function to return the size of the time buffer. |
2872:ab3083fa35a7 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support for recent port changes.
src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/python/m5/objects/OzoneCPU.py: Support Ron's recent port changes. src/cpu/ozone/lw_back_end_impl.hh: Support Ron's recent port changes. Also support handling faults in SE. |
2871:7ed5c9ef3eb6 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support Ron's changes for hooking up ports.
src/cpu/checker/cpu.hh: Now that BaseCPU is a MemObject, the checker must define this function. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: Implement getPort function so the connector can connect the ports properly. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: The connector handles connecting the ports now. src/python/m5/objects/O3CPU.py: Add ports to the parameters. |
2870:e81b23c19e5a |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for bug when draining and a memory access is outstanding. |
2869:4dbf4770df29 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2867:cc92d58a3210 |
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Switch out fixes for CPUs.
src/cpu/o3/cpu.cc: Fix up keeping proper state when switched out and drained. src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Keep track of the event we use to schedule fetch initially and upon resume. We may have to cancel the event if the CPU is switched out. |
2866:9b2e1d16d0aa |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2864:eab7ff8f6d72 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support serializing and unserializing in the O3 CPU. Also a few small fixes for draining/switching CPUs.
src/cpu/o3/commit_impl.hh: Fix to clear drainPending variable on call to resume. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Support serializing and unserializing in the O3 CPU. src/cpu/o3/lsq_impl.hh: Be sure to say we have no stores to write back if the active thread list is empty. src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: Slightly change how SimpleThread is used to copy from other ThreadContexts. |
2863:2592e056dc5c |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix the O3CPU to support the multi-pass method for checking if the system has fully drained.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Return a value so that the CPU can instantly return from draining if the pipeline is already drained. src/cpu/o3/cpu.cc: Use values returned from pipeline stages so that the CPU can instantly return from draining if the pipeline is already drained. |
2862:7bc3562e6405 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Various serialization changes to make it possible for the O3CPU to checkpoint.
src/arch/alpha/regfile.hh: Define serialize/unserialize functions on MiscRegFile itself. src/cpu/o3/regfile.hh: Remove old commented code. src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: Push common serialization code to ThreadState level. Also allow the SimpleThread to be used for checkpointing by other models. src/cpu/thread_state.cc: src/cpu/thread_state.hh: Move common serialization code into ThreadState. |
2860:843426871cbc |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for draining.
src/cpu/simple/timing.cc: Update for changed return values. src/python/m5/__init__.py: Loop in order to make sure all objects are really drained. Objects may become undrained as other objects become drained (e.g. a bus-bridge has a packet, while a bus is empty, and the first drain() will cause the bus-bridge to give the packet to the bus).
The only case we know every object is actually drained is if they all return immediately that they are drained. |
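The multi-pass idea in this entry can be illustrated with a small sketch. It is deliberately simplified: draining here is synchronous, and `Bus`, `BusBridge`, and `drain_until_quiet` are hypothetical toy names, not the code in src/python/m5/__init__.py. A pass only counts as final when every object reports that it was already drained on entry.

```python
class Bus:
    """Toy bus: drained when it holds no packets."""

    def __init__(self):
        self.packets = []

    def drain(self):
        if not self.packets:
            return True          # already drained on entry
        self.packets.clear()     # deliver whatever is queued
        return False


class BusBridge:
    """Toy bridge: draining pushes its pending packet onto the bus."""

    def __init__(self, bus, packet=None):
        self.bus = bus
        self.packet = packet

    def drain(self):
        if self.packet is None:
            return True
        self.bus.packets.append(self.packet)  # this "undrains" the bus
        self.packet = None
        return False


def drain_until_quiet(objects, max_passes=100):
    """Repeat drain passes until one pass finds every object already drained."""
    for passes in range(1, max_passes + 1):
        # Evaluate every object in each pass; do not stop at the first False.
        results = [obj.drain() for obj in objects]
        if all(results):
            return passes
    raise RuntimeError("objects did not drain within max_passes")


bus = Bus()
bridge = BusBridge(bus, packet="pkt")
print(drain_until_quiet([bus, bridge]))  # 3 passes in this toy setup
```

In the toy setup the bus reports drained in the first pass, the bridge then hands it a packet, the second pass flushes that packet, and only the third pass sees both objects idle on entry.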
2857:5f3e107e8f13 |
07-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Remove hack now that ports work properly |
2856:89691405ec9c |
07-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Update cpus to use the getPort function to use a connector object to connect the I/D cache ports to memory
configs/test/test.py: Update to use new cpu getPort functionality src/cpu/base.cc: Make cpu's a memObject to expose getPort interface src/cpu/base.hh: Make cpu's a memObject to export getPort interface src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Now use the connector via getPort interface src/mem/cache/base_cache.cc: Make sure the cache recognizes all port names |
2855:5ca2cdb32521 |
06-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Timing cache works for hello world test. Still need: 1) detailed CPU (blocking ability in cache); 1a) multiple outstanding requests (need to keep track of times for events); 2) multi-level support; 3) MP coherence support; 4) LL/SC support; 5) functional path needs to be correctly implemented (temporarily works without multiple outstanding requests (simple cpu))
src/cpu/simple/timing.cc: Temp hack because timing cpu doesn't export ports properly so single I/D cache communicates only through the Icache port. src/mem/cache/base_cache.cc: Handle marking MSHR's in service Add support for getting CSHR's src/mem/cache/base_cache.hh: Make these functions visible at the base cache level src/mem/cache/cache.hh: make the functions virtual src/mem/cache/cache_impl.hh: Rename the function to make sense src/mem/packet.hh: Accidentally clearing the needsResponse field when sending a response back. |
2852:7fc1b748dd81 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2850:0b4a6b4c9b8a |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Had to add this because for some reason gcc wasn't recognizing "THE_ISA == ALPHA_ISA"... weird but OK |
2849:c285bf8ffb4a |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2848:f29a4a5c4d66 |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Use O3DynInst in cpu_models.py and in static_inst_exec_sigs.hh instead of a specific ISA dyn. inst.
src/cpu/cpu_models.py: Use O3DynInst src/cpu/o3/dyn_inst.hh: declare O3DynInst here based off of ISA ... this must be updated for each ISA. src/cpu/static_inst.hh: take out O3 forward declarations here and include header file to keep this file clean |
2847:6b19f07d9666 |
06-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
more steps toward O3 SMT
src/arch/mips/isa/formats/fp.isa: Adjust for newmem src/cpu/cpu_models.py: Use O3DynInst instead of convoluted way src/cpu/o3/alpha/impl.hh: take out O3DynInst typedef here ... src/cpu/o3/cpu.cc: open up the SMT functions in the O3CPU src/cpu/static_inst.hh: Add O3DynInst src/cpu/o3/dyn_inst.hh: Use to get ISA-specific O3DynInst |
2845:18e6dde158f0 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem |
2843:19c4c6c2b5b1 |
06-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support for draining, and the new method of switching out. Now switching out happens after the pipeline has been drained, deferring the three way handshake to the normal drain mechanism. The calls of switchOut() and takeOverFrom() both take action immediately.
src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: Support for draining, new method of switching out. |
2840:227f7c4f8c81 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove sampler and serializer. Now they are handled through C++ interacting with Python.
src/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/checker/cpu.hh: src/cpu/checker/cpu_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/sim/pseudo_inst.cc: Remove sampler. src/sim/sim_object.cc: Remove serializer. |
2839:d5dd8a3cdea0 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Rename quiesce to drain to avoid confusion with the pseudo instruction.
src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: src/python/m5/__init__.py: src/python/m5/config.py: src/sim/main.cc: src/sim/sim_events.cc: src/sim/sim_events.hh: src/sim/sim_object.cc: src/sim/sim_object.hh: Rename quiesce to drain. |
2838:8e81edd2fdbf |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker ignores any faults that occur in syscall emulation mode for now.
src/cpu/checker/cpu_impl.hh: The only fault we handle in SE causes troubles when invoked with the Checker. This is because it changes state within the process, and not the checker, so the state isn't correct when the main CPU calls invoke. It's safe to just ignore the fault in the Checker and continue. |
2837:10ae172449b3 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up some merge problems.
src/base/traceflags.py: Remove BaseCPU traceflag. src/cpu/o3/alpha/params.hh: Move non-Alpha specific parameters out of this params class. src/cpu/o3/params.hh: Move non-Alpha specific params into this params class. |
2836:c8f549058964 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
src/base/traceflags.py: src/cpu/SConscript: Hand merge. src/cpu/o3/alpha/params.hh: Hand merge. This needs to get changed. |
2835:d2a977df88de |
05-Jul-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Fix some unset values in the request in the timing CPU. Properly implement the MSHR allocate function.
src/cpu/simple/timing.cc: Set the thread context in the CPU.
Need to do this properly, currently I just set it to Cpu=0 Thread=0. This will just cause all the stats in the cache based on these to just yield totals and not a distribution. src/mem/cache/miss/mshr.cc: Properly implement the allocate function for the MSHR. |
2834:c8342a71404b |
03-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix for FS O3CPU compile ... missing forward class declaration/header file after files got split for ISA-independence
src/cpu/o3/alpha/thread_context.hh: Use 'this' when accessing cpu src/cpu/o3/cpu.hh: add numActiveThreds function src/cpu/o3/thread_context.hh: forward class declarations src/cpu/o3/thread_context_impl.hh: add quiesce event header file src/cpu/thread_context.hh: add exit() function to thread context (read comments in file) src/sim/syscall_emul.cc: adjust exitFunc syscall |
2832:c990b002e0be |
02-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
typo ... change 'single_thread' to 'round_robin_policy' |
2831:0a42b294727c |
02-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
Fix default SMT configuration in O3CPU (i.e. fetch policy, workloads/numThreads)
Edit Test3 for newmem
src/base/traceflags.py: Add O3CPU flag src/cpu/base.cc: for some reason adding a BaseCPU flag doesn't work so just go back to old way... src/cpu/o3/alpha/cpu_builder.cc: Determine number of threads by workload size instead of solely by parameter.
Default SMT fetch policy to RoundRobin if it's not specified in Config file src/cpu/o3/commit.hh: only use nextNPC for !ALPHA src/cpu/o3/commit_impl.hh: add FetchTrapPending as condition for commit src/cpu/o3/cpu.cc: panic if active threads is more than Impl::MaxThreads src/cpu/o3/fetch.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: name stuff src/cpu/o3/fetch_impl.hh: fatal if try to use SMT branch count, that's unimplemented right now src/python/m5/config.py: make it clearer that a parameter is not valid within a configuration class |
2830:14ecb0704388 |
01-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
traceflag stuff
src/base/traceflags.py: add BaseCPU flag, O3CPUAll flag grouping src/cpu/base.cc: Use BaseCPU flag instead of FullCPU flag |
2829:f354c00bba05 |
01-Jul-2006 |
Korey Sewell <ksewell@umich.edu> |
fix cpu builder to build the correct name...
add activateThread event and functions
src/cpu/o3/alpha/cpu_builder.cc: Have CPU builder build a DerivO3CPU not a DerivAlphaO3CPU src/cpu/o3/cpu.cc: add activateThread Event
add activateThread function
adjust activateContext to schedule a thread to activate within the CPU instead of activating the thread right away. Activating right away would lead to stages trying to use threads that aren't ready yet, wasting execution time and possibly performance. src/cpu/o3/cpu.hh: add activateThread Event
add activateThread function
add schedule/deschedule activate thread event |
2828:6f7429218c08 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer.eecs.umich.edu:/z/m5/Bitkeeper/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-o3 |
2827:45c3bdb0ffd4 |
30-Jun-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
AtomicSimpleCPU with a cache now runs the hello world! test program. Need to clean up a bunch of flags/hacks in the code. Then onto Timing mode.
Functional accesses also work properly, although not exactly how we wanted them. I'll need to clean that up as well.
src/cpu/simple/atomic.cc: Atomic CPU needs to set thread context so stats work in cache. Temporarily just use CPU=0 ThreadID=0. src/mem/cache/cache_impl.hh: Need to return success/failure properly still. Physical memory object doesn't assert SATISFIED anymore, need to remove that flag. src/mem/cache/tags/lru.cc: Doesn't work if the REQ doesn't set its ASID. Temporary fix: use 0 always |
2823:ff50d1693ee5 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Need to change state upon quiescing. |
2821:2e4ca11d96bd |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Split off files that are shared across the O3 and Ozone models. |
2820:7fde0b0f8f78 |
05-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Add some different parameters. The main change is that the writeback count is now limited so that it doesn't overflow the buffer.
src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: Add in dispatchWidth, wbWidth, wbDepth parameters. wbDepth is the number of cycles of wbWidth instructions that can be buffered. src/cpu/o3/iew.hh: Include separate parameter for dispatch width. Also limit the number of outstanding writebacks so the writeback buffer isn't overflowed. The IQ must make sure with the IEW stage that it can issue instructions prior to issuing. src/cpu/o3/iew_impl.hh: Include separate parameter for dispatch width. Also limit the number of outstanding writebacks so the writeback buffer isn't overflowed. src/cpu/o3/inst_queue_impl.hh: IQ needs to check with the IEW to make sure it can issue instructions, and increments the IEW wb counter each time there is an outstanding instruction that will writeback. src/cpu/o3/lsq_unit_impl.hh: Be sure to decrement the writeback counter if there's a squashed load that returned. src/python/m5/objects/AlphaO3CPU.py: Change the parameters to include dispatch width, writeback width, and writeback depth. |
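As a rough illustration of the bookkeeping this entry describes, the sketch below caps outstanding writebacks at wbWidth × wbDepth, following the statement that wbDepth is the number of cycles of wbWidth instructions that can be buffered. `WritebackBudget` and its method names are hypothetical and written in Python for brevity; the actual logic lives in the C++ IEW, IQ, and LSQ code.

```python
class WritebackBudget:
    """Hypothetical counter for outstanding writebacks (not the gem5 class)."""

    def __init__(self, wb_width, wb_depth):
        # wb_depth cycles' worth of wb_width instructions can be buffered.
        self.wb_max = wb_width * wb_depth
        self.wb_outstanding = 0

    def can_issue(self):
        # The IQ checks this with IEW before issuing another instruction.
        return self.wb_outstanding < self.wb_max

    def issue(self):
        # Each issued instruction that will write back takes one slot.
        if not self.can_issue():
            raise RuntimeError("writeback buffer would overflow")
        self.wb_outstanding += 1

    def release(self):
        # Called when an instruction writes back, and also when a squashed
        # or replayed load gives up the slot it had reserved.
        if self.wb_outstanding == 0:
            raise RuntimeError("no outstanding writeback to release")
        self.wb_outstanding -= 1
```

The release path matters as much as the issue path: if a squashed load that returned never gives its slot back, the counter drifts upward and issue eventually stalls.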
2818:a2b6429690b6 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
now O3CPU is totally independent of the ISA... all alpha specific stuff is the cpu/o3/alpha directory
src/cpu/o3/alpha/cpu.cc: src/cpu/o3/alpha/cpu_impl.hh: src/cpu/o3/alpha/impl.hh: filenames src/cpu/o3/alpha/thread_context.hh: public src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use O3CPUImpl ... not Alpha src/cpu/o3/checker_builder.cc: filename |
2817:273f7fb94f83 |
30-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Make O3CPU model independent of the ISA
Use O3CPU when building instead of AlphaO3CPU.
I could use some better python magic in the cpu_models.py file!
AUTHORS: add middle initial SConstruct: change from AlphaO3CPU to O3CPU src/cpu/SConscript: edits to build O3CPU instead of AlphaO3CPU src/cpu/cpu_models.py: change substitution template to use proper CPU EXEC CONTEXT For O3CPU Model...
Actually, some Python expertise could be used here. The 'env' variable is not passed to this file, so I had to parse through the ARGV to find the ISA... src/cpu/o3/base_dyn_inst.cc: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.cc: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode.cc: src/cpu/o3/fetch.cc: src/cpu/o3/iew.cc: src/cpu/o3/inst_queue.cc: src/cpu/o3/lsq.cc: src/cpu/o3/lsq_unit.cc: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/rename.cc: src/cpu/o3/rob.cc: use isa_specific.hh src/sim/process.cc: only init NextNPC if not ALPHA src/cpu/o3/alpha/cpu.cc: alphao3cpu impl src/cpu/o3/alpha/cpu.hh: move AlphaTC to its own file src/cpu/o3/alpha/cpu_impl.hh: Move AlphaTC to its own file ... src/cpu/o3/alpha/dyn_inst.cc: src/cpu/o3/alpha/dyn_inst.hh: src/cpu/o3/alpha/dyn_inst_impl.hh: include paths src/cpu/o3/alpha/impl.hh: include paths, set default MaxThreads to 2 instead of 4 src/cpu/o3/alpha/params.hh: set Alpha Specific Params here src/python/m5/objects/O3CPU.py: add O3CPU class src/cpu/o3/SConscript: include isa-specific build files src/cpu/o3/alpha/thread_context.cc: NEW HOME of AlphaTC src/cpu/o3/alpha/thread_context.hh: new home of AlphaTC src/cpu/o3/isa_specific.hh: includes ISA specific files src/cpu/o3/params.hh: base o3 params src/cpu/o3/thread_context.hh: base o3 thread context src/cpu/o3/thread_context_impl.hh: base o3 thread context impl |
2816:776562207565 |
29-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2808:a88ea76f6738 |
27-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Make full CPU handle SE faults |
2806:2e42ac0e7bd0 |
29-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:/z/ktlim2/clean/newmem-merge into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem |
2803:0459f5ec8bf8 |
26-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
don't depend on the memory system to return the atomic cpu a multiple of cpu cycles. |
2800:18a615ca6e19 |
26-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
add syscall emulation page table fault so we can allocate more stack pages
src/cpu/simple/base.cc: add syscall emulation page table fault so we can allocate more stack pages FaultBase::invoke will do this, we don't need to do it here src/sim/faults.hh: I have no idea why this #if was there... gone src/sim/process.cc: make stack_min actually be the current minimum |
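The idea of growing the stack on a page table fault and tracking the current minimum can be illustrated roughly as follows. This is a sketch under stated assumptions: `ToyProcess`, `fixupStackFault`, the 8 KiB page size, and the page-aligned stack base are all invented for the example and do not reflect the actual code in src/sim/process.cc or src/sim/faults.hh.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumed page size for this sketch only.
const std::uint64_t PageBytes = 8192;

// Hypothetical process model: the stack grows downward from stackBase.
class ToyProcess
{
  public:
    ToyProcess(std::uint64_t base, std::uint64_t maxSize)
        : stackBase(base), maxStackSize(maxSize), stackMin(base) {}

    // Called when an access misses in the page table. Returns true if the
    // fault was a legal stack growth and pages were allocated for it.
    bool fixupStackFault(std::uint64_t vaddr)
    {
        // Not below the current stack, or beyond the allowed stack region.
        if (vaddr >= stackMin || vaddr < stackBase - maxStackSize)
            return false;
        // Allocate pages downward until the faulting address is covered,
        // keeping stackMin equal to the lowest mapped stack address.
        while (stackMin > vaddr) {
            stackMin -= PageBytes;
            pages.insert(stackMin);
        }
        return true;
    }

    std::uint64_t currentStackMin() const { return stackMin; }
    std::size_t mappedPages() const { return pages.size(); }

  private:
    std::uint64_t stackBase;
    std::uint64_t maxStackSize;
    std::uint64_t stackMin;
    std::set<std::uint64_t> pages; // start addresses of mapped stack pages
};
```

Keeping `stackMin` up to date is what lets the fault handler tell a legitimate stack growth apart from a stray access.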
2798:751e9170247e |
29-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Various fixes for the CPU models to support the features that have been moved to python.
src/cpu/base.cc: src/cpu/base.hh: src/cpu/simple/atomic.hh: Switching out no longer takes a sampler. src/cpu/simple/atomic.cc: Fix up switching out. Also fix up serialization; the nameOut() was messing up the ordering. src/cpu/simple/timing.cc: Add in quiesce, fix up serialization. src/cpu/simple/timing.hh: Add in quiesce, fix up serialization. |
2795:a51d5bbcbe41 |
25-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Make OzoneCPU work again in SE/FS.
src/cpu/ozone/cpu.hh: Fixes to get OzoneCPU working in SE/FS again. src/cpu/ozone/cpu_impl.hh: Be sure to set up ports properly. src/cpu/ozone/front_end.hh: Allow port to be created without specifying its name at the beginning. src/cpu/ozone/front_end_impl.hh: Setup port properly, also only use checker if it's enabled. src/cpu/ozone/lw_back_end_impl.hh: Be sure to initialize variables. src/cpu/ozone/lw_lsq.hh: Handle locked flag for UP systems. src/cpu/ozone/lw_lsq_impl.hh: Initialize all variables. src/python/m5/objects/OzoneCPU.py: Fix up config. |
2794:0dd6cb8820e1 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker related updates.
src/cpu/o3/cpu.cc: Updates to make sure the checker is compiled in if enabled and also to include it only when it's used. |
2792:440dfbb180a7 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Changes to get OzoneCPU to compile once more. The changes largely are fixing up the memory accesses to use ports/Requests/Packets, supporting the splitting off of instantiation of template classes, and handling some of the reorganization that happened.
OzoneCPU is untested for now but at least compiles. Fixes will be coming shortly.
SConstruct: Remove OzoneSimpleCPU from list of CPUs. src/cpu/SConscript: Leave out OzoneSimpleCPU. src/cpu/ozone/bpred_unit.cc: Fixes to get OzoneCPU to compile. src/cpu/ozone/checker_builder.cc: src/cpu/ozone/cpu.cc: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/dyn_inst.hh: src/cpu/ozone/dyn_inst_impl.hh: src/cpu/ozone/front_end.cc: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/ozone_impl.hh: src/cpu/ozone/rename_table.cc: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: Fixes to get OzoneCPU back to compiling. |
2791:7b2a7e21909b |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Change ThreadState constructor ordering to match the rest of the ThreadStates. |
2790:2f8e9762bee9 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Misc fixes.
src/cpu/o3/alpha_dyn_inst_impl.hh: Consolidate these calls into one. src/cpu/o3/commit_impl.hh: Include checker only if it's being used. src/cpu/o3/fetch_impl.hh: Do not deallocate request if it's a squashed response that was received. src/cpu/o3/lsq_unit.hh: Add in comment. src/cpu/o3/lsq_unit_impl.hh: Only include checker if it's being used. |
2789:f99d51ed08d6 |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Split Checker up properly into templated and non-templated definitions. |
2788:73f724ff348f |
22-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix to have the static inst exec sigs also dependent on the CPU models used. |
2783:381a5413b55a |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor updates.
src/cpu/o3/alpha_cpu.hh: Fix #define in header. util/rundiff: Fix file comments to be more correct. util/tracediff: Update comments to be more correct. |
2781:b689ee340f27 |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2771:f11394ba1687 |
17-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
remove byte_swap.hh since it's not used |
2769:04c9a7db403f |
17-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Merge zizzer:/bk/newmem into zeep.eecs.umich.edu:/z/saidi/work/m5.newmem |
2766:0844a9607f77 |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix up code to be able to use the Checker.
SConstruct: Remove check for Checker from this SConstruct src/arch/SConscript: Specific check if CheckerCPU is being used. Not the cleanest, but works for now. src/cpu/SConscript: Code to handle using the CheckerCPU a little better. Allows -c to be used normally. |
2765:2962455d1c0a |
17-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Split off instantiation into separate CC files for each of the models. This makes it easier to be able to specify only certain CPU models.
src/cpu/SConscript: Split off instantiations into separate CC files. This makes it easier to split them per CPU model. src/cpu/base_dyn_inst_impl.hh: Move instantiations out of impl.hh file and into a cc file. src/cpu/checker/cpu_impl.hh: Move instantiations over to .cc files inside each CPU's directory. Makes it easier to only use what's actually included. src/cpu/o3/bpred_unit.cc: Pull Ozone instantiations out of this .cc file; put them into the ozone's CC file. src/cpu/o3/checker_builder.cc: Instantiate Checker for O3 CPU. src/cpu/ozone/checker_builder.cc: Instantiate Checker for Ozone CPU. |
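Moving instantiations out of the shared implementation header and into per-model .cc files relies on C++ explicit template instantiation. The sketch below compresses the idea into one listing; `ToyChecker` and `ToyO3Impl` are hypothetical stand-ins, not the real Checker or O3 implementation classes, and the file boundaries are only indicated by comments.

```cpp
#include <cassert>
#include <string>

// --- would live in a shared implementation header -----------------------
// Generic template; nothing here names a specific CPU model.
template <class Impl>
class ToyChecker
{
  public:
    std::string name() const
    {
        return std::string("checker-for-") + Impl::name();
    }
};

// --- would live in one CPU model's own .cc file -------------------------
// Hypothetical model-specific policy class.
struct ToyO3Impl
{
    static const char *name() { return "o3"; }
};

// Explicit instantiation: the template's members are compiled only in the
// translation unit of the model that uses them, so leaving this file out
// of the build drops that model's instantiation as well.
template class ToyChecker<ToyO3Impl>;
```

A build that omits a model's .cc file then never instantiates the template for that model, which matches the stated goal of making it easier to select only certain CPU models.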
2761:55b821162cd2 |
17-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Minor fixes in comments.
SConstruct: Fix paths in comments and other minor comment edits. src/cpu/SConscript: Fix path in comment. |
2757:58e3a66e72f7 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge |
2756:7bf0d6481df9 |
15-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Initial changes to allow DetailedCPU to work with other architectures (i.e. Sparc & MIPS)
Still need to add some code to fetch & commit stages
src/cpu/o3/commit.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: Add nextNPC read & set functions src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: Add nextNPC |
2746:6c4d449121b9 |
12-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Merge zizzer:/bk/newmem into zizzer.eecs.umich.edu:/.automount/zooks/y/ksewell/research/m5-sim/newmem-release |
2741:a73a50764b86 |
11-Jun-2006 |
Korey Sewell <ksewell@umich.edu> |
Edit Fetch DPRINT in simple CPU
src/arch/mips/isa/formats/mt.isa: change copyright to 2006 src/cpu/simple/base.cc: Only DPRINT NNPC if we are not using ALPHA src/cpu/static_inst.hh: Take Out MIPS Specific functions ... |
2736:98dcdc08884d |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Reorganization to move FuncUnit, FUDesc, and OpDesc out of the encumbered directory and into the normal cpu directory.
src/SConscript: Split off FuncUnits from old FUPool so I'm not including encumbered code. This was all written by Steve Raasch so it's safe to include in the main tree. src/cpu/o3/fu_pool.cc: Include the func unit file that's not in the encumbered directory. |
2735:f74563d64c6b |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Add in exec_context.hh, which is a file for documentation purposes only. It describes the ExecContext interface that the ISA uses to access CPU state. Also #ifdef Erik's old copy code from the decoder so ExecContext doesn't need his two specific copy functions.
src/arch/alpha/isa/decoder.isa: Surround Erik's old copy code with #ifdefs. This way the copy functions don't need to be included in the ExecContext (until somebody decides to add them back in). |
2734:af0d50755df7 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Miscellaneous minor fixes.
src/cpu/checker/cpu.cc: Add in comment. src/cpu/cpuevent.hh: Fix up comment. src/cpu/o3/bpred_unit.cc: Comment out Ozone instantiations. src/cpu/o3/dep_graph.hh: Include destructor. |
2733:e0eac8fc5774 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.
First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.
Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.
SConstruct: Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled. configs/test/test.py: Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick. src/base/traceflags.py: Add in O3CPU trace flag. src/cpu/SConscript: Rename AlphaFullCPU to AlphaO3CPU.
Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs is also being included. src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU. src/cpu/cpu_models.py: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model. src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/thread_state.hh: src/python/m5/objects/AlphaO3CPU.py: Rename FullCPU to O3CPU to differentiate from old FullCPU model. src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Rename FullCPU to O3CPU to differentiate from old FullCPU model. Also #ifdef the checker code so it doesn't need to be included if it's not selected. |
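Guarding checker code with the preprocessor, as this changeset describes, might look roughly like the sketch below. Only the option name USE_CHECKER comes from the log; `ToyChecker` and `commitInstruction` are hypothetical, and the default of 0 is an assumption made so the listing compiles on its own.

```cpp
#include <cassert>

// USE_CHECKER would normally be set by the build system; default to
// "off" here so the sketch builds standalone (assumption).
#ifndef USE_CHECKER
#define USE_CHECKER 0
#endif

#if USE_CHECKER
// Hypothetical checker: compares a result against a reference value.
struct ToyChecker
{
    bool verify(int result, int expected) const
    {
        return result == expected;
    }
};
#endif

// Hypothetical commit step. The checker code is compiled in only when
// USE_CHECKER is enabled, so a build without it has no checker dependency.
bool commitInstruction(int result, int expected)
{
#if USE_CHECKER
    ToyChecker checker;
    if (!checker.verify(result, expected))
        return false;
#else
    (void)expected; // unused when the checker is compiled out
#endif
    (void)result;
    return true;
}
```

When the option is off, neither the checker type nor its call site reaches the compiler, which is the effect the #ifdef change aims for.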
2732:d2443ce353d2 |
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Checker updates.
src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: Updates for checker. Output more informative messages on error. Rename some functions. Add in option to warn (and not exit) on load results being incorrect. src/cpu/checker/cpu_builder.cc: src/cpu/checker/o3_cpu_builder.cc: Add in parameter to warn (and not exit) on load result errors. src/cpu/o3/commit_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Renamed checker function. |
2731:822b96578fba |
14-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor code cleanup of BaseDynInst.
src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Minor code cleanup by putting several bools into a bitset instead. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob_impl.hh: Changed around some things in BaseDynInst. |
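Replacing several bool members with a single bitset can be sketched as below. The class name `ToyDynInst` and the particular flag names are assumptions for illustration; the log does not list which flags BaseDynInst actually holds.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical dynamic-instruction class for illustration only.
class ToyDynInst
{
  public:
    // One enumerator per former bool member; NumStatus sizes the bitset.
    enum Status { Completed, Squashed, Issued, CanCommit, NumStatus };

    void setCompleted() { status.set(Completed); }
    bool isCompleted() const { return status[Completed]; }

    void setSquashed() { status.set(Squashed); }
    bool isSquashed() const { return status[Squashed]; }

    // Number of flags currently set; handy for debugging output.
    std::size_t numFlagsSet() const { return status.count(); }

  private:
    std::bitset<NumStatus> status; // all flags start cleared
};
```

Callers keep using named accessors, so switching the storage from individual bools to a bitset is mostly invisible outside the class.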
2727:91e17c7ee622 |
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor updates for stats.
src/cpu/o3/commit_impl.hh: src/cpu/o3/fetch.hh: Update stats comments. src/cpu/o3/fetch_impl.hh: Differentiate stats. src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: Update for stats. src/cpu/o3/lsq.hh: LSQ now has stats. src/cpu/o3/lsq_impl.hh: Register stats of all LSQ units. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Add in stats. |
2723:4c47709f88ab |
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zizzer.eecs.umich.edu:/.automount/zamp/z/ktlim2/clean/newmem-merge |
2722:610b13e19da0 |
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Compile fix. |
2720:695250d6fa42 |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge fixes to make full system compile and run.
src/arch/alpha/linux/system.cc: src/cpu/o3/alpha_cpu_impl.hh: src/sim/system.cc: Merge fixes. |
2719:d73e952240aa |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Removed syscall function from thread_context.hh. ThreadContext is the interface for external, non-CPU objects to access the thread, so they probably shouldn't be able to call syscall(). The case it was being used for was already handled by the ISA code.
src/arch/sparc/faults.cc: src/cpu/thread_context.hh: Fix for merge problems. |
2716:b9114064d77a |
11-Jun-2006 |
Nathan Binkert <binkertn@umich.edu> |
Merge iceaxe.:/Volumes/work/research/m5/head into iceaxe.:/Volumes/work/research/m5/merge
src/cpu/simple/base.cc: src/kern/kernel_stats.cc: src/kern/kernel_stats.hh: src/kern/system_events.cc: src/kern/system_events.hh: src/python/m5/objects/System.py: src/sim/system.cc: src/sim/system.hh: hand merge |
2713:c424d724dc4c |
11-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Fix compiling for SPARC_SE: - change include from exec_context.hh -> threadcontext.hh - g++ 4.0.3 complaint about broken code (which it was). - bad merge thread_context -> exec_context
src/arch/sparc/isa/includes.isa: Fix SPARC_SE for exec_context->thread_context switch src/arch/sparc/regfile.hh: fix g++ 4.0.3 complaint about broken code (which it was). src/cpu/thread_context.hh: fix bad merge |
2708:c4157b162e7b |
09-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Merge vm1.(none):/home/stever/bk/newmem into vm1.(none):/home/stever/bk/newmem-py
src/python/m5/__init__.py: src/sim/syscall_emul.cc: Hand merge. |
2704:731cd38be7f5 |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for checker. The RC/RS instructions check the interrupt flag, which isn't verifiable by the checker.
src/arch/alpha/isa/decoder.isa: src/cpu/checker/cpu.cc: Fixes for checker. |
2703:638e5b90f4c6 |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix output messages.
src/cpu/o3/decode_impl.hh: src/cpu/o3/rename_impl.hh: Fix output message. |
2702:8a3ee279559b |
12-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Clean up/shift some code around.
src/cpu/base_dyn_inst.cc: Clean up some code and update. src/cpu/base_dyn_inst.hh: Clean up some code and update with more descriptive function names. src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: Remove unused parameters. src/cpu/o3/commit_impl.hh: Remove unused parameters, also set squashCounter directly to the counted number of squashes. src/cpu/o3/fetch_impl.hh: Update for function name changes. src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: Remove unused parameter, move some code into a function. |
2699:c255fef3daaa |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two minor fixes.
src/cpu/o3/lsq_unit_impl.hh: Missed this name change. src/cpu/thread_state.cc: Fix for stats. |
2698:d5f35d41e017 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Removing of old code and adding in new comments.
src/cpu/base_dyn_inst.cc: Clean up old functions, comments. src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/ozone/lsq_unit.hh: src/cpu/ozone/lsq_unit_impl.hh: Remove old commented code. src/cpu/o3/fetch.hh: Remove old commented code, add in comments. src/cpu/o3/inst_queue_impl.hh: Move comment to better place. src/cpu/o3/lsq_unit.hh: Remove old commented code, add in new comments. src/cpu/o3/lsq_unit_impl.hh: Remove old commented code, rename variable. |
2696:30b38e36ff54 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Allow for fetch to retry access if the sendTiming call fails. |
2695:07d258482551 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix allocating requests twice on retries. |
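The two fetch changes above (retrying when the sendTiming call fails, and not allocating a second request on the retry) can be pictured with the following sketch. Everything here is hypothetical: `ToyPacket`, `ToyPort`, `ToyFetch`, and `recvRetry` are stand-ins written for this example, and the real fetch stage and port interface differ.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical packet carrying just an address.
struct ToyPacket
{
    unsigned long addr;
};

// Hypothetical port that rejects sends while it is busy.
struct ToyPort
{
    bool busy = false;
    int accepted = 0;

    bool sendTiming(const ToyPacket &)
    {
        if (busy)
            return false;
        ++accepted;
        return true;
    }
};

// Hypothetical fetch stage that keeps a rejected packet for later.
class ToyFetch
{
  public:
    explicit ToyFetch(ToyPort &p) : port(p) {}

    // Try to issue a fetch; returns true if the port accepted it.
    bool fetchLine(unsigned long addr)
    {
        if (retryPkt)
            return false; // still blocked on an earlier access
        std::unique_ptr<ToyPacket> pkt(new ToyPacket{addr});
        ++allocations;
        if (!port.sendTiming(*pkt)) {
            retryPkt = std::move(pkt); // keep it; do not allocate again
            return false;
        }
        return true;
    }

    // Called when the port signals that a retry may succeed.
    bool recvRetry()
    {
        if (!retryPkt || !port.sendTiming(*retryPkt))
            return false;
        retryPkt.reset();
        return true;
    }

    int allocations = 0;

  private:
    ToyPort &port;
    std::unique_ptr<ToyPacket> retryPkt;
};
```

The retry path resends the stored packet, so the allocation counter stays at one no matter how many retries are needed.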
2694:879ca5098a90 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Remove obsolete stuff.
src/cpu/o3/alpha_cpu.hh: Remove functions no longer used for reading and writing. |
2693:18c6be231eb1 |
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes for some outstanding issues in the LSQ. It should now be able to retry. It should also be able to handle LL/SC (through hacks) for the UP case.
src/cpu/o3/lsq_unit.hh: Handle being able to retry (untested but hopefully very close to working).
Handle lock flag for LL/SC hack. Hopefully the memory system will add in LL/SC soon.
Better output message. src/cpu/o3/lsq_unit_impl.hh: Handle being able to retry (untested but should be very close to working).
Make SC's work (hopefully) while the memory system doesn't have a LL/SC implementation. |
2692:e5b7553eff69 |
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Tell checker that an instruction is completed once it does the access to memory. As long as the checker does not access memory to verify the store's data (currently impossible in the O3 model), this will work fine.
src/cpu/o3/lsq_unit_impl.hh: Tell checker that an instruction is completed once it does the access to memory. |
2691:549145d8ff75 |
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Creation of translating port pushed off to CPU. |
2690:f4337c0d9e6f |
08-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Get O3 CPU mostly working in full system, and fix an FP bug that showed up.
It still does not yet handle retries.
src/cpu/base_dyn_inst.hh: Get working in full-system mode and fix some FP bugs. src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/thread_state.hh: src/cpu/thread_state.hh: Get working in full system. src/cpu/checker/o3_cpu_builder.cc: Checker does not take a MemObject as a simobj parameter. src/cpu/o3/alpha_dyn_inst.hh: Fix up float regs. src/cpu/o3/regfile.hh: Fix up an fp error, print out more useful output messages. |
2689:dbf969c18a65 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Update copyright. |
2684:71f3cabf891f |
08-Jun-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
add write/read functions that have endian conversions in them. when we get a virtual port delete it (even though delete does nothing in these cases)
src/arch/alpha/linux/system.cc: src/arch/alpha/stacktrace.cc: src/base/remote_gdb.cc: src/cpu/simple_thread.cc: when we get a virtual port delete it (even though delete does nothing in this case) src/mem/port.hh: src/mem/vport.hh: add write/read functions that have endian conversions in them |
2683:d6b72bb2ed97 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Reorganization/renaming of CPUExecContext. Now it is called SimpleThread in order to clear up the confusion due to the many ExecContexts. It also derives from a common ThreadState object, which holds various state common to threads across CPU models.
Following with the previous check-in, ExecContext now refers only to the interface provided to the ISA in order to access CPU state. ThreadContext refers to the interface provided to all objects outside the CPU in order to access thread state. SimpleThread provides all thread state and the interface to access it, and is suitable for simple execution models such as the SimpleCPU.
src/SConscript: Include thread state file. src/arch/alpha/ev5.cc: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/thread_context.hh: src/cpu/memtest/memtest.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/cpu.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: Rename CPUExecContext to SimpleThread. src/cpu/base_dyn_inst.hh: Make thread member variables protected.. src/cpu/o3/alpha_cpu.hh: src/cpu/o3/cpu.hh: Make various members of ThreadState protected. src/cpu/o3/alpha_cpu_impl.hh: Push generation of TranslatingPort into the CPU itself. Make various members of ThreadState protected. src/cpu/o3/thread_state.hh: Pull a lot of common code into the base ThreadState class. src/cpu/ozone/thread_state.hh: Rename CPUExecContext to SimpleThread, move a lot of common code into base ThreadState class. src/cpu/thread_state.hh: Push a lot of common code into base ThreadState class. This goes along with renaming CPUExecContext to SimpleThread, and making it derive from ThreadState. src/cpu/simple_thread.cc: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. This helps push a lot of common code/state into a single class that can be used by all CPUs. src/cpu/simple_thread.hh: Rename CPUExecContext to SimpleThread, make it derive from ThreadState. src/kern/system_events.cc: Rename cpu_exec_context to thread_context. src/sim/process.hh: Remove unused forward declaration. |
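The split described above, with a ThreadContext interface for objects outside the CPU and a SimpleThread that holds the state and derives from a common ThreadState, can be sketched as follows. All `Toy*` names and the single PC register are assumptions made for this example; the real classes carry far more state and the actual inheritance layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base holding state common to threads across CPU models.
struct ToyThreadState
{
    std::uint64_t pc = 0;
    int threadId = 0;
};

// Hypothetical interface that objects outside the CPU use to reach a thread.
class ToyThreadContext
{
  public:
    virtual ~ToyThreadContext() = default;
    virtual std::uint64_t readPC() const = 0;
    virtual void setPC(std::uint64_t val) = 0;
};

// Hypothetical simple thread: owns the state and exposes the interface.
class ToySimpleThread : public ToyThreadState, public ToyThreadContext
{
  public:
    std::uint64_t readPC() const override { return pc; }
    void setPC(std::uint64_t val) override { pc = val; }
};

// Stand-in for an external object (for example a debugger stub): it sees
// only the interface, never the concrete CPU model behind it.
void redirectThread(ToyThreadContext &tc, std::uint64_t target)
{
    tc.setPC(target);
}
```

Because external code depends only on the interface, a more detailed CPU model could supply its own implementation without changing callers such as `redirectThread`.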
2682:52ac6338355d |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Move checker's exec_context.hh to match the other changes. Also add in some more comments.
src/cpu/thread_context.hh: Add more comments. |
2681:6885b69f4075 |
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Clear misc regs at startup.
src/arch/alpha/regfile.hh: Define clear functions on the individual reg files. src/cpu/o3/regfile.hh: Be sure to clear the misc reg file at startup. |
2680:246e7104f744 |
06-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Change ExecContext to ThreadContext. This is being renamed to differentiate between the interface used by objects outside of the CPU, and the interface used by the ISA. ThreadContext is used by objects outside of the CPU and is specifically defined in thread_context.hh. ExecContext is more implicit, and is defined by files such as base_dyn_inst.hh or cpu/simple/base.hh.
Further renames/reorganization will be coming shortly; what is currently CPUExecContext (the old ExecContext from m5) will be renamed to SimpleThread or something similar.
src/arch/alpha/arguments.cc: src/arch/alpha/arguments.hh: src/arch/alpha/ev5.cc: src/arch/alpha/faults.cc: src/arch/alpha/faults.hh: src/arch/alpha/freebsd/system.cc: src/arch/alpha/freebsd/system.hh: src/arch/alpha/isa/branch.isa: src/arch/alpha/isa/decoder.isa: src/arch/alpha/isa/main.isa: src/arch/alpha/linux/process.cc: src/arch/alpha/linux/system.cc: src/arch/alpha/linux/system.hh: src/arch/alpha/linux/threadinfo.hh: src/arch/alpha/process.cc: src/arch/alpha/regfile.hh: src/arch/alpha/stacktrace.cc: src/arch/alpha/stacktrace.hh: src/arch/alpha/tlb.cc: src/arch/alpha/tlb.hh: src/arch/alpha/tru64/process.cc: src/arch/alpha/tru64/system.cc: src/arch/alpha/tru64/system.hh: src/arch/alpha/utility.hh: src/arch/alpha/vtophys.cc: src/arch/alpha/vtophys.hh: src/arch/mips/faults.cc: src/arch/mips/faults.hh: src/arch/mips/isa_traits.cc: src/arch/mips/isa_traits.hh: src/arch/mips/linux/process.cc: src/arch/mips/process.cc: src/arch/mips/regfile/float_regfile.hh: src/arch/mips/regfile/int_regfile.hh: src/arch/mips/regfile/misc_regfile.hh: src/arch/mips/regfile/regfile.hh: src/arch/mips/stacktrace.hh: src/arch/sparc/faults.cc: src/arch/sparc/faults.hh: src/arch/sparc/isa_traits.hh: src/arch/sparc/linux/process.cc: src/arch/sparc/linux/process.hh: src/arch/sparc/process.cc: src/arch/sparc/regfile.hh: src/arch/sparc/solaris/process.cc: src/arch/sparc/stacktrace.hh: src/arch/sparc/ua2005.cc: src/arch/sparc/utility.hh: src/arch/sparc/vtophys.cc: src/arch/sparc/vtophys.hh: src/base/remote_gdb.cc: src/base/remote_gdb.hh: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/cpuevent.cc: src/cpu/cpuevent.hh: src/cpu/exetrace.hh: src/cpu/intr_control.cc: src/cpu/memtest/memtest.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: 
src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_back_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/thread_state.hh: src/cpu/pc_event.cc: src/cpu/pc_event.hh: src/cpu/profile.cc: src/cpu/profile.hh: src/cpu/quiesce_event.cc: src/cpu/quiesce_event.hh: src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/static_inst.cc: src/cpu/static_inst.hh: src/cpu/thread_state.hh: src/dev/alpha_console.cc: src/dev/ns_gige.cc: src/dev/sinic.cc: src/dev/tsunami_cchip.cc: src/kern/kernel_stats.cc: src/kern/kernel_stats.hh: src/kern/linux/events.cc: src/kern/linux/events.hh: src/kern/system_events.cc: src/kern/system_events.hh: src/kern/tru64/dump_mbuf.cc: src/kern/tru64/tru64.hh: src/kern/tru64/tru64_events.cc: src/kern/tru64/tru64_events.hh: src/mem/vport.cc: src/mem/vport.hh: src/sim/faults.cc: src/sim/faults.hh: src/sim/process.cc: src/sim/process.hh: src/sim/pseudo_inst.cc: src/sim/pseudo_inst.hh: src/sim/syscall_emul.cc: src/sim/syscall_emul.hh: src/sim/system.cc: src/cpu/thread_context.hh: src/sim/system.hh: src/sim/vptr.hh: Change ExecContext to ThreadContext. |
2679:737e9f158843 |
06-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix checker to work in newmem in SE mode.
src/cpu/o3/fetch_impl.hh: Give the checker a pointer to the icachePort. src/cpu/o3/lsq_unit_impl.hh: Give the checker a pointer to the dcachePort. src/mem/request.hh: Allow checking for the scResult being valid prior to accessing it. |
2678:1f86b91dc3bb |
05-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get new CPU model working for simple test case. The CPU does not yet support retrying accesses.
src/cpu/base_dyn_inst.cc: Delete the allocated data in destructor. src/cpu/base_dyn_inst.hh: Only copy the addresses if the translation succeeded. src/cpu/o3/alpha_cpu.hh: Return actual translating port. Don't panic on setNextNPC() as it's always called, regardless of the architecture, when the process initializes. src/cpu/o3/alpha_cpu_impl.hh: Pass in memobject to the thread state in SE mode. src/cpu/o3/commit_impl.hh: Initialize all variables. src/cpu/o3/decode_impl.hh: Handle early resolution of branches properly. src/cpu/o3/fetch.hh: Switch structure back to requests. src/cpu/o3/fetch_impl.hh: Initialize all variables, create/delete requests properly. src/cpu/o3/lsq_unit.hh: Include sender state along with the packet. Also include a more generic writeback event that's only used for stores forwarding data to loads. src/cpu/o3/lsq_unit_impl.hh: Redo writeback code to support the response path of the memory system. src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/mem_dep_unit_impl.hh: Wrap variables in #ifdefs. src/cpu/o3/store_set.cc: Include to get panic() function. src/cpu/o3/thread_state.hh: Create with MemObject as well. src/cpu/thread_state.hh: Have a translating port in the thread state object. src/python/m5/objects/AlphaFullCPU.py: Mem parameter no longer needed. |
2674:6d4afef73a20 |
04-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:/z/ktlim2/clean/m5-o3 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
src/cpu/checker/o3_cpu_builder.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: Hand merge. |
2672:268abc78c6af |
04-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get everything working again.
src/cpu/simple/base.cc: Start XC's as suspended. |
2671:28ad11557754 |
04-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fix for full system compiling. |
2670:9107b8bd08cd |
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/newmem into zizzer.eecs.umich.edu:/.automount/zamp/z/ktlim2/clean/newmem |
2669:f2b336e89d2a |
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.
Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.
src/SConscript: Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future. src/base/traceflags.py: Include new trace flags, fix up merge mess-up. src/cpu/SConscript: Include the base_dyn_inst.cc as one of the sources. Don't compile the Ozone CPU for now. src/cpu/base.cc: Remove an extra } from the merge. src/cpu/base_dyn_inst.cc: Fixes to make compiling work. Don't instantiate the OzoneCPU for now. src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/btb.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/sat_counter.hh: src/cpu/op_class.hh: src/cpu/ozone/cpu.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/checker/o3_cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/mem/request.hh: src/cpu/o3/fu_pool.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/dyn_inst.cc: src/cpu/ozone/dyn_inst.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/ozone_impl.hh: src/cpu/ozone/thread_state.hh: Fixes to get compiling to work. src/cpu/o3/alpha_cpu.hh: Fixes to get compiling to work. Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs. src/cpu/o3/alpha_dyn_inst_impl.hh: Fixes to get compiling to work. Pass in the packet to the completeAcc function. Fix up syscall function. |
2667:fe64b8353b1c |
09-Jun-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Move main control from C++ into Python. User script now invokes initialization and simulation loop after building configuration. These functions are exported from C++ to Python using SWIG.
SConstruct: Set up SWIG builder & scanner. Set up symlinking of source files into build directory (by not disabling the default behavior). configs/test/test.py: Rewrite to use new script-driven interface. Include a sample option. src/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). Add SWIG-generated main_wrap.cc to source list. src/arch/SConscript: Set up symlinking of source files into build directory (by not disabling the default behavior). src/arch/alpha/ev5.cc: src/arch/alpha/isa/decoder.isa: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/trace/opt_cpu.cc: src/cpu/trace/trace_cpu.cc: src/sim/pseudo_inst.cc: src/sim/root.cc: src/sim/serialize.cc: src/sim/syscall_emul.cc: SimExit() is now exitSimLoop(). src/cpu/base.cc: SimExitEvent is now SimLoopExitEvent src/python/SConscript: Add SWIG build command for main.i. Use python/m5 in build dir as source for zip archive... easy now with file duplication enabled. src/python/m5/__init__.py: - Move copyright notice back to C++ so we can print it right away, even for interactive sessions. - Get rid of argument parsing code; just provide default option descriptors for user script to call optparse with. - Don't clutter m5 namespace by sucking in all of m5.config and m5.objects. - Move instantiate() function here from config.py. src/python/m5/config.py: - Move instantiate() function to __init__.py. - Param.Foo deferred type lookups must use m5.objects namespace now (not m5). 
src/python/m5/objects/AlphaConsole.py: src/python/m5/objects/AlphaFullCPU.py: src/python/m5/objects/AlphaTLB.py: src/python/m5/objects/BadDevice.py: src/python/m5/objects/BaseCPU.py: src/python/m5/objects/BaseCache.py: src/python/m5/objects/Bridge.py: src/python/m5/objects/Bus.py: src/python/m5/objects/CoherenceProtocol.py: src/python/m5/objects/Device.py: src/python/m5/objects/DiskImage.py: src/python/m5/objects/Ethernet.py: src/python/m5/objects/Ide.py: src/python/m5/objects/IntrControl.py: src/python/m5/objects/MemObject.py: src/python/m5/objects/MemTest.py: src/python/m5/objects/Pci.py: src/python/m5/objects/PhysicalMemory.py: src/python/m5/objects/Platform.py: src/python/m5/objects/Process.py: src/python/m5/objects/Repl.py: src/python/m5/objects/Root.py: src/python/m5/objects/SimConsole.py: src/python/m5/objects/SimpleDisk.py: src/python/m5/objects/System.py: src/python/m5/objects/Tsunami.py: src/python/m5/objects/Uart.py: Fix up imports (m5 namespace no longer includes m5.config). src/sim/eventq.cc: src/sim/eventq.hh: Support for Python-called simulate() function: - Use IsExitEvent flag to signal events that want to exit the simulation loop gracefully (instead of calling exit() to terminate the process). - Modify interface to hand exit event object back to caller so it can be inspected for cause. src/sim/host.hh: Add MaxTick constant. src/sim/main.cc: Move copyright notice back to C++ so we can print it right away, even for interactive sessions. Use PYTHONPATH environment var to set module path (instead of clunky code injection method). Move main control from here into Python: - Separate initialization code and simulation loop into separate functions callable from Python. - Make Python interpreter invocation more pure (more like directly invoking interpreter). Add -i and -p flags (only options on binary itself; other options processed by Python). Import readline package when using interactive mode. 
src/sim/sim_events.cc: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). src/sim/sim_events.hh: SimExitEvent is now SimLoopExitEvent, and uses IsSimExit flag to terminate loop (instead of exiting simulator process). Get rid of a few unused constructors. src/sim/sim_exit.hh: SimExit() is now exitSimLoop(). Get rid of unused functions. Add comments. |
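The exit-event change described above (the loop hands the exit event back to the caller instead of terminating the process) can be sketched roughly as follows. This is a minimal Python model, not the M5 source: the `Event` and `EventQueue` classes, the `is_exit` attribute, and the `cause` strings are all hypothetical names chosen for illustration.

```python
import heapq
import itertools


class Event:
    """Hypothetical event; `is_exit` marks one that should stop the loop."""

    def __init__(self, when, cause="", is_exit=False):
        self.when = when
        self.cause = cause
        self.is_exit = is_exit


class EventQueue:
    """Hypothetical queue whose simulate() returns the exit event."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker for equal ticks
        self.cur_tick = 0

    def schedule(self, event):
        heapq.heappush(self._heap, (event.when, next(self._order), event))

    def simulate(self, max_tick):
        # Service events in tick order until an exit event fires.
        while self._heap and self._heap[0][0] <= max_tick:
            when, _, event = heapq.heappop(self._heap)
            self.cur_tick = when
            if event.is_exit:
                # Hand the event back so the caller can inspect the cause,
                # rather than calling exit() here.
                return event
        # No exit event before the limit: report that as the cause.
        self.cur_tick = max_tick
        return Event(max_tick, "simulate() limit reached", is_exit=True)
```

A user script would then call `simulate()` and look at the returned event's `cause` to decide whether to continue, checkpoint, or stop.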
2665:a124942bacb8 |
31-May-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Updated Authors from bk prs info |
2663:c82193ae8467 |
31-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Streamline interface to Request object.
src/SConscript: mem/request.cc no longer needed (all functions inline). src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/timing.cc: src/dev/io_device.cc: src/mem/port.cc: Modified Request object interface. src/mem/packet.hh: Modified Request object interface. Address & size are always set together now, so track with single flag. src/mem/request.hh: Streamline interface to support a handful of calls that set multiple fields reflecting common usage patterns. Reduce number of validFoo booleans by combining flags for fields which must be set together. |
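The idea of tracking fields that are always set together with a single validity flag can be illustrated with a small Python model. The class layout and the names `set_phys`, `get_paddr`, and `get_size` are assumptions made for this sketch and are not taken from src/mem/request.hh.

```python
class Request:
    """Hypothetical request: address and size are set by one call,
    so one flag records whether both are valid."""

    def __init__(self):
        self._paddr = 0
        self._size = 0
        self._valid_paddr = False  # covers both _paddr and _size

    def set_phys(self, paddr, size):
        # Setting both fields in one call keeps them consistent.
        self._paddr = paddr
        self._size = size
        self._valid_paddr = True

    def get_paddr(self):
        if not self._valid_paddr:
            raise RuntimeError("address read before it was set")
        return self._paddr

    def get_size(self):
        if not self._valid_paddr:
            raise RuntimeError("size read before it was set")
        return self._size
```

Because there is no way to set the address without the size, a separate flag per field would carry no extra information.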
2662:f24ae2d09e27 |
30-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Minor further cleanup & commenting of Packet class.
src/cpu/simple/atomic.cc: Make common ifetch setup based on Request rather than Packet. Packet::reset() no longer a separate function. sendAtomic() returns latency, not absolute tick. src/cpu/simple/atomic.hh: sendAtomic returns latency, not absolute tick. src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: Make common ifetch setup based on Request rather than Packet. src/dev/alpha_console.cc: src/dev/ide_ctrl.cc: src/dev/io_device.cc: src/dev/isa_fake.cc: src/dev/ns_gige.cc: src/dev/pciconfigall.cc: src/dev/sinic.cc: src/dev/tsunami_cchip.cc: src/dev/tsunami_io.cc: src/dev/tsunami_pchip.cc: src/dev/uart8250.cc: src/mem/physical.cc: Get rid of redundant Packet time field. src/mem/packet.cc: Eliminate reset() method. src/mem/packet.hh: Fold reset() function into reinitFromRequest()... it was only ever called together with that function. Get rid of redundant time field. Cleanup/add comments. src/mem/port.hh: Document in comment that sendAtomic returns latency, not absolute tick. |
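The practical consequence of sendAtomic() returning a latency rather than an absolute tick is that the caller adds the result to its own notion of the current time. A minimal Python illustration, with a hypothetical `MemoryPort` class and `access_done_tick` helper that are not part of the source:

```python
class MemoryPort:
    """Hypothetical port with a fixed access latency in ticks."""

    def __init__(self, latency):
        self.latency = latency

    def send_atomic(self, pkt):
        # Returns how long the access took, not when it finishes.
        return self.latency


def access_done_tick(cur_tick, port, pkt):
    # The caller converts the latency into an absolute completion tick.
    return cur_tick + port.send_atomic(pkt)
```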
2657:b119b774656b |
30-May-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Add a very poor implementation of dealing with retries on timing requests. It is especially slow with tracing on since it ends up being O(N^2). But it's probably going to have to change for the real bus anyway, so it should be rewritten then. Change recvRetry() to not accept a packet. sendTiming() should be called again (and can respond with false or true). Removed Port Blocked/Unblocked and replaced with sendRetry(). Remove possibility of packet mangling in bridge if packet is going to be refused anyway.
src/cpu/simple/atomic.cc: src/cpu/simple/atomic.hh: src/cpu/simple/timing.cc: src/cpu/simple/timing.hh: Change recvRetry() to not accept a packet. sendTiming() should be called again (and can respond with false or true). src/dev/io_device.cc: src/dev/io_device.hh: Make DMA Timing requests/responses work. Change recvRetry() to not accept a packet. sendTiming() should be called again (and can respond with false or true). src/mem/bridge.cc: src/mem/bridge.hh: Change recvRetry() to not accept a packet. sendTiming() should be called again (and can respond with false or true). Removed Port Blocked/Unblocked and replaced with sendRetry(). Remove possibility of packet mangling if packet is going to be refused anyway. src/mem/bus.cc: src/mem/bus.hh: Add a very poor implementation of dealing with retries on timing requests. It is especially slow with tracing on since it ends up being O(N^2). But it's probably going to have to change for the real bus anyway, so it should be rewritten then. src/mem/port.hh: Change recvRetry() to not accept a packet. sendTiming() should be called again (and can respond with false or true). Removed Blocked/Unblocked port status; their functionality is really duplicated in the recvRetry() method. |
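The retry handshake described above (a refused sender keeps its packet, is later told to retry without being handed a packet, and then sends again) can be modeled roughly as below. This is a Python sketch under assumed names: `Receiver`, `Sender`, `capacity`, and `complete_one` are hypothetical, and the real bus and bridge logic is certainly more involved.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver that can hold `capacity` packets at once."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0
        self.accepted = []
        self.waiting = deque()  # senders that were refused

    def send_timing(self, sender, pkt):
        if self.in_flight >= self.capacity:
            # Refuse without touching the packet; remember who to wake.
            if sender not in self.waiting:
                self.waiting.append(sender)
            return False
        self.in_flight += 1
        self.accepted.append(pkt)
        return True

    def complete_one(self):
        # A slot frees up: tell the oldest refused sender to retry.
        self.in_flight -= 1
        if self.waiting:
            self.waiting.popleft().recv_retry()


class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.pending = None

    def send(self, pkt):
        if not self.receiver.send_timing(self, pkt):
            self.pending = pkt  # keep it until a retry arrives
            return False
        return True

    def recv_retry(self):
        # No packet argument: the sender resends what it kept, and the
        # new attempt may itself be accepted or refused.
        pkt, self.pending = self.pending, None
        self.send(pkt)
```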
2654:9559cfa91b9d |
30-May-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zizzer:/bk/m5 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem
SConstruct: src/SConscript: src/arch/SConscript: src/arch/alpha/faults.cc: src/arch/alpha/tlb.cc: src/base/traceflags.py: src/cpu/SConscript: src/cpu/base.cc: src/cpu/base.hh: src/cpu/base_dyn_inst.cc: src/cpu/cpu_exec_context.cc: src/cpu/cpu_exec_context.hh: src/cpu/exec_context.hh: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/regfile.hh: src/cpu/ozone/cpu.hh: src/cpu/simple/base.cc: src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/2bit_local_pred.hh: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_dyn_inst.cc: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/bpred_unit.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/btb.cc: src/cpu/o3/btb.hh: src/cpu/o3/comm.hh: src/cpu/o3/commit.cc: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu_policy.hh: src/cpu/o3/decode.cc: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.cc: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.cc: src/cpu/o3/free_list.hh: src/cpu/o3/iew.cc: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.cc: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/mem_dep_unit.cc: src/cpu/o3/mem_dep_unit.hh: src/cpu/o3/mem_dep_unit_impl.hh: src/cpu/o3/ras.cc: src/cpu/o3/ras.hh: src/cpu/o3/rename.cc: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rename_map.cc: src/cpu/o3/rename_map.hh: src/cpu/o3/rob.cc: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/sat_counter.cc: src/cpu/o3/sat_counter.hh: src/cpu/o3/store_set.cc: src/cpu/o3/store_set.hh: src/cpu/o3/tournament_pred.cc: src/cpu/o3/tournament_pred.hh: Hand merges. |
2651:76db2c628241 |
29-May-2006 |
Ali Saidi <saidi@eecs.umich.edu> |
Create a new CpuEvent class that has a pointer to an execution context in the object and places itself on a global list so the events can be migrated on CPU switches. Create a new wrapper class called CpuEventWrapper that works like the old wrapper class but calls the function with the xc parameter. Use the new CpuEventWrapper class for tick compare events on SPARC.
src/arch/sparc/regfile.hh: Use new CpuEventWrapper class for tick compare events. src/arch/sparc/ua2005.cc: Move definition to a full-system-only file, since it is. src/cpu/base.cc: On switch from one CPU to another, CpuEvent::replaceExecContext() needs to be called on all (oldxc, newxc) pairs. |
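The migration scheme described above can be sketched as follows: each event registers itself on a global list, and a CPU switch walks that list to repoint events from the old execution context to the new one. This is a Python model only; the method names `release`, `replace_exec_context`, and `process`, and the list name `_all_events`, are assumptions for illustration rather than the actual C++ declarations.

```python
class CpuEvent:
    """Hypothetical event that remembers its execution context (xc)."""

    _all_events = []  # global list of live CPU events

    def __init__(self, xc):
        self.xc = xc
        CpuEvent._all_events.append(self)

    def release(self):
        # Drop the event from the global list when it is no longer needed.
        CpuEvent._all_events.remove(self)

    @classmethod
    def replace_exec_context(cls, old_xc, new_xc):
        # On a CPU switch, migrate every event bound to old_xc.
        for event in cls._all_events:
            if event.xc is old_xc:
                event.xc = new_xc


class CpuEventWrapper(CpuEvent):
    """Wrapper that calls a function with the event's xc as its argument."""

    def __init__(self, xc, func):
        super().__init__(xc)
        self.func = func

    def process(self):
        return self.func(self.xc)
```

A switch-over would call `CpuEvent.replace_exec_context(old_xc, new_xc)` once per (old, new) pair, after which pending events fire against the new context.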
2644:8a45565c2c04 |
26-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Fixes for TimingSimpleCPU under full system. Now boots Alpha Linux!
src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: Move traceData->finalize() into postExecute(). src/cpu/simple/timing.cc: Fixes for full system. Now boots Alpha Linux! - Handle ifetch faults, suspend/resume. - Delete memory request & packet objects on response. - Don't try to do split memory accesses on prefetch references (ISA description doesn't support this). src/cpu/simple/timing.hh: Minor reorganization of internal methods. |
2641:6d9d837e2032 |
26-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Significant rework of Packet class interface: - new constructor guarantees initialization of most fields - flags track status of non-guaranteed fields (addr, size, src) - accessor functions (getAddr() etc.) check status on access - Command & Result classes are nested in Packet class scope - Command now built from vector of behavior bits - string version of Command for tracing - reinitFromRequest() and makeTimingResponse() encapsulate common manipulations of existing packets
src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/timing.cc: src/dev/alpha_console.cc: src/dev/ide_ctrl.cc: src/dev/io_device.cc: src/dev/io_device.hh: src/dev/isa_fake.cc: src/dev/ns_gige.cc: src/dev/pciconfigall.cc: src/dev/sinic.cc: src/dev/tsunami_cchip.cc: src/dev/tsunami_io.cc: src/dev/tsunami_pchip.cc: src/dev/uart8250.cc: src/mem/bus.cc: src/mem/bus.hh: src/mem/physical.cc: src/mem/port.cc: src/mem/port.hh: src/mem/request.hh: Update for new Packet interface. |
2640:266b80dd5eca |
26-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Add names to memory Port objects for tracing. |
2637:18e4273315cd |
22-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
Get rid of FastCPU model. It doesn't compile, and if we really want this we should start over from scratch and see if we can reuse parts from BaseSimpleCPU (e.g., derive a FastSimpleCPU).
SConstruct: src/arch/SConscript: src/cpu/cpu_models.py: Get rid of FastCPU model. |
2632:1bb2f91485ea |
22-May-2006 |
Steve Reinhardt <stever@eecs.umich.edu> |
New directory structure: - simulator source now in 'src' subdirectory - imported files from 'ext' repository - support building in arbitrary places, including outside of the source tree. See comment at top of SConstruct file for more details. Regression tests are temporarily disabled; that system needs more extensive revisions.
SConstruct: Update for new directory structure. Modify to support build trees that are not subdirectories of the source tree. See comment at top of file for more details. Regression tests are temporarily disabled. src/arch/SConscript: src/arch/isa_parser.py: src/python/SConscript: Update for new directory structure. |