#
14297:b4519e586f5e |
|
10-Sep-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu, mem: Changing AtomicOpFunctor* for unique_ptr<AtomicOpFunctor>
This change is based on modify the way we move the AtomicOpFunctor* through gem5 in order to mantain proper ownership of the object and ensuring its destruction when it is no longer used.
Doing that we fix at the same time a memory leak in Request.hh where we were assigning a new AtomicOpFunctor* without destroying the previous one.
This change creates a new type AtomicOpFunctor_ptr as a std::unique_ptr<AtomicOpFunctor> and move its ownership as needed. Except for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
14194:967b9c450b04 |
|
17-Aug-2019 |
Gabe Black <gabeblack@google.com> |
cpu: Move O3's data port into the LSQ.
That's where it's used, and putting it there avoids having to pass around the port using the top level getDataPort function.
Change-Id: I0dea25d0c5f4bb3f58a6574a8f2b2d242784caf2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20238 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
14135:a0affe46d00a |
|
26-Jul-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu-o3: added _amo_op parameter in o3 LSQ
Fix bug with AMO (or RMW) instructions where the amo_op variable is not being propagated to the LSQ request.
Change-Id: I60c59641d9b497051376f638e27f3c4cc361f615 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19814 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
|
#
14111:14c05f862590 |
|
15-Nov-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Fix too strict assert condition in writeback()
The assert() in the LSQ writeback() only allowed ReExec faults. However, a SplitRequest which completed the translation in PartialFault state (i.e. any but the very first cacheline translation failed) may end up here. The assert() condition is extended accordingly.
The patch also removes the superfluous/unused Complete/Squashed states from the LSQ request. (The completion of the request is recorded in the flags still.)
Change-Id: Ie575f4d3b4d5295585828ad8c7d3f4c7c1fe15d0 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19174 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
14105:969b4e972b07 |
|
27-Feb-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu: Add first-/non-faulting load support to Minor and O3
Some architectures allow masking faults of memory load instructions in some specific circumstances (e.g. first-faulting and non-faulting loads in Arm SVE). This patch adds support for such loads in the Minor and O3 CPU models.
Change-Id: I264a81a078f049127779aa834e89f0e693ba0bea Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13954:2f400a5f2627 |
|
07-Jul-2017 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu,mem: Add support for partial loads/stores and wide mem. accesses
This changeset adds support for partial (or masked) loads/stores, i.e. loads/stores that can disable accesses to individual bytes within the target address range. In addition, this changeset extends the code to crack memory accesses across most CPU models (TimingSimpleCPU still TBD), so that arbitrarily wide memory accesses are supported. These changes are required for supporting ISAs with wide vectors.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com> - Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
13710:5ba1d8066ef0 |
|
25-Jun-2018 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu-o3: Add cache read ports limit to LSQ
This change introduces cache read ports to limit the number of per-cycle loads. Previously only the number of per-cycle stores could be limited.
Change-Id: I39bbd984056c5a696725ee2db462a55b2079e2d4 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13517 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13652:45d94ac03a27 |
|
22-Jan-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: support atomic memory request type with AtomicOpFunctor
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU, MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory system.
Atomic memory instruction is treated as a special store instruction in all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any conflicting store request(s) currently in the store buffer to retire before the AMO request is sent to the cache. AMO requests are not buffered in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it is delivered to the cache only after all previous stores are issued to the cache. Data forwarding between between an outstanding AMO in the store buffer and a subsequent load is not allowed since the AMO request does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert enough memory fences as micro-ops around an atomic instruction to enforce a correct order of memory instructions with respect to its memory consistency model. Without extra memory fences, this implementation can allow AMOs and other memory instructions that do not conflict (i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within a cache line boundary since the cache for now is not able to execute an operation on two different cache lines in one single step. Therefore, ISAs like x86 that require multi-cache-line atomic instructions need to either use a pair of locking load and unlocking store or change the cache implementation to guarantee the atomicity of an atomic instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a Reviewed-on: https://gem5-review.googlesource.com/c/8188 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13628:332f730a1855 |
|
04-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: added missing override specifier
Added missing specifier for various virtual functions.
Change-Id: I4783e92d78789a9ae182fad79aadceafb00b2458 Reviewed-on: https://gem5-review.googlesource.com/c/16103 Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13590:d7e018859709 |
|
13-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu-o3: O3 LSQ Generalisation
This patch does a large modification of the LSQ in the O3 model. The main goal of the patch is to remove the 'an operation can be served with one or two memory requests' assumption that is present in the LSQ and the instruction with the req, reqLow, reqHigh triplet, and generalising it to operations that can be addressed with one request, and operations that require many requests, embodied in the SingleDataRequest and the SplitDataRequest.
This modification has been done mimicking the minor model to an extent, shifting the responsibilities of dealing with VtoP translation and tracking the status and resources from the DynInst to the LSQ via the LSQRequest. The LSQRequest models the information concerning the operation, handles the creation of fragments for translation and request as well as assembling/splitting the data accordingly.
With this modifications, the implementation of vector ISAs, particularly on the memory side, become more rich, as the new model permits a dissociation of the ISA characteristics as vector length, from the microarchitectural characteristics that govern how contiguous loads are executing, allowing exploration of different LSQ to DL1 bus widths to understand the tradeoffs in complexity and performance.
Part of the complexities introduced stem from the fact that gem5 keeps a large amount of metadata regarding, in particular, memory operations, thus, when an instruction is squashed while some operation as TLB lookup or cache access is ongoing, when the relevant structure communicates to the LSQ that the operation is over, it tries to access some pieces of data that should have died when the instruction is squashed, leading to asserts, panics, or memory corruption. To ensure the correct behaviour, the LSQRequest rely on assesing who is their owner, and self-destroying if they detect their owner is done with the request, and there will be no subsequent action. For example, in the case of an instruction squashed whal the TLB is doing a walk to serve the translation, when the translation is served by the TLB, the LSQRequest detects that the instruction was squashed, and as the translation is done, no one else expect to access its information, and therefore, it self-destructs. Having destroyed the LSQRequest earlier, would lead to wrong behaviour as the TLB walk may access some fields of it.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13516 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
|
#
13560:f8732494c155 |
|
24-Dec-2018 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu-o3: Make the smtLSQPolicy a Param.ScopedEnum
The smtLSQPolicy is a parameter in the o3 cpu that can have 3 different values. Previously this setting was done through a string and a parser function would turn it into a c++ enum value. This changeset turns the string into a python Param.ScopedEnum.
Change-Id: I82041b88bd914c5dc660058d9e3998e3114e7c35 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15397 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13472:7ceacede4f1e |
|
01-Mar-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Change raw pointers to STL Containers
This patch changes two members from being raw pointers to being STL containers. The reason behind, other than cleanlyness and arguable OO best practices is that containers have more intronspections capabilities than naked pointers do, as the size is known.
Using STL containers adds little overhead and eases the automation of process during debugging (gdb).
Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13126 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
13429:a1e199fd8122 |
|
06-Feb-2017 |
Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> |
cpu: Fix the usage of const DynInstPtr
Summary: Usage of const DynInstPtr& when possible and introduction of move operators to RefCountingPtr.
In many places, scoped references to dynamic instructions do a copy of the DynInstPtr when a reference would do. This is detrimental to performance. On top of that, in case there is a need for reference tracking for debugging, the redundant copies make the process much more painful than it already is.
Also, from the theoretical point of view, a function/method that defines a convenience name to access an instruction should not be considered an owner of the data, i.e., doing a copy and not a reference is not justified.
On a related topic, C++11 introduces move semantics, and those are useful when, for example, there is a class modelling a HW structure that contains a list, and has a getHeadOfList function, to prevent doing a copy to an internal variable -> update pointer, remove from the list -> update pointer, return value making a copy to the assined variable -> update pointer, destroy the returned value -> update pointer.
Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/13105 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12749:223c83ed9979 |
|
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
11435:0f1b46dde3fa |
|
07-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit.
|
#
11429:cf5af0cc3be4 |
|
06-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models 831c7f2f9e39: power: Low-power idle power state for idle CPUs 4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
11428:20264eb69fbf |
|
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
|
#
11302:bce9037689b0 |
|
17-Jan-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
cpu: remove unnecessary data ptr from O3 internal read() funcs
The read() function merely initiates a memory read operation; the data doesn't arrive until the access completes and a response packet is received from the memory system. Thus there's no need to provide a data pointer; its existence is historical.
Getting this pointer out of this internal o3 interface sets the stage for similar cleanup in the ExecContext interface. Also found that we were pointlessly setting the contents at this pointer on a store forward (the useful memcpy happens just a few lines below the deleted one).
|
#
10713:eddb533708cb |
|
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
|
#
10333:6be8945d226b |
|
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked.
Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that.
Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior.
|
#
10239:592f0bb6bd6f |
|
21-Jun-2014 |
Binh Pham <binhpham@cs.rutgers.edu> |
o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa.
This work was done while Binh was an intern at AMD Research.
|
#
9868:44a67004d6b4 |
|
11-Sep-2013 |
Joel Hestness <jthestness@gmail.com> |
cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's object, but it would only initialize numThreads LSQUnits as specified by the user. This had the effect of leaving some LSQUnits uninitialized when the number of threads was less than MaxThreads, and when adding statistics to the LSQUnit that must be initialized, this caused the stats initialization check to fail. By dynamically instantiating LSQUnits, they are all initialized and this avoids uninitialized LSQUnits from floating around during runtime.
|
#
9444:ab47fe7f03f0 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust.
Draining is controlled in the commit stage, which checks if the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after that the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty.
|
#
9440:fdc91cab5760 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Fix O3 LSQ debug dumping constness and formatting
|
#
9358:aa761458ddcb |
|
06-Dec-2012 |
Nathanael Premillieu <nathanael.premillieu@irisa.fr> |
o3 cpu: remove some unused buggy functions in the lsq Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
#
8975:7f36d4436074 |
|
01-May-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate requests and responses for timing accesses
This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses.
For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the apprioriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not alow snoop requests to be rejected or stalled. All these assumptions are now excplicitly part of the port interface itself.
|
#
8948:e95ee70f876c |
|
14-Apr-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Separate snoops and normal memory requests/responses
This patch introduces port access methods that separates snoop request/responses from normal memory request/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports.
Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all use the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal request and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below.
Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass.
Atomic accesses follow the same flow as functional accesses, with request being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses.
The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id.
In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions.
|
#
8809:bb10807da889 |
|
01-Feb-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head, hopefully the last time for this batch.
|
#
8799:dac1e33e07b0 |
|
28-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with the main repo.
|
#
8794:e2ac2b7164dd |
|
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of includes of config/full_system.hh.
|
#
8793:5f25086326ac |
|
18-Nov-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Get rid of FULL_SYSTEM in the CPU directory.
|
#
8737:770ccf3af571 |
|
31-Jan-2012 |
Koan-Sin Tan <koansin.tan@gmail.com> |
clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places.
|
#
8707:489489c67fd9 |
|
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data ports from specialised subclasses to the base CPU (to the largest degree possible). Ultimately it servers to make the CPU(s) have a well-defined interface to the memory sub-system.
|
#
8706:b1838faf3bcc |
|
17-Jan-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
MEM: Add port proxies instead of non-structural ports
Port proxies are used to replace non-structural ports, and thus enable all ports in the system to correspond to a structural entity. This has the advantage of accessing memory through the normal memory subsystem and thus allowing any constellation of distributed memories, address maps, etc. Most accesses are done through the "system port" that is used for loading binaries, debugging etc. For the entities that belong to the CPU, e.g. threads and thread contexts, they wrap the CPU data port in a port proxy.
The following replacements are made: FunctionalPort > PortProxy TranslatingPort > SETranslatingPortProxy VirtualPort > FSTranslatingPortProxy
|
#
8229:78bf55f23338 |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: sort all includes
|
#
7520:67c670459d01 |
|
13-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Add readBytes and writeBytes functions to the exec contexts.
|
#
6974:4d4903a3e7c5 |
|
12-Feb-2010 |
Timothy M. Jones <tjones1@inf.ed.ac.uk> |
O3PCU: Split loads and stores that cross cache line boundaries.
When each load or store is sent to the LSQ, we check whether it will cross a cache line boundary and, if so, split it in two. This creates two TLB translations and two memory requests. Care has to be taken if the first packet of a split load is sent but the second blocks the cache. Similarly, for a store, if the first packet cannot be sent, we must store the second one somewhere to retry later.
This modifies the LSQSenderState class to record both packets in a split load or store.
Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA to indicate whether unaligned memory accesses are allowed. This is used throughout the changed code so that compiler can optimise away code dealing with split requests for ISAs that don't need them.
|
#
6221:58a3c04e6344 |
|
26-May-2009 |
Nathan Binkert <nate@binkert.org> |
types: add a type for thread IDs and try to use it everywhere
|
#
5714:76abee886def |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate.
|
#
5606:6da7a58b0bc8 |
|
09-Oct-2008 |
Nathan Binkert <nate@binkert.org> |
eventq: convert all usage of events to use the new API. For now, there is still a single global event queue, but this is necessary for making the steps towards a parallelized m5.
|
#
5529:9ae69b9cd7fd |
|
11-Aug-2008 |
Nathan Binkert <nate@binkert.org> |
params: Convert the CPU objects to use the auto generated param structs. A whole bunch of stuff has been converted to use the new params stuff, but the CPU wasn't one of them. While we're at it, make some things a bit more stylish. Most of the work was done by Gabe, I just cleaned stuff up a bit more at the end.
|
#
5494:85c8d296c1cb |
|
28-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Backed out changeset 94a7bb476fca: caused memory leak.
|
#
5489:94a7bb476fca |
|
21-Jun-2008 |
Steve Reinhardt <stever@gmail.com> |
Generate more useful error messages for unconnected ports. Force all non-default ports to provide a name and an owner in the constructor.
|
#
4475:fb185cc1c845 |
|
22-May-2007 |
Steve Reinhardt <stever@eecs.umich.edu> |
Change getDeviceAddressRanges to use bool for snoop arg.
|
#
4329:52057dbec096 |
|
04-Apr-2007 |
Kevin Lim <ktlim@umich.edu> |
Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.
src/cpu/o3/alpha/cpu_impl.hh: Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions.
|
#
4192:7accc6365bb9 |
|
09-Mar-2007 |
Kevin Lim <ktlim@umich.edu> |
Two fixes: 1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts(). 2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc: Move common code of switching over peers to base CPU. src/cpu/base.hh: Move common code of switching over peers to BaseCPU. src/cpu/o3/cpu.cc: Add in function that updates thread context's ports. Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code. src/cpu/o3/cpu.hh: Include function to update thread context's memory ports. src/cpu/o3/lsq.hh: Add function to dcache port that will update the memory ports upon getting a new peer. Also include a function that will tell the CPU to update those memory ports. src/cpu/o3/lsq_impl.hh: Add function that will update the memory ports upon getting a new peer. src/cpu/simple/atomic.cc: src/cpu/simple/timing.cc: Add function that will update thread context's memory ports upon getting a new peer. Also use the new BaseCPU's take over from function. src/cpu/simple/atomic.hh: Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer. src/cpu/simple/timing.hh: Add function that will update thread context's memory ports upon getting a new peer. src/mem/port.hh: Make setPeer virtual so that other classes can override it.
|
#
3846:a0fe3210ce53 |
|
15-Dec-2006 |
Lisa Hsu <hsul@eecs.umich.edu> |
little fixes i noticed while searching for reason for address range issues (but these weren't the cause of the problem).
RangeSize as a function takes a start address, and a SIZE, and will make the range (start, start+size-1) for you.
src/cpu/memtest/memtest.hh: src/cpu/o3/fetch.hh: src/cpu/o3/lsq.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/simple/atomic.hh: src/cpu/simple/timing.hh: Fix RangeSize arguments src/dev/alpha/tsunami_cchip.cc: src/dev/alpha/tsunami_io.cc: src/dev/alpha/tsunami_pchip.cc: src/dev/baddev.cc: pioSize indicates SIZE, not a mask
|
#
3647:8121d4503cbc |
|
13-Nov-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Make CPU models signal to update the snoop ranges
|
#
3192:f3e215dda3f6 |
|
09-Oct-2006 |
Ron Dreslinski <rdreslin@umich.edu> |
Have cpus send snoop ranges
|
#
2907:7b0ababb4166 |
|
13-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Move Dcache port creation from LSQUnit to LSQ in order to support Ron's recent changes, and using the O3CPU in SMT mode.
src/cpu/o3/lsq.hh: Update to have LSQ work with only one dcache port for all LSQ Units. LSQ has the dcache port, and the LSQ Units must tell the LSQ if the cache has become blocked. src/cpu/o3/lsq_impl.hh: Updates to have the LSQ work with only one dcache port for all LSQUnits. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Update for LSQ to create dcache port instead of LSQUnits. Now LSQUnits are given the dcache port from the LSQ, and also must check the LSQ if the cache is blocked prior to accessing the cache.
|
#
2871:7ed5c9ef3eb6 |
|
07-Jul-2006 |
Kevin Lim <ktlim@umich.edu> |
Support Ron's changes for hooking up ports.
src/cpu/checker/cpu.hh: Now that BaseCPU is a MemObject, the checker must define this function. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: Implement getPort function so the connector can connect the ports properly. src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: The connector handles connecting the ports now. src/python/m5/objects/O3CPU.py: Add ports to the parameters.
|
#
2733:e0eac8fc5774 |
|
16-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.
First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.
Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.
SConstruct: Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled. configs/test/test.py: Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick. src/base/traceflags.py: Add in O3CPU trace flag. src/cpu/SConscript: Rename AlphaFullCPU to AlphaO3CPU.
Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs are also being included. src/cpu/base_dyn_inst.cc: src/cpu/base_dyn_inst.hh: Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU. src/cpu/cpu_models.py: src/cpu/o3/alpha_cpu.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model. src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/alpha_impl.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/commit.hh: src/cpu/o3/cpu.hh: src/cpu/o3/decode.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/regfile.hh: src/cpu/o3/rename.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob.hh: src/cpu/o3/rob_impl.hh: src/cpu/o3/thread_state.hh: src/python/m5/objects/AlphaO3CPU.py: Rename FullCPU to O3CPU to differentiate from old FullCPU model. src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit_impl.hh: Rename FullCPU to O3CPU to differentiate from old FullCPU model. Also #ifdef the checker code so it doesn't need to be included if it's not selected.
|
#
2727:91e17c7ee622 |
|
13-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Minor updates for stats.
src/cpu/o3/commit_impl.hh: src/cpu/o3/fetch.hh: Update stats comments. src/cpu/o3/fetch_impl.hh: Differentiate stats. src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: Update for stats. src/cpu/o3/lsq.hh: LSQ now has stats. src/cpu/o3/lsq_impl.hh: Register stats of all LSQ units. src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: Add in stats.
|
#
2698:d5f35d41e017 |
|
09-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Removing of old code and adding in new comments.
src/cpu/base_dyn_inst.cc: Clean up old functions, comments. src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_params.hh: src/cpu/o3/cpu.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/ozone/lsq_unit.hh: src/cpu/ozone/lsq_unit_impl.hh: Remove old commented code. src/cpu/o3/fetch.hh: Remove old commented code, add in comments. src/cpu/o3/inst_queue_impl.hh: Move comment to better place. src/cpu/o3/lsq_unit.hh: Remove old commented code, add in new comments. src/cpu/o3/lsq_unit_impl.hh: Remove old commented code, rename variable.
|
#
2689:dbf969c18a65 |
|
07-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Update copyright.
|
#
2674:6d4afef73a20 |
|
04-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Merge ktlim@zamp:/z/ktlim2/clean/m5-o3 into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge
src/cpu/checker/o3_cpu_builder.cc: src/cpu/o3/alpha_cpu.hh: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst_impl.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/commit.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: Hand merge.
|
#
2669:f2b336e89d2a |
|
02-Jun-2006 |
Kevin Lim <ktlim@umich.edu> |
Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.
Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.
src/SConscript: Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future. src/base/traceflags.py: Include new traces flags, fix up merge mess up. src/cpu/SConscript: Include the base_dyn_inst.cc as one of othe sources. Don't compile the Ozone CPU for now. src/cpu/base.cc: Remove an extra } from the merge. src/cpu/base_dyn_inst.cc: Fixes to make compiling work. Don't instantiate the OzoneCPU for now. src/cpu/base_dyn_inst.hh: src/cpu/o3/2bit_local_pred.cc: src/cpu/o3/alpha_cpu_builder.cc: src/cpu/o3/alpha_cpu_impl.hh: src/cpu/o3/alpha_dyn_inst.hh: src/cpu/o3/alpha_params.hh: src/cpu/o3/bpred_unit.cc: src/cpu/o3/btb.hh: src/cpu/o3/commit.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/free_list.hh: src/cpu/o3/iew.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/regfile.hh: src/cpu/o3/sat_counter.hh: src/cpu/op_class.hh: src/cpu/ozone/cpu.hh: src/cpu/checker/cpu.cc: src/cpu/checker/cpu.hh: src/cpu/checker/exec_context.hh: src/cpu/checker/o3_cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/mem/request.hh: src/cpu/o3/fu_pool.hh: src/cpu/o3/lsq.hh: src/cpu/o3/lsq_unit.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/back_end.hh: src/cpu/ozone/dyn_inst.cc: src/cpu/ozone/dyn_inst.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/inorder_back_end.hh: src/cpu/ozone/lw_back_end.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/ozone_impl.hh: src/cpu/ozone/thread_state.hh: Fixes to get compiling to work. src/cpu/o3/alpha_cpu.hh: Fixes to get compiling to work. Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs. src/cpu/o3/alpha_dyn_inst_impl.hh: Fixes to get compiling to work. Pass in the packet to the completeAcc function. Fix up syscall function.
|