#
14297:b4519e586f5e |
|
10-Sep-2019 |
Jordi Vaquero <jordi.vaquero@metempsy.com> |
cpu, mem: Changing AtomicOpFunctor* for unique_ptr<AtomicOpFunctor>
This change is based on modify the way we move the AtomicOpFunctor* through gem5 in order to mantain proper ownership of the object and ensuring its destruction when it is no longer used.
Doing that we fix at the same time a memory leak in Request.hh where we were assigning a new AtomicOpFunctor* without destroying the previous one.
This change creates a new type AtomicOpFunctor_ptr as a std::unique_ptr<AtomicOpFunctor> and move its ownership as needed. Except for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
14105:969b4e972b07 |
|
27-Feb-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
cpu: Add first-/non-faulting load support to Minor and O3
Some architectures allow masking faults of memory load instructions in some specific circumstances (e.g. first-faulting and non-faulting loads in Arm SVE). This patch adds support for such loads in the Minor and O3 CPU models.
Change-Id: I264a81a078f049127779aa834e89f0e693ba0bea Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13954:2f400a5f2627 |
|
07-Jul-2017 |
Giacomo Gabrielli <giacomo.gabrielli@arm.com> |
cpu,mem: Add support for partial loads/stores and wide mem. accesses
This changeset adds support for partial (or masked) loads/stores, i.e. loads/stores that can disable accesses to individual bytes within the target address range. In addition, this changeset extends the code to crack memory accesses across most CPU models (TimingSimpleCPU still TBD), so that arbitrarily wide memory accesses are supported. These changes are required for supporting ISAs with wide vectors.
Additional authors: - Gabor Dozsa <gabor.dozsa@arm.com> - Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92 Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
13652:45d94ac03a27 |
|
22-Jan-2018 |
Tuan Ta <qtt2@cornell.edu> |
cpu: support atomic memory request type with AtomicOpFunctor
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU, MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory system.
Atomic memory instruction is treated as a special store instruction in all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any conflicting store request(s) currently in the store buffer to retire before the AMO request is sent to the cache. AMO requests are not buffered in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it is delivered to the cache only after all previous stores are issued to the cache. Data forwarding between between an outstanding AMO in the store buffer and a subsequent load is not allowed since the AMO request does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert enough memory fences as micro-ops around an atomic instruction to enforce a correct order of memory instructions with respect to its memory consistency model. Without extra memory fences, this implementation can allow AMOs and other memory instructions that do not conflict (i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within a cache line boundary since the cache for now is not able to execute an operation on two different cache lines in one single step. Therefore, ISAs like x86 that require multi-cache-line atomic instructions need to either use a pair of locking load and unlocking store or change the cache implementation to guarantee the atomicity of an atomic instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a Reviewed-on: https://gem5-review.googlesource.com/c/8188 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12749:223c83ed9979 |
|
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12748:ae5ce8e42de7 |
|
03-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Substitute pointer to Request with aliased RequestPtr
Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparing patch for when RequestPtr will be the typdefed to a smart pointer to Request rather then a raw pointer to Request.
Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
|
#
12127:4207df055b0d |
|
28-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu: Refactor some Event subclasses to lambdas
Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3923 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
11608:6319a1125f1c |
|
14-Aug-2016 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu, arch: fix the type used for the request flags
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
11567:560d7fbbddd1 |
|
21-Jul-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add SMT support to MinorCPU
This patch adds SMT support to the MinorCPU. Currently RoundRobin or Random thread scheduling are supported.
Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
|
#
11341:bda2c39fd9fd |
|
15-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Add missing overrides to appease clang
Since the last round of fixes a few new issues have snuck in. We should consider switching the regression runs to clang.
|
#
11331:cd5c48db28e6 |
|
10-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Deduce if cache should forward snoops
This patch changes how the cache determines if snoops should be forwarded from the memory side to the CPU side. Instead of having a parameter, the cache now looks at the port connected on the CPU side, and if it is a snooping port, then snoops are forwarded. Less error prone, and less parameters to worry about.
The patch also tidies up the CPU classes to ensure that their I-side port is not snooping by removing overrides to the snoop request handler, such that snoop requests will panic via the default MasterPort implement
|
#
10713:eddb533708cb |
|
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
|
#
10581:7c4f1d0a8cff |
|
02-Dec-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix retries on barrier/store in Minor's store buffer
This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is pre-maturely counted as having been issued, leading to the store buffer idling.
LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed.
For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To acheive this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared.
Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.
|
#
10379:c00f6d7e2681 |
|
19-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unecessary copies being created, contributing to the increment/decrement of the reference counters.
|
#
10368:a7cb233caa7b |
|
12-Sep-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: Fix memory access in Minor not setting parent Request flags
This patch fixes cases where uncacheable/memory type flags are not set correctly on a memory op which is split in the LSQ. Without this patch, request->request if freely used to check flags where the flags should actually come from the accumulation of request fragment flags.
This patch also fixes a bug where an uncacheable access which passes through tryToSendRequest more than once can increment LSQ::numAccessesInMemorySystem more than once.
|
#
10259:ebb376f73dd2 |
|
23-Jul-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark | Stat host_seconds (s) ---------------+--------v--------v-------- (on ARM, opt) | simple | o3 | minor | timing | timing | timing ---------------+--------+--------+-------- 10.linux-boot | 169 | 1883 | 1075 10.mcf | 117 | 967 | 491 20.parser | 668 | 6315 | 3146 30.eon | 542 | 3413 | 2414 40.perlbmk | 2339 | 20905 | 11532 50.vortex | 122 | 1094 | 588 60.bzip2 | 2045 | 18061 | 9662 70.twolf | 207 | 2736 | 1036
|