History log of /gem5/src/cpu/minor/lsq.cc
Revision Date Author Comments
# 14297:b4519e586f5e 10-Sep-2019 Jordi Vaquero <jordi.vaquero@metempsy.com>

cpu, mem: Changing AtomicOpFunctor* for unique_ptr<AtomicOpFunctor>

This change is based on modify the way we move the AtomicOpFunctor*
through gem5 in order to mantain proper ownership of the object and
ensuring its destruction when it is no longer used.

Doing that we fix at the same time a memory leak in Request.hh
where we were assigning a new AtomicOpFunctor* without destroying the
previous one.

This change creates a new type AtomicOpFunctor_ptr as a
std::unique_ptr<AtomicOpFunctor> and move its ownership as needed. Except
for its only usage when AtomicOpFunc() is called.

Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 14105:969b4e972b07 27-Feb-2019 Gabor Dozsa <gabor.dozsa@arm.com>

cpu: Add first-/non-faulting load support to Minor and O3

Some architectures allow masking faults of memory load instructions in
some specific circumstances (e.g. first-faulting and non-faulting
loads in Arm SVE). This patch adds support for such loads in the Minor
and O3 CPU models.

Change-Id: I264a81a078f049127779aa834e89f0e693ba0bea
Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19178
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13954:2f400a5f2627 07-Jul-2017 Giacomo Gabrielli <giacomo.gabrielli@arm.com>

cpu,mem: Add support for partial loads/stores and wide mem. accesses

This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.

Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>

Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 13652:45d94ac03a27 22-Jan-2018 Tuan Ta <qtt2@cornell.edu>

cpu: support atomic memory request type with AtomicOpFunctor

This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.

Atomic memory instruction is treated as a special store instruction in
all CPU models.

In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.

In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.

In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.

This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.

This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.

Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13449:2f7efa89c58b 26-Nov-2018 Gabe Black <gabeblack@google.com>

arch, base, cpu, gpu, mem: Replace assert(0 or false with panic.

Neither assert(0) nor assert(false) give any hint as to why control
getting to them is bad, and their more descriptive versions,
assert(0 && "description") and assert(false && "description"), jury
rig assert to add an error message when the utility function panic()
already does that directly with better formatting options.

This change replaces that flavor of call to assert with panic, except
in the actual code which processes the formatting that panic uses (to
avoid infinitely recurring error handling), and in some *.sm files
since I don't know what rules those have to follow and don't want to
accidentaly break them.

Change-Id: I8addfbfaf77eaed94ec8191f2ae4efb477cefdd0
Reviewed-on: https://gem5-review.googlesource.com/c/14636
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12749:223c83ed9979 04-Jun-2018 Giacomo Travaglini <giacomo.travaglini@arm.com>

misc: Using smart pointers for memory Requests

This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.

Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12748:ae5ce8e42de7 03-Jun-2018 Giacomo Travaglini <giacomo.travaglini@arm.com>

misc: Substitute pointer to Request with aliased RequestPtr

Every usage of Request* in the code has been replaced with the
RequestPtr alias. This is a preparing patch for when RequestPtr will be
the typdefed to a smart pointer to Request rather then a raw pointer to
Request.

Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10995
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>


# 12355:568ec3a0c614 07-Feb-2017 Nikos Nikoleris <nikos.nikoleris@arm.com>

cpu: Add support for CMOs in the cpu models

Cache maintenance operations go through the write channel of the
cpu. This changes makes sure that the cpu does not try to fill in the
packet with data.

Change-Id: Ic83205bb1cda7967636d88f15adcb475eb38d158
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5055
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>


# 12179:432a44667130 01-Sep-2017 Pau Cabre <pau.cabre@metempsy.com>

cpu-minor: Fix for addr range coverage calculation

Coverage was wrongly set to PartialAddrRangeCoverage in the case of
disjoint adjacent ranges

Change-Id: I29aaf5145e6cdcf5f0b8f4e009d57ee57bd4c944
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/4640
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 12127:4207df055b0d 28-Jun-2017 Sean Wilson <spwilson2@wisc.edu>

cpu: Refactor some Event subclasses to lambdas

Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3923
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 11793:ef606668d247 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 1/22] use /r/3648/ to reorganize includes


# 11608:6319a1125f1c 14-Aug-2016 Nikos Nikoleris <nikos.nikoleris@arm.com>

cpu, arch: fix the type used for the request flags

Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>


# 11567:560d7fbbddd1 21-Jul-2016 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Add SMT support to MinorCPU

This patch adds SMT support to the MinorCPU. Currently
RoundRobin or Random thread scheduling are supported.

Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3


# 11435:0f1b46dde3fa 07-Apr-2016 Mitch Hayenga <mitch.hayenga@arm.com>

mem: Remove threadId from memory request class

In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.

This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.


# 11429:cf5af0cc3be4 06-Apr-2016 Andreas Sandberg <andreas.sandberg@arm.com>

Revert power patch sets with unexpected interactions

The following patches had unexpected interactions with the current
upstream code and have been reverted for now:

e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject

Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>


# 11428:20264eb69fbf 05-Apr-2016 Mitch Hayenga <mitch.hayenga@arm.com>

mem: Remove threadId from memory request class

In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.


# 11356:a80884911971 19-Jul-2015 Krishnendra Nathella <krinat01@arm.com>

cpu: Fix LLSC atomic CPU wakeup

Writes to locked memory addresses (LLSC) did not wake up the locking
CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU,
recvAtomicSnoop was checking if the incoming packet was an invalidation
(isInvalidate) and only then handled a locked snoop. But, writes are
seen instead of invalidates when running without caches (fast-forward
configurations). As as simple fix, now handleLockedSnoop is also called
even if the incoming snoop packet are from writes.


# 11148:1bc3d93c7eaa 30-Sep-2015 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Add per-thread monitors

Adds per-thread address monitors to support FullSystem SMT.


# 11056:842f56345a42 21-Aug-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Reflect that packet address and size are always valid

This patch simplifies the packet, and removes the possibility of
creating a packet without a valid address and/or size. Under no
circumstances are these fields set at a later point, and thus they
really have to be provided at construction time.

The patch also fixes a case there the MinorCPU creates a packet
without a valid address and size, only to later delete it.


# 10824:308771bd2647 05-May-2015 Andreas Sandberg <Andreas.Sandberg@ARM.com>

mem, cpu: Add a separate flag for strictly ordered memory

The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

* Speculation: CPU models are not allowed to issue speculative
loads.

* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.


# 10739:4cfe55719da5 11-Feb-2015 Steve Reinhardt <steve.reinhardt@amd.com>

mem: restructure Packet cmd initialization a bit more

Refactor the way that specific MemCmd values are generated for packets.
The new approach is a little more elegant in that we assign the right
value up front, and it's also more amenable to non-heap-allocated
Packet objects.

Also replaced the code in the Minor model that was still doing it the
ad-hoc way.

This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.


# 10713:eddb533708cb 02-Mar-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Split port retry for all different packet classes

This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.


# 10665:aef704eaedd2 25-Jan-2015 Ali Saidi <Ali.Saidi@ARM.com>

sim: Clean up InstRecord

Track memory size and flags as well as add some comments and consts.


# 10647:899c0e7e85f1 20-Jan-2015 Andreas Hansson <andreas.hansson@arm.com>

cpu: Fix retry bug in MinorCPU LSQ


# 10634:c5a2c5ef6e68 03-Jan-2015 Andrew Lukefahr <lukefahr@umich.edu>

minor: fixed LSQ MasterPortID

Minor was reporting the data cache access as ".inst" accesses.
This just switches the MasterPortID to dataMasterPortId.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>


# 10581:7c4f1d0a8cff 02-Dec-2014 Andrew Bardsley <Andrew.Bardsley@arm.com>

cpu: Fix retries on barrier/store in Minor's store buffer

This patch fixes a case where a store in Minor's store buffer never
leaves the store buffer as it is pre-maturely counted as having been
issued, leading to the store buffer idling.

LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses
either in memory, or still in the store buffer after being completed.

For stores which are also barriers, the store will stay in the store
buffer for a cycle after it is completed and will be cleaned up by the
barrier clearing code (to ensure that barriers are completed in-order).
To acheive this, numUnissuedAccesses is not decremented when a store-barrier
is issued to memory, but when its barrier effect is cleared.

Without this patch, the correct behaviour happens when a memory transaction
is immediately accepted, but not if it needs a retry.


# 10566:c99c8d2a7c31 02-Dec-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Assume all dynamic packet data is array allocated

This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.

The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.

As the last part the patch, it renames dataDynamicArray to dataDynamic.


# 10563:755b18321206 02-Dec-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add const getters for write packet data

This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.


# 10504:58d5d471b598 30-Oct-2014 Andrew Bardsley <Andrew.Bardsley@arm.com>

cpu: Fix barrier push to store buffer when full bug in Minor

This patch fixes a bug where a completing load or store which is also a
barrier can push a barrier into the store buffer without first checking
that there is a free slot.

The bug was not fatal but would print a warning that the store buffer
was full when inserting.


# 10379:c00f6d7e2681 19-Sep-2014 Andreas Hansson <andreas.hansson@arm.com>

arch: Pass faults by const reference where possible

This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.


# 10368:a7cb233caa7b 12-Sep-2014 Andrew Bardsley <Andrew.Bardsley@arm.com>

cpu: Fix memory access in Minor not setting parent Request flags

This patch fixes cases where uncacheable/memory type flags are not set
correctly on a memory op which is split in the LSQ. Without this
patch, request->request if freely used to check flags where the flags
should actually come from the accumulation of request fragment flags.

This patch also fixes a bug where an uncacheable access which passes
through tryToSendRequest more than once can increment
LSQ::numAccessesInMemorySystem more than once.


# 10259:ebb376f73dd2 23-Jul-2014 Andrew Bardsley <Andrew.Bardsley@arm.com>

cpu: `Minor' in-order CPU model

This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036