History log of /gem5/src/mem/dram_ctrl.cc
Revision Date Author Comments
# 14038:8ba13d8b7810 01-May-2019 Matthew Poremba <matthew.poremba@amd.com>

mem: Option to toggle DRAM low-power states

Adding an option to enable DRAM low-power states. The low power
states can have a significant impact on application performance
(sim_ticks) on the order of 2-3x, especially for compute-gpu apps.
The options allows for it to easily be enabled/disabled to compare
performance numbers. The option is disabled by default.

Change-Id: Ib9bddbb792a1a6a4afb5339003472ff8f00a5859
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18548
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13892:0182a0601f66 22-Apr-2019 Gabe Black <gabeblack@google.com>

mem: Minimize the use of MemObject.

MemObject doesn't provide anything beyond its base ClockedObject any
more, so this change removes it from most inheritance hierarchies.
Occasionally MemObject is replaced with SimObject when I was fairly
confident that the extra functionality of ClockedObject wasn't needed.

Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 13857:9255d7412a58 12-Feb-2019 Daniel R. Carvalho <odanrc@yahoo.com.br>

mem: Make DRAMCtrl::decodeAddr const

DRAMCtrl's decodeAddr does not need to modify the packet it
receives, nor should it modify the contents of the class,
and therefore both the packet and the function are made const.

Change-Id: I577f48d9a43611ba54878a9a793cb7b4fbb326f4
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17540
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 13834:1a7c647cbeac 04-Apr-2019 Jason Lowe-Power <jason@lowepower.com>

mem: Reverse order of write/read mem queue check

For atomic RMW instructions that go directly to memory, we want to put
them on the write queue instead of the read queue. Swap the if/else
condition to accomplish this.

Note: This is ignoring the read latency of the RMW, but these
instructions should usually be handled in caches anyway.

Change-Id: I62dbfff3a16ac470f1ebdb489abe878962b20bb6
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17828
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 13784:1941dc118243 07-Mar-2019 Gabe Black <gabeblack@google.com>

arch, cpu, dev, gpu, mem, sim, python: start using getPort.

Replace the getMasterPort, getSlavePort, and getEthPort functions
with getPort, and remove extraneous mechanisms that are no longer
necessary.

Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 13564:9bbd53a77887 27-Nov-2018 Nikos Nikoleris <nikos.nikoleris@arm.com>

mem: Determine if a packet queue forces ordering at construction

A packet queue is typically used to hold on to packets that are
schedules to be sent in the future or when they need to queue behind
younger packets that have been sent out yet. Due to memory order
requirements, some MemObjects need to maintain the order for packet
(mostly responses) that reference the same cache block.

Prior to this patch the ordering requirements where determined when
the packet was scheduled to be sent. This patch moves the parameter to
the constructor.

Change-Id: Ieb4d94e86bc7514f5036b313ec23ea47dd653164
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15555
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12969:52de9d619ce6 04-Aug-2017 Matteo Andreozzi <Matteo.Andreozzi@arm.com>

mem: Make DRAMCtrl a QoS-aware Memory Controller

This patch is turning DRAMCtrl a QoS-aware Memory Controller with "no
policy" as a default policy.

Change-Id: I48163da8c8208498cf0398b07094cb840272507f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/11973
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12823:ba630bc7a36d 19-Jul-2018 Robert Kovacsics <rmk35@cl.cam.ac.uk>

mem: Rename Packet::checkFunctional to trySatisfyFunctional

Packet::checkFunctional also wrote data to/from the packet depending
on if it was read/write, respectively, which the 'check' in the name
would suggest otherwise. This renames it to doFunctional, which is
more suggestive. It also renames any function called checkFunctional
which calls Packet::checkFunctional. These are

- Bridge::BridgeMasterPort::checkFunctional
- calls Packet::checkFunctional
- MSHR::checkFunctional
- calls Packet::checkFunctional
- MSHR::TargetList::checkFunctional
- calls Packet::checkFunctional
- Queue<>::checkFunctional
(of src/mem/cache/queue.hh, not src/cpu/minor/buffers.h)
- Instantiated with Queue<WriteQueueEntry> and Queue<MSHR>
- WriteQueueEntry
- calls Packet::checkFunctional
- WriteQueueEntry::TargetList
- calls Packet::checkFunctional
- MemDelay::checkFunctional
- calls QueuedSlavePort/QueuedMasterPort::checkFunctional
- Packet::checkFunctional
- PacketQueue::checkFunctional
- calls Packet::checkFunctional
- QueuedSlavePort::checkFunctional
- calls PacketQueue::doFunctional
- QueuedMasterPort::checkFunctional
- calls PacketQueue::doFunctional
- SerialLink::SerialLinkMasterPort::checkFunctional
- calls Packet::doFunctional

Change-Id: Ieca2579c020c329040da053ba8e25820801b62c5
Reviewed-on: https://gem5-review.googlesource.com/11810
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12706:456304051464 28-Mar-2017 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Add support for more flexible DRAM timing and topologies

This patch has 2 main aspects:
1) Add new parameter to adjust write-to-write delay
2) Enable support of more than 64 banks per controller

Changes for new parameter:
Incorporated a new parameter, tCCD_L_WR, which defaults to tCCD_L.
This parameter can be used to set a unique delay between writes and
between reads.

To incorporate this parameter in the controller, modified the DRAMCtrl
class to have separate variables for read and write column delays.
Used these variables to account for tRTW, tWTR, tBURST, tCCD_L, and tCS
as well as the new tCCD_L_WR parameter.

Changes to support more than 64 banks:
Modified the logic selecting the next command (reorderQueue
and minBankPrep functions). Replaced the unint64_t variables with
a vector of uint32_t elements. There is a uint32_t element defined
per ranks to allow up to 32 banks per rank. This will automatically
scale with ranks without issue.
Change will allow analysis of memory sub-systems beyond the current
landscape.

Change-Id: I0ce466efed58276f843ad90e9ecc0ece6c37d646
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10103
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12705:9668a82ead4b 06-Apr-2017 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Optimize self-refresh entry

Self-refresh is entered during a refresh event, when the
rank was previously in a precharge power-down state.
The original code would enter self-refresh after a refresh
was issued. The device subsequently will issue a refresh
on self-refresh entry. On self-refresh exit, the controller
will issue another refresh command.

Devices require at least one additional refresh to be issued
between self-refresh exit and re-entry. This ensures that enough
refreshes occur in the case when the device narrowly missed a
refresh on self-refresh exit.

To minimize the number of refresh operations and still maintain
the device requirement, the current logic does the following:
1) The controller will still enter self-refresh from a refresh
event, when the previous state was precharge power-down.
However, the refresh itself will be bypassed and the controller
will immediately issue a self-refresh entry.
2) On a self-refresh exit, the controller will immediately
issue a refresh command (per the original logic). This ensures
the devices requirements are met and is a convenient way to
kick off the command state machine.

Change-Id: I1c4b0dcbfa3bdafd755f3ccd65e267fcd700c491
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10102
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12637:bfc3cb9c7e6c 30-Mar-2018 Daniel R. Carvalho <odanrc@yahoo.com.br>

mem: Remove unused 'using namespace'

Removal of unused/barely used 'using namespace' from C++ files.

Change-Id: I66dc548c04506db2e41180b9ea7ab5abd7d5375a
Reviewed-on: https://gem5-review.googlesource.com/9601
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12266:63b8da9eeca4 19-Dec-2016 Radhika Jagtap <radhika.jagtap@arm.com>

ext, mem: Pull DRAMPower SHA 90d6290 and rebase

This patch syncs the DRAMPower library of gem5 to the
external github (https://github.com/ravenrd/DRAMPower).

The version pulled in is the commit:
90d6290f802c29b3de9e10233ceee22290907ce6
from 30th Oct. 2016.

This change also modifies the DRAM Ctrl interaction with the
DRAMPower, due to changes in the lib API in the above version.

Previously multiple functions were called to prepare the power
lib before calling the function that would calculate the enery. With
the new API, these functions are encompassed inside the function to
calculate the energy and therefore should now be removed from the
DRAM controller.

The other key difference is the introduction of a new function called
calcWindowEnergy which can be useful for any system that wants
to do measurements over intervals. For gem5 DRAM ctrl that means we
now need to accumulate the window energy measurements into the total
stat.

Change-Id: I3570fff2805962e166ff2a1a3217ebf2d5a197fb
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5724
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 12084:5a3769ff3d55 07-Jun-2017 Sean Wilson <spwilson2@wisc.edu>

mem: Replace EventWrapper use with EventFunctionWrapper

NOTE: With this change there is a possibility for `DRAMCtrl::Rank`s
event names to not properly match the rank they were generated by. This
could occur if the public rank member is modified after the Rank's
construction. A patch would mean refactoring Rank and `DRAMCtrl`b to
privatize many of the members of Rank behind getters.

Change-Id: I7b8bd15086f4ffdfd3f40be4aeddac5e786fd78e
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3745
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12081:cb5fe81fd522 12-Jun-2017 Sean Wilson <spwilson2@wisc.edu>

mem: Move the Rank construction logic to the Rank constructor

This change was made so Rank objects have their name assigned
when they are instantiated. Therefore, they can initialize their
member objects with their name and it is less likely to change during
runtime.

(NOTE: I would recommend hiding the fields which would cause the name to
change behind getters. Since modification of `Rank.rank` during runtime
will cause the `name()` to change.)

Change-Id: Id51c3553b40e489792c57950e18b8ce927e43173
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3742
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 11846:b9436a4bbbb9 15-Feb-2017 Wendy Elsasser <wendy.elsasser@arm.com>

mem: fix assertion in respondEvent

Assertion in the respondEvent erroneously fired.
The assertion verifies that the controller has not moved to a low-power
state prior to receiving read data from the memory.
The original assertion triggered if the state was not:
PWR_IDLE or PWR_ACT.

In the case that failed, a periodic refresh event occurred around the
read. The REF is stalled until the final read burst is issued
and the subsequent PRE closes the bank. While the PRE will temporarily
move the state to PWR_IDLE, state will immediately transition to PWR_REF
due to the pending refresh operation. This state does not match the
assertion, which is subsequently triggered.

Fixed the assertion by explicitly checking that the state is not a low
power state
!PWR_SREF && !PWR_PRE_PDN && !PWR_ACT_PDN


Change-Id: I82921a733bbeac2bcb5a487c2f981448d41ed50b
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>


# 11793:ef606668d247 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 1/22] use /r/3648/ to reorganize includes


# 11678:8c6991a00515 13-Oct-2016 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Add DRAM low-power functionality

Added power-down state transitions to the DRAM controller model.

Added per rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.

The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled. ACT is implicitly covered by RD/WR
since burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).

The controller will automatically transition to ACT power down,
PRE power down, or SREF.

Transition to ACT power down state scheduled from:
1) The RespondEvent, where read data is received from the memory.
ACT power-down entry will be scheduled when one or more banks is
open, all commands for the rank have completed (no more commands
scheduled), and there are no commands in queue for the rank

Transition to PRE power down scheduled from:
1) respondEvent, when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
state was PRE power-down and there are commands in the queue.

Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh is completes when the previous
state was PRE power-down with no commands in queue

Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.

Self-refresh exit commands are scheduled from:
1) The next request event, when the queue has commands for the rank
in the readQueue or there are commands for the rank in the
writeQueue and the bus state is WRITE.

Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>


# 11677:beaf1afe2f83 13-Oct-2016 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Add callback to compute stats prior to dump event

The per rank statistics are periodically updated based on
state transition and refresh events.

Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, need to ensure that the low-power state
durations, power, and energy are logged correctly.

Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>


# 11676:8a882e297eb2 13-Oct-2016 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Modify drain to ensure banks and power are idled

Add constraint that all ranks have to be in PWR_IDLE
before signaling drain complete

This will ensure that the banks are all closed and the rank
has exited any low-power states.

On suspend, update the power stats to sync the DRAM power logic

The logic maintains the location of the signalDrainDone
method, which is still triggered from either:
1) Read response event
2) Next request event

This ensures that the drain will complete in the READ bus
state and minimizes the changes required.

Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>


# 11675:60d18201148d 13-Oct-2016 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Sort memory commands and update DRAMPower

Add local variable to stores commands to be issued.
These commands are in order within a single bank but will be out
of order across banks & ranks.

A new procedure, flushCmdList, sorts commands across banks / ranks,
and flushes the sorted list, up to curTick() to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed. Could be called in other events like
the powerEvent as well.

By only flushing commands up to curTick(), will not get out of sync
when flushed at a periodic stats dump (done in subsequent patch).

Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>


# 11673:9f3ccf96bb5a 13-Oct-2016 Omar Naji <Omar.Naji@arm.com>

mem: add DRAM powerdown timing


# 11334:9bd2e84abdca 10-Feb-2016 Andreas Hansson <andreas.hansson@arm.com>

mem: Move the point of coherency to the coherent crossbar

This patch introduces the ability of making the coherent crossbar the
point of coherency. If so, the crossbar does not forward packets where
a cache with ownership has already committed to responding, and also
does not forward any coherency-related packets that are not intended
for a downstream memory controller. Thus, invalidations and upgrades
are turned around in the crossbar, and the memory controller only sees
normal reads and writes.

In addition this patch moves the express snoop promotion of a packet
to the crossbar, thus allowing the downstream cache to check the
express snoop flag (as it should) for bypassing any blocking, rather
than relying on whether a cache is responding or not.


# 11321:02e930db812d 06-Feb-2016 Steve Reinhardt <steve.reinhardt@amd.com>

style: fix missing spaces in control statements

Result of running 'hg m5style --skip-all --fix-control -a'.


# 11284:b3926db25371 31-Dec-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Make cache terminology easier to understand

This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.


# 11194:c3ba89c653a9 06-Nov-2015 Ali Jafri <ali.jafri@arm.com>

mem: Enforce insertion order on the cache response path

This patch enforces insertion order transmission of packets on the
response path in the cache. Note that the logic to enforce order is
already present in the packet queue, this patch simply turns it on for
queues in the response path.

Without this patch, there are corner cases where a request-response is
faster than a response-response forwarded through the cache. This
violation of queuing order causes problems in the snoop filter leaving
it with inaccurate information. This causes assert failures in the
snoop filter later on.

A follow on patch relaxes the order enforcement in the packet queue to
limit the performance impact.


# 11192:4c28abcf8249 06-Nov-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Align rules for sinking inhibited packets at the slave

This patch aligns how the memory-system slaves, i.e. the various
memory controllers and the bridge, identify and deal with sinking of
inhibited packets that are only useful within the coherent part of the
memory system.

In the future we could shift the onus to the crossbar, and add a
parameter "is_point_of_coherence" that would allow it to sink the
aforementioned packets.


# 11190:0964165d1857 06-Nov-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Unify delayed packet deletion

This patch unifies how we deal with delayed packet deletion, where the
receiving slave is responsible for deleting the packet, but the
sending agent (e.g. a cache) is still relying on the pointer until the
call to sendTimingReq completes. Previously we used a mix of a
deletion vector and a construct using unique_ptr. With this patch we
ensure all slaves use the latter approach.


# 11189:4237221d3e31 06-Nov-2015 Andreas Hansson <andreas.hansson@arm.com>

misc: Appease clang static analyzer

A few minor fixes to issues identified by the clang static analyzer.


# 10913:38dbdeea7f1f 07-Jul-2015 Andreas Sandberg <andreas.sandberg@arm.com>

sim: Refactor and simplify the drain API

The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.


# 10912:b99a6662d7c2 07-Jul-2015 Andreas Sandberg <andreas.sandberg@arm.com>

sim: Decouple draining from the SimObject hierarchy

Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.

This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.

While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.

A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.


# 10910:32f3d1c454ec 07-Jul-2015 Andreas Sandberg <andreas.sandberg@arm.com>

sim: Make the drain state a global typed enum

The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.


# 10890:bac38d2a4acb 03-Jul-2015 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Update DRAM command scheduler for bank groups

This patch updates the command arbitration so that bank group timing
as well as rank-to-rank delays will be taken into account. The
resulting arbitration no longer selects commands (prepped or not) that
cannot issue seamlessly if there are commands that can issue
back-to-back, minimizing the effect of rank-to-rank (tCS) & same bank
group (tCCD_L) delays.

The arbitration selects a new command based on the following priority.
Within each priority band, the arbitration will use FCFS to select the
appropriate command:

1) Bank is prepped and burst can issue seamlessly, without a bubble

2) Bank is not prepped, but can prep and issue seamlessly, without a
bubble

3) Bank is prepped but burst cannot issue seamlessly. In this case, a
bubble will occur on the bus

Thus, to enable more parallelism in subsequent selections, an
unprepped packet is given higher priority if the bank prep can be
hidden. If the bank prep cannot be hidden, the selection logic will
choose a prepped packet that cannot issue seamlessly if one exist.
Otherwise, the default selection will choose the packet with the
minimum bank prep delay.


# 10889:c4c13fced000 03-Jul-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Avoid DRAM write queue iteration for merging and read lookup

This patch adds a simple lookup structure to avoid iterating over the
write queue to find read matches, and for the merging of write
bursts. Instead of relying on iteration we simply store a set of
currently-buffered write-burst addresses and compare against
these. For the reads we still perform the iteration if we have a
match. For the writes, we rely entirely on the set. Note that there
are corner-cases where sub-bursts would actually not be mergeable
without a read-modify-write. We ignore these cases and opt for speed.


# 10883:9294c4a60251 03-Jul-2015 Ali Jafri <ali.jafri@arm.com>

mem: Add clean evicts to improve snoop filter tracking

This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory heirarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.

The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory heirarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.

We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implmentation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.

We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.


# 10809:e3963342ead4 29-Apr-2015 Rizwana Begum <rb639@drexel.edu>

mem: Simplify page close checks for adaptive policies

Both open_adaptive and close_adaptive page polices keep the page
open if a row hit is found. If a row hit is not found, close_adaptive
page policy precharges the row, and open_adaptive policy precharges
the row only if there is a bank conflict request waiting in the queue.

This patch makes the checks for above conditions simpler.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>


# 10721:3e6a3eaac71b 02-Mar-2015 Marco Balboni <Marco.Balboni@ARM.com>

mem: Downstream components consumes new crossbar delays

This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.


# 10713:eddb533708cb 02-Mar-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Split port retry for all different packet classes

This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.


# 10694:1a6785e37d81 11-Feb-2015 Marco Balboni <Marco.Balboni@ARM.com>

mem: Clarification of packet crossbar timings

This patch clarifies the packet timings annotated
when going through a crossbar.

The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.

The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.

For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.


# 10646:17d8d0a624a0 20-Jan-2015 Andreas Hansson <andreas.hansson@arm.com>

mem: Move DRAM interleaving check to init

This patch fixes a bug where the DRAM controller tried to access the
system cacheline size before the system pointer was initialised. It
also fixes a bug where the granularity is 0 (no interleaving).


# 10620:74834c49fbbe 23-Dec-2014 Andreas Hansson <andreas.hansson@arm.com>

config: Expose the DRAM ranks as a command-line option

This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).

The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).


# 10619:6dd27a0e0d23 23-Dec-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Ensure DRAM controller is idle when in atomic mode

This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller forces the simulator to switch
out of the KVM mode, thus killing performance.

The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.

The switcheroo regression require a minor stats bump as a result.


# 10618:bb665366cc00 23-Dec-2014 Omar Naji <Omar.Naji@arm.com>

mem: Add rank-wise refresh to the DRAM controller

This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.

Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.

The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.


# 10617:471d390943f0 23-Dec-2014 Omar Naji <Omar.Naji@arm.com>

mem: Fix a bug in the DRAM controller arbitration

Fix a minor issue that affects multi-rank systems.


# 10561:e1a853349529 02-Dec-2014 Omar Naji <Omar.Naji@arm.com>

mem: Add a GDDR5 DRAM config

This patch adds a first cut GDDR5 config to accommodate the users
combining gem5 and GPUSim. The config is based on a SK Hynix
datasheet, and the Nvidia GTX580 specification. Someone from the
GPUSim user-camp should tweak the default page-policy and static
frontend and backend latencies.


# 10509:d5554f97c451 30-Oct-2014 Ali Saidi <Ali.Saidi@ARM.com>

arm, mem: Fix drain bug and provide drain prints for more components.


# 10492:59f9f18aae0c 20-Oct-2014 Omar Naji <Omar.Naji@arm.com>

mem: Fix DRAM activationlLimit bug

Ensure that we do the proper event scheduling also when the activation
limit is disabled.


# 10489:99d59caa4c8f 20-Oct-2014 Omar Naji <Omar.Naji@arm.com>

mem: Add DRAM device size and check against config

This patch adds the size of the DRAM device to the DRAM config. It
also compares the actual DRAM size (calculated using information from
the config) to the size defined in the system. If these two values do
not match gem5 will print a warning. In order to do correct DRAM
research the size of the memory defined in the system should match the
size of the DRAM in the config. The timing and current parameters
found in the DRAM configs are defined for a DRAM device with a
specific size and would differ for another device with a different
size.


# 10466:73b7549d979e 16-Oct-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Dynamically determine page bytes in memory components

This patch takes a step towards an ISA-agnostic memory
system by enabling the components to establish the page size after
instantiation. The swap operation in the memory is now also allowing
any granularity to avoid depending on the IntReg of the ISA.


# 10432:da98b90b5df0 29-Jul-2014 Omar Naji <Omar.Naji@arm.com>

mem: DRAMPower integration for on-line DRAM power stats

This patch takes the final step in integrating DRAMPower and adds the
appropriate calls in the DRAM controller to provide the command trace
and extract the power and energy stats. The debug printouts are still
left in place, but will eventually be removed.

At the moment the DRAM power calculation is always on when using the
DRAM controller model. The run-time impact of this addition is around
1.5% when looking at the total host seconds of the regressions. We
deem this a sensible trade-off to avoid the complication of adding an
enable/disable mechanism.


# 10405:7a618c07e663 20-Sep-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Rename Bus to XBar to better reflect its behaviour

This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.

As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.


# 10394:70cfafa17653 20-Sep-2014 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Add DDR4 bank group timing

Added the following parameter to the DRAMCtrl class:
- bank_groups_per_rank

This defaults to 1. For the DDR4 case, the default is overridden to indicate
bank group architecture, with multiple bank groups per rank.

Added the following delays to the DRAMCtrl class:
- tCCD_L : CAS-to-CAS, same bank group delay
- tRRD_L : RAS-to-RAS, same bank group delay

These parameters are only applied when bank group timing is enabled. Bank
group timing is currently enabled only for DDR4 memories.

For all other memories, these delays will default to '0 ns'

In the DRAM controller model, applied the bank group timing to the per bank
parameters actAllowedAt and colAllowedAt.
The actAllowedAt will be updated based on bank group when an ACT is issued.
The colAllowedAt will be updated based on bank group when a RD/WR burst is
issued.

At the moment no modifications are made to the scheduling.


# 10393:0fafa62b6c01 20-Sep-2014 Wendy Elsasser <wendy.elsasser@arm.com>

mem: Add memory rank-to-rank delay

Add the following delay to the DRAM controller:
- tCS : Different rank bus turnaround delay

This will be applied for
1) read-to-read,
2) write-to-write,
3) write-to-read, and
4) read-to-write
command sequences, where the new command accesses a different rank
than the previous burst.

The delay defaults to 2*tCK for each defined memory class. Note that
this does not correspond to one particular timing constraint, but is a
way of modelling all the associated constraints.

The DRAM controller has some minor changes to prioritize commands to
the same rank. This prioritization will only occur when the command
stream is not switching from a read to write or vice versa (in the
case of switching we have a gap in any case).

To prioritize commands to the same rank, the model will determine if there are
any commands queued (same type) to the same rank as the previous command.
This check will ensure that the 'same rank' command will be able to execute
without adding bubbles to the command flow, e.g. any ACT delay requirements
can be done under the hoods, allowing the burst to issue seamlessly.


# 10286:e95a0ab1d368 26-Aug-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Fix address interleaving bug in DRAM controller

This patch fixes a bug in the DRAM controller address decoding. In
cases where the DRAM burst size (e.g. 32 bytes in a rank with a single
LPDDR3 x32) was smaller than the channel interleaving size
(e.g. systems with a 64-byte cache line) one address bit effectively
got used as a channel bit when it should have been a low-order column
bit.

This patch adds a notion of "columns per stripe", and more clearly
deals with the low-order column bits and high-order column bits. The
patch also relaxes the granularity check such that it is possible to
use interleaving granularities other than the cache line size.

The patch also adds a missing M5_CLASS_VAR_USED to the tCK member as
it is only used in the debug build for now.


# 10247:0ad233f0a77d 30-Jun-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: DRAMPower trace output

This patch adds a DRAMPower flag to enable off-line DRAM power
analysis using the DRAMPower tool. A new DRAMPower flag is added
and a follow-on patch adds a Python script to post-process the output
and order it based on time stamps.

The long-term goal is to link DRAMPower as a library and provide the
commands through function calls to the model rather than first
printing and then parsing the commands. At the moment it is also up to
the user to ensure that the same DRAM configuration is used by the
gem5 controller model and DRAMPower.


# 10246:e0e3efe3b1d5 30-Jun-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add bank and rank indices as fields to the DRAM bank

This patch adds the index of the bank and rank as a field so that we can
determine the identity of a given bank (reference or pointer) for the
power tracing. We also grab the opportunity of cleaning up the
arguments used for identifying the bank when activating.


# 10245:70333502b9b5 30-Jun-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Extend DRAM row bits from 16 to 32 for larger densities

This patch extends the DRAM row bits to 32 to support larger density
memories. Additional checks are also added to ensure the row fits in
the 32 bits.


# 10216:52c869140fc2 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add DRAM cycle time

This patch extends the current timing parameters with the DRAM cycle
time. This is needed as the DRAMPower tool expects timestamps in DRAM
cycles. At the moment we could get away with doing this in a
post-processing step as the DRAMPower execution is separate from the
simulation run. However, in the long run we want the tool to be called
during the simulation, and then the cycle time is needed.


# 10215:52d46098c1b6 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Simplify DRAM response scheduling

This patch simplifies the DRAM response scheduling based on the
assumption that they are always returned in order.


# 10214:39eb5d4c400a 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add precharge all (PREA) to the DRAM controller

This patch adds the basic ingredients for a precharge all operation,
to be used in conjunction with DRAM power modelling.

Currently we do not try and apply any cleverness when precharging all
banks, thus even if only a single bank is open we use PREA as opposed
to PRE. At the moment we only have a single tRP (tRPpb), and do not
model the slightly longer all-bank precharge constraint (tRPab).


# 10213:2e630c6c2042 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Remove printing of DRAM params

This patch removes the redundant printing of DRAM params.


# 10212:acc1131e01d6 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add tRTP to the DRAM controller

This patch adds the tRTP timing constraint, governing the minimum time
between a read command and a precharge. Default values are provided
for the existing DRAM types.


# 10211:e084db2b1527 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Merge DRAM latency calculation and bank state update

This patch merges the two control paths used to estimate the latency
and update the bank state. As a result of this merging the computation
is now in one place only, and should be easier to follow as it is all
done in absolute (rather than relative) time.

As part of this change, the scheduling is also refined to ensure that
we look at a sensible estimate of the bank ready time in choosing the
next request. The bank latency stat is removed as it ends up being
misleading when the DRAM access code gets evaluated ahead of time (due
to the eagerness of waking the model up for scheduling the next
request).


# 10210:793e5ff26e0b 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add tWR to DRAM activate and precharge constraints

This patch adds the write recovery time to the DRAM timing
constraints, and changes the current tRASDoneAt to a more generic
preAllowedAt, capturing when a precharge is allowed to take place.

The part of the DRAM access code that accounts for the precharge and
activate constraints is updated accordingly.


# 10209:ac71c857e1e1 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Merge DRAM page-management calculations

This patch treats the closed page policy as yet another case of
auto-precharging, and thus merges the code with that used for the
other policies.


# 10208:c249f7660eb7 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Add DRAM power states to the controller

This patch adds power states to the controller. These states and the
transitions can be used together with the Micron power model. As a
more elaborate use-case, the transitions can be used to drive the
DRAMPower tool.

At the moment, the power-down modes are not used, and this patch
simply serves to capture the idle, auto refresh and active modes. The
patch adds a third state machine that interacts with the refresh state
machine.


# 10207:3112b31596f0 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Ensure DRAM refresh respects timings

This patch adds a state machine for the refresh scheduling to
ensure that no accesses are allowed while the refresh is in progress,
and that all banks are propely precharged.

As part of this change, the precharging of banks of broken out into a
method of its own, making is similar to how activations are dealt
with. The idle accounting is also updated to ensure that the refresh
duration is not added to the time that the DRAM is in the idle state
with all banks precharged.


# 10206:823f7fd1a82f 09-May-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Make DRAM read/write switching less conservative

This patch changes the read/write event loop to use a single event
(nextReqEvent), along with a state variable, thus joining the two
control flows. This change makes it easier to follow the state
transitions, and control what happens when.

With the new loop we modify the overly conservative switching times
such that the write-to-read switch allows bank preparation to happen
in parallel with the bus turn around. Similarly, the read-to-write
switch uses the introduced tRTW constraint.


# 10147:3e51a30b8071 23-Mar-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Track DRAM read/write switching and add hysteresis

This patch adds stats for tracking the number of reads/writes per bus
turn around, and also adds hysteresis to the write-to-read switching
to ensure that the queue does not oscilate around the low threshold.


# 10146:27dfed4c8403 23-Mar-2014 Andreas Hansson <andreas.hansson@arm.com>

mem: Rename SimpleDRAM to a more suitable DRAMCtrl

This patch renames the not-so-simple SimpleDRAM to a more suitable
DRAMCtrl. The name change is intended to ensure that we do not send
the wrong message (although the "simple" in SimpleDRAM was originally
intended as in cleverly simple, or elegant).

As the DRAM controller modelling work is being presented at ISPASS'14
our hope is that a broader audience will use the model in the future.