#
14038:8ba13d8b7810 |
|
01-May-2019 |
Matthew Poremba <matthew.poremba@amd.com> |
mem: Option to toggle DRAM low-power states
Adding an option to enable DRAM low-power states. The low power states can have a significant impact on application performance (sim_ticks) on the order of 2-3x, especially for compute-gpu apps. The options allows for it to easily be enabled/disabled to compare performance numbers. The option is disabled by default.
Change-Id: Ib9bddbb792a1a6a4afb5339003472ff8f00a5859 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18548 Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13892:0182a0601f66 |
|
22-Apr-2019 |
Gabe Black <gabeblack@google.com> |
mem: Minimize the use of MemObject.
MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13857:9255d7412a58 |
|
12-Feb-2019 |
Daniel R. Carvalho <odanrc@yahoo.com.br> |
mem: Make DRAMCtrl::decodeAddr const
DRAMCtrl's decodeAddr does not need to modify the packet it receives, nor should it modify the contents of the class, and therefore both the packet and the function are made const.
Change-Id: I577f48d9a43611ba54878a9a793cb7b4fbb326f4 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17540 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
13834:1a7c647cbeac |
|
04-Apr-2019 |
Jason Lowe-Power <jason@lowepower.com> |
mem: Reverse order of write/read mem queue check
For atomic RMW instructions that go directly to memory, we want to put them on the write queue instead of the read queue. Swap the if/else condition to accomplish this.
Note: This is ignoring the read latency of the RMW, but these instructions should usually be handled in caches anyway.
Change-Id: I62dbfff3a16ac470f1ebdb489abe878962b20bb6 Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17828 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
13784:1941dc118243 |
|
07-Mar-2019 |
Gabe Black <gabeblack@google.com> |
arch, cpu, dev, gpu, mem, sim, python: start using getPort.
Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary.
Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13564:9bbd53a77887 |
|
27-Nov-2018 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
mem: Determine if a packet queue forces ordering at construction
A packet queue is typically used to hold on to packets that are schedules to be sent in the future or when they need to queue behind younger packets that have been sent out yet. Due to memory order requirements, some MemObjects need to maintain the order for packet (mostly responses) that reference the same cache block.
Prior to this patch the ordering requirements where determined when the packet was scheduled to be sent. This patch moves the parameter to the constructor.
Change-Id: Ieb4d94e86bc7514f5036b313ec23ea47dd653164 Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15555 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12969:52de9d619ce6 |
|
04-Aug-2017 |
Matteo Andreozzi <Matteo.Andreozzi@arm.com> |
mem: Make DRAMCtrl a QoS-aware Memory Controller
This patch is turning DRAMCtrl a QoS-aware Memory Controller with "no policy" as a default policy.
Change-Id: I48163da8c8208498cf0398b07094cb840272507f Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/11973 Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12823:ba630bc7a36d |
|
19-Jul-2018 |
Robert Kovacsics <rmk35@cl.cam.ac.uk> |
mem: Rename Packet::checkFunctional to trySatisfyFunctional
Packet::checkFunctional also wrote data to/from the packet depending on if it was read/write, respectively, which the 'check' in the name would suggest otherwise. This renames it to doFunctional, which is more suggestive. It also renames any function called checkFunctional which calls Packet::checkFunctional. These are
- Bridge::BridgeMasterPort::checkFunctional - calls Packet::checkFunctional - MSHR::checkFunctional - calls Packet::checkFunctional - MSHR::TargetList::checkFunctional - calls Packet::checkFunctional - Queue<>::checkFunctional (of src/mem/cache/queue.hh, not src/cpu/minor/buffers.h) - Instantiated with Queue<WriteQueueEntry> and Queue<MSHR> - WriteQueueEntry - calls Packet::checkFunctional - WriteQueueEntry::TargetList - calls Packet::checkFunctional - MemDelay::checkFunctional - calls QueuedSlavePort/QueuedMasterPort::checkFunctional - Packet::checkFunctional - PacketQueue::checkFunctional - calls Packet::checkFunctional - QueuedSlavePort::checkFunctional - calls PacketQueue::doFunctional - QueuedMasterPort::checkFunctional - calls PacketQueue::doFunctional - SerialLink::SerialLinkMasterPort::checkFunctional - calls Packet::doFunctional
Change-Id: Ieca2579c020c329040da053ba8e25820801b62c5 Reviewed-on: https://gem5-review.googlesource.com/11810 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
12706:456304051464 |
|
28-Mar-2017 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Add support for more flexible DRAM timing and topologies
This patch has 2 main aspects: 1) Add new parameter to adjust write-to-write delay 2) Enable support of more than 64 banks per controller
Changes for new parameter: Incorporated a new parameter, tCCD_L_WR, which defaults to tCCD_L. This parameter can be used to set a unique delay between writes and between reads.
To incorporate this parameter in the controller, modified the DRAMCtrl class to have separate variables for read and write column delays. Used these variables to account for tRTW, tWTR, tBURST, tCCD_L, and tCS as well as the new tCCD_L_WR parameter.
Changes to support more than 64 banks: Modified the logic selecting the next command (reorderQueue and minBankPrep functions). Replaced the unint64_t variables with a vector of uint32_t elements. There is a uint32_t element defined per ranks to allow up to 32 banks per rank. This will automatically scale with ranks without issue. Change will allow analysis of memory sub-systems beyond the current landscape.
Change-Id: I0ce466efed58276f843ad90e9ecc0ece6c37d646 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10103 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12705:9668a82ead4b |
|
06-Apr-2017 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Optimize self-refresh entry
Self-refresh is entered during a refresh event, when the rank was previously in a precharge power-down state. The original code would enter self-refresh after a refresh was issued. The device subsequently will issue a refresh on self-refresh entry. On self-refresh exit, the controller will issue another refresh command.
Devices require at least one additional refresh to be issued between self-refresh exit and re-entry. This ensures that enough refreshes occur in the case when the device narrowly missed a refresh on self-refresh exit.
To minimize the number of refresh operations and still maintain the device requirement, the current logic does the following: 1) The controller will still enter self-refresh from a refresh event, when the previous state was precharge power-down. However, the refresh itself will be bypassed and the controller will immediately issue a self-refresh entry. 2) On a self-refresh exit, the controller will immediately issue a refresh command (per the original logic). This ensures the devices requirements are met and is a convenient way to kick off the command state machine.
Change-Id: I1c4b0dcbfa3bdafd755f3ccd65e267fcd700c491 Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10102 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12637:bfc3cb9c7e6c |
|
30-Mar-2018 |
Daniel R. Carvalho <odanrc@yahoo.com.br> |
mem: Remove unused 'using namespace'
Removal of unused/barely used 'using namespace' from C++ files.
Change-Id: I66dc548c04506db2e41180b9ea7ab5abd7d5375a Reviewed-on: https://gem5-review.googlesource.com/9601 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12266:63b8da9eeca4 |
|
19-Dec-2016 |
Radhika Jagtap <radhika.jagtap@arm.com> |
ext, mem: Pull DRAMPower SHA 90d6290 and rebase
This patch syncs the DRAMPower library of gem5 to the external github (https://github.com/ravenrd/DRAMPower).
The version pulled in is the commit: 90d6290f802c29b3de9e10233ceee22290907ce6 from 30th Oct. 2016.
This change also modifies the DRAM Ctrl interaction with the DRAMPower, due to changes in the lib API in the above version.
Previously multiple functions were called to prepare the power lib before calling the function that would calculate the enery. With the new API, these functions are encompassed inside the function to calculate the energy and therefore should now be removed from the DRAM controller.
The other key difference is the introduction of a new function called calcWindowEnergy which can be useful for any system that wants to do measurements over intervals. For gem5 DRAM ctrl that means we now need to accumulate the window energy measurements into the total stat.
Change-Id: I3570fff2805962e166ff2a1a3217ebf2d5a197fb Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5724 Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
12084:5a3769ff3d55 |
|
07-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
mem: Replace EventWrapper use with EventFunctionWrapper
NOTE: With this change there is a possibility for `DRAMCtrl::Rank`s event names to not properly match the rank they were generated by. This could occur if the public rank member is modified after the Rank's construction. A patch would mean refactoring Rank and `DRAMCtrl`b to privatize many of the members of Rank behind getters.
Change-Id: I7b8bd15086f4ffdfd3f40be4aeddac5e786fd78e Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3745 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12081:cb5fe81fd522 |
|
12-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
mem: Move the Rank construction logic to the Rank constructor
This change was made so Rank objects have their name assigned when they are instantiated. Therefore, they can initialize their member objects with their name and it is less likely to change during runtime.
(NOTE: I would recommend hiding the fields which would cause the name to change behind getters. Since modification of `Rank.rank` during runtime will cause the `name()` to change.)
Change-Id: Id51c3553b40e489792c57950e18b8ce927e43173 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3742 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
11846:b9436a4bbbb9 |
|
15-Feb-2017 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: fix assertion in respondEvent
Assertion in the respondEvent erroneously fired. The assertion verifies that the controller has not moved to a low-power state prior to receiving read data from the memory. The original assertion triggered if the state was not: PWR_IDLE or PWR_ACT.
In the case that failed, a periodic refresh event occurred around the read. The REF is stalled until the final read burst is issued and the subsequent PRE closes the bank. While the PRE will temporarily move the state to PWR_IDLE, state will immediately transition to PWR_REF due to the pending refresh operation. This state does not match the assertion, which is subsequently triggered.
Fixed the assertion by explicitly checking that the state is not a low power state !PWR_SREF && !PWR_PRE_PDN && !PWR_ACT_PDN
Change-Id: I82921a733bbeac2bcb5a487c2f981448d41ed50b Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
#
11793:ef606668d247 |
|
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes
|
#
11678:8c6991a00515 |
|
13-Oct-2016 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Add DRAM low-power functionality
Added power-down state transitions to the DRAM controller model.
Added per rank parameter, outstandingEvents, which tracks the number of outstanding command events and is used to determine when the controller should transition to a low power state. The controller will only transition when there are no outstanding events scheduled and the number of command entries for the given rank is 0.
The outstandingEvents parameter is incremented for every RD/WR burst, PRE, and REF event scheduled. ACT is implicitly covered by RD/WR since burst will always issue and complete after a required ACT. The parameter is decremented when the event is serviced (completed).
The controller will automatically transition to ACT power down, PRE power down, or SREF.
Transition to ACT power down state scheduled from: 1) The RespondEvent, where read data is received from the memory. ACT power-down entry will be scheduled when one or more banks is open, all commands for the rank have completed (no more commands scheduled), and there are no commands in queue for the rank
Transition to PRE power down scheduled from: 1) respondEvent, when all banks are closed, all commands have completed, and there are no commands in queue for the rank 2) prechargeEvent when all banks are closed, all commands have completed, and there are no commands in queue for the rank 3) refreshEvent, after the refresh is complete when the previous state was ACT power-down 4) refreshEvent, after the refresh is complete when the previous state was PRE power-down and there are commands in the queue.
Transition to SREF will be scheduled from: 1) refreshEvent, after the refresh is completes when the previous state was PRE power-down with no commands in queue
Power-down exit commands are scheduled from: 1) The refreshEvent, prior to issuing a refresh 2) doDRAMAccess, to wake-up the rank for RD/WR command issue.
Self-refresh exit commands are scheduled from: 1) The next request event, when the queue has commands for the rank in the readQueue or there are commands for the rank in the writeQueue and the bus state is WRITE.
Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
#
11677:beaf1afe2f83 |
|
13-Oct-2016 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Add callback to compute stats prior to dump event
The per rank statistics are periodically updated based on state transition and refresh events.
Add a method to update these when a dump event occurs to ensure they reflect accurate values. Specifically, need to ensure that the low-power state durations, power, and energy are logged correctly.
Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
#
11676:8a882e297eb2 |
|
13-Oct-2016 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Modify drain to ensure banks and power are idled
Add constraint that all ranks have to be in PWR_IDLE before signaling drain complete
This will ensure that the banks are all closed and the rank has exited any low-power states.
On suspend, update the power stats to sync the DRAM power logic
The logic maintains the location of the signalDrainDone method, which is still triggered from either: 1) Read response event 2) Next request event
This ensures that the drain will complete in the READ bus state and minimizes the changes required.
Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
#
11675:60d18201148d |
|
13-Oct-2016 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Sort memory commands and update DRAMPower
Add local variable to stores commands to be issued. These commands are in order within a single bank but will be out of order across banks & ranks.
A new procedure, flushCmdList, sorts commands across banks / ranks, and flushes the sorted list, up to curTick() to DRAMPower. This is currently called in refresh, once all previous commands are guaranteed to have completed. Could be called in other events like the powerEvent as well.
By only flushing commands up to curTick(), will not get out of sync when flushed at a periodic stats dump (done in subsequent patch).
Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
#
11673:9f3ccf96bb5a |
|
13-Oct-2016 |
Omar Naji <Omar.Naji@arm.com> |
mem: add DRAM powerdown timing
|
#
11334:9bd2e84abdca |
|
10-Feb-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Move the point of coherency to the coherent crossbar
This patch introduces the ability of making the coherent crossbar the point of coherency. If so, the crossbar does not forward packets where a cache with ownership has already committed to responding, and also does not forward any coherency-related packets that are not intended for a downstream memory controller. Thus, invalidations and upgrades are turned around in the crossbar, and the memory controller only sees normal reads and writes.
In addition this patch moves the express snoop promotion of a packet to the crossbar, thus allowing the downstream cache to check the express snoop flag (as it should) for bypassing any blocking, rather than relying on whether a cache is responding or not.
|
#
11321:02e930db812d |
|
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'.
|
#
11284:b3926db25371 |
|
31-Dec-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Make cache terminology easier to understand
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
|
#
11194:c3ba89c653a9 |
|
06-Nov-2015 |
Ali Jafri <ali.jafri@arm.com> |
mem: Enforce insertion order on the cache response path
This patch enforces insertion order transmission of packets on the response path in the cache. Note that the logic to enforce order is already present in the packet queue, this patch simply turns it on for queues in the response path.
Without this patch, there are corner cases where a request-response is faster than a response-response forwarded through the cache. This violation of queuing order causes problems in the snoop filter leaving it with inaccurate information. This causes assert failures in the snoop filter later on.
A follow on patch relaxes the order enforcement in the packet queue to limit the performance impact.
|
#
11192:4c28abcf8249 |
|
06-Nov-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Align rules for sinking inhibited packets at the slave
This patch aligns how the memory-system slaves, i.e. the various memory controllers and the bridge, identify and deal with sinking of inhibited packets that are only useful within the coherent part of the memory system.
In the future we could shift the onus to the crossbar, and add a parameter "is_point_of_coherence" that would allow it to sink the aforementioned packets.
|
#
11190:0964165d1857 |
|
06-Nov-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Unify delayed packet deletion
This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach.
|
#
11189:4237221d3e31 |
|
06-Nov-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Appease clang static analyzer
A few minor fixes to issues identified by the clang static analyzer.
|
#
10913:38dbdeea7f1f |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor and simplify the drain API
The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining.
This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error.
Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects.
|
#
10912:b99a6662d7c2 |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Decouple draining from the SimObject hierarchy
Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this.
This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object.
While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining.
A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains.
|
#
10910:32f3d1c454ec |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Make the drain state a global typed enum
The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario.
|
#
10890:bac38d2a4acb |
|
03-Jul-2015 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Update DRAM command scheduler for bank groups
This patch updates the command arbitration so that bank group timing as well as rank-to-rank delays will be taken into account. The resulting arbitration no longer selects commands (prepped or not) that cannot issue seamlessly if there are commands that can issue back-to-back, minimizing the effect of rank-to-rank (tCS) & same bank group (tCCD_L) delays.
The arbitration selects a new command based on the following priority. Within each priority band, the arbitration will use FCFS to select the appropriate command:
1) Bank is prepped and burst can issue seamlessly, without a bubble
2) Bank is not prepped, but can prep and issue seamlessly, without a bubble
3) Bank is prepped but burst cannot issue seamlessly. In this case, a bubble will occur on the bus
Thus, to enable more parallelism in subsequent selections, an unprepped packet is given higher priority if the bank prep can be hidden. If the bank prep cannot be hidden, the selection logic will choose a prepped packet that cannot issue seamlessly if one exist. Otherwise, the default selection will choose the packet with the minimum bank prep delay.
|
#
10889:c4c13fced000 |
|
03-Jul-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Avoid DRAM write queue iteration for merging and read lookup
This patch adds a simple lookup structure to avoid iterating over the write queue to find read matches, and for the merging of write bursts. Instead of relying on iteration we simply store a set of currently-buffered write-burst addresses and compare against these. For the reads we still perform the iteration if we have a match. For the writes, we rely entirely on the set. Note that there are corner-cases where sub-bursts would actually not be mergeable without a read-modify-write. We ignore these cases and opt for speed.
|
#
10883:9294c4a60251 |
|
03-Jul-2015 |
Ali Jafri <ali.jafri@arm.com> |
mem: Add clean evicts to improve snoop filter tracking
This patch adds eviction notices to the caches, to provide accurate tracking of cache blocks in snoop filters. We add the CleanEvict message to the memory heirarchy and use both CleanEvicts and Writebacks with BLOCK_CACHED flags to propagate notice of clean and dirty evictions respectively, down the memory hierarchy. Note that the BLOCK_CACHED flag indicates whether there exist any copies of the evicted block in the caches above the evicting cache.
The purpose of the CleanEvict message is to notify snoop filters of silent evictions in the relevant caches. The CleanEvict message behaves much like a Writeback. CleanEvict is a write and a request but unlike a Writeback, CleanEvict does not have data and does not need exclusive access to the block. The cache generates the CleanEvict message on a fill resulting in eviction of a clean block. Before travelling downwards CleanEvict requests generate zero-time snoop requests to check if the same block is cached in upper levels of the memory heirarchy. If the block exists, the cache discards the CleanEvict message. The snoops check the tags, writeback queue and the MSHRs of upper level caches in a manner similar to snoops generated from HardPFReqs. Currently CleanEvicts keep travelling towards main memory unless they encounter the block corresponding to their address or reach main memory (since we have no well defined point of serialisation). Main memory simply discards CleanEvict messages.
We have modified the behavior of Writebacks, such that they generate snoops to check for the presence of blocks in upper level caches. It is possible in our current implmentation for a lower level cache to be writing back a block while a shared copy of the same block exists in the upper level cache. If the snoops find the same block in upper level caches, we set the BLOCK_CACHED flag in the Writeback message.
We have also added logic to account for interaction of other message types with CleanEvicts waiting in the writeback queue. A simple example is of a response arriving at a cache removing any CleanEvicts to the same address from the cache's writeback queue.
|
#
10809:e3963342ead4 |
|
29-Apr-2015 |
Rizwana Begum <rb639@drexel.edu> |
mem: Simplify page close checks for adaptive policies
Both open_adaptive and close_adaptive page polices keep the page open if a row hit is found. If a row hit is not found, close_adaptive page policy precharges the row, and open_adaptive policy precharges the row only if there is a bank conflict request waiting in the queue.
This patch makes the checks for above conditions simpler.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
#
10721:3e6a3eaac71b |
|
02-Mar-2015 |
Marco Balboni <Marco.Balboni@ARM.com> |
mem: Downstream components consumes new crossbar delays
This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.
|
#
10713:eddb533708cb |
|
02-Mar-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
|
#
10694:1a6785e37d81 |
|
11-Feb-2015 |
Marco Balboni <Marco.Balboni@ARM.com> |
mem: Clarification of packet crossbar timings
This patch clarifies the packet timings annotated when going through a crossbar.
The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet.
The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet.
For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
|
#
10646:17d8d0a624a0 |
|
20-Jan-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Move DRAM interleaving check to init
This patch fixes a bug where the DRAM controller tried to access the system cacheline size before the system pointer was initialised. It also fixes a bug where the granularity is 0 (no interleaving).
|
#
10620:74834c49fbbe |
|
23-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
config: Expose the DRAM ranks as a command-line option
This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels).
The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding).
|
#
10619:6dd27a0e0d23 |
|
23-Dec-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Ensure DRAM controller is idle when in atomic mode
This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller forces the simulator to switch out of the KVM mode, thus killing performance.
The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions.
The switcheroo regression require a minor stats bump as a result.
|
#
10618:bb665366cc00 |
|
23-Dec-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: Add rank-wise refresh to the DRAM controller
This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank.
Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy.
The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank.
|
#
10617:471d390943f0 |
|
23-Dec-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: Fix a bug in the DRAM controller arbitration
Fix a minor issue that affects multi-rank systems.
|
#
10561:e1a853349529 |
|
02-Dec-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: Add a GDDR5 DRAM config
This patch adds a first cut GDDR5 config to accommodate the users combining gem5 and GPUSim. The config is based on a SK Hynix datasheet, and the Nvidia GTX580 specification. Someone from the GPUSim user-camp should tweak the default page-policy and static frontend and backend latencies.
|
#
10509:d5554f97c451 |
|
30-Oct-2014 |
Ali Saidi <Ali.Saidi@ARM.com> |
arm, mem: Fix drain bug and provide drain prints for more components.
|
#
10492:59f9f18aae0c |
|
20-Oct-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: Fix DRAM activationlLimit bug
Ensure that we do the proper event scheduling also when the activation limit is disabled.
|
#
10489:99d59caa4c8f |
|
20-Oct-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: Add DRAM device size and check against config
This patch adds the size of the DRAM device to the DRAM config. It also compares the actual DRAM size (calculated using information from the config) to the size defined in the system. If these two values do not match gem5 will print a warning. In order to do correct DRAM research the size of the memory defined in the system should match the size of the DRAM in the config. The timing and current parameters found in the DRAM configs are defined for a DRAM device with a specific size and would differ for another device with a different size.
|
#
10466:73b7549d979e |
|
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Dynamically determine page bytes in memory components
This patch takes a step towards an ISA-agnostic memory system by enabling the components to establish the page size after instantiation. The swap operation in the memory is now also allowing any granularity to avoid depending on the IntReg of the ISA.
|
#
10432:da98b90b5df0 |
|
29-Jul-2014 |
Omar Naji <Omar.Naji@arm.com> |
mem: DRAMPower integration for on-line DRAM power stats
This patch takes the final step in integrating DRAMPower and adds the appropriate calls in the DRAM controller to provide the command trace and extract the power and energy stats. The debug printouts are still left in place, but will eventually be removed.
At the moment the DRAM power calculation is always on when using the DRAM controller model. The run-time impact of this addition is around 1.5% when looking at the total host seconds of the regressions. We deem this a sensible trade-off to avoid the complication of adding an enable/disable mechanism.
|
#
10405:7a618c07e663 |
|
20-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Rename Bus to XBar to better reflect its behaviour
This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus.
As part of this renaming, the code has also been clean up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code is due to the delay variables in the packet.
|
#
10394:70cfafa17653 |
|
20-Sep-2014 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Add DDR4 bank group timing
Added the following parameter to the DRAMCtrl class: - bank_groups_per_rank
This defaults to 1. For the DDR4 case, the default is overridden to indicate bank group architecture, with multiple bank groups per rank.
Added the following delays to the DRAMCtrl class: - tCCD_L : CAS-to-CAS, same bank group delay - tRRD_L : RAS-to-RAS, same bank group delay
These parameters are only applied when bank group timing is enabled. Bank group timing is currently enabled only for DDR4 memories.
For all other memories, these delays will default to '0 ns'
In the DRAM controller model, applied the bank group timing to the per bank parameters actAllowedAt and colAllowedAt. The actAllowedAt will be updated based on bank group when an ACT is issued. The colAllowedAt will be updated based on bank group when a RD/WR burst is issued.
At the moment no modifications are made to the scheduling.
|
#
10393:0fafa62b6c01 |
|
20-Sep-2014 |
Wendy Elsasser <wendy.elsasser@arm.com> |
mem: Add memory rank-to-rank delay
Add the following delay to the DRAM controller: - tCS : Different rank bus turnaround delay
This will be applied for 1) read-to-read, 2) write-to-write, 3) write-to-read, and 4) read-to-write command sequences, where the new command accesses a different rank than the previous burst.
The delay defaults to 2*tCK for each defined memory class. Note that this does not correspond to one particular timing constraint, but is a way of modelling all the associated constraints.
The DRAM controller has some minor changes to prioritize commands to the same rank. This prioritization will only occur when the command stream is not switching from a read to write or vice versa (in the case of switching we have a gap in any case).
To prioritize commands to the same rank, the model will determine if there are any commands queued (same type) to the same rank as the previous command. This check will ensure that the 'same rank' command will be able to execute without adding bubbles to the command flow, e.g. any ACT delay requirements can be done under the hoods, allowing the burst to issue seamlessly.
|
#
10286:e95a0ab1d368 |
|
26-Aug-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Fix address interleaving bug in DRAM controller
This patch fixes a bug in the DRAM controller address decoding. In cases where the DRAM burst size (e.g. 32 bytes in a rank with a single LPDDR3 x32) was smaller than the channel interleaving size (e.g. systems with a 64-byte cache line) one address bit effectively got used as a channel bit when it should have been a low-order column bit.
This patch adds a notion of "columns per stripe", and more clearly deals with the low-order column bits and high-order column bits. The patch also relaxes the granularity check such that it is possible to use interleaving granularities other than the cache line size.
The patch also adds a missing M5_CLASS_VAR_USED to the tCK member as it is only used in the debug build for now.
|
#
10247:0ad233f0a77d |
|
30-Jun-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: DRAMPower trace output
This patch adds a DRAMPower flag to enable off-line DRAM power analysis using the DRAMPower tool. A new DRAMPower flag is added and a follow-on patch adds a Python script to post-process the output and order it based on time stamps.
The long-term goal is to link DRAMPower as a library and provide the commands through function calls to the model rather than first printing and then parsing the commands. At the moment it is also up to the user to ensure that the same DRAM configuration is used by the gem5 controller model and DRAMPower.
|
#
10246:e0e3efe3b1d5 |
|
30-Jun-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add bank and rank indices as fields to the DRAM bank
This patch adds the index of the bank and rank as a field so that we can determine the identity of a given bank (reference or pointer) for the power tracing. We also grab the opportunity of cleaning up the arguments used for identifying the bank when activating.
|
#
10245:70333502b9b5 |
|
30-Jun-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Extend DRAM row bits from 16 to 32 for larger densities
This patch extends the DRAM row bits to 32 to support larger density memories. Additional checks are also added to ensure the row fits in the 32 bits.
|
#
10216:52c869140fc2 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add DRAM cycle time
This patch extends the current timing parameters with the DRAM cycle time. This is needed as the DRAMPower tool expects timestamps in DRAM cycles. At the moment we could get away with doing this in a post-processing step as the DRAMPower execution is separate from the simulation run. However, in the long run we want the tool to be called during the simulation, and then the cycle time is needed.
|
#
10215:52d46098c1b6 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Simplify DRAM response scheduling
This patch simplifies the DRAM response scheduling based on the assumption that they are always returned in order.
|
#
10214:39eb5d4c400a |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add precharge all (PREA) to the DRAM controller
This patch adds the basic ingredients for a precharge all operation, to be used in conjunction with DRAM power modelling.
Currently we do not try and apply any cleverness when precharging all banks, thus even if only a single bank is open we use PREA as opposed to PRE. At the moment we only have a single tRP (tRPpb), and do not model the slightly longer all-bank precharge constraint (tRPab).
|
#
10213:2e630c6c2042 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Remove printing of DRAM params
This patch removes the redundant printing of DRAM params.
|
#
10212:acc1131e01d6 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add tRTP to the DRAM controller
This patch adds the tRTP timing constraint, governing the minimum time between a read command and a precharge. Default values are provided for the existing DRAM types.
|
#
10211:e084db2b1527 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Merge DRAM latency calculation and bank state update
This patch merges the two control paths used to estimate the latency and update the bank state. As a result of this merging the computation is now in one place only, and should be easier to follow as it is all done in absolute (rather than relative) time.
As part of this change, the scheduling is also refined to ensure that we look at a sensible estimate of the bank ready time in choosing the next request. The bank latency stat is removed as it ends up being misleading when the DRAM access code gets evaluated ahead of time (due to the eagerness of waking the model up for scheduling the next request).
|
#
10210:793e5ff26e0b |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add tWR to DRAM activate and precharge constraints
This patch adds the write recovery time to the DRAM timing constraints, and changes the current tRASDoneAt to a more generic preAllowedAt, capturing when a precharge is allowed to take place.
The part of the DRAM access code that accounts for the precharge and activate constraints is updated accordingly.
|
#
10209:ac71c857e1e1 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Merge DRAM page-management calculations
This patch treats the closed page policy as yet another case of auto-precharging, and thus merges the code with that used for the other policies.
|
#
10208:c249f7660eb7 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Add DRAM power states to the controller
This patch adds power states to the controller. These states and the transitions can be used together with the Micron power model. As a more elaborate use-case, the transitions can be used to drive the DRAMPower tool.
At the moment, the power-down modes are not used, and this patch simply serves to capture the idle, auto refresh and active modes. The patch adds a third state machine that interacts with the refresh state machine.
|
#
10207:3112b31596f0 |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Ensure DRAM refresh respects timings
This patch adds a state machine for the refresh scheduling to ensure that no accesses are allowed while the refresh is in progress, and that all banks are propely precharged.
As part of this change, the precharging of banks of broken out into a method of its own, making is similar to how activations are dealt with. The idle accounting is also updated to ensure that the refresh duration is not added to the time that the DRAM is in the idle state with all banks precharged.
|
#
10206:823f7fd1a82f |
|
09-May-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Make DRAM read/write switching less conservative
This patch changes the read/write event loop to use a single event (nextReqEvent), along with a state variable, thus joining the two control flows. This change makes it easier to follow the state transitions, and control what happens when.
With the new loop we modify the overly conservative switching times such that the write-to-read switch allows bank preparation to happen in parallel with the bus turn around. Similarly, the read-to-write switch uses the introduced tRTW constraint.
|
#
10147:3e51a30b8071 |
|
23-Mar-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Track DRAM read/write switching and add hysteresis
This patch adds stats for tracking the number of reads/writes per bus turn around, and also adds hysteresis to the write-to-read switching to ensure that the queue does not oscilate around the low threshold.
|
#
10146:27dfed4c8403 |
|
23-Mar-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
mem: Rename SimpleDRAM to a more suitable DRAMCtrl
This patch renames the not-so-simple SimpleDRAM to a more suitable DRAMCtrl. The name change is intended to ensure that we do not send the wrong message (although the "simple" in SimpleDRAM was originally intended as in cleverly simple, or elegant).
As the DRAM controller modelling work is being presented at ISPASS'14 our hope is that a broader audience will use the model in the future.
|