Searched hist:2015 (Results 1026 - 1050 of 1505) sorted by relevance
/gem5/src/mem/ | ||
H A D | CommMonitor.py | 10996:d48fda705f4d Tue Aug 04 05:29:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> mem: Move trace functionality from the CommMonitor to a probe This changeset moves the access trace functionality from the CommMonitor into a separate probe. The probe can be hooked up to any component that exports probe points of the type ProbePoints::Packet. This patch moves the dependency on Google's Protocol Buffers library from the CommMonitor to the MemTraceProbe, which means that the CommMonitor (including stack distance profiling) no long depends on it. 10995:a114e2712642 Tue Aug 04 05:29:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> mem: Redesign the stack distance calculator as a probe This changeset removes the stack distance calculator hooks from the CommMonitor class and implements a stack distance calculator as a memory system probe instead. The probe can be hooked up to any component that exports probe points of the type ProbePoints::Packet. |
H A D | mem_checker_monitor.cc | 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks. |
H A D | external_slave.cc | 10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks. 10694:1a6785e37d81 Wed Feb 11 10:23:00 EST 2015 Marco Balboni <Marco.Balboni@ARM.com> mem: Clarification of packet crossbar timings This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay. |
H A D | serial_link.cc | 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 11185:0ff78be3bc67 Tue Nov 03 01:17:00 EST 2015 Erfan Azarkhish <erfan.azarkhish@unibo.it> mem: hmc: serial link model This changeset adds a serial link model for the Hybrid Memory Cube (HMC). SerialLink is a simple variation of the Bridge class, with the ability to account for the latency of packet serialization. Also trySendTiming has been modified to correctly model bandwidth. Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
H A D | DRAMCtrl.py | 11186:2d1d51615e0e Tue Nov 03 01:17:00 EST 2015 Erfan Azarkhish <erfan.azarkhish@unibo.it> mem: hmc: minor fixes This patch performs two minor fixes to DRAMCtrl.py and xbar.hh in favor of the HMC patch series. Committed by: Nilay Vaish <nilay@cs.wisc.edu> 11120:eef83ecab5bf Tue Sep 22 14:17:00 EDT 2015 Wendy Elsasser <wendy.elsasser@arm.com> mem: Add initial HBM configurations Created the following HBM configurations: 1) HBM gen1 (x128/CH), 2Gb die, 4H stack, 1Gbps, 8 channels 2) HBM gen2 (x64/PC), 8Gb die, 4H stack, 1Gbps, 16 pseudo-channels The configuration values are based on: - The HBM gen1 public JEDEC spec - Publically released data from MemCon presentations - Timing extrapolated from existing LPDDR configurations Will adjust once specs become available. 10891:d958fc5f4a00 Fri Jul 03 10:14:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Increase the default buffer sizes for the DDR4 controller This patch increases the default read/write buffer sizes for the DDR4 controller config to values that are more suitable for the high bandwidth and high bank count. 10864:83cec4049505 Sun Jun 07 15:02:00 EDT 2015 Matthias Jung <jungma@eit.uni-kl.de> mem: Add HMC Timing Parameters A single HMC-2500 x32 model based on: [1] DRAMSpec: a high-level DRAM bank modelling tool developed at the University of Kaiserslautern. This high level tool uses RC (resistance-capacitance) and CV (capacitance-voltage) models to estimate the DRAM bank latency and power numbers. [2] A Logic-base Interconnect for Supporting Near Memory Computation in the Hybrid Memory Cube (E. Azarkhish et. al) Assumed for the HMC model is a 30 nm technology node. The modelled HMC consists of a 4 Gbit part with 4 layers connected with TSVs. Each layer has 16 vaults and each vault consists of 2 banks per layer. In order to be able to use the same controller used for 2D DRAM generations for HMC, the following analogy is done: Channel (DDR) => Vault (HMC) device_size (DDR) => size of a single layer in a vault ranks per channel (DDR) => number of layers banks per rank (DDR) => banks per layer devices per rank (DDR) => devices per layer ( 1 for HMC). The parameters for which no input is available are inherited from the DDR3 configuration. 10675:bb7cd7193edc Tue Feb 03 14:25:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> config: Adjust DRAM channel interleaving defaults This patch changes the DRAM channel interleaving default behaviour to be more representative. The default address mapping (RoRaBaCoCh) moves the channel bits towards the least significant bits, and uses 128 byte as the default channel interleaving granularity. These defaults can be overridden if desired, but should serve as a sensible starting point for most use-cases. |
/gem5/src/mem/cache/ | ||
H A D | mshr.hh | 11357:6668387fa488 Mon Aug 10 06:25:00 EDT 2015 Stephan Diestelhorst <stephan.diestelhorst@arm.com> mem, cpu: Add assertions to snoop invalidation logic This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades(). 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 11278:18411ccc4f3c Mon Dec 28 11:14:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Remove unused cache squash functionality This patch removes the unused squash function from the MSHR queue, and the associated (and also unused) threadNum member from the MSHR. 11197:f8fdd931e674 Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add cache clusivity This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviuor on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly). 11177:524c44cf8278 Thu Oct 29 08:48:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Clarify cache MSHR handling on fill This patch addresses the upgrading of deferred targets in the MSHR, and makes it clearer by explicitly calling out what is happening (deferred targets are promoted if we get exclusivity without asking for it). 10766:b2071d0eb5f1 Fri Mar 27 04:55:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Modernise MSHR iterators to C++11 This patch updates the iterators in the MSHR and MSHR queues to use C++11 range-based for loops. It also does a bit of additional house keeping. 10764:b32578b2af99 Fri Mar 27 04:55:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Align all MSHR entries to block boundaries This patch aligns all MSHR queue entries to block boundaries to simplify checks for matches. Previously there were corner cases that could lead to existing entries not being identified as matches. There are, rather alarmingly, a few regressions that change with this patch. 10679:204a0f53035e Tue Feb 03 14:25:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Clarify cache behaviour for pending dirty responses This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made. |
H A D | cache.hh | 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 11276:3561d002d8c7 Mon Dec 28 11:14:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not use sender state to track forwarded snoops in cache This patch changes how the cache tracks which snoops are forwarded, and which ones are created locally. Previously the identification was based on an empty sender state of a specific class, but this method fails to distinguish which cache actually attached the sender state. Instead we use the same mechanism as the crossbar, and keep track of the requests that have outstanding snoops. 11211:4e70e13c1a2c Sun Nov 15 16:28:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> arm: Add missing explicit overrides for classic caches Make clang when compiling on OSX. 11199:929fd978ab4e Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add an option to perform clean writebacks from caches This patch adds the necessary commands and cache functionality to allow clean writebacks. This functionality is crucial, especially when having exclusive (victim) caches. For example, if read-only L1 instruction caches are not sending clean writebacks, there will never be any spills from the L1 to the L2. At the moment the cache model defaults to not sending clean writebacks, and this should possibly be re-evaluated. The implementation of clean writebacks relies on a new packet command WritebackClean, which acts much like a Writeback (renamed WritebackDirty), and also much like a CleanEvict. On eviction of a clean block the cache either sends a clean evict, or a clean writeback, and if any copies are still cached upstream the clean evict/writeback is dropped. Similarly, if a clean evict/writeback reaches a cache where there are outstanding MSHRs for the block, the packet is dropped. In the typical case though, the clean writeback allocates a block in the downstream cache, and marks it writable if the evicted block was writable. The patch changes the O3_ARM_v7a L1 cache configuration and the default L1 caches in config/common/Caches.py 11197:f8fdd931e674 Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add cache clusivity This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviuor on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly). 11190:0964165d1857 Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Unify delayed packet deletion This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach. 11169:44b5c183c3cd Mon Oct 12 04:08:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Add explicit overrides and fix other clang >= 3.5 issues This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 11130:45a23e44e93d Fri Sep 25 07:26:00 EDT 2015 Ali Jafri <ali.jafri@arm.com> mem: Add snoops for CleanEvicts and Writebacks in atomic mode This patch mirrors the logic in timing mode which sends up snoops to check for cached copies before sending CleanEvicts and Writebacks down the memory hierarchy. In case there is a copy in a cache above, discard CleanEvicts and set the BLOCK_CACHED flag in Writebacks so that writebacks do not reset the cache residency bit in the snoop filter below. 11127:f39c2cc0d44e Fri Sep 25 07:13:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make the coherent crossbar account for timing snoops This patch introduces the concept of a snoop latency. Given the requirement to snoop and forward packets in zero time (due to the coherency mechanism), the latency is accounted for later. On a snoop, we establish the latency, and later add it to the header delay of the packet. To allow multiple caches to contribute to the snoop latency, we use a separate variable in the packet, and then take the maximum before adding it to the header delay. |
H A D | base.hh | 13717:11e81e2a98bd Mon Dec 03 17:03:00 EST 2018 Ivan Pizarro <ivan.pizarro@metempsy.com> mem-cache: A Best-Offset Prefetcher Michaud, P. (2015, June). A best-offset prefetcher. In 2nd Data Prefetching Championship. Change-Id: I61bb89ca5639356d54aeb04e856d5bf6e8805c22 Reviewed-on: https://gem5-review.googlesource.com/c/14820 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> 11436:f351b7f248db Wed May 27 08:50:00 EDT 2015 Rekai Gonzalez Alberquilla <Rekai.GonzalezAlberquilla@arm.com> mem: Add unused prefetch counter in caches Added stat to the cache to account for HardPF'ed blocks that are evicted before being referenced (over-prefetching). 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 11199:929fd978ab4e Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add an option to perform clean writebacks from caches This patch adds the necessary commands and cache functionality to allow clean writebacks. This functionality is crucial, especially when having exclusive (victim) caches. For example, if read-only L1 instruction caches are not sending clean writebacks, there will never be any spills from the L1 to the L2. At the moment the cache model defaults to not sending clean writebacks, and this should possibly be re-evaluated. The implementation of clean writebacks relies on a new packet command WritebackClean, which acts much like a Writeback (renamed WritebackDirty), and also much like a CleanEvict. On eviction of a clean block the cache either sends a clean evict, or a clean writeback, and if any copies are still cached upstream the clean evict/writeback is dropped. Similarly, if a clean evict/writeback reaches a cache where there are outstanding MSHRs for the block, the packet is dropped. In the typical case though, the clean writeback allocates a block in the downstream cache, and marks it writable if the evicted block was writable. The patch changes the O3_ARM_v7a L1 cache configuration and the default L1 caches in config/common/Caches.py 11197:f8fdd931e674 Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add cache clusivity This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviuor on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly). 11191:9eabb2bf349b Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not treat CleanEvict as a write operation This patch changes the CleanEvict command type to not be considered a write. Initially it was made a zero-sized write to match the writeback command, but as things developed it became clear that it causes more problems than it solves. For example, the memory modules (and bridge) should not consider the CleanEvict as a write, but instead discard it. With this patch it will be neither a read, nor write, and as it does not need a response the slave will simply sink it. 11053:62544e45c0f4 Fri Aug 21 07:03:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add explicit Cache subclass and make BaseCache abstract Open up for other subclasses to BaseCache and transition to using the explicit Cache subclass. 11051:81b1f46061c8 Fri Aug 21 07:03:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Move cache_impl.hh to cache.cc There is no longer any need to keep the implementation in a header. 10942:224c85495f96 Thu Jul 30 03:41:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Remove unused RequestCause in cache This patch removes the RequestCause, and also simplifies how we schedule the sending of packets through the memory-side port. The deassertion of bus requests is removed as it is not used. 10912:b99a6662d7c2 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Decouple draining from the SimObject hierarchy Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this. This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object. While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining. A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains. |
/gem5/src/mem/ruby/common/ | ||
H A D | Set.hh | 11085:f1fe63d949c0 Sat Sep 05 10:34:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> ruby: set: reimplement using std::bitset The current Set data structure is slow and therefore is being reimplemented using std::bitset. A maximum limit of 64 is being set on the number of controllers of each type. This means that for simulating a system with more controllers of a given type, one would need to change the value of the variable NUMBER_BITS_PER_SET 10808:c1694b4032a6 Wed Apr 29 23:35:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> ruby: set: replace long by unsigned long UBSan complains about negative value being shifted |
/gem5/src/arch/arm/ | ||
H A D | pagetable.hh | 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. |
/gem5/src/cpu/minor/ | ||
H A D | dyn_inst.cc | 10935:acd48ddd725f Tue Jul 28 02:58:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> revert 5af8f40d8f2c 10934:5af8f40d8f2c Sun Jul 26 11:21:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> cpu: implements vector registers This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now. |
/gem5/src/arch/generic/ | ||
H A D | tlb.hh | 11169:44b5c183c3cd Mon Oct 12 04:08:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Add explicit overrides and fix other clang >= 3.5 issues This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). 10687:276da6265ab8 Wed Feb 11 10:23:00 EST 2015 Andreas Sandberg <Andreas.Sandberg@ARM.com> sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. |
/gem5/src/dev/x86/ | ||
H A D | cmos.cc | 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10631:6d6bfdb036ce Sat Jan 03 18:51:00 EST 2015 Cagdas Dirik <cdirik@micron.com> dev: prevent RTC events firing before startup This change includes edits to MC146818 timer to prevent RTC events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
/gem5/src/dev/arm/ | ||
H A D | pl111.hh | 11359:b0b976a1ceda Fri Nov 27 09:41:00 EST 2015 Andreas Sandberg <andreas@sandberg.pp.se> base: Add support for changing output directories This changeset adds support for changing the simulator output directory. This can be useful when the simulation goes through several stages (e.g., a warming phase, a simulation phase, and a verification phase) since it allows the output from each stage to be located in a different directory. Relocation is done by calling core.setOutputDir() from Python or simout.setOutputDirectory() from C++. This change affects several parts of the design of the gem5's output subsystem. First, files returned by an OutputDirectory instance (e.g., simout) are of the type OutputStream instead of a std::ostream. This allows us to do some more book keeping and control re-opening of files when the output directory is changed. Second, new subdirectories are OutputDirectory instances, which should be used to create files in that sub-directory. Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> 11174:5a9019db4a08 Fri Oct 23 09:51:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> arm: Add missing explicit overrides for ARM devices Make clang >= 3.5 happy when compiling build/ARM/gem5.opt on OSX. 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10839:10cac0f0f419 Sat May 23 08:37:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> base: Redesign internal frame buffer handling Currently, frame buffer handling in gem5 is quite ad hoc. In practice, we pass around naked pointers to raw pixel data and expect consumers to convert frame buffers using the (broken) VideoConverter. This changeset completely redesigns the way we handle frame buffers internally. In summary, it fixes several color conversion bugs, adds support for more color formats (e.g., big endian), and makes the code base easier to follow. In the new world, gem5 always represents pixel data using the Pixel struct when pixels need to be passed between different classes (e.g., a display controller and the VNC server). Producers of entire frames (e.g., display controllers) should use the FrameBuffer class to represent a frame. Frame producers are expected to create one instance of the FrameBuffer class in their constructors and register it with its consumers once. Consumers are expected to check the dimensions of the frame buffer when they consume it. Conversion between the external representation and the internal representation is supported for all common "true color" RGB formats of up to 32-bit color depth. The external pixel representation is expected to be between 1 and 4 bytes in either big endian or little endian. Color channels are assumed to be contiguous ranges of bits within each pixel word. The external pixel value is scaled to an 8-bit internal representation using a floating multiplication to map it to the entire 8-bit range. |
/gem5/src/mem/ruby/system/ | ||
H A D | RubyPort.hh | 11266:452e10b868ea Mon Jul 20 10:15:00 EDT 2015 Brad Beckmann <Brad.Beckmann@amd.com> ruby: more flexible ruby tester support This patch allows the ruby random tester to use ruby ports that may only support instr or data requests. This patch is similar to a previous changeset (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets. This current patch implements the support in a more straight-forward way. Since retries are now tested when running the ruby random tester, this patch splits up the retry and drain check behavior so that RubyPort children, such as the GPUCoalescer, can perform those operations correctly without having to duplicate code. Finally, the patch also includes better DPRINTFs for debugging the tester. 11169:44b5c183c3cd Mon Oct 12 04:08:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Add explicit overrides and fix other clang >= 3.5 issues This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 11108:6342ddf6d733 Wed Sep 16 00:03:00 EDT 2015 David Hashe <david.hashe@amd.com> ruby: rename System.{hh,cc} to RubySystem.{hh,cc} The eventual aim of this change is to pass RubySystem pointers through to objects generated from the SLICC protocol code. Because some of these objects need to dereference their RubySystem pointers, they need access to the System.hh header file. In src/mem/ruby/SConscript, the MakeInclude function creates single-line header files in the build directory that do nothing except include the corresponding header file from the source tree. However, SLICC also generates a list of header files from its symbol table, and writes it to mem/protocol/Types.hh in the build directory. This code assumes that the header file name is the same as the class name. The end result of this is the many of the generated slicc files try to include RubySystem.hh, when the file they really need is System.hh. The path of least resistence is just to rename System.hh to RubySystem.hh. 11025:4872dbdea907 Fri Aug 14 01:04:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> ruby: replace Address by Addr This patch eliminates the type Address defined by the ruby memory system. This memory system would now use the type Addr that is in use by the rest of the system. 10919:80069a602c83 Fri Jul 10 17:05:00 EDT 2015 Brandon Potter <brandon.potter@amd.com> ruby: replace global g_system_ptr with per-object pointers This is another step in the process of removing global variables from Ruby to enable multiple RubySystem instances in a single simulation. With possibly multiple RubySystem objects, we can no longer use a global variable to find "the" RubySystem object. Instead, each Ruby component has to carry a pointer to the RubySystem object to which it belongs. 10913:38dbdeea7f1f Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor and simplify the drain API The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects. 10912:b99a6662d7c2 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Decouple draining from the SimObject hierarchy Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this. This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object. While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining. A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains. 10875:60eb3fef9c2d Thu Jun 25 12:58:00 EDT 2015 Jason Power <power.jg@gmail.com> Ruby: Remove assert in RubyPort retry list logic Remove the assert when adding a port to the RubyPort retry list. Instead of asserting, just ignore the added port, since it's already on the list. Without this patch, Ruby+detailed fails for even the simplest tests 10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks. |
/gem5/src/dev/ | ||
H A D | dma_device.cc | 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. 11010:034378be28a2 Fri Aug 07 04:59:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Add a simple DMA engine that can be used by devices Add a simple DMA engine that sits behind a FIFO. This engine can be used by devices that need to read large amounts of data (e.g., display controllers). Most aspects of the controller, such as FIFO size, maximum number of in-flight accesses, and maximum request sizes can be configured. The DMA copies blocks of data into its FIFO. Transfers are initiated with a call to startFill() command that takes a start address and a size. Advanced users can create a derived class that overrides the onEndOfBlock() callback that is triggered when the last request to a block has been issued. At this point, the DMA engine is ready to start fetching a new block of data, potentially from a different address range. The DMA engine stops issuing new requests while it is draining. Care must be taken to ensure that devices that are fed by a DMA engine are suspended while the system is draining to avoid buffer underruns. 10913:38dbdeea7f1f Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor and simplify the drain API The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects. 10912:b99a6662d7c2 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Decouple draining from the SimObject hierarchy Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this. This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object. While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining. A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains. 10910:32f3d1c454ec Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Make the drain state a global typed enum The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario. 10821:581fb2484bd6 Tue May 05 03:22:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Snoop into caches on uncacheable accesses This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block. 10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks. |
/gem5/src/cpu/kvm/ | ||
H A D | base.hh | 11363:f3f72c0ab03e Fri Nov 27 09:52:00 EST 2015 Andreas Sandberg <andreas@sandberg.pp.se> kvm: Shutdown KVM and disconnect performance counters on fork We can't/shouldn't use KVM after a fork since the child and parent probably point to the same VM. Knowing the exact effects of this is hard, but they are likely to be messy. We also disconnect the performance counters attached to the guest. This works around what seems to be a kernel bug where spurious SIGIOs get delivered to the forked child process. Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> [andreas.sandberg@arm.com: Fatal if entering KVM in child process ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 11151:ca4ea9b5c052 Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> cpu,isa,mem: Add per-thread wakeup logic Changes wakeup functionality so that only specific threads on SMT capable cpus are woken. 10913:38dbdeea7f1f Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor and simplify the drain API The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects. 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks. 10653:e3fc6bc7f97e Thu Jan 22 05:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Clean up Request initialisation This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we callede setVirt instead uses the appropriate constructor. |
/gem5/src/arch/x86/ | ||
H A D | faults.hh | 11218:d135bc832ffe Mon Nov 16 06:08:00 EST 2015 Swapnil Haria <swapnilh@cs.wisc.edu> x86: Invalidating TLB entry on page fault As per the x86 architecture specification, matching TLB entries need to be invalidated on a page fault. For instance, after a page fault due to inadequate protection bits on a TLB hit, the TLB entry needs to be invalidated. This behavior is clearly specified in the x86 architecture manuals from both AMD and Intel. This invalidation is missing currently in gem5, due to which linux kernel versions 3.8 and up cannot be simulated efficiently. This is exposed by a linux optimisation in commit e4a1cc56e4d728eb87072c71c07581524e5160b1, which removes a tlb flush on updating page table entries in x86. Testing: Linux kernel versions 3.8 onwards were booting very slowly in FS mode, due to repeated page faults (~300000 before the first print statement in a bash file). Ensured that page fault rate drops drastically and observed reduction in boot time from order of hours to minutes for linux kernel v3.8 and v3.11 10805:f2c472d4ff9c Wed Apr 29 23:35:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> x86: change divide-by-zero fault to divide-error Same exception is raised whether division with zero is performed or the quotient is greater than the maximum value that the provided space can hold. Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's. 10687:276da6265ab8 Wed Feb 11 10:23:00 EST 2015 Andreas Sandberg <Andreas.Sandberg@ARM.com> sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. |
H A D | isa.cc | 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10899:b8b8ad2c72dd Sat Jul 04 11:43:00 EDT 2015 Nikos Nikoleris <nikos.nikoleris@gmail.com> x86: Adjust the size of the values written to the x87 misc registers All x87 misc registers are implemented in an array of 64 bit values but in real hardware the size of some of these registers is smaller. Previsouly all 64 bits where incorrectly set and then later read. To ensure correctness we mask the value in setMiscRegNoEffect to write only the valid bits. Committed by: Nilay Vaish <nilay@cs.wisc.edu> 10698:829adc48e175 Mon Feb 16 03:33:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> arch: Make readMiscRegNoEffect const throughout Finally took the plunge and made this apply to all ISAs, not just ARM. |
/gem5/src/arch/mips/ | ||
H A D | tlb.hh | 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10687:276da6265ab8 Wed Feb 11 10:23:00 EST 2015 Andreas Sandberg <Andreas.Sandberg@ARM.com> sim: Move the BaseTLB to src/arch/generic/ The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. |
H A D | isa.hh | 10935:acd48ddd725f Tue Jul 28 02:58:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> revert 5af8f40d8f2c 10934:5af8f40d8f2c Sun Jul 26 11:21:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> cpu: implements vector registers This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now. 10698:829adc48e175 Mon Feb 16 03:33:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> arch: Make readMiscRegNoEffect const throughout Finally took the plunge and made this apply to all ISAs, not just ARM. |
/gem5/src/dev/alpha/ | ||
H A D | tsunami_io.cc | 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. 10642:9d3b6e7dd205 Tue Jan 06 17:10:00 EST 2015 cdirik<cdirik@micron.com> dev: prevent intel 8254 timer counter events firing before startup This change includes edits to Intel8254Timer to prevent counter events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu> 10631:6d6bfdb036ce Sat Jan 03 18:51:00 EST 2015 Cagdas Dirik <cdirik@micron.com> dev: prevent RTC events firing before startup This change includes edits to MC146818 timer to prevent RTC events firing before startup to comply with SimObject initialization call sequence. Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
/gem5/src/arch/alpha/ | ||
H A D | process.hh | 11169:44b5c183c3cd Mon Oct 12 04:08:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Add explicit overrides and fix other clang >= 3.5 issues This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). 11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code. |
/gem5/src/cpu/o3/probe/ | ||
H A D | elastic_trace.cc | 11253:daf9f91b11e9 Mon Dec 07 17:42:00 EST 2015 Radhika Jagtap <radhika.jagtap@ARM.com> cpu: Support virtual addr in elastic traces This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay. 11252:18bb597fc40c Mon Dec 07 17:42:00 EST 2015 Radhika Jagtap <radhika.jagtap@ARM.com> cpu: Create record type enum for elastic traces This patch replaces the booleans that specified the elastic trace record type with an enum type. The source of change is the proto message for elastic trace where the enum is introduced. The struct definitions in the elastic trace probe listener as well as the Trace CPU replace the boleans with the proto message enum. The patch does not impact functionality, but traces are not compatible with previous version. This is preparation for adding new types of records in subsequent patches. 11247:76f75db08e09 Mon Dec 07 17:42:00 EST 2015 Radhika Jagtap <radhika.jagtap@ARM.com> proto, probe: Add elastic trace probe to o3 cpu The elastic trace is a type of probe listener and listens to probe points in multiple stages of the O3CPU. The notify method is called on a probe point typically when an instruction successfully progresses through that stage. As different listener methods mapped to the different probe points execute, relevant information about the instruction, e.g. timestamps and register accesses, are captured and stored in temporary InstExecInfo class objects. When the instruction progresses through the commit stage, the timing and the dependency information about the instruction is finalised and encapsulated in a struct called TraceInfo. TraceInfo objects are collected in a list instead of writing them out to the trace file one a time. This is required as the trace is processed in chunks to evaluate order dependencies and computational delay in case an instruction does not have any register dependencies. By this we achieve a simpler algorithm during replay because every record in the trace can be hooked onto a record in its past. The instruction dependency trace is written out as a protobuf format file. A second trace containing fetch requests at absolute timestamps is written to a separate protobuf format file. If the instruction is not executed then it is not added to the trace. The code checks if the instruction had a fault, if it predicated false and thus previous register values were restored or if it was a load/store that did not have a request (e.g. when the size of the request is zero). In all these cases the instruction is set as executed by the Execute stage and is picked up by the commit probe listener. But a request is not issued and registers are not written. So practically, skipping these should not hurt the dependency modelling. If squashing results in squashing younger instructions, it may happen that the squash probe discards the inst and removes it from the temporary store but execute stage deals with the instruction in the next cycle which results in the execute probe seeing this inst as 'new' inst. A sequence number of the last processed trace record is used to trap these cases and not add to the temporary store. The elastic instruction trace and fetch request trace can be read in and played back by the TraceCPU. |
/gem5/src/cpu/o3/ | ||
H A D | rename_map.hh | 10935:acd48ddd725f Tue Jul 28 02:58:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> revert 5af8f40d8f2c 10934:5af8f40d8f2c Sun Jul 26 11:21:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> cpu: implements vector registers This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now. 10715:ced453290507 Mon Mar 02 04:00:00 EST 2015 Rekai <Rekai.GonzalezAlberquilla@arm.com> cpu: o3 register renaming request handling improved Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not. |
Completed in 216 milliseconds