Searched hist:2015 (Results 1251 - 1275 of 1505) sorted by relevance


/gem5/src/arch/mips/
remote_gdb.cc 11274:d9a0136ab8cc Fri Dec 18 16:12:00 EST 2015 Boris Shingarov <shingarov@labware.com> arm: remote GDB: rationalize structure of register offsets

Currently, the wire format of register values in g- and G-packets is
modelled using a union of uint8/16/32/64 arrays. The offset positions
of each register are expressed as a "register count" scaled according
to the width of the register in question. This results in counter-
intuitive and error-prone "register count arithmetic", and some
formats would even be altogether unrepresentable in such a model, e.g.
a 64-bit register following a 32-bit one would have a fractional index
in the regs64 array.
Another difficulty is that the array is allocated before the actual
architecture of the workload is known (and therefore before the correct
size for the array can be calculated).

With this patch I propose a simpler mechanism for expressing the
register set structure. In the new code, GdbRegCache is an abstract
class; its subclasses contain straightforward structs reflecting the
register representation. The determination whether to use e.g. the
AArch32 vs. AArch64 register set (or SPARCv8 vs SPARCv9, etc.) is made
by polymorphically dispatching getregs() to the concrete subclass.
The subclass is not instantiated until it is needed for actual
g-/G-packet processing, when the mode is already known.

This patch is not meant to be merged in on its own, because it changes
the contract between src/base/remote_gdb.* and src/arch/*/remote_gdb.*,
so as it stands right now, it would break the other architectures.
In this patch only the base and the ARM code are provided for review;
once we agree on the structure, I will provide src/arch/*/remote_gdb.*
for the other architectures; those patches could then be merged in
together.

Review Request: http://reviews.gem5.org/r/3207/
Pushed by Joel Hestness <jthestness@gmail.com>
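As an illustration of the register-cache design described above, here is a minimal C++ sketch; it mirrors the idea (an abstract cache whose per-mode subclasses hold plain structs, instantiated only once the mode is known), but the class names, methods, and register layout are illustrative assumptions, not the actual gem5 code.

```cpp
#include <cstddef>
#include <cstdint>

// Abstract register cache: subclasses hold a plain struct in g-/G-packet
// wire order, so no "register count arithmetic" is needed.
struct GdbRegCache {
    virtual ~GdbRegCache() = default;
    virtual char *data() = 0;                 // raw bytes sent on the wire
    virtual std::size_t size() const = 0;
    virtual void getRegs() = 0;               // fill from the thread context
    virtual void setRegs() const = 0;         // write back to the thread context
};

// Hypothetical AArch32-style subclass: the struct layout is the wire format.
struct Aarch32RegCache : GdbRegCache {
    struct {
        std::uint32_t gpr[16];
        std::uint32_t cpsr;
        std::uint64_t fpr[32];  // a 64-bit register after 32-bit ones is fine;
        std::uint32_t fpscr;    // no fractional index into a regs64 array
    } regs;

    char *data() override { return reinterpret_cast<char *>(&regs); }
    std::size_t size() const override { return sizeof(regs); }
    void getRegs() override { /* copy registers out of the thread context */ }
    void setRegs() const override { /* copy registers back in */ }
};

// The concrete subclass is only instantiated when a g-/G-packet arrives and
// the mode (e.g. AArch32 vs. AArch64) is already known.
```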
/gem5/src/arch/x86/
tlb.hh 11175:2324ed5fa9f4 Fri Oct 23 09:51:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> x86: Add missing explicit overrides for X86 devices

Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines

This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures that do not derive from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
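A compact, self-contained C++ sketch of the subsection pattern described in the list above; CheckpointOut, ScopedSection, and the parameter-writing helper here are simplified stand-ins rather than the exact gem5 API.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

class CheckpointOut {
    std::vector<std::string> sections;   // current section, built on the fly
  public:
    class ScopedSection {                // plays the role of ScopedCheckpointSection
        CheckpointOut &cp;
      public:
        ScopedSection(CheckpointOut &c, const std::string &name) : cp(c)
        { cp.sections.push_back(name); }
        ~ScopedSection() { cp.sections.pop_back(); }
    };
    void paramOut(const std::string &key, std::uint64_t val) {
        std::string sec;
        for (const auto &s : sections)
            sec += (sec.empty() ? "" : ".") + s;
        std::cout << "[" << sec << "] " << key << "=" << val << "\n";
    }
};

struct Serializable {
    virtual ~Serializable() = default;
    virtual void serialize(CheckpointOut &cp) const = 0;  // const: no state changes
    void serializeSection(CheckpointOut &cp, const std::string &name) const {
        CheckpointOut::ScopedSection sec(cp, name);       // nested calls land in 'name'
        serialize(cp);
    }
};

struct FetchQueue : Serializable {
    std::uint64_t head = 3, tail = 7;
    void serialize(CheckpointOut &cp) const override {
        cp.paramOut("head", head);
        cp.paramOut("tail", tail);
    }
};

struct Cpu : Serializable {
    FetchQueue fetchQueue;
    std::uint64_t pc = 0x400000;
    void serialize(CheckpointOut &cp) const override {
        cp.paramOut("pc", pc);
        fetchQueue.serializeSection(cp, "fetchQueue");    // written under cpu.fetchQueue
    }
};

int main() {
    CheckpointOut cp;
    Cpu cpu;
    cpu.serializeSection(cp, "cpu");  // prints [cpu] pc=..., [cpu.fetchQueue] head=...
}
```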
10687:276da6265ab8 Wed Feb 11 10:23:00 EST 2015 Andreas Sandberg <Andreas.Sandberg@ARM.com> sim: Move the BaseTLB to src/arch/generic/

The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.
pseudo_inst.cc 11877:5ea85692a53e Mon Jul 20 10:15:00 EDT 2015 Brandon Potter <brandon.potter@amd.com> syscall_emul: [patch 13/22] add system call retry capability

This changeset adds functionality that allows system calls to retry without
affecting thread context state, such as the program counter or register values
of the associated thread context, when system calls return with a retry
fault.

This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.

To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.

The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.

In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.
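A rough C++ sketch of that check-then-retry pattern, under the assumption of a host file descriptor backing the simulated one; SyscallReturn and the function name are stand-ins for illustration, not the gem5 types.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cstddef>

struct SyscallReturn {          // stand-in for the simulator's return type
    bool needsRetry;
    long value;
};

SyscallReturn emulatedRead(int hostFd, void *buf, std::size_t nbytes)
{
    // Non-blocking poll: would a host read() block right now?
    struct pollfd pfd = { hostFd, POLLIN, 0 };
    if (::poll(&pfd, 1, 0) == 0) {
        // Nothing to read yet: hand a retry fault back up the call chain so
        // the CPU model replays the syscall instruction at a later tick
        // instead of blocking the single-threaded event queue.
        return { true, 0 };
    }
    // An event showed up, so the underlying host call will not block.
    return { false, static_cast<long>(::read(hostFd, buf, nbytes)) };
}
```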

As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing each simulator thread context with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.
/gem5/src/cpu/
FuncUnit.py 10807:dac26eb4cb64 Wed Apr 29 23:35:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> cpu: o3: replace issueLatency with bool pipelined

Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued. As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined). The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value. To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean. If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
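In other words, the boolean collapses to a simple choice of reuse interval for the functional unit. A tiny illustrative helper (not the gem5 source, and using a plain unsigned in place of the Cycles type):

```cpp
using Cycles = unsigned;

// Fully pipelined: another op of the same class can issue next cycle.
// Not pipelined: the unit stays busy for the op's full execution latency.
Cycles issueInterval(bool pipelined, Cycles opLat)
{
    return pipelined ? Cycles(1) : opLat;
}
```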
op_class.hh 10814:46b6043bd32c Tue May 05 03:22:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> cpu: Work around gcc 4.9 issues with Num_OpClasses

This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
/gem5/src/cpu/minor/
MinorCPU.py 10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
pipeline.cc 10913:38dbdeea7f1f Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor and simplify the drain API

The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.

This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.

Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
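A simplified C++ sketch of the shape of the API as described above; DrainState and Drainable here are cut-down stand-ins for the gem5 classes, and the example queue is hypothetical.

```cpp
enum class DrainState { Running, Draining, Drained };

class Drainable {
  protected:
    DrainState state = DrainState::Running;

    // Safe to call unconditionally: only notifies if a drain was requested.
    void signalDrainDone() {
        if (state == DrainState::Draining) {
            state = DrainState::Drained;
            // ...notify the (single, global) DrainManager here...
        }
    }

  public:
    virtual ~Drainable() = default;
    virtual DrainState drain() = 0;   // no DrainManager parameter any more
    virtual void drainResume() {}
};

class WriteQueue : public Drainable {
    unsigned occupancy = 0;
  public:
    DrainState drain() override {
        // Empty already: report Drained. Otherwise report Draining and call
        // signalDrainDone() later, when the last entry leaves the queue.
        state = occupancy ? DrainState::Draining : DrainState::Drained;
        return state;
    }
};
```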
/gem5/src/cpu/trace/
TraceCPU.py 11249:0733a1c08600 Mon Dec 07 17:42:00 EST 2015 Radhika Jagtap <radhika.jagtap@ARM.com> cpu: Add TraceCPU to playback elastic traces

This patch defines a TraceCPU that replays a trace generated using the elastic
trace probe attached to the O3 CPU model. The elastic trace is an execution
trace with data dependencies and ordering dependencies annotated to it. It also
replays a fixed-timestamp instruction fetch trace that is also generated by the
elastic trace probe.

The TraceCPU inherits from BaseCPU as a result of which some methods need
to be defined. It has two port subclasses inherited from MasterPort for
instruction and data ports. It issues memory requests, deducing the
timing from the trace, without performing real execution of micro-ops.
As soon as the last dependency for an instruction is complete,
its computational delay, also provided in the input trace, is added. The
dependency-free nodes are maintained in a list, called 'ReadyList',
ordered by ready time. Instructions which depend on a load stall until the
responses for read requests are received, thus achieving elastic replay. If
the dependency is not found when adding a new node, it is assumed complete.
Thus, if this node is found to be completely dependency-free, its issue time is
calculated and it is added to the ready list immediately. This is encapsulated
in the subclass ElasticDataGen.

If ready nodes are issued in an unconstrained way, there can be too many nodes
outstanding, which results in divergence in timing compared to the O3CPU.
Therefore, the Trace CPU also models hardware resources. A sub-class to model
hardware resources is added which contains the maximum sizes of load buffer,
store buffer and ROB. If resources are not available, the node is not issued.
The 'depFreeQueue' structure holds nodes that are pending issue.

The ROB size is arguably the most important of the resource limitations modeled
in the Trace CPU. The ROB occupancy is estimated using the newly added field
'robNum'. We use the ROB number rather than the sequence number because the
latter is at times much higher due to squashing, and trace replay is focused on
correct-path modeling.

A map called 'inFlightNodes' is added to track not only the nodes in
the readyList but also load nodes that have executed (and thus been removed from
the readyList) but are not complete. The readyList handles which node to execute
next and when, while inFlightNodes is used for resource modelling. The oldest
ROB number is updated when any node occupies the ROB or when an entry in the
ROB is released. The ROB occupancy is equal to the difference in the ROB number
of the newly dependency-free node and the oldest ROB number in flight.
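A rough C++ sketch of that resource check; the class and field names (and the default sizes) are illustrative assumptions rather than the TraceCPU source.

```cpp
#include <cstdint>

struct HwResources {                    // stand-in for the resource model
    std::uint64_t oldestRobNum = 0;     // oldest node still occupying the ROB
    unsigned robSize = 192, lqSize = 32, sqSize = 32;   // assumed maximum sizes
    unsigned loadsInFlight = 0, storesInFlight = 0;

    bool available(std::uint64_t robNum, bool isLoad, bool isStore) const {
        // Occupancy comes from ROB numbers, not sequence numbers, since
        // squashing inflates sequence numbers and replay models the correct
        // path only.
        std::uint64_t robOccupancy = robNum - oldestRobNum;
        if (robOccupancy >= robSize)
            return false;
        if (isLoad && loadsInFlight >= lqSize)
            return false;
        if (isStore && storesInFlight >= sqSize)
            return false;
        return true;    // enough resources: the node may issue now;
    }                   // otherwise it waits in the FIFO depFreeQueue
};
```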

If no node depends on a non-load/store node then there is no reason to track
it in the dependency graph. We filter out such nodes but count them and add a
weight field to the subsequent node that we do include in the trace. The weight
field is used to model ROB occupancy during replay.

The depFreeQueue is chosen to be FIFO so that child nodes which are in
program order get pushed into it in that order and are thus issued in program
order, like in the O3CPU. This is also why the dependents container is changed
from std::set to std::vector, a sequential container. We only check the head of the
depFreeQueue, as nodes are issued in order and blocking on the head models that
better than looping over the entire queue. An alternative choice would be to inspect
the top N pending nodes, where N is the issue width. This is left for the future as the
timing correlation looks good as it is.

At the start of an execution event, first we attempt to issue such pending
nodes by checking if appropriate resources have become available. If so, we
compute the execute tick with respect to the current time. Then we proceed to
complete nodes from the readyList.

When a read response is received, sometimes a dependency on it that was
supposed to be released when it was issued is still not released. This occurs
because the dependent gets added to the graph after the read was sent. So the
check is made less strict and the dependency is marked complete on the read
response instead of insisting that it should have been removed when the read was sent.

There is a check for requests spanning two cache lines as this condition
triggers an assert failure in the L1 cache. If a request does, its size is truncated
to access only until the end of that line and the remainder is ignored.
Strictly-ordered requests are skipped and the dependencies on such requests
are handled by simply marking them complete immediately.

The simulated seconds can be calculated as the difference between the
final_tick stat and the tickOffset stat. A CountedExitEvent that contains
a static int belonging to the Trace CPU class as a down counter is used to
implement simulation exit with multiple Trace CPUs.
/gem5/src/dev/alpha/
Tsunami.py 11244:a2af58a06c4e Fri Dec 04 19:11:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Rewrite PCI host functionality

gem5's current PCI host functionality is very ad hoc. The current
implementations require PCI devices to be hooked up to the
configuration space via a separate configuration port. Devices query
the platform to get their config-space address range. Un-mapped parts
of the config space are intercepted using the XBar's default port
mechanism and a magic catch-all device (PciConfigAll).

This changeset redesigns the PCI host functionality to improve code
reuse and make config-space and interrupt mapping more
transparent. Existing platform code has been updated to use the new
PCI host and configured to stay backwards compatible (i.e., no
guest-side visible changes). The current implementation does not
expose any new functionality, but it can easily be extended with
features such as automatic interrupt mapping.

PCI devices now register themselves with a PCI host controller. The
host controller interface is defined in the abstract base class
PciHost. Registration is done by PciHost::registerDevice() which takes
the device, its bus position (bus/dev/func tuple), and its interrupt
pin (INTA-INTC) as parameters. The registration interface returns a
PciHost::DeviceInterface that the PCI device can use to query memory
mappings and signal interrupts.
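A simplified C++ sketch of that registration interface; the types and member signatures below are stand-ins for illustration, not the exact gem5 ones.

```cpp
#include <cstdint>

struct PciBusAddr { std::uint8_t bus, dev, func; };
enum class PciIntPin { NoInt, IntA, IntB, IntC };

class PciDevice;    // forward declaration only, for the sketch

class PciHost {
  public:
    virtual ~PciHost() = default;

    // Returned to the device at registration time; the device uses it to
    // query address mappings and to raise or clear its interrupt.
    class DeviceInterface {
        PciHost &host;
        PciBusAddr addr;
        PciIntPin pin;
      public:
        DeviceInterface(PciHost &h, PciBusAddr a, PciIntPin p)
            : host(h), addr(a), pin(p) {}
        std::uint64_t pioAddr(std::uint64_t busAddr) const
        { return host.pioBase + busAddr; }   // simple (base + addr) remapping
        void postInt()  { host.postInt(addr, pin); }
        void clearInt() { host.clearInt(addr, pin); }
    };

    DeviceInterface registerDevice(PciDevice *dev, PciBusAddr where, PciIntPin pin) {
        // ...record the device so config-space accesses can be decoded to it...
        (void)dev;
        return DeviceInterface(*this, where, pin);
    }

  protected:
    std::uint64_t pioBase = 0;
    virtual void postInt(const PciBusAddr &addr, PciIntPin pin) = 0;
    virtual void clearInt(const PciBusAddr &addr, PciIntPin pin) = 0;
};
```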

The host device manages the entire PCI configuration space. Accesses
are decoded into the device's bus position and then forwarded to
the correct device.

Basic PCI host functionality is implemented in the GenericPciHost base
class. Most platforms can use this class as a basic PCI controller. It
provides the following functionality:

* Configurable configuration space decoding. The number of bits
dedicated to a device is a parameter, making it possible to support
CAM, ECAM, and legacy mappings.

* Basic interrupt mapping using the interruptLine value from a
device's configuration space. This behavior is the same as in the
old implementation. More advanced controllers can override the
interrupt mapping method to dynamically assign host interrupts to
PCI devices.

* Simple (base + addr) remapping from the PCI bus's address space to
physical addresses for PIO, memory, and DMA.
/gem5/src/dev/net/
ethertap.cc 11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
ns_gige.cc 11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
Ethernet.py 11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
/gem5/src/mem/ruby/network/
Network.py 11065:37e19af67f62 Sun Aug 30 01:24:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> ruby: specify number of vnets for each protocol
The default value for the number of virtual networks is being removed. Each protocol
should now specify the value it needs.
/gem5/tests/configs/
t1000-simple-atomic.py 10904:532f423d6760 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> tests: Skip SPARC tests if the required binaries are missing

The full-system SPARC tests depend on several binaries that aren't
generally available to the wider community. Flag the tests as skipped
instead of failed if these binaries can't be found.
/gem5/src/cpu/pred/
bi_mode.cc 10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2bit_local.cc 10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2bit_local.hh 10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
bi_mode.hh 10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
/gem5/configs/example/
read_config.py 11228:021524c21cbc Sun Nov 22 05:10:00 EST 2015 Andrew Bardsley <Andrew.Bardsley@arm.com> config: Added missing types to JSON/INI Python reader

Added the missing types EthernetAddr and Current to the JSON/INI file
reader example configs/example/read_config.py.

Also added __str__ to EthernetAddr to make values appear in the same form
in JSON and INI files.
/gem5/src/cpu/testers/memtest/
MemTest.py 10688:22452667fd5c Wed Feb 11 10:23:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> cpu: Tidy up the MemTest and make false sharing more obvious

The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.

The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.

In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
/gem5/src/cpu/testers/rubytest/
RubyTester.py 11266:452e10b868ea Mon Jul 20 10:15:00 EDT 2015 Brad Beckmann <Brad.Beckmann@amd.com> ruby: more flexible ruby tester support

This patch allows the ruby random tester to use ruby ports that may only
support instr or data requests. This patch is similar to a previous changeset
(8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
The current patch implements the support in a more straightforward way.
Since retries are now tested when running the ruby random tester, this patch
splits up the retry and drain check behavior so that RubyPort children, such
as the GPUCoalescer, can perform those operations correctly without having to
duplicate code. Finally, the patch also includes better DPRINTFs for
debugging the tester.
/gem5/src/dev/virtio/
base.cc 10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

/gem5/src/mem/
abstract_mem.cc 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand

This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed to responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states Modified, Owned, Exclusive, and Shared are also called
out in the cache and MSHR code to make it easier to understand.
11199:929fd978ab4e Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add an option to perform clean writebacks from caches

This patch adds the necessary commands and cache functionality to
allow clean writebacks. This functionality is crucial, especially with
exclusive (victim) caches. For example, if read-only L1
instruction caches are not sending clean writebacks, there will never
be any spills from the L1 to the L2. At the moment the cache model
defaults to not sending clean writebacks, and this should possibly be
re-evaluated.

The implementation of clean writebacks relies on a new packet command
WritebackClean, which acts much like a Writeback (renamed
WritebackDirty), and also much like a CleanEvict. On eviction of a
clean block the cache either sends a clean evict, or a clean
writeback, and if any copies are still cached upstream the clean
evict/writeback is dropped. Similarly, if a clean evict/writeback
reaches a cache where there are outstanding MSHRs for the block, the
packet is dropped. In the typical case though, the clean writeback
allocates a block in the downstream cache, and marks it writable if
the evicted block was writable.
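The eviction-time choice reduces to something like the following C++ sketch; the names are illustrative, not the cache source.

```cpp
enum class EvictCmd { CleanEvict, WritebackClean, WritebackDirty };

EvictCmd evictionCommand(bool blockDirty, bool writebackClean)
{
    if (blockDirty)
        return EvictCmd::WritebackDirty;   // the old Writeback, renamed
    // Clean block: either just tell snoop filters about the eviction, or write
    // the clean data back so a downstream (victim) cache can allocate it.
    return writebackClean ? EvictCmd::WritebackClean : EvictCmd::CleanEvict;
}
// Either packet is dropped if an upstream cache still holds a copy, or if a
// cache on the way down has an outstanding MSHR for the block.
```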

The patch changes the O3_ARM_v7a L1 cache configuration and the
default L1 caches in config/common/Caches.py
11151:ca4ea9b5c052 Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> cpu,isa,mem: Add per-thread wakeup logic

Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
10883:9294c4a60251 Fri Jul 03 10:14:00 EDT 2015 Ali Jafri <ali.jafri@arm.com> mem: Add clean evicts to improve snoop filter tracking

This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory hierarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.

The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory hierarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.

We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implementation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.

We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.
/gem5/src/mem/cache/prefetch/
Prefetcher.py 13772:31b71dadc472 Thu Mar 07 09:42:00 EST 2019 Javier Bueno <javier.bueno@metempsy.com> mem-cache: Added the Indirect Memory Prefetcher

Reference:
Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas.
2015. IMP: indirect memory prefetcher. In Proceedings of the 48th
International Symposium on Microarchitecture (MICRO-48). ACM,
New York, NY, USA, 178-190. DOI: https://doi.org/10.1145/2830772.2830807

Change-Id: I52790f69c13ec55b8c1c8b9396ef9a1fb1be9797
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/16223
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
13717:11e81e2a98bd Mon Dec 03 17:03:00 EST 2018 Ivan Pizarro <ivan.pizarro@metempsy.com> mem-cache: A Best-Offset Prefetcher

Michaud, P. (2015, June). A best-offset prefetcher.
In 2nd Data Prefetching Championship.

Change-Id: I61bb89ca5639356d54aeb04e856d5bf6e8805c22
Reviewed-on: https://gem5-review.googlesource.com/c/14820
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
13700:56fa28e6fab4 Thu Jan 31 10:24:00 EST 2019 Javier Bueno <javier.bueno@metempsy.com> mem-cache: Added the Slim AMPM Prefetcher

Reference:
Towards Bandwidth-Efficient Prefetching with Slim AMPM.
Young, V., & Krishna, A. (2015). The 2nd Data Prefetching Championship.

Slim AMPM is composed of two prefetchers, the DPCT and the AMPM (both already
in gem5).

Change-Id: I6e868faf216e3e75231cf181d59884ed6f0d382a
Reviewed-on: https://gem5-review.googlesource.com/c/16383
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
13553:047def1fa787 Thu Nov 29 10:59:00 EST 2018 Javier Bueno <javier.bueno@metempsy.com> mem-cache: Signature Path Prefetcher

Related paper:
Lookahead Prefetching with Signature Path
J Kim, PV Gratz, ALN Reddy
The 2nd Data Prefetching Championship (DPC2), 2015

Change-Id: I2319be2fa409f955f65e1bf1e1bb2d6d9a4fea11
Reviewed-on: https://gem5-review.googlesource.com/c/14737
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
/gem5/src/mem/cache/tags/
base.hh 11055:54071fd5c397 Fri Aug 21 07:03:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> arm, mem: Remove unused CLEAR_LL request flag

Cleaning up dead code. The CLREX stores zero directly to
MISCREG_LOCKFLAG and so the request flag is no longer needed. The
corresponding functionality in the cache tags is also removed.
10941:a39646f4c407 Thu Jul 30 03:41:00 EDT 2015 David Guillen-Fandos <david.guillen@arm.com> mem: Make caches way aware

This patch makes cache sets aware of the way number. This enables
some nice features such as the ability to restrict way allocation. The
implemented mechanism allows setting a maximum way number 'k' to be
allocated, which must fulfill 0 < k <= N (where N is the number of
ways). In the future more sophisticated mechanisms can be implemented.
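As an illustration of restricting allocation to the first k of N ways, here is a small C++ sketch; it assumes a block type with a lastTouchTick field and is not the gem5 tags code.

```cpp
#include <cassert>
#include <vector>

template <typename Blk>
Blk *findVictim(std::vector<Blk *> &set, unsigned k)
{
    assert(k > 0 && k <= set.size());       // 0 < k <= N
    Blk *victim = set[0];
    for (unsigned way = 1; way < k; ++way)  // ways k..N-1 are never allocated
        if (set[way]->lastTouchTick < victim->lastTouchTick)
            victim = set[way];              // e.g. LRU among the allowed ways
    return victim;
}
```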
10815:169af9a2779f Tue May 05 03:22:00 EDT 2015 David Guillen <david.guillen@arm.com> mem: Remove templates in cache model

This patch changes the cache implementation to rely on virtual methods
rather than using the replacement policy as a template argument.

There is no impact on the simulation performance, and overall the
changes make it easier to modify (and subclass) the cache and/or
replacement policy.
10693:c0979b2ebda5 Wed Feb 11 10:23:00 EST 2015 Marco Balboni <Marco.Balboni@ARM.com> mem: Clarify usage of latency in the cache

This patch adds some much-needed clarity in the specification of the
cache timing. For now, hit_latency and response_latency are kept as
top-level parameters, but the cache itself has a number of local
variables to better map the individual timing variables to different
behaviours (and sub-components).

The introduced variables are:
- lookupLatency: latency of tag lookup, occurring on any access
- forwardLatency: latency that occurs in case of outbound miss
- fillLatency: latency to fill a cache block
We keep the existing responseLatency

The forwardLatency is used by allocateInternalBuffer() for:
- MSHR allocateWriteBuffer (uncached write forwarded to WriteBuffer);
- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
- MSHR allocateUncachedReadBuffer (uncached read allocated in MSHR
queue)
It is our assumption that the time for the above three buffers is the
same. Similarly, for snoop responses passing through the cache we use
forwardLatency.
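Read that way, the split might compose roughly as in the sketch below; the summation is an assumption for illustration, using plain unsigned cycle counts rather than gem5's Cycles type.

```cpp
struct CacheLatencies {
    unsigned lookupLatency;     // tag lookup, paid on every access
    unsigned forwardLatency;    // scheduling an outbound miss or snoop response
    unsigned fillLatency;       // filling a block when the response returns
    unsigned responseLatency;   // kept as before
};

// Assumed composition: a miss pays the tag lookup and is then forwarded into
// one of the three internal buffers after forwardLatency.
unsigned missBufferTick(const CacheLatencies &l, unsigned now)
{
    return now + l.lookupLatency + l.forwardLatency;
}
```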
