Searched hist:2015 (Results 1101 - 1125 of 1505) sorted by relevance

<<41424344454647484950>>

/gem5/src/cpu/kvm/
H A Dx86_cpu.cc11363:f3f72c0ab03e Fri Nov 27 09:52:00 EST 2015 Andreas Sandberg <andreas@sandberg.pp.se> kvm: Shutdown KVM and disconnect performance counters on fork

We can't/shouldn't use KVM after a fork since the child and parent
probably point to the same VM. Knowing the exact effects of this is
hard, but they are likely to be messy. We also disconnect the
performance counters attached to the guest. This works around what
seems to be a kernel bug where spurious SIGIOs get delivered to the
forked child process.

Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
[sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version]
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
[andreas.sandberg@arm.com: Fatal if entering KVM in child process ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
11150:a8a64cca231b Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> isa,cpu: Add support for FS SMT Interrupts

Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
10653:e3fc6bc7f97e Thu Jan 22 05:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Clean up Request initialisation

This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
/gem5/src/cpu/simple/
H A DBaseSimpleCPU.py10785:f56c10663a01 Mon Apr 13 18:33:00 EDT 2015 Dibakar Gope <gope@wisc.edu> cpu: re-organizes the branch predictor structure.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
/gem5/src/dev/arm/
H A Dgeneric_timer.cc10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
10847:1826ee736709 Sat May 23 08:46:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> arm, dev: Add support for a memory mapped generic timer

There are cases when we don't want to use a system register mapped
generic timer, but can't use the SP804. For example, when using KVM on
aarch64, we want to intercept accesses to the generic timer, but can't
do so if it is using the system register interface. In such cases,
we need to use a memory-mapped generic timer.

This changeset adds a device model that implements the memory mapped
generic timer interface. The current implementation only supports a
single frame (i.e., one virtual timer and one physical timer).
10845:75df7a87be83 Sat May 23 08:46:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev, arm: Add virtual timers to the generic timer model

The generic timer model currently does not support virtual
counters. Virtual and physical counters both tick with the same
frequency. However, virtual timers allow a hypervisor to set an offset
that is subtracted from the counter when it is read. This enables the
hypervisor to present a time base that ticks with virtual time in the
VM (i.e., doesn't tick when the VM isn't running). Modern Linux
kernels generally assume that virtual counters exist and try to use
them by default.
10844:8551af601f75 Sat May 23 08:46:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev, arm: Refactor and clean up the generic timer model

This changeset cleans up the generic timer a bit and moves most of the
register juggling from the ISA code into a separate class in the same
source file as the rest of the generic timer. It also removes the
assumption that there is always 8 or fewer CPUs in the system. Instead
of having a fixed limit, we now instantiate per-core timers as they
are requested. This is all in preparation for other patches that add
support for virtual timers and a memory mapped interface.
H A DSConscript10916:5c76426fd9ee Tue Jul 07 05:03:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev, arm: Add a device model that uses the NoMali model

Add a simple device shim that interfaces with the NoMali model
library. The gem5 side of the interface supports Mali T60x/T62x/T760
GPUs. This device model pretends to be a Mali GPU, but doesn't render
anything and executes in zero time.
10802:876341add7be Thu Apr 23 13:37:00 EDT 2015 Rene de Jong <rene.dejong@arm.com> arm, dev: Add a UFS device

This patch introduces a UFS host controller and a UFS device. More
information about the UFS standard can be found at the JEDEC site:
http://www.jedec.org/standards-documents/results/jesd220

Note that the model does not implement the complete standard, and as
such is not an actual implementation of UFS. The following SCSI
commands are implemented: inquiry, read, read capacity, report LUNs,
start/stop, test unit ready, verify, write, format unit, send
diagnostic, synchronize cache, mode select, mode sense, request sense,
unmap, write buffer and read buffer. This is sufficient for usage with
Linux and Android.

To interact with this model a kernel version 3.9 or above is
needed.
10801:049eb85e8ea2 Thu Apr 23 13:37:00 EDT 2015 Rene de Jong <rene.dejong@arm.com> arm, dev: Add a NAND flash timing model

This adds a NAND flash timing model. This model takes the number of
planes into account and is ultimately intended to be used as a
high-level performance model for any device using flash. To access the
memory, use either readMemory or writeMemory.

To make use of the model you will need an interface model
such as UFSHostDevice, which is part of a separate patch.

At the moment the flash device is part of the ARM device tree since
the only use if the UFSHostDevice, and that in turn relies on the ARM
GIC.
10749:ac3611ba911c Thu Mar 19 04:06:00 EDT 2015 Matt Evans <matt.evans@arm.com> arm: Add a GICv2m device

This patch adds a new PIO-accessible GICv2m shim. This shim has a PIO
slave port on one side, and SPI 'wires' on the other. It accepts MSIs
from the system and triggers SPIs on the GIC. It is configurable with
a number of frames, each of which has a number of SPIs and a base SPI
offset.

A Linux driver for GICv2m is available upstream.
/gem5/src/dev/net/
H A Detherbus.cc11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
H A Dethertap.hh11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
H A Di8254xGBe.hh11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
H A Dsinic.cc11263:8dcc6b40f164 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move network devices to src/dev/net/
/gem5/src/dev/pci/
H A DPciHost.py11244:a2af58a06c4e Fri Dec 04 19:11:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Rewrite PCI host functionality

The gem5's current PCI host functionality is very ad hoc. The current
implementations require PCI devices to be hooked up to the
configuration space via a separate configuration port. Devices query
the platform to get their config-space address range. Un-mapped parts
of the config space are intercepted using the XBar's default port
mechanism and a magic catch-all device (PciConfigAll).

This changeset redesigns the PCI host functionality to improve code
reuse and make config-space and interrupt mapping more
transparent. Existing platform code has been updated to use the new
PCI host and configured to stay backwards compatible (i.e., no
guest-side visible changes). The current implementation does not
expose any new functionality, but it can easily be extended with
features such as automatic interrupt mapping.

PCI devices now register themselves with a PCI host controller. The
host controller interface is defined in the abstract base class
PciHost. Registration is done by PciHost::registerDevice() which takes
the device, its bus position (bus/dev/func tuple), and its interrupt
pin (INTA-INTC) as a parameter. The registration interface returns a
PciHost::DeviceInterface that the PCI device can use to query memory
mappings and signal interrupts.

The host device manages the entire PCI configuration space. Accesses
to devices decoded into the devices bus position and then forwarded to
the correct device.

Basic PCI host functionality is implemented in the GenericPciHost base
class. Most platforms can use this class as a basic PCI controller. It
provides the following functionality:

* Configurable configuration space decoding. The number of bits
dedicated to a device is a prameter, making it possible to support
both CAM, ECAM, and legacy mappings.

* Basic interrupt mapping using the interruptLine value from a
device's configuration space. This behavior is the same as in the
old implementation. More advanced controllers can override the
interrupt mapping method to dynamically assign host interrupts to
PCI devices.

* Simple (base + addr) remapping from the PCI bus's address space to
physical addresses for PIO, memory, and DMA.
H A Dcopy_engine.cc11261:2050602b55f7 Thu Dec 10 05:35:00 EST 2015 Andreas Sandberg <andreas.sandberg@arm.com> dev: Move the CopyEngine class to src/dev/pci
/gem5/src/python/m5/
H A Dtrace.py11153:20bbfe5b2b86 Wed Sep 30 16:21:00 EDT 2015 Curtis Dunham <Curtis.Dunham@arm.com> base: remove Trace::enabled flag

The DTRACE() macro tests both Trace::enabled and the specific flag. This
change uses the same administrative interface for enabling/disabling
tracing, but masks the SimpleFlags settings directly. This eliminates a
load for every DTRACE() test, e.g. DPRINTF.
/gem5/src/sim/power/
H A DThermalModel.py11420:b48c0ba4f524 Tue May 12 05:26:00 EDT 2015 David Guillen Fandos <david.guillen@arm.com> sim: Adding thermal model support

This patch adds basic thermal support to gem5. It models energy dissipation
through a circuital equivalent, which allows us to use RC networks.
This lays down the basic infrastructure to do so, but it does not "work" due
to the lack of power models. For now some hardcoded number is used as a PoC.
The solver is embedded in the patch.
/gem5/configs/common/
H A DCacheConfig.py11251:a15c86af004a Mon Dec 07 17:42:00 EST 2015 Radhika Jagtap <radhika.jagtap@ARM.com> config: Enable elastic trace capture and replay in se/fs

This patch adds changes to the configuration scripts to support elastic
tracing and replay.

The patch adds a command line option to enable elastic tracing in SE mode
and FS mode. When enabled the Elastic Trace cpu probe is attached to O3CPU
and a few O3 CPU parameters are tuned. The Elastic Trace probe writes out
both instruction fetch and data dependency traces. The patch also enables
configuring the TraceCPU to replay traces using the SE and FS script.

The replay run is designed to resume from checkpoint using atomic cpu to
restore state keeping it consistent with FS run flow. It then switches to
TraceCPU to replay the input traces.
10884:c60acdbdd6ad Fri Jul 03 10:14:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Allow read-only caches and check compliance

This patch adds a parameter to the BaseCache to enable a read-only
cache, for example for the instruction cache, or table-walker cache
(not for x86). A number of checks are put in place in the code to
ensure a read-only cache does not end up with dirty data.

A follow-on patch adds suitable read requests to allow a read-only
cache to explicitly ask for clean data.
10780:46070443051e Wed Apr 08 16:56:00 EDT 2015 Curtis Dunham <Curtis.Dunham@arm.com> config: Support full-system with SST's memory system

This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
10720:67b3e74de9ae Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Move crossbar default latencies to subclasses

This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.

Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
/gem5/src/arch/arm/freebsd/
H A Dprocess.cc10810:683ab55819fd Wed Apr 29 23:35:00 EDT 2015 Ruslan Bukin <br@bsdpad.com> arch, base, dev, kern, sym: FreeBSD support

This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only)

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
/gem5/src/base/
H A Dcp_annotate.cc10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
/gem5/src/cpu/
H A Dthread_context.cc11005:e7f403b6b76f Fri Aug 07 04:59:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> base: Declare a type for context IDs

Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
10935:acd48ddd725f Tue Jul 28 02:58:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> revert 5af8f40d8f2c
10934:5af8f40d8f2c Sun Jul 26 11:21:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> cpu: implements vector registers

This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.
10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
H A Ddummy_checker.cc11150:a8a64cca231b Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> isa,cpu: Add support for FS SMT Interrupts

Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
/gem5/src/mem/
H A Daddr_mapper.hh10713:eddb533708cb Mon Mar 02 04:00:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split port retry for all different packet classes

This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
H A Dserial_link.hh11185:0ff78be3bc67 Tue Nov 03 01:17:00 EST 2015 Erfan Azarkhish <erfan.azarkhish@unibo.it> mem: hmc: serial link model

This changeset adds a serial link model for the Hybrid Memory Cube (HMC).
SerialLink is a simple variation of the Bridge class, with the ability to
account for the latency of packet serialization. Also trySendTiming has been
modified to correctly model bandwidth.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
H A Dpacket.cc11287:0d5bbeaeb8ca Thu Dec 31 09:34:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not rely on the NeedsWritable flag for responses

This patch removes the NeedsWritable flag for all responses, as it is
really only the request that needs a writable response. The response,
on the other hand, should in these cases always provide the line in a
writable state, as indicated by the hasSharers flag not being set.

When we send requests that has NeedsWritable set, the response will
always have the hasSharers flag not set. Additionally, there are cases
where the request did not have NeedsWritable set, and we still get a
writable response with the hasSharers flag not set. This never happens
on snoops, but is used by downstream caches to pass ownership
upstream.

As part of this patch, the affected response types are updated, and
the snoop filter is similarly modified to check only the hasSharers
flag (as it should). A sanity check is also added to the packet class,
asserting that we never look at the NeedsWritable flag for responses.

No regressions are affected.
11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand

This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.

The following name changes are made:

* the packet memInhibit flag is renamed to cacheResponding

* the packet sharedAsserted flag is renamed to hasSharers

* the packet NeedsExclusive attribute is renamed to NeedsWritable

* the packet isSupplyExclusive is renamed responderHadWritable

* the MSHR pendingDirty is renamed to pendingModified

The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
11256:65db40192591 Wed Dec 09 22:56:00 EST 2015 Tony Gutierrez <anthony.gutierrez@amd.com> mem: remove acq/rel cmds from packet and add mem fence req
11199:929fd978ab4e Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add an option to perform clean writebacks from caches

This patch adds the necessary commands and cache functionality to
allow clean writebacks. This functionality is crucial, especially when
having exclusive (victim) caches. For example, if read-only L1
instruction caches are not sending clean writebacks, there will never
be any spills from the L1 to the L2. At the moment the cache model
defaults to not sending clean writebacks, and this should possibly be
re-evaluated.

The implementation of clean writebacks relies on a new packet command
WritebackClean, which acts much like a Writeback (renamed
WritebackDirty), and also much like a CleanEvict. On eviction of a
clean block the cache either sends a clean evict, or a clean
writeback, and if any copies are still cached upstream the clean
evict/writeback is dropped. Similarly, if a clean evict/writeback
reaches a cache where there are outstanding MSHRs for the block, the
packet is dropped. In the typical case though, the clean writeback
allocates a block in the downstream cache, and marks it writable if
the evicted block was writable.

The patch changes the O3_ARM_v7a L1 cache configuration and the
default L1 caches in config/common/Caches.py
11191:9eabb2bf349b Fri Nov 06 03:26:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not treat CleanEvict as a write operation

This patch changes the CleanEvict command type to not be considered a
write. Initially it was made a zero-sized write to match the writeback
command, but as things developed it became clear that it causes more
problems than it solves. For example, the memory modules (and bridge)
should not consider the CleanEvict as a write, but instead discard
it. With this patch it will be neither a read, nor write, and as it
does not need a response the slave will simply sink it.
11003:ba91725c8f6b Fri Aug 07 04:55:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Remove extraneous acquire/release flags and attributes

This patch removes the extraneous flags and attributes from the
request and packet, and simply leaves the new commands. The change
introduced when adding acquire/release breaks all compatibility with
existing traces, and there is really no need for any new flags and
attributes. The commands should be sufficient.

This patch fixes packet tracing (urgent), and also removes the
unnecessary complexity.
10975:eba4e93665fc Mon Jul 20 10:15:00 EDT 2015 David Hashe <david.hashe@amd.com> mem: add request types for acquire and release

Add support for acquire and release requests. These synchronization operations
are commonly supported by several modern instruction sets.
10886:fdd4a895f325 Fri Jul 03 10:14:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Split WriteInvalidateReq into write and invalidate

WriteInvalidateReq ensures that a whole-line write does not incur the
cost of first doing a read exclusive, only to later overwrite the
data. This patch splits the existing WriteInvalidateReq into a
WriteLineReq, which is done locally, and an InvalidateReq that is sent
out throughout the memory system. The WriteLineReq re-uses the normal
WriteResp.

The change allows us to better express the difference between the
cache that is performing the write, and the ones that are merely
invalidating. As a consequence, we no longer have to rely on the
isTopLevel flag. Moreover, the actual memory in the system does not
see the intitial write, only the writeback. We were marking the
written line as dirty already, so there is really no need to also push
the write all the way to the memory.

The overall flow of the write-invalidate operation remains the same,
i.e. the operation is only carried out once the response for the
invalidate comes back. This patch adds the InvalidateResp for this
very reason.
10885:3ac92bf1f31f Fri Jul 03 10:14:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Add ReadCleanReq and ReadSharedReq packets

This patch adds two new read requests packets:

ReadCleanReq - For a cache to explicitly request clean data. The
response is thus exclusive or shared, but not owned or modified. The
read-only caches (see previous patch) use this request type to ensure
they do not get dirty data.

ReadSharedReq - We add this to distinguish cache read requests from
those issued by other masters, such as devices and CPUs. Thus, devices
use ReadReq, and caches use ReadCleanReq, ReadExReq, or
ReadSharedReq. For the latter, the response can be any state, shared,
exclusive, owned or even modified.

Both ReadCleanReq and ReadSharedReq re-use the normal ReadResp. The
two transactions are aligned with the emerging cache-coherent TLM
standard and the AMBA nomenclature.

With this change, the normal ReadReq should never be used by a cache,
and is reserved for the actual (non-caching) masters in the system. We
thus have a way of identifying if a request came from a cache or
not. The introduction of ReadSharedReq thus removes the need for the
current isTopLevel hack, and also allows us to stop relying on
checking the packet size to determine if the source is a cache or
not. This is fixed in follow-on patches.
10883:9294c4a60251 Fri Jul 03 10:14:00 EDT 2015 Ali Jafri <ali.jafri@arm.com> mem: Add clean evicts to improve snoop filter tracking

This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory heirarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.

The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory heirarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.

We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implmentation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.

We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.
/gem5/src/mem/ruby/system/
H A DSequencer.py11266:452e10b868ea Mon Jul 20 10:15:00 EDT 2015 Brad Beckmann <Brad.Beckmann@amd.com> ruby: more flexible ruby tester support

This patch allows the ruby random tester to use ruby ports that may only
support instr or data requests. This patch is similar to a previous changeset
(8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
This current patch implements the support in a more straight-forward way.
Since retries are now tested when running the ruby random tester, this patch
splits up the retry and drain check behavior so that RubyPort children, such
as the GPUCoalescer, can perform those operations correctly without having to
duplicate code. Finally, the patch also includes better DPRINTFs for
debugging the tester.
11019:fc1e41e88fd3 Fri Aug 14 01:19:00 EDT 2015 Joel Hestness <jthestness@gmail.com> ruby: Remove the RubyCache/CacheMemory latency

The RubyCache (CacheMemory) latency parameter is only used for top-level caches
instantiated for Ruby coherence protocols. However, the top-level cache hit
latency is assessed by the Sequencer as accesses flow through to the cache
hierarchy. Further, protocol state machines should be enforcing these cache hit
latencies, but RubyCaches do not expose their latency to any existng state
machines through the SLICC/C++ interface. Thus, the RubyCache latency parameter
is superfluous for all caches. This is confusing for users.

As a step toward pushing L0/L1 cache hit latency into the top-level cache
controllers, move their latencies out of the RubyCache declarations and over to
their Sequencers. Eventually, these Sequencer parameters should be exposed as
parameters to the top-level cache controllers, which should assess the latency.
NOTE: Assessing these latencies in the cache controllers will require modifying
each to eliminate instantaneous Ruby hit callbacks in transitions that finish
accesses, which is likely a large undertaking.
10919:80069a602c83 Fri Jul 10 17:05:00 EDT 2015 Brandon Potter <brandon.potter@amd.com> ruby: replace global g_system_ptr with per-object pointers

This is another step in the process of removing global variables
from Ruby to enable multiple RubySystem instances in a single simulation.

With possibly multiple RubySystem objects, we can no longer use a global
variable to find "the" RubySystem object. Instead, each Ruby component
has to carry a pointer to the RubySystem object to which it belongs.
10706:4206946d60fe Thu Feb 26 10:58:00 EST 2015 Jason Power <power.jg@gmail.com> Ruby: Update backing store option to propagate through to all RubyPorts

Previously, the user would have to manually set access_backing_store=True
on all RubyPorts (Sequencers) in the config files.
Now, instead there is one global option that each RubyPort checks on
initialization.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
/gem5/src/cpu/minor/
H A Dexecute.cc11150:a8a64cca231b Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> isa,cpu: Add support for FS SMT Interrupts

Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
10851:a50657a4f0c1 Tue May 26 03:21:00 EDT 2015 Andrew Bardsley <Andrew.Bardsley@arm.com> cpu: Fix a bug in counting issued instructions in MinorCPU

The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.

Fixed by checking for a bubble in one new place.
10814:46b6043bd32c Tue May 05 03:22:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> cpu: Work around gcc 4.9 issues with Num_OpClasses

This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
10774:68d688cbe26c Fri Apr 03 12:42:00 EDT 2015 Nikos Nikoleris <nikos.nikoleris@gmail.com> cpu: fix system total instructions accounting

The totalInstructions counter is only incremented when the whole instruction is
commited and not on every microop. It was incorrectly reset in atomic and
timing cpus.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
/gem5/src/arch/arm/kvm/
H A DKvmGic.py10859:0ba6f47025d1 Mon Jun 01 14:44:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> kvm, arm, dev: Add an in-kernel GIC implementation

This changeset adds a GIC implementation that uses the kernel's
built-in support for simulating the interrupt controller. Since there
is currently no support for state transfer between gem5 and the
kernel, the device model does not support serialization and CPU
switching (which would require switching to a gem5-simulated GIC).
/gem5/src/dev/x86/
H A Di82094aa.hh11175:2324ed5fa9f4 Fri Oct 23 09:51:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> x86: Add missing explicit overrides for X86 devices

Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
11168:f98eb2da15a4 Mon Oct 12 04:07:00 EDT 2015 Andreas Hansson <andreas.hansson@arm.com> misc: Remove redundant compiler-specific defines

This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
11144:90eeefe7e341 Tue Sep 29 10:28:00 EDT 2015 Joel Hestness <jthestness@gmail.com> arch, x86: Delete packet in IntDevice::recvResponse

IntDevice::recvResponse is called from two places in current mainline: (1) the
short circuit path of X86ISA::IntDevice::IntMasterPort::sendMessage for atomic
mode, and (2) the full request->response path to and from the x86 interrupts
device (finally called from MessageMasterPort::recvTimingResp). In the former
case, the packet was deleted correctly, but in the latter case, the packet and
request leak. To fix the leak, move request and packet deletion into IntDevice
inherited class implementations of recvResponse.
10905:a6ca6831e775 Tue Jul 07 04:51:00 EDT 2015 Andreas Sandberg <andreas.sandberg@arm.com> sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
/gem5/src/cpu/o3/
H A Dchecker.cc11150:a8a64cca231b Wed Sep 30 12:14:00 EDT 2015 Mitch Hayenga <mitch.hayenga@arm.com> isa,cpu: Add support for FS SMT Interrupts

Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.

Completed in 140 milliseconds

<<41424344454647484950>>