Searched hist:2014 (Results 1226 - 1250 of 1681) sorted by relevance

<<41424344454647484950>>

/gem5/src/cpu/testers/memtest/
H A Dmemtest.cc11422:4f749e00b667 Tue Nov 18 09:00:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Add power states to ClockedObject

Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.

Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.

Add dump call back function to allow stats update of distribution and residency
stats.
10566:c99c8d2a7c31 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Assume all dynamic packet data is array allocated

This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.

The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.

As the last part the patch, it renames dataDynamicArray to dataDynamic.
10563:755b18321206 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add const getters for write packet data

This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
10376:28c63d075e0c Fri Sep 19 10:35:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Use safe_cast when assumptions are made about return value

This patch changes two dynamic_cast to safe_cast as we assume the
return value is not NULL (without checking).
10348:c91b23c72d5e Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> base: Use the global Mersenne twister throughout

This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.

As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.

Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calles random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global ranom generator is not thread safe at this point.

Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
/gem5/src/mem/cache/prefetch/
H A DPrefetcher.py10623:b9646f4546ad Tue Dec 23 09:31:00 EST 2014 Mitch Hayenga <mitch.hayenga@arm.com> mem: Rework the structuring of the prefetchers

Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
10466:73b7549d979e Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Dynamically determine page bytes in memory components

This patch takes a step towards an ISA-agnostic memory
system by enabling the components to establish the page size after
instantiation. The swap operation in the memory is now also allowing
any granularity to avoid depending on the IntReg of the ISA.
10382:452a5f178ec5 Sat Sep 20 17:17:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> mem: Remove the GHB prefetcher from the source tree

There are two primary issues with this code which make it deserving of deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher
2) This prefetcher isn't even structured like a GHB prefetcher.
It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is already
present in the stride prefetcher.
10053:b0b69dbafc08 Thu Jan 30 00:21:00 EST 2014 Mitch Hayenga <mitch.hayenga+gem5@gmail.com> mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches. However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train). In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher. This patch enables this option.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10052:5bb8e054456b Thu Jan 30 00:21:00 EST 2014 Mitch Hayenga <mitch.hayenga+gem5@gmail.com>, Amin Farmahini <aminfar@gmail.com> mem: prefetcher: add options, support for unaligned addresses

This patch extends the classic prefetcher to work on non-block aligned
addresses. Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected. For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.

It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.

Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).
/gem5/src/arch/arm/
H A Dtable_walker.hh10621:b7bc5b1084a4 Tue Dec 23 09:31:00 EST 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: Add stats to table walker

This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
(p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
10537:47fe87b0cf97 Fri Nov 14 03:53:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10324:f40134eb3f85 Tue May 27 12:00:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: support 16kb vm granules
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
H A Dutility.cc10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10338:8bee5f4edb92 Tue Apr 29 17:05:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: use condition code registers for ARM ISA

Analogous to ee049bf (for x86). Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
10318:98771a936b61 Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
10190:fb83d025d1c3 Fri May 09 18:58:00 EDT 2014 Akash Bagdia <akash.bagdia@arm.com> cpu, arm: Allow the specification of a socket field

Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems. This allows studying multi-socket / cluster
systems with ARM CPUs.
10103:af1ec649e251 Fri Mar 07 15:56:00 EST 2014 Stephan Diestelhorst <stephan.diestelhorst@arm.com> arm: Fix uninitialised warning with gcc 4.8

Small fix for a warning that prevents compilation with gcc 4.8.1 due
to detecting that a variable might be uninitialised. The fix is to
assign a safe default.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
/gem5/src/mem/ruby/slicc_interface/
H A DAbstractController.cc11422:4f749e00b667 Tue Nov 18 09:00:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Add power states to ClockedObject

Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.

Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.

Add dump call back function to allow stats update of distribution and residency
stats.
10524:fff17530cef6 Thu Nov 06 06:42:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: interface with classic memory controller
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
10311:ad9c042dce54 Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: message buffers: significant changes

This patch is the final patch in a series of patches. The aim of the series
is to make ruby more configurable than it was. More specifically, the
connections between controllers are not at all possible (unless one is ready
to make significant changes to the coherence protocol). Moreover the buffers
themselves are magically connected to the network inside the slicc code.
These connections are not part of the configuration file.

This patch makes changes so that these connections will now be made in the
python configuration files associated with the protocols. This requires
each state machine to expose the message buffers it uses for input and output.
So, the patch makes these buffers configurable members of the machines.

The patch drops the slicc code that usd to connect these buffers to the
network. Now these buffers are exposed to the python configuration system
as Master and Slave ports. In the configuration files, any master port
can be connected any slave port. The file pyobject.cc has been modified to
take care of allocating the actual message buffer. This is inline with how
other port connections work.
10012:ec5a5bfb941d Fri Jan 10 17:19:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: move all statistics to stats.txt, eliminate ruby.stats
10005:8c2b0dc16ccd Sat Jan 04 01:03:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: add support for clusters

A cluster over here means a set of controllers that can be accessed only by a
certain set of cores. For example, consider a two level hierarchy. Assume
there are 4 L1 controllers (private) and 2 L2 controllers. We can have two
different hierarchies here:

a. the address space is partitioned between the two L2 controllers. Each L1
controller accesses both the L2 controllers. In this case, each L1 controller
is a cluster initself.

b. both the L2 controllers can cache any address. An L1 controller has access
to only one of the L2 controllers. In this case, each L2 controller
along with the L1 controllers that access it, form a cluster.

This patch allows for each controller to have a cluster ID, which is 0 by
default. By setting the cluster ID properly, one can instantiate hierarchies
with clusters. Note that the coherence protocol might have to be changed as
well.
/gem5/src/mem/
H A Dpacket.hh10583:d1e1e8588881 Tue Dec 02 06:08:00 EST 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Support WriteInvalidate (again)

This patch takes a clean-slate approach to providing WriteInvalidate
(write streaming, full cache line writes without first reading)
support.

Unlike the prior attempt, which took an aggressive approach of directly
writing into the cache before handling the coherence actions, this
approach follows the existing cache flows as closely as possible.
10572:fc4c90a7d2f5 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Relax packet src/dest check and shift onus to crossbar

This patch allows objects to get the src/dest of a packet even if it
is not set to a valid port id. This simplifies (ab)using the bridge as
a buffer and latency adapter in situations where the neighbouring
MemObjects are not crossbars.

The checks that were done in the packet are now shifted to the
crossbar where the fields are used to index into the port
arrays. Thus, the carrier of the information is not burdened with
checking, and the crossbar can check not only that the destination is
set, but also that the port index is within limits.
10571:c848de089432 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Clean up packet data allocation

This patch attempts to make the rules for data allocation in the
packet explicit, understandable, and easy to verify. The constructor
that copies a packet is extended with an additional flag "alloc_data"
to enable the call site to explicitly say whether the newly created
packet is short-lived (a zero-time snoop), or has an unknown life-time
and therefore should allocate its own data (or copy a static pointer
in the case of static data).

The tricky case is the static data. In essence this is a
copy-avoidance scheme where the original source of the request (DMA,
CPU etc) does not ask the memory system to return data as part of the
packet, but instead provides a pointer, and then the memory system
carries this pointer around, and copies the appropriate data to the
location itself. Thus any derived packet actually never copies any
data. As the original source does not copy any data from the response
packet when arriving back at the source, we must maintain the copy of
the original pointer to not break the system. We might want to revisit
this one day and pay the price for a few extra memcpy invocations.

All in all this patch should make it easier to grok what is going on
in the memory system and how data is actually copied (or not).
10570:dcb908e40547 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Cleanup Packet::checkFunctional and hasData usage

This patch cleans up the use of hasData and checkFunctional in the
packet. The hasData function is unfortunately suggesting that it
checks if the packet has a valid data pointer, when it does in fact
only check if the specific packet type is specified to have a data
payload. The confusion led to a bug in checkFunctional. The latter
function is also tidied up to avoid name overloading.
10569:ffd46545b284 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Make the requests carried by packets const

This adds a basic level of sanity checking to the packet by ensuring
that a request is not modified once the packet is created. The only
issue that had to be worked around is the relaying of
software-prefetches in the cache. The specific situation is now solved
by first copying the request, and then creating a new packet
accordingly.
10567:926802ed1536 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add checks and explanation for assertMemInhibit usage
10566:c99c8d2a7c31 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Assume all dynamic packet data is array allocated

This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.

The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.

As the last part the patch, it renames dataDynamicArray to dataDynamic.
10565:23593fdaadcd Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Remove redundant Packet::allocate calls

This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.

The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
10564:a8c16e2d466a Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Use const pointers for port proxy write functions

This patch changes the various write functions in the port proxies
to use const pointers for all sources (similar to how memcpy works).

The one unfortunate aspect is the need for a const_cast in the packet,
to avoid having to juggle a const and a non-const data pointer. This
design decision can always be re-evaluated at a later stage.
10563:755b18321206 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add const getters for write packet data

This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
/gem5/src/arch/x86/
H A DSConscript10553:c1ad57c53a36 Sun Nov 23 21:01:00 EST 2014 Alexandru Dutu <alexandru.dutu@amd.com> kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall
instructions and page faults. These types of exits will be encountered if
KvmCPU is run in SE mode.
10494:ffe6ab7141ab Mon Oct 20 18:03:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> x86: Fixes to avoid LTO warnings

This patch fixes a few minor issues that caused link-time warnings
when using LTO, mainly for x86. The most important change is how the
syscall array is created. Previously gcc and clang would complain that
the declaration and definition types did not match. The organisation
is now changed to match how it is done for ARM, moving the code that
was previously in syscalls.cc into process.cc, and having a class
variable pointing to the static array.

With these changes, there are no longer any warnings using gcc 4.6.3
with LTO.
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
/gem5/src/arch/sparc/isa/formats/
H A Dpriv.isa10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
10184:bbfa3152bdea Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: remove inline specifiers on all inst constrs, all ISAs

With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
/gem5/src/arch/mips/
H A Dtlb.cc11422:4f749e00b667 Tue Nov 18 09:00:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Add power states to ClockedObject

Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.

Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.

Add dump call back function to allow stats update of distribution and residency
stats.
10280:5b67e1bdf6ad Wed Aug 13 06:57:00 EDT 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> mips: Remove unused private members to fix compile-time warning

Certain versions of clang complain about unused private members if
they are not used. This changeset removes such members from the
MIPS-specific classes to silence the warning.
10231:cb2e6950956d Sat May 31 21:00:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> style: eliminate equality tests with true and false

Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.

It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.

Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
/gem5/src/mem/cache/tags/
H A Dbase_set_assoc.cc10373:342348537a53 Fri Sep 19 10:35:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Remove assertions ensuring unsigned values >= 0
10360:919c02740209 Tue Sep 09 04:36:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Fix a number of unitialised variables and members

Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
10263:c00b5ba43967 Mon Jul 28 00:23:00 EDT 2014 Anthony Gutierrez <atgutier@umich.edu> mem: refactor LRU cache tags and add random replacement tags

this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first, if none are available a
replacement candidate is chosen at random.

this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:

accessBlock()
findVictim()
insertBlock()
invalidate()
/gem5/src/arch/arm/insts/
H A Dpred_inst.hh10537:47fe87b0cf97 Fri Nov 14 03:53:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
10418:7a76e13f0101 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixed undefined behaviours identified by gcc

This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
/gem5/src/arch/x86/isa/microops/
H A Dfpop.isa10313:01dda09b93e5 Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> x86: set op class of two fp instructions
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
10184:bbfa3152bdea Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: remove inline specifiers on all inst constrs, all ISAs

With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
H A Dlimmop.isa10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
10184:bbfa3152bdea Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: remove inline specifiers on all inst constrs, all ISAs

With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
/gem5/tests/configs/
H A Dsimple-timing-mp-ruby.py10524:fff17530cef6 Thu Nov 06 06:42:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: interface with classic memory controller
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
10519:7a3ad4b09ce4 Thu Nov 06 06:41:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: single physical memory in fs mode
Both ruby and the system used to maintain memory copies. With the changes
carried for programmed io accesses, only one single memory is required for
fs simulations. This patch sets the copy of memory that used to reside
with the system to null, so that no space is allocated, but address checks
can still be carried out. All the memory accesses now source and sink values
to the memory maintained by ruby.
10120:f5ceb3c3edb6 Thu Mar 20 10:14:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> config: ruby: rename _cpu_ruby_ports to _cpu_ports
/gem5/configs/example/
H A Dfs.py10608:427f988fe6e5 Tue Dec 23 09:31:00 EST 2014 Dam Sunwoo <dam.sunwoo@arm.com> config: Add options to take/resume from SimPoint checkpoints

More documentation at http://gem5.org/Simpoints

Steps to profile, generate, and use SimPoints with gem5:

1. To profile workload and generate SimPoint BBV file, use the
following option:

--simpoint-profile --simpoint-interval <interval length>

Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.

2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)

3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:

--take-simpoint-checkpoint=<simpoint file path>,<weight file
path>,<interval length>,<warmup length>

<simpoint file> and <weight file> is generated by SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.

4. To resume from gem5 SimPoint checkpoints, use the following option:

--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint
checkpoint path>

<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint
#0.
10594:4fdc929c0aaa Thu Dec 04 19:42:00 EST 2014 Gabe Black <gabeblack@google.com> config: Add two options for setting the kernel command line.

Both options accept template which will, through python string formatting,
have "mem", "disk", and "script" values substituted in from the mdesc.
Additional values can be used on a case by case basis by passing them as
keyword arguments to the fillInCmdLine function. That makes it possible to
have specialized parameters for a particular ISA, for instance.

The first option lets you specify the template directly, and the other lets
you specify a file which has the template in it.
10547:b61dc895269a Tue Nov 18 20:17:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> configs: small fix to ruby portion of fs.py and se.py
In fs.py the io port controller was being attached to the iobus multiple
times. This should be done only once. In se.py, the the option use_map
was being set which no longer exists.
10524:fff17530cef6 Thu Nov 06 06:42:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: interface with classic memory controller
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
10519:7a3ad4b09ce4 Thu Nov 06 06:41:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: single physical memory in fs mode
Both ruby and the system used to maintain memory copies. With the changes
carried for programmed io accesses, only one single memory is required for
fs simulations. This patch sets the copy of memory that used to reside
with the system to null, so that no space is allocated, but address checks
can still be carried out. All the memory accesses now source and sink values
to the memory maintained by ruby.
10512:b423e1d0735e Thu Oct 30 00:18:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> arm, tests: Update config files to more recent kernels and create 64-bit regressions.

This changes the default ARM system to a Versatile Express-like system that supports
2GB of memory and PCI devices and updates the default kernels/file-systems for
AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some
platforms that are no longer supported have been pruned from the configuration files.

In addition a set of 64-bit ARM regressions have been added to the regression system.
10120:f5ceb3c3edb6 Thu Mar 20 10:14:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> config: ruby: rename _cpu_ruby_ports to _cpu_ports
10119:6f3f839bb496 Thu Mar 20 10:14:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> config: fs.py: move creating of test/drive systems to functions
The code that creates test and drive systems is being moved to separate
functions so as to make the code more readable. Ultimately the two
functions would be combined so that the replicated code is eliminated.
10118:5e1f04b4d5e4 Thu Mar 20 09:03:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> config: remove ruby_fs.py

The patch removes the ruby_fs.py file. The functionality is being moved to
fs.py. This would being ruby fs simulations in line with how ruby se
simulations are started (using --ruby option). The alpha fs config functions
are being combined for classing and ruby memory systems. This required
renaming the piobus in ruby to iobus. So, we will have stats being renamed
in the stats file for ruby fs regression.
10056:33db5d81c2cb Fri Jan 31 16:35:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> config: correct bug in x86 drive sys instantiation
/gem5/src/mem/ruby/profiler/
H A DProfiler.hh10301:44839e8febbd Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: move files from ruby/system to ruby/structures

The directory ruby/system is crowded and unorganized. Hence, the files the
hold actual physical structures, are being moved to the directory
ruby/structures. This includes Cache Memory, Directory Memory,
Memory Controller, Wire Buffer, TBE Table, Perfect Cache Memory, Timer Table,
Bank Array.

The directory ruby/systems has the glue code that holds these structures
together.
10094:5be102721895 Sun Mar 02 00:35:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: profiler: statically allocate stats variable
Couple of users observed segmentation fault when the simulator tries to
register the statistical variable m_IncompleteTimes. It seems that there
is some problem with the initialization of these variables when allocated
in the constructor.
10012:ec5a5bfb941d Fri Jan 10 17:19:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: move all statistics to stats.txt, eliminate ruby.stats
/gem5/src/cpu/minor/
H A Dcpu.cc11423:831c7f2f9e39 Tue Dec 09 05:42:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Low-power idle power state for idle CPUs

Add functionality to the BaseCPU that will put the entire CPU into a low-power
idle state whenever all threads in it are idle.
10407:a9023811bf9e Sat Sep 20 17:18:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
10259:ebb376f73dd2 Wed Jul 23 17:09:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> cpu: `Minor' in-order CPU model

This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036
/gem5/src/mem/ruby/network/
H A DMessageBuffer.cc10524:fff17530cef6 Thu Nov 06 06:42:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: interface with classic memory controller
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
10348:c91b23c72d5e Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> base: Use the global Mersenne twister throughout

This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.

As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.

Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calles random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global ranom generator is not thread safe at this point.

Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
10301:44839e8febbd Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: move files from ruby/system to ruby/structures

The directory ruby/system is crowded and unorganized. Hence, the files the
hold actual physical structures, are being moved to the directory
ruby/structures. This includes Cache Memory, Directory Memory,
Memory Controller, Wire Buffer, TBE Table, Perfect Cache Memory, Timer Table,
Bank Array.

The directory ruby/systems has the glue code that holds these structures
together.
/gem5/src/base/
H A Dstatistics.cc10491:452c860fd0ee Mon Oct 20 18:03:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> base: Fix for stats node on gcc < 4.6.3

This patch adds an explicit function to get the underlying node as gcc
4.6.1 and 4.6.2 have issues otherwise.
10453:d0365cc3d05f Thu Oct 16 05:49:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> config: Add a --without-python option to build process

Add the ability to build libgem5 without embedded Python or the
ability to configure with Python.

This is a prelude to a patch to allow config.ini files to be loaded
into libgem5 using only C++ which would make embedding gem5 within
other simulation systems easier.

This adds a few registration interfaces to things which cross
between Python and C++. Namely: stats dumping and SimObject resolving
10011:69bd1011dcf3 Fri Jan 10 17:19:00 EST 2014 Nilay Vaish <nilay@cs.wisc.edu> stats: add function for adding two histograms
This patch adds a function to the HistStor class for adding two histograms.
This functionality is required for Ruby. It also adds support for printing
histograms in a single line.
/gem5/src/mem/cache/
H A Dmshr.cc10582:c04dc66e4316 Tue Dec 02 06:08:00 EST 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Remove WriteInvalidate support

Prepare for a different implementation following in the next patch
10571:c848de089432 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Clean up packet data allocation

This patch attempts to make the rules for data allocation in the
packet explicit, understandable, and easy to verify. The constructor
that copies a packet is extended with an additional flag "alloc_data"
to enable the call site to explicitly say whether the newly created
packet is short-lived (a zero-time snoop), or has an unknown life-time
and therefore should allocate its own data (or copy a static pointer
in the case of static data).

The tricky case is the static data. In essence this is a
copy-avoidance scheme where the original source of the request (DMA,
CPU etc) does not ask the memory system to return data as part of the
packet, but instead provides a pointer, and then the memory system
carries this pointer around, and copies the appropriate data to the
location itself. Thus any derived packet actually never copies any
data. As the original source does not copy any data from the response
packet when arriving back at the source, we must maintain the copy of
the original pointer to not break the system. We might want to revisit
this one day and pay the price for a few extra memcpy invocations.

All in all this patch should make it easier to grok what is going on
in the memory system and how data is actually copied (or not).
10503:94d58056729f Tue Oct 21 18:04:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: don't inhibit WriteInv's or defer snoops on their MSHRs

WriteInvalidate semantics depend on the unconditional writeback
or they won't complete. Also, there's no point in deferring snoops
on their MSHRs, as they don't get new data at the end of their life
cycle the way other transactions do.

Add comment in the cache about a minor inefficiency re: WriteInvalidate.
10502:f2f1dbfd505e Thu Oct 30 00:18:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: have WriteInvalidate obsolete MSHRs

Since WriteInvalidate directly writes into the cache, it can
create tricky timing interleavings with reads and writes to the
same cache line that haven't yet completed. This patch ensures
that these requests, when completed, don't overwrite the newer
data from the WriteInvalidate.
10424:a910aeb89098 Thu Oct 09 17:51:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add packet sanity checks to cache and MSHRs

This patch adds a number of asserts to the cache, checking basic
assumptions about packets being requests or responses.
10028:fb8c44de891a Fri Jan 24 16:29:00 EST 2014 Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> mem: Add support for a security bit in the memory system

This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
/gem5/src/sim/
H A DSConscript11422:4f749e00b667 Tue Nov 18 09:00:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Add power states to ClockedObject

Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.

Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.

Add dump call back function to allow stats update of distribution and residency
stats.
10458:64809024b924 Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> config: Add the ability to read a config file using C++ and Python

This patch adds the ability to load in config.ini files generated from
gem5 into another instance of gem5 built without Python configuration
support. The intended use case is for configuring gem5 when it is a
library embedded in another simulation system.

A parallel config file reader is also provided purely in Python to
demonstrate the approach taken and to provided similar functionality
for as-yet-unknown use models. The Python configuration file reader
can read both .ini and .json files.

C++ configuration file reading:

A command line option has been added for scons to enable C++ configuration
file reading: --with-cxx-config

There is an example in util/cxx_config that shows C++ configuration in action.
util/cxx_config/README explains how to build the example.

Configuration is achieved by the object CxxConfigManager. It handles
reading object descriptions from a CxxConfigFileBase object which
wraps a config file reader. The wrapper class CxxIniFile is provided
which wraps an IniFile for reading .ini files. Reading .json files
from C++ would be possible with a similar wrapper and a JSON parser.

After reading object descriptions, CxxConfigManager creates
SimObjectParam-derived objects from the classes in the (generated with this
patch) directory build/ARCH/cxx_config

CxxConfigManager can then build SimObjects from those SimObjectParams (in an
order dictated by the SimObject-value parameters on other objects) and bind
ports of the produced SimObjects.

A minimal set of instantiate-replacing member functions are provided by
CxxConfigManager and few of the member functions of SimObject (such as drain)
are extended onto CxxConfigManager.

Python configuration file reading (configs/example/read_config.py):

A Python version of the reader is also supplied with a similar interface to
CxxConfigFileBase (In Python: ConfigFile) to config file readers.

The Python config file reading will handle both .ini and .json files.

The object construction strategy is slightly different in Python from the C++
reader as you need to avoid objects prematurely becoming the children of other
objects when setting parameters.

Port binding also needs to be strictly in the same port-index order as the
original instantiation.
10453:d0365cc3d05f Thu Oct 16 05:49:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> config: Add a --without-python option to build process

Add the ability to build libgem5 without embedded Python or the
ability to configure with Python.

This is a prelude to a patch to allow config.ini files to be loaded
into libgem5 using only C++ which would make embedding gem5 within
other simulation systems easier.

This adds a few registration interfaces to things which cross
between Python and C++. Namely: stats dumping and SimObject resolving
10268:9dac4c781ad6 Sun Aug 10 05:39:00 EDT 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> config: Add SubSystem container for simobjects

This patch adds the SubSystem container for grouping
simobjects together in logical subsystems to facilitate
building a larger system from constituent parts. The container
is simply a non-abstract empty simobject to hold the components
that will be connected as its children. In simulation the
object does not participate, its only use is during configuration
of the system.
10259:ebb376f73dd2 Wed Jul 23 17:09:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> cpu: `Minor' in-order CPU model

This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).

The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).

Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.

Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.

Minor is faster than the o3 model. Sample results:

Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036
10249:6bbb7ae309ac Mon Jun 30 13:56:00 EDT 2014 Stephan Diestelhorst <stephan.diestelhorst@arm.com> power: Add basic DVFS support for gem5

Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
/gem5/src/arch/power/
H A Dutility.hh10417:710ee116eb68 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use const StaticInstPtr references where possible

This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
10407:a9023811bf9e Sat Sep 20 17:18:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
/gem5/src/mem/ruby/network/simple/
H A DThrottle.hh10370:4466307b8a2a Mon Sep 15 17:19:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: network: revert some of the changes from ad9c042dce54
The changeset ad9c042dce54 made changes to the structures under the network
directory to use a map of buffers instead of vector of buffers.
The reasoning was that not all vnets that are created are used and we
needlessly allocate more buffers than required and then iterate over them
while processing network messages. But the move to map resulted in a slow
down which was pointed out by Andreas Hansson. This patch moves things
back to using vector of message buffers.
10311:ad9c042dce54 Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: message buffers: significant changes

This patch is the final patch in a series of patches. The aim of the series
is to make ruby more configurable than it was. More specifically, the
connections between controllers are not at all possible (unless one is ready
to make significant changes to the coherence protocol). Moreover the buffers
themselves are magically connected to the network inside the slicc code.
These connections are not part of the configuration file.

This patch makes changes so that these connections will now be made in the
python configuration files associated with the protocols. This requires
each state machine to expose the message buffers it uses for input and output.
So, the patch makes these buffers configurable members of the machines.

The patch drops the slicc code that usd to connect these buffers to the
network. Now these buffers are exposed to the python configuration system
as Master and Slave ports. In the configuration files, any master port
can be connected any slave port. The file pyobject.cc has been modified to
take care of allocating the actual message buffer. This is inline with how
other port connections work.
/gem5/src/arch/alpha/
H A Dfaults.hh10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10417:710ee116eb68 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use const StaticInstPtr references where possible

This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
/gem5/src/arch/x86/isa/formats/
H A Dunknown.isa10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.

Completed in 285 milliseconds

<<41424344454647484950>>