Searched hist:2014 (Results 1376 - 1400 of 1681) sorted by relevance

<<51525354555657585960>>

/gem5/src/mem/slicc/symbols/
H A DType.py10472:399f35ed5cca Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Use shared_ptr for Ruby Message classes

This patch transitions the Ruby Message and its derived classes from
the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no
changes in behaviour, and the code modifications are mainly replacing
"new" with "make_shared".

The cloning of derived messages is slightly changed as they previously
relied on overriding the base-class through covariant return types.
10307:6df951dcd7d9 Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: slicc: improve the grammar
This patch changes the grammar for SLICC so as to remove some of the
redundant / duplicate rules. In particular rules for object/variable
declaration and class member declaration have been unified. Similarly, the
rules for a general function and a class method have been unified.

One more change is in the priority of two rules. The first rule is on
declaring a function with all the params typed and named. The second rule is
on declaring a function with all the params only typed. Earlier the second
rule had a higher priority. Now the first rule has a higher priority.
10301:44839e8febbd Mon Sep 01 17:55:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: move files from ruby/system to ruby/structures

The directory ruby/system is crowded and unorganized. Hence, the files the
hold actual physical structures, are being moved to the directory
ruby/structures. This includes Cache Memory, Directory Memory,
Memory Controller, Wire Buffer, TBE Table, Perfect Cache Memory, Timer Table,
Bank Array.

The directory ruby/systems has the glue code that holds these structures
together.
10228:1a85c4fc805c Fri May 23 07:07:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> ruby: slicc: remove unused ids DNUCA*
/gem5/src/arch/x86/isa/insts/general_purpose/data_transfer/
H A Dmove.py10544:049273bc03f6 Mon Nov 17 04:00:00 EST 2014 Gabe Black <gabeblack@google.com> x86: Fix setting segment bases in real mode.

The data size used for actually writing the base value for the segment was the
default size, but really it should set the entire value without any possible
truncation.
/gem5/src/mem/
H A Dpacket.cc10570:dcb908e40547 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Cleanup Packet::checkFunctional and hasData usage

This patch cleans up the use of hasData and checkFunctional in the
packet. The hasData function is unfortunately suggesting that it
checks if the packet has a valid data pointer, when it does in fact
only check if the specific packet type is specified to have a data
payload. The confusion led to a bug in checkFunctional. The latter
function is also tidied up to avoid name overloading.
10565:23593fdaadcd Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Remove redundant Packet::allocate calls

This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.

The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
10563:755b18321206 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add const getters for write packet data

This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
10376:28c63d075e0c Fri Sep 19 10:35:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Use safe_cast when assumptions are made about return value

This patch changes two dynamic_cast to safe_cast as we assume the
return value is not NULL (without checking).
10345:b5bef3c8e070 Fri Jun 27 01:29:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: write streaming support via WriteInvalidate promotion

Support full-block writes directly rather than requiring RMW:
* a cache line is allocated in the cache upon receipt of a
WriteInvalidateReq, not the WriteInvalidateResp.
* only top-level caches allocate the line; the others just pass
the request along and invalidate as necessary.
* to close a timing window between the *Req and the *Resp, a new
metadata bit tracks whether another cache has read a copy of
the new line before the writeback to memory.
10325:7aacec2a247d Wed Sep 03 07:42:00 EDT 2014 Geoffrey Blake <geoffrey.blake@arm.com> cache: Fix handling of LL/SC requests under contention

If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data. This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus. If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends and SCUpgrade
they will race to see who will win the SC access. In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction. If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock. This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
10028:fb8c44de891a Fri Jan 24 16:29:00 EST 2014 Giacomo Gabrielli <Giacomo.Gabrielli@arm.com> mem: Add support for a security bit in the memory system

This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
/gem5/src/sim/
H A Dsystem.cc10553:c1ad57c53a36 Sun Nov 23 21:01:00 EST 2014 Alexandru Dutu <alexandru.dutu@amd.com> kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall
instructions and page faults. These types of exits will be encountered if
KvmCPU is run in SE mode.
10494:ffe6ab7141ab Mon Oct 20 18:03:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> x86: Fixes to avoid LTO warnings

This patch fixes a few minor issues that caused link-time warnings
when using LTO, mainly for x86. The most important change is how the
syscall array is created. Previously gcc and clang would complain that
the declaration and definition types did not match. The organisation
is now changed to match how it is done for ARM, moving the code that
was previously in syscalls.cc into process.cc, and having a class
variable pointing to the static array.

With these changes, there are no longer any warnings using gcc 4.6.3
with LTO.
10466:73b7549d979e Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Dynamically determine page bytes in memory components

This patch takes a step towards an ISA-agnostic memory
system by enabling the components to establish the page size after
instantiation. The swap operation in the memory is now also allowing
any granularity to avoid depending on the IntReg of the ISA.
10375:b1bc989611da Fri Sep 19 10:35:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Restore ostream flags where needed

This patch ensures we adhere to the normal ostream usage rules, and
restore the flags after modifying them.
10367:bf52480abd01 Fri Sep 12 10:22:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> style: Fix line continuation, especially in debug messages

This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
10360:919c02740209 Tue Sep 09 04:36:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Fix a number of unitialised variables and members

Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
10318:98771a936b61 Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
10282:3ea92bc6393b Wed Aug 13 06:57:00 EDT 2014 Dam Sunwoo <dam.sunwoo@arm.com> sim: remove kernel mapping check for baremetal workloads

Baremetal workloads are specified using the "kernel" parameter, but
don't always have the correct address mappings. This patch adds a
boolean flag to the system and bypasses the kernel addr mapping checks
when running in baremetal mode.
10255:b71e6c577768 Sat Jul 19 01:05:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> sim: remove unused MemoryModeStrings array

The System object has a static MemoryModeStrings array
that's (1) unused and (2) redundant, since there's an
auto-generated version in the Enums namespace. No
point in leaving it in.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
H A Dsyscall_emul.cc10559:62f5f7363197 Mon Nov 24 09:03:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Another round of static analysis fixups

Mostly addressing uninitialised members.
10500:5e0a421e2031 Tue Sep 02 17:07:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> syscall_emul: add retry flag to SyscallReturn

This hook allows blocking emulated system calls to indicate
that they would block, but return control to the simulator
so that the simulation does not hang. The actual retry
functionality requires additional support, to be provided
in a future changeset.
10495:75d2f19fecce Wed Oct 22 16:59:00 EDT 2014 Nilay Vaish <nilay@cs.wisc.edu> sim: revert 6709bbcf564d
The identifier SYS_getdents is not available on Mac OS X. Therefore, its use
results in compilation failure. It seems there is no straight forward way to
implement the system call getdents using readdir() or similar C functions.
Hence the commit 6709bbcf564d is being rolled back.
10484:6709bbcf564d Mon Oct 20 17:44:00 EDT 2014 Michael Adler <Michael.Adler@intel.com> sim: implement getdents/getdents64 in user mode

Has been tested only for alpha.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10483:8e18835c0dea Mon Oct 20 17:43:00 EDT 2014 Severin Wischmann <wiseveri@student.ethz.ch>, Ioannis Ilkos <ioannis.ilkos09@imperial.ac.uk> x86: syscall: implementation of exit_group
On exit_group syscall, we used to exit the simulator. But now we will only
halt the execution of threads that belong to the group.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10318:98771a936b61 Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
10257:563696c791d2 Sat Jul 19 05:06:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> syscall emulation: fix fast build issue

Surprisingly gcc will complain about unused variables even
inside an 'if (false)' block.

I thought I had tested this previously, but apparently not.
10253:00086092d9f7 Sat Jul 19 01:05:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> syscall emulation: fix DPRINTF arg ordering bug

When we switched getSyscallArg() from explicit arg indices to
the implicit method, some DPRINTF arguments were left as calls
to getSyscallArg(), even though C/C++ doesn't guarantee
anything about the order of invocation of these calls. As a
result, the args could be printed out in arbitrary orders.

Interestingly, this bug has been around since 2009:
http://repo.gem5.org/gem5/rev/4842482e1bd1
10223:34f48d0dac97 Mon May 12 17:23:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> syscall emulation: clean up & comment SyscallReturn
10203:3b9e1fa3da47 Thu Apr 17 17:55:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> sim, arm: implement more of the at variety syscalls

Needed for new AArch64 binaries
H A Deventq.hh10476:f058e09b7d69 Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> sim: EventQueue wakeup on events scheduled outside the event loop

This patch adds a 'wakeup' member function to EventQueue which should be
called on an event queue whenever an event is scheduled on the event queue
from outside code within the call tree of the gem5 event loop.

This clearly isn't necessary for normal gem5 EventQueue operation but
becomes the minimum necessary interface to allow hosting gem5's event loop
onto other schedulers where there may be calls into gem5 from external
code which schedules events onto an EventQueue between the current time and
the time of the next scheduled event.

The use case I have in mind is a SystemC hosting where the event loop is:

while (more events) {
wait(time_to_next_event or wakeup)
setCurTick
service events at this time
}

where the 'wait' needs to be woken up if time_to_next_event becomes shorter
due to a scheduled event from SystemC arriving in a gem5 object.

Requiring 'wakeup' to be called is a more efficient interface than
requiring all gem5 event scheduling actions to affect the host scheduler.

This interface could be located elsewhere, say on another global object,
or by being passed by the host scheduler to objects which will schedule
such events, but it seems cleanest to put it on EventQueue as it is
actually a signal to the queue.

EventQueue::wakeup is called for async_event events on event queue 0 as
it's only important that *some* queue be triggered for such events.
10412:6400a2ab4e22 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Fix a bunch of minor issues identified by static analysis

Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
10360:919c02740209 Tue Sep 09 04:36:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Fix a number of unitialised variables and members

Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
10249:6bbb7ae309ac Mon Jun 30 13:56:00 EDT 2014 Stephan Diestelhorst <stephan.diestelhorst@arm.com> power: Add basic DVFS support for gem5

Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
10153:936a3a8006f6 Thu Apr 03 05:22:00 EDT 2014 Andreas Sandberg <andreas@sandberg.pp.se> sim: Add the ability to lock and migrate between event queues

We need the ability to lock event queues to enable device accesses
across threads. The serviceOne() method now takes a service lock prior
to handling a new event. By locking an event queue, a different
thread/eq can effectively execute in the context of the locked event
queue. To simplify temporary event queue migrations, this changeset
introduces the EventQueue::ScopedMigration class that unlocks the
current event queue, locks a new event queue, and updates the current
event queue variable.

In order to prevent deadlocks, event queues need to be released when
waiting on barriers. This is implemented using the
EventQueue::ScopedRelease class. An instance of this class is, for
example, used in the BaseGlobalEvent class to release the event queue
when waiting on the synchronization barrier.

The intended use for this functionality is when devices need to be
accessed across thread boundaries. For example, when fast-forwarding,
it might be useful to run devices and CPUs in separate threads. In
such a case, the CPU locks the device queue whenever it needs to
perform IO. This functionality is primarily intended for KVM.

Note: Migrating between event queues can lead to non-deterministic
timing. Use with extreme care!
/gem5/src/cpu/o3/
H A Diew_impl.hh10510:7e54a9a9f6b2 Thu Oct 30 00:18:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Add drain check functionality to IEW

IEW did not check the instQueue and memDepUnit to ensure
they were drained. This caused issues when drainSanityCheck()
did check those structures after asserting IEW was drained.
10333:6be8945d226b Wed Sep 03 07:42:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width). This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
10328:867b536a68be Wed Sep 03 07:42:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Fix o3 front-end pipeline interlock behavior

The o3 pipeline interlock/stall logic is incorrect. o3 unnecessicarily stalled
fetch and decode due to later stages in the pipeline. In general, a stage
should usually only consider if it is stalled by the adjacent, downstream stage.
Forcing stalls due to later stages creates and results in bubbles in the
pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename)
on a branch mispredict while the ROB is being serially walked to update the
RAT (robSquashing). Only should have stalled at rename.
10327:5b6279635c49 Wed Sep 03 07:42:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Change writeback modeling for outstanding instructions

As highlighed on the mailing list gem5's writeback modeling can impact
performance. This patch removes the limitation on maximum outstanding issued
instructions, however the number that can writeback in a single cycle is still
respected in instToCommit().
10240:15f822e9410a Sat Jun 21 13:26:00 EDT 2014 Binh Pham <binhpham@cs.rutgers.edu> o3: make dispatch LSQ full check more selective

Dispatch should not check LSQ size/LSQ stall for non load/store
instructions.

This work was done while Binh was an intern at AMD Research.
10239:592f0bb6bd6f Sat Jun 21 13:26:00 EDT 2014 Binh Pham <binhpham@cs.rutgers.edu> o3: split load & store queue full cases in rename

Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.

This work was done while Binh was an intern at AMD Research.
10231:cb2e6950956d Sat May 31 21:00:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> style: eliminate equality tests with true and false

Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.

It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.

Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
10172:790a214be1f4 Wed Apr 23 05:18:00 EDT 2014 Dam Sunwoo <dam.sunwoo@arm.com> cpu: Add O3 CPU width checks

O3CPU has a compile-time maximum width set in o3/impl.hh, but checking
the configuration against this limit was not implemented anywhere
except for fetch. Configuring a wider pipe than the limit can silently
cause various issues during the simulation. This patch adds the proper
checking in the constructor of the various pipeline stages.
10164:2d2c60bda8b2 Sat Apr 19 10:00:00 EDT 2014 Faissal Sleiman <sleimanf@umich.edu> o3: Fix occupancy checks for SMT
A number of calls to isEmpty() and numFreeEntries()
should be thread-specific.

In cpu.cc, the fact that tid is /*commented*/ out is a bug. Say the rob
has instructions from thread 0 (isEmpty() returns false), and none from
thread 1. If we are trying to squash all of thread 1, then
readTailInst(thread 1) will be called because rob->isEmpty() returns
false. The result is end_it is not in the list and the while
statement loops indefinitely back over the cpu's instList.

In iew_impl.hh, all threads are told they have the entire remaining IQ, when
each thread actually has a certain allocation. The result is extra stalls at
the iew dispatch stage which the rename stage usually takes care of.

In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only
contains instructions from thread 0. This returns a dummyInst (which may work
since we are trying to squash all instructions, but hardly seems like the right
way to do it).

In rob_impl.hh this fix skips the rest of the function more frequently and is
more efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10023:91faf6649de0 Fri Jan 24 16:29:00 EST 2014 Matt Horsnell <matt.horsnell@ARM.com> base: add support for probe points and common probes

The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
the regProbeListeners is called on each SimObject. this hooks up the probe
point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
1:1, 1:N, N:M relationship. They become useful when a number of modules
listen to the same probe points. The idea being that you can add a small
number of probes into the source code and develop a larger number of useful
analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
module (outputting assembler), you could re-use this to gather instruction
distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
relatively minor impact. Profiling has suggested even with a large number of
probes (60) the impact of them (when not active) is very minimal (<1%).
H A Dlsq_unit_impl.hh10575:a8d612fa170b Tue Dec 02 06:08:00 EST 2014 Marco Elver <Marco.Elver@ARM.com> cpu, o3: Ignored invalidate causing same-address load reordering

In case the memory subsystem sends a combined response with invalidate
(e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part
of the response.

If we were to ignore the invalidate part, under certain circumstances
this effectively leads to reordering of loads to the same address
which is not permitted under any memory consistency model implemented
in gem5.

Consider the case where a later load's address is computed before an
earlier load in program order, and is therefore sent to the memory
subsystem first. At some point the earlier load's address is computed
and in doing so correctly marks the later load as a
possibleLoadViolation. In the meantime some other node writes and
sends invalidations to all other nodes. The invalidation races with
the later load's ReadResp, and arrives before ReadResp and is
deferred. Upon receipt of the ReadResp, the response is changed to
ReadRespWithInvalidate, and sent to the CPU. If we ignore the
invalidate part of the packet, we let the later load read the old
value of the address. Eventually the earlier load's ReadResp arrives,
but with new data. As there was no invalidate snoop (sunk into the
ReadRespWithInvalidate), and if we did not process the invalidate of
the ReadRespWithInvalidate, we obtain a load reordering.

A similar scenario can be constructed where the earlier load's address
is computed after ReadRespWithInvalidate arrives for the younger
load. In this case hitExternalSnoop needs to be set to true on the
ReadRespWithInvalidate, so that upon knowing the address of the
earlier load, checkViolations will cause the later load to be
squashed.

Finally we must account for the case where both loads are sent to the
memory subsystem (reordered), a snoop invalidate arrives and correctly
sets the later loads fault to ReExec. However, before the CPU
processes the fault, the later load's ReadResp arrives and the
writeback discards the outstanding fault. We must add a check to
ensure that we do not skip any unprocessed faults.
10573:3b405d11d6dc Tue Dec 02 06:07:00 EST 2014 Stephan Diestelhorst <stephan.diestelhorst@arm.com> cpu: Move packet deallocation to recvTimingResp in the O3 CPU

Move the packet deallocations in the O3 CPU so that the completeDataAccess
deals only with the LSQ specific parts and the generic recvTimingResp frees the
packet in all other cases.
10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10386:c81407818741 Sat Sep 20 17:17:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> base: Clean up redundant string functions and use C++11

This patch does a bit of housekeeping on the string helper functions
and relies on the C++11 standard library where possible. It also does
away with our custom string hash as an implementation is already part
of the standard library.
10342:711eb0e64249 Tue May 13 01:20:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
10333:6be8945d226b Wed Sep 03 07:42:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width). This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
10327:5b6279635c49 Wed Sep 03 07:42:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> cpu: Change writeback modeling for outstanding instructions

As highlighed on the mailing list gem5's writeback modeling can impact
performance. This patch removes the limitation on maximum outstanding issued
instructions, however the number that can writeback in a single cycle is still
respected in instToCommit().
10239:592f0bb6bd6f Sat Jun 21 13:26:00 EDT 2014 Binh Pham <binhpham@cs.rutgers.edu> o3: split load & store queue full cases in rename

Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.

This work was done while Binh was an intern at AMD Research.
10231:cb2e6950956d Sat May 31 21:00:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> style: eliminate equality tests with true and false

Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.

It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.

Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
10175:e639ff917d2e Tue Apr 01 15:22:00 EDT 2014 Mitch Hayenga <Mitch.Hayenga@ARM.com> cpu: Fix case where o3 lsq could print out uninitialized data

In the O3 LSQ, data read/written is printed out in DPRINTFs. However,
the data field is treated as a character string with a null terminated.
However the data field is not encoded this way. This patch removes
that possibility by removing the data part of the print.
/gem5/src/arch/arm/
H A Dtlb.cc10611:3bba9f2d0c7d Tue Dec 23 09:31:00 EST 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> arm: Raise an alignment fault if a PC has illegal alignment

We currently don't handle unaligned PCs correctly. There is one check
for unaligned PCs in the TLB when running in aarch64 mode, but this
check does not cover cases where the CPU does not do a TLB lookup when
decoding an instruction (e.g., a branch stays within the same cache
line). Additionally, the Decoder class sometimes throws an assertion
for unaligned PCs which breaks speculation.

This changeset introduces a decoder fault bit field in the ExtMachInst
structure. This field can be used to signal a decoder failure. If set,
the decoder generates an internal gem5fault instruction instead of a
normal instruction. This instruction in turns either panics (fault
type PANIC), returns an PCAlignmentFault (fault type UNALIGNED,
aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32).

The patch causes minor changes to the realview64 regressions, and a
stats bump will follow.
10537:47fe87b0cf97 Fri Nov 14 03:53:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
10508:aa46a8ae3487 Thu Oct 30 00:18:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> arm: Fix multi-system AArch64 boot w/caches.

Automatically extract cpu release address from DTB file.
Check SCTLR_EL1 to verify all caches are enabled.
10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10463:25c5da51bbe0 Thu Oct 16 05:49:00 EDT 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> arm: Add TLB PMU probes

This changeset adds probe points that can be used to implement PMU
counters for TLB stats. The following probes are supported:

* ArmISA::TLB::ppRefills / TLB Refills (TLB insertions)
10418:7a76e13f0101 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixed undefined behaviours identified by gcc

This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
10367:bf52480abd01 Fri Sep 12 10:22:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> style: Fix line continuation, especially in debug messages

This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
10194:e6d2e8083d9c Fri May 09 18:58:00 EDT 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> arch, arm: Preserve TLB bootUncacheability when switching CPUs

The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal. When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
1) takeOverFrom executed to switch from Atomic -> O3
2) All bootUncacheability flags are reset to true
3) Core2 tries to execute a load covered by bootUncacheability, it
is flagged as uncacheable
4) Core2's load needs to replay due to a pipeline flush
3) Core1 core does an action on CPSR
4) The handling code for CPSR then checks all other cores
to determine if bootUncacheability can be set to false
5) Asynchronously set bootUncacheability on all cores to false
6) Core2 replays load previously set as uncacheable and notices
it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
10024:fc10e1f9f124 Fri Jan 24 16:29:00 EST 2014 Dam Sunwoo <dam.sunwoo@arm.com> mem: per-thread cache occupancy and per-block ages

This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks. Cache occupancy stats are
recalculated on each stat dump.
H A Disa.hh10508:aa46a8ae3487 Thu Oct 30 00:18:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> arm: Fix multi-system AArch64 boot w/caches.

Automatically extract cpu release address from DTB file.
Check SCTLR_EL1 to verify all caches are enabled.
10461:afeb5cdb3907 Thu Oct 16 05:49:00 EDT 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> arm: Add a model of an ARM PMUv3

This class implements a subset of the ARM PMU v3 specification as
described in the ARMv8 reference manual. It supports most of the
features of the PMU, however the following features are known to be
missing:

* Event filtering (e.g., from different privilege levels).
* Access controls (the PMU currently ignores the execution level).
* The chain counter (event no. 0x1E) is unimplemented.

The PMU itself does not implement any events, it merely provides an
interface for the configuration scripts to hook up probes that drive
events. Configuration scripts should call addEventProbe() to configure
custom events or high-level methods to configure architected
events. The Python implementation of addEventProbe() automatically
delays event type registration until after instantiation.

In order to support CPU switching and some combined counters (e.g.,
memory references synthesized from loads and stores), the PMU allows
multiple probes per event type. When creating a system that switches
between CPU models that share the same PMU, PMU events for all of the
CPU models can be registered with the PMU.

Kudos to Matt Horsnell for the initial gem5 implementation of the PMU.
10338:8bee5f4edb92 Tue Apr 29 17:05:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: use condition code registers for ARM ISA

Analogous to ee049bf (for x86). Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
10035:2a0fbecfeb14 Fri Jan 24 16:29:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Make all register index flattening const

This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
H A Dmiscregs.hh10506:aa23216161fa Thu Oct 30 00:18:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> arm: Mark some miscregs (timer counter) registers at unverifiable.

The checker can't verify timer registers, so it should just grab the version
from the executing CPU, otherwise it could get a larger value and diverge
execution.
10421:d469fdcd937e Wed Oct 01 08:05:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Use MiscRegIndex rather than int when flattening

Some additional type checking to avoid future issues.
10338:8bee5f4edb92 Tue Apr 29 17:05:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: use condition code registers for ARM ISA

Analogous to ee049bf (for x86). Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
10324:f40134eb3f85 Tue May 27 12:00:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: support 16kb vm granules
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
H A Disa_traits.hh10318:98771a936b61 Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
/gem5/src/arch/arm/isa/insts/
H A Dmisc.isa10501:e278fa3086b5 Tue Sep 02 06:26:00 EDT 2014 Akash Bagdia <akash.bagdia@ARM.com> arm: Don't speculatively access most miscregisters.

Speculative exeuction can cause panics in detailed execution mode that
shouldn't happen.
10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10418:7a76e13f0101 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixed undefined behaviours identified by gcc

This patch fixes the runtime errors highlighted by the undefined
behaviour sanitizer. In the end there were two issues. First, when
rotating an immediate, we ended up shifting an uint32_t by 32 in some
cases. This case is fixed by checking for a rotation by 0
positions. Second, the Mrc15 and Mcr15 are operating on an IntReg and
a MiscReg, but we used the type RegRegImmOp and passed a MiscRegIndex
as an IntRegIndex. This issue is resolved by introducing a
MiscRegRegImmOp and RegMiscRegImmOp with the appropriate types.

With these fixes there are no runtime errors identified for the full
ARM regressions.
10188:c09802451018 Fri May 09 18:58:00 EDT 2014 Geoffrey Blake <geoffrey.blake@arm.com> arm: Panics in miscreg read functions can be tripped by O3 model

Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
10037:5cac77888310 Fri Jan 24 16:29:00 EST 2014 ARM gem5 Developers arm: Add support for ARMv8 (AArch64 & AArch32)

Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.

Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.

Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
/gem5/src/cpu/checker/
H A Dcpu.cc10505:38c7a9ea7729 Thu Oct 30 00:18:00 EDT 2014 Ali Saidi <Ali.Saidi@ARM.com> cpu: Add support to checker for CACHE_BLOCK_ZERO commands.

The checker didn't know how to properly validate these new commands.
10416:dd64a2984966 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> scons: Address issues related to gcc 4.9.1

Fix a number few minor issues to please gcc 4.9.1. Removing the
'-fuse-linker-plugin' flag means no libraries are part of the LTO
process, but hopefully this is an acceptable loss, as the flag causes
issues on a lot of systems (only certain combinations of gcc, ld and
ar work).
10367:bf52480abd01 Fri Sep 12 10:22:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> style: Fix line continuation, especially in debug messages

This patch closes a number of space gaps in debug messages caused by
the incorrect use of line continuation within strings. (There's also
one consistency change to a similar, but correct, use of line
continuation)
10342:711eb0e64249 Tue May 13 01:20:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
10034:f2ce7114b137 Fri Jan 24 16:29:00 EST 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> checker: CheckerCPU handling of MiscRegs was incorrect

The CheckerCPU model in pre-v8 code was not checking the
updates to miscellaneous registers due to some methods
for setting misc regs were not instrumented. The v8 patches
exposed this by calling the instrumented misc reg update
methods and then invoking the checker before the main CPU had
updated its misc regs, leading to false positives about
register mismatches. This patch fixes the non-instrumented
misc reg update methods and places calls to the checker in
the proper places in the O3 model.
/gem5/src/python/m5/
H A DSimObject.py10584:babb40bd2fc6 Tue Dec 02 06:08:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> scons: Ensure dictionary iteration is sorted by key

This patch adds sorting based on the SimObject name or parameter name
for all situations where we iterate over dictionaries. This should
ensure a deterministic and consistent order across the host systems
and hopefully avoid regression results differing across python
versions.
10532:66451b99f3e6 Wed Nov 12 09:05:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> sim: Sort SimObject descendants and ports

This patch fixes a number of occurences where the sorting order of the
objects was implementation defined.
10458:64809024b924 Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> config: Add the ability to read a config file using C++ and Python

This patch adds the ability to load in config.ini files generated from
gem5 into another instance of gem5 built without Python configuration
support. The intended use case is for configuring gem5 when it is a
library embedded in another simulation system.

A parallel config file reader is also provided purely in Python to
demonstrate the approach taken and to provided similar functionality
for as-yet-unknown use models. The Python configuration file reader
can read both .ini and .json files.

C++ configuration file reading:

A command line option has been added for scons to enable C++ configuration
file reading: --with-cxx-config

There is an example in util/cxx_config that shows C++ configuration in action.
util/cxx_config/README explains how to build the example.

Configuration is achieved by the object CxxConfigManager. It handles
reading object descriptions from a CxxConfigFileBase object which
wraps a config file reader. The wrapper class CxxIniFile is provided
which wraps an IniFile for reading .ini files. Reading .json files
from C++ would be possible with a similar wrapper and a JSON parser.

After reading object descriptions, CxxConfigManager creates
SimObjectParam-derived objects from the classes in the (generated with this
patch) directory build/ARCH/cxx_config

CxxConfigManager can then build SimObjects from those SimObjectParams (in an
order dictated by the SimObject-value parameters on other objects) and bind
ports of the produced SimObjects.

A minimal set of instantiate-replacing member functions are provided by
CxxConfigManager and few of the member functions of SimObject (such as drain)
are extended onto CxxConfigManager.

Python configuration file reading (configs/example/read_config.py):

A Python version of the reader is also supplied with a similar interface to
CxxConfigFileBase (In Python: ConfigFile) to config file readers.

The Python config file reading will handle both .ini and .json files.

The object construction strategy is slightly different in Python from the C++
reader as you need to avoid objects prematurely becoming the children of other
objects when setting parameters.

Port binding also needs to be strictly in the same port-index order as the
original instantiation.
10380:ec1af95a2958 Sat Sep 20 17:17:00 EDT 2014 Andrew Bardsley <Andrew.Bardsley@arm.com> config: Cleanup .json config file generation

This patch 'completes' .json config files generation by adding in the
SimObject references and String-valued parameters not currently
printed.

TickParamValues are also changed to print in the same tick-value
format as in .ini files.

This allows .json files to describe a system as fully as the .ini files
currently do.

This patch adds a new function config_value (which mirrors ini_str) to
each ParamValue and to SimObject. This function can then be explicitly
changed to give different .json and .ini printing behaviour rather than
being written in terms of ini_str.
10267:ed97f6f2ed7a Sun Aug 10 05:39:00 EDT 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> config: Add hooks to enable new config sys

This patch adds helper functions to SimObject.py, params.py and
simulate.py to enable the new configuration system. Functions like
enumerateParams() in SimObject lets the config system auto-generate
command line options for simobjects to be modified on the command
line.

Params in params.py have __call__() added
to their definition to allow the argparse module to use them
as a type to check command input is in the proper format.
10195:7d4d0cd3f7e5 Fri May 09 18:58:00 EDT 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> config: Avoid generating a reference to myself for Parent.any

The unproxy code for Parent.any can generate a circular reference
in certain situations with classes hierarchies like those in ClockDomain.py.
This patch solves this by marking ouself as visited to make sure the
search does not resolve to a self-reference.
10023:91faf6649de0 Fri Jan 24 16:29:00 EST 2014 Matt Horsnell <matt.horsnell@ARM.com> base: add support for probe points and common probes

The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
the regProbeListeners is called on each SimObject. this hooks up the probe
point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
1:1, 1:N, N:M relationship. They become useful when a number of modules
listen to the same probe points. The idea being that you can add a small
number of probes into the source code and develop a larger number of useful
analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
module (outputting assembler), you could re-use this to gather instruction
distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
relatively minor impact. Profiling has suggested even with a large number of
probes (60) the impact of them (when not active) is very minimal (<1%).
10002:11a8fc907e5d Fri Jan 03 20:08:00 EST 2014 Steve Reinhardt <steve.reinhardt@amd.com> python: provide better error message for wrapped C++ methods

If you successfully export a C++ SimObject method, but try to
invoke it from Python before the C++ object is created, you
get a confusing error that says the attribute does not exist,
making you question whether you successfully exported the
method at all. In reality, your only problem is that you're
calling the method too soon. This patch enhances the error
message to give you a better clue.
10001:61763318c788 Fri Jan 03 20:08:00 EST 2014 Steve Reinhardt <steve.reinhardt@amd.com> python: don't die on assignment to cloned object

Updating the SimObject topology of a cloned hierarchy is a little
dangerous, in that cloning is a "deep copy" and the clone does not
inherit SimObject updates the same way it would inherit scalar
variable assignments.

However, because of various SimObject-valued proxy parameters,
like 'memories', 'clk_domain', and 'system', it turns out that
there are a number of implicit topology changes that happen at
instantiation, which means that these changes are impossible to
avoid. So in order to make cloning systems useful, this error
has to go. Changing it to a warning produces a lot of noise,
so it seems best just to delete it.
/gem5/src/base/
H A Dremote_gdb.cc10601:6efb37480d87 Sat Dec 06 01:37:00 EST 2014 Gabe Black <gabeblack@google.com> misc: Generalize GDB single stepping.

The new single stepping implementation for x86 doesn't rely on any ISA
specific properties or functionality. This change pulls out the per ISA
implementation of those functions and promotes the X86 implementation to the
base class.

One drawback of that implementation is that the CPU might stop on an
instruction twice if it's affected by both breakpoints and single stepping.
While that might be a little surprising, it's harmless and would only happen
under somewhat unlikely circumstances.
10599:910fc5624d68 Sat Dec 06 01:35:00 EST 2014 Gabe Black <gabeblack@google.com> misc: Add some utility functions for schedule inst commit events.

These can be used to simplify the implementation of single step in derived
classes.
10598:3d7653a2538b Sat Dec 06 01:34:00 EST 2014 Gabe Black <gabeblack@google.com> misc: Rename the GDB "Event" event class to InputEvent.

The "Event" name is the same as the base event class. That's a bit confusing,
and makes it a little awkward to add other event types.
10597:bd68c6838b9f Fri Dec 05 04:51:00 EST 2014 Gabe Black <gabeblack@google.com> sim: Ensure GDB interrupts the simulation at an instruction boundary.

Use the comInstEventQueue to ensure GDB interrupts the simulation at an
instruction boundary and not in the middle of a macroop, memory access, etc.
10589:5962812f80fe Wed Dec 03 06:27:00 EST 2014 Gabe Black <gabeblack@google.com> sim: Make it possible to override the breakpoint length check.

The check which makes sure the length of the breakpoint being written is the
same as a MachInst is only correct on fixed instruction width ISAs. Instead of
incorrectly applying that check to all ISAs, this change makes that the
default check and lets ISA specific GDB classes override it.
10559:62f5f7363197 Mon Nov 24 09:03:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> misc: Another round of static analysis fixups

Mostly addressing uninitialised members.
/gem5/src/cpu/simple/
H A Dtiming.cc11423:831c7f2f9e39 Tue Dec 09 05:42:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Low-power idle power state for idle CPUs

Add functionality to the BaseCPU that will put the entire CPU into a low-power
idle state whenever all threads in it are idle.
10596:1eec33d2fc52 Fri Dec 05 04:47:00 EST 2014 Gabe Black <gabeblack@google.com> cpu: Only check for PC events on instruction boundaries.

Only the instruction address is actually checked, so there's no need to check
repeatedly while we're working through the microops of a macroop and that's
not changing.
10566:c99c8d2a7c31 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Assume all dynamic packet data is array allocated

This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.

The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.

As the last part the patch, it renames dataDynamicArray to dataDynamic.
10533:d1dce0b728b6 Wed Nov 12 09:05:00 EST 2014 Ali Saidi <ali.saidi@arm.com> arm: Fix timing wakeup with LLSC
10529:05b5a6cf3521 Thu Nov 06 06:42:00 EST 2014 Marc Orr <morr@cs.wisc.edu> x86 isa: This patch attempts an implementation at mwait.

Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10464:2a0fe8bca031 Thu Oct 16 05:49:00 EDT 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Probe points for basic PMU stats

This changeset adds probe points that can be used to implement PMU
counters for CPU stats. The following probes are supported:

* BaseCPU::ppCycles / Cycles
* BaseCPU::ppRetiredInsts / RetiredInsts
* BaseCPU::ppRetiredLoads / RetiredLoads
* BaseCPU::ppRetiredStores / RetiredStores
* BaseCPU::ppRetiredBranches RetiredBranches
10407:a9023811bf9e Sat Sep 20 17:18:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
10379:c00f6d7e2681 Fri Sep 19 10:35:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Pass faults by const reference where possible

This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
10342:711eb0e64249 Tue May 13 01:20:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
10031:79d034cd6ba3 Fri Jan 24 16:29:00 EST 2014 Ali Saidi <Ali.Saidi@ARM.com> cpu: Add support for instructions that zero cache lines.
H A Datomic.cc11423:831c7f2f9e39 Tue Dec 09 05:42:00 EST 2014 Akash Bagdia <akash.bagdia@ARM.com> power: Low-power idle power state for idle CPUs

Add functionality to the BaseCPU that will put the entire CPU into a low-power
idle state whenever all threads in it are idle.
10596:1eec33d2fc52 Fri Dec 05 04:47:00 EST 2014 Gabe Black <gabeblack@google.com> cpu: Only check for PC events on instruction boundaries.

Only the instruction address is actually checked, so there's no need to check
repeatedly while we're working through the microops of a macroop and that's
not changing.
10563:755b18321206 Tue Dec 02 06:07:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Add const getters for write packet data

This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.

The patch also removes the unused isReadWrite function.
10537:47fe87b0cf97 Fri Nov 14 03:53:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
10529:05b5a6cf3521 Thu Nov 06 06:42:00 EST 2014 Marc Orr <morr@cs.wisc.edu> x86 isa: This patch attempts an implementation at mwait.

Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
10464:2a0fe8bca031 Thu Oct 16 05:49:00 EDT 2014 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Probe points for basic PMU stats

This changeset adds probe points that can be used to implement PMU
counters for CPU stats. The following probes are supported:

* BaseCPU::ppCycles / Cycles
* BaseCPU::ppRetiredInsts / RetiredInsts
* BaseCPU::ppRetiredLoads / RetiredLoads
* BaseCPU::ppRetiredStores / RetiredStores
* BaseCPU::ppRetiredBranches RetiredBranches
10407:a9023811bf9e Sat Sep 20 17:18:00 EDT 2014 Mitch Hayenga <mitch.hayenga@arm.com> alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.
10381:ab8b8601b6ff Sat Sep 20 17:17:00 EDT 2014 Dam Sunwoo <dam.sunwoo@arm.com> cpu: use probes infrastructure to do simpoint profiling

Instead of having code embedded in cpu model to do simpoint profiling use
the probes infrastructure to do it.
10342:711eb0e64249 Tue May 13 01:20:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
10031:79d034cd6ba3 Fri Jan 24 16:29:00 EST 2014 Ali Saidi <Ali.Saidi@ARM.com> cpu: Add support for instructions that zero cache lines.
/gem5/src/arch/x86/isa/microops/
H A Dseqop.isa10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
10184:bbfa3152bdea Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: remove inline specifiers on all inst constrs, all ISAs

With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
/gem5/src/arch/x86/
H A Dfaults.cc10417:710ee116eb68 Sat Sep 27 09:08:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use const StaticInstPtr references where possible

This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
10100:24cfe67c0749 Mon Mar 03 08:44:00 EST 2014 Andreas Sandberg <andreas@sandberg.pp.se> x86: Setup correct TSL/TR segment attributes on INIT

The TSL/LDT & TR/TSS segments didn't contain valid attributes. This
caused problems when transfering the state into KVM where invalid
state is a no-go. Fixup the attributes with values from AMD's
architecture programmer's manual.
H A Disa_traits.hh10593:a39de7b8d2c9 Thu Dec 04 18:53:00 EST 2014 Gabe Black <gabeblack@google.com> x86: Rework opcode parsing to support 3 byte opcodes properly.

Instead of counting the number of opcode bytes in an instruction and recording
each byte before the actual opcode, we can represent the path we took to get to
the actual opcode byte by using a type code. That has a couple of advantages.
First, we can disambiguate the properties of opcodes of the same length which
have different properties. Second, it reduces the amount of data stored in an
ExtMachInst, making them slightly easier/faster to create and process. This
also adds some flexibility as far as how different types of opcodes are
handled, which might come in handy if we decide to support VEX or XOP
instructions.

This change also adds tables to support properly decoding 3 byte opcodes.
Before we would fall off the end of some arrays, on top of the ambiguity
described above.

This change doesn't measureably affect performance on the twolf benchmark.
10318:98771a936b61 Wed Sep 03 07:42:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
/gem5/src/arch/sparc/
H A Dremote_gdb.cc10601:6efb37480d87 Sat Dec 06 01:37:00 EST 2014 Gabe Black <gabeblack@google.com> misc: Generalize GDB single stepping.

The new single stepping implementation for x86 doesn't rely on any ISA
specific properties or functionality. This change pulls out the per ISA
implementation of those functions and promotes the X86 implementation to the
base class.

One drawback of that implementation is that the CPU might stop on an
instruction twice if it's affected by both breakpoints and single stepping.
While that might be a little surprising, it's harmless and would only happen
under somewhat unlikely circumstances.
10595:25ecfc14f73f Fri Dec 05 04:44:00 EST 2014 Gabe Black <gabeblack@google.com> misc: Make the GDB register cache accessible in various sized chunks.

Not all ISAs have 64 bit sized registers, so it's not always very convenient
to access the GDB register cache in 64 bit sized chunks. This change makes it
accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations
were working around that limitation by bundling and unbundling 32 bit values
into 64 bit values. That code has been removed.
/gem5/src/arch/alpha/
H A Dtlb.hh10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10194:e6d2e8083d9c Fri May 09 18:58:00 EDT 2014 Geoffrey Blake <Geoffrey.Blake@arm.com> arch, arm: Preserve TLB bootUncacheability when switching CPUs

The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal. When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
1) takeOverFrom executed to switch from Atomic -> O3
2) All bootUncacheability flags are reset to true
3) Core2 tries to execute a load covered by bootUncacheability, it
is flagged as uncacheable
4) Core2's load needs to replay due to a pipeline flush
3) Core1 core does an action on CPSR
4) The handling code for CPSR then checks all other cores
to determine if bootUncacheability can be set to false
5) Asynchronously set bootUncacheability on all cores to false
6) Core2 replays load previously set as uncacheable and notices
it is now flagged as cacheable, leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
/gem5/src/arch/arm/isa/formats/
H A Dbreakpoint.isa10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
/gem5/src/arch/mips/isa/formats/
H A Dint.isa10474:799c8ee4ecba Thu Oct 16 05:49:00 EDT 2014 Andreas Hansson <andreas.hansson@arm.com> arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
10196:be0e1724eb39 Fri May 09 18:58:00 EDT 2014 Curtis Dunham <Curtis.Dunham@arm.com> arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.

Completed in 259 milliseconds

<<51525354555657585960>>