History log of /gem5/src/cpu/o3/commit.hh
Revision Date Author Comments
# 13652:45d94ac03a27 22-Jan-2018 Tuan Ta <qtt2@cornell.edu>

cpu: support atomic memory request type with AtomicOpFunctor

This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.

Atomic memory instruction is treated as a special store instruction in
all CPU models.

In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.

In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.

In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.

This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.

This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.

Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13641:648f3106ebdf 02-Apr-2018 Tuan Ta <qtt2@cornell.edu>

cpu: fixed how O3 CPU executes an exit system call

When a thread executed an exit syscall in SE mode, the thread context
was removed immediately in the same cycle, which left inflight squash
operations and trap event incomplete. The problem happened when a new
thread was assigned to the CPU later. The new thread started with some
incomplete transactions of the previous thread (e.g., squashing). This
problem could cause incorrect execution flow for the new thread (i.e.,
pc was not reset properly at the exit point), deadlock (i.e., some
stage-to-stage signals were not reset) and incorrect rename map between
logical and physical registers.

This patch adds a new state called 'Halting' to the thread context and
defers removing thread context from a CPU until a trap event initiated
by an exit syscall execution is processed. This patch also makes sure
that the removal of a thread context happens after all inflight
transactions of the to-be-removed thread in the pipeline complete.

Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8
Reviewed-on: https://gem5-review.googlesource.com/c/8184
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13563:68c171235dc5 03-Jan-2019 Nikos Nikoleris <nikos.nikoleris@arm.com>

cpu-o3: Make the smtCommitPolicy a Param.ScopedEnum

The smtCommitPolicy is a parameter in the o3 cpu that can have 3
different values. Previously this setting was done through a string
and a parser function would turn it into a c++ enum value. This
changeset turns the string into a python Param.ScopedEnum.

Change-Id: I3625f2c08a1ae0c3b0dce7a641c6ae1ce3fd79a5
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15400
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13429:a1e199fd8122 06-Feb-2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com>

cpu: Fix the usage of const DynInstPtr

Summary: Usage of const DynInstPtr& when possible and introduction of
move operators to RefCountingPtr.

In many places, scoped references to dynamic instructions do a copy of
the DynInstPtr when a reference would do. This is detrimental to
performance. On top of that, in case there is a need for reference
tracking for debugging, the redundant copies make the process much more
painful than it already is.

Also, from the theoretical point of view, a function/method that
defines a convenience name to access an instruction should not be
considered an owner of the data, i.e., doing a copy and not a reference
is not justified.

On a related topic, C++11 introduces move semantics, and those are
useful when, for example, there is a class modelling a HW structure that
contains a list, and has a getHeadOfList function, to prevent doing a
copy to an internal variable -> update pointer, remove from the list ->
update pointer, return value making a copy to the assined variable ->
update pointer, destroy the returned value -> update pointer.

Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13105
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12127:4207df055b0d 28-Jun-2017 Sean Wilson <spwilson2@wisc.edu>

cpu: Refactor some Event subclasses to lambdas

Change-Id: If765c6100d67556f157e4e61aa33c2b7eeb8d2f0
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3923
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12110:c24ee249b8ba 05-Apr-2017 Rekai Gonzalez-Alberquilla <Rekai.GonzalezAlberquilla@arm.com>

arch: ISA parser additions of vector registers

Reiley's update :) of the isa parser definitions. My addition of the
vector element operand concept for the ISA parser. Nathanael's modification
creating a hierarchy between vector registers and its constituencies to the
isa parser.

Some fixes/updates on top to consider instructions as vectors instead of
floating when they use the VectorRF. Some counters added to all the
models to keep faithful counts.

Change-Id: Id8f162a525240dfd7ba884c5a4d9fa69f4050101
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2706
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 11877:5ea85692a53e 20-Jul-2015 Brandon Potter <brandon.potter@amd.com>

syscall_emul: [patch 13/22] add system call retry capability

This changeset adds functionality that allows system calls to retry without
affecting thread context state such as the program counter or register values
for the associated thread context (when system calls return with a retry
fault).

This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.

To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.

The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.

In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.

As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing a single simulator thread contexts with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.


# 11246:93d2a1526103 07-Dec-2015 Radhika Jagtap <radhika.jagtap@ARM.com>

probe: Add probe in Fetch, IEW, Rename and Commit

This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.

A probe point is added in the Fetch stage for probing when a fetch request is
sent. Notify is fired on the probe point when a request is sent succesfully in
the first attempt as well as on a retry attempt.

Probe points are added in the IEW stage when an instruction begins to execute
and when execution is complete. This points can be used for monitoring the
execution time of an instruction.

Probe points are added in the Rename stage to probe renaming of source and
destination registers and when there is squashing. These probe points can be
used to track register dependencies and remove when there is squashing.

A probe point for squashing is added in Commit to probe squashed instructions.


# 10732:60482901c996 09-Mar-2015 Nilay Vaish <nilay@cs.wisc.edu>

cpu: o3: commit: mark pipeline delay variable as consts


# 10731:17c5d36dfdac 09-Mar-2015 Nilay Vaish <nilay@cs.wisc.edu>

cpu: o3: remove unused stat variables.


# 10729:41c93a3c1051 09-Mar-2015 Nilay Vaish <nilay@cs.wisc.edu>

cpu: o3: remove member variable squashCounter
The variable is used in only one place and a whole new function setNextStatus()
has been defined just to compute the value of the variable. Instead of calling
the function, the value is now computed in the loop that preceded the function
call.


# 10340:40d24a672351 03-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Fix o3 drain bug

For X86, the o3 CPU would get stuck with the commit stage not being
drained if an interrupt arrived while drain was pending. isDrained()
makes sure that pcState.microPC() == 0, thus ensuring that we are at
an instruction boundary. However, when we take an interrupt we
execute:

pcState.upc(romMicroPC(entry));
pcState.nupc(romMicroPC(entry) + 1);
tc->pcState(pcState);

As a result, the MicroPC is no longer zero. This patch ensures the drain is
delayed until no interrupts are present. Once draining, non-synchronous
interrupts are deffered until after the switch.


# 10331:ed05298e8566 03-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Fix SMT scheduling issue with the O3 cpu

The o3 cpu could attempt to schedule inactive threads under round-robin SMT
mode.

This is because it maintained an independent priority list of threads from the
active thread list. This priority list could be come stale once threads were
inactive, leading to the cpu trying to fetch/commit from inactive threads.


Additionally the fetch queue is now forcibly flushed of instrctuctions
from the de-scheduled thread.

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]


# 10328:867b536a68be 03-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Fix o3 front-end pipeline interlock behavior

The o3 pipeline interlock/stall logic is incorrect. o3 unnecessicarily stalled
fetch and decode due to later stages in the pipeline. In general, a stage
should usually only consider if it is stalled by the adjacent, downstream stage.
Forcing stalls due to later stages creates and results in bubbles in the
pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename)
on a branch mispredict while the ROB is being serially walked to update the
RAT (robSquashing). Only should have stalled at rename.


# 10193:d717abc806aa 09-May-2014 Curtis Dunham <Curtis.Dunham@arm.com>

cpu: add more instruction mix statistics

For the o3, add instruction mix (OpClass) histogram at commit (stats
also already collected at issue). For the simple CPUs we add a
histogram of executed instructions


# 10023:91faf6649de0 24-Jan-2014 Matt Horsnell <matt.horsnell@ARM.com>

base: add support for probe points and common probes

The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
the regProbeListeners is called on each SimObject. this hooks up the probe
point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
1:1, 1:N, N:M relationship. They become useful when a number of modules
listen to the same probe points. The idea being that you can add a small
number of probes into the source code and develop a larger number of useful
analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
module (outputting assembler), you could re-use this to gather instruction
distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
relatively minor impact. Profiling has suggested even with a large number of
probes (60) the impact of them (when not active) is very minimal (<1%).


# 9513:690357ffbce2 15-Feb-2013 Ali Saidi <Ali.Saidi@ARM.com>

cpu: Fix a livelock in the o3 cpu.

Check if an instruction just enabled interrupts and we've previously had an
interrupt pending that was not handled because interrupts were subsequently
disabled before the pipeline reached a place to handle the interrupt. In that
case squash now to make sure the interrupt is handled.


# 9444:ab47fe7f03f0 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Rewrite O3 draining to avoid stopping in microcode

Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.


# 9437:8088e94a9de0 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Fix broken squashAfter implementation in O3 CPU

Commit can currently both commit and squash in the same cycle. This
confuses other stages since the signals coming from the commit stage
can only signal either a squash or a commit in a cycle. This changeset
changes the behavior of squashAfter so that it commits all
instructions, including the instruction that requested the squash, in
the first cycle and then starts to squash in the next cycle.


# 9427:ddf45c1d54d4 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Initialize the O3 pipeline from startup()

The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.


# 9218:7e9e34d4203b 12-Sep-2012 Anthony Gutierrez <atgutier@umich.edu>

stats: remove duplicate instruction stats from the commit stage

these stats are duplicates of insts/opsCommitted, cause
confusion, and are poorly named.


# 9184:a1a8f137b796 07-Sep-2012 Andreas Hansson <andreas.hansson@arm.com>

Param: Transition to Cycles for relevant parameters

This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.

An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.


# 9180:ee8d7a51651d 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Add a Cycles wrapper class and use where applicable

This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.


# 9179:666bc9df1e49 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Rework clocks to avoid tick-to-cycle transformations

This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.

In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.

Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.

As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.


# 8834:21e8d54ecf07 12-Feb-2012 Anthony Gutierrez <atgutier@umich.edu>

cpu: add separate stats for insts/ops both globally and per cpu model


# 8823:ae411fcf4935 10-Feb-2012 Nilay Vaish <nilay@cs.wisc.edu>

O3 CPU: Strengthen condition for handling interrupts
The condition for handling interrupts is to check whether or not the cpu's
instruction list is empty. As observed, this can lead to cases in which even
though the instruction list is empty, interrupts are handled when they should
not be. The condition is being strengthened so that interrupts get handled only
when the last committed microop did not had IsDelayedCommit set.


# 8809:bb10807da889 01-Feb-2012 Gabe Black <gblack@eecs.umich.edu>

Merge with head, hopefully the last time for this batch.


# 8793:5f25086326ac 18-Nov-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Get rid of FULL_SYSTEM in the CPU directory.


# 8737:770ccf3af571 31-Jan-2012 Koan-Sin Tan <koansin.tan@gmail.com>

clang: Enable compiling gem5 using clang 2.9 and 3.0

This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).

clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.


# 8230:845c8eb5ac49 15-Apr-2011 Nathan Binkert <nate@binkert.org>

includes: fix up code after sorting


# 8229:78bf55f23338 15-Apr-2011 Nathan Binkert <nate@binkert.org>

includes: sort all includes


# 8137:48371b9fb929 17-Mar-2011 Ali Saidi <Ali.Saidi@ARM.com>

O3: Cleanup the commitInfo comm struct.

Get rid of unused members and use base types rather than derrived values
where possible to limit amount of state.


# 7897:d9e8b1fd1a9f 07-Feb-2011 Joel Hestness <hestness@cs.utexas.edu>

mcpat: Adds McPAT performance counters

Updated patches from Rick Strong's set that modify performance counters for
McPAT


# 7855:c0be563517da 18-Jan-2011 Ali Saidi <Ali.Saidi@ARM.com>

O3: Keep around the last committed instruction and use for squashing.

Without this change 0 is always used for the youngest sequence number if
a squash occured and the ROB was empty (E.g. an instruction is marked
serializeAfter or a fetch stall prevents other instructions from issuing).
Using 0 there is a race to rename where an instruction that committed the
same cycle as the squashing instruction can have it's renamed state undone
by the squash using sequence number 0.


# 7847:0c6613ad8f18 18-Jan-2011 Min Kyu Jeong <minkyu.jeong@arm.com>

O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.

When this condition occurs the cpu should restart the fetch stage to fetch from
the original execution path. Fault handling in the commit stage is cleaned up a
little bit so the control flow is simplier. Finally, if an instruction is being
used to carry a fault it isn't executed, so the fault propagates appropriately.


# 7813:7338bc628489 03-Jan-2011 Steve Reinhardt <steve.reinhardt@amd.com>

Move sched_list.hh and timebuf.hh from src/base to src/cpu.
These files really aren't general enough to belong in src/base.
This patch doesn't reorder include lines, leaving them unsorted
in many cases, but Nate's magic script will fix that up shortly.


# 7784:e7649570ff3a 07-Dec-2010 Ali Saidi <Ali.Saidi@ARM.com>

O3: Support squashing all state after special instruction

For SPARC ASIs are added to the ExtMachInst. If the ASI is changed simply
marking the instruction as Serializing isn't enough beacuse that only
stops rename. This provides a mechanism to squash all the instructions
and refetch them


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.


# 6221:58a3c04e6344 26-May-2009 Nathan Binkert <nate@binkert.org>

types: add a type for thread IDs and try to use it everywhere


# 5999:3cf8e71257e0 05-Mar-2009 Nathan Binkert <nate@binkert.org>

stats: Fix all stats usages to deal with template fixes


# 5529:9ae69b9cd7fd 11-Aug-2008 Nathan Binkert <nate@binkert.org>

params: Convert the CPU objects to use the auto generated param structs.
A whole bunch of stuff has been converted to use the new params stuff, but
the CPU wasn't one of them. While we're at it, make some things a bit
more stylish. Most of the work was done by Gabe, I just cleaned stuff up
a bit more at the end.


# 5336:c7e21f4e5a2e 06-Feb-2008 Stephen Hines <hines@cs.fsu.edu>

Make the Event::description() a const function


# 4636:afc8da9f526e 14-Apr-2007 Gabe Black <gblack@eecs.umich.edu>

Add support for microcode and pull out the special branch delay slot handling. Branch delay slots need to be squash on a mispredict as well because the nnpc they saw was incorrect.


# 4329:52057dbec096 04-Apr-2007 Kevin Lim <ktlim@umich.edu>

Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.

src/cpu/o3/alpha/cpu_impl.hh:
Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions.


# 4035:f80ad98b2304 23-Mar-2007 Kevin Lim <ktlim@umich.edu>

Updates for commit.
1. Move interrupt handling to a separate function to clean up main commit() function a bit. Also gate the function call off properly based on whether or not there are outstanding interrupts, and the system is not in PAL mode.
2. Better handling of updating instruction's status bits. Instructions are not marked "atCommit" until other stages view it (pushed off to IEW/IQ), and they have been properly handled (faults).
3. Don't consider the ROB "empty" for the purpose of other stages until the ROB is empty, all stores have written back, and there was no store commits this cycle. The last is necessary in case a store committed, in which case it would look like all stores have written back but in actuality have not.

src/cpu/o3/commit.hh:
Slightly modify how interrupts are handled. Also include some extra bools to keep track of state properly.
src/cpu/o3/commit_impl.hh:
Slightly modify how interrupts are handled. Also include some extra bools to keep track of state.

General correctness updates, most specifically for when commit broadcasts to other stages that the ROB is empty.


# 3640:3a2f7b451641 13-Nov-2006 Kevin Lim <ktlim@umich.edu>

More interrupt reworking.


# 2980:eab855f06b79 15-Aug-2006 Gabe Black <gblack@eecs.umich.edu>

Cleaned up include files and got rid of many using directives in header files.


# 2965:82703e01285a 26-Jul-2006 Korey Sewell <ksewell@umich.edu>

MIPS ISA runs 'hello world' in O3CPU ...

src/arch/mips/isa/base.isa:
special case syscall disasembly... maybe give own instruction class?
src/arch/mips/isa/decoder.isa:
add 'IsSerializeAfter' flag for syscall
src/cpu/o3/commit.hh:
Add skidBuffer to commit
src/cpu/o3/commit_impl.hh:
Use skidbuffer in MIPS ISA
src/cpu/o3/fetch_impl.hh:
Print name out when there is a fault
src/cpu/o3/mips/cpu_impl.hh:
change comment


# 2935:d1223a6c9156 23-Jul-2006 Korey Sewell <ksewell@umich.edu>

This changeset gets the MIPS ISA pretty much working in the O3CPU. It builds, runs, and gets very very close to completing the hello world
succesfully but there are some minor quirks to iron out. Who would've known a DELAY SLOT introduces that much complexity?! arrgh!

Anyways, a lot of this stuff had to do with my project at MIPS and me needing to know how I was going to get this working for the MIPS
ISA. So I figured I would try to touch it up and throw it in here (I hate to introduce non-completely working components... )

src/arch/alpha/isa/mem.isa:
spacing
src/arch/mips/faults.cc:
src/arch/mips/faults.hh:
Gabe really authored this
src/arch/mips/isa/decoder.isa:
add StoreConditional Flag to instruction
src/arch/mips/isa/formats/basic.isa:
Steven really did this file
src/arch/mips/isa/formats/branch.isa:
fix bug for uncond/cond control
src/arch/mips/isa/formats/mem.isa:
Adjust O3CPU memory access to use new memory model interface.
src/arch/mips/isa/formats/util.isa:
update LoadStoreBase template
src/arch/mips/isa_traits.cc:
update SERIALIZE partially
src/arch/mips/process.cc:
src/arch/mips/process.hh:
no need for this for NOW. ASID/Virtual addressing handles it
src/arch/mips/regfile/misc_regfile.hh:
add in clear() function and comments for future usage of special misc. regs
src/cpu/base_dyn_inst.hh:
add in nextNPC variable and supporting functions.

add isCondDelaySlot function

Update predTaken and mispredicted functions
src/cpu/base_dyn_inst_impl.hh:
init nextNPC
src/cpu/o3/SConscript:
add MIPS files to compile
src/cpu/o3/alpha/thread_context.hh:
no need for my name on this file
src/cpu/o3/bpred_unit_impl.hh:
Update RAS appropriately for MIPS
src/cpu/o3/comm.hh:
add some extra communication variables to aid in handling the
delay slots
src/cpu/o3/commit.hh:
minor name fix for nextNPC functions.
src/cpu/o3/commit_impl.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/rename_impl.hh:
Fix necessary variables and functions for squashes with delay slots
src/cpu/o3/cpu.cc:
Update function interface ...

adjust removeInstsNotInROB function to recognize delay slots insts
src/cpu/o3/cpu.hh:
update removeInstsNotInROB
src/cpu/o3/decode.hh:
declare necessary variables for handling delay slot
src/cpu/o3/dyn_inst.hh:
Add in MipsDynInst
src/cpu/o3/fetch.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/rename.hh:
declare necessary variables and adjust functions for handling delay slot
src/cpu/o3/inst_queue.hh:
src/cpu/simple/base.cc:
no need for my name here
src/cpu/o3/isa_specific.hh:
add in MIPS files
src/cpu/o3/scoreboard.hh:
dont include alpha specific isa traits!
src/cpu/o3/thread_context.hh:
no need for my name here, i just rearranged where the file goes
src/cpu/static_inst.hh:
add isCondDelaySlot function
src/cpu/o3/mips/cpu.cc:
src/cpu/o3/mips/cpu.hh:
src/cpu/o3/mips/cpu_builder.cc:
src/cpu/o3/mips/cpu_impl.hh:
src/cpu/o3/mips/dyn_inst.cc:
src/cpu/o3/mips/dyn_inst.hh:
src/cpu/o3/mips/dyn_inst_impl.hh:
src/cpu/o3/mips/impl.hh:
src/cpu/o3/mips/params.hh:
src/cpu/o3/mips/thread_context.cc:
src/cpu/o3/mips/thread_context.hh:
MIPS file for O3CPU...mirrors ALPHA definition


# 2874:5389a28b80fb 10-Jul-2006 Kevin Lim <ktlim@umich.edu>

Some minor cleanups.

src/cpu/SConscript:
Change the error message to be slightly nicer.
src/cpu/o3/commit.hh:
Remove old code.
src/cpu/o3/commit_impl.hh:
Remove old unused code.


# 2863:2592e056dc5c 06-Jul-2006 Kevin Lim <ktlim@umich.edu>

Fix the O3CPU to support the multi-pass method for checking if the system has fully drained.

src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/decode.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/rename.hh:
src/cpu/o3/rename_impl.hh:
Return a value so that the CPU can instantly return from draining if the pipeline is already drained.
src/cpu/o3/cpu.cc:
Use values returned from pipeline stages so that the CPU can instantly return from draining if the pipeline is already drained.


# 2843:19c4c6c2b5b1 06-Jul-2006 Kevin Lim <ktlim@umich.edu>

Support for draining, and the new method of switching out. Now switching out happens after the pipeline has been drained, deferring the three way handshake to the normal drain mechanism. The calls of switchOut() and takeOverFrom() both take action immediately.

src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/decode.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/rename.hh:
src/cpu/o3/rename_impl.hh:
Support for draining, new method of switching out.


# 2831:0a42b294727c 02-Jul-2006 Korey Sewell <ksewell@umich.edu>

Fix default SMT configuration in O3CPU (i.e. fetch policy, workloads/numThreads)

Edit Test3 for newmem

src/base/traceflags.py:
Add O3CPU flag
src/cpu/base.cc:
for some reason adding a BaseCPU flag doesnt work so just go back to old way...
src/cpu/o3/alpha/cpu_builder.cc:
Determine number threads by workload size instead of solely by parameter.

Default SMT fetch policy to RoundRobin if it's not specified in Config file
src/cpu/o3/commit.hh:
only use nextNPC for !ALPHA
src/cpu/o3/commit_impl.hh:
add FetchTrapPending as condition for commit
src/cpu/o3/cpu.cc:
panic if active threads is more than Impl::MaxThreads
src/cpu/o3/fetch.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/rob.hh:
src/cpu/o3/rob_impl.hh:
name stuff
src/cpu/o3/fetch_impl.hh:
fatal if try to use SMT branch count, that's unimplemented right now
src/python/m5/config.py:
make it clearer that a parameter is not valid within a configuration class


# 2757:58e3a66e72f7 16-Jun-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge


# 2756:7bf0d6481df9 15-Jun-2006 Korey Sewell <ksewell@umich.edu>

Initial changes to allowed DetailedCPU to work with other architectures (i.e. Sparc & MIPS)

Still need to add some code to fetch & commit stages

src/cpu/o3/commit.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
Add nextNPC read & set functions
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
Add nextNPC


# 2733:e0eac8fc5774 16-Jun-2006 Kevin Lim <ktlim@umich.edu>

Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.

First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.

Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.

SConstruct:
Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled.
configs/test/test.py:
Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick.
src/base/traceflags.py:
Add in O3CPU trace flag.
src/cpu/SConscript:
Rename AlphaFullCPU to AlphaO3CPU.

Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs are also being included.
src/cpu/base_dyn_inst.cc:
src/cpu/base_dyn_inst.hh:
Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU.
src/cpu/cpu_models.py:
src/cpu/o3/alpha_cpu.cc:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_cpu_impl.hh:
Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model.
src/cpu/o3/alpha_dyn_inst.hh:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/alpha_impl.hh:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/commit.hh:
src/cpu/o3/cpu.hh:
src/cpu/o3/decode.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/rename.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/rob.hh:
src/cpu/o3/rob_impl.hh:
src/cpu/o3/thread_state.hh:
src/python/m5/objects/AlphaO3CPU.py:
Rename FullCPU to O3CPU to differentiate from old FullCPU model.
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit_impl.hh:
Rename FullCPU to O3CPU to differentiate from old FullCPU model.
Also #ifdef the checker code so it doesn't need to be included if it's not selected.


# 2702:8a3ee279559b 12-Jun-2006 Kevin Lim <ktlim@umich.edu>

Clean up/shift some code around.

src/cpu/base_dyn_inst.cc:
Clean up some code and update.
src/cpu/base_dyn_inst.hh:
Clean up some code and update with more descriptive function names.
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/commit.hh:
Remove unused parameters.
src/cpu/o3/commit_impl.hh:
Remove unused parameters, also set squashCounter directly to the counted number of squashes.
src/cpu/o3/fetch_impl.hh:
Update for function name changes.
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
Remove unused parameter, move some code into a function.


# 2680:246e7104f744 06-Jun-2006 Kevin Lim <ktlim@umich.edu>

Change ExecContext to ThreadContext. This is being renamed to differentiate between the interface used objects outside of the CPU, and the interface used by the ISA. ThreadContext is used by objects outside of the CPU and is specifically defined in thread_context.hh. ExecContext is more implicit, and is defined by files such as base_dyn_inst.hh or cpu/simple/base.hh.

Further renames/reorganization will be coming shortly; what is currently CPUExecContext (the old ExecContext from m5) will be renamed to SimpleThread or something similar.

src/arch/alpha/arguments.cc:
src/arch/alpha/arguments.hh:
src/arch/alpha/ev5.cc:
src/arch/alpha/faults.cc:
src/arch/alpha/faults.hh:
src/arch/alpha/freebsd/system.cc:
src/arch/alpha/freebsd/system.hh:
src/arch/alpha/isa/branch.isa:
src/arch/alpha/isa/decoder.isa:
src/arch/alpha/isa/main.isa:
src/arch/alpha/linux/process.cc:
src/arch/alpha/linux/system.cc:
src/arch/alpha/linux/system.hh:
src/arch/alpha/linux/threadinfo.hh:
src/arch/alpha/process.cc:
src/arch/alpha/regfile.hh:
src/arch/alpha/stacktrace.cc:
src/arch/alpha/stacktrace.hh:
src/arch/alpha/tlb.cc:
src/arch/alpha/tlb.hh:
src/arch/alpha/tru64/process.cc:
src/arch/alpha/tru64/system.cc:
src/arch/alpha/tru64/system.hh:
src/arch/alpha/utility.hh:
src/arch/alpha/vtophys.cc:
src/arch/alpha/vtophys.hh:
src/arch/mips/faults.cc:
src/arch/mips/faults.hh:
src/arch/mips/isa_traits.cc:
src/arch/mips/isa_traits.hh:
src/arch/mips/linux/process.cc:
src/arch/mips/process.cc:
src/arch/mips/regfile/float_regfile.hh:
src/arch/mips/regfile/int_regfile.hh:
src/arch/mips/regfile/misc_regfile.hh:
src/arch/mips/regfile/regfile.hh:
src/arch/mips/stacktrace.hh:
src/arch/sparc/faults.cc:
src/arch/sparc/faults.hh:
src/arch/sparc/isa_traits.hh:
src/arch/sparc/linux/process.cc:
src/arch/sparc/linux/process.hh:
src/arch/sparc/process.cc:
src/arch/sparc/regfile.hh:
src/arch/sparc/solaris/process.cc:
src/arch/sparc/stacktrace.hh:
src/arch/sparc/ua2005.cc:
src/arch/sparc/utility.hh:
src/arch/sparc/vtophys.cc:
src/arch/sparc/vtophys.hh:
src/base/remote_gdb.cc:
src/base/remote_gdb.hh:
src/cpu/base.cc:
src/cpu/base.hh:
src/cpu/base_dyn_inst.hh:
src/cpu/checker/cpu.cc:
src/cpu/checker/cpu.hh:
src/cpu/checker/exec_context.hh:
src/cpu/cpu_exec_context.cc:
src/cpu/cpu_exec_context.hh:
src/cpu/cpuevent.cc:
src/cpu/cpuevent.hh:
src/cpu/exetrace.hh:
src/cpu/intr_control.cc:
src/cpu/memtest/memtest.hh:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/back_end.hh:
src/cpu/ozone/cpu.hh:
src/cpu/ozone/cpu_impl.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/front_end_impl.hh:
src/cpu/ozone/inorder_back_end.hh:
src/cpu/ozone/lw_back_end.hh:
src/cpu/ozone/lw_back_end_impl.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/lw_lsq_impl.hh:
src/cpu/ozone/thread_state.hh:
src/cpu/pc_event.cc:
src/cpu/pc_event.hh:
src/cpu/profile.cc:
src/cpu/profile.hh:
src/cpu/quiesce_event.cc:
src/cpu/quiesce_event.hh:
src/cpu/simple/atomic.cc:
src/cpu/simple/base.cc:
src/cpu/simple/base.hh:
src/cpu/simple/timing.cc:
src/cpu/static_inst.cc:
src/cpu/static_inst.hh:
src/cpu/thread_state.hh:
src/dev/alpha_console.cc:
src/dev/ns_gige.cc:
src/dev/sinic.cc:
src/dev/tsunami_cchip.cc:
src/kern/kernel_stats.cc:
src/kern/kernel_stats.hh:
src/kern/linux/events.cc:
src/kern/linux/events.hh:
src/kern/system_events.cc:
src/kern/system_events.hh:
src/kern/tru64/dump_mbuf.cc:
src/kern/tru64/tru64.hh:
src/kern/tru64/tru64_events.cc:
src/kern/tru64/tru64_events.hh:
src/mem/vport.cc:
src/mem/vport.hh:
src/sim/faults.cc:
src/sim/faults.hh:
src/sim/process.cc:
src/sim/process.hh:
src/sim/pseudo_inst.cc:
src/sim/pseudo_inst.hh:
src/sim/syscall_emul.cc:
src/sim/syscall_emul.hh:
src/sim/system.cc:
src/cpu/thread_context.hh:
src/sim/system.hh:
src/sim/vptr.hh:
Change ExecContext to ThreadContext.


# 2674:6d4afef73a20 04-Jun-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zamp:/z/ktlim2/clean/m5-o3
into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge

src/cpu/checker/o3_cpu_builder.cc:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/bpred_unit.cc:
src/cpu/o3/commit.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/thread_state.hh:
Hand merge.


# 2670:9107b8bd08cd 02-Jun-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zizzer.eecs.umich.edu:/.automount/zamp/z/ktlim2/clean/newmem


# 2669:f2b336e89d2a 02-Jun-2006 Kevin Lim <ktlim@umich.edu>

Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.

Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.

src/SConscript:
Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future.
src/base/traceflags.py:
Include new traces flags, fix up merge mess up.
src/cpu/SConscript:
Include the base_dyn_inst.cc as one of othe sources.
Don't compile the Ozone CPU for now.
src/cpu/base.cc:
Remove an extra } from the merge.
src/cpu/base_dyn_inst.cc:
Fixes to make compiling work. Don't instantiate the OzoneCPU for now.
src/cpu/base_dyn_inst.hh:
src/cpu/o3/2bit_local_pred.cc:
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst.hh:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/bpred_unit.cc:
src/cpu/o3/btb.hh:
src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/free_list.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/sat_counter.hh:
src/cpu/op_class.hh:
src/cpu/ozone/cpu.hh:
src/cpu/checker/cpu.cc:
src/cpu/checker/cpu.hh:
src/cpu/checker/exec_context.hh:
src/cpu/checker/o3_cpu_builder.cc:
src/cpu/ozone/cpu_impl.hh:
src/mem/request.hh:
src/cpu/o3/fu_pool.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/back_end.hh:
src/cpu/ozone/dyn_inst.cc:
src/cpu/ozone/dyn_inst.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/inorder_back_end.hh:
src/cpu/ozone/lw_back_end.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/ozone_impl.hh:
src/cpu/ozone/thread_state.hh:
Fixes to get compiling to work.
src/cpu/o3/alpha_cpu.hh:
Fixes to get compiling to work.
Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs.
src/cpu/o3/alpha_dyn_inst_impl.hh:
Fixes to get compiling to work.
Pass in the packet to the completeAcc function.
Fix up syscall function.


# 2665:a124942bacb8 31-May-2006 Ali Saidi <saidi@eecs.umich.edu>

Updated Authors from bk prs info


# 2654:9559cfa91b9d 30-May-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/m5
into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem

SConstruct:
src/SConscript:
src/arch/SConscript:
src/arch/alpha/faults.cc:
src/arch/alpha/tlb.cc:
src/base/traceflags.py:
src/cpu/SConscript:
src/cpu/base.cc:
src/cpu/base.hh:
src/cpu/base_dyn_inst.cc:
src/cpu/cpu_exec_context.cc:
src/cpu/cpu_exec_context.hh:
src/cpu/exec_context.hh:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/regfile.hh:
src/cpu/ozone/cpu.hh:
src/cpu/simple/base.cc:
src/cpu/base_dyn_inst.hh:
src/cpu/o3/2bit_local_pred.cc:
src/cpu/o3/2bit_local_pred.hh:
src/cpu/o3/alpha_cpu.cc:
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_dyn_inst.cc:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/alpha_impl.hh:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/bpred_unit.cc:
src/cpu/o3/bpred_unit.hh:
src/cpu/o3/bpred_unit_impl.hh:
src/cpu/o3/btb.cc:
src/cpu/o3/btb.hh:
src/cpu/o3/comm.hh:
src/cpu/o3/commit.cc:
src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu_policy.hh:
src/cpu/o3/decode.cc:
src/cpu/o3/decode.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch.cc:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/free_list.cc:
src/cpu/o3/free_list.hh:
src/cpu/o3/iew.cc:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.cc:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/mem_dep_unit.cc:
src/cpu/o3/mem_dep_unit.hh:
src/cpu/o3/mem_dep_unit_impl.hh:
src/cpu/o3/ras.cc:
src/cpu/o3/ras.hh:
src/cpu/o3/rename.cc:
src/cpu/o3/rename.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/rename_map.cc:
src/cpu/o3/rename_map.hh:
src/cpu/o3/rob.cc:
src/cpu/o3/rob.hh:
src/cpu/o3/rob_impl.hh:
src/cpu/o3/sat_counter.cc:
src/cpu/o3/sat_counter.hh:
src/cpu/o3/store_set.cc:
src/cpu/o3/store_set.hh:
src/cpu/o3/tournament_pred.cc:
src/cpu/o3/tournament_pred.hh:
Hand merges.


# 2632:1bb2f91485ea 22-May-2006 Steve Reinhardt <stever@eecs.umich.edu>

New directory structure:
- simulator source now in 'src' subdirectory
- imported files from 'ext' repository
- support building in arbitrary places, including
outside of the source tree. See comment at top
of SConstruct file for more details.
Regression tests are temporarily disabled; that
syetem needs more extensive revisions.

SConstruct:
Update for new directory structure.
Modify to support build trees that are not subdirectories
of the source tree. See comment at top of file for
more details.
Regression tests are temporarily disabled.
src/arch/SConscript:
src/arch/isa_parser.py:
src/python/SConscript:
Update for new directory structure.