History log of /gem5/src/cpu/o3/lsq_unit.hh
Revision Date Author Comments
# 14083:057fe59ed45a 12-Jul-2019 Pouya Fotouhi <Pouya.Fotouhi@amd.com>

cpu-o3: Set packet data type for IPR read

This change assigns packet data type to static for IPR read.
Caused by change (e13d6dc9c0d7a4ae0215f1ee6793eb32570c5169),
and has been reported a few times in the mailing list.

Change-Id: I0f02c20a16824e220df876e9e552bbc1c9636f95
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19449
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 14030:a58e14bf581c 08-Feb-2018 Gabor Dozsa <gabor.dozsa@arm.com>

cpu-o3: Increase LSQ buffer sizes to match max vector length

Change-Id: I5890c7cfa147125ce3389001f85d56d4b5a9911d
Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13525
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>


# 13831:4fba790d88be 06-Mar-2019 Andrea Mondelli <Andrea.Mondelli@ucf.edu>

misc: Removed inconsistency in O3* debug msgs

Added consistency in the DEBUG message form, to allow a better parsing.
Fixed sn/tid type parameter.
Removed some annoying newlines

Change-Id: I4761c49fc12b874a7d8b46779475b606865cad4b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17248
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13652:45d94ac03a27 22-Jan-2018 Tuan Ta <qtt2@cornell.edu>

cpu: support atomic memory request type with AtomicOpFunctor

This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.

Atomic memory instruction is treated as a special store instruction in
all CPU models.

In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.

In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.

In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.

This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.

This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.

Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13590:d7e018859709 13-Feb-2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com>

cpu-o3: O3 LSQ Generalisation

This patch does a large modification of the LSQ in the O3 model. The
main goal of the patch is to remove the 'an operation can be served with
one or two memory requests' assumption that is present in the LSQ
and the instruction with the req, reqLow, reqHigh triplet, and
generalising it to operations that can be addressed with one request,
and operations that require many requests, embodied in the
SingleDataRequest and the SplitDataRequest.

This modification has been done mimicking the minor model to an extent,
shifting the responsibilities of dealing with VtoP translation and
tracking the status and resources from the DynInst to the LSQ via the
LSQRequest. The LSQRequest models the information concerning the
operation, handles the creation of fragments for translation and request
as well as assembling/splitting the data accordingly.

With this modifications, the implementation of vector ISAs, particularly
on the memory side, become more rich, as the new model permits a
dissociation of the ISA characteristics as vector length, from the
microarchitectural characteristics that govern how contiguous loads are
executing, allowing exploration of different LSQ to DL1 bus widths to
understand the tradeoffs in complexity and performance.

Part of the complexities introduced stem from the fact that gem5 keeps a
large amount of metadata regarding, in particular, memory operations,
thus, when an instruction is squashed while some operation as TLB lookup
or cache access is ongoing, when the relevant structure communicates to
the LSQ that the operation is over, it tries to access some pieces of
data that should have died when the instruction is squashed, leading to
asserts, panics, or memory corruption. To ensure the correct behaviour,
the LSQRequest rely on assesing who is their owner, and self-destroying
if they detect their owner is done with the request, and there will be
no subsequent action. For example, in the case of an instruction
squashed whal the TLB is doing a walk to serve the translation, when the
translation is served by the TLB, the LSQRequest detects that the
instruction was squashed, and as the translation is done, no one else
expect to access its information, and therefore, it self-destructs.
Having destroyed the LSQRequest earlier, would lead to wrong behaviour
as the TLB walk may access some fields of it.

Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>

Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13516
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>


# 13472:7ceacede4f1e 01-Mar-2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com>

cpu: Change raw pointers to STL Containers

This patch changes two members from being raw pointers to being STL
containers. The reason behind, other than cleanlyness and arguable OO
best practices is that containers have more intronspections capabilities
than naked pointers do, as the size is known.

Using STL containers adds little overhead and eases the automation of
process during debugging (gdb).

Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13126
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 13429:a1e199fd8122 06-Feb-2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com>

cpu: Fix the usage of const DynInstPtr

Summary: Usage of const DynInstPtr& when possible and introduction of
move operators to RefCountingPtr.

In many places, scoped references to dynamic instructions do a copy of
the DynInstPtr when a reference would do. This is detrimental to
performance. On top of that, in case there is a need for reference
tracking for debugging, the redundant copies make the process much more
painful than it already is.

Also, from the theoretical point of view, a function/method that
defines a convenience name to access an instruction should not be
considered an owner of the data, i.e., doing a copy and not a reference
is not justified.

On a related topic, C++11 introduces move semantics, and those are
useful when, for example, there is a class modelling a HW structure that
contains a list, and has a getHeadOfList function, to prevent doing a
copy to an internal variable -> update pointer, remove from the list ->
update pointer, return value making a copy to the assined variable ->
update pointer, destroy the returned value -> update pointer.

Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13105
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 12749:223c83ed9979 04-Jun-2018 Giacomo Travaglini <giacomo.travaglini@arm.com>

misc: Using smart pointers for memory Requests

This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.

Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12748:ae5ce8e42de7 03-Jun-2018 Giacomo Travaglini <giacomo.travaglini@arm.com>

misc: Substitute pointer to Request with aliased RequestPtr

Every usage of Request* in the code has been replaced with the
RequestPtr alias. This is a preparing patch for when RequestPtr will be
the typdefed to a smart pointer to Request rather then a raw pointer to
Request.

Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10995
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>


# 12355:568ec3a0c614 07-Feb-2017 Nikos Nikoleris <nikos.nikoleris@arm.com>

cpu: Add support for CMOs in the cpu models

Cache maintenance operations go through the write channel of the
cpu. This changes makes sure that the cpu does not try to fill in the
packet with data.

Change-Id: Ic83205bb1cda7967636d88f15adcb475eb38d158
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5055
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>


# 12171:b11b56bba18f 28-Aug-2017 Matthias Hille <matthiashille8@gmail.com>

cpu-o3: fix data pkt initialization for split load

When a split load hits a memory region where IPRs are mapped, the
Writebackevent which is scheduled for that was carrying a data packet
that was not correctly initialized which caused an assertion to fire
when the Writeback event is processed.

Change-Id: I71a4e291f0086f7468d7e8124a0a8f098088972f
Signed-off-by: Matthias Hille <matthiashille8@gmail.com>
Reported-by: Matthias Hille <matthiashille8@gmail.com>
Reviewed-on: https://gem5-review.googlesource.com/4620
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 12022:256a709054f3 09-Apr-2017 Alec Roelke <ar4jc@virginia.edu>

cpu: fix problem with forwarding and locked load

If a (regular) store is followed closely enough by a locked load that
overlaps, the LSQ will forward the store's data to the locked load and
never tell the cache about the locked load. As a result, the cache will
not lock the address and all future store-conditional requests on that
address will fail. This patch fixes that by preventing forwarding if
the memory request is a locked load and adding another case to the LSQ
forwarding logic that delays the locked load request if a store in the
LSQ contains all or part of the data that is requested.

[Merge second and last if blocks because their bodies are the same.]

Change-Id: I895cc2b9570035267bdf6ae3fdc8a09049969841
Reviewed-on: https://gem5-review.googlesource.com/2400
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 11780:9af039ea0c1e 21-Dec-2016 Arthur Perais <arthur.perais@inria.fr>

cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3

cachePorts currently constrains the number of store packets written to the
D-Cache each cycle), but loads currently affect this variable. This leads
to unexpected congestion (e.g., setting cachePorts to a realistic 1 will
in fact allow a store to WB only if no loads have accessed the D-Cache
this cycle). In the absence of arbitration, this patch decouples how many
loads can be done per cycle from how many stores can be done per cycle.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>


# 11302:bce9037689b0 17-Jan-2016 Steve Reinhardt <steve.reinhardt@amd.com>

cpu: remove unnecessary data ptr from O3 internal read() funcs

The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system. Thus there's no need to provide
a data pointer; its existence is historical.

Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface. Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).


# 11168:f98eb2da15a4 12-Oct-2015 Andreas Hansson <andreas.hansson@arm.com>

misc: Remove redundant compiler-specific defines

This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.


# 10824:308771bd2647 05-May-2015 Andreas Sandberg <Andreas.Sandberg@ARM.com>

mem, cpu: Add a separate flag for strictly ordered memory

The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

* Speculation: CPU models are not allowed to issue speculative
loads.

* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.


# 10537:47fe87b0cf97 14-Nov-2014 Andreas Hansson <andreas.hansson@arm.com>

arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.


# 10474:799c8ee4ecba 16-Oct-2014 Andreas Hansson <andreas.hansson@arm.com>

arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".


# 10342:711eb0e64249 13-May-2014 Curtis Dunham <Curtis.Dunham@arm.com>

mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.


# 10333:6be8945d226b 03-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width). This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.


# 10327:5b6279635c49 03-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

cpu: Change writeback modeling for outstanding instructions

As highlighed on the mailing list gem5's writeback modeling can impact
performance. This patch removes the limitation on maximum outstanding issued
instructions, however the number that can writeback in a single cycle is still
respected in instToCommit().


# 10239:592f0bb6bd6f 21-Jun-2014 Binh Pham <binhpham@cs.rutgers.edu>

o3: split load & store queue full cases in rename

Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.

This work was done while Binh was an intern at AMD Research.


# 10175:e639ff917d2e 01-Apr-2014 Mitch Hayenga <Mitch.Hayenga@ARM.com>

cpu: Fix case where o3 lsq could print out uninitialized data

In the O3 LSQ, data read/written is printed out in DPRINTFs. However,
the data field is treated as a character string with a null terminated.
However the data field is not encoded this way. This patch removes
that possibility by removing the data part of the print.


# 10031:79d034cd6ba3 24-Jan-2014 Ali Saidi <Ali.Saidi@ARM.com>

cpu: Add support for instructions that zero cache lines.


# 9444:ab47fe7f03f0 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Rewrite O3 draining to avoid stopping in microcode

Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.


# 9440:fdc91cab5760 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Fix O3 LSQ debug dumping constness and formatting


# 9358:aa761458ddcb 06-Dec-2012 Nathanael Premillieu <nathanael.premillieu@irisa.fr>

o3 cpu: remove some unused buggy functions in the lsq
Committed by: Nilay Vaish <nilay@cs.wisc.edu>


# 9180:ee8d7a51651d 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Add a Cycles wrapper class and use where applicable

This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.


# 9179:666bc9df1e49 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Rework clocks to avoid tick-to-cycle transformations

This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.

In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.

Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.

As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.


# 9152:86c0e6ca5e7c 15-Aug-2012 Anthony Gutierrez <atgutier@umich.edu>

O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs

This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.

This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.


# 9046:a1104cc13db2 05-Jun-2012 Ali Saidi <Ali.Saidi@ARM.com>

O3: Clean up the O3 structures and try to pack them a bit better.

DynInst is extremely large the hope is that this re-organization will put the
most used members close to each other.


# 9044:904ddeecc653 05-Jun-2012 Ali Saidi <Ali.Saidi@ARM.com>

sim: Remove FastAlloc

While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.


# 8975:7f36d4436074 01-May-2012 Andreas Hansson <andreas.hansson@arm.com>

MEM: Separate requests and responses for timing accesses

This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.

For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).

The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.

With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.


# 8949:3fa1ee293096 14-Apr-2012 Andreas Hansson <andreas.hansson@arm.com>

MEM: Remove the Broadcast destination from the packet

This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.

Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).

The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.

In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.


# 8922:17f037ad8918 30-Mar-2012 William Wang <william.wang@arm.com>

MEM: Introduce the master/slave port sub-classes in C++

This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.

The patch enables us to classify behaviours into the two bins and add
assumptions and enfore compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specilisations
are to come in later patches.

The getPort function is not getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.

The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.


# 8817:c36441eed919 07-Feb-2012 Gabe Black <gblack@eecs.umich.edu>

Faults: Turn off arch/faults.hh

Because there are no longer architecture independent but specialized functions
in arch/XXX/faults.hh, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non ISA code is narrowed.


# 8809:bb10807da889 01-Feb-2012 Gabe Black <gblack@eecs.umich.edu>

Merge with head, hopefully the last time for this batch.


# 8807:35e77c938919 29-Jan-2012 Gabe Black <gblack@eecs.umich.edu>

Yet another merge with the main repository.


# 8794:e2ac2b7164dd 18-Nov-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Get rid of includes of config/full_system.hh.


# 8737:770ccf3af571 31-Jan-2012 Koan-Sin Tan <koansin.tan@gmail.com>

clang: Enable compiling gem5 using clang 2.9 and 3.0

This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).

clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.


# 8727:b3995530319f 28-Jan-2012 Nilay Vaish <nilay@cs.wisc.edu>

O3 CPU LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system. By default, the x86 architecture will have TSO.


# 8591:8f23aeaf6a91 27-Sep-2011 Gabe Black <gblack@eecs.umich.edu>

Faults: Replace calls to genMachineCheckFault with M5PanicFault.


# 8545:a3992291e230 13-Sep-2011 Ali Saidi <saidi@eecs.umich.edu>

LSQ: Only trigger a memory violation with a load/load if the value changes.

Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently beacuse a translation will miss and start a table walk and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block depending on the dendency
checking).

This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.

The performance improvement on SPEC benchmarks can be substantial (2-10%).


# 8506:5a9c6f49f882 16-Aug-2011 Gabe Black <gblack@eecs.umich.edu>

O3: Make lsq_unit.hh include arch/isa_traits.hh directly, not transitively.


# 8481:818aea9960f5 31-Jul-2011 Gabe Black <gblack@eecs.umich.edu>

O3: Implement memory mapped IPRs for O3.


# 8316:6fd588813142 23-May-2011 Geoffrey Blake <geoffrey.blake@arm.com>

O3: Fix offset calculation into storeQueue buffer for store->load forwarding

Calculation of offset to copy from storeQueue[idx].data structure for load to
store forwarding fixed to be difference in bytes between store and load virtual
addresses. Previous method would induce bug where a load would index into
buffer at the wrong location.


# 8315:6173b87e7652 23-May-2011 Geoffrey Blake <geoffrey.blake@arm.com>

O3: Fix issue w/wbOutstading being decremented multiple times on blocked cache.

If a split load fails on a blocked cache wbOutstanding can be decremented
twice if the first part of the split load succeeds and the second part fails.
Condition the decrementing on not having completed the first part of the load.


# 8232:b28d06a175be 15-Apr-2011 Nathan Binkert <nate@binkert.org>

trace: reimplement the DTRACE function so it doesn't use a vector
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help


# 8230:845c8eb5ac49 15-Apr-2011 Nathan Binkert <nate@binkert.org>

includes: fix up code after sorting


# 8229:78bf55f23338 15-Apr-2011 Nathan Binkert <nate@binkert.org>

includes: sort all includes


# 8199:3d6c08c877a9 04-Apr-2011 Ali Saidi <Ali.Saidi@ARM.com>

O3: Tighten memory order violation checking to 16 bytes.

The comment in the code suggests that the checking granularity should be 16
bytes, however in reality the shift by 8 is 256 bytes which seems much
larger than required.


# 7823:dac01f14f20f 08-Jan-2011 Steve Reinhardt <steve.reinhardt@amd.com>

Replace curTick global variable with accessor functions.
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.


# 7786:bafa8a197088 07-Dec-2010 Ali Saidi <Ali.Saidi@ARM.com>

O3: Allow a store entry to store up to 16 bytes (instead of TheISA::IntReg).

The store queue doesn't need to be ISA specific and architectures can
frequently store more than an int registers worth of data. A 128 bits seems
more common, but even 256 bits may be appropriate. Pretty much anything less
than a cache line size is buildable.


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.


# 7598:c0ae58952ed0 23-Aug-2010 Min Kyu Jeong <minkyu.jeong@arm.com>

O3: Handle loads when the destination is the PC.
For loads that PC is the destination, check if the load
was mispredicted again when the value being loaded returns from memory


# 7520:67c670459d01 13-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

CPU: Add readBytes and writeBytes functions to the exec contexts.


# 7511:bd104adbf04d 22-Jul-2010 Timothy M. Jones <tjones1@inf.ed.ac.uk>

LSQ Unit: After deleting part of a split request, set it to NULL so that it
isn't accidentally deleted again later (causing a segmentation fault).


# 7509:3bd51d6ac9ef 22-Jul-2010 Timothy M. Jones <tjones1@inf.ed.ac.uk>

O3CPU: Fix a bug where stores in the cpu where never marked as split.


# 6974:4d4903a3e7c5 12-Feb-2010 Timothy M. Jones <tjones1@inf.ed.ac.uk>

O3PCU: Split loads and stores that cross cache line boundaries.

When each load or store is sent to the LSQ, we check whether it will cross a
cache line boundary and, if so, split it in two. This creates two TLB
translations and two memory requests. Care has to be taken if the first
packet of a split load is sent but the second blocks the cache. Similarly,
for a store, if the first packet cannot be sent, we must store the second
one somewhere to retry later.

This modifies the LSQSenderState class to record both packets in a split
load or store.

Finally, a new const variable, HasUnalignedMemAcc, is added to each ISA
to indicate whether unaligned memory accesses are allowed. This is used
throughout the changed code so that compiler can optimise away code dealing
with split requests for ISAs that don't need them.


# 6658:f4de76601762 23-Sep-2009 Nathan Binkert <nate@binkert.org>

arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh


# 6221:58a3c04e6344 26-May-2009 Nathan Binkert <nate@binkert.org>

types: add a type for thread IDs and try to use it everywhere


# 6102:7fbf97dc6540 20-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

Mem: Change isLlsc to isLLSC.


# 6076:e141cc7896ce 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

Memory: Rename LOCKED for load locked store conditional to LLSC.


# 5999:3cf8e71257e0 05-Mar-2009 Nathan Binkert <nate@binkert.org>

stats: Fix all stats usages to deal with template fixes


# 5606:6da7a58b0bc8 09-Oct-2008 Nathan Binkert <nate@binkert.org>

eventq: convert all usage of events to use the new API.
For now, there is still a single global event queue, but this is
necessary for making the steps towards a parallelized m5.


# 5529:9ae69b9cd7fd 11-Aug-2008 Nathan Binkert <nate@binkert.org>

params: Convert the CPU objects to use the auto generated param structs.
A whole bunch of stuff has been converted to use the new params stuff, but
the CPU wasn't one of them. While we're at it, make some things a bit
more stylish. Most of the work was done by Gabe, I just cleaned stuff up
a bit more at the end.


# 5386:5614618f4027 24-Mar-2008 Steve Reinhardt <stever@gmail.com>

Don't FastAlloc MSHRs since we don't allocate them on the fly.


# 5336:c7e21f4e5a2e 06-Feb-2008 Stephen Hines <hines@cs.fsu.edu>

Make the Event::description() a const function


# 4878:5b747482d2d8 30-Jun-2007 Steve Reinhardt <stever@eecs.umich.edu>

Make CPU models use new LoadLockedReq/StoreCondReq commands.


# 4870:fcc39d001154 30-Jun-2007 Steve Reinhardt <stever@eecs.umich.edu>

Get rid of Packet result field. Error responses are
now encoded in cmd field.


# 4395:9acb011a6c35 21-Apr-2007 Ali Saidi <saidi@eecs.umich.edu>

fixes for solaris compile


# 4332:548ef28989b8 04-Apr-2007 Gabe Black <gblack@eecs.umich.edu>

Merge zizzer.eecs.umich.edu:/bk/newmem
into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-o3-spec


# 4329:52057dbec096 04-Apr-2007 Kevin Lim <ktlim@umich.edu>

Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU functions.

src/cpu/o3/alpha/cpu_impl.hh:
Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions.


# 4326:a9277254c1e4 03-Apr-2007 Gabe Black <gblack@eecs.umich.edu>

Made the "data" field of store queue entries into a character array. It's sized to match an IntReg which was what it used to be, but we might want to make it something architecture independent. All data is now endian converted before entering the store queue entries which simplifies store to load forwarding in "trans endian" simulations, and makes twin memory ops work.

src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
fixed twin memory operations.


# 4284:c8800319ed0c 23-Mar-2007 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zamp.eecs.umich.edu:/z/ktlim2/clean/tmp/clean2

src/cpu/base_dyn_inst.hh:
Hand merge. Line is no longer needed because it's handled in the ISA.


# 4032:8b987a6a2afc 23-Mar-2007 Kevin Lim <ktlim@umich.edu>

Two fixes:
1. Requests are handled more properly now. They assume the memory system takes control of the request upon sending out an access.
2. load-load ordering is maintained.

src/cpu/base_dyn_inst.hh:
Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.

Also include some functions to allow certain status bits to be cleared.
src/cpu/base_dyn_inst_impl.hh:
Update how requests are handled. The BaseDynInst should not be able to hold a pointer to the request because the request becomes owned by the memory system once it is sent out.
src/cpu/o3/fetch_impl.hh:
General correctness fixes. retryPkt is not necessarily always set, so handle it properly. Also consider the cache unblocked only when recvRetry is called.
src/cpu/o3/lsq_unit.hh:
Handle requests a little more correctly. Now that the requests aren't pointed to by the DynInst, be sure to delete the request if it's not being used by the memory system.

Also be sure to not store-load forward from an uncacheable store.
src/cpu/o3/lsq_unit_impl.hh:
Check to make sure load-load ordering was maintained.

Also handle requests a little more correctly.


# 4022:c422464ca16e 07-Feb-2007 Steve Reinhardt <stever@eecs.umich.edu>

Make memory commands dense again to avoid cache stat table explosion.
Created MemCmd class to wrap enum and provide handy methods to
check attributes, convert to string/int, etc.


# 3970:d54945bab95d 03-Jan-2007 Gabe Black <gblack@eecs.umich.edu>

Merge zizzer:/bk/newmem
into zower.eecs.umich.edu:/eecshome/m5/newmem


# 3876:127c71cfe21a 26-Dec-2006 Kevin Lim <ktlim@umich.edu>

Remove some #if FULL_SYSTEMs so MP stuff works even in SE mode.


# 3803:031d9d1b3924 16-Dec-2006 Gabe Black <gblack@eecs.umich.edu>

Switch the endianness of data that's forwarded. This is the same sort of problem that was happening when stores went all the way to memory and back.


# 3411:07ea0d74b798 23-Oct-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem


# 3349:fec4a86fa212 20-Oct-2006 Nathan Binkert <binkertn@umich.edu>

Use PacketPtr everywhere


# 3348:11f6ef023158 20-Oct-2006 Nathan Binkert <binkertn@umich.edu>

refactor code for the packet, get rid of packet_impl.hh
and call it packet_access.hh and fix the #includes so
things compile right.


# 3326:d9cc6bae9d77 23-Oct-2006 Kevin Lim <ktlim@umich.edu>

Add in support for LL/SC in the O3 CPU. Needs to be fully tested.

src/cpu/base_dyn_inst.hh:
Extend BaseDynInst a little bit so it can be use as a TC as well (specifically for ll/sc code).
src/cpu/base_dyn_inst_impl.hh:
Add variable to track if the result of the instruction should be recorded.
src/cpu/o3/alpha/cpu_impl.hh:
Clear lock flag upon hwrei.
src/cpu/o3/lsq_unit.hh:
Use ISA specified handling of locked reads.
src/cpu/o3/lsq_unit_impl.hh:
Use ISA specified handling of locked writes.


# 3230:e86a03911728 09-Oct-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem

src/cpu/memtest/memtest.cc:
src/cpu/memtest/memtest.hh:
src/cpu/simple/timing.hh:
tests/configs/o3-timing-mp.py:
Hand merge.


# 3228:f47f69e61ded 09-Oct-2006 Kevin Lim <ktlim@umich.edu>

Be sure to delete packet and sender state if the cache is blocked.

src/cpu/o3/lsq_unit.hh:
Be sure to delete data if the cache is blocked.


# 3221:669a04468c0d 08-Oct-2006 Kevin Lim <ktlim@umich.edu>

Updates to O3 CPU. It should now work in FS mode, although sampling still has a bug.

src/cpu/o3/commit_impl.hh:
Fixes for compile and sampling.
src/cpu/o3/cpu.cc:
Deallocate and activate threads properly. Also hopefully fix being able to use caches while switching over.
src/cpu/o3/cpu.hh:
Fixes for deallocating and activating threads.
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit.hh:
Handle getting back a BadAddress result from the access.
src/cpu/o3/iew_impl.hh:
More debug output.
src/cpu/o3/lsq_unit_impl.hh:
Fixup store conditional handling (still a bit of a hack, but works now).

Also handle getting back a BadAddress result from the access.
src/cpu/o3/thread_context_impl.hh:
Deallocate context now records if the context should be fully removed.


# 3172:2c84db071850 08-Oct-2006 Steve Reinhardt <stever@eecs.umich.edu>

Replace tests of LOCKED/UNCACHEABLE flags with isLocked()/isUncacheable().


# 3126:756092c6383c 02-Oct-2006 Kevin Lim <ktlim@umich.edu>

Updates to fix merge issues and bring almost everything up to working speed. Ozone CPU remains untested, but everything else compiles and runs.

src/arch/alpha/isa_traits.hh:
This got changed to the wrong version by accident.
src/cpu/base.cc:
Fix up progress event to not schedule itself if the interval is set to 0.
src/cpu/base.hh:
Fix up the CPU Progress Event to not print itself if it's set to 0. Also remove stats_reset_inst (something I added to m5 but isn't necessary here).
src/cpu/base_dyn_inst.hh:
src/cpu/checker/cpu.hh:
Remove float variable of instResult; it's always held within the double part now.
src/cpu/checker/cpu_impl.hh:
Use thread and not cpuXC.
src/cpu/o3/alpha/cpu_builder.cc:
src/cpu/o3/checker_builder.cc:
src/cpu/ozone/checker_builder.cc:
src/cpu/ozone/cpu_builder.cc:
src/python/m5/objects/BaseCPU.py:
Remove stats_reset_inst.
src/cpu/o3/commit_impl.hh:
src/cpu/ozone/lw_back_end_impl.hh:
Get TC, not XCProxy.
src/cpu/o3/cpu.cc:
Switch out updates from the version of m5 I have. Also remove serialize code that got added twice.
src/cpu/o3/iew_impl.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/thread_state.hh:
Remove code that was added twice.
src/cpu/o3/lsq_unit.hh:
Add back in stats that got lost in the merge.
src/cpu/o3/lsq_unit_impl.hh:
Use proper method to get flags. Also wake CPU if we're coming back from a cache miss.
src/cpu/o3/thread_context_impl.hh:
src/cpu/o3/thread_state.hh:
Support profiling.
src/cpu/ozone/cpu.hh:
Update to use proper typename.
src/cpu/ozone/cpu_impl.hh:
src/cpu/ozone/dyn_inst_impl.hh:
Updates for newmem.
src/cpu/ozone/lw_lsq_impl.hh:
Get flags correctly.
src/cpu/ozone/thread_state.hh:
Reorder constructor initialization, use tc.
src/sim/pseudo_inst.cc:
Allow for loading of symbol file. Be sure to use ThreadContext and not ExecContext.


# 3125:febd811bccc6 30-Sep-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zamp:./local/clean/o3-merge/m5
into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem

configs/boot/micro_memlat.rcS:
configs/boot/micro_tlblat.rcS:
src/arch/alpha/ev5.cc:
src/arch/alpha/isa/decoder.isa:
src/arch/alpha/isa_traits.hh:
src/cpu/base.cc:
src/cpu/base.hh:
src/cpu/base_dyn_inst.hh:
src/cpu/checker/cpu.hh:
src/cpu/checker/cpu_impl.hh:
src/cpu/o3/alpha/cpu_impl.hh:
src/cpu/o3/alpha/params.hh:
src/cpu/o3/checker_builder.cc:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/checker_builder.cc:
src/cpu/ozone/cpu.hh:
src/cpu/ozone/cpu_impl.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/front_end_impl.hh:
src/cpu/ozone/lw_back_end.hh:
src/cpu/ozone/lw_back_end_impl.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/lw_lsq_impl.hh:
src/cpu/ozone/thread_state.hh:
src/cpu/simple/base.cc:
src/cpu/simple_thread.cc:
src/cpu/simple_thread.hh:
src/cpu/thread_state.hh:
src/dev/ide_disk.cc:
src/python/m5/objects/O3CPU.py:
src/python/m5/objects/Root.py:
src/python/m5/objects/System.py:
src/sim/pseudo_inst.cc:
src/sim/pseudo_inst.hh:
src/sim/system.hh:
util/m5/m5.c:
Hand merge.


# 3014:b4309193255a 16-Aug-2006 Ron Dreslinski <rdreslin@umich.edu>

Fixes for Kevins O3 model to work with the blocking caches.

src/cpu/o3/fetch_impl.hh:
Fix ordering so dereference works
src/cpu/o3/lsq_impl.hh:
Check to make sure we didn't squash already
src/cpu/o3/lsq_unit.hh:
Fix for counting squashed retrys in the WB count
src/cpu/o3/lsq_unit_impl.hh:
Make sure to set retryID for stores, and clear it appropriately


# 2927:62f1518ae800 19-Jul-2006 Kevin Lim <ktlim@umich.edu>

O3CPU fixes.

src/cpu/o3/lsq_unit.hh:
LSQ needs to decrement the WB counter if the load is going to be replayed.
src/cpu/o3/lsq_unit_impl.hh:
LSQ needs to decrement the WB counter if the load is squashed.


# 2907:7b0ababb4166 13-Jul-2006 Kevin Lim <ktlim@umich.edu>

Move Dcache port creation from LSQUnit to LSQ in order to support Ron's recent changes, and using the O3CPU in SMT mode.

src/cpu/o3/lsq.hh:
Update to have LSQ work with only one dcache port for all LSQ Units. LSQ has the dcache port, and the LSQ Units must tell the LSQ if the cache has become blocked.
src/cpu/o3/lsq_impl.hh:
Updates to have the LSQ work with only one dcache port for all LSQUnits.
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
Update for LSQ to create dcache port instead of LSQUnits. Now LSQUnits are given the dcache port from the LSQ, and also must check the LSQ if the cache is blocked prior to accessing the cache.


# 2871:7ed5c9ef3eb6 07-Jul-2006 Kevin Lim <ktlim@umich.edu>

Support Ron's changes for hooking up ports.

src/cpu/checker/cpu.hh:
Now that BaseCPU is a MemObject, the checker must define this function.
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_unit.hh:
Implement getPort function so the connector can connect the ports properly.
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit_impl.hh:
The connector handles connecting the ports now.
src/python/m5/objects/O3CPU.py:
Add ports to the parameters.


# 2808:a88ea76f6738 27-Jun-2006 Ali Saidi <saidi@eecs.umich.edu>

Make full CPU handle SE faults


# 2790:2f8e9762bee9 22-Jun-2006 Kevin Lim <ktlim@umich.edu>

Misc fixes.

src/cpu/o3/alpha_dyn_inst_impl.hh:
Consolidate these calls into one.
src/cpu/o3/commit_impl.hh:
Include checker only if it's being used.
src/cpu/o3/fetch_impl.hh:
Do not deallocate request if it's a squashed response that was received.
src/cpu/o3/lsq_unit.hh:
Add in comment.
src/cpu/o3/lsq_unit_impl.hh:
Only include checker if it's being used.


# 2733:e0eac8fc5774 16-Jun-2006 Kevin Lim <ktlim@umich.edu>

Two updates that got combined into one ChangeSet accidentally. They're both pretty simple so they shouldn't cause any trouble.

First: Rename FullCPU and its variants in the o3 directory to O3CPU to differentiate from the old model, and also to specify it's an out of order model.

Second: Include build options for selecting the Checker to be used. These options make sure if the Checker is being used there is a CPU that supports it also being compiled.

SConstruct:
Add in option USE_CHECKER to allow for not compiling in checker code. The checker is enabled through this option instead of through the CPU_MODELS list. However it's still necessary to treat the Checker like a CPU model, so it is appended onto the CPU_MODELS list if enabled.
configs/test/test.py:
Name change for DetailedCPU to DetailedO3CPU. Also include option for max tick.
src/base/traceflags.py:
Add in O3CPU trace flag.
src/cpu/SConscript:
Rename AlphaFullCPU to AlphaO3CPU.

Only include checker sources if they're necessary. Also add a list of CPUs that support the Checker, and only allow the Checker to be compiled in if one of those CPUs are also being included.
src/cpu/base_dyn_inst.cc:
src/cpu/base_dyn_inst.hh:
Rename typedef to ImplCPU instead of FullCPU, to differentiate from the old FullCPU.
src/cpu/cpu_models.py:
src/cpu/o3/alpha_cpu.cc:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_cpu_impl.hh:
Rename AlphaFullCPU to AlphaO3CPU to differentiate from old FullCPU model.
src/cpu/o3/alpha_dyn_inst.hh:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/alpha_impl.hh:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/commit.hh:
src/cpu/o3/cpu.hh:
src/cpu/o3/decode.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/rename.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/rob.hh:
src/cpu/o3/rob_impl.hh:
src/cpu/o3/thread_state.hh:
src/python/m5/objects/AlphaO3CPU.py:
Rename FullCPU to O3CPU to differentiate from old FullCPU model.
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit_impl.hh:
Rename FullCPU to O3CPU to differentiate from old FullCPU model.
Also #ifdef the checker code so it doesn't need to be included if it's not selected.


# 2731:822b96578fba 14-Jun-2006 Kevin Lim <ktlim@umich.edu>

Minor code cleanup of BaseDynInst.

src/cpu/base_dyn_inst.cc:
src/cpu/base_dyn_inst.hh:
Minor code cleanup by putting several bools into a bitset instead.
src/cpu/o3/commit_impl.hh:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/rob_impl.hh:
Changed around some things in BaseDynInst.


# 2727:91e17c7ee622 13-Jun-2006 Kevin Lim <ktlim@umich.edu>

Minor updates for stats.

src/cpu/o3/commit_impl.hh:
src/cpu/o3/fetch.hh:
Update stats comments.
src/cpu/o3/fetch_impl.hh:
Differentiate stats.
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/inst_queue_impl.hh:
Update for stats.
src/cpu/o3/lsq.hh:
LSQ now has stats.
src/cpu/o3/lsq_impl.hh:
Register stats of all LSQ units.
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
Add in stats.


# 2698:d5f35d41e017 09-Jun-2006 Kevin Lim <ktlim@umich.edu>

Removing of old code and adding in new comments.

src/cpu/base_dyn_inst.cc:
Clean up old functions, comments.
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/ozone/lsq_unit.hh:
src/cpu/ozone/lsq_unit_impl.hh:
Remove old commented code.
src/cpu/o3/fetch.hh:
Remove old commented code, add in comments.
src/cpu/o3/inst_queue_impl.hh:
Move comment to better place.
src/cpu/o3/lsq_unit.hh:
Remove old commented code, add in new comments.
src/cpu/o3/lsq_unit_impl.hh:
Remove old commented code, rename variable.


# 2693:18c6be231eb1 09-Jun-2006 Kevin Lim <ktlim@umich.edu>

Fixes for some outstanding issues in the LSQ. It should now be able to retry. It should also be able to handle LL/SC (through hacks) for the UP case.

src/cpu/o3/lsq_unit.hh:
Handle being able to retry (untested but hopefully very close to working).

Handle lock flag for LL/SC hack. Hopefully the memory system will add in LL/SC soon.

Better output message.
src/cpu/o3/lsq_unit_impl.hh:
Handle being able to retry (untested but should be very close to working).

Make SC's work (hopefully) while the memory system doesn't have a LL/SC implementation.


# 2689:dbf969c18a65 07-Jun-2006 Kevin Lim <ktlim@umich.edu>

Update copyright.


# 2678:1f86b91dc3bb 05-Jun-2006 Kevin Lim <ktlim@umich.edu>

Fixes to get new CPU model working for simple test case. The CPU does not yet support retrying accesses.

src/cpu/base_dyn_inst.cc:
Delete the allocated data in destructor.
src/cpu/base_dyn_inst.hh:
Only copy the addresses if the translation succeeded.
src/cpu/o3/alpha_cpu.hh:
Return actual translating port.
Don't panic on setNextNPC() as it's always called, regardless of the architecture, when the process initializes.
src/cpu/o3/alpha_cpu_impl.hh:
Pass in memobject to the thread state in SE mode.
src/cpu/o3/commit_impl.hh:
Initialize all variables.
src/cpu/o3/decode_impl.hh:
Handle early resolution of branches properly.
src/cpu/o3/fetch.hh:
Switch structure back to requests.
src/cpu/o3/fetch_impl.hh:
Initialize all variables, create/delete requests properly.
src/cpu/o3/lsq_unit.hh:
Include sender state along with the packet. Also include a more generic writeback event that's only used for stores forwarding data to loads.
src/cpu/o3/lsq_unit_impl.hh:
Redo writeback code to support the response path of the memory system.
src/cpu/o3/mem_dep_unit.cc:
src/cpu/o3/mem_dep_unit_impl.hh:
Wrap variables in #ifdefs.
src/cpu/o3/store_set.cc:
Include to get panic() function.
src/cpu/o3/thread_state.hh:
Create with MemObject as well.
src/cpu/thread_state.hh:
Have a translating port in the thread state object.
src/python/m5/objects/AlphaFullCPU.py:
Mem parameter no longer needed.


# 2674:6d4afef73a20 04-Jun-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zamp:/z/ktlim2/clean/m5-o3
into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-merge

src/cpu/checker/o3_cpu_builder.cc:
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst_impl.hh:
src/cpu/o3/bpred_unit.cc:
src/cpu/o3/commit.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/thread_state.hh:
Hand merge.


# 2669:f2b336e89d2a 02-Jun-2006 Kevin Lim <ktlim@umich.edu>

Fixes to get compiling to work. This is mainly fixing up some includes; changing functions within the XCs; changing MemReqPtrs to Requests or Packets where appropriate.

Currently the O3 and Ozone CPUs do not work in the new memory system; I still need to fix up the ports to work and handle responses properly. This check-in is so that the merge between m5 and newmem is no longer outstanding.

src/SConscript:
Need to include FU Pool for new CPU model. I'll try to figure out a cleaner way to handle this in the future.
src/base/traceflags.py:
Include new traces flags, fix up merge mess up.
src/cpu/SConscript:
Include the base_dyn_inst.cc as one of othe sources.
Don't compile the Ozone CPU for now.
src/cpu/base.cc:
Remove an extra } from the merge.
src/cpu/base_dyn_inst.cc:
Fixes to make compiling work. Don't instantiate the OzoneCPU for now.
src/cpu/base_dyn_inst.hh:
src/cpu/o3/2bit_local_pred.cc:
src/cpu/o3/alpha_cpu_builder.cc:
src/cpu/o3/alpha_cpu_impl.hh:
src/cpu/o3/alpha_dyn_inst.hh:
src/cpu/o3/alpha_params.hh:
src/cpu/o3/bpred_unit.cc:
src/cpu/o3/btb.hh:
src/cpu/o3/commit.hh:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/free_list.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/inst_queue_impl.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/sat_counter.hh:
src/cpu/op_class.hh:
src/cpu/ozone/cpu.hh:
src/cpu/checker/cpu.cc:
src/cpu/checker/cpu.hh:
src/cpu/checker/exec_context.hh:
src/cpu/checker/o3_cpu_builder.cc:
src/cpu/ozone/cpu_impl.hh:
src/mem/request.hh:
src/cpu/o3/fu_pool.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/back_end.hh:
src/cpu/ozone/dyn_inst.cc:
src/cpu/ozone/dyn_inst.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/inorder_back_end.hh:
src/cpu/ozone/lw_back_end.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/ozone_impl.hh:
src/cpu/ozone/thread_state.hh:
Fixes to get compiling to work.
src/cpu/o3/alpha_cpu.hh:
Fixes to get compiling to work.
Float reg accessors have changed, as well as MemReqPtrs to RequestPtrs.
src/cpu/o3/alpha_dyn_inst_impl.hh:
Fixes to get compiling to work.
Pass in the packet to the completeAcc function.
Fix up syscall function.