History log of /gem5/src/cpu/simple_thread.cc
Revision Date Author Comments
# 13954:2f400a5f2627 07-Jul-2017 Giacomo Gabrielli <giacomo.gabrielli@arm.com>

cpu,mem: Add support for partial loads/stores and wide mem. accesses

This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.

Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>

Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 13910:d5deee7b4279 28-Apr-2019 Gabe Black <gabeblack@google.com>

cpu: alpha: Delete all occurrances of the simPalCheck function.

This is now handled within the ISA description.

Change-Id: Ie409bb46d102e59d4eb41408d9196fe235626d32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18434
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13908:6ab98c626b06 27-Apr-2019 Gabe Black <gabeblack@google.com>

cpu: Remove hwrei from the generic interfaces.

This mechanism is specific to Alpha and doesn't belong sprinkled around
the CPU's generic mechanisms.

Change-Id: I87904d1a08df2b03eb770205e2c4b94db25201a1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18432
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13865:cca49fc49c57 13-Apr-2019 Gabe Black <gabeblack@google.com>

cpu: Eliminate the ProxyThreadContext class.

Replace it with direct inheritance from the ThreadContext class in the
SimpleThread class which was the only place it was used.

Also take the opportunity to use some specialized types instead of
ints, etc., add some consts, and fix some style issues.

Change-Id: I5d2cfa87b20dc43615e33e6755c9d016564e9c0e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18048
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13759:9941fca869a9 16-Oct-2018 Giacomo Gabrielli <giacomo.gabrielli@arm.com>

arch-arm,cpu: Add initial support for Arm SVE

This changeset adds initial support for the Arm Scalable Vector Extension
(SVE) by implementing:
- support for most data-processing instructions (no loads/stores yet);
- basic system-level support.

Additional authors:
- Javier Setoain <javier.setoain@arm.com>
- Gabor Dozsa <gabor.dozsa@arm.com>
- Giacomo Travaglini <giacomo.travaglini@arm.com>

Thanks to Pau Cabre for his contribution of bugfixes.

Change-Id: I1808b5ff55b401777eeb9b99c9a1129e0d527709
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13515
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 12406:86bde4a026b5 22-Dec-2017 Gabe Black <gabeblack@google.com>

arch,cpu: "virtualize" the TLB interface.

CPUs have historically instantiated the architecture specific version
of the TLBs to avoid a virtual function call, making them a little bit
more dependent on what the current ISA is. Some simple performance
measurement, the x86 twolf regression on the atomic CPU, shows that
there isn't actually any performance benefit, and if anything the
simulator goes slightly faster (although still within margin of error)
when the TLB functions are virtual.

This change switches everything outside of the architectures themselves
to use the generic BaseTLB type, and then inside the ISA for them to
cast that to their architecture specific type to call into architecture
specific interfaces.

The ARM TLB needed the most adjustment since it was using non-standard
translation function signatures. Specifically, they all took an extra
"type" parameter which defaulted to normal, and translateTiming
returned a Fault. translateTiming actually doesn't need to return a
Fault because everywhere that consumed it just stored it into a
structure which it then deleted(?), and the fault is stored in the
Translation object when the translation is done.

A little more work is needed to fully obviate the arch/tlb.hh header,
so the TheISA::TLB type is still visible outside of the ISAs.
Specifically, the TlbEntry type is used in the generic PageTable which
lives in src/mem.

Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575
Reviewed-on: https://gem5-review.googlesource.com/6921
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>


# 12181:2150eff234c1 25-Aug-2017 Gabe Black <gabeblack@google.com>

stats: Get rid of some kernel stats related cruft.

The kernel stat mechanism should really be refactored and moved somewhere
else, but in the mean time there's some old cruft that can be cleared away.

Change-Id: I21e725de590dda0d20bf3bc675bbe976c7b1bd86
Reviewed-on: https://gem5-review.googlesource.com/4600
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 11793:ef606668d247 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 1/22] use /r/3648/ to reorganize includes


# 11627:fe32a5238754 13-Sep-2016 Michael LeBeane <michael.lebeane@amd.com>

sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of
quiesceTick() and quiesce() functions on ThreadContext. This patch also
gets rid of the FS guards, since suspending a CPU is also a valid
operation for SE mode.


# 11359:b0b976a1ceda 27-Nov-2015 Andreas Sandberg <andreas@sandberg.pp.se>

base: Add support for changing output directories

This changeset adds support for changing the simulator output
directory. This can be useful when the simulation goes through several
stages (e.g., a warming phase, a simulation phase, and a verification
phase) since it allows the output from each stage to be located in a
different directory. Relocation is done by calling core.setOutputDir()
from Python or simout.setOutputDirectory() from C++.

This change affects several parts of the design of the gem5's output
subsystem. First, files returned by an OutputDirectory instance (e.g.,
simout) are of the type OutputStream instead of a std::ostream. This
allows us to do some more book keeping and control re-opening of files
when the output directory is changed. Second, new subdirectories are
OutputDirectory instances, which should be used to create files in
that sub-directory.

Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
[sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version]
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>


# 10905:a6ca6831e775 07-Jul-2015 Andreas Sandberg <andreas.sandberg@arm.com>

sim: Refactor the serialization base class

Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.


# 10537:47fe87b0cf97 14-Nov-2014 Andreas Hansson <andreas.hansson@arm.com>

arm: Fixes based on UBSan and static analysis

Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.

Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.


# 10407:a9023811bf9e 20-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.


# 10319:4207f9bfcceb 03-Sep-2014 Andreas Sandberg <Andreas.Sandberg@ARM.com>

arch, cpu: Factor out the ExecContext into a proper base class

We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoid indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.


# 9478:ba80f7d4f452 22-Jan-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86, cpu: corrects 270c9a75e91f, take over decoder on cpu switch
The changes made by the changeset 270c9a75e91f do not work well with switching
of cpus. The problem is that decoder for the old thread context holds state
that is not taken over by the new decoder.

This patch adds a takeOverFrom() function to Decoder class in each ISA. Except
for x86, functions in other ISAs are blank. For x86, the function copies state
from the old decoder to the new decoder.


# 9461:67a6ba6604c8 12-Jan-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86: Changes to decoder, corrects 9376
The changes made by the changeset 9376 were not quite correct. The patch made
changes to the code which resulted in decoder not getting initialized correctly
when the state was restored from a checkpoint.

This patch adds a startup function to each ISA object. For x86, this function
sets the required state in the decoder. For other ISAs, the function is empty
right now.


# 9441:1133617844c8 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Fix broken thread context handover

The thread context handover code used to break when multiple handovers
were performed during the same quiesce period. Previously, the thread
contexts would assign the TC pointer in the old quiesce event to the
new TC. This obviously broke in cases where multiple switches were
performed within the same quiesce period, in which case the TC pointer
in the quiesce event would point to an old CPU.

The new implementation deschedules pending quiesce events in the old
TC and schedules a new quiesce event in the new TC. The code has been
refactored to remove most of the code duplication.


# 9428:029dfe6324d3 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

cpu: Unify SimpleCPU and O3 CPU serialization code

The O3 CPU used to copy its thread context to a SimpleThread in order
to do serialization. This was a bit of a hack involving two static
SimpleThread instances and a magic constructor that was only used by
the O3 CPU.

This patch moves the ThreadContext serialization code into two global
procedures that, in addition to the normal serialization parameters,
take a ThreadContext reference as a parameter. This allows us to reuse
the serialization code in all ThreadContext implementations.


# 9425:a24092160ec7 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@ARM.com>

arch: Move the ISA object to a separate section

After making the ISA an independent SimObject, it is serialized
automatically by the Python world. Previously, this just resulted in
an empty ISA section. This patch moves the contents of the ISA to that
section and removes the explicit ISA serialization from the thread
contexts, which makes it behave like a normal SimObject during
serialization.

Note: This patch breaks checkpoint backwards compatibility! Use the
cpt_upgrader.py utility to upgrade old checkpoints to the new format.


# 9384:877293183bdf 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@arm.com>

arch: Make the ISA class inherit from SimObject

The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.

This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.


# 9377:6f294e7a93d1 04-Jan-2013 Gabe Black <gblack@eecs.umich.edu>

Decoder: Remove the thread context get/set from the decoder.

This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>


# 9180:ee8d7a51651d 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Add a Cycles wrapper class and use where applicable

This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.


# 9023:e9201a7bce59 26-May-2012 Gabe Black <gblack@eecs.umich.edu>

CPU: Merge the predecoder and decoder.

These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.


# 8820:f39690f70bab 10-Feb-2012 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Record the system pointer all the time for the simple CPU.

This pointer was only being stored in code that came from SE mode. The system
pointer is always meaningful and available, so it should always be stored.


# 8809:bb10807da889 01-Feb-2012 Gabe Black <gblack@eecs.umich.edu>

Merge with head, hopefully the last time for this batch.


# 8799:dac1e33e07b0 28-Jan-2012 Gabe Black <gblack@eecs.umich.edu>

Merge with the main repo.


# 8793:5f25086326ac 18-Nov-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Get rid of FULL_SYSTEM in the CPU directory.


# 8777:dd43f1c9fa0a 31-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Make the functions available from the TC consistent between SE and FS.


# 8767:e575781f71b8 30-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.


# 8766:b0773af78423 30-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Build the base process class in FS.


# 8761:20322354b80b 16-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Build/expose vport in SE mode.


# 8754:0996451df6de 16-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

CPU: Make physPort and getPhysPort available in SE mode.


# 8735:dd20a8139788 31-Jan-2012 Andreas Hansson <andreas.hanson@arm.com>

Thread: Use inherited baseCpu rather than cpu in SimpleThread

This patch is a trivial simplification, removing the cpu pointer from
SimpleThread and relying on the baseCpu pointer in ThreadState. The
patch does not add or change any functionality, it merely cleans up
the code.


# 8706:b1838faf3bcc 17-Jan-2012 Andreas Hansson <andreas.hansson@arm.com>

MEM: Add port proxies instead of non-structural ports

Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.

The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy


# 7823:dac01f14f20f 08-Jan-2011 Steve Reinhardt <steve.reinhardt@amd.com>

Replace curTick global variable with accessor functions.
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.


# 7723:ee4ac00d0774 08-Nov-2010 Ali Saidi <Ali.Saidi@ARM.com>

sim: Use forward declarations for ports.

Virtual ports need TLB data which means anything touching a file in the arch
directory rebuilds any file that includes system.hh which in everything.


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.


# 7680:f4eda002333b 14-Sep-2010 Gabe Black <gblack@eecs.umich.edu>

CPU: Trim unnecessary includes from some common files.

This reduces the scope of those includes and makes it less likely for there to
be a dependency loop. This also moves the hashing functions associated with
ExtMachInst objects to be with the ExtMachInst definitions and out of
utility.hh.


# 7679:f26cc2c68b48 14-Sep-2010 Gabe Black <gblack@eecs.umich.edu>

CPU: Get rid of the now unnecessary getInst/setInst family of functions.

This code is no longer needed because of the preceeding change which adds a
StaticInstPtr parameter to the fault's invoke method, obviating the only use
for this pair of functions.


# 6678:34191eea18c1 17-Oct-2009 Gabe Black <gblack@eecs.umich.edu>

ISA: Fix compilation.


# 6677:b741b3e7164b 15-Oct-2009 Brad Beckmann <Brad.Beckmann@amd.com>

fixed MC146818 checkpointing bug and added isa serialization calls to simple_thread


# 6658:f4de76601762 23-Sep-2009 Nathan Binkert <nate@binkert.org>

arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh


# 6331:d947798df4a1 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Get rid of the unused get(Data|Inst)Asid and (inst|data)Asid functions.


# 6326:008930a4ace5 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Eliminate the ISA defined RegFile class.


# 6324:a535b2232c08 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Move the PCs out of the ISAs and into the CPUs.


# 6316:51f3026d4cbb 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Eliminate the ISA defined integer register file.


# 6315:c7295a4826d5 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Eliminate the ISA defined floating point register file.


# 6029:007c36616f47 15-Apr-2009 Steve Reinhardt <steve.reinhardt@amd.com>

Get rid of the Unallocated thread context state.
Basically merge it in with Halted.
Also had to get rid of a few other functions that
called ThreadContext::deallocate(), including:
- InOrderCPU's setThreadRescheduleCondition.
- ThreadContext::exit(). This function was there to avoid terminating
simulation when one thread out of a multi-thread workload exits, but we
need to find a better (non-cpu-centric) way.


# 6022:410194bb3049 09-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

tlb: Don't separate the TLB classes into an instruction TLB and a data TLB


# 5715:e8c1d4e669a7 04-Nov-2008 Lisa Hsu <hsul@eecs.umich.edu>

get rid of all instances of readTid() and getThreadNum(). Unify and eliminate
redundancies with threadId() as their replacement.


# 5714:76abee886def 02-Nov-2008 Lisa Hsu <hsul@eecs.umich.edu>

Add in Context IDs to the simulator. From now on, cpuId is almost never used,
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.


# 5712:199d31b47f7b 02-Nov-2008 Lisa Hsu <hsul@eecs.umich.edu>

make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered
across the subclasses. generally make it so that member data is _cpuId and
accessor functions are cpuId(). The ID val comes from the python (default -1 if
none provided), and if it is -1, the index of cpuList will be given. this has
passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard
switch.


# 5704:98224505352a 21-Oct-2008 Nathan Binkert <nate@binkert.org>

style: Use the correct m5 style for things relating to interrupts.


# 5606:6da7a58b0bc8 09-Oct-2008 Nathan Binkert <nate@binkert.org>

eventq: convert all usage of events to use the new API.
For now, there is still a single global event queue, but this is
necessary for making the steps towards a parallelized m5.


# 5543:3af77710f397 10-Sep-2008 Ali Saidi <saidi@eecs.umich.edu>

style: Remove non-leading tabs everywhere they shouldn't be. Developers should configure their editors to not insert tabs


# 5529:9ae69b9cd7fd 11-Aug-2008 Nathan Binkert <nate@binkert.org>

params: Convert the CPU objects to use the auto generated param structs.
A whole bunch of stuff has been converted to use the new params stuff, but
the CPU wasn't one of them. While we're at it, make some things a bit
more stylish. Most of the work was done by Gabe, I just cleaned stuff up
a bit more at the end.


# 5499:8bfc7650c344 01-Jul-2008 Ali Saidi <saidi@eecs.umich.edu>

Remove delVirtPort() and make getVirtPort() only return cached version.


# 5494:85c8d296c1cb 28-Jun-2008 Steve Reinhardt <stever@gmail.com>

Backed out changeset 94a7bb476fca: caused memory leak.


# 5489:94a7bb476fca 21-Jun-2008 Steve Reinhardt <stever@gmail.com>

Generate more useful error messages for unconnected ports.
Force all non-default ports to provide a name and an
owner in the constructor.


# 5482:7fea9bcd84dd 18-Jun-2008 Nathan Binkert <nate@binkert.org>

ThreadState: Ensure that kernelStats is properly initialized


# 4997:e7380529bd2d 26-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

Address Translation: Make SE mode use an actual TLB/MMU for translation like FS.


# 4400:619191b2f011 16-Apr-2007 Ron Dreslinski <rdreslin@umich.edu>

Fixes for splash, may conflict with Korey's SMT work and doesn't support 03cpu yet.

src/cpu/simple/base.cc:
Cpu's should start as unallocated, not suspended
src/cpu/simple_thread.cc:
Wait for a thread to be assigned to activate the cpu
src/kern/tru64/tru64.hh:
When looking for a open cpu to assign threads, look for an unallocated one, not a suspended one.


# 4217:4c966fec2324 13-Mar-2007 Ali Saidi <saidi@eecs.umich.edu>

fix segfault when peer owner attempts to use functional port


# 3675:dc883b610345 19-Nov-2006 Kevin Lim <ktlim@umich.edu>

Update Virtual and Physical ports.

src/cpu/o3/alpha/cpu_impl.hh:
Handle the PhysicalPort and VirtualPort in the ThreadState.
src/cpu/o3/cpu.cc:
Initialize the thread context.
src/cpu/o3/thread_context.hh:
Add new function to initialize thread context.
src/cpu/o3/thread_context_impl.hh:
Use code now put into function.
src/cpu/simple_thread.cc:
Move code to ThreadState and use the new helper function.
src/cpu/simple_thread.hh:
Remove init() in this derived class; use init() from ThreadState base class.
src/cpu/thread_state.cc:
Move setting up of Physical and Virtual ports here. Change getMemFuncPort() to connectToMemFunc(), which connects a port to a functional port of the memory object below the CPU.
src/cpu/thread_state.hh:
Update functions.


# 3673:34386ba8cb41 17-Nov-2006 Ron Dreslinski <rdreslin@umich.edu>

Make an initialization pass for the thread context and set the [phys,virt]Port correctly

src/cpu/simple/atomic.cc:
src/cpu/simple/timing.cc:
Call the thread context initialization


# 3565:6ad587fb7dfd 07-Nov-2006 Gabe Black <gblack@eecs.umich.edu>

Put kernel_stats back into arch.


# 3548:85e64c82c522 07-Nov-2006 Gabe Black <gblack@eecs.umich.edu>

Moved the switched version of kernel_stats.hh back to kern, and moved the base kernel_stats to base_kernel_stats


# 3490:37a313c96683 02-Nov-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zizzer:/bk/newmem
into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix


# 3486:11b71489efd6 02-Nov-2006 Kevin Lim <ktlim@umich.edu>

More proper handling of the ports.

src/cpu/simple_thread.cc:
Fix up port handling to share code.
src/cpu/thread_state.cc:
Separate code off into a function.
src/cpu/thread_state.hh:
Make a separate function that will get the CPU's memory's functional port.


# 3485:ac89a047e5b6 02-Nov-2006 Kevin Lim <ktlim@umich.edu>

Remove function that should have been deleted.

src/cpu/simple_thread.cc:
This function should have been deleted from an earlier push.
src/cpu/simple_thread.hh:
Delete this function; it's now in thread_state.hh/.cc.


# 3479:4fbcaa81d105 01-Nov-2006 Gabe Black <gblack@eecs.umich.edu>

Merge zizzer.eecs.umich.edu:/bk/newmem/
into zeep.eecs.umich.edu:/home/gblack/m5/newmemmemops


# 3453:c3ce58882751 31-Oct-2006 Gabe Black <gblack@eecs.umich.edu>

Put the Alpha tlb stuff into the AlphaISA namespace, and give the classes more neutral names.


# 3402:db60546818d0 31-Oct-2006 Kevin Lim <ktlim@umich.edu>

Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a paramter for the MemObject.

configs/example/fs.py:
configs/example/se.py:
src/cpu/simple/base.cc:
src/cpu/simple/base.hh:
src/cpu/simple/timing.cc:
src/cpu/simple_thread.cc:
src/cpu/simple_thread.hh:
src/cpu/thread_state.cc:
src/cpu/thread_state.hh:
tests/configs/o3-timing-mp.py:
tests/configs/o3-timing.py:
tests/configs/simple-atomic-mp.py:
tests/configs/simple-atomic.py:
tests/configs/simple-timing-mp.py:
tests/configs/simple-timing.py:
tests/configs/tsunami-simple-atomic-dual.py:
tests/configs/tsunami-simple-atomic.py:
tests/configs/tsunami-simple-timing-dual.py:
tests/configs/tsunami-simple-timing.py:
No need for mem parameter any more.
src/cpu/checker/cpu.cc:
Use new constructor for simple thread (no more MemObject parameter).
src/cpu/checker/cpu.hh:
Remove MemObject parameter.
src/cpu/memtest/memtest.hh:
Ports now take in their MemObject owner.
src/cpu/o3/alpha/cpu_builder.cc:
Remove mem parameter.
src/cpu/o3/alpha/cpu_impl.hh:
Remove memory parameter and clean up handling of TranslatingPort.
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/mips/cpu_builder.cc:
src/cpu/o3/mips/cpu_impl.hh:
src/cpu/o3/params.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/cpu.hh:
src/cpu/ozone/cpu_builder.cc:
src/cpu/ozone/cpu_impl.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/front_end_impl.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/lw_lsq_impl.hh:
src/cpu/ozone/simple_params.hh:
src/cpu/ozone/thread_state.hh:
src/cpu/simple/atomic.cc:
Remove memory parameter.


# 3125:febd811bccc6 30-Sep-2006 Kevin Lim <ktlim@umich.edu>

Merge ktlim@zamp:./local/clean/o3-merge/m5
into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem

configs/boot/micro_memlat.rcS:
configs/boot/micro_tlblat.rcS:
src/arch/alpha/ev5.cc:
src/arch/alpha/isa/decoder.isa:
src/arch/alpha/isa_traits.hh:
src/cpu/base.cc:
src/cpu/base.hh:
src/cpu/base_dyn_inst.hh:
src/cpu/checker/cpu.hh:
src/cpu/checker/cpu_impl.hh:
src/cpu/o3/alpha/cpu_impl.hh:
src/cpu/o3/alpha/params.hh:
src/cpu/o3/checker_builder.cc:
src/cpu/o3/commit_impl.hh:
src/cpu/o3/cpu.cc:
src/cpu/o3/decode_impl.hh:
src/cpu/o3/fetch_impl.hh:
src/cpu/o3/iew.hh:
src/cpu/o3/iew_impl.hh:
src/cpu/o3/inst_queue.hh:
src/cpu/o3/lsq.hh:
src/cpu/o3/lsq_impl.hh:
src/cpu/o3/lsq_unit.hh:
src/cpu/o3/lsq_unit_impl.hh:
src/cpu/o3/regfile.hh:
src/cpu/o3/rename_impl.hh:
src/cpu/o3/thread_state.hh:
src/cpu/ozone/checker_builder.cc:
src/cpu/ozone/cpu.hh:
src/cpu/ozone/cpu_impl.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/front_end_impl.hh:
src/cpu/ozone/lw_back_end.hh:
src/cpu/ozone/lw_back_end_impl.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/ozone/lw_lsq_impl.hh:
src/cpu/ozone/thread_state.hh:
src/cpu/simple/base.cc:
src/cpu/simple_thread.cc:
src/cpu/simple_thread.hh:
src/cpu/thread_state.hh:
src/dev/ide_disk.cc:
src/python/m5/objects/O3CPU.py:
src/python/m5/objects/Root.py:
src/python/m5/objects/System.py:
src/sim/pseudo_inst.cc:
src/sim/pseudo_inst.hh:
src/sim/system.hh:
util/m5/m5.c:
Hand merge.


# 2915:1f4d02556ac1 12-Jul-2006 Kevin Lim <ktlim@umich.edu>

Updates for serialization. As long as the tickEvent doesn't need to be serialized (I don't believe it does because we drain all CPUs prior to checkpointing), it should be feasible to start up from other CPU's checkpoints.

src/cpu/simple/atomic.cc:
src/cpu/simple/atomic.hh:
src/cpu/simple/base.cc:
src/cpu/simple/timing.cc:
src/cpu/simple_thread.cc:
Updates for serialization.


# 2864:eab7ff8f6d72 06-Jul-2006 Kevin Lim <ktlim@umich.edu>

Support serializing and unserializing in the O3 CPU. Also a few small fixes for draining/switching CPUs.

src/cpu/o3/commit_impl.hh:
Fix to clear drainPending variable on call to resume.
src/cpu/o3/cpu.cc:
src/cpu/o3/cpu.hh:
Support serializing and unserializing in the O3 CPU.
src/cpu/o3/lsq_impl.hh:
Be sure to say we have no stores to write back if the active thread list is empty.
src/cpu/simple_thread.cc:
src/cpu/simple_thread.hh:
Slightly change how SimpleThread is used to copy from other ThreadContexts.


# 2862:7bc3562e6405 06-Jul-2006 Kevin Lim <ktlim@umich.edu>

Various serialization changes to make it possible for the O3CPU to checkpoint.

src/arch/alpha/regfile.hh:
Define serialize/unserialize functions on MiscRegFile itself.
src/cpu/o3/regfile.hh:
Remove old commented code.
src/cpu/simple_thread.cc:
src/cpu/simple_thread.hh:
Push common serialization code to ThreadState level. Also allow the SimpleThread to be used for checkpointing by other models.
src/cpu/thread_state.cc:
src/cpu/thread_state.hh:
Move common serialization code into ThreadState.


# 2791:7b2a7e21909b 22-Jun-2006 Kevin Lim <ktlim@umich.edu>

Change ThreadState constructor ordering to match the rest of the ThreadStates.


# 2684:71f3cabf891f 08-Jun-2006 Ali Saidi <saidi@eecs.umich.edu>

add write/read functions that have endian conversions in them
when we get a virtual port delete it (even though delete does nothing in these cases)

src/arch/alpha/linux/system.cc:
src/arch/alpha/stacktrace.cc:
src/base/remote_gdb.cc:
src/cpu/simple_thread.cc:
when we get a virtual port delete it (even though delete does nothing in this case)
src/mem/port.hh:
src/mem/vport.hh:
add write/read functions that have endian conversions in them


# 2683:d6b72bb2ed97 07-Jun-2006 Kevin Lim <ktlim@umich.edu>

Reorganization/renaming of CPUExecContext. Now it is called SimpleThread in order to clear up the confusion due to the many ExecContexts. It also derives from a common ThreadState object, which holds various state common to threads across CPU models.

Following with the previous check-in, ExecContext now refers only to the interface provided to the ISA in order to access CPU state. ThreadContext refers to the interface provided to all objects outside the CPU in order to access thread state. SimpleThread provides all thread state and the interface to access it, and is suitable for simple execution models such as the SimpleCPU.

src/SConscript:
Include thread state file.
src/arch/alpha/ev5.cc:
src/cpu/checker/cpu.cc:
src/cpu/checker/cpu.hh:
src/cpu/checker/thread_context.hh:
src/cpu/memtest/memtest.cc:
src/cpu/memtest/memtest.hh:
src/cpu/o3/cpu.cc:
src/cpu/ozone/cpu_impl.hh:
src/cpu/simple/atomic.cc:
src/cpu/simple/base.cc:
src/cpu/simple/base.hh:
src/cpu/simple/timing.cc:
Rename CPUExecContext to SimpleThread.
src/cpu/base_dyn_inst.hh:
Make thread member variables protected..
src/cpu/o3/alpha_cpu.hh:
src/cpu/o3/cpu.hh:
Make various members of ThreadState protected.
src/cpu/o3/alpha_cpu_impl.hh:
Push generation of TranslatingPort into the CPU itself.
Make various members of ThreadState protected.
src/cpu/o3/thread_state.hh:
Pull a lot of common code into the base ThreadState class.
src/cpu/ozone/thread_state.hh:
Rename CPUExecContext to SimpleThread, move a lot of common code into base ThreadState class.
src/cpu/thread_state.hh:
Push a lot of common code into base ThreadState class. This goes along with renaming CPUExecContext to SimpleThread, and making it derive from ThreadState.
src/cpu/simple_thread.cc:
Rename CPUExecContext to SimpleThread, make it derive from ThreadState. This helps push a lot of common code/state into a single class that can be used by all CPUs.
src/cpu/simple_thread.hh:
Rename CPUExecContext to SimpleThread, make it derive from ThreadState.
src/kern/system_events.cc:
Rename cpu_exec_context to thread_context.
src/sim/process.hh:
Remove unused forward declaration.