History log of /gem5/src/arch/x86/utility.cc
Revision Date Author Comments
# 13611:c8b7847b4171 19-Nov-2018 Gabe Black <gabeblack@google.com>

arch: cpu: Rename *FloatRegBits* to *FloatReg*.

Now that there's no plain FloatReg, there's no reason to distinguish
FloatRegBits with a special suffix since it's the only way to read or
write FP registers.

Change-Id: I3a60168c1d4302aed55223ea8e37b421f21efded
Reviewed-on: https://gem5-review.googlesource.com/c/14460
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 11800:54436a1784dc 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 3/22] reduce include dependencies in some headers

Used cppclean to help identify useless includes and removed them. This
involved erroneously included headers, but also cases where forward
declarations could have been used rather than a full include.


# 11793:ef606668d247 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 1/22] use /r/3648/ to reorganize includes


# 11709:f7e79ee7fb4c 19-Nov-2016 Tony Gutierrez <anthony.gutierrez@amd.com>

x86: fix loading/storing of Float80 types


# 11324:31ca646c7685 06-Feb-2016 Steve Reinhardt <steve.reinhardt@amd.com>

x86: create function to check miscreg validity

In the process of trying to get rid of an '== false' comparison,
it became apparent that a slightly more involved solution was
needed. Split this out into its own changeset since it's not
a totally trivial local change like the others.


# 11150:a8a64cca231b 30-Sep-2015 Mitch Hayenga <mitch.hayenga@arm.com>

isa,cpu: Add support for FS SMT Interrupts

Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.


# 10935:acd48ddd725f 28-Jul-2015 Nilay Vaish <nilay@cs.wisc.edu>

revert 5af8f40d8f2c


# 10934:5af8f40d8f2c 26-Jul-2015 Nilay Vaish <nilay@cs.wisc.edu>

cpu: implements vector registers

This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.


# 10553:c1ad57c53a36 23-Nov-2014 Alexandru Dutu <alexandru.dutu@amd.com>

kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall
instructions and page faults. These types of exits will be encountered if
KvmCPU is run in SE mode.


# 10407:a9023811bf9e 20-Sep-2014 Mitch Hayenga <mitch.hayenga@arm.com>

alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an optional
delay parameter. However this parameter was often ignored. Also, when used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.


# 10058:32784c63de81 05-Feb-2014 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Fix x87 state transfer bug

Changeset 7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.

This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.


# 10057:09507a45c701 02-Feb-2014 Nikos Nikoleris <nikos.nikoleris@gmail.com>

x86, kvm: Fix bug in the RFlags get and set functions

The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.

Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>


# 9921:ee049bfce978 15-Oct-2013 Yasuko Eckert <yasuko.eckert@amd.com>

arch/x86: add support for explicit CC register file

Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.

Nilay Vaish also contributed to this patch.


# 9920:028e4da64b42 15-Oct-2013 Yasuko Eckert <yasuko.eckert@amd.com>

cpu: add a condition-code register class

Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.


# 9889:2dbc34e3b922 30-Sep-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add support routines to load and store 80-bit floats

The x87 FPU on x86 supports extended floating point. We currently
handle all floating point on x86 as double and don't support 80-bit
loads/stores. This changeset add a utility function to load and
convert 80-bit floats to doubles (loadFloat80) and another function to
store doubles as 80-bit floats (storeFloat80). Both functions use
libfputils to do the conversion in software. The functions are
currently not used, but are required to handle floating point in KVM
and to properly support all x87 loads/stores.


# 9887:8c3a49bd7423 30-Sep-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add limited support for extracting function call arguments

Add support for extracting the first 6 64-bit integer argumements to a
function call in X86ISA::getArgument().


# 9880:3fda7e22041b 19-Sep-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add support routines to convert between x87 tag formats

This changeset adds the convX87XTagsToTags() and convX87TagsToXTags()
which convert between the tag formats in the FTW register and the
format used in the xsave area. The conversion from to the x87 FTW
representation is currently loses some information since it does not
reconstruct the valid/zero/special flags which are not included in the
xsave representation.


# 9765:da0e0df0ba97 18-Jun-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add support for maintaining the x87 tag word

The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.

The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.


# 9759:8f1f1bdedf8c 18-Jun-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add helper functions to access rflags

The rflags register is spread across several different registers. Most
of the flags are stored in MISCREG_RFLAGS, but some are stored in
microcode registers. When accessing RFLAGS, we need to reconstruct it
from these registers. This changeset adds two functions,
X86ISA::getRFlags() and X86ISA::setRFlags(), that take care of this
magic.


# 9751:e039a48eeb99 11-Jun-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Fix bug when copying TSC on CPU handover

The TSC value stored in MISCREG_TSC is actually just an offset from
the current CPU cycle to the actual TSC value. Writes with
side-effects to the TSC subtract the current cycle count before
storing the new value, while reads add the current cycle count. When
switching CPUs, the current value is copied without side-effects. This
works as long as the source and the destination CPUs have the same
clock frequencies. The TSC will jump, sometimes backwards, if they
have different clock frequencies. Most OSes assume the TSC to be
monotonic and break when this happens.

This changeset makes sure that the TSC is copied with side-effects to
ensure that the offset is updated to match the new CPU.


# 9544:1a075d9bc1bc 19-Feb-2013 Andreas Hansson <andreas.hansson@arm.com>

x86: Move APIC clock divider to Python

This patch moves the 16x APIC clock divider to the Python code to
avoid the post-instantiation modifications to the clock. The x86 APIC
was the only object setting the clock after creation time and this
required some custom functionality and configuration. With this patch,
the clock multiplier is moved to the Python code and the objects are
instantiated with the appropriate clock.


# 9423:43caa4ca5979 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@arm.com>

arch: Add support for invalidating TLBs when draining

This patch adds support for the memInvalidate() drain method. TLB
flushing is requested by calling the virtual flushAll() method on the
TLB.

Note: This patch renames invalidateAll() to flushAll() on x86 and
SPARC to make the interface consistent across all supported
architectures.


# 9180:ee8d7a51651d 28-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Add a Cycles wrapper class and use where applicable

This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.

Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.

This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.

In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.

An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.


# 9157:e0bad9d7bbd6 21-Aug-2012 Andreas Hansson <andreas.hansson@arm.com>

Clock: Move the clock and related functions to ClockedObject

This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.

The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.

The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).


# 8768:314eb1e2fa94 30-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Get rid of more uses of FULL_SYSTEM.


# 8466:9c754e3022b7 11-Jul-2011 Nilay Vaish<nilay@cs.wisc.edu>

X86: implements copyRegs() function
This patch implements the copyRegs() function for the x86 architecture.
The patch assumes that no side effects other than TLB invalidation need
to be considered while copying the registers. This may not hold true in
future.


# 7811:a8fc35183c10 03-Jan-2011 Steve Reinhardt <steve.reinhardt@amd.com>

Make commenting on close namespace brackets consistent.

Ran all the source files through 'perl -pi' with this script:

s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|;
s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|;
s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;

Also did a little manual editing on some of the arch/*/isa_traits.hh files
and src/SConscript.


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.


# 7707:e5b6f1157be3 16-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

GetArgument: Rework getArgument so that X86_FS compiles again.

When no size is specified for an argument, push the decision about what size
to use into the ISA by passing a size of -1.


# 7693:f1db1000d957 01-Oct-2010 Ali Saidi <Ali.Saidi@ARM.com>

Debug: Implement getArgument() and function skipping for ARM.

In the process make add skipFuction() to handle isa specific function skipping
instead of ifdefs and other ugliness. For almost all ABIs, 64 bit arguments can
only start in even registers. Size is now passed to getArgument() so that 32
bit systems can make decisions about register selection for 64 bit arguments.
The number argument is now passed by reference because getArgument() will need
to change it based on the size of the argument and the current argument number.

For ARM, if the argument number is odd and a 64-bit register is requested the
number must first be incremented to because all 64 bit arguments are passed
in an even argument register. Then the number will be incremented again to
access both halves of the argument.


# 7629:0f0c231e3e97 23-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Create a directory for files that define register indexes.

This is to help tidy up arch/x86. These files should not be used external to
the ISA.


# 7087:fb8d5786ff30 24-May-2010 Nathan Binkert <nate@binkert.org>

copyright: Change HP copyright on x86 code to be more friendly


# 6329:5d8b91875859 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Add a registers.hh file as an ISA switched header.
This file is for register indices, Num* constants, and register types.
copyRegs and copyMiscRegs were moved to utility.hh and utility.cc.


# 6048:65a321a3a691 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the INIT IPI.


# 6042:827bd9f03fdc 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Condense the startupCPU code.


# 5648:e8abda6e0980 12-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Make the local APIC accessible through the memory system directly, and make the timer work.


# 5647:b06b49498c79 12-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object.


# 5360:02a3af203516 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Put in initial implementation of the local APIC.


# 5299:e61b9f2a9732 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Move startup code to the system object to initialize a Linux system.


# 5294:7222bdaed33b 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Reorganize segmentation and implement segment selector movs.


# 5289:ca5390e654b8 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Separate the effective seg base and the "hidden" seg base.


# 5264:f290df2f2261 16-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Fix 32 bit compilation.


# 5234:55e0b1585b04 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the startupCPU function.


# 5182:9032bb4332eb 23-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Fix X86_FS compilation.


# 5141:a3b0e3a8b83c 07-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Make x86 initialize more state.


# 5135:6ae576eada5c 07-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Make initCPU and startupCPU do something basic.


# 5086:e7913ffb379d 24-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Get X86_FS to compile.