History log of /gem5/src/arch/x86/isa/operands.isa
Revision Date Author Comments
# 11329:82bb3ee706b3 06-Feb-2016 Alexandru Dutu <alexandru.dutu@amd.com>

x86: revamp cmpxchg8b/cmpxchg16b implementation

The previous implementation did a pair of nested RMW operations,
which isn't compatible with the way that locked RMW operations are
implemented in the cache models. It was convenient though in that
it didn't require any new micro-ops, and supported cmpxchg16b using
64-bit memory ops. It also worked in AtomicSimpleCPU where
atomicity was guaranteed by the core and not by the memory system.
It did not work with timing CPU models though.

This new implementation defines new 'split' load and store micro-ops
which allow a single memory operation to use a pair of registers as
the source or destination, then uses a single ldsplit/stsplit RMW
pair to implement cmpxchg. This patch requires support for 128-bit
memory accesses in the ISA (added via a separate patch) to support
cmpxchg16b.


# 11328:9512d2e25f14 06-Feb-2016 Steve Reinhardt <steve.reinhardt@amd.com>

arch, x86: add support for arrays as memory operands

Although the cache models support wider accesses, the ISA descriptions
assume that (for the most part) memory operands are integer types,
which makes it difficult to define instructions that do memory accesses
larger than 64 bits.

This patch adds some generic support for memory operands that are arrays
of uint64_t, and specifically a 'u2qw' operand type for x86 that is an
array of 2 uint64_ts (128 bits). This support is unused at this point,
but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE
memory accesses will also be rewritten to use this support.

Support for 128-bit accesses could also have been added using the gcc
__int128_t extension, which would have been less disruptive. However,
although clang also supports __int128_t, it's still non-standard.
Also, more importantly, this approach creates a path to defining
256- and 512-byte operands as well, which will be useful for eventual
AVX support.


# 9921:ee049bfce978 15-Oct-2013 Yasuko Eckert <yasuko.eckert@amd.com>

arch/x86: add support for explicit CC register file

Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.

Nilay Vaish also contributed to this patch.


# 9582:0632d2d1575c 11-Mar-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86: implement some of the x87 instructions
This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.


# 9471:4193ed60eed7 15-Jan-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86: implements emms instruction


# 9470:68f7e0bcf4aa 15-Jan-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86: implement fabs, fchs instructions


# 9212:dc386ccc1db9 11-Sep-2012 Nilay Vaish <nilay@cs.wisc.edu>

X86: make use of register predication
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.


# 9211:46c3a74952ec 11-Sep-2012 Nilay Vaish <nilay@cs.wisc.edu>

x86: Add a separate register for D flag bit
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.


# 9010:7891b96e1526 22-May-2012 Nilay Vaish <nilay@cs.wisc.edu>

X86: Split Condition Code register
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.


# 8500:5bae9eee9482 14-Aug-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Use IsSquashAfter if an instruction could affect fetch translation.

Control register operands are set up so that writing to them is serialize
after, serialize before, and non-speculative. These are probably overboard,
but they should usually be safe. Unfortunately there are times when even these
aren't enough. If an instruction modifies state that affects fetch, later
serialized instructions which come after it might have already gone through
fetch and decode by the time it commits. These instructions may have been
translated incorrectly or interpretted incorrectly and need to be destroyed.
This change modifies instructions which will or may have this behavior so that
they use the IsSquashAfter flag when necessary.


# 8449:4be49ad47c74 05-Jul-2011 Gabe Black <gblack@eecs.umich.edu>

ISA parser: Define operand types with a ctype directly.


# 7789:f455790bcd47 08-Dec-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Take advantage of new PCState syntax.


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.


# 7087:fb8d5786ff30 24-May-2010 Nathan Binkert <nate@binkert.org>

copyright: Change HP copyright on x86 code to be more friendly


# 6479:b9ab1b56391b 07-Aug-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Implement shift right/left double microops.
This is my best guess as far as what these should do. Other existing microops
use implicit registers, mul1s and mul1u for instance, so this should be ok.
The microop that loads the implicit DoubleBits register would fall into one
of the microop slots for moving to/from special registers.


# 6360:c3058964d06f 17-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Tame the wilds of def operands.


# 5926:c182698e1ab3 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Add microops for reading/writing debug registers.


# 5839:4cc05b7f2a97 01-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Fix some incorrect register widths.


# 5789:46c548dbe620 07-Jan-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Hook in the M5 pseudo insts.


# 5682:6f1cab082ba7 13-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Add wrval/rdval microops for reading significant miscregs.


# 5659:f4b9c344d1ca 12-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Implement CPUID with a magical function instead of microcode.


# 5429:52dbcf7f7328 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Keep handy values like the operating mode in one register.


# 5428:5a27fea50fee 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Change what the microop chks does.
Instead of computing the segment descriptor address, this now checks if a
selector value/descriptor are legal for a particular purpose.


# 5426:0bdcc60ccc45 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Add microops and supporting code to manipulate the whole rflags register.


# 5409:0343cd06df4f 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Add in some support for the tsc register.


# 5294:7222bdaed33b 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Reorganize segmentation and implement segment selector movs.


# 5291:5d38610cff05 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the lgdt instruction.


# 5290:7dc3e8ee0a22 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement wrbase and wrlimit for loading pseudo descriptors.


# 5289:ca5390e654b8 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Separate the effective seg base and the "hidden" seg base.


# 5246:21f29e99e021 13-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Make microcode use presegmentation RIPs and the rest of m5 use post segmentation RIPS.


# 5241:a6602acdd046 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the wrcr microop which writes a control register, and some control register work.


# 5083:49559a8060e8 19-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Move the fp microops to their own file with their own base classes in C++ and python.


# 5082:82dd253231c8 19-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Put in the foundation for x87 stack based fp registers.


# 5075:4ae876c5037d 13-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Total overhaul of the division instructions and microops.


# 5063:8eb72b1bd3c6 06-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Rework the multiplication microops so that they work like they would in the patent.


# 5026:46dd8d55f6c9 29-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Add operands to handle floating point registers.


# 5025:5c264911b7a9 29-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Flesh out register indexing constants.


# 4950:f5f19784acf1 07-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Make a microcode branch microop.
Also some touch up for ruflag.


# 4863:b6dacc9a39ff 04-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Start implementing segmentation support.
Make instructions observe segment prefixes, default segment rules, segment
base addresses.
Also fix some microcode and add sib and riprel "keywords" to the x86
specialization of the microassembler.


# 4816:13391cf96e9c 30-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Take into account the regular registers and the microcode registers when decided whether or not to fold.


# 4805:cc9a5798e4d1 30-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

Make the register indices use the appropriate "fold" bit.


# 4712:79b4c64296ce 19-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

x86 fixes
Make the emulation environment consider the rex prefix.
Implement and hook in forms of j, jmp, cmp, syscall, movzx
Added a format for an instruction to carry a call to the SE mode syscalls system
Made memory instructions which refer to the rip do so directly
Made the operand size overridable in the microassembly
Made the "ext" field of register operations 16 bits to hold a sparse encoding of flags to set or conditions to predicate on
Added an explicit "rax" operand for the syscall format
Implemented syscall returns.


# 4687:db7ca06d6e6a 17-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

Add in operand which holds the condition code bits of the flag register.


# 4587:2c9a2534a489 19-Jun-2007 Gabe Black <gblack@eecs.umich.edu>

Get rid of the immediate and displacement components of the EmulEnv struct and use them directly out of the instruction. The extra copies are conceptually realistic but are just innefficient as implemented. Also don't use the zeroeth microcode register for general storage since it's now the zero register, and implement a load and a store microops.


# 4519:f8da6b45573f 04-Jun-2007 Gabe Black <gblack@eecs.umich.edu>

Reworking x86's microcode system. This is a work in progress, and X86 doesn't compile.

src/arch/x86/isa/decoder/one_byte_opcodes.isa:
src/arch/x86/isa/macroop.isa:
src/arch/x86/isa/main.isa:
src/arch/x86/isa/microasm.isa:
src/arch/x86/isa/microops/base.isa:
src/arch/x86/isa/microops/microops.isa:
src/arch/x86/isa/operands.isa:
src/arch/x86/isa/microops/regop.isa:
src/arch/x86/isa/microops/specop.isa:
Reworking x86's microcode system


# 4338:24d31b35bcf9 04-Apr-2007 Gabe Black <gblack@eecs.umich.edu>

The process of going from an instruction definition to an instruction to be returned by the decoder has been fleshed out more. The following steps describe how an instruction implementation becomes a StaticInst.

1. Microops are created. These are StaticInsts use templates to provide a basic form of polymorphism without having to make the microassembler smarter.
2. An instruction class is created which has a "templated" microcode program as it's docstring. The template parameters are refernced with ^ following by a number.
3. An instruction in the decoder references an instruction template using it's mnemonic. The parameters to it's format end up replacing the placeholders. These parameters describe a source for an operand which could be memory, a register, or an immediate. It it's a register, the register index is used. If it's memory, eventually a load/store will be pre/postpended to the instruction template and it's destination register will be used in place of the ^. If it's an immediate, the immediate is used. Some operand types, specifically those that come from the ModRM byte, need to be decoded further into memory vs. register versions. This is accomplished by making the decode_block text for these instructions another case statement based off ModRM.
4. Once all of the template parameters have been handled, the instruction goes throw the microcode assembler which resolves labels and creates a list of python op objects. If an operand is a register, it uses a % prefix, an immediate uses $, and a label uses @. If the operand is just letters, numbers, and underscores, it can appear immediately after the prefix. If it's not, it can be encolsed in non nested {}s.
5. If there is a single "op" object (which corresponds to a single microop) the decoder is set up to return it directly. If not, a macroop wrapper is created around it.

In the future, I'm considering seperating the operand type specialization from the template substitution step. A problem this introduces is that either the template arguments need to be kept around for the specialization step, or they need to be re-extracted. Re-extraction might be the way to go so that the operand formats can be coded directly into the micro assembler template without having to pass them in as parameters. I don't know if that's actually useful, though.

src/arch/x86/isa/decoder/one_byte_opcodes.isa:
src/arch/x86/isa/microasm.isa:
src/arch/x86/isa/microops/microops.isa:
src/arch/x86/isa/operands.isa:
src/arch/x86/isa/microops/base.isa:
Implemented polymorphic microops and changed around the microcode assembler syntax.


# 4298:a92aab35e34e 29-Mar-2007 Gabe Black <gblack@eecs.umich.edu>

Add code to generate register and immediate based integer op microop classes.


# 4279:acc38276ca1d 21-Mar-2007 Gabe Black <gblack@eecs.umich.edu>

Add a junk operand. With no operands, the parser breaks.


# 4158:a3fb9e29c6ce 05-Mar-2007 Gabe Black <gblack@eecs.umich.edu>

Stub decoder. This is probably even farther from finished than it looks...