History log of /gem5/src/arch/x86/isa/operands.isa
Revision	Date	Author	Comments
# 11329:82bb3ee706b3	06-Feb-2016	Alexandru Dutu <alexandru.dutu@amd.com>	x86: revamp cmpxchg8b/cmpxchg16b implementation The previous implementation did a pair of nested RMW operations, which isn't compatible with the way that locked RMW operations are implemented in the cache models. It was convenient though in that it didn't require any new micro-ops, and supported cmpxchg16b using 64-bit memory ops. It also worked in AtomicSimpleCPU where atomicity was guaranteed by the core and not by the memory system. It did not work with timing CPU models though. This new implementation defines new 'split' load and store micro-ops which allow a single memory operation to use a pair of registers as the source or destination, then uses a single ldsplit/stsplit RMW pair to implement cmpxchg. This patch requires support for 128-bit memory accesses in the ISA (added via a separate patch) to support cmpxchg16b.
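The single load/store pair described in revision 11329 can be illustrated with a minimal sketch. This models only the architectural behaviour of cmpxchg16b, not gem5's micro-op implementation; the helper names `ldsplit` and `stsplit` are borrowed from the commit message, while their signatures and the dict-based memory are assumptions made for illustration.

```python
def ldsplit(mem, addr):
    """One 128-bit load that fills a pair of registers (low word, high word)."""
    return mem[addr], mem[addr + 8]


def stsplit(mem, addr, lo, hi):
    """One 128-bit store that takes its data from a pair of registers."""
    mem[addr] = lo
    mem[addr + 8] = hi


def cmpxchg16b(mem, addr, rdx, rax, rcx, rbx):
    """Compare RDX:RAX with memory; on a match store RCX:RBX.

    Returns (zf, rdx, rax). The store is issued in both cases so that the
    load and store form one read-modify-write pair (an assumption of this
    sketch, matching how locked RMW sequences are usually modelled).
    """
    old_lo, old_hi = ldsplit(mem, addr)
    if (old_hi, old_lo) == (rdx, rax):
        stsplit(mem, addr, rbx, rcx)
        return True, rdx, rax
    # Mismatch: write the old value back and report it in RDX:RAX.
    stsplit(mem, addr, old_lo, old_hi)
    return False, old_hi, old_lo
```

Memory here is a plain dict mapping an address to a 64-bit word, so `mem = {0: 1, 8: 2}` holds one 128-bit value at address 0.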
# 11328:9512d2e25f14	06-Feb-2016	Steve Reinhardt <steve.reinhardt@amd.com>	arch, x86: add support for arrays as memory operands Although the cache models support wider accesses, the ISA descriptions assume that (for the most part) memory operands are integer types, which makes it difficult to define instructions that do memory accesses larger than 64 bits. This patch adds some generic support for memory operands that are arrays of uint64_t, and specifically a 'u2qw' operand type for x86 that is an array of 2 uint64_ts (128 bits). This support is unused at this point, but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE memory accesses will also be rewritten to use this support. Support for 128-bit accesses could also have been added using the gcc __int128_t extension, which would have been less disruptive. However, although clang also supports __int128_t, it's still non-standard. Also, more importantly, this approach creates a path to defining 256- and 512-bit operands as well, which will be useful for eventual AVX support.
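As a rough illustration of the idea in revision 11328, a 128-bit memory operand can be treated as an array of two 64-bit words rather than as a single wide integer. The sketch below is not gem5 code: the function names are hypothetical, and it assumes x86's little-endian layout with the low quadword stored first.

```python
import struct


def pack_u2qw(words):
    """Pack [low, high] 64-bit words into 16 little-endian bytes."""
    lo, hi = words
    return struct.pack("<QQ", lo, hi)


def unpack_u2qw(data):
    """Unpack 16 little-endian bytes into a [low, high] list of 64-bit words."""
    return list(struct.unpack("<QQ", data))


def to_int128(words):
    """Combine the two words into one 128-bit integer value."""
    lo, hi = words
    return (hi << 64) | lo
```

The same array approach extends to wider operands by changing the element count, which is the design point the commit message makes against a dedicated 128-bit integer type.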
# 9921:ee049bfce978	15-Oct-2013	Yasuko Eckert <yasuko.eckert@amd.com>	arch/x86: add support for explicit CC register file Convert condition code registers from being specialized ("pseudo") integer registers to using the recently added CC register class. Nilay Vaish also contributed to this patch.
# 9582:0632d2d1575c	11-Mar-2013	Nilay Vaish <nilay@cs.wisc.edu>	x86: implement some of the x87 instructions This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
# 9471:4193ed60eed7	15-Jan-2013	Nilay Vaish <nilay@cs.wisc.edu>	x86: implements emms instruction
# 9470:68f7e0bcf4aa	15-Jan-2013	Nilay Vaish <nilay@cs.wisc.edu>	x86: implement fabs, fchs instructions
# 9212:dc386ccc1db9	11-Sep-2012	Nilay Vaish <nilay@cs.wisc.edu>	X86: make use of register predication The patch introduces two predicates for condition code registers -- one tests if a register needs to be read, the other tests whether a register needs to be written to. These predicates are evaluated twice -- during construction of the microop and during its execution. Register reads and writes are elided depending on how the predicates evaluate.
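A minimal sketch of the predication scheme in revision 9212, under stated assumptions: the flag-group names, bit masks, and the `MicroOp` class are hypothetical and are not taken from gem5. It shows only the pattern of evaluating the same read and write predicates once when the microop is built and again when it executes.

```python
# Hypothetical flag groups, one bit each.
FLAG_GROUPS = {"cfof": 0b001, "zaps": 0b010, "df": 0b100}


def needs_read(group, read_mask):
    """Predicate: must this condition code register be read?"""
    return bool(FLAG_GROUPS[group] & read_mask)


def needs_write(group, write_mask):
    """Predicate: must this condition code register be written?"""
    return bool(FLAG_GROUPS[group] & write_mask)


class MicroOp:
    def __init__(self, read_mask, write_mask):
        self.read_mask = read_mask
        self.write_mask = write_mask
        # First evaluation: at construction, to build the dependency lists.
        self.src_regs = [g for g in FLAG_GROUPS if needs_read(g, read_mask)]
        self.dest_regs = [g for g in FLAG_GROUPS if needs_write(g, write_mask)]

    def execute(self, regfile, new_values):
        # Second evaluation: at execution, to elide the actual accesses.
        reads = {g: regfile[g] for g in FLAG_GROUPS
                 if needs_read(g, self.read_mask)}
        for g in FLAG_GROUPS:
            if needs_write(g, self.write_mask):
                regfile[g] = new_values[g]
        return reads
```

A microop built with `MicroOp(read_mask=0b001, write_mask=0b010)` lists only `cfof` as a source and only `zaps` as a destination, so it carries no dependency on the other flag registers.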
# 9211:46c3a74952ec	11-Sep-2012	Nilay Vaish <nilay@cs.wisc.edu>	x86: Add a separate register for D flag bit The D flag bit is part of the cc flag bit register currently. But since it is not being used anywhere in the implementation, it creates an unnecessary dependency. Hence, it is being moved to a separate register.
# 9010:7891b96e1526	22-May-2012	Nilay Vaish <nilay@cs.wisc.edu>	X86: Split Condition Code register This patch moves the ECF and EZF bits to individual registers (ecfBit and ezfBit) and the CF and OF bits to cfofFlag registers. This is being done so as to lower the read-after-write dependencies on the condition code register. Ultimately we will have the following registers: [ZAPS], [OF], [CF], [ECF], [EZF] and [DF]. Note that this is only one part of the solution for lowering the dependencies. The other part will check whether or not the condition code register needs to be actually read. This would be done through a separate patch.
# 8500:5bae9eee9482	14-Aug-2011	Gabe Black <gblack@eecs.umich.edu>	X86: Use IsSquashAfter if an instruction could affect fetch translation. Control register operands are set up so that writing to them is serialize after, serialize before, and non-speculative. These are probably overboard, but they should usually be safe. Unfortunately there are times when even these aren't enough. If an instruction modifies state that affects fetch, later serialized instructions which come after it might have already gone through fetch and decode by the time it commits. These instructions may have been translated incorrectly or interpreted incorrectly and need to be destroyed. This change modifies instructions which will or may have this behavior so that they use the IsSquashAfter flag when necessary.
# 8449:4be49ad47c74	05-Jul-2011	Gabe Black <gblack@eecs.umich.edu>	ISA parser: Define operand types with a ctype directly.
# 7789:f455790bcd47	08-Dec-2010	Gabe Black <gblack@eecs.umich.edu>	X86: Take advantage of new PCState syntax.
# 7720:65d338a8dba4	31-Oct-2010	Gabe Black <gblack@eecs.umich.edu>	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARM's mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extent MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes is defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. 
These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. 
Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. 
First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons: this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There was no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. 
Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
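The PC type described in revision 7720 can be pictured with a small sketch. It is written in Python for brevity and is not gem5's C++ class: the snake_case method names, the fixed `inst_size`, and the `u_advance`/`u_end` helpers are assumptions for illustration. It covers the reset rule (npc = pc + instruction size, upc = 0, nupc = 1), the read-only accessors, `branching()`, and the `!=` operator defined as the negation of `==`.

```python
class PCState:
    """Sketch of a PC state for an ISA with microcode and no delay slots."""

    def __init__(self, pc=0, inst_size=4):
        self.inst_size = inst_size  # assumed fixed-size instructions
        self.set(pc)

    def set(self, pc):
        """Reset every element to be consistent with a new pc."""
        self.pc = pc
        self.npc = pc + self.inst_size
        self.upc = 0
        self.nupc = 1

    # Read-only accessors, mirroring instAddr(), nextInstAddr(), microPC().
    def inst_addr(self):
        return self.pc

    def next_inst_addr(self):
        return self.npc

    def micro_pc(self):
        return self.upc

    def branching(self):
        """True if the next state is not the sequential successor."""
        return (self.npc != self.pc + self.inst_size
                or self.nupc != self.upc + 1)

    def u_advance(self):
        """Step to the next microop inside the same macroop."""
        self.upc = self.nupc
        self.nupc += 1

    def u_end(self):
        """Last microop of a macroop: move on to the next instruction."""
        self.set(self.npc)

    def __eq__(self, other):
        return ((self.pc, self.npc, self.upc, self.nupc)
                == (other.pc, other.npc, other.upc, other.nupc))

    def __ne__(self, other):
        # Defined trivially as !(a == b), as the TODO list suggests.
        return not (self == other)
```

Keeping all elements in one object means a CPU model can copy or compare the whole state in one step instead of handling each PC separately.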
# 7087:fb8d5786ff30	24-May-2010	Nathan Binkert <nate@binkert.org>	copyright: Change HP copyright on x86 code to be more friendly
# 6479:b9ab1b56391b	07-Aug-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Implement shift right/left double microops. This is my best guess as far as what these should do. Other existing microops use implicit registers, mul1s and mul1u for instance, so this should be ok. The microop that loads the implicit DoubleBits register would fall into one of the microop slots for moving to/from special registers.
# 6360:c3058964d06f	17-Jul-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Tame the wilds of def operands.
# 5926:c182698e1ab3	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Add microops for reading/writing debug registers.
# 5839:4cc05b7f2a97	01-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Fix some incorrect register widths.
# 5789:46c548dbe620	07-Jan-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Hook in the M5 pseudo insts.
# 5682:6f1cab082ba7	13-Oct-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Add wrval/rdval microops for reading significant miscregs.
# 5659:f4b9c344d1ca	12-Oct-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Implement CPUID with a magical function instead of microcode.
# 5429:52dbcf7f7328	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Keep handy values like the operating mode in one register.
# 5428:5a27fea50fee	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Change what the microop chks does. Instead of computing the segment descriptor address, this now checks if a selector value/descriptor are legal for a particular purpose.
# 5426:0bdcc60ccc45	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Add microops and supporting code to manipulate the whole rflags register.
# 5409:0343cd06df4f	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Add in some support for the tsc register.
# 5294:7222bdaed33b	02-Dec-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Reorganize segmentation and implement segment selector movs.
# 5291:5d38610cff05	02-Dec-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement the lgdt instruction.
# 5290:7dc3e8ee0a22	02-Dec-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement wrbase and wrlimit for loading pseudo descriptors.
# 5289:ca5390e654b8	02-Dec-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Separate the effective seg base and the "hidden" seg base.
# 5246:21f29e99e021	13-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Make microcode use presegmentation RIPs and the rest of m5 use post segmentation RIPS.
# 5241:a6602acdd046	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement the wrcr microop which writes a control register, and some control register work.
# 5083:49559a8060e8	19-Sep-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Move the fp microops to their own file with their own base classes in C++ and python.
# 5082:82dd253231c8	19-Sep-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Put in the foundation for x87 stack based fp registers.
# 5075:4ae876c5037d	13-Sep-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Total overhaul of the division instructions and microops.
# 5063:8eb72b1bd3c6	06-Sep-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Rework the multiplication microops so that they work like they would in the patent.
# 5026:46dd8d55f6c9	29-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Add operands to handle floating point registers.
# 5025:5c264911b7a9	29-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Flesh out register indexing constants.
# 4950:f5f19784acf1	07-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Make a microcode branch microop. Also some touch up for ruflag.
# 4863:b6dacc9a39ff	04-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Start implementing segmentation support. Make instructions observe segment prefixes, default segment rules, segment base addresses. Also fix some microcode and add sib and riprel "keywords" to the x86 specialization of the microassembler.
# 4816:13391cf96e9c	30-Jul-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Take into account the regular registers and the microcode registers when deciding whether or not to fold.
# 4805:cc9a5798e4d1	30-Jul-2007	Gabe Black <gblack@eecs.umich.edu>	Make the register indices use the appropriate "fold" bit.
# 4712:79b4c64296ce	19-Jul-2007	Gabe Black <gblack@eecs.umich.edu>	x86 fixes. Make the emulation environment consider the rex prefix. Implement and hook in forms of j, jmp, cmp, syscall, movzx. Added a format for an instruction to carry a call to the SE mode syscalls system. Made memory instructions which refer to the rip do so directly. Made the operand size overridable in the microassembly. Made the "ext" field of register operations 16 bits to hold a sparse encoding of flags to set or conditions to predicate on. Added an explicit "rax" operand for the syscall format. Implemented syscall returns.
# 4687:db7ca06d6e6a	17-Jul-2007	Gabe Black <gblack@eecs.umich.edu>	Add in operand which holds the condition code bits of the flag register.
# 4587:2c9a2534a489	19-Jun-2007	Gabe Black <gblack@eecs.umich.edu>	Get rid of the immediate and displacement components of the EmulEnv struct and use them directly out of the instruction. The extra copies are conceptually realistic but are just inefficient as implemented. Also don't use the zeroth microcode register for general storage since it's now the zero register, and implement load and store microops.
# 4519:f8da6b45573f	04-Jun-2007	Gabe Black <gblack@eecs.umich.edu>	Reworking x86's microcode system. This is a work in progress, and X86 doesn't compile. src/arch/x86/isa/decoder/one_byte_opcodes.isa: src/arch/x86/isa/macroop.isa: src/arch/x86/isa/main.isa: src/arch/x86/isa/microasm.isa: src/arch/x86/isa/microops/base.isa: src/arch/x86/isa/microops/microops.isa: src/arch/x86/isa/operands.isa: src/arch/x86/isa/microops/regop.isa: src/arch/x86/isa/microops/specop.isa: Reworking x86's microcode system
# 4338:24d31b35bcf9	04-Apr-2007	Gabe Black <gblack@eecs.umich.edu>	The process of going from an instruction definition to an instruction to be returned by the decoder has been fleshed out more. The following steps describe how an instruction implementation becomes a StaticInst. 1. Microops are created. These are StaticInsts that use templates to provide a basic form of polymorphism without having to make the microassembler smarter. 2. An instruction class is created which has a "templated" microcode program as its docstring. The template parameters are referenced with ^ followed by a number. 3. An instruction in the decoder references an instruction template using its mnemonic. The parameters to its format end up replacing the placeholders. These parameters describe a source for an operand which could be memory, a register, or an immediate. If it's a register, the register index is used. If it's memory, eventually a load/store will be pre/postpended to the instruction template and its destination register will be used in place of the ^. If it's an immediate, the immediate is used. Some operand types, specifically those that come from the ModRM byte, need to be decoded further into memory vs. register versions. This is accomplished by making the decode_block text for these instructions another case statement based off ModRM. 4. Once all of the template parameters have been handled, the instruction goes through the microcode assembler which resolves labels and creates a list of python op objects. If an operand is a register, it uses a % prefix, an immediate uses $, and a label uses @. If the operand is just letters, numbers, and underscores, it can appear immediately after the prefix. If it's not, it can be enclosed in non-nested {}s. 5. If there is a single "op" object (which corresponds to a single microop) the decoder is set up to return it directly. If not, a macroop wrapper is created around it. 
In the future, I'm considering separating the operand type specialization from the template substitution step. A problem this introduces is that either the template arguments need to be kept around for the specialization step, or they need to be re-extracted. Re-extraction might be the way to go so that the operand formats can be coded directly into the micro assembler template without having to pass them in as parameters. I don't know if that's actually useful, though. src/arch/x86/isa/decoder/one_byte_opcodes.isa: src/arch/x86/isa/microasm.isa: src/arch/x86/isa/microops/microops.isa: src/arch/x86/isa/operands.isa: src/arch/x86/isa/microops/base.isa: Implemented polymorphic microops and changed around the microcode assembler syntax.
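The template substitution and operand syntax described in revision 4338 can be sketched as follows. This is an illustration only, not the microassembler's actual code: the function names and the sample `add` line are hypothetical. It assumes placeholders of the form `^N`, and operands prefixed with `%` (register), `$` (immediate), or `@` (label), written either bare or enclosed in non-nested braces.

```python
import re


def substitute(template, params):
    """Replace each ^N placeholder with params[N]."""
    return re.sub(r"\^(\d+)", lambda m: params[int(m.group(1))], template)


# A prefix followed by either {anything without braces} or a bare word.
OPERAND_RE = re.compile(r"([%$@])(?:\{([^{}]*)\}|(\w+))")

KINDS = {"%": "register", "$": "immediate", "@": "label"}


def classify_operands(line):
    """Return (kind, text) for every prefixed operand on a microcode line."""
    operands = []
    for match in OPERAND_RE.finditer(line):
        prefix, braced, bare = match.groups()
        text = braced if braced is not None else bare
        operands.append((KINDS[prefix], text))
    return operands
```

For example, `substitute("add ^0, ^0, ^1", ["%rax", "$4"])` fills in both placeholders before the line is handed to `classify_operands`.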
# 4298:a92aab35e34e	29-Mar-2007	Gabe Black <gblack@eecs.umich.edu>	Add code to generate register and immediate based integer op microop classes.
# 4279:acc38276ca1d	21-Mar-2007	Gabe Black <gblack@eecs.umich.edu>	Add a junk operand. With no operands, the parser breaks.
# 4158:a3fb9e29c6ce	05-Mar-2007	Gabe Black <gblack@eecs.umich.edu>	Stub decoder. This is probably even farther from finished than it looks...