History log of /gem5/src/arch/x86/isa/microops/ldstop.isa
Revision Date Author Comments
# 12682:dfc3bb0db088 13-Apr-2018 Gabe Black <gabeblack@google.com>

x86: Add a ld/st microop flag for marking an access uncacheable.

This percolates down to the memory request object which will have its
"UNCACHEABLE" flag set.

Change-Id: Ie73f4249bfcd57f45a473f220d0988856715a9ce
Reviewed-on: https://gem5-review.googlesource.com/9881
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>


# 12463:84f365522633 15-Jan-2018 Swapnil Haria <swapnilster@gmail.com>

arch-x86: Adding clflush, clflushopt, clwb instructions

This patch adds support for cache flushing instructions in x86.
It piggybacks on support for similar instructions in arm ISA
added by Nikos Nikoleris. I have tested each instruction using
microbenchmarks.

Change-Id: I72b6b8dc30c236a21eff7958fa231f0663532d7d
Reviewed-on: https://gem5-review.googlesource.com/7401
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 12384:481add71d2e4 12-Dec-2017 Gabe Black <gabeblack@google.com>

x86: Rework how "split" loads/stores are handled.

Explicitly separate the way the data is represented in the underlying
representation from how it's represented in the instruction.

In order to make the ISA parser happy, the Mem operand needs to have
a single, particular type. To handle that with scalar types, we just
used uint64_ts and then worked with values that were smaller than the
maximum we could hold. To work with these new array values, we also
use an underlying uint64_t for each element.

To make accessing the underlying memory system more natural, when we
go to actually read or write values, we translate the access into an
array of the actual, correct underlying type. That way we don't have
non-exact asserts which confuse gcc, or weird endianness conversion
which assumes that the data should be flipped 8 bytes at a time.

Because the functions involved are generally inline, the syntactic
niceness should all boil off, and the final implementation in the
binary should be simple and efficient for the given data types.

Change-Id: I14ce7a2fe0dc2cbaf6ad4a0d19f743c45ee78e26
Reviewed-on: https://gem5-review.googlesource.com/6582
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 12236:126ac9da6050 04-Nov-2017 Gabe Black <gabeblack@google.com>

alpha,arm,mips,power,riscv,sparc,x86: Merge exec decl templates.

In the ISA instruction definitions, some classes were declared with
execute, etc., functions outside of the main template because they
had CPU specific signatures and would need to be duplicated with
each CPU plugged into them. Now that the instructions always just
use an ExecContext, there's no reason for those templates to be
separate. This change folds those templates together.

Change-Id: I13bda247d3d1cc07c0ea06968e48aa5b4aace7fa
Reviewed-on: https://gem5-review.googlesource.com/5401
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 12234:78ece221f9f5 02-Nov-2017 Gabe Black <gabeblack@google.com>

alpha,arm,mips,power,riscv,sparc,x86,isa: De-specialize ExecContexts.

The ISA parser used to generate different copies of exec functions
for each exec context class a particular CPU wanted to use. That's
since been changed so that those functions take a pointer to the base
ExecContext, so the code which would generate those extra functions
can be removed, and some functions which used to be templated on an
ExecContext subclass can be untemplated, or minimally less templated.

Now that some functions aren't going to be instantiated multiple times
with different signatures, there are also opportunities to collapse
templates and make many instruction definitions simpler within the
parser. Since those changes will be less mechanical, they're left for
later changes and will probably be done in smaller increments.

Change-Id: I0015307bb02dfb9c60380b56d2a820f12169ebea
Reviewed-on: https://gem5-review.googlesource.com/5381
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 11829:cb5390385d87 10-Feb-2017 Jason Lowe-Power <jason@lowepower.com>

x86: Fix implicit stack addressing in 64-bit mode

When in 64-bit mode, if the stack is accessed implicitly by an instruction
the alternate address prefix should be ignored if present.

This patch adds an extra flag to the ldstop which signifies when the
address override should be ignored. Then, for all of the affected
instructions, this patch adds two options to the ld and st opcode to use
the current stack addressing mode for all addresses and to ignore the
AddressSizeFlagBit. Finally, this patch updates the x86 TLB to not
truncate the address if it is in 64-bit mode and the IgnoreAddrSizeFlagBit
is set.

This fixes a problem when calling __libc_start_main with a binary that is
linked with a recent version of ld. This version of ld uses the address
override prefix (0x67) on the call instruction instead of a nop.

Note: This has not been tested in compatibility mode and only the call
instruction with the address override prefix has been tested.

See [1] page 9 (pdf page 45)

For instructions that are affected see [1] page 519 (pdf page 555).

[1] http://support.amd.com/TechDocs/24594.pdf

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>


# 11329:82bb3ee706b3 06-Feb-2016 Alexandru Dutu <alexandru.dutu@amd.com>

x86: revamp cmpxchg8b/cmpxchg16b implementation

The previous implementation did a pair of nested RMW operations,
which isn't compatible with the way that locked RMW operations are
implemented in the cache models. It was convenient though in that
it didn't require any new micro-ops, and supported cmpxchg16b using
64-bit memory ops. It also worked in AtomicSimpleCPU where
atomicity was guaranteed by the core and not by the memory system.
It did not work with timing CPU models though.

This new implementation defines new 'split' load and store micro-ops
which allow a single memory operation to use a pair of registers as
the source or destination, then uses a single ldsplit/stsplit RMW
pair to implement cmpxchg. This patch requires support for 128-bit
memory accesses in the ISA (added via a separate patch) to support
cmpxchg16b.


# 11328:9512d2e25f14 06-Feb-2016 Steve Reinhardt <steve.reinhardt@amd.com>

arch, x86: add support for arrays as memory operands

Although the cache models support wider accesses, the ISA descriptions
assume that (for the most part) memory operands are integer types,
which makes it difficult to define instructions that do memory accesses
larger than 64 bits.

This patch adds some generic support for memory operands that are arrays
of uint64_t, and specifically a 'u2qw' operand type for x86 that is an
array of 2 uint64_ts (128 bits). This support is unused at this point,
but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE
memory accesses will also be rewritten to use this support.

Support for 128-bit accesses could also have been added using the gcc
__int128_t extension, which would have been less disruptive. However,
although clang also supports __int128_t, it's still non-standard.
Also, more importantly, this approach creates a path to defining
256- and 512-byte operands as well, which will be useful for eventual
AVX support.


# 11303:f694764d656d 17-Jan-2016 Steve Reinhardt <steve.reinhardt@amd.com>

cpu. arch: add initiateMemRead() to ExecContext interface

For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.

This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.

This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().

For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level. For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.


# 11159:9459593cb649 06-Oct-2015 Steve Reinhardt <steve.reinhardt@amd.com>

x86: implement fild, fucomi, and fucomip x87 insts

fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.


# 10760:8f5993cfa916 23-Mar-2015 Steve Reinhardt <steve.reinhardt@amd.com>

mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW

Makes x86-style locked operations even more distinct from
LLSC operations. Using "locked" by itself should be
obviously ambiguous now.


# 10196:be0e1724eb39 09-May-2014 Curtis Dunham <Curtis.Dunham@arm.com>

arch: teach ISA parser how to split code across files

This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.

The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.

Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.

Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.

Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.


# 10184:bbfa3152bdea 09-May-2014 Curtis Dunham <Curtis.Dunham@arm.com>

arch: remove inline specifiers on all inst constrs, all ISAs

With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.


# 9894:c0a3920859bd 29-Sep-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add support for loading 32-bit and 80-bit floats in the x87

The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.

80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.


# 8925:97f06a79b6f5 31-Mar-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Fix address size handling so real mode works properly.

Virtual (pre-segmentation) addresses are truncated based on address size, and
any non-64 bit linear address is truncated to 32 bits. This means that real
mode addresses aren't truncated down to 16 bits after their segment bases are
added in.


# 8588:ef28ed90449d 27-Sep-2011 Gabe Black <gblack@eecs.umich.edu>

ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.

By using an underscore, the "." is still available and can unambiguously be
used to refer to members of a structure if an operand is a structure, class,
etc. This change mostly just replaces the appropriate "."s with "_"s, but
there were also a few places where the ISA descriptions where handling the
extensions themselves and had their own regular expressions to update. The
regular expressions in the isa parser were updated as well. It also now
looks for one of the defined type extensions specifically after connecting "_"
where before it would look for any sequence of characters after a "."
following an operand name and try to use it as the extension. This helps to
disambiguate cases where a "_" may legitimately be part of an operand name but
not separate the name from the type suffix.

Because leaving the "_" and suffix on the variable name still leaves a valid
C++ identifier and all extensions need to be consistent in a given context, I
considered leaving them on as a breadcrumb that would show what the intended
type was for that operand. Unfortunately the operands can be referred to in
code templates, the Mem operand in particular, and since the exact type of Mem
can be different for different uses of the same template, that broke things.


# 8442:b1f3dfae06f1 03-Jul-2011 Gabe Black <gblack@eecs.umich.edu>

ISA: Use readBytes/writeBytes for all instruction level memory operations.


# 8440:e513600a3551 03-Jul-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Fix store microops so they don't drop faults in timing mode.

If a fault was returned by the CPU when a store initiated it's write, the
store instruction would ignore the fault. This change fixes that.


# 8432:4a0c9c9409e4 21-Jun-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Eliminate an unused argument for building store microops.


# 8103:53c2d9b1c15d 02-Mar-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Mark IO reads and writes as non-speculative.


# 8102:77ee9ad2e113 02-Mar-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Mark prefetches as such in their instruction and request flags.


# 7967:b243dc8cec8b 13-Feb-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Don't read in dest regs if all bits are replaced.

In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or
64 bits wide overwrite all bits of the destination register. This change
removes false dependencies in these cases where the previous value of a
register doesn't need to be read to write a new value. New versions of most
microops are created that have a "Big" suffix which simply overwrite their
destination, and the right version to use is selected during microop
allocation based on the selected data size.

This does not change the performance of the O3 CPU model significantly, I
assume because there are other false dependencies from the condition code bits
in the flags register.


# 7874:c7f15c60898e 02-Feb-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Get rid of the stupd microop.


# 7626:bdd926760470 23-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Get rid of the flagless microop constructor.

This will reduce clutter in the source and hopefully speed up compilation.


# 7620:3d8a23caa1ef 23-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Consolidate extra microop flags into one parameter.

This single parameter replaces the collection of bools that set up various
flavors of microops. A flag parameter also allows other flags to be set like
the serialize before/after flags, etc., without having to change the
constructor.


# 7087:fb8d5786ff30 24-May-2010 Nathan Binkert <nate@binkert.org>

copyright: Change HP copyright on x86 code to be more friendly


# 6736:530e457c88c7 09-Nov-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Make x86 use PREFETCH instead of PF_EXCLUSIVE.


# 6624:b157ef23d76c 23-Aug-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Preserve the NO_ACCESS flag when giving CDA a specialized interface.


# 6345:f9ae7c3a036c 16-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Take limitted advantage of the compilers type checking for microop operands.


# 6132:916f10213bea 23-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Put the StoreCheck flag with the others, and don't collide with other flags.


# 6080:50890791c591 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the stul microop.
This microop does a store and unlocks the requested address. The RISC86
microop ISA doesn't seem to have an equivalent to this, so I'm guessing that
the store following an ldstl is automatically unlocking. We don't do it this
way for performance reasons since the behavior is the same.


# 6079:f39c5598a302 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the ldstl microop.
This microop does a load, checks that a store would succeed, and locks the
requested address.


# 6056:4435d13700de 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: LEA calculates an address before segmentation.


# 5969:815827deb469 27-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Take address size into account when computing an effective address.


# 5965:71f8d7c12619 27-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Fix segment limit checks.


# 5920:5a9c976270d6 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Implement a basic prefetch instruction.


# 5919:08f836f37f61 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Use the right portion of a register for stores.


# 5912:d113f6def227 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Add a flag to force memory accesses to happen at CPL 0.


# 5892:a0ef4a6349dc 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Make the stupd microop not update registers in initiateAcc.


# 5890:bdef71accd68 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

CPU: Get rid of translate... functions from various interface classes.


# 5788:6d4161a36ca1 07-Jan-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Autogenerate macroop generateDisassemble function.


# 5727:8b9aaeac5bab 10-Nov-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Fix completeAcc get call.


# 5359:8c6ff200e4c1 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the INVLPG instruction and the TIA microop.


# 5232:d3801ea2792e 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Various fixes to indexing segmentation related registers


# 5178:8914ea55a0c6 22-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the cda microop which checks if an address is legal to write to.


# 5175:ee904e392de2 21-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the stupd microop ("store with update", not "stupid") and use it in ENTER.


# 5149:356e00996637 12-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement MSR reads and writes and the wrsmr and rdmsr instructions.
There are no priviledge checks, so these instructions will all work in all
modes.


# 5118:f1b1cb6d0fbe 03-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the ldst microop and put it in existing microcode where appropriate.


# 5027:e96b8a4f4d96 29-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Add load and store microops that use the fp registers.


# 5002:1b540e93ad34 26-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Remove x86 code that attempted to fix misaligned accesses.


# 4867:2de05bc73640 04-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Make 64 bit unaligned accesses work as well as the other sizes.
There is a fundemental flaw in how unaligned accesses are supported, but this
is still an improvement.


# 4863:b6dacc9a39ff 04-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Start implementing segmentation support.
Make instructions observe segment prefixes, default segment rules, segment
base addresses.
Also fix some microcode and add sib and riprel "keywords" to the x86
specialization of the microassembler.


# 4834:9480bde3ae6a 01-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Fix for compilation bug with new cache code.


# 4767:5e55d650692e 27-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Add functions to read and write to an exec context.
These functions take care of calling the thread contexts read and write functions with the right sized data type, and handle unaligned accesses.


# 4720:15cb65a86e5a 20-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

Make load and store ops use the appropriate sized data access.


# 4712:79b4c64296ce 19-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

x86 fixes
Make the emulation environment consider the rex prefix.
Implement and hook in forms of j, jmp, cmp, syscall, movzx
Added a format for an instruction to carry a call to the SE mode syscalls system
Made memory instructions which refer to the rip do so directly
Made the operand size overridable in the microassembly
Made the "ext" field of register operations 16 bits to hold a sparse encoding of flags to set or conditions to predicate on
Added an explicit "rax" operand for the syscall format
Implemented syscall returns.


# 4706:4ede9a05bb42 18-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

Make store microops actually store instead of load.


# 4679:0b39fa8f5eb8 14-Jul-2007 Gabe Black <gblack@eecs.umich.edu>

Pull some hard coded base classes out of the isa description.


# 4601:38c989d15fef 20-Jun-2007 Gabe Black <gblack@eecs.umich.edu>

Make memory instructions work better, add more macroop implementations, add an lea microop, move EmulEnv into it's own .cc and .hh.


# 4587:2c9a2534a489 19-Jun-2007 Gabe Black <gblack@eecs.umich.edu>

Get rid of the immediate and displacement components of the EmulEnv struct and use them directly out of the instruction. The extra copies are conceptually realistic but are just innefficient as implemented. Also don't use the zeroeth microcode register for general storage since it's now the zero register, and implement a load and a store microops.


# 4561:ade4960f0832 13-Jun-2007 Gabe Black <gblack@eecs.umich.edu>

Move load/store microops into their own file. They still don't do anything, though.