14289:49005710b522 |
26-Aug-2019 |
Pouya Fotouhi <Pouya.Fotouhi@amd.com> |
arch-x86: ignore non-temporal hint for movntps/movntpd SSE insts
Make the implementation of movntps/movntpd consistent with the other non-temporal instructions. The hint is ignored here, and those instructions are implemented as cacheable instructions.
This change adds a warning to let the user know about this workaround. It also adds the address check for the second part of the move.
Change-Id: I811652b24cf39ca2f5c5d4c9e9e417f69190b55c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20408
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com> |
14287:1c9774d969ac |
18-Sep-2019 |
Hoa Nguyen <hoanguyen@ucdavis.edu> |
arch-x86: Change warn to warn_once for NT instructions
Change-Id: I50353716f2a913b9b106b140644d95991879f662
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21039
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com> |
14220:d8f83e601091 |
20-Aug-2019 |
Pouya Fotouhi <Pouya.Fotouhi@amd.com> |
arch-x86: implement movntq/movntdq instructions
Non-temporal quadword/double-quadword move instructions. This change ignores the non-temporal hint and implements the instructions so that they send cacheable requests to memory. This trades some "performance" (i.e. some cache pollution) for better "correctness" in behavior.
Change-Id: I2052ac0970f61a54bafb7332762debcb7103202d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20288
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com> |
14033:a1cb162f68d9 |
31-May-2019 |
Brandon Potter <brandon.potter@amd.com> |
x86: fix movsd bug on %xmm register
The movsd instruction should zero out half the register, but does not do it. This changeset adds the necessary microop to the instruction to cause correct behavior.
Change-Id: I5278da3634c78a97ed0586f687a36c6dc5a34c60
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19068
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com> |
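Illustration for the movsd fix above: a minimal Python sketch of the architectural behavior, not the gem5 microcode. It assumes the fix concerns the memory-source form, where the x86 manuals specify that the upper 64 bits of the destination are cleared; the register-to-register form is shown for contrast. The function names and the (low, high) tuple model of an XMM register are hypothetical.

```python
MASK64 = (1 << 64) - 1

def movsd_load(xmm, mem_qword):
    """Memory-to-register movsd: low half is loaded, high half is zeroed.

    `xmm` is a (low, high) tuple of 64-bit integers (hypothetical model).
    """
    return (mem_qword & MASK64, 0)

def movsd_reg(dest, src):
    """Register-to-register movsd: only the low half is replaced."""
    return (src[0], dest[1])

# The stale upper half (2) must not survive a load from memory.
print(movsd_load((1, 2), 0x3FF0000000000000))
```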
12683:6e14a1dd346d |
20-Apr-2017 |
Steve Reinhardt <steve.reinhardt@amd.com> |
arch-x86: implement movntps/movntpd SSE insts
These are non-temporal packed SSE stores.
Change-Id: I526cd6551b38d6d35010bc6173f23d017106b466
Reviewed-on: https://gem5-review.googlesource.com/9861
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com> |
11160:10f28b61fcb1 |
06-Oct-2015 |
Steve Reinhardt <steve.reinhardt@amd.com> |
x86: implement rcpps and rcpss SSE insts
These are packed single-precision approximate reciprocal operations, vector and scalar versions, respectively.
This code was basically developed by copying the code for sqrtps and sqrtss. The mrcp micro-op was simplified relative to msqrt since there are no double-precision versions of this operation. |
10899:b8b8ad2c72dd |
04-Jul-2015 |
Nikos Nikoleris <nikos.nikoleris@gmail.com> |
x86: Adjust the size of the values written to the x87 misc registers
All x87 misc registers are implemented in an array of 64-bit values, but in real hardware some of these registers are smaller. Previously all 64 bits were incorrectly set and then later read. To ensure correctness we mask the value in setMiscRegNoEffect to write only the valid bits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
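Illustration for the masking described above: a minimal Python sketch, not the gem5 C++ code. The register names and the width table are assumptions chosen for illustration (16 bits for the control, status and tag words, 11 bits for the opcode register), and `set_misc_reg_no_effect` is a hypothetical stand-in for the real `setMiscRegNoEffect`.

```python
# Assumed widths in bits; anything not listed keeps the full 64 bits.
X87_REG_BITS = {"FCW": 16, "FSW": 16, "FTW": 16, "FOP": 11}

def set_misc_reg_no_effect(regs, name, value):
    """Store `value` in `regs[name]`, keeping only the valid low bits."""
    width = X87_REG_BITS.get(name, 64)
    regs[name] = value & ((1 << width) - 1)

regs = {}
set_misc_reg_no_effect(regs, "FCW", 0xFFFFFFFF037F)
print(hex(regs["FCW"]))  # only the low 16 bits are kept
```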
10644:24447dc69101 |
10-Jan-2015 |
Emilio Castillo <castilloe@unican.es> |
x86 : fxsave and fxrestore missing template code
This patch corrects the FXSAVE and FXRSTOR macroops. The code for saving and restoring the FP registers was already in the file but was not used.
The kernel uses the FXSAVE and FXRSTOR instructions to save and load the state of the MMX, XMM and FPU registers.
In FS mode this operation is triggered by a Device Not Available fault. The cr0 register has a TS flag that is set on each context switch. Whenever a task accesses any FP-related register (SIMD included) while the TS flag is set, the Device Not Available fault is raised. The kernel then saves the current state of the registers and restores the previous state of the currently running task.
Right now gem5 lacks this capability: the Device Not Available fault is never raised, which leads to several problems when different threads share the same CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this behavior.
To test this, a hack was added to the atomic CPU code to detect whether a static instruction has any FP operands while the cr0 TS bit is set. This check should be done in the ISA-dependent code, but accessing the cr0 register while executing an instruction seems to be tricky.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
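Illustration for the TS-flag check described above: a minimal Python sketch of the condition only, not the hack that was actually added to the atomic CPU. It relies on CR0.TS being bit 3 of cr0; the function name and the `uses_fp_regs` parameter are hypothetical.

```python
CR0_TS = 1 << 3  # Task Switched flag, bit 3 of cr0

def raises_device_not_available(cr0, uses_fp_regs):
    """True when an FP/SIMD instruction runs while CR0.TS is set."""
    return uses_fp_regs and bool(cr0 & CR0_TS)

# After a context switch the kernel leaves TS set, so the first FP
# access by the new task would trap and let the kernel swap FP state.
print(raises_device_not_available(CR0_TS, True))
```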
10632:b415e0dabe21 |
03-Jan-2015 |
Maxime Martinasso <maxime.cscs@gmail.com> |
x86: implements the simd128 ADDSUBPD instruction
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.
Tested with a simple program in assembly language which executes the instruction. Checked that different versions of the instruction are executed by using the execution tracing option.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
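Illustration for ADDSUBPD: a minimal Python sketch of the architectural result, not the gem5 microcode. Each register is modeled as a (low, high) tuple of doubles, which is an assumption of this sketch; the low lane is subtracted and the high lane is added.

```python
def addsubpd(dest, src):
    """Return the new (low, high) pair: low = dest - src, high = dest + src."""
    return (dest[0] - src[0], dest[1] + src[1])

print(addsubpd((5.0, 1.0), (2.0, 3.0)))
```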
9896:e31776cf4743 |
29-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
x86: Add support for FXSAVE, FXSAVE64, FXRSTOR, and FXRSTOR64 |
9009:d45a02bd5391 |
19-May-2012 |
Marc Orr <marc.orr@gmail.com> |
x86 ISA: Implement the sse3 haddps instruction.
Shuffle the 32 bit values into position, and then add in parallel. |
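Illustration for the "shuffle, then add in parallel" approach above: a minimal Python sketch, not the gem5 microcode. Registers are modeled as lists of four numbers (lane 0 first), and single-precision rounding is ignored; both are assumptions of this sketch.

```python
def haddps(dest, src):
    """Horizontal add of adjacent pairs: dest pairs fill the low lanes,
    src pairs fill the high lanes."""
    # Shuffle step: gather the even and the odd lanes into two vectors.
    evens = [dest[0], dest[2], src[0], src[2]]
    odds = [dest[1], dest[3], src[1], src[3]]
    # Parallel add step: one lane-wise addition yields all four sums.
    return [a + b for a, b in zip(evens, odds)]

print(haddps([1, 2, 3, 4], [10, 20, 30, 40]))
```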
7087:fb8d5786ff30 |
24-May-2010 |
Nathan Binkert <nate@binkert.org> |
copyright: Change HP copyright on x86 code to be more friendly |
6801:353726c415f4 |
19-Dec-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Add a common named flag for signed media operations. |
6800:335f8b406bb9 |
19-Dec-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Create a common flag with a name to indicate high multiplies. |
6799:36131e4dfb6e |
19-Dec-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Create a common flag with a name to indicate scalar media instructions. |
6715:fb4a3a61bc74 |
04-Nov-2009 |
Vince Weaver <vince@csl.cornell.edu> |
X86: Fix problem with movhps instruction
This problem is like the one fixed with movhpd a few weeks ago. A +8 displacement is used to access memory when there should be none.
This fix is needed for the perlbmk spec2k benchmark to run. |
6707:0e5037cecaf7 |
30-Oct-2009 |
Vince Weaver <vince@csl.cornell.edu> |
X86: Add support for x86 psrldq and pslldq instructions
These are complicated instructions and the micro-code might be suboptimal.
This has been tested with some small sample programs (attached)
The psrldq instruction is needed by various spec2k programs. |
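Illustration for psrldq and pslldq: a minimal Python sketch of the architectural result, not the microcode from this change. The 128-bit register is modeled as a Python integer, which is an assumption of this sketch. Both instructions shift by whole bytes, and a count above 15 clears the register.

```python
MASK128 = (1 << 128) - 1

def psrldq(value, count):
    """Shift a 128-bit value right by `count` bytes, filling with zeros."""
    if count > 15:
        return 0
    return (value & MASK128) >> (8 * count)

def pslldq(value, count):
    """Shift a 128-bit value left by `count` bytes, filling with zeros."""
    if count > 15:
        return 0
    return (value << (8 * count)) & MASK128

print(hex(psrldq(0x1122 << 112, 14)))
```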
6705:3c810b64ee7d |
30-Oct-2009 |
Vince Weaver <vince@csl.cornell.edu> |
X86: Implement the X86 sse2 haddpd instruction
This patch implements the haddpd instruction.
It fixes the problem in the previous version (pointed out by Gabe Black) where an incorrect result would happen if you issue the instruction with the same argument twice, i.e. "haddpd %xmm0,%xmm0"
This instruction is used by many spec2k benchmarks. |
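Illustration for the same-argument case above: a minimal Python sketch of one plausible way such an aliasing bug can arise. It is not a reconstruction of the earlier patch, whose actual defect is not described here. Registers are modeled as two-element lists in a dictionary, and both function names are hypothetical.

```python
def haddpd_naive(regs, dest, src):
    """Writes dest's low half before src is read; wrong when dest is src."""
    regs[dest][0] = regs[dest][0] + regs[dest][1]
    regs[dest][1] = regs[src][0] + regs[src][1]

def haddpd_safe(regs, dest, src):
    """Computes both sums into temporaries before writing anything back."""
    low = regs[dest][0] + regs[dest][1]
    high = regs[src][0] + regs[src][1]
    regs[dest][0], regs[dest][1] = low, high

regs = {"xmm0": [1.0, 2.0]}
haddpd_safe(regs, "xmm0", "xmm0")
print(regs["xmm0"])  # both halves hold 1.0 + 2.0
```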
6698:21047815f78e |
28-Oct-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Replace "DISPLACEMENT" with disp in movhpd. |
6697:4863725cb4d9 |
27-Oct-2009 |
Vince Weaver <vince@csl.cornell.edu> |
Fix problem with the x86 sse movhpd instruction.
The movhpd instruction was writing to the wrong memory offset. |
6696:e533bec78924 |
21-Oct-2009 |
Vince Weaver <vince@csl.cornell.edu> |
Implement X86 sse2 movdqu and movdqa instructions
The movdqa instruction should enforce 16-byte alignment. This implementation does not do that.
These instructions are needed for most of x86_64 spec2k to run. |
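Illustration for the missing movdqa alignment check: a minimal Python sketch of what enforcing 16-byte alignment could look like. This is not gem5 code; the exception class and function name are hypothetical. On real hardware a misaligned movdqa access raises a general-protection exception, while movdqu accepts any address.

```python
class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for the fault raised on a misaligned access."""

def check_movdqa_address(addr):
    """Return `addr` if it is 16-byte aligned, otherwise raise a fault."""
    if addr % 16 != 0:
        raise GeneralProtectionFault(f"misaligned movdqa access at {addr:#x}")
    return addr

print(hex(check_movdqa_address(0x1000)))
```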
6608:6d1f74b21533 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement MOVQ2DQ. |
6607:dba8e329e783 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement MOVDQ2Q. |
6606:03fd282998d0 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media instructions that convert fp values to ints. |
6604:b750348f6da3 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the instructions that compare fp values and write a mask as a result. |
6602:95b882ce7b10 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the instructions that compare fp values and write to rflags. |
6600:bb997cd711af |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement MOVSS. |
6599:a578850e7524 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement LDMXCSR. |
6598:82d1d4d217e4 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement STMXCSR. |
6597:4903cea6a8c2 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the shuffle media instructions. |
6595:2aec993cdd8f |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the mask move instructions. |
6593:f27fd3c3a153 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the instructions that move sign bits. |
6591:3d1ea9362fe5 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the insert/extract instructions. |
6588:f449753172ee |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media average instructions. |
6586:e8af0cf94c37 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the multiply and add instructions. |
6584:5355f44912f6 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media shifts that operate on 64 bits or less at a time. |
6582:7e1af04f4ead |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the sum of absolute differences instructions. |
6580:a1c40860fe09 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media integer subtract instructions. |
6578:825b77196521 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the integer media multiply instructions. |
6575:e5a3ae40c4d0 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the integer media max instructions. |
6573:6e14c5d36a1a |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the integer media min instructions. |
6571:91d9599956f3 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media integer addition instructions. |
6569:e8cb266c9451 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the instructions that convert between forms of floating point. |
6567:819107c2c851 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the instructions that compare fp values and write masks as the result. |
6565:b7f5a02ea9b7 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the MOVDDUP instruction. |
6564:9ed64f6888cf |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement many of the media mov instructions. |
6563:2c5b80c75da7 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media instructions that convert integer values to floating point. |
6561:3f716cda05c9 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the floating point media instructions. |
6559:e4f60f716103 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the floating point media multiply instructions. |
6557:f677e05d723d |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the floating point media subtract instructions. |
6555:dae81a15cfcc |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the floating point media add instructions. |
6553:897523ead7ce |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media sqrt instructions. |
6551:52b4167056ed |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media floating point max instructions. |
6549:d6ae13f56801 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the media floating point min instructions. |
6547:3f6c31c3d59e |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the pack instructions. |
6545:9c68aea7b1e6 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Rename sel to ext for media microops. |
6544:406ad51ece90 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Move the MMX version of MOVD into the simd64 directory. |
6543:a9a5dd560925 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the remaining unpack instructions. |
6542:059e35b593a8 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PANDN, ANDNPS, and ANDNPD. |
6540:17414b661543 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PAND, ANDPS, and ANDPD. |
6538:6cf5a0235ae8 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement POR, ORPD and ORPS. |
6536:dc54f4fd6116 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PXOR. |
6535:b595412884f9 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: (Re)implement XORPS and XORPD. |
6533:2977e2e2dc27 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKLQDQ. |
6532:f7c42d003529 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKHQDQ. |
6531:6e2f4aa11482 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKHDQ. |
6530:cdb6bde20266 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKHWD. |
6529:cde96afcb3e3 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKHBW. |
6528:5c3a713ec1bb |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKLDQ. |
6527:4af40cccf527 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement PUNPCKLWD. |
6526:2f72755b4af7 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the versions of PUNPCKLBW that use XMM registers. |
6525:b252af5cda46 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the MOVQ instruction. |
6523:da0f91a2d60b |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the versions of MOVD that have an MMX source. |
6520:962f58808d53 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the versions of MOVD that have an MMX destination. |
6519:36369ba5fad6 |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Ignore the size part of XMM/MMX operands. The instructions know what they want. |
6518:1ad4a7774b3c |
17-Aug-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Use suffixes to differentiate XMM/MMX/GPR operands. |
5123:cd30bb46e146 |
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix places where movfp was used incorrectly. |
5119:a4469f2919f3 |
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put ldst into the microcode (the earlier changeset didn't really). Also clean things up as much as possible so that faulting won't break an instruction. More microops which verify addresses are needed. |
5081:2ccce8600a9d |
19-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put in stubs for x87, 64 bit and 128 bit SIMD instruction microcode. |