14292:64ea0057798b |
16-Sep-2019 |
Jing Qu <jqu32@wisc.edu> |
gpu-compute: Fix overriden errors
When building Gem5 with GPU protocols, overriden errors were thrown from files in gpu-compute. After adding override to the files, the errors were resolved and Gem5 builds successfully.
Change-Id: Iab3a0768caf82c226e8bbee5690a834bf92d1e03 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20939 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
14024:abe47b13653d |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, sim: Merge getMemProxy and getVirtProxy.
These two functions were performing the same function but had two different names for historical reasons. This change merges them together, keeping the getVirtProxy name to be consistent with the getPhysProxy method used to get a non-translating proxy port.
Change-Id: Idd83c6b899f9343795075b030ccbc723a79e52a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18581 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13995:5d459168a680 |
28-Aug-2018 |
Brandon Potter <brandon.potter@amd.com> |
sim-se: change syscall function signature
The system calls had four parameters. One of the parameters is ThreadContext and another is Process. The ThreadContext holds the value of the current process so the Process parameter is redundant since the system call functions already have indirect access.
With the old API, it is possible to call into the functions with the wrong supplied Process which could end up being a confusing error.
This patch removes the redundancy by forcing access through the ThreadContext field within each system call.
Change-Id: Ib43d3f65824f6d425260dfd9f67de1892b6e8b7c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/12299 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Brandon Potter <Brandon.Potter@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13981:577196ddd040 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, dev, mem, sim: Remove #if 0-ed out code.
This code will be preserved through version control, but otherwise creates clutter and will rot in place since it's never compiled.
Change-Id: Id265f6deac445116843956ea5cf1210d8127274e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18608 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13892:0182a0601f66 |
22-Apr-2019 |
Gabe Black <gabeblack@google.com> |
mem: Minimize the use of MemObject.
MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
13784:1941dc118243 |
07-Mar-2019 |
Gabe Black <gabeblack@google.com> |
arch, cpu, dev, gpu, mem, sim, python: start using getPort.
Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary.
Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13665:9c7fe3811b88 |
25-Jan-2019 |
Andreas Sandberg <andreas.sandberg@arm.com> |
python: Don't assume SimObjects live in the global namespace
The importer in Python 3 doesn't like the way we import SimObjects from the global namespace. Convert the existing SimObject declarations to import from m5.objects. As a side-effect, this makes these files consistent with configuration files.
Change-Id: I11153502b430822130722839e1fa767b82a027aa Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15981 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> |
13557:fc33e6048b25 |
13-Oct-2018 |
Gabe Black <gabeblack@google.com> |
cpu: dev: sim: gpu-compute: Banish some ISA specific register types.
These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are some remaining types, specifically the vector registers and the CCReg. I'm less familiar with these new types of registers, and so will look at getting rid of them at some later time.
Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b Reviewed-on: https://gem5-review.googlesource.com/c/13624 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
13449:2f7efa89c58b |
26-Nov-2018 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, gpu, mem: Replace assert(0 or false with panic.
Neither assert(0) nor assert(false) give any hint as to why control getting to them is bad, and their more descriptive versions, assert(0 && "description") and assert(false && "description"), jury rig assert to add an error message when the utility function panic() already does that directly with better formatting options.
This change replaces that flavor of call to assert with panic, except in the actual code which processes the formatting that panic uses (to avoid infinitely recurring error handling), and in some *.sm files since I don't know what rules those have to follow and don't want to accidentaly break them.
Change-Id: I8addfbfaf77eaed94ec8191f2ae4efb477cefdd0 Reviewed-on: https://gem5-review.googlesource.com/c/14636 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13345:1c9e1df05191 |
12-Oct-2018 |
Gabe Black <gabeblack@google.com> |
gpu-compute: Explicitly use little endian packet accessors.
The gpu ISA doesn't have a well defined endianness, but it really should. It seems that the GPU is only used with x86, and in that context it would be little endian.
Change-Id: I1620906564a77f44553fbf6d788866e017b6054b Reviewed-on: https://gem5-review.googlesource.com/c/13463 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12889:6d4515549710 |
16-Aug-2018 |
Brandon Potter <brandon.potter@amd.com> |
hsail-x86: fix gpu dynamic instruction error
The gpu_dyn_inst.hh file was missing a clone method from inherited classes. (The clone method is the way to implement the prototype design pattern.) Because the inherited clone method was declare as pure virtual, the method needed to be implemented. Otherwise, the compiler complains that the class is abstract.
Change-Id: I38782d5f7379f32be886401f7c127fe60d2f8811 Reviewed-on: https://gem5-review.googlesource.com/12108 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12888:1582dce7cc91 |
16-Aug-2018 |
Brandon Potter <brandon.potter@amd.com> |
hsail-x86: fix addr_range_map error
683f411dca introduced changes to the addr_range_map's "find" method. Nikos replaced the relevant code with a new "contains" method. Propagate the changes to the gpu-compute code.
Change-Id: I8cfe3b15cbfb476685b0ed5ba423ea5a8f1000d8 Reviewed-on: https://gem5-review.googlesource.com/12107 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12799:b230ebe2f641 |
25-Jun-2018 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Remove unneeded Request::setVirt call
This sets the members of a Request object to the values they already hold, except the atomicOpFunctor which is set to nullptr. This call introduces a bug for atomics and is not useful for non-atomic requests. This changeset is also adding the wave PC and instruction sequence number to the Request object.
Change-Id: I62f7b4a597483b0aa848a0cfbc72181e1063f56a Reviewed-on: https://gem5-review.googlesource.com/11549 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12749:223c83ed9979 |
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12748:ae5ce8e42de7 |
03-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Substitute pointer to Request with aliased RequestPtr
Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparing patch for when RequestPtr will be the typdefed to a smart pointer to Request rather then a raw pointer to Request.
Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12717:2e2c211644d2 |
27-Apr-2018 |
Brandon Potter <brandon.potter@amd.com> |
gpu-compute: use X86ISA::TlbEntry over GpuTlbEntry
GpuTlbEntry was derived from a vanilla X86ISA::TlbEntry definition. It wrapped the class and included an extra member "valid". This member was intended to report on the validity of the entry, however it introduced bugs when folks forgot to set field properly in the code. So, instead of keeping the extra field which we might forget to set, we track validity by using nullptr for invalid tlb entries (as the tlb entries are dynamically allocated). This saves on the extra class definition and prevents bugs creeping into the code since the checks are intrinsically tied into accessing any of the X86ISA::TlbEntry members.
This changeset fixes the issues introduced by a8d030522, a4e722725, and 2a15bfd79.
Change-Id: I30ebe3ec223fb833f3795bf0403d0016ac9a8bc2 Reviewed-on: https://gem5-review.googlesource.com/10481 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12697:cd71b966be1e |
27-Apr-2018 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
style: fix amd license and style issues
Change-Id: I26136fb49f743c4a597f8021cfd27f78897267b5 Reviewed-on: https://gem5-review.googlesource.com/10463 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12696:15d4edfbb01b |
03-May-2018 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: Cleanup the scheduler a bit
Change-Id: If2c626544f208e15c91be975dee9253126862ced Reviewed-on: https://gem5-review.googlesource.com/10222 Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12680:91f4d6668b4f |
04-Apr-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
sim,cpu,mem,arch: Introduced MasterInfo data structure
With this patch a gem5 System will store more info about its Masters. While it was previously keeping track of the Master name and Master ID only, it is now adding a per-Master pointer to the SimObject related to the Master. This will make it possible for a client to query a System for a Master using either the master's name or the master's pointer.
Change-Id: I8b97d328a65cd06f329e2cdd3679451c17d2b8f6 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/9781 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
12663:565c16ffe1d1 |
17-Apr-2018 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix bad asserts in gpu tlb and cu tlb port
change 2a15bfd79ced20a6d4cbf0a0a4c2fbb1444b9a44 introduced a few bugs in the tlb of the cu. asserts in the gpu tlb and cu expected the page table lookup() function to return a bool, and this value was used directly in the gpu tlb's assert and it was kept in the gpu tlb entry, where later the cu would assert that it is true.
this change fixes the issue by checking the validity of the pte pointer returned by lookup() in order to set the validity of the tlb entry itself.
Change-Id: Ief1f205db65f1911fd132acd314e4407c5e3ffdf Reviewed-on: https://gem5-review.googlesource.com/10001 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
12603:ab2cec4483af |
16-Mar-2018 |
Gabe Black <gabeblack@google.com> |
hsail: Get rid of an inert private member of StorageSpace.
The "segment" private element in this class was only ever set to zero on construction, and then used to index into a list of segment names to get the string "none" in a DPRINTF. If debugging was turned off, there would be no consumers of that variable, and that upset g++. This change removes the essentially useless variable, and also that bit of text in the DPRINTF.
Change-Id: I3f85db4af5f0678768243daf84b8d698350af931 Reviewed-on: https://gem5-review.googlesource.com/9221 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12461:a4cb506cda74 |
09-Jan-2018 |
Gabe Black <gabeblack@google.com> |
tarch, mem: Abstract the data stored in the SE page tables.
Rather than store the actual TLB entry that corresponds to a mapping, we can just store some abstracted information (address, a few flags) and then let the caller turn that into the appropriate entry. There could potentially be some small amount of overhead from creating entries vs. storing them and just installing them, but it's likely pretty minimal since that only happens on a TLB miss (ideally rare), and, if it is problematic, there could be some preallocated TLB entries which are just minimally filled in as necessary.
This has the nice effect of finally making the page tables ISA agnostic.
Change-Id: I11e630f60682f0a0029b0683eb8ff0135fbd4317 Reviewed-on: https://gem5-review.googlesource.com/7350 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
12455:c88f0b37f433 |
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
arch, mem: Make the page table lookup function return a pointer.
This avoids having a copy in the lookup function itself, and the declaration of a lot of temporary TLB entry pointers in callers. The gpu TLB seems to have had the most dependence on the original signature of the lookup function, partially because it was relying on a somewhat unsafe copy to a TLB entry using a base class pointer type.
Change-Id: I8b1cf494468163deee000002d243541657faf57f Reviewed-on: https://gem5-review.googlesource.com/7343 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com> |
12334:e0ab29a34764 |
30-Nov-2017 |
Gabe Black <gabeblack@google.com> |
misc: Rename misc.(hh|cc) to logging.(hh|cc)
These files aren't a collection of miscellaneous stuff, they're the definition of the Logger interface, and a few utility macros for calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1 Reviewed-on: https://gem5-review.googlesource.com/6226 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
12126:06c1fbaa5724 |
27-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
gpu-compute: Refactor some Event subclasses to lambdas
Change-Id: Ic1332b8e8ba0afacbe591c80f4d06afbf5f04bd9 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3922 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
12085:de78ea63e0ca |
07-Jun-2017 |
Sean Wilson <spwilson2@wisc.edu> |
cpu, gpu-compute: Replace EventWrapper use with EventFunctionWrapper
Change-Id: Idd5992463bcf9154f823b82461070d1f1842cea3 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3746 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
11905:4a771f8756ad |
01-Mar-2017 |
Brandon Potter <Brandon.Potter@amd.com> |
syscall-emul: Move memState into its own file
The Process class is full of implementation details and structures related to SE Mode. This changeset factors out an internal class from Process and moves it into a separate file. The purpose behind doing this is to clean up the code and make it a bit more modular.
Change-Id: Ic6941a1657751e8d51d5b6b1dcc04f1195884280 Reviewed-on: https://gem5-review.googlesource.com/2263 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
11900:0787df49546b |
27-Feb-2017 |
Andreas Sandberg <andreas.sandberg@arm.com> |
gpu-compute: Fix Python/C++ object hierarchy discrepancies
The GPUCoalescer and the Shader classes have different base classes in C++ and Python. This causes subtle bugs in SWIG and compilation errors for PyBind.
Change-Id: I1ddd2a8ea43f083470538ddfea891347b21d14d8 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2228 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Pierre-Yves Péneau <pierre-yves.peneau@lirmm.fr> Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> |
11886:43b882cada33 |
27-Feb-2017 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [PATCH 15/22] add clone/execve for threading and multiprocess simulations
Modifies the clone system call and adds execve system call. Requires allowing processes to steal thread contexts from other processes in the same system object and the ability to detach pieces of process state (such as MemState) to allow dynamic sharing. |
11883:3bfed693ff22 |
27-Feb-2017 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: remove unnecessary member from class
The clang compiler complains that the wavefront member in the GpuISA class is unused. This changeset removes the member, because it does not appear serve a purpose. |
11882:68dd3c3349aa |
27-Feb-2017 |
Brandon Potter <brandon.potter@amd.com> |
gpu-compute: mark functions with override if replacing virtual
The clang compiler is more stringent than the recent versions of GCC when dealing with overrides. This changeset adds the specifier to the methods which need it to silence the compiler. |
11856:103e2f92c965 |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [patch 10/22] refactor fdentry and add fdarray class
Several large changes happen in this patch.
The FDEntry class is rewritten so that file descriptors now correspond to types: 'File' which is normal file-backed file with the file open on the host machine, 'Pipe' which is a pipe that has been opened on the host machine, and 'Device' which does not have an open file on the host yet acts as a pseudo device with which to issue ioctls. Other types which might be added in the future are directory entries and sockets (off the top of my head).
The FDArray class was create to hold most of the file descriptor handling that was stuffed into the Process class. It uses shared pointers and the std::array type to hold the FDEntries mentioned above.
The changes to these two classes needed to be propagated out to the rest of the code so there were quite a few changes for that. Also, comments were added where I thought they were needed to help others and extend our DOxygen coverage. |
11851:824055fe6b30 |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
syscall_emul: [patch 5/22] remove LiveProcess class and use Process instead
The EIOProcess class was removed recently and it was the only other class which derived from Process. Since every Process invocation is also a LiveProcess invocation, it makes sense to simplify the organization by combining the fields from LiveProcess into Process. |
11800:54436a1784dc |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 3/22] reduce include dependencies in some headers
Used cppclean to help identify useless includes and removed them. This involved erroneously included headers, but also cases where forward declarations could have been used rather than a full include. |
11734:d5ffebd89eb2 |
02-Dec-2016 |
Brandon Potter <brandon.potter@amd.com> |
hsail: remove the panic guarding function directives
HSA functions calls are still not supported properly with HSAIL, but the recent AMP runtime modifications rely on being able to parse the BRIG/HSAIL files that are extracted from the application binaries. We need to parse the function call HSAIL definitions, but we do not actually need to make the function calls.
The reason that this happens is that HCC appends a set of routines to every HSAIL binary that it creates. These extra, unnecessary routines exist in the HCC source as a file; this file is cat'd onto everything that the compiler outputs before being assembled into the application's binary. HCC does this because it might call these helper functions. However, it doesn't actually appear to do so in the AMP codes so we just parse these functions with the HSAIL parser and then ignore them. |
11714:511b47e9b9bc |
21-Nov-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix segfault when constructing GPUExecContext
the GPUExecContext context currently stores a reference to its parent WF's GPUISA object, however there are some special instructions that do not have an associated WF. when these objects are constructed they set their WF pointer to null, which causes the GPUExecContext to segfault when trying to dereference the WF pointer to get at the WF's GPUISA object. here we change the GPUISA reference in the GPUExecContext class to a pointer so that it may be set to null. |
11713:499071baed7b |
21-Nov-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: init valid field of GpuTlbEntry in default ctor
valid field for GpuTlbEntry is not set in the default ctor, which can lead to strange behavior, and is also flagged by UBSAN. |
11704:c38fcdaa5fe5 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
hsail,gpu-compute: fixes to appease clang++
fixes to appease clang++. tested on:
Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
Ubuntu clang version 3.6.0-2ubuntu1~trusty1 (tags/RELEASE_360/final) (based on LLVM 3.6.0)
the fixes address the following five issues:
1) the exec continuations in gpu_static_inst.hh were marked as protected when they should be public. here we mark them as public
2) the Abs instruction uses std::abs() in its execute method. because Abs is templated, it can also operate on U32 and U64, types, which cause Abs::execute() to pass uint32_t and uint64_t types to std::abs() respectively. this triggers a warning because std::abs() has no effect in this case. to rememdy this we add template specialization for the execute() method of Abs when its template paramter is U32 or U64.
3) Some potocols that utilize the code in cprintf.hh were missing includes to BoolVec.hh, which defines operator<< for the BoolVec type. This would cause issues when the generated code would try to pass a BoolVec type to a method in cprintf.hh that used operator<< on an instance of a BoolVec.
4) Surprise, clang doesn't like it when you clobber all the bits in a newly allocated object. I.e., this code:
tlb = new GpuTlbEntry\[size\]; std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size);
Let's use std::vector to track the TLB entries in the GpuTlb now...
5) There were a few variables used only in DPRINTFs, so we mark them with M5_VAR_USED. |
11700:7d4d424c9f17 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: support in-order data delivery in GM pipe
this patch adds an ordered response buffer to the GM pipeline to ensure in-order data delivery. the buffer is implemented as a stl ordered map, which sorts the request in program order by using their sequence ID. when requests return to the GM pipeline they are marked as done. only the oldest request may be serviced from the ordered buffer, and only if is marked as done.
the FIFO response buffers are kept and used in OoO delivery mode |
11699:c7453f485a5f |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex()
for HSAIL an operand's indices into the register files may be calculated trivially, because the operands are always read from a register file, or are an immediate.
for machine ISA, however, an op selector may specify special registers, or may specify special SGPRs with an alias op selector value. the location of some of the special registers values are dependent on the size of the RF in some cases. here we add a way for the underlying getRegisterIndex() method to know about the size of the RFs, so that it may find the relative positions of the special register values. |
11698:d1ad31187fa5 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: use System cache line size in the GPU |
11697:c63431b7bbeb |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute, hsail: make the PC a byte address, not an instruction index
currently the PC is incremented on an instruction granularity, and not as an instruction's byte address. machine ISA instructions assume the PC is a byte address, and is incremented accordingly. here we make the GPU model, and the HSAIL instructions treat the PC as a byte address as well. |
11696:80c30bd0c7d6 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: add gpu_isa.hh to switch hdrs, add GPUISA to WF
the GPUISA class is meant to encapsulate any ISA-specific behavior - special register accesses, isa-specific WF/kernel state, etc. - in a generic enough way so that it may be used in ISA-agnostic code.
gpu-compute: use the GPUISA object to advance the PC
the GPU model treats the PC as a pointer to individual instruction objects - which are store in a contiguous array - and not a byte address to be fetched from the real memory system. this is ok for HSAIL because all instructions are considered by the model to be the same size.
in machine ISA, however, instructions may be 32b or 64b, and branches are calculated by advancing the PC by the number of words (4 byte chunks) it needs to advance in the real instruction stream. because of this there is a mismatch between the PC we use to index into the instruction array, and the actual byte address PC the ISA expects. here we move the PC advance calculation to the ISA so that differences in the instrucion sizes may be accounted for in generic way. |
11695:0a65922d564d |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: add instruction mix stats for the gpu |
11694:c3b4d57a15c5 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute, hsail: call discardFetch() from the WF
because every taken branch causes fetch to be discarded, we move the call to the WF to avoid to have to call it from each and every branch instruction type. |
11693:bc1f702c25b9 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
hsail, gpu-compute: remove doGm/SmReturn add completeAcc
we are removing doGmReturn from the GM pipe, and adding completeAcc() implementations for the HSAIL mem ops. the behavior in doGmReturn is dependent on HSAIL and HSAIL mem ops, however the completion phase of memory ops in machine ISA can be very different, even amongst individual machine ISA mem ops. so we remove this functionality from the pipeline and allow it to be implemented by the individual instructions. |
11692:e772fdcd3809 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: remove inst enums and use bit flag for attributes
this patch removes the GPUStaticInst enums that were defined in GPU.py. instead, a simple set of attribute flags that can be set in the base instruction class are used. this will help unify the attributes of HSAIL and machine ISA instructions within the model itself.
because the static instrution now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst so a new static kernel launch instruction is added, which carries the attributes needed to perform a the kernel launch. |
11691:6d5fc65d64bd |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: move disassemle() implementation to GPUStaticInst |
11690:3027d6c34fa4 |
26-Oct-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute, arch: add some methods to the base inst classes for ISA support |
11657:5fad5a37d6fc |
04-Oct-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Added method to compute the actual workgroup size This patch adds a method to the Wavefront class to compute the actual workgroup size. This can be different from the maximum workgroup size specified when launching the kernel through the NDRange object. Current solution is still not optimal, as we are computing these for each wavefront and the dispatcher also needs to have this information and can't actually call Wavefront::computeActuallWgSz before the wavefronts are being created. A long term solution would be to have a Workgroup class that deals with all these details. |
11646:fd783bff017c |
16-Sep-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix typo in GPUDispatcher |
11644:d426728892fe |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Adding context serialization methods to Wavefront This patch adds methods to serialize the context of a particular wavefront to the simulated system memory. Context serialization is used when a wavefront is preempeted (i.e. context switch). |
11643:42a1873be45c |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Refactoring Wavefront::dynWaveId |
11642:46cffde5d8a6 |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Adding vector register file debug messages This patch introduces DPRINTFs for reading and writing to and from the vector register file. |
11641:a9f0711e7230 |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Changing reconvergenceStack type std::stack has no iterators, therefore the reconvergence stack can't be iterated without poping elements off. We will be using std::list instead to be able to iterate for saving and restoring purposes. |
11640:aa846ec8cd8d |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Adding ioctl for HW context size Adding runtime support for determining the memory required by a SIMD engine when executing a particular wavefront. |
11639:2e8d4bd8108d |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Wavefront refactoring Renaming members of the Wavefront class in accordance with the style guide. |
11638:b511733958d0 |
16-Sep-2016 |
Alexandru Dutu <alexandru.dutu@amd.com> |
gpu-compute: Remove WFContext WFContext struct is currently unused and it has been rendered not useful in saving and restoring the context of a Wavefront. Wavefront class should be sufficient for that purpose and the runtime can figure out the memory size it will need to allocate for a Wavefront through an IOCTL. |
11623:b56cbe6b63a2 |
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
gpu-compute: Fix bug with return in cfg Connecting basic blocks would stop too early in kernels where ret was not the last instruction. This patch allows basic blocks after the ret instruction to be properly connected. |
11534:7106f550afad |
09-Jun-2016 |
jkalamat <john.kalamatianos@amd.com> |
gpu-compute: parametrize Wavefront size
Eliminate the VSZ constant that defined the Wavefront size (in numbers of work items); replaced it with a parameter in the GPU.py configuration script. Changed all data structures dependent on the Wavefront size to be dynamically sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at initialization time. |
11523:81332eb10367 |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
stats: Fixing regStats function for some SimObjects
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11520:e18a6c55bec0 |
03-Jun-2016 |
Tuan Ta <taquangtuan1992@gmail.com> |
gpu-compute: Fixed a bug in global memory pipeline
Added a condition when inflightStores is incremented to prevent a deadlock caused by many memory fence requests generated by a CU |
11477:54cf9a388a9d |
16-May-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix bug in GPUDynInst::isScalarRegister() |
11473:517ba946a860 |
06-May-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix spacing in GPUDynInst ctor |
11472:8263ac8f99d9 |
06-May-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: fix uninitialized member bug in GPUDynInst
the n_reg field in the GPUDynInst is not currently set in the constructor. if it is not set externally, there are assertion failures that may occur if the random value it gets is just right. here we set it to 0 by default. |
11435:0f1b46dde3fa |
07-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
mem: Remove threadId from memory request class
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit. |
11394:2bc2120ad76b |
21-Mar-2016 |
jkalamat <john.kalamatianos@amd.com> |
gpu-compute: remove unused variable from scoreboard check stage
appease clang by removing the unused private member variable, 'numGlbMemPipes', from the scoreboard check stage |
11386:94c09b607a84 |
17-Mar-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
syscall_emul: move mmapGrowsDown() to LiveProcess
The mmapGrowsDown() method was a static method on the OperatingSystem class (and derived classes), which worked OK for the templated syscall emulation methods, but made it hard to access elsewhere. This patch moves the method to be a virtual function on the LiveProcess method, where it can be overridden for specific platforms (for now, Alpha).
This patch also changes the value of mmapGrowsDown() from being false by default and true only on X86Linux32 to being true by default and false only on Alpha, which seems closer to reality (though in reality most people use ASLR and this doesn't really matter anymore).
In the process, also got rid of the unused mmap_start field on LiveProcess and OperatingSystem mmapGrowsUp variable. |
11364:1bd9f1b27438 |
04-Mar-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
base: Fix gpu-compute output stream creation
Match changes in output stream. |
11345:b6a66a90e0a1 |
18-Feb-2016 |
John Kalamatianos <john.kalamatianos@amd.com> |
gpu: fix bugs with MemFence, Flat Instrs and Resource utilization
Both Memory Fence is now flagged as Global Memory only to avoid resource oversubscribing. Flat instructions now check for Shared Memory resource busy to avoid oversubscribing resources. All WaitClass resources now use cycles (not ticks) to register the number of pipe stages between Scoreboard and Execute to be consistent with instruction scheduling logic which always used clock cycles. |
11344:d9205e67b172 |
17-Feb-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: remove brig_object.hh from hsa_object.cc
brig_object.hh is specific to the HSAIL ISA, and hence should not be included in ISA-agnostic code. |
11308:7d8836fd043d |
19-Jan-2016 |
Tony Gutierrez <anthony.gutierrez@amd.com> |
gpu-compute: AMD's baseline GPU model |