#
13937:a47ac7052832 |
|
30-Apr-2019 |
Gabor Dozsa <gabor.dozsa@arm.com> |
x86: Mark translation as delayed in case of a hw page table walk
This information is used by the LSQ in the O3 cpu (since commit "51becd2... cpu-o3: O3 LSQ Generalisation")
Change-Id: I35fe7e2f8428641d863af0e79e28b0b259fb0b00 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18508 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com>
|
#
13784:1941dc118243 |
|
07-Mar-2019 |
Gabe Black <gabeblack@google.com> |
arch, cpu, dev, gpu, mem, sim, python: start using getPort.
Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary.
Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13741:d994984b842a |
|
22-Feb-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
mem-cache: alias to mem::getMasterPort in TLB class
TLB:getMasterPort is used to obtain the PageWalkMasterPort if present and hides the BaseTLB::getMasterPort().
The TLB::getMasterPort() is renamed according to the expected behavior.
Change-Id: If4f61189094a706d59805cd10f4f814e5830eda8 Reviewed-on: https://gem5-review.googlesource.com/c/16648 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
13695:cce2b2b4466b |
|
19-Feb-2019 |
Bagus Hanindhito <hanindhito@bagus.my.id> |
x86: Call the base class's regStats in X86ISA::TLB
When I try to build x86 architecture and run the se.py sample script with helloworld example, there is a panic warning stated "Not all stats have been initialized. You may need to add <ParentClass>::regStats() to a new SimObject's regStats() function."
I see that in x86 tlb.cc, there is no initialization in regStats() function that causes memory allocation error in some machine which make gem5 exit abnormally. I add the BaseTLB::regStats(); on TLB::regStats() method and can solve the problem
Change-Id: I8b62bebc15f896c3136ff4f8253dabbf998f618f Reviewed-on: https://gem5-review.googlesource.com/c/16522 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
13613:a19963be12ca |
|
20-Nov-2018 |
Gabe Black <gabeblack@google.com> |
x86: Stop using/defining some ISA specific register types.
These have been replaced with the generic RegVal type.
Change-Id: I75c1134212067dea43aa0903d813633e06f3d6c6 Reviewed-on: https://gem5-review.googlesource.com/c/14476 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
12749:223c83ed9979 |
|
04-Jun-2018 |
Giacomo Travaglini <giacomo.travaglini@arm.com> |
misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
#
12461:a4cb506cda74 |
|
09-Jan-2018 |
Gabe Black <gabeblack@google.com> |
tarch, mem: Abstract the data stored in the SE page tables.
Rather than store the actual TLB entry that corresponds to a mapping, we can just store some abstracted information (address, a few flags) and then let the caller turn that into the appropriate entry. There could potentially be some small amount of overhead from creating entries vs. storing them and just installing them, but it's likely pretty minimal since that only happens on a TLB miss (ideally rare), and, if it is problematic, there could be some preallocated TLB entries which are just minimally filled in as necessary.
This has the nice effect of finally making the page tables ISA agnostic.
Change-Id: I11e630f60682f0a0029b0683eb8ff0135fbd4317 Reviewed-on: https://gem5-review.googlesource.com/7350 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
12455:c88f0b37f433 |
|
05-Jan-2018 |
Gabe Black <gabeblack@google.com> |
arch, mem: Make the page table lookup function return a pointer.
This avoids having a copy in the lookup function itself, and the declaration of a lot of temporary TLB entry pointers in callers. The gpu TLB seems to have had the most dependence on the original signature of the lookup function, partially because it was relying on a somewhat unsafe copy to a TLB entry using a base class pointer type.
Change-Id: I8b1cf494468163deee000002d243541657faf57f Reviewed-on: https://gem5-review.googlesource.com/7343 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
|
#
12406:86bde4a026b5 |
|
22-Dec-2017 |
Gabe Black <gabeblack@google.com> |
arch,cpu: "virtualize" the TLB interface.
CPUs have historically instantiated the architecture specific version of the TLBs to avoid a virtual function call, making them a little bit more dependent on what the current ISA is. Some simple performance measurement, the x86 twolf regression on the atomic CPU, shows that there isn't actually any performance benefit, and if anything the simulator goes slightly faster (although still within margin of error) when the TLB functions are virtual.
This change switches everything outside of the architectures themselves to use the generic BaseTLB type, and then inside the ISA for them to cast that to their architecture specific type to call into architecture specific interfaces.
The ARM TLB needed the most adjustment since it was using non-standard translation function signatures. Specifically, they all took an extra "type" parameter which defaulted to normal, and translateTiming returned a Fault. translateTiming actually doesn't need to return a Fault because everywhere that consumed it just stored it into a structure which it then deleted(?), and the fault is stored in the Translation object when the translation is done.
A little more work is needed to fully obviate the arch/tlb.hh header, so the TheISA::TLB type is still visible outside of the ISAs. Specifically, the TlbEntry type is used in the generic PageTable which lives in src/mem.
Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575 Reviewed-on: https://gem5-review.googlesource.com/6921 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
#
12140:fab402159cdf |
|
13-Jun-2017 |
Swapnil Haria <swapnilster@gmail.com> |
x86: Add stats to X86 TLB
Change-Id: Iebf7d245de66eebc8d4c59e62e52adf6cf51e1e4 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3980 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
#
11874:663bac0bb1c9 |
|
23-Feb-2017 |
Brandon Potter <brandon.potter@amd.com> |
x86: remove redundant condition check in tlb code
|
#
11800:54436a1784dc |
|
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 3/22] reduce include dependencies in some headers
Used cppclean to help identify useless includes and removed them. This involved erroneously included headers, but also cases where forward declarations could have been used rather than a full include.
|
#
11793:ef606668d247 |
|
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes
|
#
11628:85011e8eaad9 |
|
13-Sep-2016 |
Michael LeBeane <michael.lebeane@amd.com> |
x86: Force strict ordering for memory mapped m5ops Normal MMAPPED_IPR requests are allowed to execute speculatively under the assumption that they have no side effects. The special case of m5ops that are treated like MMAPPED_IPR should not be allowed to execute speculatively, since they can have side-effects. Adding the STRICT_ORDER flag to these requests blocks execution until the associated instruction hits the ROB head.
|
#
11608:6319a1125f1c |
|
14-Aug-2016 |
Nikos Nikoleris <nikos.nikoleris@arm.com> |
cpu, arch: fix the type used for the request flags
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
#
10905:a6ca6831e775 |
|
07-Jul-2015 |
Andreas Sandberg <andreas.sandberg@arm.com> |
sim: Refactor the serialization base class
Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section.
* Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections).
* The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects.
* Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code.
|
#
10824:308771bd2647 |
|
05-May-2015 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
mem, cpu: Add a separate flag for strictly ordered memory
The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:
* Speculation: CPU models are not allowed to issue speculative loads.
* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.
Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
|
#
10553:c1ad57c53a36 |
|
23-Nov-2014 |
Alexandru Dutu <alexandru.dutu@amd.com> |
kvm, x86: Adding support for SE mode execution This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.
|
#
10474:799c8ee4ecba |
|
16-Oct-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use shared_ptr for all Faults
This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".
|
#
9911:676d3dcf1cc2 |
|
15-Oct-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
mem: Use a flag instead of address bit 63 for generic IPRs
Using address bit 63 to identify generic IPRs caused problems on SPARC, where IPRs are heavily used. This changeset redefines how generic IPRs are identified. Instead of using bit 63, we now use a separate flag (GENERIC_IPR) a memory request.
|
#
9898:2935441b0870 |
|
29-Sep-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
x86: Add support for m5ops through a memory mapped interface
In order to support m5ops in virtualized environments, we need to use a memory mapped interface. This changeset adds support for that by reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR interface for m5ops. The mapping is done in the X86ISA::TLB::finalizePhysical() which means that it just works for all of the CPU models, including virtualized ones.
|
#
9818:ebd7d3e04b5f |
|
07-Aug-2013 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: add tlb checkpointing This patch adds checkpointing support to x86 tlb. It upgrades the cpt_upgrader.py script so that previously created checkpoints can be updated. It moves the checkpoint version to 6.
|
#
9738:304a37519d11 |
|
03-Jun-2013 |
Andreas Sandberg <andreas@sandberg.pp.se> |
arch: Create a method to finalize physical addresses in the TLB
Some architectures (currently only x86) require some fixing-up of physical addresses after a normal address translation. This is usually to remap devices such as the APIC, but could be used for other memory mapped devices as well. When running the CPU in a using hardware virtualization, we still need to do these address fix-ups before inserting the request into the memory system. This patch moves this patch allows that code to be used by such CPUs without doing full address translations.
|
#
9423:43caa4ca5979 |
|
07-Jan-2013 |
Andreas Sandberg <Andreas.Sandberg@arm.com> |
arch: Add support for invalidating TLBs when draining
This patch adds support for the memInvalidate() drain method. TLB flushing is requested by calling the virtual flushAll() method on the TLB.
Note: This patch renames invalidateAll() to flushAll() on x86 and SPARC to make the interface consistent across all supported architectures.
|
#
9294:8fb03b13de02 |
|
15-Oct-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
Port: Add protocol-agnostic ports in the port hierarchy
This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocl-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default.
|
#
9064:d43eb1203aec |
|
07-Jun-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
X86 TLB: Add a missing = sign
|
#
9062:21f92aa46e8f |
|
07-Jun-2012 |
Jayneel Gandhi <jayneel@cs.wisc.edu> |
X86 TLB: Fix for gcc 4.4.3 Due to recent changes to X86 TLB, gem5 stopped compiling on gcc version 4.4.3. This patch provides the fix for that problem. The patch is tested on gcc 4.4.3. The change is not required for more recent versions of gcc (like on 4.6.3).
|
#
9028:f92783bcfd25 |
|
29-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Use the HandyM5Reg to avoid a register read and some logic in the TLB.
|
#
9025:545591665fc7 |
|
27-May-2012 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Truncate addresses to 32 bits except in 64 bit mode, not long mode.
A small change was added a while ago to keep addresses from overflowing 32 bits when larger addresses shouldn't be accessible to software. That change truncated when not in long mode, but really it should have truncated when not in 64 bit mode. The difference is whether compatibility mode is included, a mode that's supposed to act like a legacy 32 bit mode.
|
#
8962:397cbf4b11a6 |
|
24-Apr-2012 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Clear out duplicate TLB entries when adding a new one.
It's possible for two page table walks to overlap which will go in the same place in the TLB's trie. They would land on top of each other, so this change adds some code which detects if an address already matches an entry and if so throws away the new one.
|
#
8953:488d45aeb672 |
|
15-Apr-2012 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Use the AddrTrie class to implement the TLB.
This change also adjusts the TlbEntry class so that it stores the number of address bits wide a page is rather than its size in bytes. In other words, instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K, but it's a little harder going the other way.
|
#
8925:97f06a79b6f5 |
|
31-Mar-2012 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix address size handling so real mode works properly.
Virtual (pre-segmentation) addresses are truncated based on address size, and any non-64 bit linear address is truncated to 32 bits. This means that real mode addresses aren't truncated down to 16 bits after their segment bases are added in.
|
#
8922:17f037ad8918 |
|
30-Mar-2012 |
William Wang <william.wang@arm.com> |
MEM: Introduce the master/slave port sub-classes in C++
This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects.
The patch enables us to classify behaviours into the two bins and add assumptions and enfore compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specilisations are to come in later patches.
The getPort function is not getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal.
The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.
|
#
8902:75b524b64c28 |
|
19-Mar-2012 |
Andreas Hansson <andreas.hansson@arm.com> |
gcc: Clean-up of non-C++0x compliant code, first steps
This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeded by a comma, namespaces closed by a curcly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts).
|
#
8888:befcf4d79fc1 |
|
09-Mar-2012 |
Geoffrey Blake <geoffrey.blake@arm.com> |
CheckerCPU: Add function stubs to non-ARM ISA source to compile with CheckerCPU
Making the CheckerCPU a runtime time option requires the code to be compatible with ISAs other than ARM. This patch adds the appropriate function stubs to allow compilation.
|
#
8864:fe907afe14a3 |
|
01-Mar-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
x86: Fix x86 TLB and Walker This patch adds a function to X86 tlb that returns the walker port. This port is required for correctly connecting the walker ports for the cpu just switched in
|
#
8797:3202eb01e01e |
|
07-Jan-2012 |
Gabe Black <gblack@eecs.umich.edu> |
Another merge with the main repository.
|
#
8768:314eb1e2fa94 |
|
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get rid of more uses of FULL_SYSTEM.
|
#
8767:e575781f71b8 |
|
30-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.
|
#
8752:28e899b7dee3 |
|
13-Oct-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Turn on the page table walker in SE mode.
|
#
8646:ef6cbf0f14dc |
|
05-Jan-2012 |
Nilay Vaish <nilay@cs.wisc.edu> |
X86 TLB: Move a DPRINTF to its correct place The DPRINTF for doing protection checks appears after the checks have been carried out. It is possible that the function returns while the checks are being carried, in which case the printf is missed out. This patch moves the DPRINTF before the checks.
|
#
8582:dd79a696b91c |
|
23-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Move the MSR lookup table out of the TLB and into its own file.
Translating MSR addresses into MSR register indices took a lot of space in the TLB source and made looking around in that file awkward. This change moves the lookup into its own file to get it out of the way. It also changes it from a switch statement to a hash map which should hopefully be a little more efficient.
|
#
8539:7d3ea3c65c66 |
|
09-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Stack: Tidy up some comments, a warning, and make stack extension consistent.
Do some minor cleanup of some recently added comments, a warning, and change other instances of stack extension to be like what's now being done for x86.
|
#
8535:d04ae08781e2 |
|
05-Sep-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86,TLB: Make sure the "delayedResponse" variable is always set.
When an instruction is translated in the x86 TLB, a variable called delayedResponse is passed back and forth which tracks whether a translation could be completed immediately, or if there's going to be callback that will finish things up. If a read was to the internal memory space, memory mapped registers used to implement things like MSRs, the function hadn't yet gotten to where delayedResponse was set to false, it's default. That meant that the value was never set, and the TLB could start waiting for a callback that would never come. This change simply moves the assignment to above where control can divert to translateInt().
|
#
8534:09745e0c3dd9 |
|
02-Sep-2011 |
Lisa Hsu <Lisa.Hsu@amd.com> |
TLB: comments and a helpful warning.
Nothing big here, but when you have an address that is not in the page table request to be allocated, if it falls outside of the maximum stack range all you get is a page fault and you don't know why. Add a little warn() to explain it a bit. Also add some comments and alter logic a little so that you don't totally ignore the return value of checkAndAllocNextPage().
|
#
8232:b28d06a175be |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
|
#
8229:78bf55f23338 |
|
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
includes: sort all includes
|
#
8105:906864dd0937 |
|
02-Mar-2011 |
Gabe Black <gblack@eecs.umich.edu> |
Spelling: Fix the a spelling error by changing mmaped to mmapped.
There may not be a formally correct spelling for the past tense of mmap, but mmapped is the spelling Google doesn't try to autocorrect. This makes sense because it mirrors the past tense of map->mapped and not the past tense of cape->caped.
|
#
8098:59a19310ca65 |
|
27-Feb-2011 |
Gabe Black <gblack@eecs.umich.edu> |
X86: If PCI config space is disabled, pass through to regular IO addresses.
|
#
7933:e00ef55a2c49 |
|
07-Feb-2011 |
Tim Harris <tharris@microsoft.com> |
X86: Obey the wp bit of CR0.
If cr0.wp ("write protect" bit) is clear then do not generate page faults when writing to write-protected pages in kernel mode.
|
#
7912:a9f05ab40763 |
|
07-Feb-2011 |
Joel Hestness <hestness@cs.utexas.edu> |
x86: Timing support for pagetable walker
Move page table walker state to its own object type, and make the walker instantiate state for each outstanding walk. By storing the states in a queue, the walker is able to handle multiple outstanding timing requests. Note that functional walks use separate state elements.
|
#
7811:a8fc35183c10 |
|
03-Jan-2011 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Make commenting on close namespace brackets consistent.
Ran all the source files through 'perl -pi' with this script:
s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|; s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|; s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;
Also did a little manual editing on some of the arch/*/isa_traits.hh files and src/SConscript.
|
#
7775:8e8fa2f28f2e |
|
23-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Obey the PCD (cache disable) bit in the page tables.
|
#
7774:6246338ac1e9 |
|
22-Nov-2010 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Mark IO space accesses as uncachable.
|
#
7720:65d338a8dba4 |
|
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
|
#
7629:0f0c231e3e97 |
|
23-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Create a directory for files that define register indexes.
This is to help tidy up arch/x86. These files should not be used external to the ISA.
|
#
7625:b1e69203bae9 |
|
23-Aug-2010 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the TLB fault instead of panic when something is unmapped in SE mode.
The fault object, if invoked, would then panic. This is a bit less direct, but it means speculative execution won't panic the simulator.
|
#
7087:fb8d5786ff30 |
|
24-May-2010 |
Nathan Binkert <nate@binkert.org> |
copyright: Change HP copyright on x86 code to be more friendly
|
#
6738:44010fc924d4 |
|
09-Nov-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Don't panic on faults on prefetches in SE mode.
|
#
6737:b3ab661715ac |
|
09-Nov-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Explain what really didn't work with unmapped addresses in SE mode.
|
#
6428:9e35cdc95e81 |
|
02-Aug-2009 |
Steve Reinhardt <steve.reinhardt@amd.com> |
Clean up some inconsistencies with Request flags.
|
#
6315:c7295a4826d5 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Eliminate the ISA defined floating point register file.
|
#
6313:95f69a436c82 |
|
09-Jul-2009 |
Gabe Black <gblack@eecs.umich.edu> |
Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU.
|
#
6141:5babc3f3d8c8 |
|
26-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Split out the internal memory space from the regular translate() and precompute mode.
|
#
6132:916f10213bea |
|
23-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put the StoreCheck flag with the others, and don't collide with other flags.
|
#
6099:74e5e063a03d |
|
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix how the TLB handles the storecheck flag.
|
#
6059:d78df8ebc225 |
|
19-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Some segment selectors can be used when "NULL".
|
#
6023:47b4fcb10c11 |
|
09-Apr-2009 |
Nathan Binkert <nate@binkert.org> |
tlb: More fixing of unified TLB
|
#
6022:410194bb3049 |
|
09-Apr-2009 |
Gabe Black <gblack@eecs.umich.edu> |
tlb: Don't separate the TLB classes into an instruction TLB and a data TLB
|
#
5980:0ea37baabfb0 |
|
27-Feb-2009 |
Nathan Binkert <nate@binkert.org> |
quell gcc 4.3 warning
|
#
5965:71f8d7c12619 |
|
27-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix segment limit checks.
|
#
5917:7d7df4ad7486 |
|
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Actually check page protections.
|
#
5912:d113f6def227 |
|
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Add a flag to force memory accesses to happen at CPL 0.
|
#
5895:569e3b31a868 |
|
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the X86 TLB take advantage of delayed translations, and get rid of the fake TLB miss faults.
|
#
5894:8091ac99341a |
|
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
CPU: Implement translateTiming which defers to translateAtomic, and convert the timing simple CPU to use it.
|
#
5891:73084c6bb183 |
|
25-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
ISA: Replace the translate functions in the TLBs with translateAtomic.
|
#
5881:73c0aaaaf186 |
|
23-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Pass whether an access was a read/write/fetch so faults can behave accordingly.
|
#
5837:831413564d0c |
|
01-Feb-2009 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Compute PCI config addresses correctly.
|
#
5736:426510e758ad |
|
10-Nov-2008 |
Nathan Binkert <nate@binkert.org> |
mem: update stuff for changes to Packet and Request
|
#
5714:76abee886def |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
Add in Context IDs to the simulator. From now on, cpuId is almost never used, the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId() because sometimes you need to know which context within a cpu to manipulate.
|
#
5712:199d31b47f7b |
|
02-Nov-2008 |
Lisa Hsu <hsul@eecs.umich.edu> |
make BaseCPU the provider of _cpuId, and cpuId() instead of being scattered across the subclasses. generally make it so that member data is _cpuId and accessor functions are cpuId(). The ID val comes from the python (default -1 if none provided), and if it is -1, the index of cpuList will be given. this has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch.
|
#
5648:e8abda6e0980 |
|
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the local APIC accessible through the memory system directly, and make the timer work.
|
#
5647:b06b49498c79 |
|
12-Oct-2008 |
Gabe Black <gblack@eecs.umich.edu> |
Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object.
|
#
5440:51d24253bcd9 |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Rename the divide count register to divide configuration.
|
#
5433:1b0b8e9ba6a9 |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Change how segment loading is performed.
|
#
5431:914851b44a74 |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: In non 64bit mode, throw a fault when a NULL segment is accessed.
|
#
5419:a06807c228c1 |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Have all 8 machine check registers since the kernel assumes they're there.
|
#
5418:501cb81c89df |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Bypass unaligned access support for register addressed MSRs.
|
#
5417:84755f1f32d3 |
|
12-Jun-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Remove enforcement of APIC register access alignment. Panic if more than one register is accessed at a time.
|
#
5374:4773d53f88a0 |
|
01-Mar-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Don't map the local APIC into the physical address space in SE mode.
|
#
5360:02a3af203516 |
|
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Put in initial implementation of the local APIC.
|
#
5359:8c6ff200e4c1 |
|
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement the INVLPG instruction and the TIA microop.
|
#
5358:e9acb84bbafb |
|
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
TLB: Make a TLB base class and put a virtual demapPage function in it.
|
#
5357:eecb5fd0be62 |
|
26-Feb-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get PCI config space to work, and adjust address space prefix numbering scheme.
|
#
5323:75f7e6366a41 |
|
12-Jan-2008 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Make the IO ports work using extra physical address lines. Add a serial port.
|
#
5294:7222bdaed33b |
|
02-Dec-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Reorganize segmentation and implement segment selector movs.
|
#
5245:d94bb8af9f76 |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Separate out the page table walker into it's own cc and hh.
|
#
5243:4228b7b5704b |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Fix a stupid typo where WRMSR and RDMSR were switched, and add a debug statement.
|
#
5242:280a99136427 |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement tlb invalidation and make it happen some of the times it should.
|
#
5237:6c819dbe8045 |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Work on the page table walker, TLB, and related faults.
|
#
5236:0050ad4fb3ef |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement a page table walker.
|
#
5232:d3801ea2792e |
|
12-Nov-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Various fixes to indexing segmentation related registers
|
#
5184:8782de2949e5 |
|
25-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
TLB: Fix serialization issues with the tlb entries and make the page table store the process, not the system.
|
#
5149:356e00996637 |
|
12-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Implement MSR reads and writes and the wrsmr and rdmsr instructions. There are no priviledge checks, so these instructions will all work in all modes.
|
#
5140:2fd7f8477b4c |
|
07-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Work on the x86 tlb.
|
#
5126:d3cdea5e0fb3 |
|
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Merge with head.
|
#
5124:3d8c50376609 |
|
03-Oct-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Start implementing the x86 tlb which will handle segmentation permission and limit checks and paging.
|
#
5100:7a0180040755 |
|
28-Sep-2007 |
Ali Saidi <saidi@eecs.umich.edu> |
Rename cycles() function to ticks()
|
#
5086:e7913ffb379d |
|
24-Sep-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get X86_FS to compile.
|
#
5038:c996bb7f1a6d |
|
31-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
X86: Get x86 to compile again after the simobject constructor change.
|
#
5004:7d94cedab264 |
|
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Address translation: Make the page table more flexible. The page table now stores actual page table entries. It is still a templated class here, but this will be corrected in the near future.
|
#
4997:e7380529bd2d |
|
26-Aug-2007 |
Gabe Black <gblack@eecs.umich.edu> |
Address Translation: Make SE mode use an actual TLB/MMU for translation like FS.
|