Cross Reference: /gem5/src/arch/x86/tlb.cc

History log of /gem5/src/arch/x86/tlb.cc
Revision	Date	Author	Comments
# 13937:a47ac7052832	30-Apr-2019	Gabor Dozsa <gabor.dozsa@arm.com>	x86: Mark translation as delayed in case of a hw page table walk This information is used by the LSQ in the O3 cpu (since commit "51becd2... cpu-o3: O3 LSQ Generalisation") Change-Id: I35fe7e2f8428641d863af0e79e28b0b259fb0b00 Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18508 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com>
# 13784:1941dc118243	07-Mar-2019	Gabe Black <gabeblack@google.com>	arch, cpu, dev, gpu, mem, sim, python: start using getPort. Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary. Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 13741:d994984b842a	22-Feb-2019	Andrea Mondelli <Andrea.Mondelli@ucf.edu>	mem-cache: alias to mem::getMasterPort in TLB class TLB::getMasterPort is used to obtain the PageWalkMasterPort if present and hides the BaseTLB::getMasterPort(). The TLB::getMasterPort() is renamed according to the expected behavior. Change-Id: If4f61189094a706d59805cd10f4f814e5830eda8 Reviewed-on: https://gem5-review.googlesource.com/c/16648 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 13695:cce2b2b4466b	19-Feb-2019	Bagus Hanindhito <hanindhito@bagus.my.id>	x86: Call the base class's regStats in X86ISA::TLB When I try to build the x86 architecture and run the se.py sample script with the helloworld example, there is a panic warning stating "Not all stats have been initialized. You may need to add <ParentClass>::regStats() to a new SimObject's regStats() function." I see that in x86 tlb.cc, there is no such initialization in the regStats() function, which causes a memory allocation error on some machines and makes gem5 exit abnormally. Adding BaseTLB::regStats(); to the TLB::regStats() method solves the problem. Change-Id: I8b62bebc15f896c3136ff4f8253dabbf998f618f Reviewed-on: https://gem5-review.googlesource.com/c/16522 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
# 13613:a19963be12ca	20-Nov-2018	Gabe Black <gabeblack@google.com>	x86: Stop using/defining some ISA specific register types. These have been replaced with the generic RegVal type. Change-Id: I75c1134212067dea43aa0903d813633e06f3d6c6 Reviewed-on: https://gem5-review.googlesource.com/c/14476 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
# 12749:223c83ed9979	04-Jun-2018	Giacomo Travaglini <giacomo.travaglini@arm.com>	misc: Using smart pointers for memory Requests This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers. Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
# 12461:a4cb506cda74	09-Jan-2018	Gabe Black <gabeblack@google.com>	arch, mem: Abstract the data stored in the SE page tables. Rather than store the actual TLB entry that corresponds to a mapping, we can just store some abstracted information (address, a few flags) and then let the caller turn that into the appropriate entry. There could potentially be some small amount of overhead from creating entries vs. storing them and just installing them, but it's likely pretty minimal since that only happens on a TLB miss (ideally rare), and, if it is problematic, there could be some preallocated TLB entries which are just minimally filled in as necessary. This has the nice effect of finally making the page tables ISA agnostic. Change-Id: I11e630f60682f0a0029b0683eb8ff0135fbd4317 Reviewed-on: https://gem5-review.googlesource.com/7350 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
# 12455:c88f0b37f433	05-Jan-2018	Gabe Black <gabeblack@google.com>	arch, mem: Make the page table lookup function return a pointer. This avoids having a copy in the lookup function itself, and the declaration of a lot of temporary TLB entry pointers in callers. The gpu TLB seems to have had the most dependence on the original signature of the lookup function, partially because it was relying on a somewhat unsafe copy to a TLB entry using a base class pointer type. Change-Id: I8b1cf494468163deee000002d243541657faf57f Reviewed-on: https://gem5-review.googlesource.com/7343 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
# 12406:86bde4a026b5	22-Dec-2017	Gabe Black <gabeblack@google.com>	arch,cpu: "virtualize" the TLB interface. CPUs have historically instantiated the architecture specific version of the TLBs to avoid a virtual function call, making them a little bit more dependent on what the current ISA is. Some simple performance measurement, the x86 twolf regression on the atomic CPU, shows that there isn't actually any performance benefit, and if anything the simulator goes slightly faster (although still within margin of error) when the TLB functions are virtual. This change switches everything outside of the architectures themselves to use the generic BaseTLB type, and then inside the ISA for them to cast that to their architecture specific type to call into architecture specific interfaces. The ARM TLB needed the most adjustment since it was using non-standard translation function signatures. Specifically, they all took an extra "type" parameter which defaulted to normal, and translateTiming returned a Fault. translateTiming actually doesn't need to return a Fault because everywhere that consumed it just stored it into a structure which it then deleted(?), and the fault is stored in the Translation object when the translation is done. A little more work is needed to fully obviate the arch/tlb.hh header, so the TheISA::TLB type is still visible outside of the ISAs. Specifically, the TlbEntry type is used in the generic PageTable which lives in src/mem. Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575 Reviewed-on: https://gem5-review.googlesource.com/6921 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
# 12140:fab402159cdf	13-Jun-2017	Swapnil Haria <swapnilster@gmail.com>	x86: Add stats to X86 TLB Change-Id: Iebf7d245de66eebc8d4c59e62e52adf6cf51e1e4 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3980 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
# 11874:663bac0bb1c9	23-Feb-2017	Brandon Potter <brandon.potter@amd.com>	x86: remove redundant condition check in tlb code
# 11800:54436a1784dc	09-Nov-2016	Brandon Potter <brandon.potter@amd.com>	style: [patch 3/22] reduce include dependencies in some headers Used cppclean to help identify useless includes and removed them. This involved erroneously included headers, but also cases where forward declarations could have been used rather than a full include.
# 11793:ef606668d247	09-Nov-2016	Brandon Potter <brandon.potter@amd.com>	style: [patch 1/22] use /r/3648/ to reorganize includes
# 11628:85011e8eaad9	13-Sep-2016	Michael LeBeane <michael.lebeane@amd.com>	x86: Force strict ordering for memory mapped m5ops Normal MMAPPED_IPR requests are allowed to execute speculatively under the assumption that they have no side effects. The special case of m5ops that are treated like MMAPPED_IPR should not be allowed to execute speculatively, since they can have side-effects. Adding the STRICT_ORDER flag to these requests blocks execution until the associated instruction hits the ROB head.
# 11608:6319a1125f1c	14-Aug-2016	Nikos Nikoleris <nikos.nikoleris@arm.com>	cpu, arch: fix the type used for the request flags Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
# 10905:a6ca6831e775	07-Jul-2015	Andreas Sandberg <andreas.sandberg@arm.com>	sim: Refactor the serialization base class Objects that can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). 
Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code.
# 10824:308771bd2647	05-May-2015	Andreas Sandberg <Andreas.Sandberg@ARM.com>	mem, cpu: Add a separate flag for strictly ordered memory The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
# 10553:c1ad57c53a36	23-Nov-2014	Alexandru Dutu <alexandru.dutu@amd.com>	kvm, x86: Adding support for SE mode execution This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.
# 10474:799c8ee4ecba	16-Oct-2014	Andreas Hansson <andreas.hansson@arm.com>	arch: Use shared_ptr for all Faults This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".
# 9911:676d3dcf1cc2	15-Oct-2013	Andreas Sandberg <andreas@sandberg.pp.se>	mem: Use a flag instead of address bit 63 for generic IPRs Using address bit 63 to identify generic IPRs caused problems on SPARC, where IPRs are heavily used. This changeset redefines how generic IPRs are identified. Instead of using bit 63, we now use a separate flag (GENERIC_IPR) on a memory request.
# 9898:2935441b0870	29-Sep-2013	Andreas Sandberg <andreas@sandberg.pp.se>	x86: Add support for m5ops through a memory mapped interface In order to support m5ops in virtualized environments, we need to use a memory mapped interface. This changeset adds support for that by reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR interface for m5ops. The mapping is done in the X86ISA::TLB::finalizePhysical() which means that it just works for all of the CPU models, including virtualized ones.
# 9818:ebd7d3e04b5f	07-Aug-2013	Nilay Vaish <nilay@cs.wisc.edu>	x86: add tlb checkpointing This patch adds checkpointing support to x86 tlb. It upgrades the cpt_upgrader.py script so that previously created checkpoints can be updated. It moves the checkpoint version to 6.
# 9738:304a37519d11	03-Jun-2013	Andreas Sandberg <andreas@sandberg.pp.se>	arch: Create a method to finalize physical addresses in the TLB Some architectures (currently only x86) require some fixing-up of physical addresses after a normal address translation. This is usually to remap devices such as the APIC, but could be used for other memory mapped devices as well. When running the CPU using hardware virtualization, we still need to do these address fix-ups before inserting the request into the memory system. This patch moves that code into a separate method so that such CPUs can use it without doing full address translations.
# 9423:43caa4ca5979	07-Jan-2013	Andreas Sandberg <Andreas.Sandberg@arm.com>	arch: Add support for invalidating TLBs when draining This patch adds support for the memInvalidate() drain method. TLB flushing is requested by calling the virtual flushAll() method on the TLB. Note: This patch renames invalidateAll() to flushAll() on x86 and SPARC to make the interface consistent across all supported architectures.
# 9294:8fb03b13de02	15-Oct-2012	Andreas Hansson <andreas.hansson@arm.com>	Port: Add protocol-agnostic ports in the port hierarchy This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocol-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations. The functions used in the binding of ports, i.e. getMaster/SlavePort now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default.
# 9064:d43eb1203aec	07-Jun-2012	Nilay Vaish <nilay@cs.wisc.edu>	X86 TLB: Add a missing = sign
# 9062:21f92aa46e8f	07-Jun-2012	Jayneel Gandhi <jayneel@cs.wisc.edu>	X86 TLB: Fix for gcc 4.4.3 Due to recent changes to X86 TLB, gem5 stopped compiling on gcc version 4.4.3. This patch provides the fix for that problem. The patch is tested on gcc 4.4.3. The change is not required for more recent versions of gcc (like on 4.6.3).
# 9028:f92783bcfd25	29-May-2012	Gabe Black <gblack@eecs.umich.edu>	X86: Use the HandyM5Reg to avoid a register read and some logic in the TLB.
# 9025:545591665fc7	27-May-2012	Gabe Black <gblack@eecs.umich.edu>	X86: Truncate addresses to 32 bits except in 64 bit mode, not long mode. A small change was added a while ago to keep addresses from overflowing 32 bits when larger addresses shouldn't be accessible to software. That change truncated when not in long mode, but really it should have truncated when not in 64 bit mode. The difference is whether compatibility mode is included, a mode that's supposed to act like a legacy 32 bit mode.
# 8962:397cbf4b11a6	24-Apr-2012	Gabe Black <gblack@eecs.umich.edu>	X86: Clear out duplicate TLB entries when adding a new one. It's possible for two page table walks to overlap which will go in the same place in the TLB's trie. They would land on top of each other, so this change adds some code which detects if an address already matches an entry and if so throws away the new one.
# 8953:488d45aeb672	15-Apr-2012	Gabe Black <gblack@eecs.umich.edu>	X86: Use the AddrTrie class to implement the TLB. This change also adjusts the TlbEntry class so that it stores the number of address bits wide a page is rather than its size in bytes. In other words, instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K, but it's a little harder going the other way.
# 8925:97f06a79b6f5	31-Mar-2012	Gabe Black <gblack@eecs.umich.edu>	X86: Fix address size handling so real mode works properly. Virtual (pre-segmentation) addresses are truncated based on address size, and any non-64 bit linear address is truncated to 32 bits. This means that real mode addresses aren't truncated down to 16 bits after their segment bases are added in.
# 8922:17f037ad8918	30-Mar-2012	William Wang <william.wang@arm.com>	MEM: Introduce the master/slave port sub-classes in C++ This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects. The patch enables us to classify behaviours into the two bins and add assumptions and enforce compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specialisations are to come in later patches. The getPort function is now getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal. The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.
# 8902:75b524b64c28	19-Mar-2012	Andreas Hansson <andreas.hansson@arm.com>	gcc: Clean-up of non-C++0x compliant code, first steps This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeeded by a comma, namespaces closed by a curly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts).
# 8888:befcf4d79fc1	09-Mar-2012	Geoffrey Blake <geoffrey.blake@arm.com>	CheckerCPU: Add function stubs to non-ARM ISA source to compile with CheckerCPU Making the CheckerCPU a runtime option requires the code to be compatible with ISAs other than ARM. This patch adds the appropriate function stubs to allow compilation.
# 8864:fe907afe14a3	01-Mar-2012	Nilay Vaish <nilay@cs.wisc.edu>	x86: Fix x86 TLB and Walker This patch adds a function to X86 tlb that returns the walker port. This port is required for correctly connecting the walker ports for the cpu just switched in
# 8797:3202eb01e01e	07-Jan-2012	Gabe Black <gblack@eecs.umich.edu>	Another merge with the main repository.
# 8768:314eb1e2fa94	30-Oct-2011	Gabe Black <gblack@eecs.umich.edu>	X86: Get rid of more uses of FULL_SYSTEM.
# 8767:e575781f71b8	30-Oct-2011	Gabe Black <gblack@eecs.umich.edu>	SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.
# 8752:28e899b7dee3	13-Oct-2011	Gabe Black <gblack@eecs.umich.edu>	X86: Turn on the page table walker in SE mode.
# 8646:ef6cbf0f14dc	05-Jan-2012	Nilay Vaish <nilay@cs.wisc.edu>	X86 TLB: Move a DPRINTF to its correct place The DPRINTF for doing protection checks appears after the checks have been carried out. It is possible that the function returns while the checks are being carried out, in which case the printf is missed out. This patch moves the DPRINTF before the checks.
# 8582:dd79a696b91c	23-Sep-2011	Gabe Black <gblack@eecs.umich.edu>	X86: Move the MSR lookup table out of the TLB and into its own file. Translating MSR addresses into MSR register indices took a lot of space in the TLB source and made looking around in that file awkward. This change moves the lookup into its own file to get it out of the way. It also changes it from a switch statement to a hash map which should hopefully be a little more efficient.
# 8539:7d3ea3c65c66	09-Sep-2011	Gabe Black <gblack@eecs.umich.edu>	Stack: Tidy up some comments, a warning, and make stack extension consistent. Do some minor cleanup of some recently added comments, a warning, and change other instances of stack extension to be like what's now being done for x86.
# 8535:d04ae08781e2	05-Sep-2011	Gabe Black <gblack@eecs.umich.edu>	X86,TLB: Make sure the "delayedResponse" variable is always set. When an instruction is translated in the x86 TLB, a variable called delayedResponse is passed back and forth which tracks whether a translation could be completed immediately, or if there's going to be a callback that will finish things up. If a read was to the internal memory space, memory mapped registers used to implement things like MSRs, the function hadn't yet gotten to where delayedResponse was set to false, its default. That meant that the value was never set, and the TLB could start waiting for a callback that would never come. This change simply moves the assignment to above where control can divert to translateInt().
# 8534:09745e0c3dd9	02-Sep-2011	Lisa Hsu <Lisa.Hsu@amd.com>	TLB: comments and a helpful warning. Nothing big here, but when you have an address that is not in the page table request to be allocated, if it falls outside of the maximum stack range all you get is a page fault and you don't know why. Add a little warn() to explain it a bit. Also add some comments and alter logic a little so that you don't totally ignore the return value of checkAndAllocNextPage().
# 8232:b28d06a175be	15-Apr-2011	Nathan Binkert <nate@binkert.org>	trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
# 8229:78bf55f23338	15-Apr-2011	Nathan Binkert <nate@binkert.org>	includes: sort all includes
# 8105:906864dd0937	02-Mar-2011	Gabe Black <gblack@eecs.umich.edu>	Spelling: Fix a spelling error by changing mmaped to mmapped. There may not be a formally correct spelling for the past tense of mmap, but mmapped is the spelling Google doesn't try to autocorrect. This makes sense because it mirrors the past tense of map->mapped and not the past tense of cape->caped.
# 8098:59a19310ca65	27-Feb-2011	Gabe Black <gblack@eecs.umich.edu>	X86: If PCI config space is disabled, pass through to regular IO addresses.
# 7933:e00ef55a2c49	07-Feb-2011	Tim Harris <tharris@microsoft.com>	X86: Obey the wp bit of CR0. If cr0.wp ("write protect" bit) is clear then do not generate page faults when writing to write-protected pages in kernel mode.
# 7912:a9f05ab40763	07-Feb-2011	Joel Hestness <hestness@cs.utexas.edu>	x86: Timing support for pagetable walker Move page table walker state to its own object type, and make the walker instantiate state for each outstanding walk. By storing the states in a queue, the walker is able to handle multiple outstanding timing requests. Note that functional walks use separate state elements.
# 7811:a8fc35183c10	03-Jan-2011	Steve Reinhardt <steve.reinhardt@amd.com>	Make commenting on close namespace brackets consistent. Ran all the source files through 'perl -pi' with this script: s\|\s(};?\s)?/\\s(end\s)?namespace\s(\S+)\s\/(\s})?\|} // namespace $3\|; s\|\s};?\s//\s(end\s)?namespace\s(\S+)\s\|} // namespace $2\n\|; s\|\s};?\s//\s(\S+)\snamespace\s\|} // namespace $1\n\|; Also did a little manual editing on some of the arch/*/isa_traits.hh files and src/SConscript.
# 7775:8e8fa2f28f2e	23-Nov-2010	Gabe Black <gblack@eecs.umich.edu>	X86: Obey the PCD (cache disable) bit in the page tables.
# 7774:6246338ac1e9	22-Nov-2010	Gabe Black <gblack@eecs.umich.edu>	X86: Mark IO space accesses as uncachable.
# 7720:65d338a8dba4	31-Oct-2010	Gabe Black <gblack@eecs.umich.edu>	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARM's mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extent MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. 
These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. 
Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. 
First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There was no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. 
Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
# 7629:0f0c231e3e97	23-Aug-2010	Gabe Black <gblack@eecs.umich.edu>	X86: Create a directory for files that define register indexes. This is to help tidy up arch/x86. These files should not be used external to the ISA.
# 7625:b1e69203bae9	23-Aug-2010	Gabe Black <gblack@eecs.umich.edu>	X86: Make the TLB fault instead of panic when something is unmapped in SE mode. The fault object, if invoked, would then panic. This is a bit less direct, but it means speculative execution won't panic the simulator.
# 7087:fb8d5786ff30	24-May-2010	Nathan Binkert <nate@binkert.org>	copyright: Change HP copyright on x86 code to be more friendly
# 6738:44010fc924d4	09-Nov-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Don't panic on faults on prefetches in SE mode.
# 6737:b3ab661715ac	09-Nov-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Explain what really didn't work with unmapped addresses in SE mode.
# 6428:9e35cdc95e81	02-Aug-2009	Steve Reinhardt <steve.reinhardt@amd.com>	Clean up some inconsistencies with Request flags.
# 6315:c7295a4826d5	09-Jul-2009	Gabe Black <gblack@eecs.umich.edu>	Registers: Eliminate the ISA defined floating point register file.
# 6313:95f69a436c82	09-Jul-2009	Gabe Black <gblack@eecs.umich.edu>	Registers: Add an ISA object which replaces the MiscRegFile. This object encapsulates (or will eventually) the identity and characteristics of the ISA in the CPU.
# 6141:5babc3f3d8c8	26-Apr-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Split out the internal memory space from the regular translate() and precompute mode.
# 6132:916f10213bea	23-Apr-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Put the StoreCheck flag with the others, and don't collide with other flags.
# 6099:74e5e063a03d	19-Apr-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Fix how the TLB handles the storecheck flag.
# 6059:d78df8ebc225	19-Apr-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Some segment selectors can be used when "NULL".
# 6023:47b4fcb10c11	09-Apr-2009	Nathan Binkert <nate@binkert.org>	tlb: More fixing of unified TLB
# 6022:410194bb3049	09-Apr-2009	Gabe Black <gblack@eecs.umich.edu>	tlb: Don't separate the TLB classes into an instruction TLB and a data TLB
# 5980:0ea37baabfb0	27-Feb-2009	Nathan Binkert <nate@binkert.org>	quell gcc 4.3 warning
# 5965:71f8d7c12619	27-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Fix segment limit checks.
# 5917:7d7df4ad7486	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Actually check page protections.
# 5912:d113f6def227	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Add a flag to force memory accesses to happen at CPL 0.
# 5895:569e3b31a868	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Make the X86 TLB take advantage of delayed translations, and get rid of the fake TLB miss faults.
# 5894:8091ac99341a	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	CPU: Implement translateTiming which defers to translateAtomic, and convert the timing simple CPU to use it.
# 5891:73084c6bb183	25-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	ISA: Replace the translate functions in the TLBs with translateAtomic.
# 5881:73c0aaaaf186	23-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Pass whether an access was a read/write/fetch so faults can behave accordingly.
# 5837:831413564d0c	01-Feb-2009	Gabe Black <gblack@eecs.umich.edu>	X86: Compute PCI config addresses correctly.
# 5736:426510e758ad	10-Nov-2008	Nathan Binkert <nate@binkert.org>	mem: update stuff for changes to Packet and Request
# 5714:76abee886def	02-Nov-2008	Lisa Hsu <hsul@eecs.umich.edu>	Add in Context IDs to the simulator. From now on, cpuId is almost never used; the primary identifier for a hardware context should be contextId(). The concept of threads within a CPU remains, in the form of threadId(), because sometimes you need to know which context within a CPU to manipulate.
# 5712:199d31b47f7b	02-Nov-2008	Lisa Hsu <hsul@eecs.umich.edu>	Make BaseCPU the provider of _cpuId and cpuId() instead of having them scattered across the subclasses. Generally make it so that member data is _cpuId and the accessor function is cpuId(). The ID value comes from the Python configuration (default -1 if none provided), and if it is -1, the index of cpuList will be given. This has passed util/regress quick and se.py -n4 and fs.py -n4 as well as standard switch.
# 5648:e8abda6e0980	12-Oct-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Make the local APIC accessible through the memory system directly, and make the timer work.
# 5647:b06b49498c79	12-Oct-2008	Gabe Black <gblack@eecs.umich.edu>	Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object.
# 5440:51d24253bcd9	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Rename the divide count register to divide configuration.
# 5433:1b0b8e9ba6a9	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Change how segment loading is performed.
# 5431:914851b44a74	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: In non 64bit mode, throw a fault when a NULL segment is accessed.
# 5419:a06807c228c1	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Have all 8 machine check registers since the kernel assumes they're there.
# 5418:501cb81c89df	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Bypass unaligned access support for register addressed MSRs.
# 5417:84755f1f32d3	12-Jun-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Remove enforcement of APIC register access alignment. Panic if more than one register is accessed at a time.
# 5374:4773d53f88a0	01-Mar-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Don't map the local APIC into the physical address space in SE mode.
# 5360:02a3af203516	26-Feb-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Put in initial implementation of the local APIC.
# 5359:8c6ff200e4c1	26-Feb-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Implement the INVLPG instruction and the TIA microop.
# 5358:e9acb84bbafb	26-Feb-2008	Gabe Black <gblack@eecs.umich.edu>	TLB: Make a TLB base class and put a virtual demapPage function in it.
# 5357:eecb5fd0be62	26-Feb-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Get PCI config space to work, and adjust address space prefix numbering scheme.
# 5323:75f7e6366a41	12-Jan-2008	Gabe Black <gblack@eecs.umich.edu>	X86: Make the IO ports work using extra physical address lines. Add a serial port.
# 5294:7222bdaed33b	02-Dec-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Reorganize segmentation and implement segment selector movs.
# 5245:d94bb8af9f76	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Separate out the page table walker into its own cc and hh.
# 5243:4228b7b5704b	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Fix a stupid typo where WRMSR and RDMSR were switched, and add a debug statement.
# 5242:280a99136427	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement tlb invalidation and make it happen some of the times it should.
# 5237:6c819dbe8045	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Work on the page table walker, TLB, and related faults.
# 5236:0050ad4fb3ef	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement a page table walker.
# 5232:d3801ea2792e	12-Nov-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Various fixes to indexing segmentation related registers
# 5184:8782de2949e5	25-Oct-2007	Gabe Black <gblack@eecs.umich.edu>	TLB: Fix serialization issues with the tlb entries and make the page table store the process, not the system.
# 5149:356e00996637	12-Oct-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Implement MSR reads and writes and the wrmsr and rdmsr instructions. There are no privilege checks, so these instructions will all work in all modes.
# 5140:2fd7f8477b4c	07-Oct-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Work on the x86 tlb.
# 5126:d3cdea5e0fb3	03-Oct-2007	Gabe Black <gblack@eecs.umich.edu>	Merge with head.
# 5124:3d8c50376609	03-Oct-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Start implementing the x86 tlb which will handle segmentation permission and limit checks and paging.
# 5100:7a0180040755	28-Sep-2007	Ali Saidi <saidi@eecs.umich.edu>	Rename cycles() function to ticks()
# 5086:e7913ffb379d	24-Sep-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Get X86_FS to compile.
# 5038:c996bb7f1a6d	31-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	X86: Get x86 to compile again after the simobject constructor change.
# 5004:7d94cedab264	26-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	Address translation: Make the page table more flexible. The page table now stores actual page table entries. It is still a templated class here, but this will be corrected in the near future.
# 4997:e7380529bd2d	26-Aug-2007	Gabe Black <gblack@eecs.umich.edu>	Address Translation: Make SE mode use an actual TLB/MMU for translation like FS.