History log of /gem5/src/arch/x86/tlb.cc
Revision Date Author Comments
# 13937:a47ac7052832 30-Apr-2019 Gabor Dozsa <gabor.dozsa@arm.com>

x86: Mark translation as delayed in case of a hw page table walk

This information is used by the LSQ in the O3 cpu (since commit
"51becd2... cpu-o3: O3 LSQ Generalisation")

Change-Id: I35fe7e2f8428641d863af0e79e28b0b259fb0b00
Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18508
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>


# 13784:1941dc118243 07-Mar-2019 Gabe Black <gabeblack@google.com>

arch, cpu, dev, gpu, mem, sim, python: start using getPort.

Replace the getMasterPort, getSlavePort, and getEthPort functions
with getPort, and remove extraneous mechanisms that are no longer
necessary.

Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 13741:d994984b842a 22-Feb-2019 Andrea Mondelli <Andrea.Mondelli@ucf.edu>

mem-cache: alias to mem::getMasterPort in TLB class

TLB::getMasterPort is used to obtain the PageWalkMasterPort if present and
hides the BaseTLB::getMasterPort().

The TLB::getMasterPort() is renamed according to the expected behavior.

Change-Id: If4f61189094a706d59805cd10f4f814e5830eda8
Reviewed-on: https://gem5-review.googlesource.com/c/16648
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>


# 13695:cce2b2b4466b 19-Feb-2019 Bagus Hanindhito <hanindhito@bagus.my.id>

x86: Call the base class's regStats in X86ISA::TLB

When I build gem5 for the x86 architecture and run the se.py sample script
with the helloworld example, there is a panic warning stating "Not all stats
have been initialized. You may need to add <ParentClass>::regStats() to
a new SimObject's regStats() function."

I see that in the x86 tlb.cc the regStats() function does no such
initialization, which causes a memory allocation error on some machines and
makes gem5 exit abnormally. Adding BaseTLB::regStats() to the
TLB::regStats() method solves the problem.

Change-Id: I8b62bebc15f896c3136ff4f8253dabbf998f618f
Reviewed-on: https://gem5-review.googlesource.com/c/16522
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
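
The fix above follows a common pattern: a derived regStats() override calls
the parent's version before registering its own statistics. A minimal sketch
of that pattern, assuming hypothetical stand-in classes (BaseTLBSketch,
X86TLBSketch and their boolean members are invented for illustration and are
not the gem5 types):

```cpp
#include <cassert>

// Hypothetical stand-in for a base class that owns some statistics.
struct BaseTLBSketch
{
    bool baseStatsRegistered = false;

    virtual ~BaseTLBSketch() = default;
    virtual void regStats() { baseStatsRegistered = true; }
};

// Hypothetical stand-in for the derived, ISA specific TLB.
struct X86TLBSketch : BaseTLBSketch
{
    bool ownStatsRegistered = false;

    void regStats() override
    {
        // Chain to the parent first so its statistics are set up too.
        BaseTLBSketch::regStats();
        assert(baseStatsRegistered);

        // Then register the statistics that belong to this class.
        ownStatsRegistered = true;
    }
};
```

Without the call to the parent, only ownStatsRegistered would be set, which
is the situation the panic message warns about.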


# 13613:a19963be12ca 20-Nov-2018 Gabe Black <gabeblack@google.com>

x86: Stop using/defining some ISA specific register types.

These have been replaced with the generic RegVal type.

Change-Id: I75c1134212067dea43aa0903d813633e06f3d6c6
Reviewed-on: https://gem5-review.googlesource.com/c/14476
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 12749:223c83ed9979 04-Jun-2018 Giacomo Travaglini <giacomo.travaglini@arm.com>

misc: Using smart pointers for memory Requests

This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.

Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>


# 12461:a4cb506cda74 09-Jan-2018 Gabe Black <gabeblack@google.com>

arch, mem: Abstract the data stored in the SE page tables.

Rather than store the actual TLB entry that corresponds to a mapping,
we can just store some abstracted information (address, a few flags)
and then let the caller turn that into the appropriate entry. There
could potentially be some small amount of overhead from creating
entries vs. storing them and just installing them, but it's likely
pretty minimal since that only happens on a TLB miss (ideally rare),
and, if it is problematic, there could be some preallocated TLB
entries which are just minimally filled in as necessary.

This has the nice effect of finally making the page tables ISA
agnostic.

Change-Id: I11e630f60682f0a0029b0683eb8ff0135fbd4317
Reviewed-on: https://gem5-review.googlesource.com/7350
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 12455:c88f0b37f433 05-Jan-2018 Gabe Black <gabeblack@google.com>

arch, mem: Make the page table lookup function return a pointer.

This avoids having a copy in the lookup function itself, and the
declaration of a lot of temporary TLB entry pointers in callers. The
gpu TLB seems to have had the most dependence on the original signature
of the lookup function, partially because it was relying on a somewhat
unsafe copy to a TLB entry using a base class pointer type.

Change-Id: I8b1cf494468163deee000002d243541657faf57f
Reviewed-on: https://gem5-review.googlesource.com/7343
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>


# 12406:86bde4a026b5 22-Dec-2017 Gabe Black <gabeblack@google.com>

arch,cpu: "virtualize" the TLB interface.

CPUs have historically instantiated the architecture specific version
of the TLBs to avoid a virtual function call, making them a little bit
more dependent on what the current ISA is. Some simple performance
measurement, the x86 twolf regression on the atomic CPU, shows that
there isn't actually any performance benefit, and if anything the
simulator goes slightly faster (although still within margin of error)
when the TLB functions are virtual.

This change switches everything outside of the architectures themselves
to use the generic BaseTLB type, and then inside the ISA for them to
cast that to their architecture specific type to call into architecture
specific interfaces.

The ARM TLB needed the most adjustment since it was using non-standard
translation function signatures. Specifically, they all took an extra
"type" parameter which defaulted to normal, and translateTiming
returned a Fault. translateTiming actually doesn't need to return a
Fault because everywhere that consumed it just stored it into a
structure which it then deleted(?), and the fault is stored in the
Translation object when the translation is done.

A little more work is needed to fully obviate the arch/tlb.hh header,
so the TheISA::TLB type is still visible outside of the ISAs.
Specifically, the TlbEntry type is used in the generic PageTable which
lives in src/mem.

Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575
Reviewed-on: https://gem5-review.googlesource.com/6921
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>


# 12140:fab402159cdf 13-Jun-2017 Swapnil Haria <swapnilster@gmail.com>

x86: Add stats to X86 TLB

Change-Id: Iebf7d245de66eebc8d4c59e62e52adf6cf51e1e4
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3980
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>


# 11874:663bac0bb1c9 23-Feb-2017 Brandon Potter <brandon.potter@amd.com>

x86: remove redundant condition check in tlb code


# 11800:54436a1784dc 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 3/22] reduce include dependencies in some headers

Used cppclean to help identify useless includes and removed them. This
involved erroneously included headers, but also cases where forward
declarations could have been used rather than a full include.


# 11793:ef606668d247 09-Nov-2016 Brandon Potter <brandon.potter@amd.com>

style: [patch 1/22] use /r/3648/ to reorganize includes


# 11628:85011e8eaad9 13-Sep-2016 Michael LeBeane <michael.lebeane@amd.com>

x86: Force strict ordering for memory mapped m5ops
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects. The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects. Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.


# 11608:6319a1125f1c 14-Aug-2016 Nikos Nikoleris <nikos.nikoleris@arm.com>

cpu, arch: fix the type used for the request flags

Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>


# 10905:a6ca6831e775 07-Jul-2015 Andreas Sandberg <andreas.sandberg@arm.com>

sim: Refactor the serialization base class

Objects that can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:

* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.

* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.

* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).

* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.

* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.


# 10824:308771bd2647 05-May-2015 Andreas Sandberg <Andreas.Sandberg@ARM.com>

mem, cpu: Add a separate flag for strictly ordered memory

The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.

This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:

* Speculation: CPU models are not allowed to issue speculative
loads.

* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.

Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
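
The split between the two flags can be sketched as a pair of independent
bits. This is only an illustration: the bit positions and the helper names
(mayCache, maySpeculate, mayMergeWrites, deviceAccessFlags) are assumptions
made up for the sketch and do not come from the Request class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, chosen only for this sketch.
constexpr std::uint32_t UNCACHEABLE_SKETCH = 1u << 0;
constexpr std::uint32_t STRICT_ORDER_SKETCH = 1u << 1;

// The memory system only looks at the caching bit.
inline bool mayCache(std::uint32_t flags)
{
    return (flags & UNCACHEABLE_SKETCH) == 0;
}

// CPU models only look at the ordering bit.
inline bool maySpeculate(std::uint32_t flags)
{
    return (flags & STRICT_ORDER_SKETCH) == 0;
}

inline bool mayMergeWrites(std::uint32_t flags)
{
    return (flags & STRICT_ORDER_SKETCH) == 0;
}

// Combine both bits, as the note above suggests for accesses that must
// be neither cached nor reordered.
inline std::uint32_t deviceAccessFlags()
{
    const std::uint32_t flags = UNCACHEABLE_SKETCH | STRICT_ORDER_SKETCH;
    assert(!mayCache(flags) && !maySpeculate(flags));
    return flags;
}
```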


# 10553:c1ad57c53a36 23-Nov-2014 Alexandru Dutu <alexandru.dutu@amd.com>

kvm, x86: Adding support for SE mode execution
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall
instructions and page faults. These types of exits will be encountered if
KvmCPU is run in SE mode.


# 10474:799c8ee4ecba 16-Oct-2014 Andreas Hansson <andreas.hansson@arm.com>

arch: Use shared_ptr for all Faults

This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
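
A minimal sketch of the kind of replacement described, assuming hypothetical
fault classes (FaultBaseSketch, PageFaultSketch, FaultSketch and
makePageFault are invented for illustration and are not the gem5 types):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical fault hierarchy used only for this sketch.
struct FaultBaseSketch
{
    virtual ~FaultBaseSketch() = default;
    virtual const char *name() const = 0;
};

struct PageFaultSketch : FaultBaseSketch
{
    std::uint64_t vaddr;

    explicit PageFaultSketch(std::uint64_t a) : vaddr(a) {}
    const char *name() const override { return "page fault"; }
};

// The fault handle is a standard shared_ptr to the base class.
using FaultSketch = std::shared_ptr<FaultBaseSketch>;

inline FaultSketch makePageFault(std::uint64_t vaddr)
{
    // Previously this would have been "new PageFaultSketch(vaddr)"
    // handed to a custom reference counting pointer.
    FaultSketch fault = std::make_shared<PageFaultSketch>(vaddr);
    assert(fault != nullptr);
    return fault;
}
```

Copies of the handle share ownership, and the fault object is destroyed when
the last copy goes away.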


# 9911:676d3dcf1cc2 15-Oct-2013 Andreas Sandberg <andreas@sandberg.pp.se>

mem: Use a flag instead of address bit 63 for generic IPRs

Using address bit 63 to identify generic IPRs caused problems on
SPARC, where IPRs are heavily used. This changeset redefines how
generic IPRs are identified. Instead of using bit 63, we now use a
separate flag (GENERIC_IPR) in the memory request.


# 9898:2935441b0870 29-Sep-2013 Andreas Sandberg <andreas@sandberg.pp.se>

x86: Add support for m5ops through a memory mapped interface

In order to support m5ops in virtualized environments, we need to use
a memory mapped interface. This changeset adds support for that by
reserving 0xFFFF0000-0xFFFFFFFF and mapping those to the generic IPR
interface for m5ops. The mapping is done in the
X86ISA::TLB::finalizePhysical() which means that it just works for all
of the CPU models, including virtualized ones.
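
The range check itself can be sketched as follows. Only the bounds
0xFFFF0000-0xFFFFFFFF come from the text above; the constant and function
names are hypothetical, and treating the offset into the window as meaningful
is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Bounds of the reserved window (inclusive), as stated in the message.
constexpr std::uint64_t m5opRangeStartSketch = 0xFFFF0000ULL;
constexpr std::uint64_t m5opRangeEndSketch = 0xFFFFFFFFULL;

// True if a physical address falls inside the reserved window.
inline bool inM5OpRange(std::uint64_t paddr)
{
    return paddr >= m5opRangeStartSketch && paddr <= m5opRangeEndSketch;
}

// Offset of the address inside the window. How an offset selects a
// particular m5op is not described above and is left open here.
inline std::uint64_t m5OpOffset(std::uint64_t paddr)
{
    assert(inM5OpRange(paddr));
    return paddr - m5opRangeStartSketch;
}
```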


# 9818:ebd7d3e04b5f 07-Aug-2013 Nilay Vaish <nilay@cs.wisc.edu>

x86: add tlb checkpointing
This patch adds checkpointing support to x86 tlb. It upgrades the
cpt_upgrader.py script so that previously created checkpoints can
be updated. It moves the checkpoint version to 6.


# 9738:304a37519d11 03-Jun-2013 Andreas Sandberg <andreas@sandberg.pp.se>

arch: Create a method to finalize physical addresses
in the TLB

Some architectures (currently only x86) require some fixing-up of
physical addresses after a normal address translation. This is usually
to remap devices such as the APIC, but could be used for other memory
mapped devices as well. When running the CPU using hardware
virtualization, we still need to do these address fix-ups before
inserting the request into the memory system. This patch allows that
code to be used by such CPUs without doing full address translations.


# 9423:43caa4ca5979 07-Jan-2013 Andreas Sandberg <Andreas.Sandberg@arm.com>

arch: Add support for invalidating TLBs when draining

This patch adds support for the memInvalidate() drain method. TLB
flushing is requested by calling the virtual flushAll() method on the
TLB.

Note: This patch renames invalidateAll() to flushAll() on x86 and
SPARC to make the interface consistent across all supported
architectures.


# 9294:8fb03b13de02 15-Oct-2012 Andreas Hansson <andreas.hansson@arm.com>

Port: Add protocol-agnostic ports in the port hierarchy

This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocol-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.

The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.


# 9064:d43eb1203aec 07-Jun-2012 Nilay Vaish <nilay@cs.wisc.edu>

X86 TLB: Add a missing = sign


# 9062:21f92aa46e8f 07-Jun-2012 Jayneel Gandhi <jayneel@cs.wisc.edu>

X86 TLB: Fix for gcc 4.4.3
Due to recent changes to X86 TLB, gem5 stopped compiling on
gcc version 4.4.3. This patch provides the fix for that problem. The patch
is tested on gcc 4.4.3. The change is not required for more recent
versions of gcc (like on 4.6.3).


# 9028:f92783bcfd25 29-May-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Use the HandyM5Reg to avoid a register read and some logic in the TLB.


# 9025:545591665fc7 27-May-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Truncate addresses to 32 bits except in 64 bit mode, not long mode.

A small change was added a while ago to keep addresses from overflowing 32
bits when larger addresses shouldn't be accessible to software. That change
truncated when not in long mode, but really it should have truncated when not
in 64 bit mode. The difference is whether compatibility mode is included, a
mode that's supposed to act like a legacy 32 bit mode.
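
The difference between the two conditions can be sketched like this. The
ModeSketch enum and both helper functions are hypothetical and exist only to
contrast the old and new checks; they are not taken from the TLB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operating modes. Compatibility and SixtyFourBit are both
// sub-modes of long mode.
enum class ModeSketch { Legacy, Compatibility, SixtyFourBit };

inline bool inLongMode(ModeSketch m)
{
    return m != ModeSketch::Legacy;
}

// Old condition: truncate only when not in long mode, which leaves
// compatibility mode addresses untruncated.
inline std::uint64_t truncateOld(std::uint64_t addr, ModeSketch m)
{
    return inLongMode(m) ? addr : (addr & 0xFFFFFFFFULL);
}

// New condition: truncate unless in 64 bit mode.
inline std::uint64_t truncateNew(std::uint64_t addr, ModeSketch m)
{
    return m == ModeSketch::SixtyFourBit ? addr : (addr & 0xFFFFFFFFULL);
}
```

The two versions agree in legacy and 64 bit mode and differ only for
compatibility mode.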


# 8962:397cbf4b11a6 24-Apr-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Clear out duplicate TLB entries when adding a new one.

It's possible for two page table walks to overlap which will go in the same
place in the TLB's trie. They would land on top of each other, so this change
adds some code which detects if an address already matches an entry and if so
throws away the new one.
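
A minimal sketch of the "keep the existing entry" behaviour. It assumes a
std::map keyed by virtual page number as a stand-in for the trie; TlbSketch
and EntrySketch are hypothetical names, not the classes used in the TLB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical entry type for the sketch.
struct EntrySketch
{
    std::uint64_t paddr;
};

// std::map stands in for the trie here.
class TlbSketch
{
    std::map<std::uint64_t, EntrySketch> entries;

  public:
    // Returns the entry that ends up in the TLB for this page.
    const EntrySketch &insert(std::uint64_t vpn, const EntrySketch &fresh)
    {
        // emplace() leaves an existing element untouched, so the result
        // of a second, overlapping walk for the same page is discarded.
        auto result = entries.emplace(vpn, fresh);
        assert(result.first != entries.end());
        return result.first->second;
    }

    std::size_t size() const { return entries.size(); }
};
```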


# 8953:488d45aeb672 15-Apr-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Use the AddrTrie class to implement the TLB.

This change also adjusts the TlbEntry class so that it stores the number of
address bits wide a page is rather than its size in bytes. In other words,
instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
but it's a little harder going the other way.
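
The asymmetry between the two directions can be sketched as follows. Both
helpers are hypothetical and are not members of TlbEntry.

```cpp
#include <cassert>
#include <cstdint>

// Turning a bit count into a byte size is a single shift.
inline std::uint64_t pageBytesFromBits(unsigned logBytes)
{
    assert(logBytes < 64);
    return std::uint64_t{1} << logBytes;
}

// Going the other way means searching for the position of the set bit.
inline unsigned pageBitsFromBytes(std::uint64_t bytes)
{
    // The size must be a non-zero power of two.
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);

    unsigned bits = 0;
    while ((bytes >>= 1) != 0)
        ++bits;
    return bits;
}
```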


# 8925:97f06a79b6f5 31-Mar-2012 Gabe Black <gblack@eecs.umich.edu>

X86: Fix address size handling so real mode works properly.

Virtual (pre-segmentation) addresses are truncated based on address size, and
any non-64 bit linear address is truncated to 32 bits. This means that real
mode addresses aren't truncated down to 16 bits after their segment bases are
added in.


# 8922:17f037ad8918 30-Mar-2012 William Wang <william.wang@arm.com>

MEM: Introduce the master/slave port sub-classes in C++

This patch introduces the notion of a master and slave port in the C++
code, thus bringing the previous classification from the Python
classes into the corresponding simulation objects and memory objects.

The patch enables us to classify behaviours into the two bins and add
assumptions and enforce compliance, also simplifying the two
interfaces. As a starting point, isSnooping is confined to a master
port, and getAddrRanges to slave ports. More of these specialisations
are to come in later patches.

The getPort function is now getMasterPort and getSlavePort, and
returns a port reference rather than a pointer as NULL would never be
a valid return value. The default implementation of these two
functions is placed in MemObject, and calls fatal.

The one drawback with this specific patch is that it requires some
code duplication, e.g. QueuedPort becomes QueuedMasterPort and
QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort
(avoiding multiple inheritance). With the later introduction of the
port interfaces, moving the functionality outside the port itself, a
lot of the duplicated code will disappear again.


# 8902:75b524b64c28 19-Mar-2012 Andreas Hansson <andreas.hansson@arm.com>

gcc: Clean-up of non-C++0x compliant code, first steps

This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeeded by a comma,
namespaces closed by a curly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).


# 8888:befcf4d79fc1 09-Mar-2012 Geoffrey Blake <geoffrey.blake@arm.com>

CheckerCPU: Add function stubs to non-ARM ISA source to compile with CheckerCPU

Making the CheckerCPU a runtime option requires the code to be compatible
with ISAs other than ARM. This patch adds the appropriate function
stubs to allow compilation.


# 8864:fe907afe14a3 01-Mar-2012 Nilay Vaish <nilay@cs.wisc.edu>

x86: Fix x86 TLB and Walker
This patch adds a function to X86 tlb that returns the
walker port. This port is required for correctly connecting
the walker ports for the cpu just switched in


# 8797:3202eb01e01e 07-Jan-2012 Gabe Black <gblack@eecs.umich.edu>

Another merge with the main repository.


# 8768:314eb1e2fa94 30-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Get rid of more uses of FULL_SYSTEM.


# 8767:e575781f71b8 30-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

SE/FS: Make getProcessPtr available in both modes, and get rid of FULL_SYSTEMs.


# 8752:28e899b7dee3 13-Oct-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Turn on the page table walker in SE mode.


# 8646:ef6cbf0f14dc 05-Jan-2012 Nilay Vaish <nilay@cs.wisc.edu>

X86 TLB: Move a DPRINTF to its correct place
The DPRINTF for doing protection checks appears after the checks have been
carried out. It is possible that the function returns while the checks are
being carried out, in which case the printf is missed out. This patch moves the
DPRINTF before the checks.


# 8582:dd79a696b91c 23-Sep-2011 Gabe Black <gblack@eecs.umich.edu>

X86: Move the MSR lookup table out of the TLB and into its own file.

Translating MSR addresses into MSR register indices took a lot of space in the
TLB source and made looking around in that file awkward. This change moves
the lookup into its own file to get it out of the way. It also changes it from
a switch statement to a hash map which should hopefully be a little more
efficient.
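
A hash map lookup of this kind might look like the sketch below. The three
MSR addresses (0x10 for the time stamp counter, 0x1B for the APIC base,
0xC0000080 for EFER) are architectural; the register index values and all
names ending in Sketch are hypothetical and not taken from the real table.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical register indices, for illustration only.
enum MiscRegSketch { TSC_SKETCH, APIC_BASE_SKETCH, EFER_SKETCH };

// Translates an MSR address into a register index. Returns false when
// the address is not in the table.
inline bool msrAddrToIndexSketch(std::uint32_t addr, MiscRegSketch &index)
{
    static const std::unordered_map<std::uint32_t, MiscRegSketch> table = {
        { 0x10u, TSC_SKETCH },
        { 0x1Bu, APIC_BASE_SKETCH },
        { 0xC0000080u, EFER_SKETCH },
    };
    assert(!table.empty());

    auto it = table.find(addr);
    if (it == table.end())
        return false;

    index = it->second;
    return true;
}
```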


# 8539:7d3ea3c65c66 09-Sep-2011 Gabe Black <gblack@eecs.umich.edu>

Stack: Tidy up some comments, a warning, and make stack extension consistent.

Do some minor cleanup of some recently added comments, a warning, and change
other instances of stack extension to be like what's now being done for x86.


# 8535:d04ae08781e2 05-Sep-2011 Gabe Black <gblack@eecs.umich.edu>

X86,TLB: Make sure the "delayedResponse" variable is always set.

When an instruction is translated in the x86 TLB, a variable called
delayedResponse is passed back and forth which tracks whether a translation
could be completed immediately, or if there's going to be a callback that will
finish things up. If a read was to the internal memory space, memory mapped
registers used to implement things like MSRs, the function hadn't yet gotten
to where delayedResponse was set to false, its default. That meant that the
value was never set, and the TLB could start waiting for a callback that would
never come. This change simply moves the assignment to above where control
can divert to translateInt().
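
The shape of the fix can be sketched as below: the out-parameter gets its
default before any path can leave the function. translateSketch and its
parameters are hypothetical; the real translate function has a different
signature.

```cpp
#include <cassert>

// Returns true when the translation finished immediately. This return
// convention is an assumption of the sketch.
inline bool translateSketch(bool internalSpace, bool tlbHit,
                            bool &delayedResponse)
{
    // Assign the default first, above every early exit.
    delayedResponse = false;

    if (internalSpace) {
        // Early exit, analogous to diverting to translateInt().
        return true;
    }

    if (!tlbHit) {
        // A walk is started; completion arrives later via a callback.
        delayedResponse = true;
        return false;
    }

    return true;
}
```

If the assignment were placed after the internalSpace check, a caller passing
in an uninitialized variable could read garbage and wait for a callback that
never arrives.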


# 8534:09745e0c3dd9 02-Sep-2011 Lisa Hsu <Lisa.Hsu@amd.com>

TLB: comments and a helpful warning.

Nothing big here, but when an address that is not in the page table is
requested to be allocated and it falls outside of the maximum stack range,
all you get is a page fault and you don't know why. Add a little warn() to
explain it a bit. Also add some comments and alter logic a little so that
you don't totally ignore the return value of checkAndAllocNextPage().


# 8232:b28d06a175be 15-Apr-2011 Nathan Binkert <nate@binkert.org>

trace: reimplement the DTRACE function so it doesn't use a vector
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help


# 8229:78bf55f23338 15-Apr-2011 Nathan Binkert <nate@binkert.org>

includes: sort all includes


# 8105:906864dd0937 02-Mar-2011 Gabe Black <gblack@eecs.umich.edu>

Spelling: Fix a spelling error by changing mmaped to mmapped.

There may not be a formally correct spelling for the past tense of mmap, but
mmapped is the spelling Google doesn't try to autocorrect. This makes sense
because it mirrors the past tense of map->mapped and not the past tense of
cape->caped.


# 8098:59a19310ca65 27-Feb-2011 Gabe Black <gblack@eecs.umich.edu>

X86: If PCI config space is disabled, pass through to regular IO addresses.


# 7933:e00ef55a2c49 07-Feb-2011 Tim Harris <tharris@microsoft.com>

X86: Obey the wp bit of CR0.

If cr0.wp ("write protect" bit) is clear then do not generate page faults when
writing to write-protected pages in kernel mode.
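
The rule can be sketched as a small predicate. It models only the write
protection check described above and ignores every other protection bit;
the function name and parameters are hypothetical.

```cpp
#include <cassert>

// True if a write should raise a page fault because of write protection.
inline bool writeCausesFault(bool pageWritable, bool userMode, bool cr0Wp)
{
    if (pageWritable)
        return false;   // Writable pages never fault on this check.
    if (userMode)
        return true;    // User writes to read-only pages always fault.
    return cr0Wp;       // Kernel writes fault only when CR0.WP is set.
}
```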


# 7912:a9f05ab40763 07-Feb-2011 Joel Hestness <hestness@cs.utexas.edu>

x86: Timing support for pagetable walker

Move page table walker state to its own object type, and make the
walker instantiate state for each outstanding walk. By storing the
states in a queue, the walker is able to handle multiple outstanding
timing requests. Note that functional walks use separate state
elements.


# 7811:a8fc35183c10 03-Jan-2011 Steve Reinhardt <steve.reinhardt@amd.com>

Make commenting on close namespace brackets consistent.

Ran all the source files through 'perl -pi' with this script:

s|\s*(};?\s*)?/\*\s*(end\s*)?namespace\s*(\S+)\s*\*/(\s*})?|} // namespace $3|;
s|\s*};?\s*//\s*(end\s*)?namespace\s*(\S+)\s*|} // namespace $2\n|;
s|\s*};?\s*//\s*(\S+)\s*namespace\s*|} // namespace $1\n|;

Also did a little manual editing on some of the arch/*/isa_traits.hh files
and src/SConscript.


# 7775:8e8fa2f28f2e 23-Nov-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Obey the PCD (cache disable) bit in the page tables.


# 7774:6246338ac1e9 22-Nov-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Mark IO space accesses as uncacheable.


# 7720:65d338a8dba4 31-Oct-2010 Gabe Black <gblack@eecs.umich.edu>

ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extent MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.
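
A minimal sketch of the reset and compare behaviour described above, for an
ISA with fixed width instructions and microcode but no delay slots.
PCStateSketch and MachInstSketch are hypothetical and much simpler than the
canned classes in the generic ISA directory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed width instruction word.
using MachInstSketch = std::uint32_t;

struct PCStateSketch
{
    std::uint64_t pc = 0;
    std::uint64_t npc = sizeof(MachInstSketch);
    std::uint32_t upc = 0;
    std::uint32_t nupc = 1;

    // Reset to a particular pc: npc = pc + sizeof(MachInst), upc = 0,
    // nupc = 1, as listed in the text.
    void set(std::uint64_t val)
    {
        pc = val;
        npc = val + sizeof(MachInstSketch);
        upc = 0;
        nupc = 1;
        assert(npc > pc); // The sketch does not handle wrap-around.
    }

    // Two states are equal when every element matches.
    bool operator==(const PCStateSketch &other) const
    {
        return pc == other.pc && npc == other.npc &&
               upc == other.upc && nupc == other.nupc;
    }
};
```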

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder was adjusted so that it always stores the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.
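
A minimal sketch of the microop case, under stated assumptions: SketchPCState, its uEnd() helper and SketchRAS are hypothetical and much simpler than the real classes, and the delay slot variant (which merges elements of the return PC and the call PC) is not shown. The RAS stores the call PC, and the return PC is derived from it by skipping to the next macroop. Doing the conversion at pop time is an arbitrary choice for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical PC state for an ISA with microcoded instructions.
struct SketchPCState
{
    uint64_t pc;   // address of the current macroop
    uint64_t npc;  // address of the next macroop
    uint16_t upc;  // index of the current microop within the macroop

    // Move to the first microop of the next macroop. In this sketch npc is
    // left alone; it would be refreshed once the next macroop is decoded.
    void uEnd() { pc = npc; upc = 0; }
};

// Sketch of a buildRetPC for an ISA without branch delay slots.
inline SketchPCState
buildRetPC(const SketchPCState &callPC)
{
    SketchPCState retPC = callPC;
    retPC.uEnd();  // skip any remaining microops of the call macroop
    return retPC;
}

// Hypothetical return address stack that records call PCs.
class SketchRAS
{
    std::vector<SketchPCState> stack;

  public:
    void push(const SketchPCState &callPC) { stack.push_back(callPC); }
    bool empty() const { return stack.empty(); }

    // Precondition: !empty(). Returns the predicted return PC.
    SketchPCState pop()
    {
        SketchPCState callPC = stack.back();
        stack.pop_back();
        return buildRetPC(callPC);
    }
};
```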


Change in stats:

There was no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
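
For the first TODO item, the trivial definition could look like the sketch below. SketchPCState is a hypothetical two-field stand-in, not one of the real PCState classes.

```cpp
#include <cstdint>

// Hypothetical, simplified PC state.
struct SketchPCState
{
    uint64_t pc;
    uint64_t npc;

    bool
    operator==(const SketchPCState &other) const
    {
        return pc == other.pc && npc == other.npc;
    }

    // Defined trivially in terms of ==, as the TODO suggests.
    bool
    operator!=(const SketchPCState &other) const
    {
        return !(*this == other);
    }
};
```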


# 7629:0f0c231e3e97 23-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Create a directory for files that define register indexes.

This is to help tidy up arch/x86. These files should not be used external to
the ISA.


# 7625:b1e69203bae9 23-Aug-2010 Gabe Black <gblack@eecs.umich.edu>

X86: Make the TLB fault instead of panic when something is unmapped in SE mode.

The fault object, if invoked, would then panic. This is a bit less direct, but
it means speculative execution won't panic the simulator.
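
The idea can be sketched as follows, with hypothetical names throughout (SketchFault, SketchUnmappedFault, SketchTLB) and a thrown exception standing in for the simulator's panic. Translation of an unmapped address returns a fault object instead of failing on the spot; only invoking the fault fails, so a speculative access that is later squashed can simply drop it.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>

// Hypothetical fault interface.
struct SketchFault
{
    virtual ~SketchFault() = default;
    virtual void invoke() = 0;
};

// Fault returned for an unmapped address. Invoking it is fatal; the
// exception here is only a stand-in for a panic.
struct SketchUnmappedFault : SketchFault
{
    uint64_t vaddr;

    explicit SketchUnmappedFault(uint64_t addr) : vaddr(addr) {}

    void invoke() override { throw std::runtime_error("unmapped address"); }
};

using SketchFaultPtr = std::shared_ptr<SketchFault>;

// Hypothetical TLB backed by a map from virtual to physical page numbers,
// assuming 4 KiB pages.
struct SketchTLB
{
    std::map<uint64_t, uint64_t> pages;

    // Returns nullptr on success and fills in paddr; otherwise returns a
    // fault object and leaves paddr untouched.
    SketchFaultPtr
    translate(uint64_t vaddr, uint64_t &paddr) const
    {
        auto it = pages.find(vaddr >> 12);
        if (it == pages.end())
            return std::make_shared<SketchUnmappedFault>(vaddr);
        paddr = (it->second << 12) | (vaddr & 0xfff);
        return nullptr;
    }
};
```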


# 7087:fb8d5786ff30 24-May-2010 Nathan Binkert <nate@binkert.org>

copyright: Change HP copyright on x86 code to be more friendly


# 6738:44010fc924d4 09-Nov-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Don't panic on faults on prefetches in SE mode.


# 6737:b3ab661715ac 09-Nov-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Explain what really didn't work with unmapped addresses in SE mode.


# 6428:9e35cdc95e81 02-Aug-2009 Steve Reinhardt <steve.reinhardt@amd.com>

Clean up some inconsistencies with Request flags.


# 6315:c7295a4826d5 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Eliminate the ISA defined floating point register file.


# 6313:95f69a436c82 09-Jul-2009 Gabe Black <gblack@eecs.umich.edu>

Registers: Add an ISA object which replaces the MiscRegFile.
This object encapsulates (or will eventually) the identity and characteristics
of the ISA in the CPU.


# 6141:5babc3f3d8c8 26-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Split out the internal memory space from the regular translate() and precompute mode.


# 6132:916f10213bea 23-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Put the StoreCheck flag with the others, and don't collide with other flags.


# 6099:74e5e063a03d 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Fix how the TLB handles the storecheck flag.


# 6059:d78df8ebc225 19-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Some segment selectors can be used when "NULL".


# 6023:47b4fcb10c11 09-Apr-2009 Nathan Binkert <nate@binkert.org>

tlb: More fixing of unified TLB


# 6022:410194bb3049 09-Apr-2009 Gabe Black <gblack@eecs.umich.edu>

tlb: Don't separate the TLB classes into an instruction TLB and a data TLB


# 5980:0ea37baabfb0 27-Feb-2009 Nathan Binkert <nate@binkert.org>

quell gcc 4.3 warning


# 5965:71f8d7c12619 27-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Fix segment limit checks.


# 5917:7d7df4ad7486 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Actually check page protections.


# 5912:d113f6def227 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Add a flag to force memory accesses to happen at CPL 0.


# 5895:569e3b31a868 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Make the X86 TLB take advantage of delayed translations, and get rid of the fake TLB miss faults.


# 5894:8091ac99341a 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

CPU: Implement translateTiming which defers to translateAtomic, and convert the timing simple CPU to use it.


# 5891:73084c6bb183 25-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

ISA: Replace the translate functions in the TLBs with translateAtomic.


# 5881:73c0aaaaf186 23-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Pass whether an access was a read/write/fetch so faults can behave accordingly.


# 5837:831413564d0c 01-Feb-2009 Gabe Black <gblack@eecs.umich.edu>

X86: Compute PCI config addresses correctly.


# 5736:426510e758ad 10-Nov-2008 Nathan Binkert <nate@binkert.org>

mem: update stuff for changes to Packet and Request


# 5714:76abee886def 02-Nov-2008 Lisa Hsu <hsul@eecs.umich.edu>

Add in Context IDs to the simulator. From now on, cpuId is almost never used;
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.


# 5712:199d31b47f7b 02-Nov-2008 Lisa Hsu <hsul@eecs.umich.edu>

Make BaseCPU the provider of _cpuId and cpuId() instead of having them
scattered across the subclasses. Generally make it so that member data is
_cpuId and accessor functions are cpuId(). The ID value comes from Python
(default -1 if none provided), and if it is -1, the index of cpuList will be
given. This has passed util/regress quick and se.py -n4 and fs.py -n4 as well
as standard switch.


# 5648:e8abda6e0980 12-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Make the local APIC accessible through the memory system directly, and make the timer work.


# 5647:b06b49498c79 12-Oct-2008 Gabe Black <gblack@eecs.umich.edu>

Turn Interrupts objects into SimObjects. Also, move local APIC state into x86's Interrupts object.


# 5440:51d24253bcd9 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Rename the divide count register to divide configuration.


# 5433:1b0b8e9ba6a9 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Change how segment loading is performed.


# 5431:914851b44a74 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: In non 64bit mode, throw a fault when a NULL segment is accessed.


# 5419:a06807c228c1 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Have all 8 machine check registers since the kernel assumes they're there.


# 5418:501cb81c89df 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Bypass unaligned access support for register addressed MSRs.


# 5417:84755f1f32d3 12-Jun-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Remove enforcement of APIC register access alignment. Panic if more than one register is accessed at a time.


# 5374:4773d53f88a0 01-Mar-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Don't map the local APIC into the physical address space in SE mode.


# 5360:02a3af203516 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Put in initial implementation of the local APIC.


# 5359:8c6ff200e4c1 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Implement the INVLPG instruction and the TIA microop.


# 5358:e9acb84bbafb 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

TLB: Make a TLB base class and put a virtual demapPage function in it.


# 5357:eecb5fd0be62 26-Feb-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Get PCI config space to work, and adjust address space prefix numbering scheme.


# 5323:75f7e6366a41 12-Jan-2008 Gabe Black <gblack@eecs.umich.edu>

X86: Make the IO ports work using extra physical address lines. Add a serial port.


# 5294:7222bdaed33b 02-Dec-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Reorganize segmentation and implement segment selector movs.


# 5245:d94bb8af9f76 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Separate out the page table walker into its own cc and hh.


# 5243:4228b7b5704b 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Fix a stupid typo where WRMSR and RDMSR were switched, and add a debug statement.


# 5242:280a99136427 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement tlb invalidation and make it happen some of the times it should.


# 5237:6c819dbe8045 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Work on the page table walker, TLB, and related faults.


# 5236:0050ad4fb3ef 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement a page table walker.


# 5232:d3801ea2792e 12-Nov-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Various fixes to indexing segmentation related registers


# 5184:8782de2949e5 25-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

TLB: Fix serialization issues with the tlb entries and make the page table store the process, not the system.


# 5149:356e00996637 12-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Implement MSR reads and writes and the wrmsr and rdmsr instructions.
There are no privilege checks, so these instructions will all work in all
modes.


# 5140:2fd7f8477b4c 07-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Work on the x86 tlb.


# 5126:d3cdea5e0fb3 03-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

Merge with head.


# 5124:3d8c50376609 03-Oct-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Start implementing the x86 tlb which will handle segmentation permission and limit checks and paging.


# 5100:7a0180040755 28-Sep-2007 Ali Saidi <saidi@eecs.umich.edu>

Rename cycles() function to ticks()


# 5086:e7913ffb379d 24-Sep-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Get X86_FS to compile.


# 5038:c996bb7f1a6d 31-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

X86: Get x86 to compile again after the simobject constructor change.


# 5004:7d94cedab264 26-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

Address translation: Make the page table more flexible.
The page table now stores actual page table entries. It is still a templated
class here, but this will be corrected in the near future.


# 4997:e7380529bd2d 26-Aug-2007 Gabe Black <gblack@eecs.umich.edu>

Address Translation: Make SE mode use an actual TLB/MMU for translation like FS.