Cross Reference: /gem5/src/arch/arm/table

History log of /gem5/src/arch/arm/table_walker.cc
Revision	Date	Author	Comments
# 14280:9e3f2937f72c	31-Jul-2019	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Fix Data Abort ISS when caused by Atomic operation Data Aborts caused by an atomic instruction have a special rule for their syndrome: From a ISS point of view they count as read if a read to that address would generate a fault; they count as writes otherwise (ISS.WnR bit) This patch is implementing this in the TLB. For permission faults we need to explicitly check if a read would trigger a fault (e.g. checking for the AP bits) since permissions can allow read-only accesses. For other MMU exceptions (like translation faults) we are confident the nature of the access doesn't affect the genration of a fault. This means that if the access is atomic, we treat it as a read from an ISS.WnR point of view. Change-Id: Ia524aa6ae07f81513cdc26c516b5fd9b01a931c3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20981 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
# 14095:4f5d16d7cf45	18-Jul-2019	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Implement ARMv8.1-HPD, Hierarchical permission disable According to the armarm: ARMv8.1-HPD introduces the facility to disable the hierarchical attributes, APTable, PXNTable, and UXNTable, in the translation tables. This disable has no effect on the NSTable bit. This feature is mandatory in ARMv8.1 implementations. This feature is added only to the VMSAv8-64 translation regimes. ARMv8.2 extends this to the AArch32 translation regimes, see ARMv8.2-AA32HPD. The ID_AA64MMFR1_EL1.HPDS field identifies the support for ARMv8.1-HPD. Change-Id: Ibbf589b82f2c1e4437b43252f8f633e8f6fb0b80 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19610 Tested-by: kokoro <noreply+kokoro@google.com>
# 14093:5fbd7d00b58e	17-Jul-2019	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Clean Fault generation when processing Long Descriptor A new shared method has been introduced: generateLongDescFault Change-Id: I7eb6fa1347a6c2cf9cb11fd9f2137d983c4f7a40 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19608 Tested-by: kokoro <noreply+kokoro@google.com>
# 14088:8de55a7aa53b	01-May-2019	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Use ExceptionLevel type in TlbEntry Replacing uint8_t with ExceptionLevel type in the arm TlbEntry. The variable is representing the translation regime it is targeting. Change-Id: Ifcd6e86c5d73f752e8476a2b7fda9ea74a0c7a3b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19488 Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
# 14040:0c4153500e9c	06-Jun-2019	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Fix WalkerState,Descriptors default constructor Those POD strutures are not initializing all members at construction. This could lead to undefined behaviour Change-Id: Iaa8afb126382b6bfbef686883a026262f24d5ca1 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Ciro Santilli <ciro.santilli@arm.com> Reviewed-by: Javier Setoain <javier.setoain@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19149 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>
# 13892:0182a0601f66	22-Apr-2019	Gabe Black <gabeblack@google.com>	mem: Minimize the use of MemObject. MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed. Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com>
# 13795:e21c61d9efb8	19-Mar-2019	Andrea Mondelli <Andrea.Mondelli@ucf.edu>	dev-arm: ambiguous use of getPort() The recent introduction of getPort() creates a conflict with the existing method used in arm MMU. This patch rename the old getPort() in getDMAPort() according to the returned value (DmaPort class type) Change-Id: Ief3d83650fd6b08490522341631244be06e380ce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17469 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 13784:1941dc118243	07-Mar-2019	Gabe Black <gabeblack@google.com>	arch, cpu, dev, gpu, mem, sim, python: start using getPort. Replace the getMasterPort, getSlavePort, and getEthPort functions with getPort, and remove extraneous mechanisms that are no longer necessary. Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 13019:3fa5ab820fa8	04-Sep-2018	Anouk Van Laer <anouk.vanlaer@arm.com>	arch-arm: Correction for address size in EL1&0 translation When doing EL0/1 translation in stage2, the physical address size will be defined by the hypervisor (via VTCR_EL2.ps, not TCR.ips). See D10.2.121 of the ARM ARM. Change-Id: Ic7df97c0f5950a648f7408cde3955a640b562c1d Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/12552 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
# 13018:9e9819585e55	03-Sep-2018	Anouk Van Laer <anouk.vanlaer@arm.com>	arch-arm: Correction to address size in EL2/EL3 This commit corrects how the address size is determined in EL2/EL3. Previously, TCR_ELx.ips was used but this should be TCR_ELx.ps. Change-Id: I7e5a2f376335532a1d1c8c74d12a416617474ae2 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/12551 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
# 12749:223c83ed9979	04-Jun-2018	Giacomo Travaglini <giacomo.travaglini@arm.com>	misc: Using smart pointers for memory Requests This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers. Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
# 12738:1ac09a70644f	05-Jun-2018	Andreas Sandberg <andreas.sandberg@arm.com>	arch-arm: Remove dead doingStage2 variable in PT walker Change-Id: Iab5ecec56120c725847b2e462fd4793cfac87d3c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10815
# 12735:e3da526a0654	16-May-2018	Andreas Sandberg <andreas.sandberg@arm.com>	arch-arm: Respect EL from translation type There are cases where instructions request translations in the context of a lower EL. This is currently not respected in the TLB and the page table walker. Fix that. Change-Id: Icd59657a1ecfd8bd75a001bb1a4e41a6f4808a36 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10506 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
# 12709:faf5b471d5ce	20-Apr-2018	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Implement ARMv8.1 TTBR1_EL2 register This patch implements the ARMv8.1 TTBR1_EL2 register, which is used for getting the translation table base address when a Host Operating System is running at EL2. (HCR_EL2.E2H = 1) Change-Id: Ic0ab351cae3fd64855eda7c18c8757da0d7b8663 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10382 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 12526:94adfd8b5dbd	02-Aug-2017	Chuan Zhu <chuan.zhu@arm.com>	arch-arm: Fix big endian support in do{Long,L1,L2}Descriptor do{Long,L1,L2}Descriptor was not able to load descriptors correctly for big-endian situations, causing recognised Descriptors. Added big-endian related data conversions to correct them. Change-Id: I0fdfbbdf56f94bbed19172acae1b6e4a0382b5a0 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/8144 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 12499:b81688796004	09-Jan-2018	Giacomo Travaglini <giacomo.travaglini@arm.com>	arch-arm: Change function name for banked miscregs This commit changes the function's name used for retrieving the index of a security banked register given the flatten index. This will avoid confusion with flattenRegId, which has a different purpose. Change-Id: I470ffb55916cb7fc9f78e071a7f2e609c1829f1a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/7982 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 12406:86bde4a026b5	22-Dec-2017	Gabe Black <gabeblack@google.com>	arch,cpu: "virtualize" the TLB interface. CPUs have historically instantiated the architecture specific version of the TLBs to avoid a virtual function call, making them a little bit more dependent on what the current ISA is. Some simple performance measurement, the x86 twolf regression on the atomic CPU, shows that there isn't actually any performance benefit, and if anything the simulator goes slightly faster (although still within margin of error) when the TLB functions are virtual. This change switches everything outside of the architectures themselves to use the generic BaseTLB type, and then inside the ISA for them to cast that to their architecture specific type to call into architecture specific interfaces. The ARM TLB needed the most adjustment since it was using non-standard translation function signatures. Specifically, they all took an extra "type" parameter which defaulted to normal, and translateTiming returned a Fault. translateTiming actually doesn't need to return a Fault because everywhere that consumed it just stored it into a structure which it then deleted(?), and the fault is stored in the Translation object when the translation is done. A little more work is needed to fully obviate the arch/tlb.hh header, so the TheISA::TLB type is still visible outside of the ISAs. Specifically, the TlbEntry type is used in the generic PageTable which lives in src/mem. Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575 Reviewed-on: https://gem5-review.googlesource.com/6921 Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
# 12392:e0dbdf30a2a5	13-Dec-2017	Jason Lowe-Power <jason@lowepower.com>	misc: Updates for gcc7.2 for x86 GCC 7.2 is much stricter than previous GCC versions. The following changes are needed: * There is now a warning if there is an implicit fallthrough between two case statments. C++17 adds the [[fallthrough]]; declaration. However, to support non C++17 standards (i.e., C++11), we use M5_FALLTHROUGH. M5_FALLTHROUGH checks for [[fallthrough]] compliant C++17 compiler and if that doesn't exist, it defaults to nothing (no older compilers generate warnings). * The above resulted in a couple of bugs that were found. This is noted in the review request on gerrit. * throw() for dynamic exception specification is deprecated * There were a couple of new uninitialized variable warnings * Can no longer perform bitwise operations on a bool. * Must now include <functional> for std::function * Compiler bug for void* lambda. Changed to auto as work around. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82878 Change-Id: I5d4c782a4e133fa4cdb119e35d9aff68c6e2958e Signed-off-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-on: https://gem5-review.googlesource.com/5802 Reviewed-by: Gabe Black <gabeblack@google.com>
# 12086:069c529a76fd	07-Jun-2017	Sean Wilson <spwilson2@wisc.edu>	arm: Replace EventWrapper use with EventFunctionWrapper Change-Id: I08de5f72513645d1fe92bde99fa205dde897e951 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3747 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
# 11938:9efd6816e06a	27-Feb-2017	Nikos Nikoleris <nikos.nikoleris@arm.com>	arm: Treat Write-Through Normal memory as Non-Cacheable A completed write to a memory location that is Write-Through Cacheable has to be visible to an external observer without the need of explicit cache maintenance. This change adds support for Write-Through Cacheable Normal memory and treats it as Non-cacheable. This incurs a small penalty as accesses to the memory do not fill in the cache but does not violate the properties of the memory type. Change-Id: Iee17ef9d952a550be9ad660b1e60e9f6c4ef2c2d Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2280 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
# 11588:32cbf6ab7730	02-Aug-2016	Curtis Dunham <Curtis.Dunham@arm.com>	arm: refactor page table walking Introduce and use a lookup table. Using fetchDescriptor() rather than DMA cleanly handles nested paging. Change-Id: I69ec762f176bd752ba1040890e731826b58d15a6
# 11583:13c5ba4250b3	02-Aug-2016	Dylan Johnson <Dylan.Johnson@ARM.com>	arm: Fix stage 2 memory attribute checking in AArch64 Change-Id: I14c93a5460550051a12129e792a9a9bd522a145c
# 11580:afe051c345e9	02-Aug-2016	Dylan Johnson <Dylan.Johnson@ARM.com>	arm: Fix stage 2 determination in table walker We recompute if we are doing a stage 2 walk inside of the table walker but we have already figured it out in the tlb. Pass the information in to the walk instead of recomputing it. Change-Id: I39637ce99309b2ddbc30344d45ac9ebf6a203401
# 11579:6b8a05582dc7	02-Aug-2016	Dylan Johnson <Dylan.Johnson@ARM.com>	arm: Refactor aarch64 table walk logic to remove redundancy The functional case is already handled within the fetchDescriptor() function. We can thus use that function for both atomic and functional mode when we start the table walk. Change-Id: Iacaed28cd9024d259fd37a58150efd00ff94d86e
# 11575:0005b28685f0	02-Aug-2016	Dylan Johnson <Dylan.Johnson@ARM.com>	arm: add stage2 translation support Change-Id: I8f7c09c7ec3a97149ebebf4b21471b244e6cecc1
# 11574:868c31fcca24	02-Aug-2016	Curtis Dunham <Curtis.Dunham@arm.com>	arm: enable EL2 support Change-Id: I59fa4fae98c33d9e5c2185382e1411911d27d341
# 11522:348411ec525a	06-Jun-2016	Stephan Diestelhorst <stephan.diestelhorst@arm.com>	sim: Call regStats of base-class as well We want to extend the stats of objects hierarchically and thus it is necessary to register the statistics of the base-class(es), as well. For now, these are empty, but generic stats will be added there. Patch originally provided by Akash Bagdia at ARM Ltd.
# 11517:54230f1ebef2	02-Jun-2016	Curtis Dunham <Curtis.Dunham@arm.com>	arm: refactor page table format determination In particular, when EL0 is in AArch32 but EL1 is AArch64, AArch64 memory translation must be used. This is essential for typical AArch64/32 interworking use cases.
# 11395:032bc62120eb	21-Mar-2016	Andreas Sandberg <andreas.sandberg@arm.com>	arm: Refactor the TLB test interface Refactor the TLB and page table walker test interface to use a dynamic registration mechanism. Instead of patching a couple of empty methods to wire up a TLB tester, this change allows such testers to register themselves using the setTestInterface() method.
# 11181:4daf60db14d7	29-Oct-2015	Nathanael Premillieu <nathananel.premillieu@arm.com>	arm: Add secure flag to TableWalker request when needed
# 10913:38dbdeea7f1f	07-Jul-2015	Andreas Sandberg <andreas.sandberg@arm.com>	sim: Refactor and simplify the drain API The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects.
# 10910:32f3d1c454ec	07-Jul-2015	Andreas Sandberg <andreas.sandberg@arm.com>	sim: Make the drain state a global typed enum The drain state enum is currently a part of the Drainable interface. The same state machine will be used by the DrainManager to identify the global state of the simulator. Make the drain state a global typed enum to better cater for this usage scenario.
# 10873:7c972b9aea16	21-Jun-2015	Andreas Sandberg <andreas.sandberg@arm.com>	arm: Cleanup arch headers to remove dma_device.hh dependency Break the dependency on dma_device.hh by forward-declaring DmaPort in the relevant header.
# 10836:9b424e7adac5	15-May-2015	Andreas Hansson <andreas.hansson@arm.com>	arm: Identify table-walker requests This patch ensures all page-table walks are flagged as such.
# 10717:4f8c1bd6fdb8	02-Mar-2015	Andreas Hansson <andreas.hansson@arm.com>	arm: Share a port for the two table walker objects This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent. By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests.
# 10621:b7bc5b1084a4	23-Dec-2014	Curtis Dunham <Curtis.Dunham@arm.com>	arm: Add stats to table walker This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion
# 10579:e622a3e2ed14	02-Dec-2014	Andrew Bardsley <Andrew.Bardsley@arm.com>	arm: Fix TLB ignoring faults when table walking This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which BaseTLB::Translation::finish.
# 10537:47fe87b0cf97	14-Nov-2014	Andreas Hansson <andreas.hansson@arm.com>	arm: Fixes based on UBSan and static analysis Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base. Most of the fixes are simply ensuring that proper intialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code.
# 10509:d5554f97c451	30-Oct-2014	Ali Saidi <Ali.Saidi@ARM.com>	arm, mem: Fix drain bug and provide drain prints for more components.
# 10474:799c8ee4ecba	16-Oct-2014	Andreas Hansson <andreas.hansson@arm.com>	arch: Use shared_ptr for all Faults This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".
# 10421:d469fdcd937e	01-Oct-2014	Andreas Hansson <andreas.hansson@arm.com>	arm: Use MiscRegIndex rather than int when flattening Some additional type checking to avoid future issues.
# 10367:bf52480abd01	12-Sep-2014	Andrew Bardsley <Andrew.Bardsley@arm.com>	style: Fix line continuation, especially in debug messages This patch closes a number of space gaps in debug messages caused by the incorrect use of line continuation within strings. (There's also one consistency change to a similar, but correct, use of line continuation)
# 10324:f40134eb3f85	27-May-2014	Curtis Dunham <Curtis.Dunham@arm.com>	arm: support 16kb vm granules
# 10109:b58c5c5854de	07-Mar-2014	Geoffrey Blake <Geoffrey.Blake@arm.com>	arm: Handle functional TLB walks properly The table walker code currently accounts for two types of walks, Atomic and Timing, and treats them differently. Atomic walks keep a single instance of WalkerState around for all walks to use in currState. Timing mode keeps a queue of in-flight WalkerStates and maintains currState as NULL between walks. If a functional walk is done during Timing mode, it is treated as an atomic walk and either creates a persistent WalkerState if in between Timing walks, or stomps an existing currState for an in-progress Timing walk. This patch distinguishes functional walks as being able to exist at any time and sets up a temporary WalkerState for its exclusive use and then cleans up when finished, leaving any in progress Atomic or Timing walks undisturbed.
# 10037:5cac77888310	24-Jan-2014	ARM gem5 Developers	arm: Add support for ARMv8 (AArch64 & AArch32) Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64 kernel you are restricted to AArch64 user-mode binaries. This will be addressed in a later patch. Note: Virtualization is only supported in AArch32 mode. This will also be fixed in a later patch. Contributors: Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation) Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation) Mbou Eyole (AArch64 NEON, validation) Ali Saidi (AArch64 Linux support, code integration, validation) Edmund Grimley-Evans (AArch64 FP) William Wang (AArch64 Linux support) Rene De Jong (AArch64 Linux support, performance opt.) Matt Horsnell (AArch64 MP, validation) Matt Evans (device models, code integration, validation) Chris Adeniyi-Jones (AArch64 syscall-emulation) Prakash Ramrakhyani (validation) Dam Sunwoo (validation) Chander Sudanthi (validation) Stephan Diestelhorst (validation) Andreas Hansson (code integration, performance opt.) Eric Van Hensbergen (performance opt.) Gabe Black
# 10024:fc10e1f9f124	24-Jan-2014	Dam Sunwoo <dam.sunwoo@arm.com>	mem: per-thread cache occupancy and per-block ages This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.
# 9535:508aebb47ca6	15-Feb-2013	Mrinmoy Ghosh <mrinmoy.ghosh@arm.com>	arm: fix a page table walker issue where a page could be translated multiple times If multiple memory operations to the same page are miss the TLB they are all inserted into the page table queue and before this change could result in multiple uncessesary walks as well as duplicate enteries being inserted into the TLB.
# 9524:d6ffa982a68b	15-Feb-2013	Andreas Sandberg <Andreas.Sandberg@ARM.com>	sim: Add a system-global option to bypass caches Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virutalized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches. To make memory mode tests cleaner, the following methods are added to the System class: * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'. * isTimingMode() -- True if the memory mode is 'timing'. * bypassCaches() -- True if caches should be bypassed. The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.
# 9438:ef92e4f00551	07-Jan-2013	Andreas Sandberg <Andreas.Sandberg@ARM.com>	arm: Fix draining of the pagetable walker when squashing Since the page table walker only checks if a drain has completed in doL1DescriptorWrapper() and doL2DescriptorWrapper(), it sometimes looses track of a drain request if there is a squash. This changeset adds a completeDrain() call after squashing requests in the pending queue, which fixes this issue.
# 9342:6fec8f26e56d	02-Nov-2012	Andreas Sandberg <Andreas.Sandberg@arm.com>	sim: Move the draining interface into a separate base class This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager.
# 9309:10cf9d9fe5ed	25-Oct-2012	Andreas Hansson <andreas.hansson@arm.com>	arm: Use table walker clock that is inherited from CPU This patch simplifies the scheduling of the next walk for the ARM table walker. Previously it used the CPU clock, but as the table walker inherits the clock from the CPU, it is cleaner to simply use its own clock (which is the same).
# 9294:8fb03b13de02	15-Oct-2012	Andreas Hansson <andreas.hansson@arm.com>	Port: Add protocol-agnostic ports in the port hierarchy This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocl-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations. The functions used in the binding of ports, i.e. getMaster/SlavePort now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default.
# 9258:baa17ba80e06	25-Sep-2012	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Squash outstanding walks when instructions are squashed.
# 9180:ee8d7a51651d	28-Aug-2012	Andreas Hansson <andreas.hansson@arm.com>	Clock: Add a Cycles wrapper class and use where applicable This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time. Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though. This changes done as part of this patch are mostly following directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles. In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words. An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes.
# 9179:666bc9df1e49	28-Aug-2012	Andreas Hansson <andreas.hansson@arm.com>	Clock: Rework clocks to avoid tick-to-cycle transformations This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future. In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch. Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch. As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well.
# 9165:f9e3dac185ba	22-Aug-2012	Andreas Hansson <andreas.hansson@arm.com>	Packet: Remove NACKs from packet and its use in endpoints This patch removes the NACK frrom the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that). The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occured in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe. Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up.
# 9152:86c0e6ca5e7c	15-Aug-2012	Anthony Gutierrez <atgutier@umich.edu>	O3,ARM: fix some problems with drain/switchout functionality and add Drain DPRINTFs This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements. This is an incremental fix as there are still a few bugs/mem leaks with the switchout code. Particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation.
# 9015:7f4d25789dc4	23-May-2012	Andreas Hansson <andreas.hansson@arm.com>	MEM: Add a snooping DMA port subclass for table walker This patch makes the (device) DmaPort non-snooping and removes the recvSnoop constructor parameter and instead introduces a SnoopingDmaPort subclass for the ARM table walker. Functionality is unchanged, as are the stats, and the patch merely clarifies that the normal DMA ports are not snooping (although they may issue requests that are snooped by others, as done with PCI, PCIe, AMBA4 ACE etc). Currently this port is declared in the ARM table walker as it is not used anywhere else. If other ports were to have similar behaviour it could be moved in a future patch.
# 8949:3fa1ee293096	14-Apr-2012	Andreas Hansson <andreas.hansson@arm.com>	MEM: Remove the Broadcast destination from the packet This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple master (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field. Modules that connect multiplexing components, such as caches and bridges store any existing source and destination field in the sender state as a stack (just as before). The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class. In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing.
# 8922:17f037ad8918	30-Mar-2012	William Wang <william.wang@arm.com>	MEM: Introduce the master/slave port sub-classes in C++ This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects. The patch enables us to classify behaviours into the two bins and add assumptions and enfore compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specilisations are to come in later patches. The getPort function is not getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal. The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.
# 8851:7e966326ef5b	24-Feb-2012	Andreas Hansson <andreas.hansson@arm.com>	MEM: Move port creation to the memory object(s) construction This patch moves all port creation from the getPort method to be consistently done in the MemObject's constructor. This is possible thanks to the Swig interface passing the length of the vector ports. Previously there was a mix of: 1) creating the ports as members (at object construction time) and using getPort for the name resolution, or 2) dynamically creating the ports in the getPort call. This is now uniform. Furthermore, objects that would not be complete without a port have these ports as members rather than having pointers to dynamically allocated ports. This patch also enables an elaboration-time enumeration of all the ports in the system which can be used to determine the masterId.
# 8832:247fee427324	12-Feb-2012	Ali Saidi <Ali.Saidi@ARM.com>	mem: Add a master ID to each request object. This change adds a master id to each request object which can be used identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python.
# 8733:64a7bf8fa56c	31-Jan-2012	Geoffrey Blake <geoffrey.blake@arm.com>	CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5 Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.
# 8630:05580a8506c7	01-Dec-2011	Mitchell Hayenga <Mitchell.Hayenga@ARM.com>	Device: Make changes necessary to support a coherent page walker cache. Adds the flag 'recvSnoops' which enables pagewalkers using DmaPorts, to properly configure snoops.
# 8510:846285f8c7be	19-Aug-2011	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Fix a memory leak with the table walker.
# 8245:a9d06c894afe	20-Apr-2011	Nathan Binkert <nate@binkert.org>	fix some build problems from prior changesets
# 8229:78bf55f23338	15-Apr-2011	Nathan Binkert <nate@binkert.org>	includes: sort all includes
# 8202:1b63e9afeafc	04-Apr-2011	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Fix table walk going on while ASID changes error
# 8067:21f14583aa6a	23-Feb-2011	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Fix bug that let two table walks occur in parallel.
# 7946:7c58c106d28d	11-Feb-2011	Giacomo Gabrielli <Giacomo.Gabrielli@arm.com>	O3: Fix a few bugs in the TableWalker object. Uncacheable requests were set as such only in atomic mode. currState->delayed is checked in place of currState->timing for resetting currState in atomic mode.
# 7823:dac01f14f20f	08-Jan-2011	Steve Reinhardt <steve.reinhardt@amd.com>	Replace curTick global variable with accessor functions. This step makes it easy to replace the accessor functions (which still access a global variable) with ones that access per-thread curTick values.
# 7781:a9f9eed35b18	07-Dec-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Support switchover with hardware table walkers
# 7748:7bf78d12b359	15-Nov-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Add support for switching CPUs
# 7733:08d6a773d1b6	08-Nov-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Add checkpointing support
# 7728:cf9db1c47a77	08-Nov-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Don't return the result of a table walk the same cycle it's completed. The L1 cache may have been accessed to provide this data, which confuses it, if it ends up being accesses twice in one cycle. Instead wait 1 tick which will force the timing simple CPU to forward to its next clock cycle when the translation completes. Also prevent multiple outstanding table walks from occuring at once.
# 7720:65d338a8dba4	31-Oct-2010	Gabe Black <gblack@eecs.umich.edu>	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
# 7694:de057cccee82	01-Oct-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Implement functional virtual to physical address translation for debugging and program introspection.
# 7653:968302e54850	25-Aug-2010	Gene WU <gene.wu@arm.com>	ARM: Seperate the queues of L1 and L2 walker states.
# 7611:c119da5a80c8	23-Aug-2010	Gene Wu <Gene.Wu@arm.com>	ARM: Make sure that software prefetch instructions can't change the state of the TLB
# 7608:17aabeaa1a8f	23-Aug-2010	Gene Wu <Gene.Wu@arm.com>	ARM: Fix Uncachable TLB requests and decoding of xn bit
# 7582:a24f26bf0fbe	23-Aug-2010	Ali Saidi <Ali.Saidi@arm.com>	ARM: Fix an un-initialized variable bug
# 7579:06fe5d901fe8	23-Aug-2010	Min Kyu Jeong <minkyu.jeong@arm.com>	ARM: Finish the timing translation when taking a fault.
# 7578:7ea651f34ae6	23-Aug-2010	Dam Sunwoo <dam.sunwoo@arm.com>	ARM: Use a stl queue for the table walker state
# 7576:4154f3e1edae	23-Aug-2010	Ali Saidi <Ali.Saidi@ARM.com>	Compiler: Fixes for GCC 4.5.
# 7439:b4c6b2532bbf	02-Jun-2010	Dam Sunwoo <dam.sunwoo@arm.com>	ARM: Allow multiple outstanding TLB walks to queue.
# 7438:8e4b37136330	02-Jun-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM TLB: Fix bug in memAttrs getting a bogus thread context
# 7437:5853fbdfba9b	02-Jun-2010	Dam Sunwoo <dam.sunwoo@arm.com>	ARM: Support table walks in timing mode.
# 7436:b578349f9371	02-Jun-2010	Dam Sunwoo <dam.sunwoo@arm.com>	ARM: Added support for Access Flag and some CP15 regs (V2PCWPR, V2PCWPW, V2PCWUR, V2PCWUW,...)
# 7406:ddc26bd4ea7d	02-Jun-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Some TLB bug fixes.
# 7404:bfc74724914e	02-Jun-2010	Ali Saidi <Ali.Saidi@ARM.com>	ARM: Implement the ARM TLB/Tablewalker. Needs performance improvements.