Searched hist:31 (Results 676 - 700 of 1011) sorted by relevance
/gem5/src/dev/x86/ | ||
H A D | i8259.cc | diff 5632:65132fd646c6 Sat Oct 11 04:31:00 EDT 2008 Gabe Black <gblack@eecs.umich.edu> X86: Hook the CMOS device to the I8259 PICs. |
/gem5/src/mem/ | ||
H A D | SConscript | diff 10614:da37aec3ed1a Tue Dec 23 09:31:00 EST 2014 Kanishk Sugand <kanishk.sugand@arm.com> mem: Add a stack distance calculator This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU Distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043. For each transaction a hashtable look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represets the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64>::max(). In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, Writebacks to the lower level (e.g. membus from L2), can be marked instead of being removed from the stack (isMarked flag of Node set to True). And then later if this same address is accessed (by L1), the value of the isMarked flag would be True. This gives some insight on how the Writeback policy of the lower level affect the read/write accesses in an application. Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time. diff 10612:6332c9d471a8 Tue Dec 23 09:31:00 EST 2014 Marco Elver <Marco.Elver@ARM.com> mem: Add MemChecker and MemCheckerMonitor This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions. diff 9036:6385cf85bf12 Thu May 31 13:30:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> Bus: Split the bus into a non-coherent and coherent bus This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses. A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses. A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect. The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses. A bit of minor tidying up has also been done. diff 5192:582e583f8e7e Wed Oct 31 01:21:00 EDT 2007 Ali Saidi <saidi@eecs.umich.edu> Traceflags: Add SCons function to created a traceflag instead of having one file with them all. |
H A D | dram_ctrl.cc | diff 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. diff 10620:74834c49fbbe Tue Dec 23 09:31:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> config: Expose the DRAM ranks as a command-line option This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels). The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding). diff 10619:6dd27a0e0d23 Tue Dec 23 09:31:00 EST 2014 Andreas Hansson <andreas.hansson@arm.com> mem: Ensure DRAM controller is idle when in atomic mode This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller forces the simulator to switch out of the KVM mode, thus killing performance. The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions. The switcheroo regression require a minor stats bump as a result. diff 10618:bb665366cc00 Tue Dec 23 09:31:00 EST 2014 Omar Naji <Omar.Naji@arm.com> mem: Add rank-wise refresh to the DRAM controller This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank. diff 10617:471d390943f0 Tue Dec 23 09:31:00 EST 2014 Omar Naji <Omar.Naji@arm.com> mem: Fix a bug in the DRAM controller arbitration Fix a minor issue that affects multi-rank systems. |
H A D | packet.hh | diff 13856:c4a7f25aacb4 Fri Feb 08 09:31:00 EST 2019 Daniel R. Carvalho <odanrc@yahoo.com.br> mem: Allow packet to provide its own addr range Add a getter to Packet to allow it to provide its own addr range. Change-Id: I2128ea3b71906502d10d9376b050a62407defd23 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17536 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> diff 11287:0d5bbeaeb8ca Thu Dec 31 09:34:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not rely on the NeedsWritable flag for responses This patch removes the NeedsWritable flag for all responses, as it is really only the request that needs a writable response. The response, on the other hand, should in these cases always provide the line in a writable state, as indicated by the hasSharers flag not being set. When we send requests that has NeedsWritable set, the response will always have the hasSharers flag not set. Additionally, there are cases where the request did not have NeedsWritable set, and we still get a writable response with the hasSharers flag not set. This never happens on snoops, but is used by downstream caches to pass ownership upstream. As part of this patch, the affected response types are updated, and the snoop filter is similarly modified to check only the hasSharers flag (as it should). A sanity check is also added to the packet class, asserting that we never look at the NeedsWritable flag for responses. No regressions are affected. diff 11286:2071db8f864b Thu Dec 31 09:33:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not allocate space for packet data if not needed This patch looks at the request and response command to determine if either actually has any data payload, and if not, we do not allocate any space for packet data. The only tricky case is where the command type is changed as part of the MSHR functionality. In these cases where the original packet had no data, but the new packet does, we need to explicitly call allocate(). diff 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. diff 9951:2f5eec8c1010 Thu Oct 31 14:41:00 EDT 2013 Stephan Diestelhorst <stephan.diestelhorst@arm.com> mem: Add "const" attribute to Packet getters Add a "const" keywords to the getters in the Packet class so these can be invoked on const Packet objects. diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. diff 7006:77c9c4d5007d Fri Mar 12 20:31:00 EST 2010 Nathan Binkert <nate@binkert.org> packet: add a method to set the size diff 4024:9eada81a030b Wed Feb 07 01:31:00 EST 2007 Nathan Binkert <binkertn@umich.edu> Include compiler.hh since we use some of the #defines diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info diff 2663:c82193ae8467 Wed May 31 00:12:00 EDT 2006 Steve Reinhardt <stever@eecs.umich.edu> Streamline interface to Request object. src/SConscript: mem/request.cc no longer needed (all functions inline). src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/timing.cc: src/dev/io_device.cc: src/mem/port.cc: Modified Request object interface. src/mem/packet.hh: Modified Request object interface. Address & size are always set together now, so track with single flag. src/mem/request.hh: Streamline interface to support a handful of calls that set multiple fields reflecting common usage patterns. Reduce number of validFoo booleans by combining flags for fields which must be set together. |
/gem5/src/arch/generic/ | ||
H A D | memhelpers.hh | diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. |
/gem5/src/arch/x86/isa/ | ||
H A D | operands.isa | diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 5659:f4b9c344d1ca Sun Oct 12 18:31:00 EDT 2008 Gabe Black <gblack@eecs.umich.edu> X86: Implement CPUID with a magical function instead of microcode. diff 5246:21f29e99e021 Tue Nov 13 04:31:00 EST 2007 Gabe Black <gblack@eecs.umich.edu> X86: Make microcode use presegmentation RIPs and the rest of m5 use post segmentation RIPS. |
/gem5/tests/configs/ | ||
H A D | o3-timing.py | diff 9489:172dbcb74a0e Thu Jan 31 07:49:00 EST 2013 Andreas Hansson <andreas.hansson@arm.com> mem: Add DDR3 and LPDDR2 DRAM controller configurations This patch moves the default DRAM parameters from the SimpleDRAM class to two different subclasses, one for DDR3 and one for LPDDR2. More can be added as we go forward. The regressions that previously used the SimpleDRAM are now using SimpleDDR3 as this is the most similar configuration. diff 9036:6385cf85bf12 Thu May 31 13:30:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> Bus: Split the bus into a non-coherent and coherent bus This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses. A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses. A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect. The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses. A bit of minor tidying up has also been done. diff 3402:db60546818d0 Tue Oct 31 14:33:00 EST 2006 Kevin Lim <ktlim@umich.edu> Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a paramter for the MemObject. configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more. src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter). src/cpu/checker/cpu.hh: Remove MemObject parameter. src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner. src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter. src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter. |
/gem5/src/arch/sparc/ | ||
H A D | SConscript | diff 8778:fbaf6af0be93 Mon Oct 31 05:58:00 EDT 2011 Gabe Black <gblack@eecs.umich.edu> SE/FS: Remove the last uses of FULL_SYSTEM from SPARC. diff 5192:582e583f8e7e Wed Oct 31 01:21:00 EDT 2007 Ali Saidi <saidi@eecs.umich.edu> Traceflags: Add SCons function to created a traceflag instead of having one file with them all. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
H A D | faults.cc | diff 8778:fbaf6af0be93 Mon Oct 31 05:58:00 EDT 2011 Gabe Black <gblack@eecs.umich.edu> SE/FS: Remove the last uses of FULL_SYSTEM from SPARC. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 3972:2c65c89843c5 Tue Jan 23 01:31:00 EST 2007 Gabe Black <gblack@eecs.umich.edu> Merge zizzer.eecs.umich.edu:/bk/newmem into ewok.(none):/home/gblack/m5/newmemo3 src/sim/byteswap.hh: Hand Merge diff 3455:fdc8b63937ca Tue Oct 31 03:44:00 EST 2006 Gabe Black <gblack@eecs.umich.edu> Get rid of old, commented out code. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/util/m5/ | ||
H A D | m5.c | diff 9949:7890c22dad25 Thu Oct 31 14:41:00 EDT 2013 Ali Saidi <Ali.Saidi@ARM.com> arm: fix m5ops binary for ARM and add m5fail. Changes to make m5ops work under virtualization seemed to break them working with non-virtualized systems and the recently added m5 fail command makes the m5op binary not compile. For now remove the code for virtualization. diff 8734:79592b2b1d55 Tue Jan 31 10:46:00 EST 2012 Dam Sunwoo <dam.sunwoo@arm.com> util: implements "writefile" gem5 op to export file from guest to host filesystem Usage: m5 writefile <filename> File will be created in the gem5 output folder with the identical filename. Implementation is largely based on the existing "readfile" functionality. Currently does not support exporting of folders. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/src/arch/x86/ | ||
H A D | types.hh | diff 12045:31d9a81ba286 Wed May 24 06:09:00 EDT 2017 Gabe Black <gabeblack@google.com> x86: Rework how VEX prefixes are decoded. Remove redundant information from the ExtMachInst, hash the vex information to ensure the decode cache works properly, print the vex info when printing an ExtMachInst, consider the vex info when comparing two ExtMachInsts, fold the info from the vex prefixes into existing settings, remove redundant decode code, handle vex prefixes one byte at a time and don't bother building up the entire prefix, and let instructions that care about vex use it in their implementation, instead of developing an entire parallel decode tree. This also eliminates the error prone vex immediate decode table which was incomplete and would result in an out of bounds access for incorrectly encoded instructions or when the CPU was mispeculating, as it was (as far as I can tell) redundant with the tables that already existed for two and three byte opcodes. There were differences, but I think those may have been mistakes based on the documentation I found. Also, in 32 bit mode, the VEX prefixes might actually be LDS or LES instructions which are still legal in that mode. A valid VEX prefix would look like an LDS/LES with an otherwise invalid modrm encoding, so use that as a signal to abort processing the VEX and turn the instruction into an LES/LDS as appropriate. Change-Id: Icb367eaaa35590692df1c98862f315da4c139f5c Reviewed-on: https://gem5-review.googlesource.com/3501 Reviewed-by: Joe Gross <joe.gross@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> diff 10924:d02e9c239892 Fri Jul 17 12:31:00 EDT 2015 Nilay Vaish <nilay@cs.wisc.edu> x86: decode instructions with vex prefix This patch updates the x86 decoder so that it can decode instructions with vex prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3. Note that none of the instructions have been implemented yet. The implementations would be provided in due course of time. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. |
/gem5/src/sim/ | ||
H A D | main.cc | diff 5202:ff56fa8c2091 Wed Oct 31 21:04:00 EDT 2007 Steve Reinhardt <stever@gmail.com> String constant const-ness changes to placate g++ 4.2. Also some bug fixes in MIPS ISA uncovered by g++ warnings (Python string compares don't work in C++!). diff 4081:80f1e833d118 Sun Feb 18 12:31:00 EST 2007 Nathan Binkert <binkertn@umich.edu> Get rid of the stand alone ParamContext since all of the relevant stuff has now been moved to python. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
H A D | sim_object.hh | diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. diff 3451:185232d74b31 Tue Oct 31 13:23:00 EST 2006 Ali Saidi <saidi@eecs.umich.edu> remove connectAll() and connect() code since it isn't used anymore. (The python does it all) diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
H A D | sim_object.cc | diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. diff 3451:185232d74b31 Tue Oct 31 13:23:00 EST 2006 Ali Saidi <saidi@eecs.umich.edu> remove connectAll() and connect() code since it isn't used anymore. (The python does it all) diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
H A D | syscall_emul.cc | diff 12796:16dffc0e6c7f Thu Jun 21 19:31:00 EDT 2018 Matt Sinclair <mattdsinclair@gmail.com> syscall_emul: adding symlink system call Change-Id: Iebda05c130b4d2ee8434cad1e703933bfda486c8 Reviewed-on: https://gem5-review.googlesource.com/11490 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> diff 9455:31afddc29cd4 Tue Jan 08 08:54:00 EST 2013 Mitch Hayenga <mitch.hayenga+gem5@gmail.com> arm: add access syscall for ARM SE mode This patch adds the "access" syscall for ARM SE as required by some spec2006 benchmarks. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 6703:2707e7b63f53 Fri Oct 30 00:31:00 EDT 2009 Vince Weaver <vince@csl.cornell.edu> SysCalls: Implement truncate64 system call This uses the new stack-based argument infrastructure. Tested on x86 and x86_64. diff 6702:21f032e2ee9b Sat Oct 31 16:20:00 EDT 2009 Gabe Black <gblack@eecs.umich.edu> Syscalls: Fix a warning turned error about an unused variable in m5.fast. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/src/dev/ | ||
H A D | dma_device.cc | diff 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. diff 10621:b7bc5b1084a4 Tue Dec 23 09:31:00 EST 2014 Curtis Dunham <Curtis.Dunham@arm.com> arm: Add stats to table walker This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion diff 9814:7ad2b0186a32 Thu Jul 18 08:31:00 EDT 2013 Andreas Hansson <andreas.hansson@arm.com> mem: Set the cache line size on a system level This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly. |
H A D | io_device.cc | diff 3401:1df0cb879413 Tue Oct 31 13:59:00 EST 2006 Kevin Lim <ktlim@umich.edu> Ports now have a pointer to the MemObject that owns it (can be NULL). src/cpu/simple/atomic.hh: Port now takes in the MemObject that owns it. src/cpu/simple/timing.hh: Port now takes in MemObject that owns it. src/dev/io_device.cc: src/mem/bus.hh: Ports now take in the MemObject that owns it. src/mem/cache/base_cache.cc: Ports now take in the MemObject that own it. src/mem/port.hh: src/mem/tport.hh: Ports now optionally take in the MemObject that owns it. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info diff 2664:d341e9b3506a Wed May 31 13:47:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Merge zizzer:/bk/newmem into zeep.pool:/z/saidi/work/m5.newmem diff 2663:c82193ae8467 Wed May 31 00:12:00 EDT 2006 Steve Reinhardt <stever@eecs.umich.edu> Streamline interface to Request object. src/SConscript: mem/request.cc no longer needed (all functions inline). src/cpu/simple/atomic.cc: src/cpu/simple/base.cc: src/cpu/simple/timing.cc: src/dev/io_device.cc: src/mem/port.cc: Modified Request object interface. src/mem/packet.hh: Modified Request object interface. Address & size are always set together now, so track with single flag. src/mem/request.hh: Streamline interface to support a handful of calls that set multiple fields reflecting common usage patterns. Reduce number of validFoo booleans by combining flags for fields which must be set together. diff 2659:e10ebef009c1 Wed May 31 13:45:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> make io device be a little nicer about scheduling retries on the bus |
/gem5/src/mem/cache/tags/ | ||
H A D | base.hh | diff 13418:08101e89101e Thu Oct 18 09:31:00 EDT 2018 Daniel R. Carvalho <odanrc@yahoo.com.br> mem-cache: Move access latency calculation to Cache Access latency was not being calculated properly, as it was always assuming that for hits reads take as long as writes, and that parallel accesses would produce the same latency for read and write misses. By moving the calculation to the Cache we can use the write/ read information, reduce latency variables duplication and remove Cache dependency from Tags. The tag lookup latency is still calculated by the Tags. Change-Id: I71bc68fb5c3515b372c3bf002d61b6f048a45540 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/13697 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> diff 12566:d6d48df9bf0f Mon Oct 31 09:33:00 EDT 2016 Nikos Nikoleris <nikos.nikoleris@arm.com> mem-cache: Make invalidate a common function between tag classes invalidate was defined as a separate function in the base associative and fully-associative tags classes although both functions should implement identical functionality. This patch moves the invalidate function in the base tags class. Change-Id: I206ee969b00ab9e05873c6d87531474fcd712907 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/8286 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> diff 12553:514f2e4fb751 Mon Oct 31 11:52:00 EDT 2016 Nikos Nikoleris <nikos.nikoleris@arm.com> mem-cache: Remove mumBlock redundant initialiation from FALRU Change-Id: Id3afec0a62446d6d0f44ccb655032343037637e0 Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/8281 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> |
/gem5/src/mem/ruby/system/ | ||
H A D | DMASequencer.cc | diff 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. diff 10231:cb2e6950956d Sat May 31 21:00:00 EDT 2014 Steve Reinhardt <steve.reinhardt@amd.com> style: eliminate equality tests with true and false Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code. diff 8645:89929730804b Sat Dec 31 19:44:00 EST 2011 Nilay Vaish<nilay@cs.wisc.edu> Ruby: Shuffle some of the included files This patch adds and removes included files from some of the files so as to organize remove some false dependencies and include some files directly instead of transitively. |
/gem5/src/arch/alpha/ | ||
H A D | ev5.cc | diff 8607:5fb918115c07 Mon Oct 31 04:09:00 EDT 2011 Gabe Black <gblack@eecs.umich.edu> GCC: Get everything working with gcc 4.6.1. And by "everything" I mean all the quick regressions. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 3459:dd091092c8bb Tue Oct 31 16:36:00 EST 2006 Gabe Black <gblack@eecs.umich.edu> Made the old name refer to the miscreg index to prevent having to change code all over the place. diff 3457:7479ebe49444 Tue Oct 31 16:02:00 EST 2006 Gabe Black <gblack@eecs.umich.edu> Make the IPRs use regular miscreg indexes, and make a table or two to find the miscreg index of a specific IPR. diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/src/cpu/simple/ | ||
H A D | base.cc | diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. diff 8733:64a7bf8fa56c Tue Jan 31 10:46:00 EST 2012 Geoffrey Blake <geoffrey.blake@arm.com> CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5 Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 4501:b5f473594687 Thu May 31 16:45:00 EDT 2007 Gabe Black <gblack@eecs.umich.edu> Merge zizzer.eecs.umich.edu:/bk/newmem into ahchoo.blinky.homelinux.org:/home/gblack/m5/newmem-x86 src/cpu/simple/base.cc: Hand merge diff 4495:dbd2943590e6 Thu May 31 16:01:00 EDT 2007 Vincentius Robby <acolyte@umich.edu> Assign traceData to be NULL at BaseSimpleCPU constructor. Initialize a temporary variable for thread->readPC() at setupFetchRequest() to reduce function calls. exec tracing isn't needed for m5.fast binaries Moved MISCREG_GL, MISCREG_CWP, and MISCREG_TLB_DATA out of switch statement and use if blocks instead. src/arch/sparc/miscregfile.cc: Moved MISCREG_GL, MISCREG_CWP, and MISCREG_TLB_DATA out of switch statement and use if blocks instead. src/cpu/simple/base.cc: Assign traceData to be NULL at BaseSimpleCPU constructor. Initialize a temporary variable for thread->readPC() at setupFetchRequest() to reduce function calls. exec tracing isn't needed for m5.fast binaries diff 4400:619191b2f011 Mon Apr 16 11:31:00 EDT 2007 Ron Dreslinski <rdreslin@umich.edu> Fixes for splash, may conflict with Korey's SMT work and doesn't support 03cpu yet. src/cpu/simple/base.cc: Cpu's should start as unallocated, not suspended src/cpu/simple_thread.cc: Wait for a thread to be assigned to activate the cpu src/kern/tru64/tru64.hh: When looking for a open cpu to assign threads, look for an unallocated one, not a suspended one. diff 3476:0e26b5458236 Tue Oct 31 14:37:00 EST 2006 Kevin Lim <ktlim@umich.edu> Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix configs/example/fs.py: configs/example/se.py: src/mem/tport.hh: Hand merge. diff 3402:db60546818d0 Tue Oct 31 14:33:00 EST 2006 Kevin Lim <ktlim@umich.edu> Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a paramter for the MemObject. configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more. src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter). src/cpu/checker/cpu.hh: Remove MemObject parameter. src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner. src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter. src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter. diff 3093:b09c33e66bce Thu Aug 31 20:51:00 EDT 2006 Korey Sewell <ksewell@umich.edu> add ISA_HAS_DELAY_SLOT directive instead of "#if THE_ISA == ALPHA_ISA" throughout CPU models src/arch/alpha/isa_traits.hh: src/arch/mips/isa_traits.hh: src/arch/sparc/isa_traits.hh: define 'ISA_HAS_DELAY_SLOT' src/cpu/base_dyn_inst.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/simple/base.cc: use ISA_HAS_DELAY_SLOT instead of THE_ISA == ALPHA_ISA diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/src/ | ||
H A D | SConscript | diff 9227:c208c904ab13 Fri Sep 14 00:13:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> gcc: Enable Link-Time Optimization for gcc >= 4.6 This patch adds Link-Time Optimization when building the fast target using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No check is performed to guarantee that the linker supports LTO and use of the linker plugin, so the user has to ensure that binutils GNU ld >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is available, the latter should not be a problem. Currently the LTO option is only useful for gcc >= 4.6, due to the limited support on clang and earlier versions of gcc. The intention is to also add support for clang once the LTO integration matures. The same number of jobs is used for the parallel phase of LTO as the jobs specified on the scons command line, using the -flto=n flag that was introduced with gcc 4.6. The gold linker also supports concurrent and incremental linking, but this is not used at this point. The compilation and linking time is increased by almost 50% on average, although ARM seems to be particularly demanding with an increase of almost 100%. Also beware when using this as gcc uses a tremendous amount of memory and temp space in the process. You have been warned. After some careful consideration, and plenty discussions, the flag is only added to the fast target, and the warning that was issued in an earlier version of this patch is now removed. Similarly, the flag used to enable LTO, now the default is to use it, and the flag has been modified to disable LTO. The rationale behind this decision is that opt is used for development, whereas fast is only used for long runs, e.g. regressions or more elaborate experiments where the additional compile and link time is amortized by a much larger run time. When it comes to the return on investment, the regression seems to be roughly 15% faster with LTO. For a bit more detail, I ran twolf on ARM.fast, with three repeated runs, and they all finish within 42 minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds) with LTO, i.e. LTO gives an impressive >25% speed-up for this case. Without LTO (ARM.fast twolf) real 42m37.632s user 42m34.448s sys 0m0.390s real 41m51.793s user 41m50.384s sys 0m0.131s real 41m45.491s user 41m39.791s sys 0m0.139s With LTO (ARM.fast twolf) real 30m33.588s user 30m5.701s sys 0m0.141s real 31m27.791s user 31m24.674s sys 0m0.111s real 31m25.500s user 31m16.731s sys 0m0.106s diff 9227:c208c904ab13 Fri Sep 14 00:13:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> gcc: Enable Link-Time Optimization for gcc >= 4.6 This patch adds Link-Time Optimization when building the fast target using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No check is performed to guarantee that the linker supports LTO and use of the linker plugin, so the user has to ensure that binutils GNU ld >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is available, the latter should not be a problem. Currently the LTO option is only useful for gcc >= 4.6, due to the limited support on clang and earlier versions of gcc. The intention is to also add support for clang once the LTO integration matures. The same number of jobs is used for the parallel phase of LTO as the jobs specified on the scons command line, using the -flto=n flag that was introduced with gcc 4.6. The gold linker also supports concurrent and incremental linking, but this is not used at this point. The compilation and linking time is increased by almost 50% on average, although ARM seems to be particularly demanding with an increase of almost 100%. Also beware when using this as gcc uses a tremendous amount of memory and temp space in the process. You have been warned. After some careful consideration, and plenty discussions, the flag is only added to the fast target, and the warning that was issued in an earlier version of this patch is now removed. Similarly, the flag used to enable LTO, now the default is to use it, and the flag has been modified to disable LTO. The rationale behind this decision is that opt is used for development, whereas fast is only used for long runs, e.g. regressions or more elaborate experiments where the additional compile and link time is amortized by a much larger run time. When it comes to the return on investment, the regression seems to be roughly 15% faster with LTO. For a bit more detail, I ran twolf on ARM.fast, with three repeated runs, and they all finish within 42 minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds) with LTO, i.e. LTO gives an impressive >25% speed-up for this case. Without LTO (ARM.fast twolf) real 42m37.632s user 42m34.448s sys 0m0.390s real 41m51.793s user 41m50.384s sys 0m0.131s real 41m45.491s user 41m39.791s sys 0m0.139s With LTO (ARM.fast twolf) real 30m33.588s user 30m5.701s sys 0m0.141s real 31m27.791s user 31m24.674s sys 0m0.111s real 31m25.500s user 31m16.731s sys 0m0.106s diff 9227:c208c904ab13 Fri Sep 14 00:13:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> gcc: Enable Link-Time Optimization for gcc >= 4.6 This patch adds Link-Time Optimization when building the fast target using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No check is performed to guarantee that the linker supports LTO and use of the linker plugin, so the user has to ensure that binutils GNU ld >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is available, the latter should not be a problem. Currently the LTO option is only useful for gcc >= 4.6, due to the limited support on clang and earlier versions of gcc. The intention is to also add support for clang once the LTO integration matures. The same number of jobs is used for the parallel phase of LTO as the jobs specified on the scons command line, using the -flto=n flag that was introduced with gcc 4.6. The gold linker also supports concurrent and incremental linking, but this is not used at this point. The compilation and linking time is increased by almost 50% on average, although ARM seems to be particularly demanding with an increase of almost 100%. Also beware when using this as gcc uses a tremendous amount of memory and temp space in the process. You have been warned. After some careful consideration, and plenty discussions, the flag is only added to the fast target, and the warning that was issued in an earlier version of this patch is now removed. Similarly, the flag used to enable LTO, now the default is to use it, and the flag has been modified to disable LTO. The rationale behind this decision is that opt is used for development, whereas fast is only used for long runs, e.g. regressions or more elaborate experiments where the additional compile and link time is amortized by a much larger run time. When it comes to the return on investment, the regression seems to be roughly 15% faster with LTO. For a bit more detail, I ran twolf on ARM.fast, with three repeated runs, and they all finish within 42 minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds) with LTO, i.e. LTO gives an impressive >25% speed-up for this case. Without LTO (ARM.fast twolf) real 42m37.632s user 42m34.448s sys 0m0.390s real 41m51.793s user 41m50.384s sys 0m0.131s real 41m45.491s user 41m39.791s sys 0m0.139s With LTO (ARM.fast twolf) real 30m33.588s user 30m5.701s sys 0m0.141s real 31m27.791s user 31m24.674s sys 0m0.111s real 31m25.500s user 31m16.731s sys 0m0.106s diff 9227:c208c904ab13 Fri Sep 14 00:13:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> gcc: Enable Link-Time Optimization for gcc >= 4.6 This patch adds Link-Time Optimization when building the fast target using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No check is performed to guarantee that the linker supports LTO and use of the linker plugin, so the user has to ensure that binutils GNU ld >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is available, the latter should not be a problem. Currently the LTO option is only useful for gcc >= 4.6, due to the limited support on clang and earlier versions of gcc. The intention is to also add support for clang once the LTO integration matures. The same number of jobs is used for the parallel phase of LTO as the jobs specified on the scons command line, using the -flto=n flag that was introduced with gcc 4.6. The gold linker also supports concurrent and incremental linking, but this is not used at this point. The compilation and linking time is increased by almost 50% on average, although ARM seems to be particularly demanding with an increase of almost 100%. Also beware when using this as gcc uses a tremendous amount of memory and temp space in the process. You have been warned. After some careful consideration, and plenty discussions, the flag is only added to the fast target, and the warning that was issued in an earlier version of this patch is now removed. Similarly, the flag used to enable LTO, now the default is to use it, and the flag has been modified to disable LTO. The rationale behind this decision is that opt is used for development, whereas fast is only used for long runs, e.g. regressions or more elaborate experiments where the additional compile and link time is amortized by a much larger run time. When it comes to the return on investment, the regression seems to be roughly 15% faster with LTO. For a bit more detail, I ran twolf on ARM.fast, with three repeated runs, and they all finish within 42 minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds) with LTO, i.e. LTO gives an impressive >25% speed-up for this case. Without LTO (ARM.fast twolf) real 42m37.632s user 42m34.448s sys 0m0.390s real 41m51.793s user 41m50.384s sys 0m0.131s real 41m45.491s user 41m39.791s sys 0m0.139s With LTO (ARM.fast twolf) real 30m33.588s user 30m5.701s sys 0m0.141s real 31m27.791s user 31m24.674s sys 0m0.111s real 31m25.500s user 31m16.731s sys 0m0.106s diff 9227:c208c904ab13 Fri Sep 14 00:13:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> gcc: Enable Link-Time Optimization for gcc >= 4.6 This patch adds Link-Time Optimization when building the fast target using gcc >= 4.6, and adds a scons flag to disable it (-no-lto). No check is performed to guarantee that the linker supports LTO and use of the linker plugin, so the user has to ensure that binutils GNU ld >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is available, the latter should not be a problem. Currently the LTO option is only useful for gcc >= 4.6, due to the limited support on clang and earlier versions of gcc. The intention is to also add support for clang once the LTO integration matures. The same number of jobs is used for the parallel phase of LTO as the jobs specified on the scons command line, using the -flto=n flag that was introduced with gcc 4.6. The gold linker also supports concurrent and incremental linking, but this is not used at this point. The compilation and linking time is increased by almost 50% on average, although ARM seems to be particularly demanding with an increase of almost 100%. Also beware when using this as gcc uses a tremendous amount of memory and temp space in the process. You have been warned. After some careful consideration, and plenty discussions, the flag is only added to the fast target, and the warning that was issued in an earlier version of this patch is now removed. Similarly, the flag used to enable LTO, now the default is to use it, and the flag has been modified to disable LTO. The rationale behind this decision is that opt is used for development, whereas fast is only used for long runs, e.g. regressions or more elaborate experiments where the additional compile and link time is amortized by a much larger run time. When it comes to the return on investment, the regression seems to be roughly 15% faster with LTO. For a bit more detail, I ran twolf on ARM.fast, with three repeated runs, and they all finish within 42 minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds) with LTO, i.e. LTO gives an impressive >25% speed-up for this case. Without LTO (ARM.fast twolf) real 42m37.632s user 42m34.448s sys 0m0.390s real 41m51.793s user 41m50.384s sys 0m0.131s real 41m45.491s user 41m39.791s sys 0m0.139s With LTO (ARM.fast twolf) real 30m33.588s user 30m5.701s sys 0m0.141s real 31m27.791s user 31m24.674s sys 0m0.111s real 31m25.500s user 31m16.731s sys 0m0.106s diff 8737:770ccf3af571 Tue Jan 31 00:05:00 EST 2012 Koan-Sin Tan <koansin.tan@gmail.com> clang: Enable compiling gem5 using clang 2.9 and 3.0 This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places. diff 8607:5fb918115c07 Mon Oct 31 04:09:00 EDT 2011 Gabe Black <gblack@eecs.umich.edu> GCC: Get everything working with gcc 4.6.1. And by "everything" I mean all the quick regressions. diff 5517:3ad997252dd2 Thu Jul 31 11:01:00 EDT 2008 Nathan Binkert <nate@binkert.org> scons: Get rid of generate.py in the build system. I decided that separating some of the scons code into generate.py was just a bad idea because it caused the dependency system to get all messed up. If separation is the right way to go in the future, we should probably use the sconscript mechanism, not the mechanism that I just removed. diff 5192:582e583f8e7e Wed Oct 31 01:21:00 EDT 2007 Ali Saidi <saidi@eecs.umich.edu> Traceflags: Add SCons function to created a traceflag instead of having one file with them all. diff 4007:8c3bfad8bb92 Wed Jan 31 18:32:00 EST 2007 Ali Saidi <saidi@eecs.umich.edu> make sparc fs less chatty src/SConscript: strip doesn't take a src and dest in solaris |
/gem5/configs/example/ | ||
H A D | se.py | diff 9815:3b3b94536547 Thu Jul 18 08:31:00 EDT 2013 Andreas Hansson <andreas.hansson> config: Update script to set cache line size on system This patch changes the config scripts such that they do not set the cache line size per cache instance, but rather for the system as a whole. diff 9036:6385cf85bf12 Thu May 31 13:30:00 EDT 2012 Andreas Hansson <andreas.hansson@arm.com> Bus: Split the bus into a non-coherent and coherent bus This patch introduces a class hierarchy of buses, a non-coherent one, and a coherent one, splitting the existing bus functionality. By doing so it also enables further specialisation of the two types of buses. A non-coherent bus connects a number of non-snooping masters and slaves, and routes the request and response packets based on the address. The request packets issued by the master connected to a non-coherent bus could still snoop in caches attached to a coherent bus, as is the case with the I/O bus and memory bus in most system configurations. No snoops will, however, reach any master on the non-coherent bus itself. The non-coherent bus can be used as a template for modelling PCI, PCIe, and non-coherent AMBA and OCP buses, and is typically used for the I/O buses. A coherent bus connects a number of (potentially) snooping masters and slaves, and routes the request and response packets based on the address, and also forwards all requests to the snoopers and deals with the snoop responses. The coherent bus can be used as a template for modelling QPI, HyperTransport, ACE and coherent OCP buses, and is typically used for the L1-to-L2 buses and as the main system interconnect. The configuration scripts are updated to use a NoncoherentBus for all peripheral and I/O buses. A bit of minor tidying up has also been done. diff 8808:8af87554ad7e Tue Jan 31 00:07:00 EST 2012 Gabe Black <gblack@eecs.umich.edu> Merge with main repository. diff 3477:eaf445891a4e Tue Oct 31 14:58:00 EST 2006 Kevin Lim <ktlim@umich.edu> Fix up configs. configs/common/Simulation.py: Remove mem parameter. configs/example/se.py: Remove debug output that got included in my other push. diff 3476:0e26b5458236 Tue Oct 31 14:37:00 EST 2006 Kevin Lim <ktlim@umich.edu> Merge ktlim@zizzer:/bk/newmem into zamp.eecs.umich.edu:/z/ktlim2/clean/newmem-busfix configs/example/fs.py: configs/example/se.py: src/mem/tport.hh: Hand merge. diff 3402:db60546818d0 Tue Oct 31 14:33:00 EST 2006 Kevin Lim <ktlim@umich.edu> Remove mem parameter. Now the translating port asks the CPU's dcache's peer for its MemObject instead of having to have a paramter for the MemObject. configs/example/fs.py: configs/example/se.py: src/cpu/simple/base.cc: src/cpu/simple/base.hh: src/cpu/simple/timing.cc: src/cpu/simple_thread.cc: src/cpu/simple_thread.hh: src/cpu/thread_state.cc: src/cpu/thread_state.hh: tests/configs/o3-timing-mp.py: tests/configs/o3-timing.py: tests/configs/simple-atomic-mp.py: tests/configs/simple-atomic.py: tests/configs/simple-timing-mp.py: tests/configs/simple-timing.py: tests/configs/tsunami-simple-atomic-dual.py: tests/configs/tsunami-simple-atomic.py: tests/configs/tsunami-simple-timing-dual.py: tests/configs/tsunami-simple-timing.py: No need for mem parameter any more. src/cpu/checker/cpu.cc: Use new constructor for simple thread (no more MemObject parameter). src/cpu/checker/cpu.hh: Remove MemObject parameter. src/cpu/memtest/memtest.hh: Ports now take in their MemObject owner. src/cpu/o3/alpha/cpu_builder.cc: Remove mem parameter. src/cpu/o3/alpha/cpu_impl.hh: Remove memory parameter and clean up handling of TranslatingPort. src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/fetch.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/mips/cpu_builder.cc: src/cpu/o3/mips/cpu_impl.hh: src/cpu/o3/params.hh: src/cpu/o3/thread_state.hh: src/cpu/ozone/cpu.hh: src/cpu/ozone/cpu_builder.cc: src/cpu/ozone/cpu_impl.hh: src/cpu/ozone/front_end.hh: src/cpu/ozone/front_end_impl.hh: src/cpu/ozone/lw_lsq.hh: src/cpu/ozone/lw_lsq_impl.hh: src/cpu/ozone/simple_params.hh: src/cpu/ozone/thread_state.hh: src/cpu/simple/atomic.cc: Remove memory parameter. |
/gem5/src/cpu/o3/ | ||
H A D | rename_impl.hh | diff 12144:3f2976f87529 Tue Jul 18 11:31:00 EDT 2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> cpu: Add missing rename of vector registers in the O3 CPU The introduction of a new vector register class broke rename in the O3 CPU due to an unhandled register class in DefaultRename<Impl>::renameSrcRegs(). This patch fixes adds the necessary handling to avoid a panic when the vector register file is used. Change-Id: Ie380ab35ec4a151db15402f25b25b58931ee0581 Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/4140 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> diff 8607:5fb918115c07 Mon Oct 31 04:09:00 EDT 2011 Gabe Black <gblack@eecs.umich.edu> GCC: Get everything working with gcc 4.6.1. And by "everything" I mean all the quick regressions. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 4352:52f11aaf7d19 Sun Apr 08 19:31:00 EDT 2007 Gabe Black <gblack@eecs.umich.edu> Take into account that the flattened integer register space is a different size than the architected one. Also fixed some asserts. diff 3093:b09c33e66bce Thu Aug 31 20:51:00 EDT 2006 Korey Sewell <ksewell@umich.edu> add ISA_HAS_DELAY_SLOT directive instead of "#if THE_ISA == ALPHA_ISA" throughout CPU models src/arch/alpha/isa_traits.hh: src/arch/mips/isa_traits.hh: src/arch/sparc/isa_traits.hh: define 'ISA_HAS_DELAY_SLOT' src/cpu/base_dyn_inst.hh: src/cpu/o3/bpred_unit_impl.hh: src/cpu/o3/commit_impl.hh: src/cpu/o3/cpu.cc: src/cpu/o3/cpu.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/simple/base.cc: use ISA_HAS_DELAY_SLOT instead of THE_ISA == ALPHA_ISA diff 2665:a124942bacb8 Wed May 31 19:26:00 EDT 2006 Ali Saidi <saidi@eecs.umich.edu> Updated Authors from bk prs info |
/gem5/src/mem/cache/ | ||
H A D | cache.cc | diff 14035:60068a2d56e0 Fri May 31 20:01:00 EDT 2019 Daniel Carvalho <odanrc@yahoo.com.br> Revert "mem-cache: Remove writebacks packet list" This reverts commit bf0a722acdd8247602e83720a5f81a0b69c76250. Reason for revert: This patch introduces a bug: The problem here is that the insertion of block A may cause the eviction of block B, which on the lower level may cause the eviction of block A. Since A is not marked as present yet, A is "safely" removed from the snoop filter However, by reverting it, using atomic and a Tags sub-class that can generate multiple evictions at once becomes broken when using Atomic mode and shall be fixed in a future patch. Change-Id: I5b27e54b54ae5b50255588835c1a2ebf3015f002 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19088 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> diff 12345:70c783a93195 Tue May 31 13:03:00 EDT 2016 Nikos Nikoleris <nikos.nikoleris@arm.com> mem: Add support for WriteClean packets in the memory system This change adds support for creating and handling WriteClean packets. The WriteClean operation is almost identical to a WritebackDirty with the exception that the cache generating a WriteClean retains a copy of the block. Change-Id: I63c8de62919fad0f9547d412f8266aa4292ebecd Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-by: Anouk Van Laer <anouk.vanlaer@arm.com> Reviewed-on: https://gem5-review.googlesource.com/5045 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com> diff 11288:57c340f947c7 Thu Dec 31 12:32:00 EST 2015 Steve Reinhardt <steve.reinhardt@amd.com> mem: add CacheVerbose debug flag, filter noisy DPRINTFs Some of the DPRINTFs added to the classic cache in cset 45df88079f04, while useful to those unfamiliar with the cache code, end up being noise when you're familiar with the code but are trying to debug tricky protocol issues. (Particularly getting two messages from each cache as it receives a snoop request then declares that there was no match.) This patch introduces a CacheVerbose debug flag, and moves a subset of the added DPRINTFs into that category, so that Cache by itself returns to being a more succinct summary of cache activity. Also added a CacheAll compound flag to turn on all the cache-related debug flags (other than CacheTags, which you *really* have to want badly to turn it on, IMO). diff 11286:2071db8f864b Thu Dec 31 09:33:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not allocate space for packet data if not needed This patch looks at the request and response command to determine if either actually has any data payload, and if not, we do not allocate any space for packet data. The only tricky case is where the command type is changed as part of the MSHR functionality. In these cases where the original packet had no data, but the new packet does, we need to explicitly call allocate(). diff 11285:25715951a4b8 Thu Dec 31 09:33:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Do not alter cache block state on uncacheable snoops This patch ensures we do not respond with a Modified (dirty and writable) line if the request is uncacheable, and that the cache responding retains the line without modifying the state (even if responding). diff 11284:b3926db25371 Thu Dec 31 09:32:00 EST 2015 Andreas Hansson <andreas.hansson@arm.com> mem: Make cache terminology easier to understand This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand. |
Completed in 443 milliseconds