Search

Home

Sort by

Full Search
Definition
Symbol
File Path
History
Type

In Project(s)

Searched hist:2010 (Results 401 - 425 of 929) sorted by relevance

<<11 12 13 14 15 161718 19 20 >>

/gem5/src/arch/x86/
H A D	faults.hh	diff 7714:32496de51017 Fri Oct 22 03:24:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Implement genMachineCheckFault. Even though this shouldn't ever be used, it might get called speculatively and shouldn't panic. diff 7681:61e31534522d Tue Sep 14 03:27:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Make unrecognized instructions behave better in x86. diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method. Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense. diff 7625:b1e69203bae9 Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Make the TLB fault instead of panic when something is unmapped in SE mode. The fault object, if invoked, would then panic. This is a bit less direct, but it means speculative execution won't panic the simulator. diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A D	system.cc	diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 7704:b5e6461ea242 Sun Oct 10 23:39:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Detect attempts to load a 32 bit kernel and panic. diff 7629:0f0c231e3e97 Mon Aug 23 19:14:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Create a directory for files that define register indexes. This is to help tidy up arch/x86. These files should not be used external to the ISA. diff 7532:3f6413fc37a2 Tue Aug 17 08:17:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> sim: revamp unserialization procedure Replace direct call to unserialize() on each SimObject with a pair of calls for better control over initialization in both ckpt and non-ckpt cases. If restoring from a checkpoint, loadState(ckpt) is called on each SimObject. The default implementation simply calls unserialize() if there is a corresponding checkpoint section, so we get backward compatibility for existing objects. However, objects can override loadState() to get other behaviors, e.g., doing other programmed initializations after unserialize(), or complaining if no checkpoint section is found. (Note that the default warning for a missing checkpoint section is now gone.) If not restoring from a checkpoint, we call the new initState() method on each SimObject instead. This provides a hook for state initializations that are only required when not restoring from a checkpoint. Given this new framework, do some cleanup of LiveProcess subclasses and X86System, which were (in some cases) emulating initState() behavior in startup via a local flag or (in other cases) erroneously doing initializations in startup() that clobbered state loaded earlier by unserialize(). diff 7447:3fc243687abb Thu Jun 03 22:41:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> More minor gdb-related cleanup. Found several more stale includes and forward decls. diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/arm/
H A D	faults.hh	diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method. Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense. diff 7652:f2621206b062 Wed Aug 25 20:10:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: Adding a bogus fault that does nothing. This fault can used to flush the pipe, not including the faulting instruction. The particular case I needed this was for a self-modifying code. It needed to drain the store queue and force the following instruction to refetch from icache. DCCMVAC cp15 mcr instruction is modified to raise this fault. diff 7640:5286a8a469c5 Wed Aug 25 20:10:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Implement CPACR register and return Undefined Instruction when FP access is disabled. diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB diff 7596:822c5e08c5bd Mon Aug 23 12:18:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: adding genMachineCheckFault() stub for ARM that doesn't panic diff 7595:65d88997f738 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: DFSR status value for sync external data abort is expected to be 0x8 in ARMv7 diff 7404:bfc74724914e Wed Jun 02 01:58:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement the ARM TLB/Tablewalker. Needs performance improvements. diff 7400:f6c9b27c4dbe Wed Jun 02 01:58:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement ARM CPU interrupts diff 7362:9ea92e0eb4a9 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Implement and update the DFSR and IFSR registers on faults. diff 7197:21b9790c446d Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Trigger system calls from the SupervisorCall invoke method. This simplifies the decoder slightly, and makes the system call mechanism very slightly more realistic.
H A D	registers.hh	diff 7650:42684e4656e6 Wed Aug 25 20:10:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Limited implementation of dprintk. Does not work with vfp arguments or arguments passed on the stack. diff 7649:a6a6177a5ffa Wed Aug 25 20:10:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: Fixed register flattening logic (FP_Base_DepTag was set too low) When decoding a srs instruction, invalid mode encoding returns invalid instruction. This can happen when garbage instructions are fetched from mispredicted path diff 7642:05e445521db9 Wed Aug 25 20:10:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Eliminate some unused enums. diff 7592:24b18c320b66 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add some registers for big loads/stores to support neon. diff 7498:fbc62b421fa0 Wed Jul 14 01:41:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Adjust the FP_Base_DepTag to be larger than the largest int reg index. diff 7310:239ab4e0c7d4 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Allow flattening into any mode. diff 7177:5f19e5b67864 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Fix the constant describing the number of floating point registers.
H A D	table_walker.cc	diff 7781:a9f9eed35b18 Tue Dec 07 19:19:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Support switchover with hardware table walkers diff 7748:7bf78d12b359 Mon Nov 15 15:04:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add support for switching CPUs diff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support diff 7728:cf9db1c47a77 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Don't return the result of a table walk the same cycle it's completed. The L1 cache may have been accessed to provide this data, which confuses it, if it ends up being accesses twice in one cycle. Instead wait 1 tick which will force the timing simple CPU to forward to its next clock cycle when the translation completes. Also prevent multiple outstanding table walks from occuring at once. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 7694:de057cccee82 Fri Oct 01 17:03:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement functional virtual to physical address translation for debugging and program introspection. diff 7653:968302e54850 Wed Aug 25 20:10:00 EDT 2010 Gene WU <gene.wu@arm.com> ARM: Seperate the queues of L1 and L2 walker states. diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB diff 7608:17aabeaa1a8f Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Fix Uncachable TLB requests and decoding of xn bit diff 7582:a24f26bf0fbe Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Fix an un-initialized variable bug
H A D	tlb.cc	diff 7781:a9f9eed35b18 Tue Dec 07 19:19:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Support switchover with hardware table walkers diff 7749:859e8bc1cdc2 Mon Nov 15 15:04:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Cache the misc regs at the TLB to limit readMiscReg() calls. diff 7734:85a8198aa2ff Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add some TLB statistics for ARM diff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 7705:fd65f85fcc0c Wed Oct 13 04:57:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Mem: Change the CLREX flag to CLEAR_LL. CLREX is the name of an ARM instruction, not a name for this generic flag. diff 7697:05b1a077977b Fri Oct 01 17:04:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Make the TLB a little bit faster by moving most recently used items to front of list diff 7694:de057cccee82 Fri Oct 01 17:03:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement functional virtual to physical address translation for debugging and program introspection. diff 7612:917946898102 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> MEM: Make CLREX a first class request operation and clear locks in caches when it in received diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB
/gem5/src/mem/ruby/network/simple/
H A D	Throttle.cc	diff 7780:42da07116e12 Wed Dec 01 14:30:00 EST 2010 Nilay Vaish <nilay@cs.wisc.edu> ruby: Converted old ruby debug calls to M5 debug calls This patch developed by Nilay Vaish converts all the old GEMS-style ruby debug calls to the appropriate M5 debug calls. diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL add a couple of helper functions to base for deleteing all pointers in a container and outputting containers to a stream diff 7453:1a5db3dd0f62 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of RefCnt and Allocator stuff use base/refcnt.hh This was somewhat tricky because the RefCnt API was somewhat odd. The biggest confusion was that the the RefCnt object's constructor that took a TYPE& cloned the object. I created an explicit virtual clone() function for things that took advantage of this version of the constructor. I was conservative and used clone() when I was in doubt of whether or not it was necessary. I still think that there are probably too many instances of clone(), but hopefully not too many. I converted several instances of const MsgPtr & to a simple MsgPtr. If the function wants to avoid the overhead of creating another reference, then it should just use a regular pointer instead of a ref counting ptr. There were a couple of instances where refcounted objects were created on the stack. This seems pretty dangerous since if you ever accidentally make a reference to that object with a ref counting pointer, bad things are bound to happen. diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers In addition to obvious changes, this required a slight change to the slicc grammar to allow types with :: in them. Otherwise slicc barfs on std::string which we need for the headers that slicc generates. diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass diff 7024:30883414ad10 Mon Mar 22 00:22:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Finally removed bash code cira. 2001ish! diff 6891:77451885bb00 Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Removed out_link_vec from Consumer Removed the out_line_vec data structure from the Consumer. I'm not sure what this did before, but currently it has no usefulness.
H A D	SimpleNetwork.cc	diff 7547:a5ddcb2abfa1 Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Added consolidated network msg stats diff 7455:586f99bf0dc4 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of the Map class diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL add a couple of helper functions to base for deleteing all pointers in a container and outputting containers to a stream diff 7056:b66b558578bd Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of gems_common/util.hh and .cc and use stuff in src/base diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers In addition to obvious changes, this required a slight change to the slicc grammar to allow types with :: in them. Otherwise slicc barfs on std::string which we need for the headers that slicc generates. diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass diff 6895:5f3d2d3f977e Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: added ruby stats print Moved the previous rubymem stats print feature to ruby System so that ruby stats are printed on simulation exit. diff 6881:5a61a8a9009a Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: connects sm queues to the network diff 6879:c07cf29b5a33 Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Add support for generating topologies in Python. diff 6876:a658c315512c Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Convert most Ruby objects to M5 SimObjects. The necessary companion conversion of Ruby objects generated by SLICC are converted to M5 SimObjects in the following patch, so this patch alone does not compile. Conversion of Garnet network models is also handled in a separate patch; that code is temporarily disabled from compiling to allow testing of interim code.
H A D	PerfectSwitch.hh	diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL add a couple of helper functions to base for deleteing all pointers in a container and outputting containers to a stream diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh Do not use "using namespace std;" in headers Include header files as needed
H A D	Switch.hh	diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL add a couple of helper functions to base for deleteing all pointers in a container and outputting containers to a stream diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh Do not use "using namespace std;" in headers Include header files as needed
/gem5/src/mem/slicc/
H A D	parser.py	diff 7567:238f99c9f441 Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Stall and wait input messages instead of recycling This patch allows messages to be stalled in their input buffers and wait until a corresponding address changes state. In order to make this work, all in_ports must be ranked in order of dependence and those in_ports that may unblock an address, must wake up the stalled messages. Alot of this complexity is handled in slicc and the specification files simply annotate the in_ports. diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers In addition to obvious changes, this required a slight change to the slicc grammar to allow types with :: in them. Otherwise slicc barfs on std::string which we need for the headers that slicc generates. diff 6999:f226c098c393 Wed Mar 10 19:22:00 EST 2010 Nathan Binkert <nate@binkert.org> slicc: have a central mechanism for creating a code_formatter. This makes it easier to add global variables like protocol diff 6907:b05de761960e Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Allows boolean and string defaults for StateMachine parameters diff 6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system This patch includes the necessary changes to connect ruby objects using the python configuration system. Mainly it consists of removing unnecessary ruby object pointers and connecting the necessary object pointers using the generated param objects. This patch includes the slicc changes necessary to connect generated ruby objects together using the python configuraiton system. diff 6877:2a1a3d916ca8 Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Make SLICC-generated objects SimObjects. Also add SLICC support for state-machine parameter defaults (passed through to Python as SimObject Param defaults). diff 6863:21fbf0412e0d Tue Jan 19 18:11:00 EST 2010 Derek Hower <drh5@cs.wisc.edu> ruby: new atomics implementation This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
/gem5/src/arch/x86/insts/
H A D	microop.hh	diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter. This single parameter replaces the collection of bools that set up various flavors of microops. A flag parameter also allows other flags to be set like the serialize before/after flags, etc., without having to change the constructor. diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/sparc/
H A D	nativetrace.cc	diff 7741:340b6f01d69b Thu Nov 11 05:03:00 EST 2010 Gabe Black <gblack@eecs.umich.edu> SPARC: Clean up some historical style issues. diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method. Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense.
/gem5/src/dev/
H A D	mc146818.cc	diff 7683:f81f5f27592b Thu Sep 16 23:24:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> devices: undo cset 017baf09599f that added timer drain functions. It's not the right fix for the checkpoint deadlock problem Brad was having, and creates another bug where the system can deadlock on restore. Brad can't reproduce the original bug right now, so we'll wait until it arises again and then try to fix it the right way then. diff 7559:017baf09599f Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> devices: Fixed periodic interrupts to work with draining Added drain functions to the RTC and 8254 timer so that periodic interrupts stop when the system is draining. This patch is needed to checkpoint in timing mode. Otherwise under certain situations, the event queue will never be completely empty. diff 7064:586b0e3a12b3 Thu Apr 15 19:24:00 EDT 2010 Nathan Binkert <nate@binkert.org> tick: rename Clock namespace to SimClock
/gem5/src/arch/x86/isa/microops/
H A D	base.isa	diff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter. This single parameter replaces the collection of bools that set up various flavors of microops. A flag parameter also allows other flags to be set like the serialize before/after flags, etc., without having to change the constructor. diff 7571:405f840c4ae1 Sun Aug 22 21:24:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Get rid of the unused getAllocator on the python base microop class. This function is always overridden, and doesn't actually have the right signature. diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A D	debug.isa	diff 7626:bdd926760470 Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Get rid of the flagless microop constructor. This will reduce clutter in the source and hopefully speed up compilation. diff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter. This single parameter replaces the collection of bools that set up various flavors of microops. A flag parameter also allows other flags to be set like the serialize before/after flags, etc., without having to change the constructor. diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/power/
H A D	tlb.hh	diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method. Also move the "Fault" reference counted pointer type into a separate file, sim/fault.hh. It would be better to name this less similarly to sim/faults.hh to reduce confusion, but fault.hh matches the name of the type. We could change Fault to FaultPtr to match other pointer types, and then changing the name of the file would make more sense. diff 7461:5a07045d0af2 Tue Jun 15 04:18:00 EDT 2010 Nathan Binkert <nate@binkert.org> stats: only consider a formula initialized if there is a formula diff 6972:b6482c4c89e3 Fri Feb 12 14:53:00 EST 2010 Timothy M. Jones <tjones1@inf.ed.ac.uk> Power ISA: Add an alignment fault to Power ISA and check alignment in TLB.
/gem5/src/dev/arm/
H A D	timer_sp804.cc	`diff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support diff 7587:177151a54462 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Change how the AMBA device ID checking is done to make it more generic 7584:28ddf6d9e982 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Add I/O devices for booting linux`
/gem5/src/mem/ruby/slicc_interface/
H A D	Message.hh	diff 7453:1a5db3dd0f62 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of RefCnt and Allocator stuff use base/refcnt.hh This was somewhat tricky because the RefCnt API was somewhat odd. The biggest confusion was that the the RefCnt object's constructor that took a TYPE& cloned the object. I created an explicit virtual clone() function for things that took advantage of this version of the constructor. I was conservative and used clone() when I was in doubt of whether or not it was necessary. I still think that there are probably too many instances of clone(), but hopefully not too many. I converted several instances of const MsgPtr & to a simple MsgPtr. If the function wants to avoid the overhead of creating another reference, then it should just use a regular pointer instead of a ref counting ptr. There were a couple of instances where refcounted objects were created on the stack. This seems pretty dangerous since if you ever accidentally make a reference to that object with a ref counting pointer, bad things are bound to happen. diff 7039:bc0b6ea676b5 Mon Mar 22 21:43:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: style pass diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh Do not use "using namespace std;" in headers Include header files as needed
H A D	AbstractCacheEntry.hh	diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers In addition to obvious changes, this required a slight change to the slicc grammar to allow types with :: in them. Otherwise slicc barfs on std::string which we need for the headers that slicc generates. diff 7039:bc0b6ea676b5 Mon Mar 22 21:43:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: style pass diff 6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system This patch includes the necessary changes to connect ruby objects using the python configuration system. Mainly it consists of removing unnecessary ruby object pointers and connecting the necessary object pointers using the generated param objects. This patch includes the slicc changes necessary to connect generated ruby objects together using the python configuraiton system.
/gem5/src/mem/slicc/ast/
H A D	MemberExprAST.py	`diff 6999:f226c098c393 Wed Mar 10 19:22:00 EST 2010 Nathan Binkert <nate@binkert.org> slicc: have a central mechanism for creating a code_formatter. This makes it easier to add global variables like protocol`
H A D	StaticCastAST.py	6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system This patch includes the necessary changes to connect ruby objects using the python configuration system. Mainly it consists of removing unnecessary ruby object pointers and connecting the necessary object pointers using the generated param objects. This patch includes the slicc changes necessary to connect generated ruby objects together using the python configuraiton system.
/gem5/src/arch/x86/bios/
H A D	e820.hh	`diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly`
H A D	intelmp.hh	`diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly`
H A D	smbios.hh	`diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly`

Completed in 200 milliseconds

<<11 12 13 14 15 161718 19 20 >>