Searched hist:2010 (Results 401 - 425 of 929) sorted by relevance

<<11121314151617181920>>

/gem5/src/arch/x86/
H A Dfaults.hhdiff 7714:32496de51017 Fri Oct 22 03:24:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Implement genMachineCheckFault.

Even though this shouldn't ever be used, it might get called speculatively and
shouldn't panic.
diff 7681:61e31534522d Tue Sep 14 03:27:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Make unrecognized instructions behave better in x86.
diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.

Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
diff 7625:b1e69203bae9 Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Make the TLB fault instead of panic when something is unmapped in SE mode.

The fault object, if invoked, would then panic. This is a bit less direct, but
it means speculative execution won't panic the simulator.
diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A Dsystem.ccdiff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
diff 7704:b5e6461ea242 Sun Oct 10 23:39:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Detect attempts to load a 32 bit kernel and panic.
diff 7629:0f0c231e3e97 Mon Aug 23 19:14:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Create a directory for files that define register indexes.

This is to help tidy up arch/x86. These files should not be used external to
the ISA.
diff 7532:3f6413fc37a2 Tue Aug 17 08:17:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> sim: revamp unserialization procedure

Replace direct call to unserialize() on each SimObject with a pair of
calls for better control over initialization in both ckpt and non-ckpt
cases.

If restoring from a checkpoint, loadState(ckpt) is called on each
SimObject. The default implementation simply calls unserialize() if
there is a corresponding checkpoint section, so we get backward
compatibility for existing objects. However, objects can override
loadState() to get other behaviors, e.g., doing other programmed
initializations after unserialize(), or complaining if no checkpoint
section is found. (Note that the default warning for a missing
checkpoint section is now gone.)

If not restoring from a checkpoint, we call the new initState() method
on each SimObject instead. This provides a hook for state
initializations that are only required when *not* restoring from a
checkpoint.

Given this new framework, do some cleanup of LiveProcess subclasses
and X86System, which were (in some cases) emulating initState()
behavior in startup via a local flag or (in other cases) erroneously
doing initializations in startup() that clobbered state loaded earlier
by unserialize().
diff 7447:3fc243687abb Thu Jun 03 22:41:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> More minor gdb-related cleanup.
Found several more stale includes and forward decls.
diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/arm/
H A Dfaults.hhdiff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.

Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
diff 7652:f2621206b062 Wed Aug 25 20:10:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: Adding a bogus fault that does nothing.
This fault can used to flush the pipe, not including the faulting instruction.

The particular case I needed this was for a self-modifying code. It needed to
drain the store queue and force the following instruction to refetch from
icache. DCCMVAC cp15 mcr instruction is modified to raise this fault.
diff 7640:5286a8a469c5 Wed Aug 25 20:10:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Implement CPACR register and return Undefined Instruction when FP access is disabled.
diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB
diff 7596:822c5e08c5bd Mon Aug 23 12:18:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: adding genMachineCheckFault() stub for ARM that doesn't panic
diff 7595:65d88997f738 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: DFSR status value for sync external data abort is expected to be 0x8 in ARMv7
diff 7404:bfc74724914e Wed Jun 02 01:58:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement the ARM TLB/Tablewalker. Needs performance improvements.
diff 7400:f6c9b27c4dbe Wed Jun 02 01:58:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement ARM CPU interrupts
diff 7362:9ea92e0eb4a9 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Implement and update the DFSR and IFSR registers on faults.
diff 7197:21b9790c446d Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Trigger system calls from the SupervisorCall invoke method.

This simplifies the decoder slightly, and makes the system call mechanism
very slightly more realistic.
H A Dregisters.hhdiff 7650:42684e4656e6 Wed Aug 25 20:10:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Limited implementation of dprintk.

Does not work with vfp arguments or arguments passed on the stack.
diff 7649:a6a6177a5ffa Wed Aug 25 20:10:00 EDT 2010 Min Kyu Jeong <minkyu.jeong@arm.com> ARM: Fixed register flattening logic (FP_Base_DepTag was set too low)

When decoding a srs instruction, invalid mode encoding returns invalid instruction.
This can happen when garbage instructions are fetched from mispredicted path
diff 7642:05e445521db9 Wed Aug 25 20:10:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Eliminate some unused enums.
diff 7592:24b18c320b66 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add some registers for big loads/stores to support neon.
diff 7498:fbc62b421fa0 Wed Jul 14 01:41:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Adjust the FP_Base_DepTag to be larger than the largest int reg index.
diff 7310:239ab4e0c7d4 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Allow flattening into any mode.
diff 7177:5f19e5b67864 Wed Jun 02 01:58:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ARM: Fix the constant describing the number of floating point registers.
H A Dtable_walker.ccdiff 7781:a9f9eed35b18 Tue Dec 07 19:19:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Support switchover with hardware table walkers
diff 7748:7bf78d12b359 Mon Nov 15 15:04:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add support for switching CPUs
diff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support
diff 7728:cf9db1c47a77 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Don't return the result of a table walk the same cycle it's completed.

The L1 cache may have been accessed to provide this data, which confuses
it, if it ends up being accesses twice in one cycle. Instead wait 1 tick
which will force the timing simple CPU to forward to its next clock cycle
when the translation completes.

Also prevent multiple outstanding table walks from occuring at once.
diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
diff 7694:de057cccee82 Fri Oct 01 17:03:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement functional virtual to physical address translation
for debugging and program introspection.
diff 7653:968302e54850 Wed Aug 25 20:10:00 EDT 2010 Gene WU <gene.wu@arm.com> ARM: Seperate the queues of L1 and L2 walker states.
diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB
diff 7608:17aabeaa1a8f Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Fix Uncachable TLB requests and decoding of xn bit
diff 7582:a24f26bf0fbe Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Fix an un-initialized variable bug
H A Dtlb.ccdiff 7781:a9f9eed35b18 Tue Dec 07 19:19:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Support switchover with hardware table walkers
diff 7749:859e8bc1cdc2 Mon Nov 15 15:04:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Cache the misc regs at the TLB to limit readMiscReg() calls.
diff 7734:85a8198aa2ff Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add some TLB statistics for ARM
diff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support
diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
diff 7705:fd65f85fcc0c Wed Oct 13 04:57:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Mem: Change the CLREX flag to CLEAR_LL.

CLREX is the name of an ARM instruction, not a name for this generic flag.
diff 7697:05b1a077977b Fri Oct 01 17:04:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Make the TLB a little bit faster by moving most recently used items to front of list
diff 7694:de057cccee82 Fri Oct 01 17:03:00 EDT 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Implement functional virtual to physical address translation
for debugging and program introspection.
diff 7612:917946898102 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> MEM: Make CLREX a first class request operation and clear locks in caches when it in received
diff 7611:c119da5a80c8 Mon Aug 23 12:18:00 EDT 2010 Gene Wu <Gene.Wu@arm.com> ARM: Make sure that software prefetch instructions can't change the state of the TLB
/gem5/src/mem/ruby/network/simple/
H A DThrottle.ccdiff 7780:42da07116e12 Wed Dec 01 14:30:00 EST 2010 Nilay Vaish <nilay@cs.wisc.edu> ruby: Converted old ruby debug calls to M5 debug calls

This patch developed by Nilay Vaish converts all the old GEMS-style ruby
debug calls to the appropriate M5 debug calls.
diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL
add a couple of helper functions to base for deleteing all pointers in
a container and outputting containers to a stream
diff 7453:1a5db3dd0f62 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of RefCnt and Allocator stuff use base/refcnt.hh

This was somewhat tricky because the RefCnt API was somewhat odd. The
biggest confusion was that the the RefCnt object's constructor that
took a TYPE& cloned the object. I created an explicit virtual clone()
function for things that took advantage of this version of the
constructor. I was conservative and used clone() when I was in doubt
of whether or not it was necessary. I still think that there are
probably too many instances of clone(), but hopefully not too many.

I converted several instances of const MsgPtr & to a simple MsgPtr.
If the function wants to avoid the overhead of creating another
reference, then it should just use a regular pointer instead of a ref
counting ptr.

There were a couple of instances where refcounted objects were created
on the stack. This seems pretty dangerous since if you ever
accidentally make a reference to that object with a ref counting
pointer, bad things are bound to happen.
diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass
diff 7024:30883414ad10 Mon Mar 22 00:22:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Finally removed bash code cira. 2001ish!
diff 6891:77451885bb00 Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Removed out_link_vec from Consumer
Removed the out_line_vec data structure from the Consumer. I'm not sure
what this did before, but currently it has no usefulness.
H A DSimpleNetwork.ccdiff 7547:a5ddcb2abfa1 Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Added consolidated network msg stats
diff 7455:586f99bf0dc4 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of the Map class
diff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL
add a couple of helper functions to base for deleteing all pointers in
a container and outputting containers to a stream
diff 7056:b66b558578bd Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of gems_common/util.hh and .cc and use stuff in src/base
diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass
diff 6895:5f3d2d3f977e Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: added ruby stats print
Moved the previous rubymem stats print feature to ruby System so that ruby
stats are printed on simulation exit.
diff 6881:5a61a8a9009a Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: connects sm queues to the network
diff 6879:c07cf29b5a33 Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Add support for generating topologies in Python.
diff 6876:a658c315512c Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Convert most Ruby objects to M5 SimObjects.
The necessary companion conversion of Ruby objects generated by SLICC
are converted to M5 SimObjects in the following patch, so this patch
alone does not compile.
Conversion of Garnet network models is also handled in a separate
patch; that code is temporarily disabled from compiling to allow
testing of interim code.
H A DPerfectSwitch.hhdiff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL
add a couple of helper functions to base for deleteing all pointers in
a container and outputting containers to a stream
diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass
diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh
Do not use "using namespace std;" in headers
Include header files as needed
H A DSwitch.hhdiff 7454:3a3e8e8cce1b Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of Vector and use STL
add a couple of helper functions to base for deleteing all pointers in
a container and outputting containers to a stream
diff 7054:7d6862b80049 Wed Mar 31 19:56:00 EDT 2010 Nathan Binkert <nate@binkert.org> style: another ruby style pass
diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh
Do not use "using namespace std;" in headers
Include header files as needed
/gem5/src/mem/slicc/
H A Dparser.pydiff 7567:238f99c9f441 Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Stall and wait input messages instead of recycling

This patch allows messages to be stalled in their input buffers and wait
until a corresponding address changes state. In order to make this work,
all in_ports must be ranked in order of dependence and those in_ports that
may unblock an address, must wake up the stalled messages. Alot of this
complexity is handled in slicc and the specification files simply
annotate the in_ports.
diff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
diff 6999:f226c098c393 Wed Mar 10 19:22:00 EST 2010 Nathan Binkert <nate@binkert.org> slicc: have a central mechanism for creating a code_formatter.
This makes it easier to add global variables like protocol
diff 6907:b05de761960e Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Allows boolean and string defaults for StateMachine parameters
diff 6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
diff 6877:2a1a3d916ca8 Fri Jan 29 23:29:00 EST 2010 Steve Reinhardt <steve.reinhardt@amd.com> ruby: Make SLICC-generated objects SimObjects.
Also add SLICC support for state-machine parameter defaults
(passed through to Python as SimObject Param defaults).
diff 6863:21fbf0412e0d Tue Jan 19 18:11:00 EST 2010 Derek Hower <drh5@cs.wisc.edu> ruby: new atomics implementation

This patch changes the way that Ruby handles atomic RMW instructions. This implementation, unlike the prior one, is protocol independent. It works by locking an address from the sequencer immediately after the read portion of an RMW completes. When that address is locked, the coherence controller will only satisfy requests coming from one port (e.g., the mandatory queue) and will ignore all others. After the write portion completed, the line is unlocked. This should also work with multi-line atomics, as long as the blocks are always acquired in the same order.
/gem5/src/arch/x86/insts/
H A Dmicroop.hhdiff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
diff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter.

This single parameter replaces the collection of bools that set up various
flavors of microops. A flag parameter also allows other flags to be set like
the serialize before/after flags, etc., without having to change the
constructor.
diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/sparc/
H A Dnativetrace.ccdiff 7741:340b6f01d69b Thu Nov 11 05:03:00 EST 2010 Gabe Black <gblack@eecs.umich.edu> SPARC: Clean up some historical style issues.
diff 7720:65d338a8dba4 Sun Oct 31 03:07:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.



This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
diff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.

Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
/gem5/src/dev/
H A Dmc146818.ccdiff 7683:f81f5f27592b Thu Sep 16 23:24:00 EDT 2010 Steve Reinhardt <steve.reinhardt@amd.com> devices: undo cset 017baf09599f that added timer drain functions.
It's not the right fix for the checkpoint deadlock problem
Brad was having, and creates another bug where the system can
deadlock on restore. Brad can't reproduce the original bug
right now, so we'll wait until it arises again and then try
to fix it the right way then.
diff 7559:017baf09599f Fri Aug 20 14:46:00 EDT 2010 Brad Beckmann <Brad.Beckmann@amd.com> devices: Fixed periodic interrupts to work with draining

Added drain functions to the RTC and 8254 timer so that periodic interrupts
stop when the system is draining. This patch is needed to checkpoint in
timing mode. Otherwise under certain situations, the event queue will never
be completely empty.
diff 7064:586b0e3a12b3 Thu Apr 15 19:24:00 EDT 2010 Nathan Binkert <nate@binkert.org> tick: rename Clock namespace to SimClock
/gem5/src/arch/x86/isa/microops/
H A Dbase.isadiff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter.

This single parameter replaces the collection of bools that set up various
flavors of microops. A flag parameter also allows other flags to be set like
the serialize before/after flags, etc., without having to change the
constructor.
diff 7571:405f840c4ae1 Sun Aug 22 21:24:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Get rid of the unused getAllocator on the python base microop class.

This function is always overridden, and doesn't actually have the right
signature.
diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A Ddebug.isadiff 7626:bdd926760470 Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Get rid of the flagless microop constructor.

This will reduce clutter in the source and hopefully speed up compilation.
diff 7620:3d8a23caa1ef Mon Aug 23 12:44:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> X86: Consolidate extra microop flags into one parameter.

This single parameter replaces the collection of bools that set up various
flavors of microops. A flag parameter also allows other flags to be set like
the serialize before/after flags, etc., without having to change the
constructor.
diff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
/gem5/src/arch/power/
H A Dtlb.hhdiff 7678:f19b6a3a8cec Mon Sep 13 22:26:00 EDT 2010 Gabe Black <gblack@eecs.umich.edu> Faults: Pass the StaticInst involved, if any, to a Fault's invoke method.

Also move the "Fault" reference counted pointer type into a separate file,
sim/fault.hh. It would be better to name this less similarly to sim/faults.hh
to reduce confusion, but fault.hh matches the name of the type. We could change
Fault to FaultPtr to match other pointer types, and then changing the name of
the file would make more sense.
diff 7461:5a07045d0af2 Tue Jun 15 04:18:00 EDT 2010 Nathan Binkert <nate@binkert.org> stats: only consider a formula initialized if there is a formula
diff 6972:b6482c4c89e3 Fri Feb 12 14:53:00 EST 2010 Timothy M. Jones <tjones1@inf.ed.ac.uk> Power ISA: Add an alignment fault to Power ISA and check alignment in TLB.
/gem5/src/dev/arm/
H A Dtimer_sp804.ccdiff 7733:08d6a773d1b6 Mon Nov 08 14:58:00 EST 2010 Ali Saidi <Ali.Saidi@ARM.com> ARM: Add checkpointing support
diff 7587:177151a54462 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Change how the AMBA device ID checking is done to make it more generic
7584:28ddf6d9e982 Mon Aug 23 12:18:00 EDT 2010 Ali Saidi <Ali.Saidi@arm.com> ARM: Add I/O devices for booting linux
/gem5/src/mem/ruby/slicc_interface/
H A DMessage.hhdiff 7453:1a5db3dd0f62 Fri Jun 11 02:17:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of RefCnt and Allocator stuff use base/refcnt.hh

This was somewhat tricky because the RefCnt API was somewhat odd. The
biggest confusion was that the the RefCnt object's constructor that
took a TYPE& cloned the object. I created an explicit virtual clone()
function for things that took advantage of this version of the
constructor. I was conservative and used clone() when I was in doubt
of whether or not it was necessary. I still think that there are
probably too many instances of clone(), but hopefully not too many.

I converted several instances of const MsgPtr & to a simple MsgPtr.
If the function wants to avoid the overhead of creating another
reference, then it should just use a regular pointer instead of a ref
counting ptr.

There were a couple of instances where refcounted objects were created
on the stack. This seems pretty dangerous since if you ever
accidentally make a reference to that object with a ref counting
pointer, bad things are bound to happen.
diff 7039:bc0b6ea676b5 Mon Mar 22 21:43:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: style pass
diff 7002:48a19d52d939 Wed Mar 10 21:33:00 EST 2010 Nathan Binkert <nate@binkert.org> ruby: get rid of std-includes.hh
Do not use "using namespace std;" in headers
Include header files as needed
H A DAbstractCacheEntry.hhdiff 7055:4e24742201d7 Fri Apr 02 14:20:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: get "using namespace" out of headers
In addition to obvious changes, this required a slight change to the slicc
grammar to allow types with :: in them. Otherwise slicc barfs on std::string
which we need for the headers that slicc generates.
diff 7039:bc0b6ea676b5 Mon Mar 22 21:43:00 EDT 2010 Nathan Binkert <nate@binkert.org> ruby: style pass
diff 6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
/gem5/src/mem/slicc/ast/
H A DMemberExprAST.pydiff 6999:f226c098c393 Wed Mar 10 19:22:00 EST 2010 Nathan Binkert <nate@binkert.org> slicc: have a central mechanism for creating a code_formatter.
This makes it easier to add global variables like protocol
H A DStaticCastAST.py6882:898047a3672c Fri Jan 29 23:29:00 EST 2010 Brad Beckmann <Brad.Beckmann@amd.com> ruby: Ruby changes required to use the python config system
This patch includes the necessary changes to connect ruby objects using
the python configuration system. Mainly it consists of removing
unnecessary ruby object pointers and connecting the necessary object
pointers using the generated param objects. This patch includes the
slicc changes necessary to connect generated ruby objects together using
the python configuraiton system.
/gem5/src/arch/x86/bios/
H A De820.hhdiff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A Dintelmp.hhdiff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly
H A Dsmbios.hhdiff 7087:fb8d5786ff30 Mon May 24 01:44:00 EDT 2010 Nathan Binkert <nate@binkert.org> copyright: Change HP copyright on x86 code to be more friendly

Completed in 200 milliseconds

<<11121314151617181920>>