Searched hist:13 (Results 1151 - 1175 of 1864) sorted by relevance

<<41424344454647484950>>

/gem5/src/cpu/o3/
H A Dcpu.hh13954:2f400a5f2627 Fri Jul 07 09:13:00 EDT 2017 Giacomo Gabrielli <giacomo.gabrielli@arm.com> cpu,mem: Add support for partial loads/stores and wide mem. accesses

This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.

Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>

Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
13652:45d94ac03a27 Mon Jan 22 13:12:00 EST 2018 Tuan Ta <qtt2@cornell.edu> cpu: support atomic memory request type with AtomicOpFunctor

This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.

Atomic memory instruction is treated as a special store instruction in
all CPU models.

In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.

In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.

In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.

This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.

This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.

Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
13590:d7e018859709 Mon Feb 13 04:41:00 EST 2017 Rekai Gonzalez-Alberquilla <rekai.gonzalezalberquilla@arm.com> cpu-o3: O3 LSQ Generalisation

This patch does a large modification of the LSQ in the O3 model. The
main goal of the patch is to remove the 'an operation can be served with
one or two memory requests' assumption that is present in the LSQ
and the instruction with the req, reqLow, reqHigh triplet, and
generalising it to operations that can be addressed with one request,
and operations that require many requests, embodied in the
SingleDataRequest and the SplitDataRequest.

This modification has been done mimicking the minor model to an extent,
shifting the responsibilities of dealing with VtoP translation and
tracking the status and resources from the DynInst to the LSQ via the
LSQRequest. The LSQRequest models the information concerning the
operation, handles the creation of fragments for translation and request
as well as assembling/splitting the data accordingly.

With this modifications, the implementation of vector ISAs, particularly
on the memory side, become more rich, as the new model permits a
dissociation of the ISA characteristics as vector length, from the
microarchitectural characteristics that govern how contiguous loads are
executing, allowing exploration of different LSQ to DL1 bus widths to
understand the tradeoffs in complexity and performance.

Part of the complexities introduced stem from the fact that gem5 keeps a
large amount of metadata regarding, in particular, memory operations,
thus, when an instruction is squashed while some operation as TLB lookup
or cache access is ongoing, when the relevant structure communicates to
the LSQ that the operation is over, it tries to access some pieces of
data that should have died when the instruction is squashed, leading to
asserts, panics, or memory corruption. To ensure the correct behaviour,
the LSQRequest rely on assesing who is their owner, and self-destroying
if they detect their owner is done with the request, and there will be
no subsequent action. For example, in the case of an instruction
squashed whal the TLB is doing a walk to serve the translation, when the
translation is served by the TLB, the LSQRequest detects that the
instruction was squashed, and as the translation is done, no one else
expect to access its information, and therefore, it self-destructs.
Having destroyed the LSQRequest earlier, would lead to wrong behaviour
as the TLB walk may access some fields of it.

Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>

Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13516
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
13557:fc33e6048b25 Sat Oct 13 03:54:00 EDT 2018 Gabe Black <gabeblack@google.com> cpu: dev: sim: gpu-compute: Banish some ISA specific register types.

These types are IntReg, FloatReg, FloatRegBits, and MiscReg. There are
some remaining types, specifically the vector registers and the CCReg.
I'm less familiar with these new types of registers, and so will look
at getting rid of them at some later time.

Change-Id: Ide8f76b15c531286f61427330053b44074b8ac9b
Reviewed-on: https://gem5-review.googlesource.com/c/13624
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
11877:5ea85692a53e Mon Jul 20 10:15:00 EDT 2015 Brandon Potter <brandon.potter@amd.com> syscall_emul: [patch 13/22] add system call retry capability

This changeset adds functionality that allows system calls to retry without
affecting thread context state such as the program counter or register values
for the associated thread context (when system calls return with a retry
fault).

This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.

To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.

The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.

In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.

As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing a single simulator thread contexts with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.
9448:569d1e8f74e4 Mon Jan 07 13:05:00 EST 2013 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Unify the serialization code for all of the CPU models

Cleanup the serialization code for the simple CPUs and the O3 CPU. The
CPU-specific code has been replaced with a (un)serializeThread that
serializes the thread state / context of a specific thread. Assuming
that the thread state class uses the CPU-specific thread state uses
the base thread state serialization code, this allows us to restore a
checkpoint with any of the CPU models.
9444:ab47fe7f03f0 Mon Jan 07 13:05:00 EST 2013 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Rewrite O3 draining to avoid stopping in microcode

Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
9436:4a0223da4924 Mon Jan 07 13:05:00 EST 2013 Andreas Sandberg <Andreas.Sandberg@ARM.com> o3 cpu: Remove unused variables
9433:34971d2e0019 Mon Jan 07 13:05:00 EST 2013 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Rename defer_registration->switched_out

The defer_registration parameter is used to prevent a CPU from
initializing at startup, leaving it in the "switched out" mode. The
name of this parameter (and the help string) is confusing. This patch
renames it to switched_out, which should be more descriptive.
9427:ddf45c1d54d4 Mon Jan 07 13:05:00 EST 2013 Andreas Sandberg <Andreas.Sandberg@ARM.com> cpu: Initialize the O3 pipeline from startup()

The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
/gem5/src/sim/
H A DClockDomain.py10249:6bbb7ae309ac Mon Jun 30 13:56:00 EDT 2014 Stephan Diestelhorst <stephan.diestelhorst@arm.com> power: Add basic DVFS support for gem5

Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.

Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.

Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.

Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
/gem5/configs/boot/
H A Dnat-spec-surge-client.rcS1645:4efe65fb0bce Thu Apr 28 16:13:00 EDT 2005 Ron Dreslinski <rdreslin@umich.edu> Make ip_conntrack table size larger
H A Dnatbox-spec-surge.rcS1645:4efe65fb0bce Thu Apr 28 16:13:00 EDT 2005 Ron Dreslinski <rdreslin@umich.edu> Make ip_conntrack table size larger
H A Dsurge-client.rcS1645:4efe65fb0bce Thu Apr 28 16:13:00 EDT 2005 Ron Dreslinski <rdreslin@umich.edu> Make ip_conntrack table size larger
/gem5/src/arch/sparc/isa/formats/mem/
H A Dmem.isa4040:eb894f3fc168 Mon Feb 12 13:06:00 EST 2007 Ali Saidi <saidi@eecs.umich.edu> rename store conditional stuff as extra data so it can be used for conditional swaps as well
Add support for a twin 64 bit int load
Add Memory barrier and write barrier flags as appropriate
Make atomic memory ops atomic

src/arch/alpha/isa/mem.isa:
src/arch/alpha/locked_mem.hh:
src/cpu/base_dyn_inst.hh:
src/mem/cache/cache_blk.hh:
src/mem/cache/cache_impl.hh:
rename store conditional stuff as extra data so it can be used for conditional swaps as well
src/arch/alpha/types.hh:
src/arch/mips/types.hh:
src/arch/sparc/types.hh:
add a largest read data type for statically allocating read buffers in atomic simple cpu
src/arch/isa_parser.py:
Add support for a twin 64 bit int load
src/arch/sparc/isa/decoder.isa:
Make atomic memory ops atomic
Add Memory barrier and write barrier flags as appropriate
src/arch/sparc/isa/formats/mem/basicmem.isa:
add post access code block and define a twinload format for twin loads
src/arch/sparc/isa/formats/mem/blockmem.isa:
remove old microcoded twin load coad
src/arch/sparc/isa/formats/mem/mem.isa:
swap.isa replaces the code in loadstore.isa
src/arch/sparc/isa/formats/mem/util.isa:
add a post access code block
src/arch/sparc/isa/includes.isa:
need bigint.hh for Twin64_t
src/arch/sparc/isa/operands.isa:
add a twin 64 int type
src/cpu/simple/atomic.cc:
src/cpu/simple/atomic.hh:
src/cpu/simple/base.hh:
src/cpu/simple/timing.cc:
add support for twinloads
add support for swap and conditional swap instructions
rename store conditional stuff as extra data so it can be used for conditional swaps as well
src/mem/packet.cc:
src/mem/packet.hh:
Add support for atomic swap memory commands
src/mem/packet_access.hh:
Add endian conversion function for Twin64_t type
src/mem/physical.cc:
src/mem/physical.hh:
src/mem/request.hh:
Add support for atomic swap memory commands
Rename sc code to extradata
/gem5/util/stats/
H A Ddisplay.py1547:1f0c266940d4 Tue Mar 15 13:22:00 EST 2005 Nathan Binkert <binkertn@umich.edu> get rid of issequence and just use the isinstance builtin
/gem5/util/
H A Dtracediff4018:77a084a042bb Tue Feb 06 13:06:00 EST 2007 Steve Reinhardt <stever@eecs.umich.edu> Use perl FindBin package to set path to rundiff to the
directory where tracediff is.
/gem5/src/dev/alpha/
H A Dbackdoor.hh8789:a8b63a0ee14c Sun Nov 13 05:05:00 EST 2011 Gabe Black <gblack@eecs.umich.edu> SE/FS: Get rid of FULL_SYSTEM in dev.
/gem5/src/dev/x86/
H A Di8237.hh9808:13ffc0066b76 Thu Jul 11 22:57:00 EDT 2013 Steve Reinhardt <stever@gmail.com> dev: make BasicPioDevice take size in constructor

Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
H A Dspeaker.hh9808:13ffc0066b76 Thu Jul 11 22:57:00 EDT 2013 Steve Reinhardt <stever@gmail.com> dev: make BasicPioDevice take size in constructor

Instead of relying on derived classes explicitly assigning
to the BasicPioDevice pioSize field, require them to pass
a size value in to the constructor.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
/gem5/tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/
H A Dsystem.pc.com_1.terminal9901:13c5fea24be1 Wed Oct 02 05:03:00 EDT 2013 Andreas Sandberg <andreas@sandberg.pp.se> stats: Update x86 stats after x87 fixes

The updates to the x87 caused the stats for several regressions to
change. This was mainly caused by the addition of a working 32-bit and
80-bit FP load instruction and xsave support.
/gem5/src/base/
H A Dcp_annotate.hh8229:78bf55f23338 Fri Apr 15 13:44:00 EDT 2011 Nathan Binkert <nate@binkert.org> includes: sort all includes
/gem5/src/dev/mips/
H A Dmalta.hh5222:bb733a878f85 Tue Nov 13 16:58:00 EST 2007 Korey Sewell <ksewell@umich.edu> Add in files from merge-bare-iron, get them compiling in FS and SE mode
/gem5/src/dev/sparc/
H A Ddtod.hh3943:68e673d2db04 Sun Jan 28 13:26:00 EST 2007 Nathan Binkert <binkertn@umich.edu> Stick the conversion of python to unix time with all of
the other param code so that other functions can use it
as well.
H A Diob.hh8229:78bf55f23338 Fri Apr 15 13:44:00 EDT 2011 Nathan Binkert <nate@binkert.org> includes: sort all includes
H A Dmm_disk.hh8229:78bf55f23338 Fri Apr 15 13:44:00 EDT 2011 Nathan Binkert <nate@binkert.org> includes: sort all includes
/gem5/src/base/loader/
H A Decoff_object.hh11392:5967db4cff04 Thu Mar 17 13:34:00 EDT 2016 Brandon Potter <brandon.potter@amd.com> base: add symbol support for dynamic libraries

Libraries are loaded into the process address space using the
mmap system call. Conveniently, this happens to be a good
time to update the process symbol table with the library's
incoming symbols so we handle the table update from within the
system call.

This works just like an application's normal symbols. The only
difference between a dynamic library and a main executable is
when the symbol table update occurs. The symbol table update for
an executable happens at program load time and is finished before
the process ever begins executing. Since dynamic linking happens
at runtime, the symbol loading happens after the library is
first loaded into the process address space. The library binary
is examined at this time for a symbol section and that section
is parsed for symbol types with specific bindings (global,
local, weak). Subsequently, these symbols are added to the table
and are available for use by gem5 for things like trace
generation.

Checkpointing should work just as it did previously. The address
space (and therefore the library) will be recorded and the symbol
table will be entirely recorded. (It's not possible to do anything
clever like checkpoint a program and then load the program back
with different libraries with LD_LIBRARY_PATH, because the
library becomes part of the address space after being loaded.)
H A Daout_object.hh11392:5967db4cff04 Thu Mar 17 13:34:00 EDT 2016 Brandon Potter <brandon.potter@amd.com> base: add symbol support for dynamic libraries

Libraries are loaded into the process address space using the
mmap system call. Conveniently, this happens to be a good
time to update the process symbol table with the library's
incoming symbols so we handle the table update from within the
system call.

This works just like an application's normal symbols. The only
difference between a dynamic library and a main executable is
when the symbol table update occurs. The symbol table update for
an executable happens at program load time and is finished before
the process ever begins executing. Since dynamic linking happens
at runtime, the symbol loading happens after the library is
first loaded into the process address space. The library binary
is examined at this time for a symbol section and that section
is parsed for symbol types with specific bindings (global,
local, weak). Subsequently, these symbols are added to the table
and are available for use by gem5 for things like trace
generation.

Checkpointing should work just as it did previously. The address
space (and therefore the library) will be recorded and the symbol
table will be entirely recorded. (It's not possible to do anything
clever like checkpoint a program and then load the program back
with different libraries with LD_LIBRARY_PATH, because the
library becomes part of the address space after being loaded.)
H A Draw_object.hh11392:5967db4cff04 Thu Mar 17 13:34:00 EDT 2016 Brandon Potter <brandon.potter@amd.com> base: add symbol support for dynamic libraries

Libraries are loaded into the process address space using the
mmap system call. Conveniently, this happens to be a good
time to update the process symbol table with the library's
incoming symbols so we handle the table update from within the
system call.

This works just like an application's normal symbols. The only
difference between a dynamic library and a main executable is
when the symbol table update occurs. The symbol table update for
an executable happens at program load time and is finished before
the process ever begins executing. Since dynamic linking happens
at runtime, the symbol loading happens after the library is
first loaded into the process address space. The library binary
is examined at this time for a symbol section and that section
is parsed for symbol types with specific bindings (global,
local, weak). Subsequently, these symbols are added to the table
and are available for use by gem5 for things like trace
generation.

Checkpointing should work just as it did previously. The address
space (and therefore the library) will be recorded and the symbol
table will be entirely recorded. (It's not possible to do anything
clever like checkpoint a program and then load the program back
with different libraries with LD_LIBRARY_PATH, because the
library becomes part of the address space after being loaded.)
/gem5/tests/quick/fs/80.netperf-stream/ref/alpha/linux/twosys-tsunami-simple-atomic/
H A Dconfig.ini9449:56610ab73040 Mon Jan 07 13:05:00 EST 2013 Ali Saidi <Ali.Saidi@ARM.com> stats: update stats for previous changes.
/gem5/tests/quick/se/00.hello/ref/alpha/linux/simple-atomic/
H A Dconfig.ini11390:f40859930028 Thu Mar 17 13:32:00 EDT 2016 Steve Reinhardt <steve.reinhardt@amd.com> stats: update stats for ld.so support

Additional auxv entries leads to more instructions in start-up
while walking the list, along with different cache conflicts
wrt stack entries.
/gem5/tests/quick/se/00.hello/ref/alpha/linux/simple-timing/
H A Dconfig.ini11390:f40859930028 Thu Mar 17 13:32:00 EDT 2016 Steve Reinhardt <steve.reinhardt@amd.com> stats: update stats for ld.so support

Additional auxv entries leads to more instructions in start-up
while walking the list, along with different cache conflicts
wrt stack entries.
/gem5/tests/quick/se/00.hello/ref/mips/linux/simple-timing/
H A Dconfig.ini11390:f40859930028 Thu Mar 17 13:32:00 EDT 2016 Steve Reinhardt <steve.reinhardt@amd.com> stats: update stats for ld.so support

Additional auxv entries leads to more instructions in start-up
while walking the list, along with different cache conflicts
wrt stack entries.
/gem5/tests/quick/se/00.hello/ref/mips/linux/simple-atomic/
H A Dconfig.ini11390:f40859930028 Thu Mar 17 13:32:00 EDT 2016 Steve Reinhardt <steve.reinhardt@amd.com> stats: update stats for ld.so support

Additional auxv entries leads to more instructions in start-up
while walking the list, along with different cache conflicts
wrt stack entries.

Completed in 115 milliseconds

<<41424344454647484950>>