14081:f99ed78e5263 |
12-Jun-2019 |
Javier Bueno Hedo <javier.bueno@metempsy.com> |
cpu: Added the Multiperspective Perceptron Predictor with TAGE (8KB and 64KB)
Described by the following article: Jiménez, D. "Multiperspective perceptron predictor with TAGE." Championship Branch Prediction (CBP-5) (2016).
Change-Id: Ica3c121a4c94657d9015573085040e8a1984b069 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19188 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> |
14034:937e704c6807 |
13-Feb-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added the Multiperspective Perceptron Predictor (8KB and 64KB)
Described by the following article: Jiménez, D. "Multiperspective perceptron predictor." Championship Branch Prediction (CBP-5) (2016).
Change-Id: Iaa68ead7696e0b6ba05b4417d0322e8053e10d30 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/15495 Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13981:577196ddd040 |
02-May-2019 |
Gabe Black <gabeblack@google.com> |
arch, base, cpu, dev, mem, sim: Remove #if 0-ed out code.
This code will be preserved through version control, but otherwise creates clutter and will rot in place since it's never compiled.
Change-Id: Id265f6deac445116843956ea5cf1210d8127274e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18608 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13960:e1ab93677110 |
10-Apr-2019 |
Daniel <odanrc@yahoo.com.br> |
base: Move SatCounter to base directory
Saturating counters are used by many objects, not only the cpu predictors. Therefore, move the class to the base folder so that it can be more easily used.
Change-Id: I26f799324bdd8720ab8834c72a2002149cee777c Signed-off-by: Daniel <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17993 Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> |
13959:ea907b02c800 |
05-Apr-2019 |
Daniel <odanrc@yahoo.com.br> |
cpu: Revamp saturating counters
Revamp the SatCounter class, improving comments, implementing increment, decrement and read operators to solve an old todo, and adding missing error checking.
Change-Id: Ia057c423c90652ebd966b6b91a3471b17800f933 Signed-off-by: Daniel <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17992 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13957:25e9c77a8a99 |
06-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Make the indirect predictor into a SimObject
Change-Id: Ice6549773def7d3e944fae450d4a079bc351e2ba Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/15319 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> |
13831:4fba790d88be |
06-Mar-2019 |
Andrea Mondelli <Andrea.Mondelli@ucf.edu> |
misc: Removed inconsistency in O3* debug msgs
Added consistency in the DEBUG message form, to allow a better parsing. Fixed sn/tid type parameter. Removed some annoying newlines
Change-Id: I4761c49fc12b874a7d8b46779475b606865cad4b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17248 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13810:f50e3b82df73 |
01-Mar-2019 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed the indirect branch predictor GHR handling
The internal indirect predictor global history was not being updated properly, resulting in higher than expected miss rates
Also added a parameter to set the size of the indirect predictor GHR
Change-Id: Ibc797816974cba6719da65122801e8919559a003 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reported-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/16928 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Andrea Mondelli <Andrea.Mondelli@ucf.edu> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13733:e0829ba19545 |
26-Feb-2019 |
Srikant Bharadwaj <srikant.bharadwaj@amd.com> |
cpu: Fix indirect branch history updates
Recent changes to indirect branch predictor interface accesses non-existent buffers even when indirect predictor is not in use.
Change-Id: I0df9ac4d5f6f3cb63e4d1bd36949c27f7611eef6 Reviewed-on: https://gem5-review.googlesource.com/c/16668 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> |
13685:bb3377c81303 |
29-Jan-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Added 8KB and 64KB TAGE-SC-L branch predictor
The original paper of the branch predictor can be found here: http://www.jilp.org/cbp2016/paper/AndreSeznecLimited.pdf
Change-Id: I684863752407685adaacedebb699205c3559c528 Reviewed-on: https://gem5-review.googlesource.com/c/14855 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13654:dc3878f03a0c |
06-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Proposal for changing the indirect branch predictor interface
Now the indirect branch predictor handles its own GHR instead of getting the one from the direction predictor.
Also, now the commit method of the indirect predictor is called for every pending branch on an update, as the indirect predictors may want to update their interal structures/histories with the information of each branch.
Change-Id: I7053fbea42a53960a3bc1ba32912cc99c160511e Reviewed-on: https://gem5-review.googlesource.com/c/15318 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13627:ec1395943cd2 |
28-Jan-2019 |
Javier Bueno <javier.bueno@metempsy.com> |
cpu: Made the Loop Predictor a SimObject
The Loop Predictor implementation is now a SimObject so that other branch predictors can easily use it (including LTAGE, which is now using it). It has also been updated with the latest available loop predictor implementation from Andre Seznec:
http://www.irisa.fr/alf/downloads/seznec/TAGE-GSC-IMLI.tar
Change-Id: I60ad079a2c49b00a1f84d5cfd3611631883a4b57 Reviewed-on: https://gem5-review.googlesource.com/c/15775 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13626:d6a6358aa6db |
05-Jan-2019 |
Jairo Balart <jairo.balart@metempsy.com> |
cpu: Made TAGE a SimObject that can be used by other predictors
The TAGE implementation is now a SimObject so that other branch predictors can easily use it. It has also been updated with the latest available TAGE implementation from Andre Seznec:
http://www.irisa.fr/alf/downloads/seznec/TAGE-GSC-IMLI.tar
Change-Id: I2251b8b2d7f94124f9955f52b917dc3b064f090e Reviewed-on: https://gem5-review.googlesource.com/c/15317 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13494:ed4ed5351b16 |
01-Dec-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed typos in parameter/stats descriptions
Change-Id: I7b3274a3e37128da35f497da150af08343e97ee6 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14795 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13493:91ae6168ef27 |
23-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Added parameters to enable/disable features in LTAGE
They are for the following features in the LTAGE loop predictor: - Hashing for calculating the loop table entry - Add direction information - Add speculative iteration number information
Change-Id: I395f4526163ee0d0229d1e87cde2bb046f1dd43a Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14597 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Louis Delhez <ldelhez@ucla.edu> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13455:56e25a5f9603 |
22-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Added new stats to TAGE and LTAGE branch predictors
They are basically used to tell wich component of the predictor is providing the prediction and whether it is correct or wrong
Change-Id: I7b3db66535f159091f1b37d70c2d942d50b20fb2 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14535 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13454:19a5b4fb1f1f |
19-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: split LTAGE implementation into a base TAGE and a derived LTAGE
The new derived LTAGE is equivalent to the original LTAGE implementation The default values of the TAGE branch predictor match the settings of the 8C-TAGE configuration described in https://www.jilp.org/vol8/v8paper1.pdf
Change-Id: I8323adbfd5c9a45db23cfff234218280e639f9ed Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14435 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13444:26f81be73cb7 |
17-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Made LTAGE parameters configurable
This includes TAGE tag sizes, TAGE table sizes, U counters reset period, loop predictor associativity, path history size, the USE_ALT_ON_NA size and the WITHLOOP size
Change-Id: I935823f0a5794f5d55b744263798897a813dc1bd Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14417 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13443:a111cb197897 |
17-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed useful counter handling in LTAGE
Increased to 2 bits of useful counter per TAGE entry as described in the LTAGE paper (and made the size configurable)
Changed how the useful counters are incremented/decremented as described in the LTAGE paper
Change-Id: I8c692cc7c180d29897cb77781681ff498a1d16c8 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14215 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13442:5314c50529a5 |
11-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixes on the loop predictor part of LTAGE
Fixed the following fields of the loop predictor entries as described on the LTAGE paper: - Age counter (it was 3 bits and it should be 8 bits) - Tag (it was 16 bits and it should be 14 bits). Also some times it used int variables and some times uint16_t, leading to wrong behaviour - Confidence counter (it was 2 bits ins some parts of the code and 3 bits in some other parts. It should be 2 bits) - Iteration counters (they were 16 bits and they should be 14 bits) All the new sizes are now configurable
Change-Id: I8884c7454c1e510b65160eb4d5749d3259d34096 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14216 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
13433:fd8c49bea81f |
09-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fix LTAGE max number of allocations on update
The LTAGE paper states that only one TAGE entry can be allocated when updating
Change-Id: I6cfb4d80ce835e93d4bf5099ef88a7d425abaddd Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14195 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13432:6ce67b7e6e44 |
07-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
configs: Added an option for choosing branch predictor type
Added the parameter "--bp-type" to set the branch predictor type Added the parameter "--list-bp-types" to list all the available branch predictor types
Change-Id: Ia6aae90c784aef359b6d8233c8383cd7a871aca1 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14015 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13420:5cb2b90e1cb5 |
08-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed ratio of pred to hyst bits for LTAGE Bimodal
The LTAGE paper states 1 hyst bit shared for 4 pred bits. Made this ratio configurable use 4 by default. Also changed the Bimodal structure to use two std::vector<bool> (one for pred and one for hyst bits)
Change-Id: I6793e8e358be01b75b8fd181ddad50f259862d79 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14120 Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
13413:b84a7c832ead |
07-Nov-2018 |
Pau Cabre <pau.cabre@metempsy.com> |
cpu: Fixed PC shifting on LTAGE branch predictor
The PC needs to be shifted according to the instShiftAmt parameter
Change-Id: I272619c093695b56cf7f8ff7163e3b5d23205d16 Signed-off-by: Pau Cabre <pau.cabre@metempsy.com> Reviewed-on: https://gem5-review.googlesource.com/c/14035 Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com> Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com> |
12334:e0ab29a34764 |
30-Nov-2017 |
Gabe Black <gabeblack@google.com> |
misc: Rename misc.(hh|cc) to logging.(hh|cc)
These files aren't a collection of miscellaneous stuff, they're the definition of the Logger interface, and a few utility macros for calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1 Reviewed-on: https://gem5-review.googlesource.com/6226 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Gabe Black <gabeblack@google.com> |
12180:72159e1f6701 |
24-Aug-2017 |
Rico Amslinger <rico.amslinger@informatik.uni-augsburg.de> |
cpu: Fix bi-mode branch predictor thresholds
When different sizes were set for the choice and global saturation counter (e.g. ex5_big), the threshold calculation used the wrong size. Thus the branch predictor always predicted "not taken" for choice > global.
Change-Id: I076549ff1482e2280cef24a0d16b7bb2122d4110 Reviewed-on: https://gem5-review.googlesource.com/4560 Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> |
11793:ef606668d247 |
09-Nov-2016 |
Brandon Potter <brandon.potter@amd.com> |
style: [patch 1/22] use /r/3648/ to reorganize includes |
11784:00fd5dce5e7e |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: implement an L-TAGE branch predictor
This patch implements an L-TAGE predictor, based on André Seznec's code available from CBP-2 (http://hpca23.cse.tamu.edu/taco/camino/cbp2/cbp-src/realistic-seznec.h).
Signed-off-by Jason Lowe-Power <jason@lowepower.com> |
11783:f94c14fd6561 |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: disallow speculative update of branch predictor tables (o3)
The Minor and o3 cpu models share the branch prediction code. Minor relies on the BPredUnit::squash() function to update the branch predictor tables on a branch mispre- diction. This is fine because Minor executes in-order, so the update is on the correct path. However, this causes the branch predictor to be updated on out-of-order branch mispredictions when using the o3 model, which should not be the case.
This patch guards against speculative update of the branch prediction tables. On a branch misprediction, BPredUnit::squash() calls BpredUnit::update(..., squashed = true). The underlying branch predictor tests against the value of squashed. If it is true, it restores any speculatively updated internal state it might have (e.g., global/local branch history), then returns. If false, it updates its prediction tables. Previously, exist- ing predictors did not test against the "squashed" parameter.
To accomodate for this change, the Minor model must now call BPredUnit::squash() then BPredUnit::update(..., squashed = false) on branch mispredictions. Before, calling BpredUnit::squash() performed the prediction tables update.
The effect is a slight MPKI improvement when using the o3 model. A further patch should perform the same modifications for the indirect target predictor and BTB (less critical).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11782:c2e1ead33662 |
21-Dec-2016 |
Arthur Perais <arthur.perais@inria.fr> |
cpu: correct comments in tournament branch predictor
The tournament predictor is presented as doing speculative update of the global history and non-speculative update of the local history used to generate the branch prediction. However, the code does speculative update of both histories.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com> |
11721:b0853929e223 |
30-Nov-2016 |
Jason Lowe-Power <jason@lowepower.com> |
cpu: Remove branch predictor function predictInOrder
This function was used by the now-defunct InOrderCPU model. Since this model is no longer in gem5, this function was not called from anywhere in the code. |
11523:81332eb10367 |
06-Jun-2016 |
David Guillen Fandos <david.guillen@arm.com> |
stats: Fixing regStats function for some SimObjects
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11434:b5aed9d2d54e |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Implement per-thread GHRs
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so.
This is a re-spin of fb51231 after the revert (bd1c6789). |
11433:72b075cdc336 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add an indirect branch target predictor
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209
This is a re-spin of fb9d142 after the revert (bd1c6789). |
11432:4209ec56e923 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix BTB threading oversight
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in 1-thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch missing. |
11429:cf5af0cc3be4 |
06-Apr-2016 |
Andreas Sandberg <andreas.sandberg@arm.com> |
Revert power patch sets with unexpected interactions
The following patches had unexpected interactions with the current upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models 831c7f2f9e39: power: Low-power idle power state for idle CPUs 4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> |
11427:fb512311295e |
05-Apr-2016 |
Curtis Dunham <Curtis.Dunham@arm.com> |
cpu: Implement per-thread GHRs
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so. |
11426:fb9d14204674 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Add an indirect branch target predictor
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209 |
11425:e24d92c62860 |
05-Apr-2016 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix BTB threading oversight
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in 1-thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch missing. |
11321:02e930db812d |
06-Feb-2016 |
Steve Reinhardt <steve.reinhardt@amd.com> |
style: fix missing spaces in control statements
Result of running 'hg m5style --skip-all --fix-control -a'. |
11294:a368064a2ab5 |
11-Jan-2016 |
Andreas Hansson <andreas.hansson@arm.com> |
scons: Enable -Wextra by default
Make best use of the compiler, and enable -Wextra as well as -Wall. There are a few issues that had to be resolved, but they are all trivial. |
11169:44b5c183c3cd |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Add explicit overrides and fix other clang >= 3.5 issues
This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication.
As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables). |
11168:f98eb2da15a4 |
12-Oct-2015 |
Andreas Hansson <andreas.hansson@arm.com> |
misc: Remove redundant compiler-specific defines
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions. |
11098:8e96720a382c |
15-Sep-2015 |
Andrew Lukefahr <lukefahr@umich.edu> |
cpu: pred: Local Predictor Reset in Tournament Predictor
When a branch gets squashed, it's speculative branch predictor state should get rolled back in squash(). However, only the globalHistory state was being rolled back. This patch adds (at least some) support for rolling back the local predictor state also.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10785:f56c10663a01 |
13-Apr-2015 |
Dibakar Gope <gope@wisc.edu> |
cpu: re-organizes the branch predictor structure.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
10462:e975e8afba8b |
16-Oct-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
cpu: Add branch predictor PMU probe points
This changeset adds probe points that can be used to implement PMU counters for branch predictor stats. The following probes are supported:
* BPRedUnit::ppBranches / Branches * BPRedUnit::ppMisses / Misses |
10417:710ee116eb68 |
27-Sep-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
arch: Use const StaticInstPtr references where possible
This patch optimises the passing of StaticInstPtr by avoiding copying the reference-counting pointer. This avoids first incrementing and then decrementing the reference-counting pointer. |
10335:1b627a6ddac0 |
03-Sep-2014 |
Dam Sunwoo <dam.sunwoo@arm.com> |
cpu: fix bimodal predictor to use correct global history reg
A small bug in the bimodal predictor caused significant degradation in performance on some benchmarks. This was caused by using the wrong globalHistoryReg during the update phase. This patches fixes the bug and brings the performance to normal level. |
10330:f54586c894e3 |
03-Sep-2014 |
Mitch Hayenga <mitch.hayenga@arm.com> |
cpu: Fix incorrect speculative branch predictor behavior
When a branch mispredicted gem5 would squash all history after and including the mispredicted branch. However, the mispredicted branch is still speculative and its history is required to rollback state if another, older, branch mispredicts. This leads to things like RAS corruption. |
10281:c7187ee80868 |
13-Aug-2014 |
Andreas Sandberg <Andreas.Sandberg@ARM.com> |
scons: Build the branch predictor for all CPUs
The branch predictor is normally only built when a CPU that uses a branch predictor is built. The list of CPUs is currently incomplete as the simple CPUs support branch predictors (for warming, branch stats, etc). In practice, all CPU models now use branch predictors, so this changeset removes the CPU model check and replaces it with a check for the NULL ISA. |
10273:6e6557085eb7 |
13-Aug-2014 |
Andreas Hansson <andreas.hansson@arm.com> |
cpu: Modernise the branch predictor (STL and C++11)
This patch does some minor house keeping of the branch predictor by adopting STL containers, and shifting some iterator to use range-based for loops.
The predictor history is also changed from a list to a deque as we never to insertion/deletion other than at the front and back. |
10259:ebb376f73dd2 |
23-Jul-2014 |
Andrew Bardsley <Andrew.Bardsley@arm.com> |
cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark | Stat host_seconds (s) ---------------+--------v--------v-------- (on ARM, opt) | simple | o3 | minor | timing | timing | timing ---------------+--------+--------+-------- 10.linux-boot | 169 | 1883 | 1075 10.mcf | 117 | 967 | 491 20.parser | 668 | 6315 | 3146 30.eon | 542 | 3413 | 2414 40.perlbmk | 2339 | 20905 | 11532 50.vortex | 122 | 1094 | 588 60.bzip2 | 2045 | 18061 | 9662 70.twolf | 207 | 2736 | 1036 |
10244:d2deb51a4abf |
30-Jun-2014 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: implement a bi-mode branch predictor |
9944:4ff1c5c6dcbc |
17-Oct-2013 |
Matt Horsnell <matt.horsnell@ARM.com> |
cpu: add consistent guarding to *_impl.hh files. |
9691:b1be1df904c9 |
14-May-2013 |
Anthony Gutierrez <atgutier@umich.edu> |
cpu: remove local/globalHistoryBits params from branch pred
having separate params for the local/globalHistoryBits and the local/globalPredictorSize can lead to inconsistencies if they are not carefully set. this patch dervies the number of bits necessary to index into the local/global predictors based on their size.
the value of the localHistoryTableSize for the ARM O3 CPU has been increased to 1024 from 64, which is more accurate for an A15 based on some correlation against A15 hardware. |
9480:d059f8a95a42 |
24-Jan-2013 |
Nilay Vaish <nilay@cs.wisc.edu>, Timothy Jones <timothy.jones@cl.cam.ac.uk> |
branch predictor: move out of o3 and inorder cpus This patch moves the branch predictor files in the o3 and inorder directories to src/cpu/pred. This allows sharing the branch predictor across different cpu models.
This patch was originally posted by Timothy Jones in July 2010 but never made it to the repository. |
9360:515891d9057a |
06-Dec-2012 |
Erik Tomusk <E.Tomusk@sms.ed.ac.uk> |
TournamentBP: Fix some bugs with table sizes and counters globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled. globalHistoryBits controls how much history is kept, global and choice predictor sizes control how much of that history is used when accessing predictor tables. This way, global and choice predictors can actually be different sizes, and it is no longer possible to walk off the predictor arrays and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating counters, so that taken/not taken decisions are correct even when the predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu> |
9327:07a22ace275d |
02-Nov-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
o3: Fix a couple of issues with the local predictor.
Fix some issues with the local predictor and the way it's indexed. |
8843:7d3ac6813147 |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BPred: Fix RAS to handle predicated call/return instructions.
Change RAS to fix issues with predicated call/return instructions. Handled all cases in the life of a predicated call and return instruction. |
8842:a02932e2e73d |
13-Feb-2012 |
Mrinmoy Ghosh <mrinmoy.ghosh@arm.com> |
BP: Fix several Branch Predictor issues. 1. Updates the Branch Predictor correctly to the state just after a mispredicted branch, if a squash occurs. 2. If a BTB does not find an entry, the branch is predicted not taken. The global history is modified to correctly reflect this prediction. 3. Local history is now updated at the fetch stage instead of execute stage. 4. In the Update stage of the branch predictor the local predictors are now correctly updated according to the state of local history during fetch stage.
This patch also improves performance by as much as 17% on some benchmarks |
8487:c7982323e834 |
07-Aug-2011 |
Ali Saidi <Ali.Saidi@ARM.com> |
O3: Fix uninitialized variable in the tournament branch predictor. |
8463:7a48916a32a8 |
10-Jul-2011 |
Mrinmoy Ghosh <Mrinmoy.Ghosh@arm.com> |
Branch predictor: Fixes the tournament branch predictor.
Branch predictor could not predict a branch in a nested loop because: 1. The global history was not updated after a mispredict squash. 2. The global history was updated in the fetch stage. The choice predictors that were updated used the changed global history. This is incorrect, as it incorporates the state of global history after the branch in encountered. Fixed update to choice predictor using the global history state before the branch happened. 3. The global predictor table was also updated using the global history state before the branch happened as above.
Additionally, parameters to initialize ctr and history size were reversed. |
8335:9228e00459d4 |
02-Jun-2011 |
Nathan Binkert <nate@binkert.org> |
scons: rename TraceFlags to DebugFlags |
8232:b28d06a175be |
15-Apr-2011 |
Nathan Binkert <nate@binkert.org> |
trace: reimplement the DTRACE function so it doesn't use a vector At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help |
7720:65d338a8dba4 |
31-Oct-2010 |
Gabe Black <gblack@eecs.umich.edu> |
ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way.
PC type:
Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a *Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM.
These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later.
Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses.
Advancing the PC:
The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements.
One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now.
Variable length instructions:
To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back.
ISA parser:
To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out.
Return address stack:
The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works.
Change in stats:
There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS.
TODO:
Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC. |
7082:070529b41c1e |
13-May-2010 |
Maximilien Breughe <Maximilien.Breughe@elis.ugent.be> |
BPRED: Fixed the treshold-bug in the tournament predictor.
Suppose the saturating counters of a branch predictor contain n bits. When the counter is between 0 and (2^(n-1) - 1), boundaries included, the branch is predicted as not taken. When the counter is between 2^(n-1) and (2^n - 1), boundaries included, the branch is predicted as taken. |
6227:a17798f2a52c |
05-Jun-2009 |
Nathan Binkert <nate@binkert.org> |
types: clean up types, especially signed vs unsigned |
6226:f1076450ab2b |
05-Jun-2009 |
Nathan Binkert <nate@binkert.org> |
move: put predictor includes and cc files into the same place |