memory_system.doxygen revision 13892
# Copyright (c) 2012 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder.  You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Author: Djordje Kovacevic

/*! \page gem5MemorySystem Memory System in gem5

  \tableofcontents

  This document describes the memory subsystem in gem5, with a focus on program
  flow during the CPU’s simple memory transactions (read or write).


  \section gem5_MS_MH MODEL HIERARCHY

  The model used in this document consists of two out-of-order (O3)
  ARM v7 CPUs with corresponding L1 data caches and Simple Memory. It is
  created by running gem5 with the following parameters:

  configs/example/fs.py --caches --cpu-type=arm_detailed --num-cpus=2

  In gem5, objects derived from Simulation Object (SimObject) are the basic
  blocks for building the memory system. They are connected via ports with an
  established master/slave hierarchy. Data flow is initiated on the master
  port, while response messages and snoop queries are sent from the slave port.
  The following figure shows the hierarchy of Simulation Objects used in this
  document:

  \image html "gem5_MS_Fig1.PNG" "Simulation Object hierarchy of the model" width=3cm

  \section gem5_CPU CPU

  Describing the O3 CPU model in detail is outside the scope of this document,
  so here are only a few relevant notes about the model:

  <b>Read access</b> is initiated by sending a message to the port towards the
  DCache object. If DCache rejects the message (for being blocked or busy), the
  CPU will flush the pipeline and the access will be re-attempted later on. The
  access is completed upon receiving the reply message (ReadResp) from DCache.

  <b>Write access</b> is initiated by storing the request into the store buffer,
  whose contents are emptied and sent to DCache on every tick. As with reads,
  DCache may reject the request. Write access is completed when the write reply
  (WriteResp) message is received from DCache.

  Load & store buffers (for read and write access) don’t impose any
  restriction on the number of active memory accesses. Therefore, the maximum
  number of the CPU’s outstanding memory access requests is limited not by the
  CPU Simulation Object but by the underlying memory system model.

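  The write path described above can be pictured with a small toy model. This
  is a plain Python sketch, not gem5 code: `ToyDCache`, `ToyStoreBuffer` and
  `try_send` are hypothetical names, the capacity of two requests is arbitrary,
  and stopping at the first rejected request is an assumption about how the
  retry works.

```python
from collections import deque

class ToyDCache:
    """Hypothetical stand-in for DCache: accepts requests until it is full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.accepted = []

    def try_send(self, request):
        # Reject the request when the cache is blocked or busy.
        if len(self.accepted) >= self.capacity:
            return False
        self.accepted.append(request)
        return True

class ToyStoreBuffer:
    """Hypothetical store buffer that is drained towards DCache every tick."""
    def __init__(self, dcache):
        self.dcache = dcache
        self.pending = deque()

    def write(self, request):
        # The buffer itself never limits the number of outstanding writes.
        self.pending.append(request)

    def tick(self):
        # Send buffered writes in order; stop at the first rejection
        # (assumption) and retry the same request on a later tick.
        while self.pending and self.dcache.try_send(self.pending[0]):
            self.pending.popleft()

dcache = ToyDCache(capacity=2)
store_buffer = ToyStoreBuffer(dcache)
for name in ("w0", "w1", "w2"):
    store_buffer.write(name)
store_buffer.tick()
```

  After the first tick the toy cache has taken two writes and the third one
  stays in the buffer until the cache accepts requests again.
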
  <b>Split memory access</b> is implemented.

  The message sent by the CPU contains the memory type (Normal, Device, Strongly
  Ordered and cacheability) of the accessed region. However, this is not used
  by the rest of the model, which takes a more simplified approach towards
  memory types.

  \section gem5_DCache DATA CACHE OBJECT

  The Data Cache object implements a standard cache structure:

  \image html "gem5_MS_Fig2.PNG" "DCache Simulation Object" width=3cm

  <b>Cached memory reads</b> that match a particular cache tag (with Valid & Read
  flags) will be completed (by sending ReadResp to the CPU) after a configurable
  time. Otherwise, the request is forwarded to the Miss Status and Handling
  Register (MSHR) block.

  <b>Cached memory writes</b> that match a particular cache tag (with Valid, Read
  & Write flags) will be completed (by sending WriteResp to the CPU) after the
  same configurable time. Otherwise, the request is forwarded to the Miss Status
  and Handling Register (MSHR) block.

  <b>Uncached memory reads</b> are forwarded to the MSHR block.

  <b>Uncached memory writes</b> are forwarded to the WriteBuffer block.

  <b>Evicted (& dirty) cache lines</b> are forwarded to the WriteBuffer block.

  The CPU’s access to Data Cache is blocked if any of the following is true:

    - The MSHR block is full. (The size of the MSHR’s buffer is configurable.)

    - The WriteBuffer block is full. (The size of the block’s buffer is
      configurable.)

    - The number of outstanding memory accesses against the same cache line
      has reached a configurable threshold value (see MSHR and Write Buffer
      Queues for details).

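  The three blocking conditions can be expressed as a single predicate. The
  sketch below is illustrative Python, not the gem5 implementation; the
  function name, the dictionary layout and the three limit parameters are
  hypothetical stand-ins for the configurable values mentioned above.

```python
def dcache_is_blocked(mshrs, write_buffer, mshr_size, write_buffer_size,
                      targets_per_line):
    """Return True if any of the three blocking conditions holds.

    mshrs maps a cache-line address to the list of accesses waiting on it;
    write_buffer is the list of entries currently queued for writing.
    """
    if len(mshrs) >= mshr_size:
        return True  # the MSHR block is full
    if len(write_buffer) >= write_buffer_size:
        return True  # the write buffer is full
    # Too many outstanding accesses against a single cache line.
    return any(len(targets) >= targets_per_line for targets in mshrs.values())
```

  For example, with a per-line threshold of two, a second outstanding access
  to the same line blocks the cache even though both queues still have room.
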
  Data Cache in the blocked state will reject any request from the slave port
  (from the CPU) regardless of whether it would result in a cache hit or miss.
  Note that incoming messages on the master port (response messages and snoop
  requests) are never rejected.

  A cache hit on an uncacheable memory region (unpredictable behaviour according
  to the ARM ARM) will invalidate the cache line and fetch data from memory.

  \subsection gem5_MS_TAndDBlock Tags & Data Block

  Cache lines (referred to as blocks in the source code) are organised into sets
  with configurable associativity and size. They have the following status flags:
    - <b>Valid.</b> The line holds data and its address tag is valid.
    - <b>Read.</b> No read request will be accepted without this flag being set.
      For example, a cache line is valid and unreadable while it waits for the
      Write flag in order to complete a write access.
    - <b>Write.</b> The line may accept writes. A cache line with the Write flag
      set is in the Unique state: no other cache holds a copy.
    - <b>Dirty.</b> The line needs a writeback when evicted.

  A read access will hit a cache line if the address tags match and the Valid
  and Read flags are set. A write access will hit a cache line if the address
  tags match and the Valid, Read and Write flags are set.

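  The hit rules above reduce to two flag checks. The following Python sketch
  only restates them; `BlkFlag`, `is_read_hit` and `is_write_hit` are
  hypothetical names chosen for illustration and are not taken from the gem5
  sources.

```python
from enum import Flag, auto

class BlkFlag(Flag):
    """Status flags of a cache line (names are illustrative)."""
    VALID = auto()
    READ = auto()
    WRITE = auto()
    DIRTY = auto()

def is_read_hit(tag_matches, flags):
    # A read needs a matching tag plus the Valid and Read flags.
    needed = BlkFlag.VALID | BlkFlag.READ
    return tag_matches and (flags & needed) == needed

def is_write_hit(tag_matches, flags):
    # A write additionally needs the Write flag (Unique state).
    needed = BlkFlag.VALID | BlkFlag.READ | BlkFlag.WRITE
    return tag_matches and (flags & needed) == needed

shared_line = BlkFlag.VALID | BlkFlag.READ
unique_line = shared_line | BlkFlag.WRITE
```

  A line that is only Valid and Read satisfies a read but counts as a miss for
  a write, which is the case that later leads to an upgrade request.
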
  \subsection gem5_MS_Queues MSHR and Write Buffer Queues

  The Miss Status and Handling Register (MSHR) queue holds the list of the CPU’s
  outstanding memory requests that require read access to the lower memory
  level. They are:
    - Cached Read misses.
    - Cached Write misses.
    - Uncached reads.

  The WriteBuffer queue holds the following memory requests:
    - Uncached writes.
    - Writeback from evicted (& dirty) cache lines.

  \image html "gem5_MS_Fig3.PNG" "MSHR and Write Buffer Blocks" width=6cm

  Each memory request is assigned to a corresponding MSHR object (READ or WRITE
  on the diagram above) that represents a particular block (cache line) of
  memory that has to be read or written in order to complete the command(s). As
  shown in the figure above, cached reads/writes against the same cache line
  share a common MSHR object and will be completed with a single memory access.

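  The sharing of one MSHR object per cache line can be sketched as a grouping
  step. This is illustrative Python only: the 64-byte line size, the function
  name and the addresses are assumptions made for the example and do not come
  from the figure.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes

def group_by_cache_line(requests, line_size=LINE_SIZE):
    """Map each cache-line base address to the requests sharing one MSHR."""
    mshrs = {}
    for order, addr in requests:
        line = addr - (addr % line_size)
        mshrs.setdefault(line, []).append(order)
    return mshrs

# (order number, address) pairs of cached accesses that missed (made-up data)
misses = [(1, 0x1000), (3, 0x2008), (5, 0x1010), (10, 0x103C)]
mshrs = group_by_cache_line(misses)
```

  Here accesses 1, 5 and 10 fall into the same line and would be served by a
  single read of that line, while access 3 gets an MSHR object of its own.
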
  The size of the block (and therefore the size of the read/write access to
  lower memory) is:
    - The size of a cache line for cached accesses & writebacks;
    - As specified in the CPU instruction for uncached accesses.

  In general, the Data Cache model distinguishes between just two memory types:
    - Normal Cached memory. It is always treated as write back, read and write
      allocate.
    - Normal uncached, Device and Strongly Ordered types are treated equally
      (as uncached memory).

  \subsection gem5_MS_Ordering Memory Access Ordering

  A unique order number is assigned to each CPU read/write request (as they
  appear on the slave port). The order number of an MSHR object is copied from
  the first read/write assigned to it.

  Memory reads/writes from each of the two queues (MSHR and WriteBuffer) are
  executed in order (according to the assigned order number). When neither
  queue is empty, the model will execute a memory read from the MSHR block
  unless WriteBuffer is full. It will, however, always preserve the order of
  reads/writes on the same (or overlapping) memory cache line (block).

  In summary (access numbers refer to the figure above):
    - Order of accesses to cached memory is not preserved unless they target
      the same cache line. For example, accesses #1, #5 & #10 will
      complete simultaneously in the same tick (still in order). Access
      #5 will complete before #3.
    - Order of all uncached memory writes is preserved. Write#6 always
      completes before Write#13.
    - Order of all uncached memory reads is preserved. Read#2 always completes
      before Read#8.
    - The order of an uncached read and an uncached write is not necessarily
      preserved, unless their access regions overlap. Therefore, Write#6
      always completes before Read#8 (they target the same memory block).
      However, Write#13 may complete before Read#8.


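  The choice between the two queues can be sketched as follows. This is a
  simplified Python illustration, not the gem5 scheduler: `conflicts` and
  `next_access` are hypothetical names, overlap is approximated as "same
  64-byte line", and each queue is assumed to be a list of (order, address)
  pairs already sorted by order number.

```python
def conflicts(entry, queue, line_size):
    """True if queue holds an older access to the same line as entry."""
    order, addr = entry
    return any(o < order and a // line_size == addr // line_size
               for o, a in queue)

def next_access(mshr_queue, write_queue, write_buffer_size, line_size=64):
    """Pick the next entry to send to lower memory, or None if both are empty."""
    if not mshr_queue or not write_queue:
        queue = mshr_queue or write_queue
        return queue[0] if queue else None
    read, write = mshr_queue[0], write_queue[0]
    if len(write_queue) >= write_buffer_size:
        # Full write buffer: drain a write unless an older read conflicts.
        return read if conflicts(write, mshr_queue, line_size) else write
    # Otherwise prefer the read unless an older write conflicts.
    return write if conflicts(read, write_queue, line_size) else read
```

  With a read of order 8 and writes of order 6 and 13, where 6 and 8 share a
  line, the sketch sends write 6 first and then prefers read 8 over write 13
  as long as the write buffer is not full.
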
  \section gem5_MS_Bus COHERENT BUS OBJECT

  \image html "gem5_MS_Fig4.PNG" "Coherent Bus Object" width=3cm

  The Coherent Bus object provides basic support for the snoop protocol:

  <b>All requests on the slave port</b> are forwarded to the appropriate master
  port. Requests for cached memory regions are also forwarded to other slave
  ports (as snoop requests).

  <b>Master port replies</b> are forwarded to the appropriate slave port.

  <b>Master port snoop requests</b> are forwarded to all slave ports.

  <b>Slave port snoop replies</b> are forwarded to the port that was the source
  of the request. (Note that the source of a snoop request can be either a slave
  or a master port.)

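  The four forwarding rules can be summarised in one routing function. The
  sketch below is illustrative Python; the function, its parameters and the
  port names are hypothetical, and address decoding is reduced to a single
  precomputed `master_for_addr` argument.

```python
def route(kind, arrived_on, origin, slave_ports, master_for_addr, cached=True):
    """Return the list of ports a message is forwarded to.

    kind: "request", "reply", "snoop_request" or "snoop_reply"
    arrived_on: port the message came in on
    origin: port that sent the original request (for replies only)
    """
    if kind == "request":
        targets = [master_for_addr]
        if cached:
            # Other slave ports see the request as a snoop request.
            targets += [p for p in slave_ports if p != arrived_on]
        return targets
    if kind == "snoop_request":
        return list(slave_ports)
    if kind in ("reply", "snoop_reply"):
        return [origin]
    raise ValueError(kind)

slaves = ["dcache1", "dcache2"]
```

  For instance, a cached request arriving from `dcache1` goes to the memory
  side and is also shown to `dcache2`, while an uncached one goes to the
  memory side only.
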
  The bus declares itself blocked for a configurable period of time after
  any of the following events:
    - A packet is sent (or fails to be sent) to a slave port.
    - A reply message is sent to a master port.
    - A snoop response from one slave port is sent to another slave port.

  The bus in the blocked state rejects the following incoming messages:
    - Slave port requests.
    - Master port replies.
    - Master port snoop requests.

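  A rough way to picture the blocked state is a "busy until" timestamp. The
  Python sketch below is a deliberate simplification and not the gem5 bus: the
  class and its fields are hypothetical, and every accepted message is treated
  as one of the blocking events listed above.

```python
class ToyBus:
    """Hypothetical bus that stays blocked for busy_ticks after forwarding."""
    def __init__(self, busy_ticks):
        self.busy_ticks = busy_ticks
        self.blocked_until = 0

    def receive(self, now, kind):
        # Only these three message kinds are rejected while blocked;
        # snoop replies from slave ports are not in the rejected list.
        rejectable = kind in ("request", "reply", "snoop_request")
        if rejectable and now < self.blocked_until:
            return False
        # Simplification: any forwarded message occupies the bus.
        self.blocked_until = now + self.busy_ticks
        return True

bus = ToyBus(busy_ticks=3)
```

  A sender whose message is rejected is expected to retry later, in the same
  way the CPU retries a request rejected by DCache.
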
  \section gem5_MS_SimpleMemory SIMPLE MEMORY OBJECT

  The Simple Memory object never blocks access on its slave port.

  A memory read/write takes immediate effect. (The read or write is performed
  when the request is received.)

  The reply message is sent after a configurable period of time.

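  These three properties fit in a few lines of Python. The sketch is a toy
  model under stated assumptions: `ToySimpleMemory` and its methods are
  hypothetical names, unwritten locations read as zero, and the latency is a
  fixed number of ticks.

```python
from collections import deque

class ToySimpleMemory:
    """Hypothetical memory: immediate access, delayed reply, never blocks."""
    def __init__(self, latency):
        self.latency = latency
        self.data = {}
        self.replies = deque()  # (ready_tick, reply) in arrival order

    def receive(self, now, command, addr, value=None):
        # The read or write is performed as soon as the request arrives.
        if command == "write":
            self.data[addr] = value
            reply = ("WriteResp", addr, None)
        else:
            reply = ("ReadResp", addr, self.data.get(addr, 0))
        self.replies.append((now + self.latency, reply))
        return True  # requests are never rejected

    def replies_due(self, now):
        # Hand out every reply whose delay has elapsed.
        due = []
        while self.replies and self.replies[0][0] <= now:
            due.append(self.replies.popleft()[1])
        return due

mem = ToySimpleMemory(latency=10)
mem.receive(0, "write", 0x40, 7)
mem.receive(1, "read", 0x40)
```

  The read issued at tick 1 already observes the value written at tick 0,
  although neither reply is delivered before tick 10.
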
  \section gem5_MS_MessageFlow MESSAGE FLOW

  \subsection gem5_MS_ReadAccess Read Access

  The following diagram shows a read access that hits a Data Cache line with
  the Valid and Read flags set:

  \image html "gem5_MS_Fig5.PNG" "Read Hit (Read flag must be set in cache line)" width=3cm

  A read access that misses in the cache will generate the following sequence
  of messages:

  \image html "gem5_MS_Fig6.PNG" "Read Miss with snoop reply" width=3cm

  Note that the bus object never gets a response from both DCache2 and the
  Memory object. It sends the very same ReadReq packet (message) object to
  memory and the data cache. When Data Cache wants to reply to the snoop
  request, it marks the message with the MEM_INHIBIT flag, which tells the
  Memory object not to process the message.

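  The effect of the shared packet and the inhibit flag can be sketched in
  Python. This is an illustration, not gem5 code: `ToyPacket`, `snoop`,
  `memory_access` and `bus_read` are hypothetical names, and "the cache holds
  the line" is used as a simplified stand-in for "the cache wants to reply".

```python
class ToyPacket:
    """Hypothetical message object shared by the cache and the memory."""
    def __init__(self, command, addr):
        self.command = command
        self.addr = addr
        self.mem_inhibit = False

def snoop(packet, cache_lines):
    # A cache that will supply the data marks the shared packet.
    if packet.addr in cache_lines:
        packet.mem_inhibit = True
        return ("ReadResp", cache_lines[packet.addr])
    return None

def memory_access(packet, memory):
    # The memory ignores packets that a cache has already claimed.
    if packet.mem_inhibit:
        return None
    return ("ReadResp", memory.get(packet.addr, 0))

def bus_read(addr, other_cache, memory):
    packet = ToyPacket("ReadReq", addr)
    responses = [snoop(packet, other_cache), memory_access(packet, memory)]
    return [r for r in responses if r is not None]
```

  Because both receivers look at the same object, exactly one response comes
  back: from the other cache when it claims the packet, from memory otherwise.
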
  \subsection gem5_MS_WriteAccess Write Access

  The following diagram shows a write access that hits a DCache1 cache line
  with the Valid & Write flags set:

  \image html "gem5_MS_Fig7.PNG" "Write Hit (with Write flag set in cache line)" width=3cm

  The next figure shows a write access that hits a DCache1 cache line with the
  Valid flag but without the Write flag, which qualifies as a write miss.
  DCache1 issues UpgradeReq to obtain write permission. DCache2::snoopTiming
  will invalidate the cache line that has been hit. Note that the UpgradeResp
  message doesn’t carry data.

  \image html "gem5_MS_Fig8.PNG" "Write Miss – matching tag with no Write flag" width=3cm

  The next diagram shows a write miss in DCache. ReadExReq invalidates the
  cache line in DCache2. ReadExResp carries the content of the memory cache
  line.

  \image html "gem5_MS_Fig9.PNG" "Miss - no matching tag" width=3cm

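  The three write cases above differ only in which request, if any, DCache1
  sends to the bus. The Python sketch below restates that decision; the
  function name and its boolean parameters are hypothetical, and the Read flag
  is left out for brevity.

```python
def write_request(tag_matches, valid, writable):
    """Return the bus request a write needs, or None for a write hit."""
    if tag_matches and valid and writable:
        return None           # write hit, no bus traffic needed
    if tag_matches and valid:
        return "UpgradeReq"   # data is present, only permission is missing
    return "ReadExReq"        # fetch the line and invalidate other copies
```

  The UpgradeReq case needs no data in the response, whereas the ReadExReq
  case is answered with the content of the cache line.
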
*/
