# Copyright (c) 2012 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED.
# IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Author: Djordje Kovacevic

/*! \page gem5MemorySystem Memory System in gem5

\tableofcontents

This document describes the memory subsystem in gem5, focusing on the program
flow during simple CPU memory transactions (read or write).


\section gem5_MS_MH MODEL HIERARCHY

The model used in this document consists of two out-of-order (O3)
ARM v7 CPUs with corresponding L1 data caches and Simple Memory. It is
created by running gem5 with the following parameters:

    configs/example/fs.py --caches --cpu-type=arm_detailed --num-cpus=2

gem5 uses objects derived from Simulation Object (SimObject) as the basic
blocks for building the memory system. They are connected via ports with an
established master/slave hierarchy. Data flow is initiated on the master port
while the response messages and snoop queries appear on the slave port. The
following figure shows the hierarchy of Simulation Objects used in this
document:

\image html "gem5_MS_Fig1.PNG" "Simulation Object hierarchy of the model" width=3cm

\section gem5_CPU CPU

Describing the O3 CPU model in detail is outside the scope of this document, so
here are only a few relevant notes about the model:

<b>Read access</b> is initiated by sending a message to the port towards the
DCache object. If DCache rejects the message (because it is blocked or busy),
the CPU flushes the pipeline and the access is re-attempted later.
The access
is completed upon receiving the reply message (ReadResp) from DCache.

<b>Write access</b> is initiated by storing the request into the store buffer,
whose contents are emptied and sent to DCache on every tick. DCache may also
reject the request. Write access is completed when the write reply (WriteResp)
message is received from DCache.

Load & store buffers (for read and write access) don’t impose any
restriction on the number of active memory accesses. Therefore, the maximum
number of the CPU’s outstanding memory access requests is not limited by the
CPU Simulation Object but by the underlying memory system model.

<b>Split memory access</b> is implemented.

The message sent by the CPU contains the memory type (Normal, Device, Strongly
Ordered, and cacheability) of the accessed region. However, this is not used
by the rest of the model, which takes a more simplified approach to memory types.

\section gem5_DCache DATA CACHE OBJECT

Data Cache object implements a standard cache structure:

\image html "gem5_MS_Fig2.PNG" "DCache Simulation Object" width=3cm

<b>Cached memory reads</b> that match a particular cache tag (with Valid & Read
flags) are completed (by sending ReadResp to the CPU) after a configurable time.
Otherwise, the request is forwarded to the Miss Status and Handling Register
(MSHR) block.

<b>Cached memory writes</b> that match a particular cache tag (with Valid, Read
& Write flags) are completed (by sending WriteResp to the CPU) after the same
configurable time. Otherwise, the request is forwarded to the Miss Status and
Handling Register (MSHR) block.

<b>Uncached memory reads</b> are forwarded to MSHR block.

<b>Uncached memory writes</b> are forwarded to WriteBuffer block.

<b>Evicted (& dirty) cache lines</b> are forwarded to WriteBuffer block.

CPU’s access to Data Cache is blocked if any of the following is true:

- MSHR block is full.
  (The size of MSHR’s buffer is configurable.)

- Writeback block is full. (The size of the block’s buffer is
  configurable.)

- The number of outstanding memory accesses against the same memory cache line
  has reached a configurable threshold value (see MSHR and Write Buffer for
  details).

Data Cache in the blocked state rejects any request from the slave port (from
the CPU), regardless of whether it would result in a cache hit or miss. Note
that incoming messages on the master port (response messages and snoop
requests) are never rejected.

A cache hit on an uncacheable memory region (unpredictable behaviour according
to the ARM ARM) invalidates the cache line and fetches data from memory.

\subsection gem5_MS_TAndDBlock Tags & Data Block

Cache lines (referred to as blocks in the source code) are organised into sets
with configurable associativity and size. They have the following status flags:
- <b>Valid.</b> The line holds data and its address tag is valid.
- <b>Read.</b> No read request is accepted unless this flag is set.
  For example, a cache line is valid and unreadable while it waits for the
  Write flag in order to complete a write access.
- <b>Write.</b> The line may accept writes. A cache line with the Write flag
  is in the Unique state: no other cache memory holds a copy.
- <b>Dirty.</b> The line needs a Writeback when evicted.

A read access hits a cache line if the address tags match and the Valid and
Read flags are set. A write access hits a cache line if the address tags match
and the Valid, Read and Write flags are set.

\subsection gem5_MS_Queues MSHR and Write Buffer Queues

The Miss Status and Handling Register (MSHR) queue holds the list of the CPU’s
outstanding memory requests that require read access to a lower memory
level. They are:
- Cached Read misses.
- Cached Write misses.
- Uncached reads.

The WriteBuffer queue holds the following memory requests:
- Uncached writes.
- Writeback from evicted (& dirty) cache lines.

\image html "gem5_MS_Fig3.PNG" "MSHR and Write Buffer Blocks" width=6cm

Each memory request is assigned to a corresponding MSHR object (READ or WRITE
in the diagram above) that represents a particular block (cache line) of memory
that has to be read or written in order to complete the command(s). As shown
in the figure above, cached reads/writes against the same cache line share a
common MSHR object and are completed with a single memory access.

The size of the block (and therefore the size of read/write access to lower
memory) is:
- The size of the cache line for cached access & writeback;
- As specified in the CPU instruction for uncached access.

In general, the Data Cache model distinguishes between just two memory types:
- Normal Cached memory. It is always treated as write back, read and write
  allocate.
- Normal uncached, Device and Strongly Ordered types are treated equally
  (as uncached memory).

\subsection gem5_MS_Ordering Memory Access Ordering

A unique order number is assigned to each CPU read/write request (as they
appear on the slave port). The order number of an MSHR object is copied from
the first read/write assigned to it.

Memory reads/writes from each of these two queues are executed in order
(according to the assigned order number). When neither queue is empty, the
model executes a memory read from the MSHR block unless WriteBuffer is full.
It will, however, always preserve the order of reads/writes on the same
(or overlapping) memory cache line (block).

In summary:
- Order of accesses to cached memory is not preserved unless they target
  the same cache line. For example, the accesses #1, #5 & #10 will
  complete simultaneously in the same tick (still in order). The access
  #5 will complete before #3.
- Order of all uncached memory writes is preserved.
  Write#6 always
  completes before Write#13.
- Order of all uncached memory reads is preserved. Read#2 always completes
  before Read#8.
- The order of an uncached read and an uncached write is not necessarily
  preserved, unless their access regions overlap. Therefore, Write#6
  always completes before Read#8 (they target the same memory block).
  However, Write#13 may complete before Read#8.


\section gem5_MS_Bus COHERENT BUS OBJECT

\image html "gem5_MS_Fig4.PNG" "Coherent Bus Object" width=3cm

Coherent Bus object provides basic support for the snoop protocol:

<b>All requests on the slave port</b> are forwarded to the appropriate master
port. Requests for cached memory regions are also forwarded to the other slave
ports (as snoop requests).

<b>Master port replies</b> are forwarded to the appropriate slave port.

<b>Master port snoop requests</b> are forwarded to all slave ports.

<b>Slave port snoop replies</b> are forwarded to the port that was the source
of the request. (Note that the source of a snoop request can be either a slave
or a master port.)

The bus declares itself blocked for a configurable period of time after
any of the following events:
- A packet is sent (or failed to be sent) to a slave port.
- A reply message is sent to a master port.
- Snoop response from one slave port is sent to another slave port.

The bus in the blocked state rejects the following incoming messages:
- Slave port requests.
- Master port replies.
- Master port snoop requests.

\section gem5_MS_SimpleMemory SIMPLE MEMORY OBJECT

It never blocks access on the slave port.

Memory read/write takes immediate effect. (The read or write is performed when
the request is received.)

The reply message is sent after a configurable period of time.
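
The three Simple Memory rules above can be illustrated with a small toy model.
This is only a sketch under stated assumptions, not gem5 code: the class
`ToySimpleMemory`, its methods, the tick arguments and the `WriteReq` command
name are hypothetical and chosen purely for illustration.

```python
import heapq


class ToySimpleMemory:
    """Toy model (not gem5 code) of the Simple Memory behaviour."""

    def __init__(self, latency):
        self.latency = latency  # configurable reply delay, in ticks
        self.data = {}          # backing store: address -> value
        self.replies = []       # min-heap of (due_tick, seq, reply)
        self._seq = 0           # tie-breaker, keeps FIFO order per tick

    def recv_request(self, now, cmd, addr, value=None):
        """Accept a request. Always returns True: access is never blocked."""
        if cmd == "WriteReq":
            self.data[addr] = value  # the write takes effect immediately
            reply = ("WriteResp", addr, None)
        elif cmd == "ReadReq":
            # The value is sampled now, not when the reply is delivered.
            reply = ("ReadResp", addr, self.data.get(addr, 0))
        else:
            raise ValueError(f"unsupported command: {cmd}")
        heapq.heappush(self.replies, (now + self.latency, self._seq, reply))
        self._seq += 1
        return True

    def due_replies(self, now):
        """Pop and return every reply whose due tick has been reached."""
        out = []
        while self.replies and self.replies[0][0] <= now:
            out.append(heapq.heappop(self.replies)[2])
        return out
```

In this sketch, a read received at tick 1 returns the value stored at tick 1
even if a later write changes the location before the reply is delivered at
tick 1 + latency, which is what "immediate effect" with a delayed reply implies.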

\section gem5_MS_MessageFlow MESSAGE FLOW

\subsection gem5_MS_Ordering Read Access

The following diagram shows a read access that hits a Data Cache line with the
Valid and Read flags set:

\image html "gem5_MS_Fig5.PNG" "Read Hit (Read flag must be set in cache line)" width=3cm

A read access that misses in the cache generates the following sequence of
messages:

\image html "gem5_MS_Fig6.PNG" "Read Miss with snoop reply" width=3cm

Note that the bus object never gets a response from both DCache2 and the Memory
object. It sends the very same ReadReq packet (message) object to memory and to
the data cache. When the Data Cache wants to reply to a snoop request, it marks
the message with the MEM_INHIBIT flag, which tells the Memory object not to
process the message.

\subsection gem5_MS_Ordering Write Access

The following diagram shows a write access that hits a DCache1 cache line with
the Valid & Write flags set:

\image html "gem5_MS_Fig7.PNG" "Write Hit (with Write flag set in cache line)" width=3cm

The next figure shows a write access that hits a DCache1 cache line with the
Valid flag but no Write flag, which qualifies as a write miss. DCache1 issues
UpgradeReq to obtain write permission. DCache2::snoopTiming invalidates the
cache line that has been hit. Note that the UpgradeResp message doesn’t carry
data.

\image html "gem5_MS_Fig8.PNG" "Write Miss – matching tag with no Write flag" width=3cm

The next diagram shows a write miss in DCache. ReadExReq invalidates the cache
line in DCache2. ReadExResp carries the content of the memory cache line.

\image html "gem5_MS_Fig9.PNG" "Miss - no matching tag" width=3cm

*/