# Copyright (c) 2012-2013 ARM Limited
# All rights reserved.
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Copyright (c) 2015 The University of Bologna
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Erfan Azarkhish
#          Abdul Mutaal Ahmad

# A simplified model of a complete HMC device. Based on:
#  [1] http://www.hybridmemorycube.org/specification-download/
#  [2] High performance AXI-4.0 based interconnect for extensible smart memory
#      cubes (E. Azarkhish et al.)
#  [3] Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level
#      Prefetching (J. Ahn et al.)
#  [4] Memory-centric system interconnect design with Hybrid Memory Cubes
#      (G. Kim et al.)
#  [5] Near Data Processing, Are we there yet? (M. Gokhale)
#      http://www.cs.utah.edu/wondp/gokhale.pdf
#  [6] openHMC - A Configurable Open-Source Hybrid Memory Cube Controller
#      (J. Schmidt)
#  [7] Hybrid Memory Cube performance characterization on data-centric
#      workloads (M. Gokhale)
#
# This script builds a complete HMC device composed of vault controllers,
# serial links, the main internal crossbar, and an external HMC controller.
#
# - VAULT CONTROLLERS:
#   Instances of the HMC_2500_1x32 class with their functionality specified in
#   dram_ctrl.cc
#
# - THE MAIN XBAR:
#   This component is simply an instance of the NoncoherentXBar class, and its
#   parameters are tuned to [2].
#
# - SERIAL LINKS CONTROLLER:
#   SerialLink is a simple variation of the Bridge class, with the ability to
#   account for the latency of packet serialization and controller latency.
#   We assume that the serializer component at the transmitter side does not
#   need to receive the whole packet to start the serialization, whereas the
#   deserializer waits for the complete packet to check its integrity first.
#
#   * The bandwidth of the serial links is not modeled in the SerialLink
#     component itself.
#
#   * The latency of the serial link controller is composed of the SerDes
#     latency plus the link controller latency.
#
#   * It is inferred from the standard [1] and the literature [3] that serial
#     links share the same address range and packets can travel over any of
#     them, so a load distribution mechanism is required among them.
#
# -----------------------------------------
# | Host/HMC Controller                   |
# |   ----------------------              |
# |   |  Link Aggregator   |  opt         |
# |   ----------------------              |
# |   ----------------------              |
# |   |  Serial Link + Ser |  * 4         |
# |   ----------------------              |
# -----------------------------------------
# -----------------------------------------
# | Device                                |
# |   ----------------------              |
# |   |       Xbar         |  * 4         |
# |   ----------------------              |
# |   ----------------------              |
# |   |  Vault Controller  |  * 16        |
# |   ----------------------              |
# |   ----------------------              |
# |   |      Memory        |              |
# |   ----------------------              |
# -----------------------------------------
#
# This version provides 3 different HMC architectures, each with its
# corresponding test script.
#
# same: There are 4 crossbars in the HMC. All the crossbars are connected to
# each other, so each of them provides the complete memory range. This
# architecture also models the added latency of sending a request to a
# non-local vault (a bridge between crossbars). All 4 serial links can access
# the complete memory, so each link can be connected to a separate processor.
#
# distributed: There are 4 crossbars inside the HMC. The crossbars are not
# connected to each other, and through each crossbar only its local vaults
# can be accessed.
# To support this architecture, an additional crossbar is needed between the
# serial links and the processor.
#
# mixed: This is a hybrid architecture with 4 crossbars inside the HMC.
# 2 crossbars are connected only to their local vaults. From the other 2
# crossbars, a request can be forwarded to any other vault.

from __future__ import print_function
from __future__ import absolute_import

import argparse

import m5
from m5.objects import *
from m5.util import *


def add_options(parser):
    # *****************************CROSSBAR PARAMETERS*************************
    # Flit size of the main interconnect [1]
    parser.add_argument("--xbar-width", default=32, action="store", type=int,
                        help="Data width of the main XBar (Bytes)")

    # Clock frequency of the main interconnect [1]
    # This crossbar is placed on the logic base of the HMC and it has its
    # own voltage and clock domains, different from the DRAM dies or from the
    # host.
    parser.add_argument("--xbar-frequency", default='1GHz', type=str,
                        help="Clock Frequency of the main XBar")

    # Arbitration latency of the HMC XBar [1]
    parser.add_argument("--xbar-frontend-latency", default=1, action="store",
                        type=int, help="Arbitration latency of the XBar")

    # Latency to forward a packet via the interconnect [1] (two levels of
    # FIFOs at the input and output of the interconnect)
    parser.add_argument("--xbar-forward-latency", default=2, action="store",
                        type=int, help="Forward latency of the XBar")

    # Latency to forward a response via the interconnect [1] (two levels of
    # FIFOs at the input and output of the interconnect)
    parser.add_argument("--xbar-response-latency", default=2, action="store",
                        type=int, help="Response latency of the XBar")

    # Number of crossbars which connect the 16 vaults to the serial links [7]
    parser.add_argument("--number-mem-crossbar", default=4, action="store",
                        type=int, help="Number of crossbars in HMC")

    # *****************************SERIAL LINK PARAMETERS**********************
    # Number of serial link controllers [1]
    parser.add_argument("--num-links-controllers", default=4, action="store",
                        type=int, help="Number of serial links")

    # Number of packets (not flits) to store at the request side of the serial
    # link. This number should be adjusted to achieve the required bandwidth
    parser.add_argument("--link-buffer-size-req", default=10, action="store",
                        type=int, help="Number of packets to buffer at the "
                        "request side of the serial link")

    # Number of packets (not flits) to store at the response side of the
    # serial link. This number should be adjusted to achieve the required
    # bandwidth
    parser.add_argument("--link-buffer-size-rsp", default=10, action="store",
                        type=int, help="Number of packets to buffer at the "
                        "response side of the serial link")

    # Latency of the serial link composed of the SerDes latency (1.6ns [4])
    # plus the PCB trace latency (3ns, estimated based on [5])
    parser.add_argument("--link-latency", default='4.6ns', type=str,
                        help="Latency of the serial links")

    # Clock frequency of each serial link (SerDes) [1]
    parser.add_argument("--link-frequency", default='10GHz', type=str,
                        help="Clock Frequency of the serial links")

    # Clock frequency of the serial link controller [6]
    # clk_hmc [MHz] = num_lanes_per_link * lane_speed [bit/s] /
    #                 (data_path_width [bits] * 10^6)
    # clk_hmc [MHz] = 16 * 10 * 10^9 / (256 * 10^6) = 625 MHz
    parser.add_argument("--link-controller-frequency", default='625MHz',
                        type=str,
                        help="Clock Frequency of the link controller")

    # Latency of the serial link controller to process the packets [1][6]
    # (clock domain = 625 MHz, i.e. a 1.6 ns cycle)
    # Used here for calculations only
    parser.add_argument("--link-ctrl-latency", default=4, action="store",
                        type=int, help="The number of cycles required for the "
                        "controller to process the packet")

    # total_ctrl_latency = link_ctrl_latency * clock period + link_latency
    # total_ctrl_latency = 4 (cycles) * 1.6 ns + 4.6 ns = 11 ns
    parser.add_argument("--total-ctrl-latency", default='11ns', type=str,
                        help="The latency experienced by every packet "
                        "regardless of packet size")

    # Number of parallel lanes in each serial link [1]
    parser.add_argument("--num-lanes-per-link", default=16, action="store",
                        type=int, help="Number of lanes per each link")

    # Number of serial links [1]
    parser.add_argument("--num-serial-links", default=4, action="store",
                        type=int, help="Number of serial links")

    # Speed of each lane of a serial link - SerDes serial interface 10 Gb/s
    parser.add_argument("--serial-link-speed", default=10, action="store",
                        type=int,
                        help="Speed of each lane of the serial link (Gb/s)")

    # Address range for each of the serial links
    parser.add_argument("--serial-link-addr-range", default='1GB', type=str,
                        help="memory range for each of the serial links. "
                        "Default: 1GB")

    # *****************************PERFORMANCE MONITORING*********************
    # The main monitor behind the HMC Controller
    parser.add_argument("--enable-global-monitor", action="store_true",
                        help="The main monitor behind the HMC Controller")

    # The link performance monitors
    parser.add_argument("--enable-link-monitor", action="store_true",
                        help="The link monitors")

    # Link aggregator enable - put a crossbar between buffers & links
    parser.add_argument("--enable-link-aggr", action="store_true",
                        help="The crossbar between port and Link Controller")

    parser.add_argument("--enable-buff-div", action="store_true",
                        help="Memory Range of Buffer is divided between total "
                        "range")

    # *****************************HMC ARCHITECTURE **************************
    # Memory chunk for 16 vaults - number of vaults / number of crossbars
    parser.add_argument("--mem-chunk", default=4, action="store", type=int,
                        help="Chunk of memory range (number of vaults) for "
                        "each crossbar in arch 0 (same)")

    # Size of the request buffer within the crossbar, used for modelling the
    # extra latency when a request goes to a non-local vault
    parser.add_argument("--xbar-buffer-size-req", default=10, action="store",
                        type=int, help="Number of packets to buffer at the "
                        "request side of the crossbar")

    # Size of the response buffer within the crossbar, used for modelling the
    # extra latency when a response is received from a non-local vault
    parser.add_argument("--xbar-buffer-size-resp", default=10, action="store",
                        type=int, help="Number of packets to buffer at the "
                        "response side of the crossbar")

    # HMC device architecture. It affects the HMC host controller as well
    parser.add_argument("--arch", type=str, choices=["same", "distributed",
                        "mixed"], default="distributed", help="same: HMC with "
                        "4 links, all with same range.\ndistributed: HMC with "
                        "4 links with distributed range.\nmixed: mixed with "
                        "same and distributed range.\nDefault: distributed")

    # HMC device - number of vaults
    parser.add_argument("--hmc-dev-num-vaults", default=16, action="store",
                        type=int, help="number of independent vaults within "
                        "the HMC device. Note: each vault has a memory "
                        "controller (vault controller)\nDefault: 16")

    # HMC device - vault capacity or size
    parser.add_argument("--hmc-dev-vault-size", default='256MB', type=str,
                        help="vault storage capacity in bytes. Default: "
                        "256MB")
    parser.add_argument("--mem-type", type=str, choices=["HMC_2500_1x32"],
                        default="HMC_2500_1x32", help="type of HMC memory to "
                        "use. Default: HMC_2500_1x32")
    parser.add_argument("--mem-channels", default=1, action="store", type=int,
                        help="Number of memory channels")
    parser.add_argument("--mem-ranks", default=1, action="store", type=int,
                        help="Number of ranks to iterate across")
    parser.add_argument("--burst-length", default=256, action="store",
                        type=int, help="burst length in bytes. Note: the "
                        "cache line size will be set to this value.\nDefault: "
                        "256")


# Configure the HMC host controller
def config_hmc_host_ctrl(opt, system):

    # create HMC host controller
    system.hmc_host = SubSystem()

    # Create an additional crossbar (membus) for the distributed and mixed
    # architectures
    if opt.arch == "distributed" or opt.arch == "mixed":
        clk = '100GHz'
        vd = VoltageDomain(voltage='1V')
        system.membus = NoncoherentXBar(width=8)
        system.membus.badaddr_responder = BadAddr()
        system.membus.default = Self.badaddr_responder.pio
        system.membus.width = 8
        system.membus.frontend_latency = 3
        system.membus.forward_latency = 4
        system.membus.response_latency = 2
        cd = SrcClockDomain(clock=clk, voltage_domain=vd)
        system.membus.clk_domain = cd

    # create memory ranges for the serial links
    slar = convert.toMemorySize(opt.serial_link_addr_range)
    # Memory ranges of the serial links for arch-0 ("same"): every link
    # covers the whole device, i.e. the ranges of all the vault controllers
    if opt.arch == "same":
        ser_ranges = [AddrRange(0, (4*slar)-1) for i in
                      range(opt.num_serial_links)]
    # Memory ranges of the serial links for arch-1 ("distributed"):
    # the address range is distributed across the links
    if opt.arch == "distributed":
        ser_ranges = [AddrRange(i*slar, ((i+1)*slar)-1) for i in
                      range(opt.num_serial_links)]
    # Memory ranges of the serial links for arch-2 ("mixed"): links 0 and 1
    # cover only their local range, links 2 and 3 cover the whole device
    if opt.arch == "mixed":
        ser_range0 = AddrRange(0, (1*slar)-1)
        ser_range1 = AddrRange(1*slar, 2*slar-1)
        ser_range2 = AddrRange(0, (4*slar)-1)
        ser_range3 = AddrRange(0, (4*slar)-1)
        ser_ranges = [ser_range0, ser_range1, ser_range2, ser_range3]

    # Serial link controllers (16 SerDes lanes at 10 Gbps each by default),
    # with serial link ranges that depend on the architecture
    sl = [SerialLink(ranges=ser_ranges[i],
                     req_size=opt.link_buffer_size_req,
                     resp_size=opt.link_buffer_size_rsp,
                     num_lanes=opt.num_lanes_per_link,
                     link_speed=opt.serial_link_speed,
                     delay=opt.total_ctrl_latency) for i in
          range(opt.num_serial_links)]
    system.hmc_host.seriallink = sl

    # enable global monitor
    if opt.enable_global_monitor:
        system.hmc_host.lmonitor = [CommMonitor() for i in
                                    range(opt.num_serial_links)]

    # set the clock frequency for the serial link controllers
    for i in range(opt.num_serial_links):
        clk = opt.link_controller_frequency
        vd = VoltageDomain(voltage='1V')
        scd = SrcClockDomain(clock=clk, voltage_domain=vd)
        system.hmc_host.seriallink[i].clk_domain = scd

    # Connect the membus/traffic generator to the serial link controllers for
    # the different HMC architectures
    hh = system.hmc_host
    if opt.arch == "distributed":
        mb = system.membus
        for i in range(opt.num_links_controllers):
            if opt.enable_global_monitor:
                mb.master = hh.lmonitor[i].slave
                hh.lmonitor[i].master = hh.seriallink[i].slave
            else:
                mb.master = hh.seriallink[i].slave
    if opt.arch == "mixed":
        mb = system.membus
        if opt.enable_global_monitor:
            mb.master = hh.lmonitor[0].slave
            hh.lmonitor[0].master = hh.seriallink[0].slave
            mb.master = hh.lmonitor[1].slave
            hh.lmonitor[1].master = hh.seriallink[1].slave
        else:
            mb.master = hh.seriallink[0].slave
            mb.master = hh.seriallink[1].slave

    if opt.arch == "same":
        for i in range(opt.num_links_controllers):
            if opt.enable_global_monitor:
                hh.lmonitor[i].master = hh.seriallink[i].slave

    return system


# Create an HMC device
def config_hmc_dev(opt, system, hmc_host):

    # create HMC device
    system.hmc_dev = SubSystem()

    # create memory ranges for the vault controllers
    arv = convert.toMemorySize(opt.hmc_dev_vault_size)
    addr_ranges_vaults = [AddrRange(i*arv, ((i+1)*arv-1)) for i in
                          range(opt.hmc_dev_num_vaults)]
    system.mem_ranges = addr_ranges_vaults

    if opt.enable_link_monitor:
        lm = [CommMonitor() for i in range(opt.num_links_controllers)]
        system.hmc_dev.lmonitor = lm

    # 4 HMC crossbars located in its logic base (LoB)
    xb = [NoncoherentXBar(width=opt.xbar_width,
                          frontend_latency=opt.xbar_frontend_latency,
                          forward_latency=opt.xbar_forward_latency,
                          response_latency=opt.xbar_response_latency) for i in
          range(opt.number_mem_crossbar)]
    system.hmc_dev.xbar = xb

    for i in range(opt.number_mem_crossbar):
        clk = opt.xbar_frequency
        vd = VoltageDomain(voltage='1V')
        scd = SrcClockDomain(clock=clk, voltage_domain=vd)
        system.hmc_dev.xbar[i].clk_domain = scd

    # Attach the 4 serial links to the 4 crossbars
    for i in range(opt.num_serial_links):
        if opt.enable_link_monitor:
            system.hmc_host.seriallink[i].master = \
                system.hmc_dev.lmonitor[i].slave
            system.hmc_dev.lmonitor[i].master = system.hmc_dev.xbar[i].slave
        else:
            system.hmc_host.seriallink[i].master = system.hmc_dev.xbar[i].slave

    # Connect the xbars with each other so that a request arriving at the
    # wrong xbar is forwarded to the correct one.
    # Bridges are used to connect the xbars.
    if opt.arch == "same":
        numx = len(system.hmc_dev.xbar)

        # create a list of buffers: one bridge per ordered pair of distinct
        # crossbars, matching the connection loop below
        system.hmc_dev.buffers = [Bridge(req_size=opt.xbar_buffer_size_req,
                                         resp_size=opt.xbar_buffer_size_resp)
                                  for i in range(numx * (numx - 1))]

        # Buffer iterator
        it = iter(range(len(system.hmc_dev.buffers)))

        # necessary to attach the system_port to one of the xbars
        system.system_port = system.hmc_dev.xbar[3].slave

        # iterate over all the crossbars and connect them as required
        for i in range(numx):
            for j in range(numx):
                # connect xbar to all other xbars except itself
                if i != j:
                    # get the next index of buffer (works in Python 2 and 3)
                    index = next(it)

                    # Change the default values for ranges of bridge
                    system.hmc_dev.buffers[index].ranges = system.mem_ranges[
                            j * int(opt.mem_chunk):
                            (j + 1) * int(opt.mem_chunk)]

                    # Connect the bridge between crossbars
                    system.hmc_dev.xbar[i].master = system.hmc_dev.buffers[
                            index].slave
                    system.hmc_dev.buffers[
                            index].master = system.hmc_dev.xbar[j].slave
                else:
                    # Don't connect the xbar to itself
                    pass

    # Two crossbars (xbar[2] and xbar[3]) are connected to all the other
    # crossbars; the other 2 can only direct traffic to their local vaults
    if opt.arch == "mixed":
        system.hmc_dev.buffer30 = Bridge(ranges=system.mem_ranges[0:4])
        system.hmc_dev.xbar[3].master = system.hmc_dev.buffer30.slave
        system.hmc_dev.buffer30.master = system.hmc_dev.xbar[0].slave

        system.hmc_dev.buffer31 = Bridge(ranges=system.mem_ranges[4:8])
        system.hmc_dev.xbar[3].master = system.hmc_dev.buffer31.slave
        system.hmc_dev.buffer31.master = system.hmc_dev.xbar[1].slave

        system.hmc_dev.buffer32 = Bridge(ranges=system.mem_ranges[8:12])
        system.hmc_dev.xbar[3].master = system.hmc_dev.buffer32.slave
        system.hmc_dev.buffer32.master = system.hmc_dev.xbar[2].slave

        system.hmc_dev.buffer20 = Bridge(ranges=system.mem_ranges[0:4])
        system.hmc_dev.xbar[2].master = system.hmc_dev.buffer20.slave
        system.hmc_dev.buffer20.master = system.hmc_dev.xbar[0].slave

        system.hmc_dev.buffer21 = Bridge(ranges=system.mem_ranges[4:8])
        system.hmc_dev.xbar[2].master = system.hmc_dev.buffer21.slave
        system.hmc_dev.buffer21.master = system.hmc_dev.xbar[1].slave

        system.hmc_dev.buffer23 = Bridge(ranges=system.mem_ranges[12:16])
        system.hmc_dev.xbar[2].master = system.hmc_dev.buffer23.slave
        system.hmc_dev.buffer23.master = system.hmc_dev.xbar[3].slave
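
# Hypothetical, standalone sketch (not part of gem5 and not called by the
# functions above). Assuming the default option values, it recomputes with
# plain Python numbers, instead of m5 AddrRange objects, the serial-link
# address ranges that config_hmc_host_ctrl() assigns for each --arch value,
# and the arithmetic behind the --link-controller-frequency (625MHz) and
# --total-ctrl-latency (11ns) defaults. Each range is an inclusive
# (start, end) tuple, mirroring AddrRange(start, end). The function names
# sketch_serial_link_ranges and sketch_link_ctrl_timing are illustrative.
# Example: sketch_serial_link_ranges("distributed", 1 << 30)[1] is
# (1 << 30, (2 << 30) - 1), and sketch_link_ctrl_timing() returns
# approximately (625.0, 11.0).


def sketch_serial_link_ranges(arch, slar, num_serial_links=4):
    """Return one inclusive (start, end) range per serial link.

    slar is the per-link range in bytes (--serial-link-addr-range). Like
    config_hmc_host_ctrl(), the "same" and "mixed" layouts assume a device
    of 4 * slar bytes, and "mixed" assumes exactly 4 links.
    """
    full = (0, 4 * slar - 1)  # the whole device
    if arch == "same":
        # every link can reach every vault
        return [full for _ in range(num_serial_links)]
    if arch == "distributed":
        # link i only covers the i-th slice of the address space
        return [(i * slar, (i + 1) * slar - 1)
                for i in range(num_serial_links)]
    if arch == "mixed":
        # links 0 and 1 are local-only; links 2 and 3 reach the whole device
        return [(0, slar - 1), (slar, 2 * slar - 1), full, full]
    raise ValueError("unknown arch: %s" % arch)


def sketch_link_ctrl_timing(num_lanes=16, lane_speed_gbps=10,
                            data_path_width_bits=256, ctrl_cycles=4,
                            link_latency_ns=4.6):
    """Return (controller clock in MHz, total controller latency in ns)."""
    # clk [MHz] = lanes * lane speed [bit/s] / (data path width * 10^6)
    clk_mhz = (num_lanes * lane_speed_gbps * 1e9 /
               (data_path_width_bits * 1e6))
    period_ns = 1e3 / clk_mhz  # 625 MHz -> 1.6 ns per cycle
    # total = controller cycles * clock period + SerDes/PCB link latency
    total_ns = ctrl_cycles * period_ns + link_latency_ns
    return clk_mhz, total_ns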