// trace_cpu.hh revision 11252:18bb597fc40c
/*
 * Copyright (c) 2013 - 2015 ARM Limited
 * All rights reserved
 *
 * The license below extends only to copyright in the software and shall
 * not be construed as granting a license to any other intellectual
 * property including but not limited to intellectual property relating
 * to a hardware implementation of the functionality of the software
 * licensed hereunder. You may use the software subject to the license
 * terms below provided that you ensure that this notice is replicated
 * unmodified and in its entirety in all distributions of the software,
 * modified or unmodified, in source code or in binary form.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Authors: Radhika Jagtap
 *          Andreas Hansson
 *          Thomas Grass
 */

#ifndef __CPU_TRACE_TRACE_CPU_HH__
#define __CPU_TRACE_TRACE_CPU_HH__

#include <array>
#include <cstdint>
#include <queue>
#include <set>
#include <unordered_map>

#include "arch/registers.hh"
#include "base/statistics.hh"
#include "cpu/base.hh"
#include "debug/TraceCPUData.hh"
#include "debug/TraceCPUInst.hh"
#include "params/TraceCPU.hh"
#include "proto/inst_dep_record.pb.h"
#include "proto/packet.pb.h"
#include "proto/protoio.hh"
#include "sim/sim_events.hh"

/**
 * The Trace CPU replays traces generated using the elastic trace probe
 * attached to the O3 CPU model. The elastic trace is an execution trace with
 * register data dependencies and ordering dependencies annotated to it. The
 * Trace CPU also replays a fixed-timestamp fetch trace that is likewise
 * generated by the elastic trace probe. This CPU model aims at achieving
 * faster simulation than the detailed CPU model, with good correlation when
 * the same trace is played back on different memory sub-systems.
 *
 * The TraceCPU inherits from BaseCPU, so some virtual methods need to be
 * defined. It has two port subclasses inherited from MasterPort for
 * instruction and data ports. It issues memory requests deducing the
 * timing from the trace and without performing real execution of micro-ops.
 * As soon as the last dependency for an instruction is complete, its
 * computational delay, also provided in the input trace, is added. The
 * dependency-free nodes are maintained in a list, called 'ReadyList',
 * ordered by ready time. Instructions that depend on a load stall until the
 * responses for the read requests are received, thus achieving elastic
 * replay. If a dependency is not found when adding a new node, it is assumed
 * complete. Thus, if this node is found to be completely dependency-free,
 * its issue time is calculated and it is added to the ready list
 * immediately. This is encapsulated in the subclass ElasticDataGen.
 *
 * If ready nodes are issued in an unconstrained way, more nodes can be
 * outstanding, which results in divergence in timing compared to the O3CPU.
 * Therefore, the Trace CPU also models hardware resources. A sub-class that
 * models hardware resources contains the maximum sizes of the load buffer,
 * store buffer and ROB. If resources are not available, the node is not
 * issued. Such nodes that are pending issue are held in the 'depFreeQueue'
 * structure.
 *
 * Modelling the ROB size in the Trace CPU is arguably the most important of
 * all the resource limitations. The ROB occupancy is estimated using the
 * newly added field 'robNum'. We use the ROB number rather than the sequence
 * number because the sequence number is at times much higher due to
 * squashing, and trace replay is focused on correct-path modelling.
 *
 * A map called 'inFlightNodes' is added to track nodes that are not only in
 * the readyList but also load nodes that have been executed (and thus
 * removed from the readyList) but are not yet complete. The readyList
 * decides what to execute next and when, while inFlightNodes is used for
 * resource modelling. The oldest ROB number is updated when any node
 * occupies the ROB or when an entry in the ROB is released.
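A minimal sketch of this occupancy estimate, using hypothetical simplified names (`inFlightNodes` and `robOccupancy` here are stand-ins; the real bookkeeping lives in the HardwareResource sub-class declared below):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using NodeSeqNum = std::uint64_t;
using NodeRobNum = std::uint64_t;

// Sketch: map in-flight nodes from sequence number to ROB number.
// std::map keeps its keys sorted, so begin() is the oldest in-flight node.
std::map<NodeSeqNum, NodeRobNum> inFlightNodes;

// Estimated ROB occupancy: the difference between the ROB number of the
// newly dependency-free node and the oldest ROB number in flight.
NodeRobNum robOccupancy(NodeRobNum new_rob_num)
{
    if (inFlightNodes.empty())
        return 0;
    NodeRobNum oldest = inFlightNodes.begin()->second;
    return new_rob_num - oldest;
}
```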
 * The ROB occupancy is equal to the difference between the ROB
 * number of the newly dependency-free node and the oldest ROB number in
 * flight.
 *
 * If no node depends on a non load/store node, then there is no reason to
 * track it in the dependency graph. We filter out such nodes, but count them
 * and add a weight field to the subsequent node that we do include in the
 * trace. The weight field is used to model ROB occupancy during replay.
 *
 * The depFreeQueue is chosen to be FIFO so that child nodes which are in
 * program order get pushed into it in that order and are thus issued in
 * program order, like in the O3CPU. This is also why the dependents
 * container was changed from std::set to a sequential std::vector. We only
 * check the head of the depFreeQueue, as nodes are issued in order, and
 * blocking on the head models that better than looping over the entire
 * queue. An alternative would be to inspect the top N pending nodes where N
 * is the issue width. This is left for the future as the timing correlation
 * looks good as it is.
 *
 * At the start of an execution event, we first attempt to issue such pending
 * nodes by checking if the appropriate resources have become available. If
 * they have, we compute the execute tick with respect to the time then.
 * Then we proceed to complete nodes from the readyList.
 *
 * When a read response is received, sometimes a dependency on it that was
 * supposed to be released when it was issued is still not released. This
 * occurs because the dependent gets added to the graph after the read was
 * sent. So the check is made less strict, and the dependency is marked
 * complete on the read response instead of insisting that it should have
 * been removed when the read was sent.
 *
 * There is a check for requests spanning two cache lines, as this condition
 * triggers an assert failure in the L1 cache.
 * If a request does span two lines, its size is truncated so that it
 * accesses only until the end of the first line, and the remainder is
 * ignored. Strictly-ordered requests are skipped, and the dependencies on
 * such requests are handled by simply marking them complete immediately.
 *
 * The simulated seconds can be calculated as the difference between the
 * final_tick stat and the tickOffset stat. A CountedExitEvent that contains
 * a static int belonging to the Trace CPU class as a down counter is used to
 * implement multi Trace CPU simulation exit.
 */

class TraceCPU : public BaseCPU
{

  public:
    TraceCPU(TraceCPUParams *params);
    ~TraceCPU();

    void init();

    /**
     * This is a pure virtual function in BaseCPU. As we don't know how many
     * insts are in the trace but only know how many micro-ops there are, we
     * cannot count this stat.
     *
     * @return 0
     */
    Counter totalInsts() const
    {
        return 0;
    }

    /**
     * Return totalOps as the number of committed micro-ops plus the
     * speculatively issued loads that are modelled in the TraceCPU replay.
     *
     * @return number of micro-ops, i.e. nodes in the elastic data generator
     */
    Counter totalOps() const
    {
        return dcacheGen.getMicroOpCount();
    }

    /* Pure virtual function in BaseCPU. Do nothing. */
    void wakeup(ThreadID tid = 0)
    {
        return;
    }

    /*
     * When resuming from a checkpoint in FS mode, the TraceCPU takes over
     * from the old CPU. This function overrides the takeOverFrom() function
     * in the BaseCPU. It unbinds the ports of the old CPU and binds the
     * ports of the TraceCPU.
     */
    void takeOverFrom(BaseCPU *oldCPU);

    /**
     * When the instruction cache port receives a retry, schedule the
     * icacheNextEvent event.
     */
    void icacheRetryRecvd();

    /**
     * When the data cache port receives a retry, schedule the
     * dcacheNextEvent event.
     */
    void dcacheRetryRecvd();

    /**
     * When the data cache port receives a response, this calls the dcache
     * generator's handler method to complete the load writeback.
     *
     * @param pkt Pointer to packet received
     */
    void dcacheRecvTimingResp(PacketPtr pkt);

    /**
     * Schedule the dcacheNextEvent event at the given tick.
     *
     * @param when Tick at which to schedule the event
     */
    void schedDcacheNextEvent(Tick when);

  protected:

    /**
     * IcachePort class that interfaces with the L1 Instruction Cache.
     */
    class IcachePort : public MasterPort
    {
      public:
        /** Default constructor. */
        IcachePort(TraceCPU* _cpu)
            : MasterPort(_cpu->name() + ".icache_port", _cpu),
              owner(_cpu)
        { }

      public:
        /**
         * Receive the timing response and simply delete the packet, since
         * instruction fetch requests are issued as per the timing in the
         * trace and responses are ignored.
         *
         * @param pkt Pointer to packet received
         * @return true
         */
        bool recvTimingResp(PacketPtr pkt);

        /**
         * Required functionally but does nothing.
         *
         * @param pkt Pointer to packet received
         */
        void recvTimingSnoopReq(PacketPtr pkt) { }

        /**
         * Handle a retry signalled by the cache if the instruction read
         * failed in the first attempt.
         */
        void recvReqRetry();

      private:
        TraceCPU* owner;
    };

    /**
     * DcachePort class that interfaces with the L1 Data Cache.
     */
    class DcachePort : public MasterPort
    {

      public:
        /** Default constructor.
         */
        DcachePort(TraceCPU* _cpu)
            : MasterPort(_cpu->name() + ".dcache_port", _cpu),
              owner(_cpu)
        { }

      public:

        /**
         * Receive the timing response and call dcacheRecvTimingResp() so
         * that the dcacheGen can handle completing the load.
         *
         * @param pkt Pointer to packet received
         * @return true
         */
        bool recvTimingResp(PacketPtr pkt);

        /**
         * Required functionally but does nothing.
         *
         * @param pkt Pointer to packet received
         */
        void recvTimingSnoopReq(PacketPtr pkt)
        { }

        /**
         * Required functionally but does nothing.
         *
         * @param pkt Pointer to packet received
         */
        void recvFunctionalSnoop(PacketPtr pkt)
        { }

        /**
         * Handle a retry signalled by the cache if the data access failed
         * in the first attempt.
         */
        void recvReqRetry();

        /**
         * Required functionally.
         *
         * @return true since we have to snoop
         */
        bool isSnooping() const { return true; }

      private:
        TraceCPU* owner;
    };

    /** Port to connect to the L1 instruction cache. */
    IcachePort icachePort;

    /** Port to connect to the L1 data cache. */
    DcachePort dcachePort;

    /** Master id for instruction read requests. */
    const MasterID instMasterID;

    /** Master id for data read and write requests. */
    const MasterID dataMasterID;

    /** File names for the input instruction and data traces. */
    std::string instTraceFile, dataTraceFile;

    /**
     * Generator to read a protobuf trace containing memory requests at
     * fixed timestamps, perform flow control and issue memory requests. If
     * the L1 cache port sends the packet successfully, determine the tick
     * to send the next packet; else wait for a retry from the cache.
     */
    class FixedRetryGen
    {

      private:

        /**
         * This struct stores a line in the trace file.
         */
        struct TraceElement {

            /** Specifies if the request is to be a read or a write */
            MemCmd cmd;

            /** The address for the request */
            Addr addr;

            /** The size of the access for the request */
            Addr blocksize;

            /** The time at which the request should be sent */
            Tick tick;

            /** Potential request flags to use */
            Request::FlagsType flags;

            /** Instruction PC */
            Addr pc;

            /**
             * Check the validity of this element.
             *
             * @return true if this element is valid
             */
            bool isValid() const {
                return cmd != MemCmd::InvalidCmd;
            }

            /**
             * Make this element invalid.
             */
            void clear() {
                cmd = MemCmd::InvalidCmd;
            }
        };

        /**
         * The InputStream encapsulates a trace file and the
         * internal buffers, and populates TraceElements based on
         * the input.
         */
        class InputStream
        {

          private:

            // Input file stream for the protobuf trace
            ProtoInputStream trace;

          public:

            /**
             * Create a trace input stream for a given file name.
             *
             * @param filename Path to the file to read from
             */
            InputStream(const std::string& filename);

            /**
             * Reset the stream such that it can be played once
             * again.
             */
            void reset();

            /**
             * Attempt to read a trace element from the stream,
             * and also notify the caller if the end of the file
             * was reached.
             *
             * @param element Trace element to populate
             * @return True if an element could be read successfully
             */
            bool read(TraceElement* element);
        };

      public:
        /* Constructor */
        FixedRetryGen(TraceCPU& _owner, const std::string& _name,
                      MasterPort& _port, MasterID master_id,
                      const std::string& trace_file)
            : owner(_owner),
              port(_port),
              masterID(master_id),
              trace(trace_file),
              genName(owner.name() + ".fixedretry" + _name),
              retryPkt(nullptr),
              delta(0),
              traceComplete(false)
        {
        }

        /**
         * Called from TraceCPU init(). Reads the first message from the
         * input trace file and returns the send tick.
         *
         * @return Tick when the first packet must be sent
         */
        Tick init();

        /**
         * This tries to send the current or retry packet and returns true
         * if successful. It calls nextExecute() to read the next message.
         *
         * @return true if the packet is sent successfully
         */
        bool tryNext();

        /** Returns the name of the FixedRetryGen instance. */
        const std::string& name() const { return genName; }

        /**
         * Creates a new request with the request parameters passed as
         * arguments. Calls the port's sendTimingReq() and returns true if
         * the packet was sent successfully. It is called by tryNext().
         *
         * @param addr address of the request
         * @param size size of the request
         * @param cmd whether it is a read or write request
         * @param flags associated request flags
         * @param pc instruction PC that generated the request
         *
         * @return true if the packet was sent successfully
         */
        bool send(Addr addr, unsigned size, const MemCmd& cmd,
                  Request::FlagsType flags, Addr pc);

        /** Exit the FixedRetryGen. */
        void exit();

        /**
         * Reads a line of the trace file and determines the tick when the
         * next request should be generated. If the end of the file has been
         * reached, it returns false.
         *
         * @return false if the end of the file has been reached, else true
         */
        bool nextExecute();

        /**
         * Returns the traceComplete variable, which is set when the end of
         * the input trace file is reached.
         *
         * @return true if traceComplete is set, false otherwise.
         */
        bool isTraceComplete() { return traceComplete; }

        int64_t tickDelta() { return delta; }

        void regStats();

      private:

        /** Reference to the TraceCPU. */
        TraceCPU& owner;

        /** Reference to the port used to issue memory requests. */
        MasterPort& port;

        /** MasterID used for the requests being sent. */
        const MasterID masterID;

        /** Input stream used for reading the input trace file. */
        InputStream trace;

        /** String storing the name of the FixedRetryGen. */
        std::string genName;

        /** PacketPtr used to store the packet to retry. */
        PacketPtr retryPkt;

        /**
         * Stores the difference in the send ticks of the current and last
         * packets. It is kept signed so that an overflow to a negative
         * value is caught by assert(delta > 0).
         */
        int64_t delta;

        /**
         * Set to true when the end of the trace is reached.
         */
        bool traceComplete;

        /** Stores an element read from the trace to send as the next
         * packet. */
        TraceElement currElement;

        /** Stats for instruction accesses replayed. */
        Stats::Scalar numSendAttempted;
        Stats::Scalar numSendSucceeded;
        Stats::Scalar numSendFailed;
        Stats::Scalar numRetrySucceeded;
        /** Last tick simulated by the FixedRetryGen */
        Stats::Scalar instLastTick;

    };

    /**
     * The elastic data memory request generator reads a protobuf trace
     * containing an execution trace annotated with data and ordering
     * dependencies. It deduces the time at which to send a load/store
     * request by tracking the dependencies.
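The dependency tracking just described can be sketched as follows. `Node`, `numDep` and `releaseDependents` are hypothetical simplifications; the actual GraphNode declared below tracks ROB and register dependencies in separate arrays:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a trace node (hypothetical; the real GraphNode
// keeps separate ROB and register dependency arrays).
struct Node {
    std::uint64_t seqNum;          // instruction sequence number
    int numDep;                    // outstanding incoming dependencies
    std::vector<Node*> dependents; // outgoing edges to dependent nodes
};

// When a node completes, release its dependents; any dependent whose last
// dependency this was becomes dependency-free and a candidate for issue.
std::vector<Node*> releaseDependents(Node& done)
{
    std::vector<Node*> now_free;
    for (Node* child : done.dependents) {
        if (--child->numDep == 0)
            now_free.push_back(child);
    }
    done.dependents.clear();
    return now_free;
}
```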
     * It attempts to send a memory request for a load/store without
     * performing real execution of micro-ops. If the L1 cache port sends
     * the packet successfully, the generator checks which instructions
     * became dependency-free as a result and schedules an event
     * accordingly. If it fails to send the packet, it waits for a retry
     * from the cache.
     */
    class ElasticDataGen
    {

      private:

        /** Node sequence number type. */
        typedef uint64_t NodeSeqNum;

        /** Node ROB number type. */
        typedef uint64_t NodeRobNum;

        typedef ProtoMessage::InstDepRecord::RecordType RecordType;
        typedef ProtoMessage::InstDepRecord Record;

        /**
         * The struct GraphNode stores an instruction in the trace file. The
         * format of the trace file favours constructing a dependency graph
         * of the execution, and this struct is used to encapsulate the
         * request data as well as pointers to its dependent GraphNodes.
         */
        class GraphNode {

          public:
            /**
             * The maximum number of ROB dependencies. There can be at most
             * two order dependencies, which can exist for a store. For a
             * load or comp node there can be at most one order dependency.
             */
            static const uint8_t maxRobDep = 2;

            /** Typedef for the array containing the ROB dependencies */
            typedef std::array<NodeSeqNum, maxRobDep> RobDepArray;

            /** Typedef for the array containing the register dependencies */
            typedef std::array<NodeSeqNum, TheISA::MaxInstSrcRegs>
                RegDepArray;

            /** Instruction sequence number */
            NodeSeqNum seqNum;

            /** ROB occupancy number */
            NodeRobNum robNum;

            /** Type of the node corresponding to the instruction modelled
             * by it */
            RecordType type;

            /** The address for the request, if any */
            Addr addr;

            /** Size of the request, if any */
            uint32_t size;

            /** Request flags, if any */
            Request::Flags flags;

            /** Instruction PC */
            Addr pc;

            /** Array of order dependencies. */
            RobDepArray robDep;

            /** Number of order dependencies */
            uint8_t numRobDep;

            /** Computational delay */
            uint64_t compDelay;

            /**
             * Array of register dependencies (incoming), if any. The
             * maximum number of source registers is used to set the maximum
             * size of the array.
             */
            RegDepArray regDep;

            /** Number of register dependencies */
            uint8_t numRegDep;

            /**
             * A vector of nodes dependent (outgoing) on this node. A
             * sequential container is chosen because when dependents become
             * free, they attempt to issue in program order.
             */
            std::vector<GraphNode *> dependents;

            /** Is the node a load */
            bool isLoad() const { return (type == Record::LOAD); }

            /** Is the node a store */
            bool isStore() const { return (type == Record::STORE); }

            /** Is the node a compute (non load/store) node */
            bool isComp() const { return (type == Record::COMP); }

            /** Initialize the register dependency array to all zeroes */
            void clearRegDep();

            /** Initialize the ROB dependency array to all zeroes */
            void clearRobDep();

            /** Remove a completed instruction from the register dependency
             * array */
            bool removeRegDep(NodeSeqNum reg_dep);

            /** Remove a completed instruction from the order dependency
             * array */
            bool removeRobDep(NodeSeqNum rob_dep);

            /** Check for all dependencies on the completed inst */
            bool removeDepOnInst(NodeSeqNum done_seq_num);

            /** Return true if the node has a request which is strictly
             * ordered */
            bool isStrictlyOrdered() const {
                return (flags.isSet(Request::STRICT_ORDER));
            }

            /**
             * Write out the element in trace-compatible format using the
             * debug flag TraceCPUData.
             */
            void writeElementAsTrace() const;

            /** Return a string specifying the type of the node */
            std::string typeToStr() const;
        };

        /** Struct to store a ready-to-execute node and its execution
         * tick. */
        struct ReadyNode
        {
            /** The sequence number of the ready node */
            NodeSeqNum seqNum;

            /** The tick at which the ready node must be executed */
            Tick execTick;
        };

        /**
         * The HardwareResource class models structures that hold the
         * in-flight nodes. When a node becomes dependency-free, first check
         * if resources are available to issue it.
         */
        class HardwareResource
        {
          public:
            /**
             * Constructor that initializes the sizes of the structures.
             *
             * @param max_rob size of the Reorder Buffer
             * @param max_stores size of the Store Buffer
             * @param max_loads size of the Load Buffer
             */
            HardwareResource(uint16_t max_rob, uint16_t max_stores,
                             uint16_t max_loads);

            /**
             * Occupy the appropriate structures for an issued node.
             *
             * @param new_node pointer to the issued node
             */
            void occupy(const GraphNode* new_node);

            /**
             * Release the appropriate structures for a completed node.
             *
             * @param done_node pointer to the completed node
             */
            void release(const GraphNode* done_node);

            /** Release a store buffer entry for a completed store */
            void releaseStoreBuffer();

            /**
             * Check if the structures required to issue a node are free.
             *
             * @param new_node pointer to the node ready to issue
             * @return true if resources are available
             */
            bool isAvailable(const GraphNode* new_node) const;

            /**
             * Check if there are any outstanding requests, i.e. requests
             * for which we have yet to receive a response.
             *
             * @return true if there is at least one read or write request
             *         outstanding
             */
            bool awaitingResponse() const;

            /** Print resource occupancy for debugging */
            void printOccupancy();

          private:
            /**
             * The size of the ROB used to throttle the max. number of
             * in-flight nodes.
             */
            const uint16_t sizeROB;

            /**
             * The size of the store buffer. This is used to throttle the
             * max. number of in-flight stores.
             */
            const uint16_t sizeStoreBuffer;

            /**
             * The size of the load buffer. This is used to throttle the
             * max. number of in-flight loads.
             */
            const uint16_t sizeLoadBuffer;

            /**
             * A map from the sequence number to the ROB number of the
             * in-flight nodes. This includes all nodes that are in the
             * readyList, plus the loads for which a request has been sent
             * and which are not present in the readyList.
             * But such loads are not yet complete
             * and thus occupy resources. We need to query the oldest
             * in-flight node, and since a map container keeps all its keys
             * sorted using the less-than criterion, the first element is
             * the in-flight node with the least sequence number, i.e. the
             * oldest in-flight node.
             */
            std::map<NodeSeqNum, NodeRobNum> inFlightNodes;

            /** The ROB number of the oldest in-flight node */
            NodeRobNum oldestInFlightRobNum;

            /** Number of ready loads for which a request may or may not
             * have been sent */
            uint16_t numInFlightLoads;

            /** Number of ready stores for which a request may or may not
             * have been sent */
            uint16_t numInFlightStores;
        };

        /**
         * The InputStream encapsulates a trace file and the
         * internal buffers, and populates GraphNodes based on
         * the input.
         */
        class InputStream
        {

          private:

            /** Input file stream for the protobuf trace */
            ProtoInputStream trace;

            /** Count of committed ops read from the trace plus the
             * filtered ops */
            uint64_t microOpCount;

            /**
             * The window size that is read from the header of the protobuf
             * trace and used to process the dependency trace
             */
            uint32_t windowSize;

          public:

            /**
             * Create a trace input stream for a given file name.
             *
             * @param filename Path to the file to read from
             */
            InputStream(const std::string& filename);

            /**
             * Reset the stream such that it can be played once
             * again.
             */
            void reset();

            /**
             * Attempt to read a trace element from the stream,
             * and also notify the caller if the end of the file
             * was reached.
             *
             * @param element Trace element to populate
             * @return True if an element could be read successfully
             */
            bool read(GraphNode* element);

            /** Get the window size from the trace */
            uint32_t getWindowSize() const { return windowSize; }

            /** Get the number of micro-ops modelled in the TraceCPU
             * replay */
            uint64_t getMicroOpCount() const { return microOpCount; }
        };

      public:
        /* Constructor */
        ElasticDataGen(TraceCPU& _owner, const std::string& _name,
                       MasterPort& _port, MasterID master_id,
                       const std::string& trace_file, uint16_t max_rob,
                       uint16_t max_stores, uint16_t max_loads)
            : owner(_owner),
              port(_port),
              masterID(master_id),
              trace(trace_file),
              genName(owner.name() + ".elastic" + _name),
              retryPkt(nullptr),
              traceComplete(false),
              nextRead(false),
              execComplete(false),
              windowSize(trace.getWindowSize()),
              hwResource(max_rob, max_stores, max_loads)
        {
            DPRINTF(TraceCPUData, "Window size in the trace is %d.\n",
                    windowSize);
        }

        /**
         * Called from TraceCPU init(). Reads the first message from the
         * input trace file and returns the send tick.
         *
         * @return Tick when the first packet must be sent
         */
        Tick init();

        /** Returns the name of the ElasticDataGen instance. */
        const std::string& name() const { return genName; }

        /** Exit the ElasticDataGen. */
        void exit();

        /**
         * Reads a window of the trace file into the dependency graph. If
         * the end of the file has been reached, it returns false.
         *
         * @return false if the end of the file has been reached, else true
         */
        bool readNextWindow();

        /**
         * Iterate over the dependencies of a new node and add the new node
         * to the list of dependents of the parent node.
         *
         * @param new_node new node to add to the graph
         * @param dep_array the dependency array, of ROB or register type,
         *        that is to be iterated and may get modified
         * @param num_dep the number of dependencies set in the array,
         *        which may get modified during iteration
         */
        template<typename T> void addDepsOnParent(GraphNode *new_node,
                                                  T& dep_array,
                                                  uint8_t& num_dep);

        /**
         * This is the main execute function, which consumes nodes from the
         * sorted readyList. First attempt to issue the pending
         * dependency-free nodes held in the depFreeQueue. Insert the
         * ready-to-issue nodes into the readyList. Then iterate through the
         * readyList, and when a node has its execute tick equal to
         * curTick(), execute it. If the node is a load or a store, call
         * executeMemReq(); if it is neither, simply mark it complete.
         */
        void execute();

        /**
         * Creates a new request for a load or store, assigning the request
         * parameters. Calls the port's sendTimingReq() and returns a packet
         * if the send failed, so that it can be saved for a retry.
         *
         * @param node_ptr pointer to the load or store node to be executed
         *
         * @return packet pointer if the request failed, and nullptr if it
         *         was sent successfully
         */
        PacketPtr executeMemReq(GraphNode* node_ptr);

        /**
         * Add a ready node to the readyList. When inserting, ensure the
         * nodes are sorted in ascending order of their execute ticks.
         *
         * @param seq_num seq. num of the ready node
         * @param exec_tick the execute tick of the ready node
         */
        void addToSortedReadyList(NodeSeqNum seq_num, Tick exec_tick);

        /** Print the readyList for debugging, using debug flag
         * TraceCPUData. */
        void printReadyList();

        /**
         * When a load writeback is received, that is when the load
         * completes, release the dependents on it. This is called from the
         * dcache port recvTimingResp().
         */
        void completeMemAccess(PacketPtr pkt);

        /**
         * Returns the execComplete variable, which is set when the last
         * node is executed.
         *
         * @return true if execComplete is set, false otherwise.
         */
        bool isExecComplete() const { return execComplete; }

        /**
         * Attempts to issue a node once the node's source dependencies are
         * complete. If resources are available, add it to the readyList;
         * otherwise the node is not issued and is stored in the
         * depFreeQueue until resources become available.
         *
         * @param node_ptr pointer to the node to be issued
         * @param first true if this is the first attempt to issue this node
         * @return true if the node was added to the readyList
         */
        bool checkAndIssue(const GraphNode* node_ptr, bool first = true);

        /** Get the number of micro-ops modelled in the TraceCPU replay */
        uint64_t getMicroOpCount() const { return trace.getMicroOpCount(); }

        void regStats();

      private:

        /** Reference to the TraceCPU. */
        TraceCPU& owner;

        /** Reference to the port used to issue memory requests. */
        MasterPort& port;

        /** MasterID used for the requests being sent. */
        const MasterID masterID;

        /** Input stream used for reading the input trace file. */
        InputStream trace;

        /** String storing the name of the ElasticDataGen. */
        std::string genName;

        /** PacketPtr used to store the packet to retry. */
        PacketPtr retryPkt;

        /** Set to true when the end of the trace is reached. */
        bool traceComplete;

        /** Set to true when the next window of instructions needs to be
         * read */
        bool nextRead;

        /** Set to true when execution of the trace is complete */
        bool execComplete;

        /**
         * Window size within which to check for dependencies. Its value is
         * made equal to the window size used to generate the trace, which
         * is recorded in the trace header.
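The refill policy this implies can be sketched as a simple predicate (`needNextWindow` is a hypothetical name; the actual logic lives in readNextWindow() and the code that tracks the graph size):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the window refill policy (hypothetical helper): once enough
// nodes have completed that the in-memory dependency graph is smaller than
// the window recorded in the trace header, the next window is read in,
// unless the trace has already been fully consumed.
bool needNextWindow(std::size_t graph_size, std::uint32_t window_size,
                    bool trace_complete)
{
    return !trace_complete && graph_size < window_size;
}
```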
         * The dependency graph must be
         * populated enough such that when a node completes, its potential
         * child node can be found and the dependency removed before the
         * completed node itself is removed. Thus, as soon as the graph
         * shrinks to become smaller than this window, we read in the next
         * window.
         */
        const uint32_t windowSize;

        /**
         * Hardware resources required to contain in-flight nodes and to
         * throttle the issuing of new nodes when resources are not
         * available.
         */
        HardwareResource hwResource;

        /** Store the depGraph of GraphNodes. */
        std::unordered_map<NodeSeqNum, GraphNode*> depGraph;

        /**
         * Queue of dependency-free nodes that are pending issue because
         * resources are not available. This is chosen to be FIFO so that
         * dependent nodes which become free in program order get pushed
         * into the queue in that order, and thus nodes are more likely to
         * issue in program order.
         */
        std::queue<const GraphNode*> depFreeQueue;

        /** List of nodes that are ready to execute. */
        std::list<ReadyNode> readyList;

        /** Stats for data memory accesses replayed. */
        Stats::Scalar maxDependents;
        Stats::Scalar maxReadyListSize;
        Stats::Scalar numSendAttempted;
        Stats::Scalar numSendSucceeded;
        Stats::Scalar numSendFailed;
        Stats::Scalar numRetrySucceeded;
        Stats::Scalar numSplitReqs;
        Stats::Scalar numSOLoads;
        Stats::Scalar numSOStores;
        /** Tick when the ElasticDataGen completes execution. */
        Stats::Scalar dataLastTick;
    };

    /** Instance of FixedRetryGen to replay instruction read requests. */
    FixedRetryGen icacheGen;

    /** Instance of ElasticDataGen to replay data read and write requests. */
    ElasticDataGen dcacheGen;

    /**
     * This is the control flow that uses the functionality of the icacheGen
     * to replay the trace. It calls tryNext().
     * If it returns true, the next event
     * is scheduled at curTick() plus delta. If it returns false, delta is
     * ignored and control is brought back via recvRetry().
     */
    void schedIcacheNext();

    /**
     * This is the control flow that uses the functionality of the dcacheGen
     * to replay the trace. It calls execute(), checks if execution is
     * complete, and schedules an event to exit simulation accordingly.
     */
    void schedDcacheNext();

    /** Event for the control flow method schedIcacheNext(). */
    EventWrapper<TraceCPU, &TraceCPU::schedIcacheNext> icacheNextEvent;

    /** Event for the control flow method schedDcacheNext(). */
    EventWrapper<TraceCPU, &TraceCPU::schedDcacheNext> dcacheNextEvent;

    /** This is called when either generator finishes executing from its trace. */
    void checkAndSchedExitEvent();

    /** Set to true when one of the generators finishes replaying its trace. */
    bool oneTraceComplete;

    /**
     * This stores the tick of the first instruction fetch request, which is
     * later used for dumping the tickOffset stat.
     */
    Tick firstFetchTick;

    /**
     * Number of Trace CPUs in the system, used as a shared variable and
     * passed to the CountedExitEvent event used for counting down exit
     * events. It is incremented in the constructor call so that the total
     * is arrived at automatically.
     */
    static int numTraceCPUs;

    /**
     * A CountedExitEvent which, when serviced, decrements the counter. A
     * sim exit event is scheduled when the counter equals zero, that is,
     * when all instances of Trace CPU have had their execCompleteEvent
     * serviced.
     */
    CountedExitEvent *execCompleteEvent;

    Stats::Scalar numSchedDcacheEvent;
    Stats::Scalar numSchedIcacheEvent;

    /** Stat for the number of simulated micro-ops. */
    Stats::Scalar numOps;
    /** Stat for the CPI.
     * This is really cycles per micro-op, not per instruction. */
    Stats::Formula cpi;

    /**
     * The first execution tick is dumped as a stat so that the simulated
     * seconds for a trace replay can be calculated as the difference
     * between the final_tick stat and the tickOffset stat.
     */
    Stats::Scalar tickOffset;

  public:

    /** Used to get a reference to the icache port. */
    MasterPort &getInstPort() { return icachePort; }

    /** Used to get a reference to the dcache port. */
    MasterPort &getDataPort() { return dcachePort; }

    void regStats();
};

#endif // __CPU_TRACE_TRACE_CPU_HH__