README revision 10152
 __  __      ____   _  _____   ____       _
|  \/  | ___|  _ \ / \|_   _| | __ )  ___| |_  __ _
| |\/| |/ __| |_) / _ \ | |   |  _ \ / _ \ __|/ _` |
| |  | | (__|  __/ ___ \| |   | |_) |  __/ |_| (_| |
|_|  |_|\___|_| /_/   \_\_|   |____/ \___|\__|\__,_|

McPAT: Multicore Power, Area, and Timing
Current version 0.8Beta
===============================

McPAT is an architectural modeling tool for chip multiprocessors (CMPs).
The main focus of McPAT is accurate power and area
modeling, and a target clock rate is used as a design constraint.
McPAT performs an automatic, extensive search to find optimal designs
that satisfy the target clock frequency.

For complete documentation of McPAT, please refer to the McPAT 1.0
technical report and the following paper,
"McPAT: An Integrated Power, Area, and Timing Modeling
 Framework for Multicore and Manycore Architectures",
which appeared in MICRO 2009. Please cite the paper if you use
McPAT in your work. The BibTeX entry is provided below for your convenience.

 @inproceedings{mcpat:micro,
   author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
   title = "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
   booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
   year = {2009},
   pages = {469--480},
 }

McPAT is currently in its beta release.
List of features of beta release
===============================
The following features are supported by the tool.

* Power, area, and timing models for CMPs with:
    In-order cores, both single-threaded and multithreaded
    OOO cores, both single-threaded and multithreaded
    Shared/coherent caches with directory hardware:
      including directory cache, shadowed tag directory,
      and static bank mapped tag directory
    Network-on-Chip
    On-chip memory controllers

* Internal models are based on real modern processors:
    In-order models are based on the Sun Niagara family
    OOO models are based on Intel P6 for reservation
    station based OOO cores, and on Intel Netburst and
    Alpha 21264 for physical register file based OOO cores.

* Leakage power modeling considers both sub-threshold leakage
  and gate leakage power. The impact of operating temperature
  on both types of leakage power is considered.
  Longer-channel devices that can reduce leakage significantly
  with a modest performance penalty are also modeled.

* McPAT supports automatic extensive search to find optimal designs
  that satisfy the target clock frequency. The timing constraints
  include both throughput and latency.

* Interconnect models with different delay, power, and area
  properties, as well as both aggressive and conservative
  interconnect projections for wire technologies.

* All process-specific values used by McPAT are obtained
  from ITRS. Currently, McPAT supports the 90nm, 65nm, 45nm,
  32nm, and 22nm technology nodes. At the 32nm and 22nm nodes, SOI
  and DG devices are used. After 45nm, Hi-K metal gates are used.

How to use the tool?
====================

McPAT takes input parameters from an XML-based interface,
then it computes area and peak power of the
Please note that the peak power is the absolute worst-case power,
which could be even higher than TDP.

1. Steps to run McPAT:
   -> define the target processor using inorder.xml or OOO.xml
   -> run the "mcpat" binary:
      ./mcpat -infile <*.xml> -print_level < level of detailed output>
      ./mcpat -h (or mcpat --help) will show the quick help message.
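
   The run step above can also be scripted. The following is a minimal
   Python sketch and not part of McPAT itself: the helper names
   build_mcpat_command and run_mcpat are hypothetical, the binary is assumed
   to sit in the current directory, and the print level used is only an
   example value. Only the "-infile" and "-print_level" options named above
   are passed.

```python
import subprocess


def build_mcpat_command(infile, print_level, binary="./mcpat"):
    """Return the argument list for one McPAT run.

    Only the -infile and -print_level options described in this README
    are used; the default binary path is an assumption.
    """
    return [binary, "-infile", str(infile), "-print_level", str(print_level)]


def run_mcpat(infile, print_level, binary="./mcpat"):
    """Run McPAT and return the CompletedProcess (exit status and output)."""
    cmd = build_mcpat_command(infile, print_level, binary)
    # The exit status is not checked here; the caller can inspect
    # result.returncode and result.stdout as needed.
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch also works
    # where the mcpat binary has not been built yet.
    print(" ".join(build_mcpat_command("inorder.xml", 2)))
```

   Building the argument list separately from running it keeps the command
   easy to log and to check before a long simulation campaign is started.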

   Rather than being hardwired to certain simulators, McPAT
   uses an XML-based interface to enable easy integration
   with various performance simulators. Our collaborator,
   Richard Strong, at the University of California, San Diego,
   designed an experimental parser for the M5 simulator, aiming to
   streamline the integration of McPAT and M5. Please check the M5
   repository for the latest version of the parser.

2. Optimize:
   McPAT will try its best to satisfy the target clock rate.
   When it cannot find a valid solution, it issues warnings
   while still giving the solution that is closest to the timing
   constraints and calculates power based on it. The optimization
   leads to larger power/area numbers for higher target clock
   rates. McPAT also provides the option "-opt_for_clk" to turn on
   ("-opt_for_clk 1") and off this strict optimization for the
   timing constraint. When it is off, McPAT always optimizes each
   component for ED^2P without worrying about meeting the
   target clock frequency. Turning it off reduces the computation
   time, which suits situations where the target clock rate
   is conservative.

3. The output:
   McPAT outputs results in a hierarchical manner. Increasing
   the "-print_level" will show detailed results inside each
   component.
   For each component, major parts are shown, and the associated
   pipeline registers/control logic are added to the total area/power of each
   component. In general, McPAT does not model the area/overhead of the pad
   frame used in a processor die.

4. How to use the XML interface for McPAT
   4.1 Set up the parameters
       Parameters of target designs need to be set in the *.xml file for
       entries tagged as "param". McPAT has very detailed parameter settings.
       Please remove a structure parameter from the file if you want
       to use its default value. Otherwise, the parameters in the XML file
       will override the default values.

   4.2 Pass the statistics
       There are two options to get the correct stats: a) the performance
       simulator can capture all the stats in detail and pass them to McPAT;
       b) the performance simulator can capture only partial stats and pass
       them to McPAT, while McPAT reasons about the complete stats using
       the partial information and the configuration. Therefore, there is
       some overlap among the stats.

   4.3 Interface XML file structures (PLEASE READ!)
       The XML is hierarchical from processor level to micro-architecture
       level. McPAT supports both heterogeneous and homogeneous manycore processors.

       1). For heterogeneous processor setup, each component (core, NoC, cache,
       etc.) must have its own instantiations (core0, core1, ..., coreN).
       Each instantiation will have different parameters as well as its own stats.
       Thus, the XML file must have multiple instantiations of each type of
       heterogeneous component, and the corresponding hetero flags must be set
       in the XML file. The stats in the XML should then be the stats of a single
       instantiation (e.g. a single core). The reported runtime dynamic power is
       that of a single instantiation (e.g. a single core). Since the stats for
       each instantiation may differ, the output contains a whole list of
       instantiations (e.g. cores) with different dynamic power,
       and the total power is simply their sum.

       2). For homogeneous processors, the same method as for heterogeneous ones
       can also be used by treating all homogeneous instantiations as heterogeneous.
       However, the preferred approach is to use a single representative for all
       identical components (e.g. core0 to represent all cores) and set the
       processor to have homogeneous components
       (e.g. <param name="homogeneous_cores" value="1"/>).
       Thus, the XML file only has one instantiation to represent
       all others with the same architectural parameters. The corresponding homo
       flags must be set in the XML file. Then, the stats in the XML should be
       the aggregated stats over all instantiations (e.g. aggregated stats
       of all cores).
       In the final results, McPAT will only report a single
       instantiation of each type of component, and the reported runtime dynamic power
       is the sum of all instantiations of the same type. This approach runs faster
       and uses much less memory.

5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
   The detailed work flow of McPAT has two phases: the initialization phase and
   the computation phase. Specifically, in order to start the initialization phase, a
   user specifies static configurations, including parameters at all three levels,
   namely, the architectural, circuit, and technology levels. During the initialization
   phase, McPAT will generate the internal chip representation using the configurations
   set by the user.
   The computation phase of McPAT is called by McPAT or the performance simulator
   during simulation to generate runtime power numbers. Before calling McPAT to
   compute runtime power numbers, the performance simulator needs to pass the
   statistics, namely, the activity factors of each individual component, to McPAT
   via the XML interface.
   The initialization phase is very time-consuming, since it will repeat many
   times until valid configurations are found or the possible configurations are
   exhausted.
   To reduce the overhead, a user can let the simulator call McPAT
   directly for the computation phase and call the initialization phase only once at the
   beginning of simulation. In this case, the XML interface file is bypassed;
   please refer to processor.cc to see how the two phases are called.

6. Sample input files:
   This package provides sample XML files for validating target processors. Please find the
   enclosed Niagara1.xml (for the Sun Niagara1 processor), Niagara2.xml (for the Sun Niagara2
   processor), Alpha21364.xml (for the Alpha21364 processor), and Xeon.xml (for the Intel
   Xeon Tulsa processor).

   Special instructions for using Xeon.xml:
   McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
   designs follow ITRS projections, there are designs with special technologies.
   For example, the 65nm Xeon Tulsa processor uses 1.25 V rather than 1.1 V
   for the core voltage domain, which results in changes to the threshold voltage,
   leakage current density, saturation current, etc., in addition to the different
   supply voltage. We use MASTAR to match the special technology used in the Xeon
   core domain. Therefore, in order to generate accurate results for Xeon
   Tulsa cores, users need to run "make TAR=mcpatXeonCore" and use the generated
   special executable. The L3 cache and buses must be computed using standard
   ITRS technology.
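
   Before running one of the sample files, it can be handy to override a
   single parameter programmatically rather than by hand. The sketch below
   uses Python's standard xml.etree.ElementTree module. It assumes only that
   parameters are "param" elements carrying "name" and "value" attributes, as
   in the homogeneous_cores example in section 4.3. The helper name set_param,
   the inline XML snippet, and its "component" wrapper are illustrative
   assumptions, not copies of a shipped file.

```python
import xml.etree.ElementTree as ET


def set_param(root, name, value):
    """Set value on every <param> whose name attribute matches.

    Returns the number of entries changed so that a caller can detect
    a misspelled parameter name.
    """
    changed = 0
    for param in root.iter("param"):
        if param.get("name") == name:
            param.set("value", str(value))
            changed += 1
    return changed


# Illustrative input only; a real sample file is far larger and its
# enclosing elements may differ from this hypothetical wrapper.
SAMPLE = """
<component id="system" name="system">
  <param name="homogeneous_cores" value="0"/>
</component>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(set_param(root, "homogeneous_cores", 1))
    print(ET.tostring(root, encoding="unicode"))
```

   To apply the same helper to a file on disk, load it with ET.parse(path),
   call set_param on tree.getroot(), and save a copy with tree.write(new_path)
   so that the original sample stays untouched.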


====================
McPAT is in its beginning stage. We are still improving
the tool and refining the code. Please come back to its website
for newer versions. If you have any comments,
questions, or suggestions, please write to us.

Version history and roadmap

McPAT Alpha: released Sep. 2009 Experimental release
McPAT Beta (0.6): released Nov. 2009 New code base and technology base
McPAT Beta (0.7): released May 2010 Added various new models,
                  including long-channel devices and a bus model, together
                  with bug fixes and extensive code optimization to reduce
                  memory usage.
McPAT Beta (0.8): released Aug. 2010 Added various new models,
                  including on-chip 10Gb Ethernet units, PCIe, and flash controllers.
Next major release:
McPAT 1.0: including advanced power-saving states

Future releases may include the modeling of embedded low-power
processors as well as vector processors and GPGPUs.


Sheng Li
sheng.li@hp.com