 __  __        ____      _     _____    ____         _
|  \/  |  ___ |  _ \    / \   |_   _|  | __ )   ___ | |_   __ _
| |\/| | / __|| |_) |  / _ \    | |    |  _ \  / _ \| __| / _` |
| |  | || (__ |  __/  / ___ \   | |    | |_) ||  __/| |_ | (_| |
|_|  |_| \___||_|    /_/   \_\  |_|    |____/  \___| \__| \__,_|

McPAT: Multicore Power, Area, and Timing
Current version 0.8Beta
===============================

McPAT is an architectural modeling tool for chip multiprocessors (CMPs).
The main focus of McPAT is accurate power and area
modeling, and a target clock rate is used as a design constraint.
McPAT performs an automatic, extensive search to find optimal designs
that satisfy the target clock frequency.

For complete documentation of McPAT, please refer to the McPAT 1.0
technical report and the following paper,
"McPAT: An Integrated Power, Area, and Timing Modeling
 Framework for Multicore and Manycore Architectures",
which appeared in MICRO 2009. Please cite the paper if you use
McPAT in your work. The bibtex entry is provided below for your convenience.

 @inproceedings{mcpat:micro,
   author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
   title = "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
   booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
   year = {2009},
   pages = {469--480},
 }

The current McPAT is a beta release.
List of features of beta release
===============================
The following features are supported by the tool.

* Power, area, and timing models for CMPs with:
    Inorder cores, both single- and multithreaded
    OOO cores, both single- and multithreaded
    Shared/coherent caches with directory hardware,
      including directory cache, shadowed tag directory,
      and static bank mapped tag directory
    Network-on-Chip
    On-chip memory controllers

* Internal models are based on real modern processors:
    Inorder models are based on the Sun Niagara family.
    OOO models are based on the Intel P6 for reservation
    station based OOO cores, and on the Intel Netburst and
    Alpha 21264 for physical register file based OOO cores.

* Leakage power modeling considers both sub-threshold leakage
  and gate leakage power. The impact of operating temperature
  on both types of leakage power is considered. Longer channel
  devices, which can reduce leakage significantly with a modest
  performance penalty, are also modeled.

* McPAT supports an automatic, extensive search to find optimal designs
  that satisfy the target clock frequency. The timing constraints
  include both throughput and latency.

* Interconnect models with different delay, power, and area
  properties, as well as both aggressive and conservative
  interconnect projections on wire technologies.

* All process-specific values used by McPAT are obtained
  from ITRS. McPAT currently supports the 90nm, 65nm, 45nm,
  32nm, and 22nm technology nodes. At the 32nm and 22nm nodes, SOI
  and DG devices are used. After 45nm, Hi-K metal gates are used.

How to use the tool?
====================

McPAT takes input parameters from an XML-based interface,
then computes the area and peak power of the target processor.
Please note that the peak power is the absolute worst-case power,
which could be even higher than TDP.

1. Steps to run McPAT:
   -> define the target processor using inorder.xml or OOO.xml
   -> run the "mcpat" binary:
      ./mcpat -infile <*.xml> -print_level <level of detailed output>
      ./mcpat -h (or mcpat --help) will show the quick help message.

   Rather than being hardwired to certain simulators, McPAT
   uses an XML-based interface to enable easy integration
   with various performance simulators. Our collaborator,
   Richard Strong, at the University of California, San Diego,
   designed an experimental parser for the M5 simulator, aimed at
   streamlining the integration of McPAT and M5. Please check the M5
   repository/ for the latest version of the parser.

2. Optimize:
   McPAT will try its best to satisfy the target clock rate.
   When it cannot find a valid solution, it issues warnings
   while still producing the solution that is closest to the timing
   constraints, and calculates power based on it. The optimization
   leads to larger power/area numbers for higher target clock
   rates. McPAT also provides the option "-opt_for_clk" to turn on
   ("-opt_for_clk 1") and off this strict optimization for the
   timing constraint. When it is off, McPAT always optimizes
   components for ED^2P without worrying about meeting the
   target clock frequency. Turning it off reduces the computation
   time, which suits situations where the target clock rate
   is conservative.

3. The output:
   McPAT outputs results in a hierarchical manner. Increasing
   the "-print_level" will show detailed results inside each
   component. For each component, the major parts are shown, and the
   associated pipeline registers/control logic are added to the total
   area/power of each component. In general, McPAT does not model the
   area/overhead of the pad frame used in a processor die.

4. How to use the XML interface for McPAT

   4.1 Set up the parameters
   Parameters of target designs need to be set in the *.xml file for
   entries tagged as "param". McPAT has very detailed parameter settings.
   Please remove the structure parameter from the file if you want
   to use the default values. Otherwise, the parameters in the xml file
   will override the default values.

   4.2 Pass the statistics
   There are two options to get the correct stats: a) the performance
   simulator can capture all the stats in detail and pass them to McPAT;
   b) the performance simulator can only capture partial stats and pass
   them to McPAT, while McPAT can reason about the complete stats using
   the partial information and the configuration. Therefore, some of
   the stats overlap.

   4.3 Interface XML file structures (PLEASE READ!)
   The XML is hierarchical from processor level to micro-architecture
   level. McPAT supports both heterogeneous and homogeneous manycore processors.

   1). For heterogeneous processor setup, each component (core, NoC, cache,
   etc.) must have its own instantiations (core0, core1, ..., coreN).
   Each instantiation will have different parameters as well as its own stats.
   Thus, the XML file must have multiple instantiations of each type of
   heterogeneous component, and the corresponding hetero flags must be set
   in the XML file. The stats in the XML should then be those of each
   individual instantiation (e.g., each core), and the reported runtime
   dynamic power is that of a single instantiation (e.g., a single core).
   Since the stats of each instantiation may differ, McPAT reports a whole
   list of instantiations (e.g., cores) with different dynamic power,
   and the total power is just their sum.

   2). For homogeneous processors, the same method used for heterogeneous
   processors can also be used by treating all homogeneous instantiations
   as heterogeneous. However, the preferred approach is to use a single
   representative for all identical components (e.g., core0 to represent
   all cores) and set the processor to have homogeneous components (e.g.,
   <param name="homogeneous_cores" value="1"/>). Thus, the XML file has only
   one instantiation to represent all others with the same architectural
   parameters. The corresponding homo flags must be set in the XML file.
   Then, the stats in the XML should be the aggregated stats of all
   instantiations (e.g., the aggregated stats of all cores). In the final
   results, McPAT will only report a single instantiation of each type of
   component, and the reported runtime dynamic power is the sum over all
   instantiations of the same type. This approach runs faster and uses
   much less memory.

5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
   The detailed workflow of McPAT has two phases: the initialization phase and
   the computation phase. To start the initialization phase, a user specifies
   static configurations, including parameters at all three levels, namely the
   architectural, circuit, and technology levels. During the initialization
   phase, McPAT generates the internal chip representation using the
   configurations set by the user.
   The computation phase of McPAT is called by McPAT or the performance simulator
   during simulation to generate runtime power numbers. Before calling McPAT to
   compute runtime power numbers, the performance simulator needs to pass the
   statistics, namely the activity factors of each individual component, to McPAT
   via the XML interface.
   The initialization phase is very time-consuming, since it repeats many
   times until valid configurations are found or the possible configurations are
   exhausted. To reduce this overhead, a user can let the simulator call McPAT
   directly for the computation phase and call the initialization phase only once
   at the beginning of simulation. In this case, the XML interface file is
   bypassed; please refer to processor.cc to see how the two phases are called.

6. Sample input files:
   This package provides sample XML files for validating target processors.
   Please find the enclosed Niagara1.xml (for the Sun Niagara1 processor),
   Niagara2.xml (for the Sun Niagara2 processor), Alpha21364.xml (for the
   Alpha21364 processor), and Xeon.xml (for the Intel Xeon Tulsa processor).
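
   To try the sample files in one go, a small driver can call the binary once
   per file with the "-infile" and "-print_level" flags from step 1. The Python
   sketch below is only an illustration and is not part of McPAT: the helper
   names mcpat_command and run_samples are hypothetical, it assumes the "mcpat"
   binary and the sample XML files sit in the current directory, it treats
   "-opt_for_clk 0" as the way to turn the strict timing optimization off, and
   its default print level of 1 is arbitrary. Xeon.xml is left out because its
   core results need the special mcpatXeonCore executable.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helper script, not shipped with McPAT.
# Xeon.xml is omitted: its core domain needs the special mcpatXeonCore build.
SAMPLES = ["Niagara1.xml", "Niagara2.xml", "Alpha21364.xml"]


def mcpat_command(xml_file, print_level=1, opt_for_clk=None, binary="./mcpat"):
    """Build the argument list for one McPAT run.

    print_level=1 is an arbitrary example value. opt_for_clk=None leaves the
    option unset; True passes "-opt_for_clk 1"; False passes "-opt_for_clk 0"
    (assumed here to turn the strict timing optimization off).
    """
    cmd = [binary, "-infile", str(xml_file), "-print_level", str(print_level)]
    if opt_for_clk is not None:
        cmd += ["-opt_for_clk", "1" if opt_for_clk else "0"]
    return cmd


def run_samples(sample_dir=".", binary="./mcpat", print_level=1):
    """Run McPAT on every sample file found in sample_dir.

    Returns {file name: subprocess.CompletedProcess}. Missing samples are
    skipped; a missing or non-executable binary raises FileNotFoundError.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"McPAT binary not found or not executable: {binary}")
    results = {}
    for name in SAMPLES:
        xml_path = Path(sample_dir) / name
        if not xml_path.is_file():
            continue  # this sample is not present; skip it
        # capture_output/text keep McPAT's report as a str in .stdout
        results[name] = subprocess.run(
            mcpat_command(xml_path, print_level, binary=binary),
            capture_output=True,
            text=True,
        )
    return results
```

   For example, results = run_samples() followed by
   print(results["Niagara1.xml"].stdout) shows the hierarchical report for
   Niagara1; checking each returncode is left to the caller, since this sketch
   does not assume how McPAT signals timing warnings.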

   Special instructions for using Xeon.xml:
   McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
   designs follow ITRS projections, some designs use special technologies.
   For example, the 65nm Xeon Tulsa processor uses 1.25V rather than 1.1V
   for the core voltage domain, which changes the threshold voltage,
   leakage current density, saturation current, etc., in addition to the
   supply voltage. We use MASTAR to match the special technology used in the
   Xeon core domain. Therefore, to generate accurate results for Xeon
   Tulsa cores, users need to run "make TAR=mcpatXeonCore" and use the
   generated special executable. The L3 cache and buses must be computed
   using standard ITRS technology.


====================
McPAT is in its beginning stage. We are still improving
the tool and refining the code. Please come back to its website
for newer versions. If you have any comments,
questions, or suggestions, please write to us.

Version history and roadmap

McPAT Alpha: released Sep. 2009   Experimental release
McPAT Beta (0.6): released Nov. 2009   New code base and technology base
McPAT Beta (0.7): released May 2010   Added various new models,
    including long channel devices and bus models, together
    with bug fixes and extensive code optimization to reduce
    memory usage.
McPAT Beta (0.8): released Aug. 2010   Added various new models,
    including on-chip 10Gb Ethernet units, PCIe, and flash controllers.
Next major release:
McPAT 1.0: including advanced power-saving states

Future releases may include the modeling of embedded low-power
processors as well as vector processors and GPGPUs.


Sheng Li
sheng.li@hp.com