 __  __      ____   _  _____   ____       _
|  \/  | ___|  _ \ / \|_   _| | __ )  ___| |_  __ _
| |\/| |/ __| |_) / _ \ | |   |  _ \ / _ \ __|/ _` |
| |  | | (__|  __/ ___ \| |   | |_) |  __/ |_| (_| |
|_|  |_|\___|_| /_/   \_\_|   |____/ \___|\__|\__,_|

McPAT: Multicore Power, Area, and Timing
Current version 0.8Beta
===============================

McPAT is an architectural modeling tool for chip multiprocessors (CMPs).
Its main focus is accurate power and area modeling, with a target
clock rate used as a design constraint. McPAT performs an automatic,
extensive search to find optimal designs that satisfy the target
clock frequency.

For complete documentation of McPAT, please refer to the McPAT 1.0
technical report and the following paper,
"McPAT: An Integrated Power, Area, and Timing Modeling
 Framework for Multicore and Manycore Architectures",
which appeared in MICRO 2009. Please cite the paper if you use
McPAT in your work. The BibTeX entry is provided below for your convenience.

 @inproceedings{mcpat:micro,
 author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
 title =  "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
 booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
 year = {2009},
 pages = {469--480},
 }

McPAT is currently in its beta release.

List of features of the beta release
====================================
The tool supports the following features.

* Power, area, and timing models for CMPs with:
      In-order cores, both single-threaded and multithreaded
      Out-of-order (OOO) cores, both single-threaded and multithreaded
      Shared/coherent caches with directory hardware,
        including directory cache, shadowed tag directory,
        and static bank-mapped tag directory
      Network-on-Chip
      On-chip memory controllers
    
* Internal models are based on real modern processors:
  In-order models are based on the Sun Niagara family.
  OOO models are based on the Intel P6 for reservation-station-based
  OOO cores, and on the Intel NetBurst and Alpha 21264 for
  physical-register-file-based OOO cores.

* Leakage power modeling considers both sub-threshold leakage
  and gate leakage power. The impact of operating temperature
  on both types of leakage power is considered. Longer-channel devices,
  which can reduce leakage significantly with a modest performance
  penalty, are also modeled.

* McPAT supports an automatic, extensive search to find optimal designs
  that satisfy the target clock frequency. The timing constraints
  include both throughput and latency.

* Interconnect models with different delay, power, and area
  properties, as well as both aggressive and conservative
  interconnect projections for wire technologies.

* All process-specific values used by McPAT are obtained
  from ITRS. McPAT currently supports the 90nm, 65nm, 45nm,
  32nm, and 22nm technology nodes. At the 32nm and 22nm nodes, SOI
  and DG devices are used. After 45nm, high-k metal gates are used.

How to use the tool?
====================

McPAT takes input parameters from an XML-based interface,
then computes the area and peak power of the target processor.
Please note that the peak power is the absolute worst-case power,
which could be even higher than TDP.

1. Steps to run McPAT:
   -> define the target processor using inorder.xml or OOO.xml
   -> run the "mcpat" binary:
      ./mcpat -infile <*.xml> -print_level <level of detailed output>
      ./mcpat -h (or mcpat --help) shows the quick help message.

   Rather than being hardwired to certain simulators, McPAT
   uses an XML-based interface to enable easy integration
   with various performance simulators. Our collaborator,
   Richard Strong, at the University of California, San Diego,
   designed an experimental parser for the M5 simulator, aiming to
   streamline the integration of McPAT and M5. Please check the M5
   repository for the latest version of the parser.
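As a minimal sketch, the invocation in step 1 could be driven from a script.
The helper names, the default binary path, and the print level value used here
are assumptions for illustration; only the -infile and -print_level options
come from this README.

```python
import subprocess
from pathlib import Path


def build_mcpat_command(xml_file, print_level, binary="./mcpat"):
    """Return the argument list for one McPAT run (hypothetical helper)."""
    return [binary, "-infile", str(xml_file), "-print_level", str(print_level)]


def run_mcpat(xml_file, print_level, binary="./mcpat"):
    """Run McPAT and return its text output; raises if the run fails."""
    cmd = build_mcpat_command(xml_file, print_level, binary)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only attempt a run when both the binary and the input file exist.
    if Path("./mcpat").exists() and Path("inorder.xml").exists():
        print(run_mcpat("inorder.xml", 2))
```

Keeping the command construction separate from the run makes it easy to log
or inspect the exact command line before launching a long search.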
   
2. Optimize:
   McPAT will try its best to satisfy the target clock rate.
   When it cannot find a valid solution, it issues warnings
   while still returning the solution that is closest to the timing
   constraints and calculating power based on it. The optimization
   leads to larger power/area numbers for higher target clock
   rates. McPAT also provides the option "-opt_for_clk" to turn this
   strict optimization for the timing constraint on ("-opt_for_clk 1")
   and off. When it is off, McPAT always optimizes each
   component for ED^2P without worrying about meeting the
   target clock frequency. Turning it off reduces the computation
   time, which suits situations where the target clock rate
   is conservative.
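The difference between the two modes can be illustrated with a toy selection
over candidate designs, where ED^2P is energy multiplied by delay squared.
This is only a sketch of the idea, not McPAT's internal search; the function
names and all candidate values are made up for illustration.

```python
def ed2p(energy, delay):
    """Energy-delay-squared product: E * D^2 (lower is better)."""
    return energy * delay ** 2


def pick_design(candidates, target_cycle_time=None):
    """Pick a candidate from a list of (name, energy, delay) tuples.

    Without a target, choose the lowest ED^2P. With a target, choose the
    lowest ED^2P among candidates that meet it; if none does, fall back to
    the candidate closest to the constraint and report that timing failed.
    Returns (name, timing_met).
    """
    if target_cycle_time is None:
        best = min(candidates, key=lambda c: ed2p(c[1], c[2]))
        return best[0], True  # no constraint was applied
    valid = [c for c in candidates if c[2] <= target_cycle_time]
    if valid:
        best = min(valid, key=lambda c: ed2p(c[1], c[2]))
        return best[0], True
    closest = min(candidates, key=lambda c: c[2] - target_cycle_time)
    return closest[0], False  # caller should emit a warning


# Illustrative candidates: (name, energy, delay) in arbitrary units.
designs = [("small", 1.0, 2.0), ("balanced", 1.5, 1.2), ("fast", 4.0, 0.8)]
print(pick_design(designs))                         # ED^2P only
print(pick_design(designs, target_cycle_time=1.0))  # must meet timing
print(pick_design(designs, target_cycle_time=0.5))  # no valid design
```

With these numbers, enforcing the timing target selects a design with higher
energy than the pure ED^2P optimum, which mirrors the larger power/area
numbers reported for higher target clock rates.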
  
3. The output:
   McPAT outputs results in a hierarchical manner. Increasing
   the "-print_level" shows more detailed results inside each
   component. For each component, the major parts are shown, and the
   associated pipeline registers/control logic are added to the total
   area/power of each component. In general, McPAT does not model the
   area/overhead of the pad frame used in a processor die.
   
4. How to use the XML interface for McPAT
   4.1 Set up the parameters
       Parameters of target designs need to be set in the *.xml file for
       entries tagged as "param". McPAT has very detailed parameter settings.
       Please remove a structure parameter from the file if you want
       to use its default value. Otherwise, the parameters in the XML file
       will override the default values.
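A minimal sketch of editing "param" entries with Python's standard XML
library follows. The enclosing element and the "example_width" parameter are
assumptions for illustration; only the <param name="..." value="..."/> form
and the "homogeneous_cores" name appear in this README.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the component element and "example_width"
# are assumptions, not taken from the shipped XML files.
SAMPLE = """<component id="system" name="system">
  <param name="homogeneous_cores" value="1"/>
  <param name="example_width" value="4"/>
</component>"""


def set_param(root, name, value):
    """Set the value of the first <param> with this name; True if found."""
    for param in root.iter("param"):
        if param.get("name") == name:
            param.set("value", str(value))
            return True
    return False


def remove_param(root, name):
    """Remove <param> entries with this name so the default applies."""
    removed = 0
    for parent in list(root.iter()):
        for param in list(parent.findall("param")):
            if param.get("name") == name:
                parent.remove(param)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
set_param(root, "homogeneous_cores", 0)   # override a value
remove_param(root, "example_width")       # fall back to the default
print(ET.tostring(root, encoding="unicode"))
```

Removing an entry, rather than blanking its value, matches the rule above
that a parameter present in the file always overrides the default.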
   
   4.2 Pass the statistics
       There are two options to get the correct stats: a) the performance
       simulator captures all the stats in detail and passes them to McPAT;
       b) the performance simulator captures only partial stats and passes
       them to McPAT, and McPAT infers the complete stats from
       the partial information and the configuration. Therefore, there is
       some overlap among the stats.
   
   4.3 Interface XML file structures (PLEASE READ!)
       The XML is hierarchical, from the processor level to the
       micro-architecture level. McPAT supports both heterogeneous and
       homogeneous manycore processors.

       1). For a heterogeneous processor setup, each component (core, NoC,
       cache, etc.) must have its own instantiations (core0, core1, ..., coreN).
       Each instantiation has its own parameters as well as its own stats.
       Thus, the XML file must have multiple instantiations of each type of
       heterogeneous component, and the corresponding hetero flags must be set
       in the XML file. The stats in the XML should then be the stats of a
       single instantiation (e.g. one core). The reported runtime dynamic power
       is that of a single instantiation (e.g. one core). Since the stats of each
       instantiation may differ, the output lists every instantiation (e.g.
       every core) with its own dynamic power, and the total power is simply
       their sum.
       
       2). For homogeneous processors, the method for heterogeneous processors
       can also be used by treating all homogeneous instantiations as
       heterogeneous. However, the preferred approach is to use a single
       representative for all identical components (e.g. core0 to represent
       all cores) and set the processor to have homogeneous components
       (e.g. <param name="homogeneous_cores" value="1"/>). Thus, the XML file
       has only one instantiation to represent all others with the same
       architectural parameters. The corresponding homo flags must be set in
       the XML file. The stats in the XML should then be the aggregated stats
       of all instantiations (e.g. the aggregated stats of all cores). In the
       final results, McPAT reports only a single instantiation of each type
       of component, and the reported runtime dynamic power is the sum over
       all instantiations of the same type. This approach runs faster and
       uses much less memory.
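For the homogeneous setup, the aggregation step amounts to summing each
counter over all instantiations before writing it to the single
representative entry. A minimal sketch, in which the counter names are
hypothetical and not taken from the shipped XML files:

```python
from collections import Counter


def aggregate_stats(per_core_stats):
    """Sum per-core activity counters into one aggregated dict."""
    total = Counter()
    for stats in per_core_stats:
        total.update(stats)  # Counter.update adds counts key by key
    return dict(total)


# Hypothetical activity counters for two identical cores.
cores = [
    {"int_ops": 1000, "loads": 300},
    {"int_ops": 800, "loads": 250},
]
print(aggregate_stats(cores))
```

The summed values would then be written to the stats of the representative
instantiation (e.g. core0), with the homogeneous flag set as described above.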

5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
   The detailed work flow of McPAT has two phases: the initialization phase and
   the computation phase. To start the initialization phase, a
   user specifies static configurations, including parameters at all three levels,
   namely the architectural, circuit, and technology levels. During the initialization
   phase, McPAT generates the internal chip representation using the configurations
   set by the user.
   The computation phase is called by McPAT or by the performance simulator
   during simulation to generate runtime power numbers. Before calling McPAT to
   compute runtime power numbers, the performance simulator needs to pass the
   statistics, namely the activity factors of each individual component, to McPAT
   via the XML interface.
   The initialization phase is very time-consuming, since it repeats many
   times until valid configurations are found or the possible configurations are
   exhausted. To reduce the overhead, a user can let the simulator call McPAT
   directly for the computation phase and call the initialization phase only once
   at the beginning of the simulation. In this case, the XML interface file is
   bypassed. Please refer to processor.cc to see how the two phases are called.
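The call pattern described in this section (initialize once, then compute
once per simulation interval) can be sketched as follows. The class, its
methods, and the numbers are entirely hypothetical and do not reflect the
interfaces in processor.cc; the sketch only shows the shape of the two-phase
flow.

```python
class PowerModelSketch:
    """Hypothetical stand-in for a power model with two phases."""

    def __init__(self, config):
        self.config = dict(config)
        self.init_calls = 0
        self.energy_per_access_nj = None

    def initialize(self):
        """Expensive phase: build the internal representation once."""
        self.init_calls += 1
        # Placeholder for the design search; here just a made-up constant.
        self.energy_per_access_nj = self.config["energy_per_access_nj"]

    def compute(self, accesses, interval_s):
        """Cheap phase: turn activity counts into average dynamic power (W)."""
        if self.energy_per_access_nj is None:
            raise RuntimeError("initialize() must be called first")
        return accesses * self.energy_per_access_nj * 1e-9 / interval_s


model = PowerModelSketch({"energy_per_access_nj": 0.5})
model.initialize()                        # once, at the start of simulation
for accesses in (1_000_000, 2_000_000):   # once per simulation interval
    print(model.compute(accesses, interval_s=1e-3))
```

Keeping the expensive search out of the per-interval loop is what removes
the repeated initialization overhead mentioned above.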
   
6. Sample input files:
   This package provides sample XML files for validating target processors. Please find the
   enclosed Niagara1.xml (for the Sun Niagara1 processor), Niagara2.xml (for the Sun Niagara2
   processor), Alpha21364.xml (for the Alpha 21364 processor), and Xeon.xml (for the Intel
   Xeon Tulsa processor).
   
   Special instructions for using Xeon.xml:
   McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
   designs follow ITRS projections, there are designs with special technologies.
   For example, the 65nm Xeon Tulsa processor uses 1.25V rather than 1.1V
   for the core voltage domain, which changes the threshold voltage,
   leakage current density, saturation current, etc., in addition to the
   supply voltage. We use MASTAR to match the special technology used in the Xeon
   core domain. Therefore, to generate accurate results for Xeon
   Tulsa cores, users need to run "make TAR=mcpatXeonCore" and use the generated
   special executable. The L3 cache and buses must be computed using standard
   ITRS technology.
    

====================
McPAT is in its beginning stage. We are still improving
the tool and refining the code. Please come back to its website
for newer versions. If you have any comments,
questions, or suggestions, please write to us.

Version history and roadmap

McPAT Alpha:      released Sep. 2009. Experimental release.
McPAT Beta (0.6): released Nov. 2009. New code base and technology base.
McPAT Beta (0.7): released May 2010. Added various new models,
                  including long-channel devices and a bus model, together
                  with bug fixes and extensive code optimization to reduce
                  memory usage.
McPAT Beta (0.8): released Aug. 2010. Added various new models,
                  including on-chip 10Gb Ethernet units, PCIe, and flash controllers.
Next major release:
McPAT 1.0:        including advanced power-saving states

Future releases may include the modeling of embedded low-power
processors as well as vector processors and GPGPUs.


Sheng Li
sheng.li@hp.com