 __  __      ____   _  _____   ____       _         
|  \/  | ___|  _ \ / \|_   _| | __ )  ___| |_  __ _ 
| |\/| |/ __| |_) / _ \ | |   |  _ \ / _ \ __|/ _` |
| |  | | (__|  __/ ___ \| |   | |_) |  __/ |_| (_| |
|_|  |_|\___|_| /_/   \_\_|   |____/ \___|\__|\__,_|

McPAT: Multicore Power, Area, and Timing
Current version 0.8Beta
===============================

McPAT is an architectural modeling tool for chip multiprocessors (CMPs).
The main focus of McPAT is accurate power and area modeling, with
the target clock rate used as a design constraint. McPAT performs an
extensive automatic search to find optimal designs that satisfy the
target clock frequency.

For complete documentation of McPAT, please refer to the McPAT 1.0
technical report and the following paper,
"McPAT: An Integrated Power, Area, and Timing Modeling
 Framework for Multicore and Manycore Architectures",
which appeared in MICRO 2009. Please cite the paper if you use
McPAT in your work. The BibTeX entry is provided below for your convenience.

 @inproceedings{mcpat:micro,
 author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
 title =  "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
 booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
 year = {2009},
 pages = {469--480},
 }

McPAT is currently in its beta release.
List of features of the beta release
===============================
The following features are supported by the tool.

* Power, area, and timing models for CMPs with:
      Inorder cores, both single-threaded and multithreaded
      OOO cores, both single-threaded and multithreaded
      Shared/coherent caches with directory hardware,
        including directory cache, shadowed tag directory,
        and static bank mapped tag directory
      Network-on-Chip
      On-chip memory controllers

* Internal models are based on real modern processors:
  Inorder models are based on the Sun Niagara family.
  OOO models are based on the Intel P6 for reservation
  station based OOO cores, and on the Intel Netburst and
  Alpha 21264 for physical register file based OOO cores.

* Leakage power modeling considers both sub-threshold leakage
  and gate leakage power. The impact of operating temperature
  on both types of leakage power is considered. Longer channel
  devices, which can reduce leakage significantly with a modest
  performance penalty, are also modeled.

* McPAT supports an extensive automatic search to find optimal designs
  that satisfy the target clock frequency. The timing constraints
  include both throughput and latency.

* Interconnect models with different delay, power, and area
  properties, as well as both aggressive and conservative
  interconnect projections on wire technologies.

* All process-specific values used by McPAT are obtained from
  ITRS. McPAT currently supports the 90nm, 65nm, 45nm, 32nm, and
  22nm technology nodes. At the 32nm and 22nm nodes, SOI and DG
  devices are used. After 45nm, Hi-K metal gates are used.

How to use the tool?
====================

McPAT takes input parameters from an XML-based interface and
then computes area and peak power.
Please note that the peak power is the absolute worst-case power,
which could be even higher than TDP.

1. Steps to run McPAT:
   -> define the target processor using inorder.xml or OOO.xml
   -> run the "mcpat" binary:
      ./mcpat -infile <*.xml> -print_level <level of detailed output>
      ./mcpat -h (or mcpat --help) shows the quick help message.

   Rather than being hardwired to certain simulators, McPAT
   uses an XML-based interface to enable easy integration
   with various performance simulators. Our collaborator,
   Richard Strong, at the University of California, San Diego,
   designed an experimental parser for the M5 simulator, aiming to
   streamline the integration of McPAT and M5. Please check the M5
   repository for the latest version of the parser.

2. Optimize:
   McPAT will try its best to satisfy the target clock rate.
   When it cannot find a valid solution, it issues warnings
   while still returning the solution that is closest to the timing
   constraints, and calculates power based on it. The optimization
   leads to larger power/area numbers for higher target clock
   rates. McPAT also provides the option "-opt_for_clk" to turn on
   ("-opt_for_clk 1") and off this strict optimization for the
   timing constraint. When it is off, McPAT always optimizes
   components for ED^2P without worrying about meeting the
   target clock frequency. Turning it off reduces the computation
   time, which suits situations where the target clock rate
   is conservative.

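   The command lines from steps 1 and 2 can also be assembled from a
   script. The following is a minimal Python sketch and is not part of
   McPAT: the helper name build_mcpat_command is hypothetical, the print
   level of 2 is only an illustrative value, and passing "-opt_for_clk 0"
   to turn the strict optimization off is an assumption inferred from the
   "-opt_for_clk 1" form described above.

```python
import shlex


def build_mcpat_command(infile, print_level=None, opt_for_clk=None,
                        binary="./mcpat"):
    """Return the mcpat command line as an argv list (hypothetical helper)."""
    argv = [binary, "-infile", infile]
    if print_level is not None:
        # Higher levels show more detail inside each component.
        argv += ["-print_level", str(print_level)]
    if opt_for_clk is not None:
        # "1" requests the strict timing optimization; "0" is assumed to
        # turn it off so that components are optimized for ED^2P only.
        argv += ["-opt_for_clk", "1" if opt_for_clk else "0"]
    return argv


cmd = build_mcpat_command("Niagara1.xml", print_level=2, opt_for_clk=False)
print(shlex.join(cmd))
```

   To actually run the tool, the list can be passed to
   subprocess.run(cmd, check=True) from the directory that contains the
   mcpat binary.
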
3. The output:
   McPAT outputs results in a hierarchical manner. Increasing
   the "-print_level" shows more detailed results inside each
   component. For each component, major parts are shown, and the
   associated pipeline registers/control logic are added to the total
   area/power of each component. In general, McPAT does not model
   the area/overhead of the pad frame used in a processor die.

4. How to use the XML interface for McPAT
   4.1 Set up the parameters
        Parameters of target designs need to be set in the *.xml file for
        entries tagged as "param". McPAT has very detailed parameter
        settings. Please remove a structure parameter from the file if
        you want to use its default value. Otherwise, the parameters in
        the XML file will override the default values.

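   The override-or-remove rule in 4.1 can be scripted. The sketch below
   uses Python's standard xml.etree.ElementTree module and is only an
   illustration: the component wrapper, the parameter names clock_rate and
   example_structure_size, and the helper functions are assumptions made
   for this example, not taken from a McPAT input file. Only the
   <param name="..." value="..."/> form follows this README.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an input file; names are illustrative only.
SAMPLE = """
<component id="system.core0" name="core0">
  <param name="clock_rate" value="1200"/>
  <param name="example_structure_size" value="64"/>
</component>
""".strip()


def set_param(component, name, value):
    """Override a parameter, adding the entry if it is not present."""
    for param in component.findall("param"):
        if param.get("name") == name:
            param.set("value", str(value))
            return
    ET.SubElement(component, "param", name=name, value=str(value))


def remove_param(component, name):
    """Drop a parameter entry so that the tool's default value applies."""
    for param in component.findall("param"):
        if param.get("name") == name:
            component.remove(param)


core = ET.fromstring(SAMPLE)
set_param(core, "clock_rate", 1400)            # explicit override
remove_param(core, "example_structure_size")   # fall back to the default
print(ET.tostring(core, encoding="unicode"))
```
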
   4.2 Pass the statistics
        There are two options to get the correct stats: a) the performance
        simulator captures all the stats in detail and passes them to
        McPAT; b) the performance simulator captures only partial stats
        and passes them to McPAT, and McPAT infers the complete stats from
        the partial information and the configuration. Therefore, there
        is some overlap among the stats.

   4.3 Interface XML file structures (PLEASE READ!)
        The XML is hierarchical from the processor level to the
        micro-architecture level. McPAT supports both heterogeneous and
        homogeneous manycore processors.

        1). For a heterogeneous processor setup, each component (core,
        NoC, cache, etc.) must have its own instantiations (core0, core1,
        ..., coreN). Each instantiation has its own parameters as well as
        its own stats. Thus, the XML file must have multiple
        instantiations of each type of heterogeneous component, and the
        corresponding hetero flags must be set in the XML file. The stats
        in the XML should then be the stats of a single instantiation
        (e.g. one core), and the reported runtime dynamic power is that of
        a single instantiation (e.g. one core). Since the stats of each
        instantiation may be different, the output shows a whole list of
        instantiations (e.g. cores) with different dynamic power, and the
        total power is simply their sum.

        2). For homogeneous processors, the heterogeneous method can
        also be used by treating all homogeneous instantiations as
        heterogeneous. However, the preferred approach is to use a single
        representative for all identical components (e.g. core0 to
        represent all cores) and set the processor to have homogeneous
        components (e.g. <param name="homogeneous_cores" value="1"/>).
        Thus, the XML file has only one instantiation to represent all
        others with the same architectural parameters. The corresponding
        homo flags must be set in the XML file. The stats in the XML
        should then be the aggregated stats of all instantiations (e.g.
        the aggregated stats of all cores). In the final results, McPAT
        reports only a single instantiation of each type of component,
        and the reported runtime dynamic power is the sum over all
        instantiations of the same type. This approach runs faster and
        uses much less memory.

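   The difference between the two setups in 4.3 comes down to how the
   stats are laid out. The sketch below is a hedged illustration in
   Python: the stat names, the <stat> element, the component ids, and both
   helper functions are assumptions made for this example, and it assumes
   every stat is an event count that adds up across cores.

```python
import xml.etree.ElementTree as ET


def aggregate_stats(per_core_stats):
    """Sum event counts over all cores (homogeneous setup)."""
    total = {}
    for stats in per_core_stats:
        for name, count in stats.items():
            total[name] = total.get(name, 0) + count
    return total


def core_component(index, stats):
    """Build one core instantiation carrying its stats (illustrative layout)."""
    core = ET.Element("component", id=f"system.core{index}",
                      name=f"core{index}")
    for name, count in stats.items():
        ET.SubElement(core, "stat", name=name, value=str(count))
    return core


# Per-core counts as a performance simulator might collect them.
per_core = [
    {"example_int_ops": 1000, "example_loads": 300},
    {"example_int_ops": 1500, "example_loads": 200},
]

# Heterogeneous setup: one instantiation per core, each with its own stats.
hetero = [core_component(i, stats) for i, stats in enumerate(per_core)]

# Homogeneous setup: core0 represents all cores and carries the summed stats.
homo = core_component(0, aggregate_stats(per_core))
print(ET.tostring(homo, encoding="unicode"))
```

   In the homogeneous case the single core0 entry stands for every core,
   which is why its stats have to be the totals rather than per-core
   values.
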
5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
      The detailed workflow of McPAT has two phases: the initialization phase and
   the computation phase. Specifically, in order to start the initialization phase, a
   user specifies static configurations, including parameters at all three levels,
   namely, the architectural, circuit, and technology levels. During the initialization
   phase, McPAT generates the internal chip representation using the configurations
   set by the user.
      The computation phase of McPAT is called by McPAT or the performance simulator
   during simulation to generate runtime power numbers. Before calling McPAT to
   compute runtime power numbers, the performance simulator needs to pass the
   statistics, namely, the activity factors of each individual component, to McPAT
   via the XML interface.
      The initialization phase is very time-consuming, since it repeats many
   times until valid configurations are found or the possible configurations are
   exhausted. To reduce the overhead, a user can let the simulator call McPAT
   directly for the computation phase and call the initialization phase only once at the
   beginning of the simulation. In this case, the XML interface file is bypassed.
   Please refer to processor.cc to see how the two phases are called.

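   The initialize-once, compute-many pattern described in section 5 can be
   pictured with a toy model. The Python sketch below is not McPAT's API
   (McPAT is driven from processor.cc): the class PowerModel, its methods,
   the event names, and the energy-per-event numbers are all invented for
   illustration, and the power formula is a deliberately simple stand-in.

```python
class PowerModel:
    """Toy stand-in for a two-phase power model; not McPAT's interface."""

    init_calls = 0  # counts how often the expensive phase runs

    def __init__(self, energy_per_event):
        # Initialization phase: in a real tool this is the costly search
        # over configurations, so it should run once per simulation.
        PowerModel.init_calls += 1
        self.energy_per_event = dict(energy_per_event)  # joules per event

    def compute(self, activity, seconds):
        # Computation phase: cheap, called once per simulation interval
        # with that interval's activity counts.
        energy = sum(self.energy_per_event[name] * count
                     for name, count in activity.items())
        return energy / seconds  # average power in watts


# Initialize once at the beginning of the simulation ...
model = PowerModel({"alu_op": 1e-12, "cache_access": 5e-12})

# ... then call only the computation phase for each interval.
intervals = [
    {"alu_op": 1000, "cache_access": 200},
    {"alu_op": 500, "cache_access": 50},
]
powers = [model.compute(activity, seconds=1e-6) for activity in intervals]
print(powers)
```
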
6. Sample input files:
   This package provides sample XML files for validating target processors. Please find the
   enclosed Niagara1.xml (for the Sun Niagara1 processor), Niagara2.xml (for the Sun Niagara2
   processor), Alpha21364.xml (for the Alpha21364 processor), and Xeon.xml (for the Intel
   Xeon Tulsa processor).

   Special instructions for using Xeon.xml:
   McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
   designs follow ITRS projections, there are designs with special technologies.
   For example, the 65nm Xeon Tulsa processor uses 1.25V rather than 1.1V
   for the core voltage domain, which changes the threshold voltage,
   leakage current density, saturation current, etc., in addition to the
   supply voltage. We use MASTAR to match the special technology used in the Xeon
   core domain. Therefore, in order to generate accurate results for Xeon
   Tulsa cores, users need to run "make TAR=mcpatXeonCore" and use the generated
   special executable. The L3 cache and buses must be computed using standard
   ITRS technology.

====================
McPAT is in its early stages. We are still improving
the tool and refining the code. Please check its website
for newer versions. If you have any comments,
questions, or suggestions, please write to us.

Version history and roadmap

McPAT Alpha:      released Sep. 2009 Experimental release
McPAT Beta (0.6): released Nov. 2009 New code base and technology base
McPAT Beta (0.7): released May 2010  Added various new models,
                  including long channel devices and bus models, together
                  with bug fixes and extensive code optimization to reduce
                  memory usage.
McPAT Beta (0.8): released Aug. 2010 Added various new models,
                  including on-chip 10Gb Ethernet units, PCIe, and flash controllers.
Next major release:
McPAT 1.0:        including advanced power-saving states

Future releases may include the modeling of embedded low-power
processors as well as vector processors and GPGPUs.


Sheng Li
sheng.li@hp.com