perf-monitor accumulate-mode processor syscall-number filename program argument*
The performance monitor is a small hack that uses the on-chip counters on UltraSPARC-I/II processors to gather statistics. Both user and system mode can be counted. Since everything on that processor is counted, some runs may be inaccurate due to activity by other processes.
The performance monitor uses a loadable kernel module to access privileged registers. The module installs a system call that is used by the perf-monitor program. If the module has not been loaded, the behavior of perf-monitor is undefined. The module is loaded as root with the command:
hal> modload -e modload-script inst_sync
92 176
The output from modload is the module identifier and system call number.
If accumulate-mode is 1, the data is accumulated, which is convenient for graph generation. If it is 0, you get the raw counts.
Processor is the number of the processor that the program is to be run on. Note that processor numbers are not always sequential (schuetz has processors numbered 0, 1, 4, and 5).
Syscall-number is the number of the system call loaded from inst_sync.
Filename is a file where perf-monitor writes info about every run it makes. There is one line for each run with the format:
<ticks> <clocks> <event> <count> <event> <count> <mode>
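A line in this format can be read back with a few lines of Python. The following is only a sketch: the names `Run` and `parse_run_line` are made up for illustration and are not part of perf-monitor, and it assumes the seven fields are separated by whitespace as in the format above.

```python
from typing import NamedTuple, Tuple


class Run(NamedTuple):
    """One line of the perf-monitor run file (hypothetical helper type)."""
    ticks: int
    clocks: int
    events: Tuple[Tuple[str, int], Tuple[str, int]]  # (event name, count) pairs
    mode: int


def parse_run_line(line: str) -> Run:
    """Parse '<ticks> <clocks> <event> <count> <event> <count> <mode>'."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    ticks, clocks = int(fields[0]), int(fields[1])
    events = ((fields[2], int(fields[3])), (fields[4], int(fields[5])))
    return Run(ticks, clocks, events, int(fields[6]))


if __name__ == "__main__":
    sample = "3356285792 13400000 Cycle_cnt 3016264966 Instr_cnt 2418782563 2"
    print(parse_run_line(sample))
```

The event pairs are kept as a tuple rather than a dictionary so that a run counting the same event on both counters does not lose a value.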
Program is the executable that is to be monitored. Note that if the program forks, the created processes may be scheduled on different processors and therefore not counted. The program is run with the arguments specified last on the perf-monitor command line.
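When perf-monitor is driven from a script, the argument order in the synopsis is easy to get wrong. The sketch below only assembles the argument list in that order and does not run anything; `build_argv` is a hypothetical helper name, not something shipped with perf-monitor.

```python
from typing import List, Sequence


def build_argv(accumulate_mode: int, processor: int, syscall_number: int,
               filename: str, program: str,
               arguments: Sequence[str] = ()) -> List[str]:
    """Assemble a perf-monitor command line in synopsis order:
    accumulate-mode processor syscall-number filename program argument*
    """
    if accumulate_mode not in (0, 1):
        raise ValueError("accumulate-mode must be 0 or 1")
    return ["perf-monitor", str(accumulate_mode), str(processor),
            str(syscall_number), filename, program, *arguments]


if __name__ == "__main__":
    # 176 is the system call number printed by modload in the example above.
    print(" ".join(build_argv(0, 0, 176, "vortex-cpi",
                              "./vortex.sim1", ["vortex.in"])))
```

The resulting list can be handed to `subprocess.run` on a machine where the kernel module is loaded.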
Perf-monitor reads the file perf-monitor.conf to determine what to count and how. The file format is:
<conf-file> ::= <conf-line>*
<conf-line> ::= <counter number> <event id> <mode>
Counter number is the column in the output that the event should be associated with. All events associated with one counter are accumulated in that counter. Counter numbers over 100 indicate that accumulation should be turned off for that counter (useful for Cycle_cnt).
Event id is a countable event as listed in the UltraSPARC user's manual (and in the table below).
| Cycle_cnt | Accumulated cycles |
| Instr_cnt | The number of instructions completed |
| Dispatch0_IC_miss | Cycles I-buffer empty from I-Cache miss |
| Dispatch0_mispred | Cycles I-buffer empty from branch misprediction |
| Dispatch0_storeBuf | Cycles store buffer full |
| Dispatch0_FP_use | Cycles stalled waiting for fp dependency |
| Load_use | Cycles stalled waiting for load |
| Load_use_RAW | Cycles stalled on some weird internal condition |
| IC_ref | I-Cache references |
| IC_hit | I-Cache hits |
| DC_rd | D-Cache read references |
| DC_rd_hit | D-Cache read hits |
| DC_wr | D-Cache write references |
| DC_wr_hit | D-Cache write hits |
| EC_ref | E-Cache references |
| EC_hit | E-Cache hits |
| EC_write_hit_RDO | See User's guide |
| EC_wb | E-Cache misses that do writebacks |
| EC_snoop_inv | E-Cache invalidates |
| EC_snoop_cb | E-Cache snoop copy-backs |
| EC_rd_hit | E-Cache read hits from D-Cache misses |
| EC_ic_hit | E-Cache read hits from I-Cache misses |
Mode selects user/system mode as seen in the table below.
| 0 | Nothing recorded |
| 1 | System events only |
| 2 | User events only |
| 3 | System and user events |
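The configuration format described above is simple enough to check before a run. The following Python sketch parses the three fields of each line and validates the mode against the table; `ConfLine` and `parse_conf` are illustrative names, skipping blank lines is an assumption, and event ids are not checked against the event table.

```python
from typing import List, NamedTuple

# Mode values as listed in the mode table.
MODES = {0: "nothing", 1: "system", 2: "user", 3: "system+user"}


class ConfLine(NamedTuple):
    counter: int      # output column the event is associated with
    event: str        # event id, e.g. "Cycle_cnt"
    mode: int         # 0..3, see MODES
    accumulate: bool  # False for counter numbers over 100


def parse_conf(text: str) -> List[ConfLine]:
    """Parse '<counter number> <event id> <mode>' lines."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields: {raw!r}")
        counter, event, mode = int(fields[0]), fields[1], int(fields[2])
        if mode not in MODES:
            raise ValueError(f"unknown mode {mode}: {raw!r}")
        entries.append(ConfLine(counter, event, mode, counter <= 100))
    return entries


if __name__ == "__main__":
    for entry in parse_conf("1 Cycle_cnt 2\n2 Instr_cnt 2\n"):
        print(entry, MODES[entry.mode])
```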
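The output is sent to perf-stats.dat, which has a format suitable for building bar charts with gle.
For example, the following perf-monitor.conf counts cycles and instructions, first in user mode only and then in user and system mode: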
1 Cycle_cnt 2
2 Instr_cnt 2
3 Cycle_cnt 3
4 Instr_cnt 3
Run
hal> perf-monitor 0 0 176 vortex-cpi ./vortex.sim1 vortex.in
Run 0 will measure: 0(1) and 1(2)
Run 1 will measure: 0(3) and 1(4)
Performing run 1.
Performing run 2.
hal> cat vortex-cpi
3356285792 13400000 Cycle_cnt 3016264966 Instr_cnt 2418782563 2
3347298939 13410000 Cycle_cnt 3347301488 Instr_cnt 2533223764 3
From this data we calculate CPI(user) = 1.25, and CPI(user+system) = 1.32. Not bad for a quad-issue machine!
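The CPI figures are simply Cycle_cnt divided by Instr_cnt for each run. As a small Python sketch of that arithmetic, using the two lines from vortex-cpi above (the function name `cpi_from_line` is made up for illustration):

```python
def cpi_from_line(line: str) -> float:
    """Cycles per instruction from one run line that counts
    Cycle_cnt and Instr_cnt."""
    fields = line.split()
    counts = {fields[2]: int(fields[3]), fields[4]: int(fields[5])}
    return counts["Cycle_cnt"] / counts["Instr_cnt"]


user = cpi_from_line(
    "3356285792 13400000 Cycle_cnt 3016264966 Instr_cnt 2418782563 2")
user_system = cpi_from_line(
    "3347298939 13410000 Cycle_cnt 3347301488 Instr_cnt 2533223764 3")

print(f"CPI(user)        = {user:.2f}")         # 1.25
print(f"CPI(user+system) = {user_system:.2f}")  # 1.32
```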
Does not work on UltraSPARC-I.
Magnus Christensson, mch@sics.se