Figure 1 below shows a listing of the cmppt() function in EQNTOTT, one of the programs in the SPECint92 suite. A traditional profiler run on EQNTOTT reports that 80-90% of the execution time is spent in this function.
(gdb) list 34,59
34 int cmppt (a, b)
35 PTERM *a[], *b[];
36 /*
37  * compare product terms indirectly pointed to by a and b.
38  */
39 {
40     register int i, aa, bb;
41
42     for (i = 0; i < ninputs; i++) {
43         aa = a[0]->ptand[i];
44         bb = b[0]->ptand[i];
45         if (aa == 2)
46             aa = 0;
47         if (bb == 2)
48             bb = 0;
49         if (aa != bb) {
50             if (aa < bb) {
51                 return (-1);
52             }
53             else {
54                 return (1);
55             }
56         }
57     }
58     return (0);
59 }
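To make the comparison rule concrete, the following is a minimal sketch of cmppt() rewritten with an ANSI C prototype and used as a qsort() comparator over an array of PTERM pointers. The qsort() usage, the one-field PTERM layout with `short` entries, and the sort_terms() helper are assumptions made for illustration; they are not taken from the EQNTOTT sources.

```c
#include <assert.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for EQNTOTT's PTERM: only the ptand field that
 * cmppt() reads is modeled, and its element type (short) is an assumption. */
typedef struct {
    short *ptand;   /* one entry per input: 0, 1, or 2 (2 compares like 0) */
} PTERM;

/* Number of inputs per product term (a global, as in the original). */
static int ninputs;

/* cmppt() from Figure 1 with an ANSI C prototype so it can be passed to
 * qsort(): pa and pb each point at one element of a PTERM* array. */
static int cmppt(const void *pa, const void *pb)
{
    const PTERM *a = *(PTERM *const *)pa;
    const PTERM *b = *(PTERM *const *)pb;
    int i, aa, bb;

    for (i = 0; i < ninputs; i++) {
        aa = a->ptand[i];
        bb = b->ptand[i];
        if (aa == 2)            /* treat 2 the same as 0 */
            aa = 0;
        if (bb == 2)
            bb = 0;
        if (aa != bb)           /* first differing input decides the order */
            return (aa < bb) ? -1 : 1;
    }
    return 0;                   /* all inputs compare equal */
}

/* HYPOTHETICAL helper: sort product-term pointers into cmppt() order. */
static void sort_terms(PTERM **terms, size_t n, int inputs)
{
    assert(inputs >= 0);
    ninputs = inputs;
    qsort(terms, n, sizeof terms[0], cmppt);
}
```

Under the qsort() assumption, a and b point at elements of an array of PTERM pointers, which would explain why the original dereferences a[0] rather than a. For example, sorting terms whose ptand rows are {1,0,2}, {2,1,1} and {0,2,0} with three inputs orders them {0,2,0}, {2,1,1}, {1,0,2}, because 2 compares like 0.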
For Figure 2, we have run SimICS as a back end for GDB and run EQNTOTT to completion. Giving the "list" command within gdb-simics then shows profile totals for each line of C source.
The numbers in the columns correspond to the following profilers:
For the reader's convenience, we have added column headings "a" through "h" to the listing. The simulated TLB has 64 entries and is fully associative. The data cache is 16 KB, 4-way set associative, with 32-byte lines; the instruction cache is 20 KB, 5-way set associative, with 64-byte lines. These parameters match those of the SuperSPARC processor.
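The cache parameters above determine how many sets each cache has and which set a given address maps to. The sketch below computes both; the CacheGeom type and the function names are hypothetical, chosen only for this illustration. The fully associative TLB is the degenerate case of a single set holding all 64 entries.

```c
#include <assert.h>

/* HYPOTHETICAL description of a set-associative cache. */
typedef struct {
    unsigned size_bytes;   /* total capacity */
    unsigned ways;         /* associativity */
    unsigned line_bytes;   /* cache line size */
} CacheGeom;

/* Number of sets = capacity / (ways * line size). */
static unsigned cache_sets(CacheGeom g)
{
    assert(g.ways > 0 && g.line_bytes > 0);
    assert(g.size_bytes % (g.ways * g.line_bytes) == 0);
    return g.size_bytes / (g.ways * g.line_bytes);
}

/* Set an address maps to: drop the offset within the line, then take the
 * line number modulo the number of sets. */
static unsigned cache_set_index(CacheGeom g, unsigned long addr)
{
    return (unsigned)((addr / g.line_bytes) % cache_sets(g));
}

/* The two simulated caches described in the text. */
static const CacheGeom dcache = {16 * 1024, 4, 32};
static const CacheGeom icache = {20 * 1024, 5, 64};
```

For the data cache this gives 16384 / (4 * 32) = 128 sets, and for the instruction cache 20480 / (5 * 64) = 64 sets. As an example, the load address 0x3b624 that appears in Figure 3 falls in data-cache set 49, and both instructions listed there (0x11bf8 and 0x11bfc) lie in the same instruction-cache line, in set 47.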
Not surprisingly, this function triggers a large number of expensive events: many instructions and branches, TLB misses, and read cache misses.
(gdb-simics) list 34,59
34                                                            int cmppt (a, b)
35                                                            PTERM *a[], *b[];
36                                                            /*
37                                                             * compare product terms indirectly pointed to by a and b.
38                                                             */
39                                                            {
40                                                                register int i, aa, bb;
41  a  b        c       d         e         f          g   h
42  1  0  2384615  150991   5683242   2841621   28416210  10      for (i = 0; i < ninputs; i++) {
43  0  0  1567937  229925         0         0    2841621   1          aa = a[0]->ptand[i];
44                                                                    bb = b[0]->ptand[i];
45  0  0  1436933  110481  73692796  19709830  229603251   3          if (aa == 2)
46  0  0        0       0         0         0   56824587   1              aa = 0;
47  0  0        0    2443  19709830  20979970  208623281   3          if (bb == 2)
48                                                                        bb = 0;
49  0  0        0    5631  20979970  76534417  228321868   3          if (aa != bb) {
50  0  0        0       0   1281383      7231    2562766   2              if (aa < bb) {
51                                                                            return (-1);
52                                                                        }
53                                                                        else {
54  0  0  3210112   14722  75253034  76527186  226747168   5                  return (1);
55                                                                        }
56                                                                    }
57                                                                }
58  0  0        0       0   1560238         0    1560238   1      return (0);
59  0  0        0       0   1281383   2841621    5683242   2  }
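Every iteration of the loop in Figure 2 executes the tests on lines 45 and 47, which exist only to make the value 2 compare like 0. As a hypothetical illustration (not the EQNTOTT rewrite we performed), if the ptand entries are known to take only the values 0, 1 and 2, masking with 1 gives the same normalization without those two branches. The sketch compares both forms on plain arrays; the function names are invented, and whether the variant is actually faster would have to be measured.

```c
#include <assert.h>

/* ASSUMPTION: every entry is 0, 1, or 2, and 2 must compare like 0. */

/* Same logic as the loop in Figure 1: two branches fold 2 into 0. */
static int cmp_inputs_branchy(const short *a, const short *b, int n)
{
    int i, aa, bb;
    for (i = 0; i < n; i++) {
        aa = a[i];
        bb = b[i];
        if (aa == 2)
            aa = 0;
        if (bb == 2)
            bb = 0;
        if (aa != bb)
            return (aa < bb) ? -1 : 1;
    }
    return 0;
}

/* HYPOTHETICAL variant: for values in {0, 1, 2}, x & 1 maps
 * 0 -> 0, 1 -> 1, 2 -> 0, so the normalization needs no branch. */
static int cmp_inputs_masked(const short *a, const short *b, int n)
{
    int i, aa, bb;
    for (i = 0; i < n; i++) {
        aa = a[i] & 1;
        bb = b[i] & 1;
        if (aa != bb)
            return (aa < bb) ? -1 : 1;
    }
    return 0;
}
```

The mask is only correct under the stated value range; any other value (for example 3) would be normalized differently by the two functions.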
SimICS keeps profile information at the granularity of individual assembly instructions. We can therefore disassemble the code to see more detail, or use a new GDB command, "list-detail", shown in Figure 3. From this information we can, for example, determine which return values are most common. Indeed, we used this and similar information to rewrite EQNTOTT so that it runs over 10 times faster than the version included in SPECint92.
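Because counts are kept per instruction, a per-line total of the kind shown in Figure 2 can be obtained by summing the counts of all instructions attributed to that line. The sketch below shows this roll-up for one counter column; the InsnProfile record and line_total() are hypothetical and are not SimICS's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL per-instruction profile record: address, the source line
 * the instruction is attributed to, and one event counter. */
typedef struct {
    unsigned long addr;
    int line;
    unsigned long count;
} InsnProfile;

/* Sum the counter over all instructions attributed to `line`. */
static unsigned long line_total(const InsnProfile *p, size_t n, int line)
{
    unsigned long total = 0;
    size_t i;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (p[i].line == line)
            total += p[i].count;
    return total;
}
```

With the two line-42 instructions visible in Figure 3 (2841621 each in column g), line_total() returns 5683242. That is only part of line 42's total, since Figure 3 is cut off after two instructions.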
When collecting this type of information, SimICS runs a program 30-100 times slower than native execution.
(gdb-simics) list-detail 34,59
34                                                                 int cmppt (a, b)
35                                                                 PTERM *a[], *b[];
36                                                                 /*
37                                                                  * compare product terms indirectly pointed to by a and b.
38                                                                  */
39                                                                 {
40                                                                     register int i, aa, bb;
41       a  b        c       d         e         f          g   h
42       1  0  2384615  150991   5683242   2841621   28416210  10      for (i = 0; i < ninputs; i++) {
0x11bf8  0  0        0     121   2841621         0    2841621   1  sethi %hi(0x3b400), %g2
0x11bfc  0  0     3996    8207         0         0    2841621   1  ld [ %g2 + 0x224 ], %g3 ! 0x3b624