Using gdb-simics to analyze programs

With SimICS running as its back-end, GDB can show a programmer profile data for each source line and each instruction: cache misses, TLB misses, branch counts, and execution counts.

Figure 1 below shows a listing of the cmppt() function in the EQNTOTT program, a program from the SPECint92 suite. Running a traditional profiler on EQNTOTT will tell you that 80-90% of execution time is spent in this function.

  (gdb) list 34,59
  34       int cmppt (a, b)
  35       PTERM *a[], *b[];
  36       /*
  37        * compare product terms indirectly pointed to by a and b.   
  38        */
  39       {
  40               register int i, aa, bb;
  41 
  42               for (i = 0; i < ninputs; i++) {
  43                       aa = a[0]->ptand[i];
  44                       bb = b[0]->ptand[i];
  45                       if (aa == 2)
  46                               aa = 0;
  47                       if (bb == 2)
  48                               bb = 0;
  49                       if (aa != bb) {
  50                               if (aa < bb) {
  51                                       return (-1);
  52                               }
  53                               else    {
  54                                       return (1);
  55                               }
  56                       }
  57               }
  58               return (0);
  59       }
Figure 1 - Traditional program listing

In Figure 2, we've run SimICS as a back-end for GDB, and we've run EQNTOTT to completion. Giving the "list" command within gdb-simics now shows profile totals for each line of C.

The numbers in the columns, from left to right, correspond to the following profilers:

  1. instruction cache misses
  2. write cache misses (data)
  3. read cache misses (data)
  4. translation lookaside buffer misses
  5. branches to the instruction
  6. branches from the instruction
  7. count of instruction execution
  8. number of different instructions executed

To help the reader, we've added column headings "a" through "h" to the listing. The simulated TLB is 64-entry, fully associative. The data cache is 16 Kbyte, 4-way associative with 32-byte cache lines. The instruction cache is 20 Kbyte, 5-way associative with 64-byte cache lines. These values correspond to the SuperSPARC processor.

Not surprisingly, this function triggers a myriad of expensive events - lots of instructions and branches, TLB misses, and read cache misses.

  (gdb-simics) list 34,59
  34                                                              int cmppt (a, b)
  35                                                              PTERM *a[], *b[];
  36                                                              /*
  37                                                               * compare product terms indirectly pointed to by a and b.   
  38                                                               */
  39                                                              {
  40                                                                      register int i, aa, bb;
  41       a b       c      d        e        f         g  h
  42       1 0 2384615 150991  5683242  2841621  28416210 10              for (i = 0; i < ninputs; i++) {
  43       0 0 1567937 229925        0        0   2841621  1                      aa = a[0]->ptand[i];
  44                                                                              bb = b[0]->ptand[i];
  45       0 0 1436933 110481 73692796 19709830 229603251  3                      if (aa == 2)
  46       0 0       0      0        0        0  56824587  1                              aa = 0;
  47       0 0       0   2443 19709830 20979970 208623281  3                      if (bb == 2)
  48                                                                                      bb = 0;
  49       0 0       0   5631 20979970 76534417 228321868  3                      if (aa != bb) {
  50       0 0       0      0  1281383     7231   2562766  2                              if (aa < bb) {
  51                                                                                              return (-1);
  52                                                                                      }
  53                                                                                      else    {
  54       0 0 3210112  14722 75253034 76527186 226747168  5                                      return (1);
  55                                                                                      }
  56                                                                              }
  57                                                                      }
  58       0 0       0      0  1560238        0   1560238  1              return (0);
  59       0 0       0      0  1281383  2841621   5683242  2      }
Figure 2 - Instrumented program listing

The profile information in SimICS is kept at the granularity of individual assembler instructions. We can thus disassemble code to see more detail, or we can use a new GDB command, "list-detail", shown in Figure 3. Using this information, we can find, for example, which return values are most common. Indeed, we used this and similar information to rewrite EQNTOTT to run over 10 times faster than the version included in SPECint92.

SimICS runs 30-100 times slower than native execution when collecting this type of information on a program.

  (gdb-simics) list-detail 34,59
  34                                                              int cmppt (a, b)
  35                                                              PTERM *a[], *b[];
  36                                                              /*
  37                                                               * compare product terms indirectly pointed to by a and b.   
  38                                                               */
  39                                                              {
  40                                                                      register int i, aa, bb;
  41       a b       c      d        e        f         g  h
  42       1 0 2384615 150991  5683242  2841621  28416210 10              for (i = 0; i < ninputs; i++) {

  0x11bf8  0 0       0    121  2841621        0   2841621  1 sethi  %hi(0x3b400), %g2
  0x11bfc  0 0    3996   8207        0        0   2841621  1 ld  [ %g2 + 0x224 ], %g3     ! 0x3b624 
  0x11c00  1 0       0      0        0        0   2841621  1 cmp  %g3, 0
  0x11c04  0 0       0      0        0  2841621   2841621  1 ble,a   0x11c70 
  0x11c08  0 0       0      0        0        0         0  0 clr  %o0
  0x11c0c  0 0  208116   8109  2841621        0   2841621  1 ld  [ %o0 ], %g2
  0x11c10  0 0 1543507  85754        0        0   2841621  1 ld  [ %g2 ], %o3
  0x11c14  0 0   75485   5398        0        0   2841621  1 ld  [ %o1 ], %g2
  0x11c18  0 0       0      0        0        0   2841621  1 clr  %o2
  0x11c1c  0 0       0      0        0        0   2841621  1 sll  %g3, 1, %o1
  0x11c20  0 0  553511  43402        0        0   2841621  1 ld  [ %g2 ], %g2

  43       0 0 1567937 229925        0        0   2841621  1                      aa = a[0]->ptand[i];

  0x11c24  0 0 1567937 229925        0        0   2841621  1 ldsh  [ %o2 + %o3 ], %o0

  44                                                                              bb = b[0]->ptand[i];
  45       0 0 1436933 110481 73692796 19709830 229603251  3                      if (aa == 2)

  0x11c28  0 0       0      0 73692796        0  76534417  1 cmp  %o0, 2
  0x11c2c  0 0       0      0        0        0  76534417  1 bne  0x11c38 
  0x11c30  0 0 1436933 110481        0 19709830  76534417  1 ldsh  [ %o2 + %g2 ], %g3

  46       0 0       0      0        0        0  56824587  1                              aa = 0;

  0x11c34  0 0       0      0        0        0  56824587  1 clr  %o0

  47       0 0       0   2443 19709830 20979970 208623281  3                      if (bb == 2)

  0x11c38  0 0       0   2443 19709830        0  76534417  1 cmp  %g3, 2
  0x11c3c  0 0       0      0        0 20979970  76534417  1 be,a   0x11c44 
  0x11c40  0 0       0      0        0        0  55554447  1 clr  %g3

  48                                                                                      bb = 0;
  49       0 0       0   5631 20979970 76534417 228321868  3                      if (aa != bb) {

  0x11c44  0 0       0   5631 20979970        0  76534417  1 cmp  %o0, %g3
  0x11c48  0 0       0      0        0  1281383  76534417  1 be,a   0x11c60 
  0x11c4c  0 0       0      0        0 75253034  75253034  1 add  %o2, 2, %o2

  50       0 0       0      0  1281383     7231   2562766  2                              if (aa < bb) {

  0x11c50  0 0       0      0  1281383        0   1281383  1 bge  0x11c70 
  0x11c54  0 0       0      0        0     7231   1281383  1 mov  1, %o0

  51                                                                                              return (-1);
  52                                                                                      }
  53                                                                                      else    {
  54       0 0 3210112  14722 75253034 76527186 226747168  5                                      return (1);

  0x11c58  0 0       0      0        0        0   1274152  1 b  0x11c70 
  0x11c5c  0 0       0      0        0  1274152   1274152  1 mov  -1, %o0
  0x11c60  0 0       0      0 75253034        0  75253034  1 cmp  %o2, %o1
  0x11c64  0 0       0      0        0  1560238  75253034  1 bl,a   0x11c28 
  0x11c68  0 0 3210112  14722        0 73692796  73692796  1 ldsh  [ %o2 + %o3 ], %o0

  55                                                                                      }
  56                                                                              }
  57                                                                      }
  58       0 0       0      0  1560238        0   1560238  1              return (0);

  0x11c6c  0 0       0      0  1560238        0   1560238  1 clr  %o0

  59       0 0       0      0  1281383  2841621   5683242  2      }

  0x11c70  0 0       0      0  1281383        0   2841621  1 retl 
  0x11c74  0 0       0      0        0  2841621   2841621  1 nop 
Figure 3 - Instrumented, detailed program listing


Document prepared by Peter Magnusson, psm@sics.se, April 1997