Using TBTK to time C++ code

Kristofer Björnson

(16th of May 2018)

Most recent TBTK release at the time of writing: v0.9.6

When developing computationally intensive algorithms, it is important to be able to time the execution to find the main bottlenecks. TBTK therefore provides a flexible Timer class that allows for simple profiling of C++ code.

1 Include file

The Timer is made available through

⬇

1#include ”TBTK/Timer.h”

2 Stacked tick-tock calls

In its basic use case the Timer acts as a time stamp stack that allows a single time stamp to be pushed onto the stack through the call

⬇

1Timer::tick(”Optional tag.”);

The latest time stamp can similarly be popped from the stack using

⬇

1Timer::tock();

When a time stamp is popped, the time difference between the current time and the time that the time stamp was pushed onto the stack is printed to the standard output. In addition, the current stack depth and the optional tag is also printed. This allows for easy identification of which part of the code a specific time difference corresponds to.

In the following example nested tick-tock calls are used to simultaneously measure the total calculation time of a nested loop as well as the time taken by a single iteration of the outer loop.

⬇

1Timer::tick(”Full calculation.”)

2double sum = 0;

3for(unsigned int i = 0; i < 10; i++){

4 Timer::tick(”Single iteration of the outer loop.”);

5 for(unsigned int j = 0; j < 10000000; j++)

6 sum += sqrt(i*j);

7 Timer::tock();

9Timer::tock();

A typical output is

⬇

1(1) 23ms 356us 578ns Single iteration of the outer loop.

2(1) 22ms 121us 595ns Single iteration of the outer loop.

3(1) 20ms 894us 633ns Single iteration of the outer loop.

4(1) 20ms 693us 151ns Single iteration of the outer loop.

5(1) 20ms 753us 656ns Single iteration of the outer loop.

6(1) 20ms 632us 406ns Single iteration of the outer loop.

7(1) 20ms 693us 43ns Single iteration of the outer loop.

8(1) 20ms 751us 261ns Single iteration of the outer loop.

9(1) 20ms 420us 362ns Single iteration of the outer loop.

10(1) 20ms 753us 927ns Single iteration of the outer loop.

11(0) 211ms 272us 43ns Full calculation.

We can see that each iteration takes 20-23ms to execute while the full calculation takes 211ms. On the left we also see the stack depth, while the tag is displayed to the right.

As a second example we consider the following recursive function call

⬇

1double recursive(int depth){

2 Timer::tick(”Recursive step ” + to_string(depth));

4 double answer;

5 if(depth == 10){

6 answer = 0;

7 }

8 else{

9 double sum = 0;

10 for(unsigned int i = 0; i < 10000000; i++)

11 sum += sqrt(depth*i);

12 answer = sum + recursive(depth+1);

13 }

15 Timer::tock();

17 return answer;

18}

Each time the recursive function is entered, a new time stamp is pushed onto the stack. After ten consecutive function calls the function begins to return and the time stamp stack is unwinded. The resulting output is

⬇

1(10) 501ns Recursive step 10

2(9) 62ms 570us 861ns Recursive step 9

3(8) 125ms 299us 748ns Recursive step 8

4(7) 187ms 827us 863ns Recursive step 7

5(6) 250ms 704us 696ns Recursive step 6

6(5) 313ms 284us 290ns Recursive step 5

7(4) 376ms 373us 532ns Recursive step 4

8(3) 439ms 518us 286ns Recursive step 3

9(2) 502ms 852us 691ns Recursive step 2

10(1) 566ms 298us 567ns Recursive step 1

11(0) 617ms 420us 578ns Recursive step 0

3 Accumulators

While the stacked mode allows for flexible measurements of multiple pieces of code at the same time, it is not suitable for a case like the following

⬇

1for(unsigned int i = 0; i < 1000000; i++){

2 doA();

3 doB();

Here both doA() and doB() are executed multiple times and we may want to figure out which one that is the bottleneck. Moreover, measuring doA() and doB() in single shot measurements may not be useful for figuring out the overall performance. For example, doA() and doB() may take widely different times to execute each time and the execution time for doB() may even depend on what the previous call of doA() did, and vice versa. In such cases it is only meaningful to measure the accumulated time taken by each function over the full loop.

For this reason the Timer also has an accumulator mode. To use this mode we first create two new accumulators.

⬇

1unsigned int idA = Timer::createAccumulator(”Accumulator A”);

2unsigned int idB = Timer::createAccumulator(”Accumulator B”);

The accumulators can then be used independently by supplying the corresponding ID to the tick-tock calls.

⬇

1for(unsigned int i = 0; i < 1000000; i++){

2 Timer::tick(idA);

3 doA();

4 Timer::tock(idA);

6 Timer::tick(idB);

7 doB();

8 Timer::tock(idB);

Since tick-tock pairs can be executed millions of times for any given accumulator, no information is printed by the tock calls in this mode. Instead the time accumulated so far can be printed at any time through the call

⬇

1Timer::printAccumulators();

A typical output looks like

⬇

1============================== Accumulator table ==============================

2ID Time Tag

3[0] 650ms 618us 220ns Accumulator A

4[1] 2s 870ms 110us 19ns Accumulator B

5===============================================================================

Note that time is added to the accumulators when the tock calls are made. Therefore, when printing the accumulator table, only completed time intervals contribute to the listed time. We also note that it is possible to use stacked tick-tock calls and accumulator calls simultaneously.

4 Learn more

For more information about the Timer class, see the corresponding manual page and API entry.