Using TBTK to time C++ code

Most recent TBTK release at the time of writing: v0.9.6

When developing computationally intensive algorithms, it is important to be able to time the execution to find the main bottlenecks. TBTK therefore provides a flexible Timer class that allows for simple profiling of C++ code.

Include file

The Timer is made available through

#include "TBTK/Timer.h"

Stacked tick-tock calls

In its basic use case the Timer acts as a time stamp stack that allows a single time stamp to be pushed onto the stack through the call

Timer::tick("Optional tag.");

The latest time stamp can similarly be popped from the stack using

Timer::tock();

When a time stamp is popped, the time difference between the current time and the time that the time stamp was pushed onto the stack is printed to the standard output. In addition, the current stack depth and the optional tag is also printed. This allows for easy identification of which part of the code a specific time difference corresponds to.

In the following example nested tick-tock calls are used to simultaneously measure the total calculation time of a nested loop as well as the time taken by a single iteration of the outer loop.

Timer::tick("Full calculation.")
double sum = 0;
for(unsigned int i = 0; i < 10; i++){
    Timer::tick("Single iteration of the outer loop.");
    for(unsigned int j = 0; j < 10000000; j++)
        sum += sqrt(i*j);
    Timer::tock();
}
Timer::tock();

A typical output is

(1) 23ms 356us 578ns 	Single iteration of the outer loop.
(1) 22ms 121us 595ns 	Single iteration of the outer loop.
(1) 20ms 894us 633ns 	Single iteration of the outer loop.
(1) 20ms 693us 151ns 	Single iteration of the outer loop.
(1) 20ms 753us 656ns 	Single iteration of the outer loop.
(1) 20ms 632us 406ns 	Single iteration of the outer loop.
(1) 20ms 693us 43ns 	Single iteration of the outer loop.
(1) 20ms 751us 261ns 	Single iteration of the outer loop.
(1) 20ms 420us 362ns 	Single iteration of the outer loop.
(1) 20ms 753us 927ns 	Single iteration of the outer loop.
(0) 211ms 272us 43ns 	Full calculation.

We can see that each iteration takes 20-23ms to execute while the full calculation takes 211ms. On the left we also see the stack depth, while the tag is displayed to the right.

As a second example we consider the following recursive function call

double recursive(int depth){
        Timer::tick("Recursive step " + to_string(depth));

        double answer;
        if(depth == 10){
                answer = 0;
        }
        else{
                double sum = 0;
                for(unsigned int i = 0; i < 10000000; i++)
                        sum += sqrt(depth*i);
                answer = sum + recursive(depth+1);
        }

        Timer::tock();

        return answer;
}

Each time the recursive function is entered, a new time stamp is pushed onto the stack. After ten consecutive function calls the function begins to return and the time stamp stack is unwinded. The resulting output is

(10) 501ns           	Recursive step 10
(9) 62ms 570us 861ns 	Recursive step 9
(8) 125ms 299us 748ns 	Recursive step 8
(7) 187ms 827us 863ns 	Recursive step 7
(6) 250ms 704us 696ns 	Recursive step 6
(5) 313ms 284us 290ns 	Recursive step 5
(4) 376ms 373us 532ns 	Recursive step 4
(3) 439ms 518us 286ns 	Recursive step 3
(2) 502ms 852us 691ns 	Recursive step 2
(1) 566ms 298us 567ns 	Recursive step 1
(0) 617ms 420us 578ns 	Recursive step 0

Accumulators

While the stacked mode allows for flexible measurements of multiple pieces of code at the same time, it is not suitable for a case like the following

for(unsigned int i = 0; i < 1000000; i++){
    doA();
    doB();
}

Here both doA() and doB() are executed multiple times and we may want to figure out which one that is the bottleneck. Moreover, measuring doA() and doB() in single shot measurements may not be useful for figuring out the overall performance. For example, doA() and doB() may take widely different times to execute each time and the execution time for doB() may even depend on what the previous call of doA() did, and vice versa. In such cases it is only meaningful to measure the accumulated time taken by each function over the full loop.

For this reason the Timer  also has an accumulator mode. To use this mode we first create two new accumulators.

unsigned int idA = Timer::createAccumulator("Accumulator A");
unsigned int idB = Timer::createAccumulator("Accumulator B");

The accumulators can then be used independently by supplying the corresponding ID to the tick-tock calls.

for(unsigned int i = 0; i < 1000000; i++){
    Timer::tick(idA);
    doA();
    Timer::tock(idA);

    Timer::tick(idB);
    doB();
    Timer::tock(idB);
}

Since tick-tock pairs can be executed millions of times for any given accumulator, no information is printed by the tock calls in this mode. Instead the time accumulated so far can be printed at any time through the call

Timer::printAccumulators();

A typical output looks like

============================== Accumulator table ==============================
ID        Time                                  Tag                                                                                            
[0]                       650ms 618us 220ns     Accumulator A                                                                                       
[1]                    2s 870ms 110us  19ns     Accumulator B                                                                                       
===============================================================================

Note that time is added to the accumulators when the tock calls are made. Therefore, when printing the accumulator table, only completed time intervals contribute to the listed time. We also note that it is possible to use stacked tick-tock calls and accumulator calls simultaneously.

Learn more

For more information about the Timer class, see the corresponding manual page and API entry.

Leave a Reply

Your email address will not be published. Required fields are marked *