A class, designed to be extended, for measuring the actual number of instructions executed for a given sample of code. A zip file that contains all the files in this directory is here.
This class makes use of self-modifying code to access x86 instructions in a compiler-independent manner (i.e., it does not contain any embedded assembler). This technique is based on parts of the book "Computer Systems: A Programmer's Perspective" by Randal E. Bryant and David R. O'Hallaron, information on which can be found at csapp.cs.cmu.edu. Sean Larsson originally adapted the examples in the book; Keith Oxenrider took Sean's code, rewrote it to use self-modifying code and adapted it into a class. This program should run fine on any x86-based computer (Intel, AMD, Cyrix), but has only been tested on Intel, with VC++ 6/7 on Win2K and g++ on Linux. If you get results for other machines (good or bad) we would very much appreciate knowing about them, particularly if you had to modify the program to get it to run properly.
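To give a feel for what "no embedded assembler" can look like, here is a minimal sketch. It is not the code from the class itself: it assumes 64-bit Linux on x86-64, assumes the instruction of interest is the time-stamp counter read (rdtsc), which this page does not actually name, and builds the routine in freshly mapped memory rather than patching existing code. The names make_counter, fallback_counter and ticks_for are made up for illustration. On any other platform the sketch falls back to std::chrono::steady_clock, which reports clock ticks, not cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#define HAVE_X86_64_STUB 1   // assumption: only this platform gets the machine-code stub
#endif

using CounterFn = std::uint64_t (*)();

// Portable stand-in used when the stub cannot be built (ticks, not cycles).
std::uint64_t fallback_counter() {
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// Hypothetical helper: returns a function that reads the time-stamp counter,
// assembled from raw bytes so no compiler-specific asm syntax is needed.
CounterFn make_counter() {
#ifdef HAVE_X86_64_STUB
    static const unsigned char stub[] = {
        0x0F, 0x31,              // rdtsc        ; EDX:EAX = time-stamp counter
        0x48, 0xC1, 0xE2, 0x20,  // shl rdx, 32  ; move the high half into place
        0x48, 0x09, 0xD0,        // or  rax, rdx ; combine into a 64-bit result
        0xC3                     // ret
    };
    void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem != MAP_FAILED) {
        std::memcpy(mem, stub, sizeof stub);
        // Switch the page from writable to executable before calling into it.
        if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) == 0)
            return reinterpret_cast<CounterFn>(mem);
        munmap(mem, 4096);
    }
#endif
    return fallback_counter;
}

// Counter ticks spent in work(), including the overhead of the two reads.
std::uint64_t ticks_for(CounterFn counter, void (*work)()) {
    assert(counter != nullptr && work != nullptr);
    const std::uint64_t start = counter();
    work();
    return counter() - start;
}
```

A caller would obtain the counter once with make_counter() and then pass each piece of code under test to ticks_for(). Repeating the measurement and keeping the smallest value is a common way to reduce noise from interrupts and cache effects.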
This code is placed in the public domain for use by anyone for anything. We ask that, if you make use of this code, you acknowledge us as the authors.
Sean wrote his version on 12/25/2003; Keith initially adapted it on 06/22/2004, then made substantial revisions to that adaptation and published it to his web site on 11/29/2004.
The most recent version of this program can be found via: sol-biotech.com/code/
Keith can be contacted at koxenrider[at]sol[dash]biotech[dot]com
Sean can be contacted at infamous41md[at]sol[dash]biotech[dot]com
It may also be informative to compare the results when the code is compiled with and without optimization (a debug build typically has optimization turned off).
I created three examples for you. The first is a simple extension of the one Sean originally wrote to show the results of manual loop unrolling when adding up an array of numbers. The second compares a recursive algorithm against its looped counterpart in the calculation of a factorial. The third calculates the number of prime numbers within a range and compares the results using a BitMap class I wrote some time ago against a plain vector (I am a little disappointed the vector beat the pants off my BitMap class, but that is what you get for not testing things). I hope these examples give enough of an idea of how to use the CPE class to obtain the actual cycles executed for a given section of code. This class should be very handy for testing hand optimizations: no matter what your other results indicate, if the number of cycles it takes to execute the code increases, you have made things worse.
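For readers who have not downloaded the zip file, the first two comparisons have roughly the following shape. This is a sketch of the kind of code being measured, not the example files themselves. All function names are invented here, and the prime-counting example is left out because it depends on the BitMap class, which is not reproduced on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Straightforward summation: one addition per loop iteration.
long sum_simple(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// The same sum, manually unrolled by four to cut loop overhead.
long sum_unrolled4(const std::vector<int>& v) {
    long total = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i)  // pick up the 0 to 3 leftover elements
        total += v[i];
    return total;
}

// Factorial by recursion (valid up to n = 20 in 64 bits).
std::uint64_t factorial_recursive(unsigned n) {
    return n <= 1 ? 1 : n * factorial_recursive(n - 1);
}

// Factorial with a plain loop.
std::uint64_t factorial_loop(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}

// Before comparing cycle counts, confirm that both variants agree.
void check_variants_agree(const std::vector<int>& v, unsigned n) {
    assert(sum_simple(v) == sum_unrolled4(v));
    assert(factorial_recursive(n) == factorial_loop(n));
}
```

Checking that the two variants return the same answer before timing them is worth the extra call, since a faster version that computes something different is not an optimization.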
This program uses static class methods, though in principle it should be just as easy to use non-static class members (some fancy casting is required and I was too lazy to look it up).
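As a rough illustration of the difference, the sketch below passes a static method as an ordinary function pointer and calls a non-static one through a pointer to member, which needs an object but no cast. The harness signature and all names here (TestFn, run, Workload, run_member) are hypothetical and are not taken from the CPE class.

```cpp
#include <cassert>

// Hypothetical harness signature: a plain function taking and returning long.
using TestFn = long (*)(long);

long run(TestFn f, long arg) { return f(arg); }

struct Workload {
    long scale = 3;

    // Static method: has no hidden 'this', so it converts to TestFn directly.
    static long twice(long x) { return 2 * x; }

    // Non-static method: needs an object, so it cannot be passed as a TestFn.
    long scaled(long x) const { return scale * x; }
};

// Pointer-to-member type for the non-static case.
using MemberFn = long (Workload::*)(long) const;

long run_member(const Workload& w, MemberFn f, long arg) {
    return (w.*f)(arg);  // call the member function on the given object
}

// Small self-check showing both call styles side by side.
void check_dispatch() {
    Workload w;
    assert(run(&Workload::twice, 21) == 42);
    assert(run_member(w, &Workload::scaled, 14) == 42);
}
```

The cost of the non-static form is that the harness must carry the object along with the member pointer, which is probably why static methods are the simpler choice for a measurement class like this one.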
Other caveats: