Profiling with Shark
Saturday, September 19th, 2009
I have absolutely no experience with profiling on a Mac. I’ve used gprof and valgrind a lot on Linux, but now that I’ve started developing on a Mac I need to learn how to profile here as well.
A bit of googling tells me that there are two nice tools for this: Shark and Instruments. I have both installed and decided to try out Shark first, since it looked a bit easier to use. I am going to try out Instruments later as well, but my experience with Shark was pretty good.
It is a sampling-based profiler, so to use it you just start your application and then start sampling. It samples everything running on your computer, but if your program is doing a significant amount of work it is easy to find in the resulting performance profile, and you can then filter out everything else.
I actually have something involving file I/O that I need to profile, but the data I need for that is on another machine that is currently busy with actual computations, so for my experiments with Shark I just ran our CoalHMM tool on the example data distributed with the code.
I started the tool, then started the sampling, and 30 seconds later I got this profile:
It is pretty clear from it that there is a hotspot worth looking at (in the Bio++ NumCalc library), and looking at the code Shark nicely shows where it is:
It even gives hints as to what the problem could be and how to fix it. Neat!
The hotspot doesn’t surprise me much. The application is a hidden Markov model, and I fully expected most of the time to be spent in the Forward algorithm. The solution doesn’t surprise me either, and we are already working on an SSE improvement. Still, with profiling you can never be sure, so it is nice to have it confirmed.
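To illustrate why the Forward algorithm tends to dominate a profile like this, here is a minimal sketch of a scaled Forward recursion. It is not the CoalHMM or Bio++ code; the function name `forwardLogLikelihood` and the plain `std::vector` layout are assumptions made for the example. The point is the innermost loop: it is a dot product that runs once per state pair per observation, which is exactly the kind of code a sampling profiler will flag.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from CoalHMM or Bio++.
// Returns log P(obs) for an HMM with n states.
//   init[i]     : probability of starting in state i
//   trans[i][j] : probability of moving from state i to state j
//   emit[i][o]  : probability of state i emitting symbol o
double forwardLogLikelihood(const std::vector<double>& init,
                            const std::vector<std::vector<double>>& trans,
                            const std::vector<std::vector<double>>& emit,
                            const std::vector<std::size_t>& obs)
{
    const std::size_t n = init.size();
    assert(trans.size() == n && emit.size() == n);
    if (obs.empty()) return 0.0;  // empty sequence has probability 1

    std::vector<double> prev(n), cur(n);
    double logLik = 0.0;

    // Initialisation: first column of the forward table, rescaled to sum to 1.
    double scale = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        prev[i] = init[i] * emit[i][obs[0]];
        scale += prev[i];
    }
    for (std::size_t i = 0; i < n; ++i) prev[i] /= scale;
    logLik += std::log(scale);

    // Recursion: O(n^2) work per observation.
    for (std::size_t t = 1; t < obs.size(); ++t) {
        scale = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            // Hot inner loop: a dot product over the previous column.
            for (std::size_t i = 0; i < n; ++i) sum += prev[i] * trans[i][j];
            cur[j] = sum * emit[j][obs[t]];
            scale += cur[j];
        }
        for (std::size_t j = 0; j < n; ++j) cur[j] /= scale;
        logLik += std::log(scale);  // accumulate the scaling factors
        prev.swap(cur);
    }
    return logLik;
}
```

Rescaling each column avoids numerical underflow on long sequences; the log-likelihood is recovered as the sum of the logs of the scaling factors.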
I also tried the simple fix of enabling auto-vectorization (-ftree-vectorize) and compared that solution to the one before (something Shark also makes easy).
It gives a very modest improvement, but I guess it isn’t that easy for the compiler to automatically insert SIMD instructions in code like this… I expect more from our hand-coded version, where we currently get two- to four-fold improvements, depending on whether we use single (float) or double precision.
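As a rough idea of what a hand-coded SSE version of such an inner loop can look like, here is a small sketch of a dot product using SSE intrinsics. This is not our actual implementation; `dotProduct` is a hypothetical helper written for illustration, and it falls back to a plain scalar loop when SSE is not available. An SSE register is 128 bits wide, so it holds four floats or two doubles, which is consistent with speedups in the two- to four-fold range.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (single precision)
#endif

// Hypothetical helper: dot product of two float arrays of length n.
float dotProduct(const float* a, const float* b, std::size_t n)
{
    assert(a != nullptr && b != nullptr);
    float result = 0.0f;
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration in one 128-bit register.
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads, so no
        __m128 vb = _mm_loadu_ps(b + i);  // alignment requirement on a, b
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    result = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole loop when SSE is unavailable).
    for (; i < n; ++i) result += a[i] * b[i];
    return result;
}
```

The vector version adds the products in a different order than the scalar loop, so results can differ in the last bits. A compiler that must preserve strict floating-point semantics cannot make that reordering on its own, which may be part of why auto-vectorization gains so little on reductions like this.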