Archive for September 20th, 2009

Profiling with Instruments

Sunday, September 20th, 2009

Today I tried the same profiling exercise as yesterday, but using Instruments.

You can run it directly from Xcode by picking the “Run with performance tool” entry in the Run menu.  Turns out you can do the same with Shark, but I didn’t notice that yesterday.

There are a lot of things you can profile with Instruments, but the only thing I have figured out how works so far is CPU performance.

The way Instruments profile is similar to Shark.  It samples during the execution of the program and thereby gets a picture of where the program spends its time.  The display of this is much nicer than Shark, though, with cool performance bars in various tracks.  You can add any number of tracks to profile memory usage, IO, etc. together with CPU performance, but as I said I haven’t yet figured out how to use this.

For CPU performance, it gives you an overview of which functions are taking up the run time, displayed very similar to Shark:

CPU profileA nice feature which I didn’t see in Shark – but perhaps it is there – is that you can also pick time slices of the execution and see which functions took up the time in that slice.  Say, if I zoom in on the first few seconds of the run, I can see which functions are used when reading in the data.  Something that will come in handy when I start my real work on figuring out how to improve that part of the program.

CPU profile in time sliceAs with Shark, you can also get a profile for where in the source code the time is spent.  For the full run, that is (of course) the same hotspot that Shark identified:

HotspotUnlike Shark, it doesn’t seem to give any hints on how to improve on the hotspot, but in this case it turns out to be a better choice, ’cause I learned something unpleasant about the code that is probably more valuable than the suggestion to use ESS instructions!

Browsing through the code — you can click on the function calls that work as hyperlinks — I found out that CoalHMM accesses the entries in the transition matrix through a virtual function call:

    /**
     * @brief Get the transition probability between two states.
     *
     * @param i initial state.
     * @param j final state.
     * @return the transition probability between the two states.
     */

    virtual double Pij(unsigned int i, unsigned int j) const = 0;

That is probably much more of a problem than not using SIMD instructions!

Getting the entry in a matrix shouldn’t be more than a few pointer calculations and fetching a value, but here not only is a function call needed (forget about inlining virtual function calls, that never works) it is a function call to a calculated function address, that is very likely to break the processor’s pipelining.

I haven’t tried changing it in CoalHMM so I don’t know how much it is costing us here, but in experiments we did when we implemented the SNPFile library we found that it was about an order of magnitude slower to use a virtual function to access a matrix entry.

It is perhaps not so surprising that this line is taking up much of the time when running the tool…

263-293=-30