Posts Tagged ‘programming’

Writing lecture notes for “Applied Programming”

Monday, November 2nd, 2009

Once again I find myself writing lecture notes for my programming class Applied Programming.

It is a lot of work, and I would really prefer to use a textbook instead, but all the textbooks I have seen for Python programming fall into one of two categories: either they assume that the reader is a moron who needs everything explained in every single little detail over close to one thousand pages, or they assume that the reader already knows how to program and just needs a quick introduction to Python.

Neither really fits my purposes.

I have seven weeks to teach basic script programming, so I only want to cover the bits really essential for that, and since I teach mainly biology and medical students I can safely assume that this is the first time my students are exposed to any kind of computer programming.

It is the second time I teach the class.  Last year I tried using a textbook supplemented with lecture slides, but that didn’t really work well, so this year I am going to write enough lecture notes to cover the material I need.

I had to write my own lecture notes for previous programming classes as well, but those classes were for people familiar with the basics of programming and Python who just needed a little extra material useful in bioinformatics, so those were completely different notes and I cannot reuse any of them this time.

I plan to spend one day a week on lecture notes this time around.  I don’t have time for more.  We will see how that goes, and how much I need to add on top of it next year, when I can do the next iteration of the notes.


Don’t do this!

Monday, September 21st, 2009

Don’t ever do this:

try {
   ...
} catch (std::exception e) {
   throw e;
}

It doesn’t do what you think it does!

Unless you think it constructs and then destroys a copy of the exception and throws away the type information about the original exception, in which case it does exactly what you think it does.

Of course, that might be the whole point of this piece of code, but why anyone would want to throw away information about which exception was thrown is beyond me.

Catching by value means that the handler gets a fresh copy of the exception object, on top of the copy already made when the exception was thrown, so there is no point.  Catching by reference saves you that extra copy.
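A rough way to see the extra copy is to count copy constructions.  The sketch below uses a made-up type, “Counted”, and two made-up helper functions; none of this is from the library in question.  The exact counts depend on the compiler and language standard, since the standard allows some of these copies to be elided, so take the numbers as an illustration rather than a guarantee.

```cpp
#include <iostream>

// Hypothetical exception type that counts its copy constructions.
struct Counted {
  static int copies;
  Counted() = default;
  Counted(const Counted &) { ++copies; }
};
int Counted::copies = 0;

// Number of copies made when throwing a temporary and catching by value.
int copies_catching_by_value() {
  Counted::copies = 0;
  try { throw Counted(); }
  catch (Counted e) { (void)e; }  // the handler's parameter is a new object
  return Counted::copies;
}

// Number of copies made when throwing a temporary and catching by reference.
int copies_catching_by_reference() {
  Counted::copies = 0;
  try { throw Counted(); }
  catch (const Counted &e) { (void)e; }  // refers to the thrown object itself
  return Counted::copies;
}

// Prints both counts so they can be compared on your own compiler.
void print_copy_counts() {
  std::cout << "by value:     " << copies_catching_by_value() << std::endl;
  std::cout << "by reference: " << copies_catching_by_reference() << std::endl;
}
```

Calling “print_copy_counts()” from your own “main” shows the difference; catching by reference should never report more copies than catching by value.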

Throwing the explicit copy means you slice away the dynamic type, so the (re-)thrown exception has a different type than the original exception, so you have no way of knowing which kind of exception was originally thrown.

Catching by reference doesn’t help you here: “throw e” copy-constructs a new exception from “e” using the static type of “e”, not its dynamic type, so you still slice away the type.  Only a bare “throw” rethrows the original exception object.
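To isolate just this point, here is a minimal sketch where both handlers catch by reference and differ only in how they rethrow.  The names “Base”, “Derived”, “rethrow_named”, “rethrow_bare” and “caught_as” are made up for the illustration.

```cpp
#include <string>

struct Base {};
struct Derived : Base {};

// Catches by reference but rethrows with "throw e": a new Base object is
// copy-constructed from e, so the Derived part is sliced away.
void rethrow_named() {
  try { throw Derived(); }
  catch (Base &e) { throw e; }
}

// Catches by reference and rethrows with a bare "throw": the original
// exception object, still a Derived, continues to propagate.
void rethrow_bare() {
  try { throw Derived(); }
  catch (Base &) { throw; }
}

// Reports which handler matches whatever f throws.
std::string caught_as(void (*f)()) {
  try { f(); }
  catch (Derived &) { return "Derived"; }
  catch (Base &) { return "Base"; }
  return "nothing";
}
```

Here “caught_as(rethrow_named)” gives “Base” while “caught_as(rethrow_bare)” gives “Derived”, even though both handlers catch by reference.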

Consider this small example:

#include <iostream>

struct Exception {};
struct SpecialException : public Exception {};

void foo() { throw SpecialException(); }

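void bar() {
  try { foo(); }
  // Catches a sliced copy and throws a copy of that: static type Exception.
  catch (Exception e) { throw e; }
}

void baz() {
  try { foo(); }
  // A bare throw rethrows the original exception object, type intact.
  catch (Exception &e) { throw; }
}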

int main() {
  try { foo(); }
  catch (SpecialException &e) {
    std::cout << "foo threw special" << std::endl;
  } catch (Exception &e) {
    std::cout << "foo threw plain" << std::endl;
  }

  try { bar(); }
  catch (SpecialException &e) {
    std::cout << "bar threw special" << std::endl;
  } catch (Exception &e) {
    std::cout << "bar threw plain" << std::endl;
  }

  try { baz(); }
  catch (SpecialException &e) {
    std::cout << "baz threw special" << std::endl;
  } catch (Exception &e) {
    std::cout << "baz threw plain" << std::endl;
  }

  return 0;
}

Here “bar” will throw away the type information, while “baz” will not.

The result is this:

$ ./foo
foo threw special
bar threw plain
baz threw special

In the original example, I’m also a bit puzzled as to the reasoning behind it.  Not catching the exception in the first place seems to me a better approach than catching it and just rethrowing it, without any further processing.  The code really only contributes by adding a bit of overhead and removing useful type information.
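For comparison, here is a sketch of the two alternatives I would consider instead: not catching at all, or catching by const reference, doing some actual work, and rethrowing with a bare “throw”.  The function names and the error message are invented for the example; I am not claiming this is what the library should look like.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical function that fails with a specific exception type.
void risky() { throw std::out_of_range("index 7 is out of range"); }

// Alternative 1: no handler at all; the exception simply propagates.
void pass_through() { risky(); }

// Alternative 2: catch by const reference, do some work (here: logging),
// then rethrow the original exception object with a bare "throw".
void log_and_rethrow() {
  try { risky(); }
  catch (const std::exception &e) {
    std::cerr << "risky() failed: " << e.what() << std::endl;
    throw;
  }
}

// Reports which handler matches whatever f throws.
std::string caught_as(void (*f)()) {
  try { f(); }
  catch (const std::out_of_range &) { return "std::out_of_range"; }
  catch (const std::exception &) { return "std::exception"; }
  return "nothing";
}
```

In both cases the caller still sees a “std::out_of_range”, so no type information is lost along the way.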

PS. Yes, I stumbled over code exactly like this in a library I’m working with, which prompted this post…


Installing with CMake and Xcode, again, again

Monday, September 21st, 2009

Ok, I have finally figured out why I had problems with xcodebuild most of today.

I use the Xcode project files built from CMake, and I was expecting

$ xcodebuild -configuration Release install

to install my libraries, since it looks like that is what I’m asking it to do, or at least to complain if I did something wrong.  Apparently there is nothing wrong with that command line; it just doesn’t install my libraries.  It builds them fine, but it doesn’t install them.

What I wanted to run was

$ xcodebuild -configuration Release -target install build

which builds the install target, which in turn installs the libraries.


This is driving me crazy!!!

Monday, September 21st, 2009

I’m trying to profile some changes to CoalHMM today, but for some reason both Instruments and Shark flatly refuse to show me debug information (like the source code) for functions in the Bio++ libraries.

Over the weekend, on my MacBook, it worked just fine, but today, on a different machine, there is just no way to get it!  I have rebuilt all the libraries with the Debug target, but to no avail.

This is extremely frustrating, ’cause without the source code I cannot identify the hotspots inside a function.

I’m getting the impression that even though I compile the library with debug information, what gets installed is the Release target, but for the life of me I cannot figure out why.

And why on earth does it work differently on this machine?


Profiling with Instruments

Sunday, September 20th, 2009

Today I tried the same profiling exercise as yesterday, but using Instruments.

You can run it directly from Xcode by picking the “Run with performance tool” entry in the Run menu.  Turns out you can do the same with Shark, but I didn’t notice that yesterday.

There are a lot of things you can profile with Instruments, but the only thing I have figured out how to use so far is CPU profiling.

The way Instruments profiles is similar to Shark.  It samples during the execution of the program and thereby gets a picture of where the program spends its time.  The display is much nicer than Shark’s, though, with cool performance bars in various tracks.  You can add any number of tracks to profile memory usage, IO, etc. together with CPU performance, but as I said I haven’t yet figured out how to use those.

For CPU performance, it gives you an overview of which functions are taking up the run time, displayed much as in Shark:

[Figure: CPU profile]

A nice feature which I didn’t see in Shark – but perhaps it is there – is that you can also pick time slices of the execution and see which functions took up the time in that slice.  Say, if I zoom in on the first few seconds of the run, I can see which functions are used when reading in the data.  Something that will come in handy when I start my real work on figuring out how to improve that part of the program.

[Figure: CPU profile in time slice]

As with Shark, you can also get a profile for where in the source code the time is spent.  For the full run, that is (of course) the same hotspot that Shark identified:

[Figure: Hotspot]

Unlike Shark, it doesn’t seem to give any hints on how to improve on the hotspot, but in this case it turns out to be a better choice, ’cause I learned something unpleasant about the code that is probably more valuable than the suggestion to use SSE instructions!

Browsing through the code — you can click on the function calls that work as hyperlinks — I found out that CoalHMM accesses the entries in the transition matrix through a virtual function call:

    /**
     * @brief Get the transition probability between two states.
     *
     * @param i initial state.
     * @param j final state.
     * @return the transition probability between the two states.
     */

    virtual double Pij(unsigned int i, unsigned int j) const = 0;

That is probably much more of a problem than not using SIMD instructions!

Getting the entry in a matrix shouldn’t take more than a few pointer calculations and a memory fetch, but here not only is a function call needed (virtual calls can rarely be inlined, since the compiler usually doesn’t know the dynamic type), it is a call through a computed function address, which is very likely to stall the processor’s pipeline.
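To make the difference concrete, here is a small sketch of the two access paths.  Only the “Pij” signature is taken from the code above; the class names “TransitionMatrix” and “DenseMatrix”, the flat row-major storage, and the two row-sum helpers are my own assumptions for the illustration, not how CoalHMM is actually organised.

```cpp
#include <cstddef>
#include <vector>

// Interface in the style of the declaration above: every lookup through
// this type is a virtual call.
class TransitionMatrix {
public:
  virtual ~TransitionMatrix() = default;
  virtual double Pij(unsigned int i, unsigned int j) const = 0;
};

// Hypothetical concrete matrix, stored row by row in one vector.
class DenseMatrix final : public TransitionMatrix {
public:
  DenseMatrix(unsigned int n, double fill)
      : n_(n), data_(static_cast<std::size_t>(n) * n, fill) {}

  // Non-virtual accessor: an index calculation and a load, easy to inline.
  double at(unsigned int i, unsigned int j) const {
    return data_[static_cast<std::size_t>(i) * n_ + j];
  }
  void set(unsigned int i, unsigned int j, double p) {
    data_[static_cast<std::size_t>(i) * n_ + j] = p;
  }
  double Pij(unsigned int i, unsigned int j) const override { return at(i, j); }
  unsigned int size() const { return n_; }

private:
  unsigned int n_;
  std::vector<double> data_;
};

// Sums row i through the abstract interface: one virtual call per entry.
double row_sum_virtual(const TransitionMatrix &m, unsigned int i, unsigned int n) {
  double sum = 0.0;
  for (unsigned int j = 0; j < n; ++j) sum += m.Pij(i, j);
  return sum;
}

// Sums row i through the concrete type: direct indexing, no dispatch.
double row_sum_direct(const DenseMatrix &m, unsigned int i) {
  double sum = 0.0;
  for (unsigned int j = 0; j < m.size(); ++j) sum += m.at(i, j);
  return sum;
}
```

Both helpers compute the same sum; the only difference is how each entry is reached.  How much that costs in practice depends on the compiler and the call site, so it has to be measured rather than assumed.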

I haven’t tried changing it in CoalHMM so I don’t know how much it is costing us here, but in experiments we did when we implemented the SNPFile library we found that it was about an order of magnitude slower to use a virtual function to access a matrix entry.

It is perhaps not so surprising that this line is taking up much of the time when running the tool…
