Bio++ Utils
OK, so far I haven’t bothered much with the library underlying our CoalHMM framework, since I have managed by just prototyping this and that in Python, but I think it is time I got familiar with the C++ implementation. To get there, I need to get much better acquainted with Bio++, so I have decided to write a tutorial series on it. I won’t promise that it will be of any help to anyone, but writing down my experiences will definitely help me understand the framework.
I am going to write about the current CVS version of the framework, which differs a lot from the current released version (but will become the next release pretty soon). That is because it is the CVS version we use in CoalHMM, so that is the version I am interested in.
In this first post, I will describe the Utils library. This is a tiny little library with very little functionality, so it won’t take long to describe. Essentially it just contains a few generic classes and interfaces used in the rest of the framework – but of little use in themselves – and some IO code.
To use the library, you need to set up your include path (the -I/path/to/lib compiler flag) and link with the library (set the library path with -L/path/to/lib and add the linker flag -lbpp-utils).
Depending on your platform there are different ways of doing this. You can just add those compiler and linker flags to your Makefile on UNIX, or in Xcode you can add the header include path in “Header Search Paths” under Project > Edit Project Settings and include the library as Add > Existing Frameworks.
Ok, let’s get started!
Hello World
Tradition dictates that we should start with “hello, world!”, so here goes:
#include <Utils/ApplicationTools.h>

int main (int argc, char * const argv[])
{
    // Write a single message through the library's IO functions.
    bpp::ApplicationTools::displayMessage("Hello, world!");
    return 0;
}
This is one of the IO functions of the library. We use the “application tools” class, which is just a collection of static functions, so it should probably be a namespace, and might become one in the future, but right now it is a class.
No, there isn’t much here that you couldn’t just do with iostreams. Most of the functions in “application tools” add very little above iostreams, really, and I guess they are only there to get a uniform interface to the IO together with the few functions that do add something extra.
Let’s look at those…
A bit more output
Using the ApplicationTools class (namespace, really) you can send messages to the standard output stream as above, or you can send warnings or errors, the former to stdout and the latter to stderr. You could just use iostreams for that, though, so let’s not bother with it.
There are two non-trivial functions in ApplicationTools. Well, almost trivial, but not quite.
One is displayResult(), which adds some formatting to the output; the other is displayGauge(), which adds a progress bar to a text UI.
Let us look at displayResult() first. It is a function that simply formats a name-value pair. It is a template function, so you can use various values for the name-value pair, and what will be written to the output stream is the name, some dots to align the values, and then the values.
Here’s an example program:
#include <string>
#include <Utils/ApplicationTools.h>

int main (int argc, char * const argv[])
{
    // Each call prints the name, a run of dots, and then the value.
    bpp::ApplicationTools::displayResult("A string", std::string("foo"));
    bpp::ApplicationTools::displayResult("An integer", 10);
    bpp::ApplicationTools::displayResult("A float", 3.14);
    return 0;
}
and here’s the result:
$ ./build/Debug/DisplayResult
A string...............................: foo
An integer.............................: 10
A float................................: 3.14
Neat enough, and not quite trivial (although you could probably get most of the way with stream manipulators).
There is no flexibility in the formatting or anything, so it only works for simple stuff, but on the other hand most cases are simple so that is probably not a great loss.
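To give an idea of how far stream manipulators get you, here is a minimal sketch. The helper name formatResult and its default width of 40 are my own assumptions for illustration; they are not taken from the Bio++ source.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Bio++: pads `name` with dots up to
// `width` characters, then appends ": " and the value.
// Names longer than `width` are not truncated.
template <typename T>
std::string formatResult(const std::string& name, const T& value, int width = 40)
{
    std::ostringstream out;
    // setw() applies only to the next item written, so only `name` is padded.
    out << std::left << std::setfill('.') << std::setw(width) << name
        << ": " << value;
    return out.str();
}
```

With this sketch, std::cout << formatResult("An integer", 10) << std::endl; would print the name, dots up to column 40, and then the value, which is roughly the layout shown above.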
Progress bars
The other output function worth mentioning is the progress bar. I can show you the code – and you can probably guess what it does – but I cannot show you the result. It is an animation in the shell, so of course I can’t. Compile it and try for yourself!
#include <unistd.h>   // sleep()
#include <Utils/ApplicationTools.h>

int main (int argc, char * const argv[])
{
    bpp::ApplicationTools::displayMessage("A job with 10 steps");
    for (int i = 0; i < 10; ++i) {
        // Redraw the gauge for step i out of 10, then pretend to work.
        bpp::ApplicationTools::displayGauge(i, 10, '=', "Progress: ");
        sleep(1);
    }
    // One last call so the gauge shows the job as complete.
    bpp::ApplicationTools::displayGauge(10, 10, '=', "Progress: ");
    bpp::ApplicationTools::displayMessage(" All done!");
    return 0;
}
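If you wonder how such a gauge can be animated in a plain terminal, the usual trick is to write a carriage return and redraw the whole line. The sketch below shows that idea only; gaugeFrame and drawGauge are hypothetical names, and the bar layout is an assumption rather than what displayGauge() actually draws.

```cpp
#include <cassert>
#include <ostream>
#include <string>

// Hypothetical helper, not the Bio++ implementation: builds one frame of
// a text gauge, for example "Progress: [===       ]".
std::string gaugeFrame(int step, int total, char fill,
                       const std::string& prefix, int width = 10)
{
    assert(total > 0 && step >= 0 && step <= total);
    const int filled = step * width / total;   // cells to draw as `fill`
    std::string bar(filled, fill);
    bar.append(width - filled, ' ');           // pad the rest with spaces
    return prefix + "[" + bar + "]";
}

// Writes '\r' first so the frame overwrites the previous one on the same
// line, and flushes so it shows up immediately.
void drawGauge(std::ostream& out, int step, int total, char fill,
               const std::string& prefix)
{
    out << '\r' << gaugeFrame(step, total, fill, prefix) << std::flush;
}
```

Calling drawGauge(std::cout, i, 10, '=', "Progress: ") inside the loop would give an animation along the same lines.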
Options parsing
The utils library also provides code for options parsing, either for command line options or from options files (not unlike what you can find in Boost).
First a very simple example:
#include <map>
#include <string>
#include <Utils/ApplicationTools.h>
#include <Utils/AttributesTools.h>

int main (int argc, char * argv[])
{
    // Parse key=value pairs from the command line into a map.
    std::map< std::string, std::string > params;
    params = bpp::AttributesTools::parseOptions(argc, argv);

    // Look up "name", falling back to "world" when it is not given.
    std::string name =
        bpp::ApplicationTools::getStringParameter("name", params, "world");

    std::string msg = std::string("Hello, ")+name+std::string("!");
    bpp::ApplicationTools::displayMessage(msg);
    return 0;
}
This example just gets a parameter, name, and writes a greeting.
What we see in it is the general approach to option parsing in Bio++. The task is shared between two classes (that really should be namespaces since you never instantiate them), the ApplicationTools and the AttributesTools. The latter is responsible for parsing the options and the former adds some functions for type conversion of parameters and for having default values.
If we go through the program line by line, we first create a map to store key value pairs and then use the attributes tools to parse the command line options into this map. There is quite a bit going on here, but I’ll get back to that shortly.
Then, using the application tools, we ask for the value of the “name” parameter – and say we want a string parameter and will use the default “world” if the parameter isn’t given.
After that we just display the message, but there is nothing there we haven’t seen before.
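The "look up, convert, or fall back to a default" pattern is easy to picture with a small sketch. The template below is a hypothetical helper of my own, written against the standard library only; it is not how ApplicationTools implements its typed getters.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not the Bio++ API: look up `name` in `params` and
// convert the text to T. Returns `fallback` when the key is missing or
// the text does not convert.
template <typename T>
T getParameter(const std::string& name,
               const std::map<std::string, std::string>& params,
               const T& fallback)
{
    std::map<std::string, std::string>::const_iterator it = params.find(name);
    if (it == params.end()) return fallback;

    std::istringstream in(it->second);
    T value;
    if (!(in >> value)) return fallback;   // conversion failed
    return value;
}
```

Note that operator>> stops at whitespace, so with T = std::string this sketch only returns the first word of the value. A call would look like getParameter<int>("steps", params, 10), where "steps" is just an example key.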
Returning to the options parsing, here is what the function does: First, it reads “key=value” pairs from the command line arguments (not --key=value, just key=value). Second, if the key “param” is found, the value is assumed to be an options file that will be parsed (but where command line arguments will overrule any key=value pairs in the file). Third, and last, it substitutes variable values. In the options file you can declare variables just as any other key=value pair and then refer back to them, so
qux = foo
baz = $(qux)bar
is equivalent to
qux = foo
baz = foobar
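The parsing, overriding and substitution steps described above can be sketched with nothing but the standard library. Everything here (Params, trim, addPairs, substituteVariables) is hypothetical and only mirrors the description; it does a single substitution pass and makes no claim about how Bio++ handles nested variables or malformed input.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Params;

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Store every "key=value" entry in `params`. Later calls overwrite earlier
// ones, so add the file entries first and the command line entries second.
void addPairs(const std::vector<std::string>& entries, Params& params)
{
    for (std::size_t i = 0; i < entries.size(); ++i) {
        const std::string::size_type eq = entries[i].find('=');
        if (eq == std::string::npos) continue;   // not a key=value entry
        params[trim(entries[i].substr(0, eq))] = trim(entries[i].substr(eq + 1));
    }
}

// Replace each $(name) in every value with the value stored under `name`.
// Unknown names are left as they are; inserted text is not rescanned.
void substituteVariables(Params& params)
{
    for (Params::iterator it = params.begin(); it != params.end(); ++it) {
        std::string& value = it->second;
        std::string::size_type start = 0;
        while ((start = value.find("$(", start)) != std::string::npos) {
            const std::string::size_type end = value.find(')', start);
            if (end == std::string::npos) break;          // unterminated
            const std::string name = value.substr(start + 2, end - start - 2);
            Params::const_iterator var = params.find(name);
            if (var == params.end()) { start = end + 1; continue; }
            const std::string replacement = var->second;  // copy before editing
            value.replace(start, end - start + 1, replacement);
            start += replacement.size();
        }
    }
}
```

Because the command line pairs are added after the file pairs and substitution runs last, overriding qux on the command line also changes what baz expands to in this sketch.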
Here’s a program you can use to play with option parsing. It just outputs the key=value pairs the way Bio++ sees them, so you can check how command line variables overrule the param file and how variables are substituted.
#include <iostream>
#include <map>
#include <string>
#include <Utils/ApplicationTools.h>
#include <Utils/AttributesTools.h>

int main (int argc, char * argv[])
{
    std::map< std::string, std::string > params;
    params = bpp::AttributesTools::parseOptions(argc, argv);

    // Print every key=value pair as it ended up in the map.
    std::map< std::string, std::string >::const_iterator itr;
    for (itr = params.begin(); itr != params.end(); ++itr) {
        std::cout << itr->first << '=' << itr->second << std::endl;
    }
    return 0;
}
The option parsing functionality in Bio++ Utils does not check for unknown parameters. Any key=value pair you specify in the command line options or the param file will be parsed and put in the map. This means that the program won’t croak if you misspell a variable name – which is bad – but on the other hand means that independent components of your program can have different parameters that do not need to be registered globally – which is good.
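If you do want to catch misspelled names, you can check the parsed map against the names your program knows about. This is a minimal sketch of that idea; unknownKeys is a hypothetical helper and nothing like it is claimed to exist in Bio++.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the keys in `params` that are not listed
// in `known`, in the map's (alphabetical) order.
std::vector<std::string> unknownKeys(
    const std::map<std::string, std::string>& params,
    const std::set<std::string>& known)
{
    std::vector<std::string> unknown;
    std::map<std::string, std::string>::const_iterator it;
    for (it = params.begin(); it != params.end(); ++it) {
        if (known.count(it->first) == 0) unknown.push_back(it->first);
    }
    return unknown;
}
```

Each component could run this with its own set of known names and warn about the rest, which keeps the "no global registry" advantage while still catching typos.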
File and text tools
The last two classes worth mentioning are FileTools and TextTools. Again, they are essentially (and really should be) namespaces rather than classes, since you only want them for their static methods.
FileTools provides two useful functions for checking the existence of files or directories: fileExists() and directoryExists().
#include <iostream>
#include <Utils/FileTools.h>

int main (int argc, char * argv[])
{
    if (argc == 1) {
        std::cerr << "Usage: " << argv[0] << " file-names." << std::endl;
        return 1;
    }
    // Classify each argument as a directory, a file, or neither.
    for (int i = 1; i < argc; ++i) {
        if (bpp::FileTools::directoryExists(argv[i])) {
            std::cout << argv[i] << " is a directory." << std::endl;
        } else if (bpp::FileTools::fileExists(argv[i])) {
            std::cout << argv[i] << " is a file." << std::endl;
        } else {
            std::cout << argv[i] << " isn't really anything." << std::endl;
        }
    }
    return 0;
}
Unfortunately, in the CVS version of Bio++ the two functions are completely identical, so you cannot actually check if a given file name is a directory, but I assume that is a bug and will be fixed.
There are also some functions for picking out the directory part, “name” part or suffix part of filenames and such, plus some functionality for reading files, e.g. reading a line at a time as in this example:
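Until that is fixed, one way to tell the two apart on a POSIX system is to ask stat() directly. This is a minimal sketch under that assumption; isDirectory and isRegularFile are hypothetical names and have nothing to do with the Bio++ implementation.

```cpp
#include <string>
#include <sys/stat.h>

// POSIX-only sketch: true when `path` exists and is a directory.
bool isDirectory(const std::string& path)
{
    struct stat info;
    return stat(path.c_str(), &info) == 0 && S_ISDIR(info.st_mode);
}

// POSIX-only sketch: true when `path` exists and is a regular file.
bool isRegularFile(const std::string& path)
{
    struct stat info;
    return stat(path.c_str(), &info) == 0 && S_ISREG(info.st_mode);
}
```

Both functions return false for paths that do not exist, so they can replace the two checks in the example above one for one.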
#include <fstream>
#include <iostream>
#include <Utils/FileTools.h>

int main (int argc, char * argv[])
{
    if (argc == 1) {
        std::cerr << "Usage: " << argv[0] << " file-names." << std::endl;
        return 1;
    }
    for (int i = 1; i < argc; ++i) {
        if (!bpp::FileTools::fileExists(argv[i])) {
            std::cerr << argv[i] << " is not a file, so skipping it" << std::endl;
        } else {
            // Echo the file one line at a time until the stream stops being good.
            std::ifstream is(argv[i]);
            while (is) {
                std::cout << bpp::FileTools::getNextLine(is) << std::endl;
            }
        }
    }
    return 0;
}
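As for splitting a filename into its directory, name and suffix parts, the idea can be sketched in a few lines of plain string handling. The three helpers below are hypothetical and assume '/' as the path separator; they are not the FileTools functions and may treat corner cases (hidden files, trailing slashes) differently.

```cpp
#include <string>

// Everything before the last '/', or "" when there is no '/'.
std::string directoryPart(const std::string& path)
{
    const std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? std::string() : path.substr(0, slash);
}

// The file name with the directory removed.
std::string basePart(const std::string& path)
{
    const std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// The file name without directory and without suffix.
std::string namePart(const std::string& path)
{
    const std::string base = basePart(path);
    const std::string::size_type dot = base.find_last_of('.');
    return dot == std::string::npos ? base : base.substr(0, dot);
}

// The text after the last '.' in the file name, or "" when there is none.
std::string suffixPart(const std::string& path)
{
    const std::string base = basePart(path);
    const std::string::size_type dot = base.find_last_of('.');
    return dot == std::string::npos ? std::string() : base.substr(dot + 1);
}
```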
The TextTools class mainly provides some functions for text formatting.
Here’s just a short sample:
#include <iostream>
#include <Utils/TextTools.h>

int main (int argc, char * argv[])
{
    std::cout
        // Case conversion.
        << bpp::TextTools::toUpper("lower") << ' '
        << bpp::TextTools::toLower("upper")
        << std::endl
        // Resizing to fewer characters than the string has truncates it.
        << bpp::TextTools::resizeLeft("Will be shortened to 'this'", 6) << ' '
        << bpp::TextTools::resizeRight("'that' is all that's left", 7)
        << std::endl
        // Resizing to more characters pads with the given fill character.
        << bpp::TextTools::resizeLeft("Too short", 20, '.') << ' '
        << bpp::TextTools::resizeRight("Me too", 20, '.')
        << std::endl;
    return 0;
}
The output looks like this:
$ ./build/Debug/FileTools
LOWER upper
'this' 'that'
...........Too short Me too..............
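Judging from that output, resizeLeft() keeps the right-hand end of the string and pads on the left, while resizeRight() keeps the left-hand end and pads on the right. Here is a minimal sketch of that reading; the two functions are my own reconstruction from the output, with hypothetical names, and are not copied from TextTools.

```cpp
#include <string>

// Sketch of the behaviour seen above: keep the rightmost `size`
// characters, or pad on the left with `fill` when the string is shorter.
std::string sketchResizeLeft(const std::string& s,
                             std::string::size_type size, char fill = ' ')
{
    if (s.size() >= size) return s.substr(s.size() - size);
    return std::string(size - s.size(), fill) + s;
}

// Sketch of the behaviour seen above: keep the leftmost `size`
// characters, or pad on the right with `fill` when the string is shorter.
std::string sketchResizeRight(const std::string& s,
                              std::string::size_type size, char fill = ' ')
{
    if (s.size() >= size) return s.substr(0, size);
    return s + std::string(size - s.size(), fill);
}
```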
Well, that is all I had to say about the Bio++ Utils library. Sometime soon I am going to explore the NumCalc library, so stay tuned.
September 12th, 2009 at 2:41 pm
Hey Thomas, great posting for sure, more complete than our actual tutorial :D
You may also be interested in the new BppApplication class, which is still experimental, but should offer a nicer way to handle command line options, in a similar way as Qt and its QtApplication.
The namespace vs. static class is something we can discuss.
Another point worth mentioning is that in the incoming version 2.00 of the libs, utils and numcalc will disappear and be superseded by a bpp-core library, containing all basal classes, like a BppObject, BppException and so on. These objects will be available in Bpp/, and the NumCalc objects in Bpp/NumCalc. The other libs will remain independent, but restructured as Bpp/Seq, Bpp/Phyl, Bpp/Pop, Bpp/Raa, Bpp/Qt, the two last ones being already in that configuration.
Best,
J.
September 12th, 2009 at 2:47 pm
Hey J!
Yes, I noticed the BppApplication class, but since it doesn’t do much besides having a timer for the application I didn’t bother with it here … plus, I am not really a fan of classes that write to stdout without me telling them to ;-)
September 15th, 2009 at 8:30 am
Yes, it is very primitive now but shall be extended later. As I said before, you can change the default output stream by setting ApplicationTools::message to NULL or an fstream. Later options will allow you to do that directly by program arguments that BppApplication will recognize.
J.