Yesterday, in my machine learning class, I showed some example code for artificial neural networks. I am giving my students a project tomorrow where they have to do regression using neural networks, and I wanted to show them how to program these networks. The pictures we draw of networks, and the mathematics we do with them, look less like the implementation than is the case for the other machine learning methods we cover in the course, so I thought a code demo would be in order.
It's a simple feed-forward neural network I want them to implement in their project, so essentially they just need to be able to calculate target values from predictor variables and the network weights and then minimise the error function (as a function of the network weights).
target = y(x, w)
training: argmin_w Σ_n (t_n − y(x_n, w))²
Using the back-propagation algorithm they can get the gradient of the error function with respect to w
and use that in the optimisation.

Now, I could
ask them to use the gradient descent algorithm or similar for the optimisation, but this is not a course on numerical methods (even simple ones) and I don't want to give them the impression that numerical optimisation is simpler than it really is, so I suggest that they use a library for the optimisation.
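To make the setup concrete, here is a minimal sketch of what the students need: a one-hidden-layer network y(x, w), its sum-of-squares error, and the back-propagated gradient. This is my own illustration in Python with NumPy, not the actual class code, and all function names are made up:

```python
import numpy as np

def predict(x, w1, b1, w2, b2):
    """Feed-forward pass: one hidden layer of tanh units, linear output."""
    h = np.tanh(np.outer(x, w1) + b1)  # hidden activations, shape (n_samples, n_hidden)
    return h @ w2 + b2                 # network outputs, shape (n_samples,)

def sum_of_squares_error(x, t, w1, b1, w2, b2):
    """E(w) = sum_n (t_n - y(x_n, w))^2 -- the function to minimise."""
    return np.sum((t - predict(x, w1, b1, w2, b2)) ** 2)

def gradient(x, t, w1, b1, w2, b2):
    """Back-propagation: gradient of E with respect to every weight."""
    h = np.tanh(np.outer(x, w1) + b1)
    y = h @ w2 + b2
    dy = -2.0 * (t - y)                  # dE/dy_n
    dw2 = h.T @ dy                       # output-layer weights
    db2 = dy.sum()                       # output bias
    dh = np.outer(dy, w2) * (1 - h**2)   # chain rule back through tanh
    dw1 = x @ dh                         # hidden-layer weights
    db1 = dh.sum(axis=0)                 # hidden biases
    return dw1, db1, dw2, db2
```

With these three functions in hand, the project really is just "hand E and its gradient to an optimiser".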
For class I showed how simple neural networks could be implemented and trained in Python, using scipy to do the training. This works very well for the demonstration, since I can interactively train the networks to various target functions and plot how well the network fits the target.
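The training step is then just a matter of handing the error function to a general-purpose optimiser. The actual demo code isn't shown here, so the following is only a sketch of the pattern, with my own names; a real version would pass the back-propagated gradient via the `jac=` argument rather than let scipy estimate it numerically:

```python
import numpy as np
from scipy.optimize import minimize

def unpack(w, n_hidden):
    """Split the flat weight vector scipy works with into network parameters."""
    w1 = w[:n_hidden]
    b1 = w[n_hidden:2 * n_hidden]
    w2 = w[2 * n_hidden:3 * n_hidden]
    b2 = w[-1]
    return w1, b1, w2, b2

def train(x, t, n_hidden=5, seed=0):
    def error(w):
        w1, b1, w2, b2 = unpack(w, n_hidden)
        h = np.tanh(np.outer(x, w1) + b1)   # hidden layer
        y = h @ w2 + b2                     # network output
        return np.sum((t - y) ** 2)         # sum-of-squares error
    # random initial weights; scipy does all the optimisation work
    w0 = np.random.default_rng(seed).normal(scale=0.5, size=3 * n_hidden + 1)
    return minimize(error, w0, method="BFGS")
```

Note how little glue is needed: the error function closes over the training data, so `minimize` sees a plain function of the weight vector alone.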
To show that the language is more or less arbitrary for the solution, I also implemented a neural network in C++, this time using GSL to minimize the error function.
In theory, there shouldn't be much difference between the Python and the C++ solutions -- and for the actual neural network implementation there isn't -- but using GSL is much more tedious than the Python approach. It really shouldn't be.
The main problem is that GSL is a C library and not a C++ library, so the features that would make the C++ solution just as simple and intuitive as the Python solution are simply not available. One such feature is operator overloading: it is missing from C, so working with vectors is a bit annoying, especially since in C++ you are used to STL vectors, and switching back and forth between std::vector<> and gsl_vector * gets tedious. It is not the worst problem, though -- not in the neural network application, at least (it might be for more linear-algebra-intensive applications).
The one thing that annoyed me the most was how I should call the minimization function. In Python it was reasonably easy using high-level functions, so I could curry my error function and create a function of just the weights. In GSL, I had to go through a void pointer. In C++, I would have used a functor to deal with this, but since GSL is C it doesn't have templates, so I couldn't do that.
I wasn't happy with resource management either. GSL is clearly not designed with exceptions in mind -- after all, they do not exist in C -- which means you either have to wrap all GSL types in small specialised classes so they are destroyed properly, or you need to be careful to call the correct <type>_free() functions yourself. The latter is very hard to do in an exception-safe way.
C++ wrappers for GSL?
I googled around a bit for C++ wrappers for GSL, and it seems like there are a few around, but none that I found particularly attractive. GSL for C++ doesn't appear particularly active (but I could be wrong). GSLwrap seems like it only wraps the linear algebra.
Are there any good GSL wrappers around, that use the high level features of C++? Or any obvious replacement?
Just overloading operators to provide linear algebra syntax for matrix manipulation is not the way to go, unless there is some pretty heavy template magic involved, I think. The methods in BLAS are pretty efficient because they can do a lot of operations with a single call; doing the same through naive overloaded operators, one temporary at a time, would be very inefficient. That is why the efficient C++ solutions use expression templates. I am not sure there is an easy way to do this through GSL, although the expression templates could end up just calling the BLAS routines, of course.
What about high level functions (functors) for optimization?