Posts Tagged ‘machine learning’

Call for help: Teaching statistics for Machine Learning

Thursday, January 20th, 2011

On Monday I start teaching my Machine Learning course again. I’m looking at the material for the first week right now, and I want to change it from last year.

Typically, my students will have had classes on mathematical modeling, a bit of probability theory and a bit of statistics, but experience tells me that they only have a very superficial knowledge of these topics. They don’t need much more for this class, but I still want to get some key points across regarding the statistics that we will be using in the class, and I don’t think I have managed that well over the last few years.

I don’t want to focus on modeling so much, and I certainly don’t want to discuss experiment design, since the data we look at is generally just data that happens to have been collected and that we need to make some kind of sense of, not data gathered to decide between one theory and another.

It really is about a few points: Given the data and some generic model, say a neural network, why do we estimate the parameters in the way we do? What can we say about the accuracy of predictions? That kind of stuff.

I usually go a little bit into Bayesian statistics for model selection, but most of what they see in the class is a series of different generic models whose parameters they estimate through maximum likelihood.

The thing is, while they generally remember how they estimate the parameters in different models when we get to the exam, they focus on the details of a particular model and rarely remember that they are essentially doing the same thing for all the models: maximizing a likelihood in a probabilistic model.
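To make the "same thing for all the models" point concrete, here is a minimal Python sketch. It is an illustration only, not course material: the function names, the made-up data and the crude grid search are all assumptions. The same routine estimates the parameter in two different models, and the only thing that changes is the log-likelihood function handed to it.

```python
import math

def maximise_likelihood(log_lik, candidates):
    """Return the candidate parameter value with the highest log-likelihood."""
    return max(candidates, key=log_lik)

def bernoulli_log_lik(data):
    # log p(data | p) for coin flips coded as 0/1
    def log_lik(p):
        return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)
    return log_lik

def gaussian_log_lik(data, sigma=1.0):
    # log p(data | mu) for a Gaussian with known standard deviation sigma
    def log_lik(mu):
        return sum(-0.5 * ((x - mu) / sigma) ** 2
                   - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)
    return log_lik

flips = [1, 0, 1, 1, 0, 1, 1, 0]       # made-up data: 5 ones out of 8
values = [1.2, 0.7, 1.9, 1.4, 0.8]     # made-up data: sample mean 1.2

p_grid = [i / 1000 for i in range(1, 1000)]   # skip p = 0 and p = 1
mu_grid = [i / 1000 for i in range(0, 3001)]

# Same estimation step, different probabilistic model.
p_hat = maximise_likelihood(bernoulli_log_lik(flips), p_grid)
mu_hat = maximise_likelihood(gaussian_log_lik(values), mu_grid)
```

In both cases the grid search lands on the familiar closed-form answer (the fraction of ones and the sample mean), which is the point: the model-specific formulas are just what maximising the likelihood happens to give for that model.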

The first couple of years I taught this class, I definitely focused too much on the mathematical details in this. Going through derivations of the math, explaining how you got various posteriors from conjugate priors and such. Major fail.

I tried changing that last year, focusing more on examples, but it didn’t help much once we got to the exam.

Do any of you have experience with teaching statistics core concepts, preferably with some good examples? Care to share?

If you don’t teach this stuff, but have had classes like it, what worked for you as a student and what definitely didn’t work?

Maybe I’m not crazy after all…

Sunday, September 21st, 2008

This evening I was reading in Pattern Recognition and Machine Learning, the book we use in our machine learning class.  We only use the first half of the book, but we are thinking about extending the class to cover two terms and then cover the entire book (or most of it, anyway) so I figured this was a good excuse to actually read the whole book.  So far, I’ve only read the chapters we actually use, plus a few pages here and there.

Anyway, I was reading chapter 6, on kernel methods, but I got stuck on the first figure.

It is supposed to illustrate kernel functions k(x,x’) as linear combinations of feature functions: k(x,x’) = Σ_i φ_i(x) φ_i(x’). The top row shows the feature functions, φ_i(x), and the bottom row the kernel function, as a function of x with x’ fixed at 0.

That doesn’t make any sense at all to me.

On the left-most figure, the feature functions are all 0 for x’=0, so the kernel function is a sum of zeroes.  It should be constant zero, not the curvy blue line.

For the other two, the feature functions are all non-negative, so how can the kernel function ever be negative?  A product of non-negative values cannot be negative, and neither can the sum of non-negative numbers.
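Both arguments are easy to check numerically. The following Python sketch uses hypothetical feature functions chosen only for illustration (powers of x with no constant term, and Gaussian bumps); they are assumptions, not necessarily the functions plotted in the book.

```python
import math

def kernel(features, x, x_prime):
    """k(x, x') = sum_i phi_i(x) * phi_i(x')."""
    return sum(phi(x) * phi(x_prime) for phi in features)

# Hypothetical feature functions that are all 0 at 0: x, x^2, x^3.
powers = [lambda x, d=d: x ** d for d in (1, 2, 3)]

# Hypothetical non-negative feature functions: Gaussian bumps.
centres = [-0.5, 0.0, 0.5]
bumps = [lambda x, c=c: math.exp(-((x - c) ** 2) / 0.1) for c in centres]

xs = [i / 10 for i in range(-10, 11)]   # x from -1 to 1

# Kernel as a function of x with x' fixed at 0, as in the figure.
zero_case = [kernel(powers, x, 0.0) for x in xs]
bump_case = [kernel(bumps, x, 0.0) for x in xs]

all_zero = all(v == 0 for v in zero_case)       # sum of zeroes
never_negative = all(v >= 0 for v in bump_case) # sum of non-negative products
```

With features that vanish at 0 the kernel is identically zero in x, and with non-negative features it never drops below zero, whatever the exact shapes are.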

In short, the figure is all wrong.  There isn’t a single thing right about it.

That was my reasoning, in any case, but I wasn’t completely sure.  I could be missing something.

So I googled for the book, but then I found powerpoint presentations including the figure, with no mention of any errors.  Clearly someone was using the figure in their teaching, so maybe it wasn’t wrong after all.

It got me nervous.  I feel that I really need to understand something to teach it, so I expect other people to feel the same way, and someone had used this figure.

I am not mentioning names here, ’cause as you have probably guessed the figure is wrong.  There is nothing wrong with my reasoning above.

Well, another minute’s Googling found me the errata list, and sure enough, the figure is fixed there.

I’m happy to find that I hadn’t completely misunderstood the topic and that I was right about the figure.

I am a little disappointed that a teacher would use the figure without at least checking that the figure actually makes sense.  Showing an example that makes no sense at all is doing a lot of harm to the students…

Today’s lecture: Neural networks

Wednesday, May 21st, 2008

Today’s lecture in my machine learning class was on artificial neural networks, slides below:

My approach to introducing them was to treat them as just a way of automatically learning basis functions in a linear regression setup.

While this isn’t really the full story, it is motivated by the project they have just handed in, where they needed to predict values based on trained linear regression models.

Training linear models is rather straightforward, but guessing good feature functions (transformation of the predictor variables) is tricky, and for the data I gave them in the project, some of the models were downright evil.

This should motivate having models where you don’t need to be able to guess the features — or at least where it isn’t as essential — and that is how I present neural networks.
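As a rough sketch of that view (in Python; the names and the numbers are made up for illustration, and no training is shown), a network with one hidden layer is a linear model on top of basis functions whose shape is controlled by trainable parameters, instead of being fixed by hand:

```python
import math

def fixed_basis(x):
    # Hand-picked features: someone has to guess these.
    return [1.0, x, x ** 2]

def learned_basis(x, hidden):
    # Each hidden unit is a basis function tanh(a * x + b); a and b are
    # parameters to be trained rather than guesses built into the model.
    return [1.0] + [math.tanh(a * x + b) for a, b in hidden]

def linear_output(features, weights):
    # The output layer is ordinary linear regression on the features.
    return sum(w * f for w, f in zip(weights, features))

def network(x, hidden, weights):
    # Forward pass: compute the learned features, then the linear model.
    return linear_output(learned_basis(x, hidden), weights)

hidden = [(1.0, 0.0), (2.0, -1.0)]   # made-up hidden-unit parameters (a, b)
weights = [0.5, 1.0, -2.0]           # made-up output weights

y = network(0.3, hidden, weights)
y_fixed = linear_output(fixed_basis(2.0), [1.0, 1.0, 1.0])
```

The last step is identical in both cases; the only difference is whether the features come from `fixed_basis` or from a parametrised `learned_basis` that training is free to adjust.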

I think I’ll give my students another project now, that is just re-doing the first project but using neural networks instead of linear regression…

Correcting machine learning hand-ins

Thursday, May 15th, 2008

I’ve been correcting the hand-ins for the first project in my machine learning class. It is a very simple exercise where the students are given five data sets with predictor variables and target values, and from them they need to train a model (just using linear regression) and then predict targets for new predictor values.

They can transform the predictor variables in any way they want to come up with a good set of basis functions to then use in the linear model, and this is really the only tricky part in the exercise. After that, it is a simple programming task to fit the model.
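For a sense of how small the programming task is, here is one possible sketch in plain Python. It is not the students' code or a reference solution: the polynomial basis, the noise-free toy data and the helper names are all assumptions. It fits the weights by solving the normal equations and then predicts a new target.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def fit(xs, ts, basis):
    """Least-squares weights from the normal equations (Phi^T Phi) w = Phi^T t."""
    Phi = [[phi(x) for phi in basis] for x in xs]   # design matrix
    k = len(basis)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * t for row, t in zip(Phi, ts)) for i in range(k)]
    return solve(A, b)

def predict(x, w, basis):
    return sum(wi * phi(x) for wi, phi in zip(w, basis))

# Assumed basis: 1, x, x^2. Picking this set is the hard part of the exercise.
basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2]
train_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
train_t = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in train_x]  # noise-free toy targets

w = fit(train_x, train_t, basis)
new_prediction = predict(3.0, w, basis)
```

Swapping in a different list of basis functions changes nothing else in the code, which is why guessing the transformations is where the real work lies.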

Some of the data sets are easy enough to work with, like the first one that is a simple line with Gaussian error around it. This is the typical linear regression setup. Other data sets are harder to figure out, but the take-home message is that it doesn’t really matter so much if you can work out the true model specification, as long as you can make predictions better than mere guessing (although, in one of the data sets the predictors and targets are independent, just to show that that is also always a possibility).

The only measure that matters is the prediction accuracy on the new data, and that is what I have been looking at for the hand-ins.  I want to reduce this to a single score so I can pick a “winner” and give him a little prize.

For each dataset I’ve reduced the prediction to a single score, by taking the square root of the sum of squared errors and then dividing it by the mean of the true target values.  This scales the errors to “standard errors” so I can compare the individual models.

Still, some models are much harder to make predictions about, so to take that into account, I’ve taken the mean of the errors in the hand-ins for each model and divided the individual errors by that.  That way, the difficult models count for less than the easy ones.  The weighted sum of errors is then the final score.
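The scoring scheme can be written down in a few lines of Python. This is a sketch of the procedure as described above, not the script actually used for the hand-ins; the function names, student names and numbers are invented for illustration.

```python
import math

def dataset_error(predictions, targets):
    # Square root of the summed squared errors, scaled by the mean of the
    # true targets (assumes that mean is not zero).
    sse = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(sse) / (sum(targets) / len(targets))

def final_scores(errors):
    """errors[student][d] is that student's error on data set d; lower is better."""
    students = list(errors)
    n_datasets = len(errors[students[0]])
    # Mean error over all hand-ins, one value per data set.
    means = [sum(errors[s][d] for s in students) / len(students)
             for d in range(n_datasets)]
    # Divide each error by the data set's mean error, then sum over data sets.
    return {s: sum(errors[s][d] / means[d] for d in range(n_datasets))
            for s in students}

example_error = dataset_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Made-up errors for two hand-ins on an easy and a hard data set.
errors = {"alice": [0.1, 2.0], "bob": [0.3, 6.0]}
scores = final_scores(errors)
winner = min(scores, key=scores.get)
```

In the toy numbers the hard data set has errors twenty times larger than the easy one, yet after dividing by the per-data-set mean both contribute equally to each student's score.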

Machine Learning slides

Wednesday, April 23rd, 2008

In less than two hours, I give my last Machine Learning lecture before I head off to the UK (I’m giving a few more when I get back). The slides from the lectures so far can be seen below:

Course Introduction

This is just a brief overview of what’s to follow. Motivates the topic and such.

Introduction to probability and statistics

This is two lectures covering the basic probability theory and statistics needed for the course. This is probably the most theoretical part of the course, and in my experience the material that confuses the students the most.

Linear Regression

This is the basic linear regression that you learn in just about any statistics class, except that there’s also a bit of Bayesian statistics and some model-selection/over-fitting theory that I haven’t seen in pure statistics classes (the classes I have taken have focused more on hypothesis testing approaches).

To get our students activated at this point, we also give them a simple project where they need to fit data to a linear model and try to predict target values based on their model.

Linear Classification

This, then, is today’s lecture. Classification, where the training consists of changing weights that split the predictor space into linear regions.

Slideshare

There seem to be some problems with some of the slideshare plugins above (some of the text is missing from time to time), but if you are interested you can find the slides (in PDF or OpenOffice format) here and you can see the course description and schedule here.