16 Oct

How did I ever consider Linux stable?

I’ve been working on my Ubuntu Linux laptop the last couple of days, and I f*cking hate it.  It seems to me that whenever I start a memory-intensive job, the very next thing I do is reboot it with the power button.  It takes it a millisecond to reach the point where it is absolutely hopeless to try to interact with it.  It is a swapping hell.

Now, I have always had problems with Linux when I fill up the RAM.  In my PhD work I did explicit state space exploration, so I used massive amounts of memory and rebooting wasn’t that unusual.  Then again, I think I could crash most other operating systems as well ;-)

These days I don’t even have to try to crash the box.  Compiling a complex enough program, or leaving firefox on for a few hours, should do the trick.  I swear, I am rebooting Linux three or four times a day!  Usually losing a bit of work each time, of course.

Have I fucked something up on my box?  It just didn’t use to be this way…

It is not really a problem I have on the other Linux boxes I use, to be fair.  On my desktop Fedora, I won’t have time to fill up the RAM before it crashes.  It doesn’t like my graphics card, you see, so it will typically crash within 5 minutes of me logging in.  That is part of the reason I am working on my laptop.

As long as I don’t log into the box through X, but ssh in instead, it is pretty stable though.

Still, I am so fed up with Linux.  I don’t have to reboot Windows every two or three hours, and I’ve only had to reboot my Mac a few times in all the time I’ve had it.

Linux is supposed to be stable.  It is praised for its stability.  Still, I am finding that it is growing increasingly unstable on me.

It might just be me, of course.  I’m no whiz systems administrator. I’m sure some of my geeky friends can set up a system that won’t crash all the time, but I don’t have time for this if I also want to do my science. I just install Linux out of the box and it works like shit on me.

Oh well, with the new MacBook out I will soon not have this problem any longer… I will have the UNIX tools I love but on a box I can actually manage to set up.

37 thoughts on “How did I ever consider Linux stable?”

  1. Pingback: How did I ever consider Linux stable?

  2. One of the things I would do is run a memtest utility. That utility checks memory for faults. Your laptop may have a chip with an intermittent bad bit that does not switch state.

    I would also check the software you write or are running to see whether its memory allocations (malloc() and free()) match up. I have often found problems here.

    What will happen in the former case is unknown. Perhaps, if there is a bad chip and the kernel code is in it, your system may die.

    In the latter case, your user logon dies. If you then open a command line session, you could kill the failed user logon.

    And if that does not work, ask around for other ideas. I frankly don’t think it is Linux that is the problem, but your application.

  3. Once you’ve ruled out hardware problems it may be time to have a look at the distribution you are using.

    Which distro (version, drivers, etc), and which laptop (make, model)?

    I have developed under Linux since about 1996, including some very large systems, without having the kind of problems you describe. If your application does not check for memory availability every time it allocates more, or doesn’t react properly to exceptions, your code may be causing your problems.

    Regarding Firefox, I have been using 3.0, and I agree with you that it is not stable, crashing once or twice a day for me. On the other hand it does not bring the system down when it crashes, and at the same time will restore itself to the same open websites when it is restarted.

  4. Buying a new Mac because you’re having trouble with Ubuntu is like buying a new car and junking the old one because it had a flat tire. You wouldn’t do that of course because you would know what the problem was and how to fix it. I’m not a geek but I’ve used Linux long enough to know spending money for a new Mac because of this is kind of nuts. Sure, get one if that’s what you really want but please don’t use your current troubles with Linux as the reason.

  5. pwd: Firefox is good at restoring its state, and if it crashes it is not crashing the system. The problem is just that it is using a lot of memory (although less with 3.0 than before, as far as I can see). If Firefox is eating 50% of my RAM and I start another program that wants a large chunk as well, the whole system becomes unresponsive because it is swapping memory all the time and doing nothing else.

    Leslie: I’m not sure checking malloc and free will help me much since most of the time it isn’t my own programs that are eating up all the RAM. I usually don’t run larger computations on my laptop but only test my programs there. I run the actual computations on a cluster where I do not have this problem as far as I know.

    Also, it is not really that my system crashes. It starts swapping and swapping and swapping, and I cannot stop it. The terminal is swapped out and the system is simply not responding to my input any more.

    I think I can safely rule out that the problem is solely my own applications. I don’t really need to be running any of my own code to get this problem. I just need to start up enough applications. Maybe open some large documents and then wait for the swapping to begin.

    So no, checking if my own programs are leaking memory won’t do me much good.

    Of course there might be some other process running somewhere that is causing all this trouble. That would be my guess, really, ’cause I didn’t use to have these problems much a few years ago, but now I do.

    I don’t know what that program would be, though. Not something that shows up in top, at least not until it is too late to even try to fire up top…

    I’m not saying there’s a bug here, though. The swapping is probably the intended behavior. It just means that interacting with the system while it’s dealing with processes that are running out of RAM is impossible.

    If I could give my terminal priority over all other processes, then I could kill a few other programs and get out of the swapping hell, but I cannot get in contact with the damn terminal :-(

  6. I just found an interesting discussion on swapping here: http://kerneltrap.org/node/3202

    Rik van Riel:

    the speed of hard disks doesn’t grow anywhere near as fast as the size of memory and applications. This means that over the last years, swapping in any particular application has gotten SLOWER than it used to be … This means that even though the VM is way smarter than it used to be, the visibility of any wrong decision has increased.

    This could very easily be why I feel that Linux has gotten worse over the years. When I ran Slackware in 1995 I ran Emacs and gcc and didn’t fire up X unless I really needed it.

    Now, I’m running a lot more, all the time, so of course I’m swapping more…

    John Bradford:

    There is a very negative side to this approach as well, especially if users allocate excessive swap space.

    A run-away process on a server with too much swap can cause it to grind to almost a complete halt, and become almost completely unresponsive to remote connections.

    If the total amount of storage is just enough for the tasks the server is expected to deal with, then a run-away process will likely be terminated quickly stopping it from causing the machine to grind to a halt.

    If, on the other hand, there is excessive storage, it can continue running for a long time, often consuming a lot of CPU.

    When the excess storage is physical RAM, this might not be particularly disastrous, but if it’s swap space, it’s much more likely to cause a serious drop in performance.

    For a desktop system, it might not be a big deal, but when it’s an ISP’s server in a remote data centre, it can create a lot of unnecessary work.

    Except for the part about it not being a problem on a desktop system, this sounds exactly like my problem. I just picked the default swap space when I installed Ubuntu, but maybe I should have another look at this…

  7. Richard: sorry I didn’t see your comment when I was responding to the others … I guess we wrote at the same time…

    I know that it is not really the reason I want a Mac, but it is part of it. I’m having more and more trouble with Linux, and now I want to pay my way out of this, rather than spend more time on it. There’s just so many hours in a day, and my time is better spent on my actual work.

    Plus, I’m not giving up on Linux completely. The cluster where I’m running all my computations is Linux, and I’m very happy with that. It’s just that someone else is maintaining that, so I don’t have to worry about it. With my Mac, I do not have those problems, so for my desktop/laptop computer, I think it is the better choice for me.

    I’m not just looking for an excuse to get a Mac. I don’t really need an excuse if I want a Mac; I just don’t want to spend any more time fighting with my computer to get my work done.

  8. From what you describe, unless you’ve tried to run this code on a similarly configured Mac (same RAM, same swap) and found that it works beautifully, my bet is that it will work identically – the Mac will use up all its RAM, be forced into swap and will die an agonizing death (but with prettier graphics). It sounds like your app is progressively mallocing all the RAM, being forced into swap, where it will sit disk thrashing until the OS ceases to respond. This isn’t Linux – this is your code. I’ve done similar mem worst practices and the result is identical. You haven’t produced any indication that this isn’t the case, so my guess is that it is.

    It sounds like you have an initially RAM-starved machine & are trying to load it down with a very RAM-hungry app which makes about as much sense as it seems.

    I suppose you’ve maxed out the RAM in your laptop? You’ve looked at the output from xosview to see where your mem is going as the program runs? (cache, app, swap)? You’ve added debug statements to your code that indicate each malloc or malloc/free imbalance? It may very well not be a bug per se, but just a typo in a malloc request. Does the RAM usage jump dramatically?

    If you are RAM-squeezed, on linux you have the option of using a very lightweight Window manager (or none at all); on the Mac, not so much.

    Macs work great …until they don’t. I’m currently trying to do *nix-like work on one and it’s a gigantic PITA. Apple has decided to make many things as un-unix-like as possible (dare I speak the word ‘launchd’?) 30 lines of XML to replace 1 line of cron or bash? Yeah, that’s progress.

    On the other hand, the dev platform for the Mac has some great tools, but having access to both, I find myself using the Linux ones more: they’re a bit more primitive, but to kill a bug, sometimes a stone works as well as a laser-sighted Predator-dropped Hellfire.

  9. Why not use another distribution? I think Mac OSX is based on BSD. There is therefore FreeBSD or PC-BSD, with quite some ports available. There are also other Linux distros that could be more stable (some based on RedHat Enterprise 5, among others) for free.

  10. Harry: I have a few good reasons to believe that the problem isn’t with my code. The main one is that I have the problem whether I’m running any of my own code or not ;-)

    Plus, of course, when I run my code on our cluster at work, I am not having the same problems, so if it is a problem with my code, it is not the obvious memory problems.

    That being said, you are right of course that the same problems would exist on a Mac. Absolutely. If I run out of RAM and have to swap, then swap I must.

    How painful that is depends on the swapping strategy, but there is no good solution to this problem.

    So, while it is not really fair to make this Linux’s problem (when it really is a general problem), I just haven’t experienced it on my Mac yet, not in day-to-day use, whereas I do experience it on my Ubuntu.

    The cluster I run my computations on is also Linux, so obviously it is not Linux that is causing this. I do not really believe it is either. When I wrote the post I was angry and just wrote what I felt and wasn’t trying to be fair, but now that I’ve cooled down I will admit that I wasn’t, in fact, being fair.

    The problem is in all likelihood the way my laptop is set up. I’ve just installed Ubuntu with default parameters, and maybe that is a bad choice for my laptop and my use patterns.

    Write my rant off as momentary anger and not rational reasoning. You are right about all you write, of course, and my rant is generalising beyond what is reasonable about Linux. I have problems with exactly two Linux machines: my laptop and my desktop machine. The latter is a graphics card issue (I bought a Dell with an ATI card and that was a bad idea). The former I am not sure about, but that is the swapping stuff I was ranting about.

    If I had the skills to configure my laptop properly, I probably wouldn’t have this problem. I don’t.

    Yeah, I’m not justified in blaming Linux for my own shortcomings. Sorry about that. I wrote in anger, as I always do when I’m tagging a post as Rants.

    On a different note, I like your final paragraph about the development tools on Mac. I also find myself using the terminal more than any other tool when I’m programming on Mac. That and Emacs. Whether out of habit or whether it is because it is simply the best way to get the job done, I don’t know.

    Related to that, you might like this link: http://archive.salon.com/21st/feature/1998/05/cov_12feature.html

  11. Martin: You are right, a different distro might solve my problem, but then, a re-installation of Ubuntu might also. Maybe the default swap setup, when I installed, is just not the right choice after a number of updates.

    I have used a number of different Linux distros over the years, and it’s only the last year or so I have been having these problems. To a larger extent now than earlier.

  12. This is funny!
    You have listed all the problems that I have with Windows XP Pro!
    (Somehow, I can’t stomach Vista enough to even try.)

    But, seriously.
    I just finished a time when I left my HP nc6230 (business) laptop on for about three weeks straight. Plugging and unplugging my HP as I moved from room to room and socket to socket. Shoving it into the bag and forgetting to turn it off, letting it run in this condition for over an hour more than once. Beat it to death! Didn’t follow my normal way of doing things. My bride was in the hospital and the 3-year-old laptop that I purchased from a business could die for all I cared. I abused it.

    I am running Ubuntu 7.10. I have used 7.10 since (you guessed it) late October of 2007.

    How many problems from my $200 laptop? None, nil, nada, zip, zero. No Firefox problems! No hardware problems! No Ubuntu problems!

    Your laptop is the problem. My guess is that you bought the “Best Buy” $449 special. You didn’t do your homework. You are a Ph.D. (or a Ph.D. candidate) in science? Really? Wouldn’t that qualify you as a “geek”? You are used to all kinds of technical papers, instructions, books, and thought, right? But you can’t follow a web-page “how-to”? You have the skills to do all of this! USE THEM! And, stop whining like an undergraduate arts major!!!!!!!

    Disclaimer to all you whiny undergraduate arts majors: I have a B.A.

  13. hike: I have a PhD in computer science, so it is much worse than that ;-) I’ve even worked as a sysadmin about ten years ago, and back then I would be able to solve these problems. Those are long forgotten skills, though :(

    Oh, and it’s a Lenovo X60, so it wasn’t the cheapest box I could find…

    But you are right, of course. If I did my homework, I wouldn’t have this problem.

    I just can’t be bothered to configure machines any more. I want to spend my time on my research projects…

  14. I feel your pain when you run into a ‘this shouldn’t be happening’ moment with a computer. However, in almost 30 yrs of programming, Linux (coupled to the inet) is the closest thing that allows me to disentangle that which should not be happening.
    The Salon article (read with great enjoyment when it first came out) encapsulates the almost-anger I feel when dealing with Windows and, to a lesser extent, Mac OS X: that of being infantilized.

    Neal Stephenson’s “In the Beginning... Was the Command Line” has a similar lilt. The feeling that you CAN figure out what’s going on. That sitting, waiting for a help-desk 9 time zones away to pick up and help you solve your problem is not really productive. It is extraordinarily empowering (a horrible word, but accurate here). And in fact the paragraph in the Salon piece which describes the trail of information that finally resolved the mysterious ‘BASIC MISSING’ message was like many AHA! moments that I’ve had doing just the same thing. Knowledge is not gained at the endpoint, but from the path taken.

    Re: your original problem, I still occasionally have similar issues with system slowdowns, and most often it is with browsers, more specifically with plugins. ‘top’ is your friend here: it can tell you which app is sucking up the CPU and RAM and lets you kill the thing off. One other thing I dislike about Macs is that their smooth skin of sophistication hides much of what could help you debug the problem. Is the disk active? No LED on a recent Mac will tell you (although there are some good utils that will). What about debugging a wireless problem? Ugh. Again, when they work, they work well; when they don’t, problems on Macs (wireless and otherwise) have always been the hardest to debug, as the delivered OS assumes that nothing will ever go wrong.

    Because it was born of unsupported hardware and intuited APIs, Linux assumes the reverse – that things WILL go wrong and must be chatty about it. Recent distros have hidden the burblings of the OS coming awake behind splash screens, but it’s a core difference between Linux and other OSs – Linux says “I may not make it this time, but I want it to be easier the next time” vs the arrogant Jedi hand wave of “you don’t need to know this”.
    Best
    Harry

  15. Someone above suggested a hardware problem with the memory; I second the diagnosis.

    I had the same problem with my Mac laptop: it runs fine until you open one too many applications (generally Photoshop in my case), then stalls out. Maybe the application crashes, maybe not. Kill the process, things are fine for a while. Reboot, things are fine for a while. Or maybe there are broken links in swap space that can be repaired on boot.

    If memory tests out OK, could be a circuit fault on the system board. These little hardware problems are easy to induce on a laptop from being jostled and bumped around.

    Oh, and hike makes a basic error of life history theory: A 3-year-old used laptop is a priori likely to survive abuse more than a new one, because if it were susceptible to abuse, it would have already died.

  16. Harry: I agree completely. Honestly.

    The post is really just born of frustration, and I am sure it will be easier to fix this problem on Linux than on Mac. For the reasons you list.

    Mainly my frustration is born from no longer being able to fix the stuff I used to be able to fix (in part because the snappy GUI is hiding what is really going on, compared to the text configuration files I used to have to edit).

  17. Pingback: Mailund on the Internet » Blog Archive » Ranting online

  18. Tom,

    – we all are frustrated when things go wrong
    – there is always something that goes wrong sooner or later
    – on other systems you have no tools
    – on Linux, you can get tools even if it is one of the shiny message hiding distros
    – after many years of working on computers, our patience runs thinner and thinner

    Please turn your frustration into a constructive way of improving the one thing that helped you so many times before. After all, even with your forgotten skills, you will lose more time (and money) ignoring the problem than you would lose fixing it yourself.

    Often I get onto the same frustration as you thinking that after so many years of development since computers appeared, we should have at least one platform that does everything we need daily in a reliable way. And every time I remember Linux is the closest one I can find. And every time I remember that trains, cars and planes have evolved even slower.

  19. In the first place….

    “If I could give my terminal priority over all other processes, then I could kill a few other programs and get out of the swapping hell, but I cannot get in contact with the damn terminal :-(”

    try ‘man nice’

    In the second place, if you’ve been *nixing since 1995, WTF are you doing diddling with Ubu?! Talk about “unstable”! They release more often than a 16 yr old!
    Try a real Debian. Even Lenny (current “Testing”) is more stable than the Ubu crap.

  20. handydan: nice would work if I knew which program was eating my RAM ;-)

    I don’t. By the time the swapping starts, I cannot reach anything through my X interface. Trying to get to a terminal usually doesn’t work either.

  21. oh, or nice’ing -20 — to give the terminal priority to check which process is killing me — still requires that I can actually direct the input to that terminal when I do… but, yeah, maybe I can at least get output from it so I will know in the future…

  22. I’ll buy your laptop ;)

    Why not run ‘top’ or ‘htop’ in a terminal, give it a ‘renice -15’ from another terminal, and then do your thing to make it crash? Hopefully top will keep going at least a bit, so you can see what the issue may be.

    Or check /var/log/messages.0 after a reboot to see if there is anything in there.

    But of course, I would run a liveCD with memtest86+ on it first, and let it test your RAM for a few hours.

    Your swap should be about 2x your RAM, up to 2GB of swap, at which point it should be about the same as your RAM. You never told us how much of either you have.

  23. lefty: My system isn’t actually crashing, it just starts swapping after which I cannot really get in contact with it, but yeah, I might try firing up a terminal + top to see what is going on. I’m not completely convinced that it is any particular program, though…

    Marc: Thanks for the link, I’ll have a look!

  24. I assume you have a properly sized swap partition? For me, I have 2GB of RAM, which means I need a 2GB swap partition. It’s best to have a swap size that’s matched with the amount of RAM you have.

    After this I would say change your habits with Linux. I can tell that you do pride yourself on crashing stuff. Why not? Crashing the perfectly stable OS X and Vista is a great feeling. But, concerning changing your habits in Linux: if Mac OS X fails, it’s Apple’s fault; if Windows fails, it’s Microsoft’s fault; if Linux fails, it’s your own damn fault.

    Essentially saying stop being a retard with root privileges. Linux is very stable until you start messing with stuff you can’t quite fix because you were “rooting” around. Take it from me, Ubuntu is a great distro, I use it, but you seem like you need something better.

    Try out MEPIS (http://www.mepis.org). I think you’ll like it better; if not, it makes a better live CD utility than Knoppix for computer repair.

  25. Hey hey hey, I’m a happy man this morning. I followed Marc Paradise’s link (thanks Marc) and tried to follow the instructions there, but when it came to swapoff, I got this nice little error:

    $ sudo swapoff -a
    swapoff: cannot canonicalize /dev/disk/by-uuid/d00bfdb4-c541-4d56-8448-b0a13ce352a8: No such file or directory

    Hmm, that doesn’t look good.

    A bit of googling found me this thread: http://ubuntuforums.org/showthread.php?t=802398

    and updating /etc/fstab seemed to do the trick.

    I’ll do a bit of stress testing and see how it works, but I am pretty optimistic.

    Thanks for all your suggestions, guys! Much appreciated.

  26. I’m glad you seem to be fixing your problem. Still, I would suggest trying Slackware for x86 or a derivative (e.g. Bluewhite64) for x86_64. Slack is awesome for its simplicity and it’s rock solid. If you want massive speed out of the box (so to speak) and don’t mind the time for downloading/compiling, go for Gentoo. BTW, a Gentoo install can be sped up using the Sabayon distro.

  27. Did you recently “upgrade” your Ubuntu distribution?

    A few months back, I had a perfectly functioning install of Ubuntu “gutsy” 7.10 on my IBM ThinkPad T23. When the update manager offered to upgrade my distro to “hardy” 8.04, I did so. That is when all hell broke loose. Crash, crash, crash. I couldn’t do updates without it locking up, etc.

    So I did some research on the new distro. All I found were rave reviews on how great “hardy” was. Finally, after a lot of looking, I found out that the problem was not with the distro, but a result of upgrading my distro through the update manager. Apparently, the upgrade install had some serious issues. The only way to get a stable install is to do a full, fresh install. Having to reinstall my OS from scratch was a big annoyance, but it has been stable ever since.

    I don’t know if this is helpful, but I felt compelled to share my experience.

  28. I always use the upgrades rather than re-installing, but whether this is what caused my problems I have no idea…

  29. I personally think of Linux as a good OS because of its openness in editing its source code. If you downloaded your copy of Linux over the internet from an unreliable source, chances are it has been meddled with.

  30. Check whether you are (or were) using a -rt kernel. You should be using a normal kernel; its name must not contain the suffix ‘-rt’. This is the major cause of the problem you described, but it is just a matter of choosing the right option at startup.
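Several replies in the thread (handydan’s ‘man nice’, lefty’s suggestion to run a reniced ‘top’, and the author’s own hope in reply 21 of at least getting some output to read afterwards) point at the same idea: record which processes are eating RAM before the machine disappears into swap. Here is a minimal Python sketch of that, assuming a Linux /proc filesystem with VmRSS lines in /proc/<pid>/status. The function names, the log file name, the interval and the niceness value are illustrative choices of ours, not anything from the thread.

```python
import os
import time
from pathlib import Path


def parse_status(text):
    """Return (name, rss_kb) from the text of a /proc/<pid>/status file.

    Kernel threads have no VmRSS line, so their rss_kb is reported as 0.
    """
    name, rss_kb = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            # The line looks like "VmRSS:\t  204800 kB"
            rss_kb = int(value.split()[0])
    return name, rss_kb


def top_consumers(proc_root="/proc", n=5):
    """Return up to n (rss_kb, pid, name) tuples, largest resident set first."""
    procs = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            name, rss_kb = parse_status((entry / "status").read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we looked, or the file was odd
        procs.append((rss_kb, int(entry.name), name))
    procs.sort(reverse=True)
    return procs[:n]


def monitor(log_path="memlog.txt", interval=5.0, niceness=-10):
    """Append the current top memory consumers to log_path, forever."""
    try:
        # Raising CPU priority (negative niceness) normally requires root.
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
    except PermissionError:
        pass  # carry on at normal priority
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            for rss_kb, pid, name in top_consumers():
                log.write(f"{stamp} {pid:>7} {rss_kb:>10} kB {name}\n")
            log.flush()
            os.fsync(log.fileno())  # push the sample to disk before sleeping
        time.sleep(interval)
```

The idea would be to call monitor() in a spare terminal before starting the heavy job and read memlog.txt after the power-button reboot. Keep in mind that niceness only affects CPU scheduling; it does not stop the monitor itself from being swapped out, which is why each sample is forced to disk as soon as it is taken.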
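lefty’s other suggestion in reply 22 was to look through /var/log/messages.0 after a reboot. If the kernel’s out-of-memory killer fired, it logs which process it killed. The exact wording has changed between kernel versions, so the regular expression below is an assumption that covers the common “Kill process 1234 (name)” and “Killed process 1234 (name)” forms; it is a sketch, not an exhaustive parser. Since the author describes the laptop as thrashing rather than crashing, it is also quite possible that nothing shows up at all.

```python
import re

# Assumed message shapes; the wording has varied between kernel versions, e.g.
#   "Out of memory: Kill process 1234 (firefox) score ..."
#   "Killed process 1234 (firefox) ..."
# The pattern may also match unrelated lines that happen to say "kill process".
OOM_VICTIM = re.compile(r"kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE)


def oom_victims(lines):
    """Return (pid, name) pairs reported as killed, in first-seen order."""
    seen = {}
    for line in lines:
        match = OOM_VICTIM.search(line)
        if match:
            # The same kill is often logged twice; a dict keeps one copy, in order.
            seen.setdefault((int(match.group(1)), match.group(2)), None)
    return list(seen)


def scan_log(path="/var/log/messages.0"):
    """Scan one syslog file; reading it may need elevated permissions."""
    with open(path, errors="replace") as f:
        return oom_victims(f)
```

An empty result from scan_log() is consistent with the machine having swapped itself into a standstill without the OOM killer ever stepping in.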
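The fix the author found in reply 25 came down to /etc/fstab: the swap entry referred to a /dev/disk/by-uuid/ path that no longer existed, which is what the ‘cannot canonicalize’ error from swapoff complains about. Below is a small Python sketch, assuming the usual whitespace-separated fstab layout (device, mount point, type, options, ...), that reports swap entries whose device cannot be found. The function names are hypothetical, it only reads the text it is given, and correcting the entry is still a manual job.

```python
import os


def resolve_device(spec, disk_dir="/dev/disk"):
    """Map an fstab device field to a path that can be checked for existence."""
    for prefix, subdir in (("UUID=", "by-uuid"), ("LABEL=", "by-label")):
        if spec.startswith(prefix):
            return os.path.join(disk_dir, subdir, spec[len(prefix):])
    return spec  # a plain path such as /dev/sda5


def missing_swap_devices(fstab_text, disk_dir="/dev/disk"):
    """Return the device specs of swap entries whose device does not exist."""
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "swap":
            # os.path.exists follows symlinks, so a dangling by-uuid link counts as missing
            if not os.path.exists(resolve_device(fields[0], disk_dir)):
                missing.append(fields[0])
    return missing
```

Passing it the contents of /etc/fstab returns the offending device specifications; an empty list means every swap device listed there can be found.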
