Sun Microsystems: Breaking the Stupid barrier

In what I do I get to evaluate a lot of software. I get to install it, test it, see if it meets the needs of my clients, and I get to decide whether the software is worth the bits it's written on. Disturbingly often, I run into software that just plain doesn't work. And in a depressingly large number of cases, the software doesn't work because the software design was just plain blindingly stupid.

Take Sun Microsystems' Java System Directory Server as an example. This software has been around in various forms for in excess of eight years, and can cost big bucks if installed on a large server. You would think they would have got it right by now. You would think that they got at least their install process right.

Right?

First, the installer

After waiting a day for the 120MB worth of download to come down (the Linux version, we're bandwidth challenged down this way), I unpack the software, and Lo! a program called "installer". I run it, and up comes a beautifully rendered splash screen and install wizard, Windows style. Oooh. Looks expensive.

It has to be said at this point that all I need is a small subsection of the package called the "server console". The directory server and the admin server components of this package are not needed on the machine I am doing the install on. So I make the mistake of selecting the "I would like to configure the servers later" option so helpfully provided to me by the installer, in the hope the installer will shut up and stop being helpful, and leave me to run the console program as I require. "So be it" says the installer, leaving me to go back to the safety of my command line.

The console runs

Running the console is easy. Just change to the correct directory and run ./startconsole. Like this:


[root@gatekeeper ~]# ./startconsole
Segmentation Fault

We're off to a great start.

Debug mode kicks in

"Segmentation fault" in its most basic form can often be translated into English as "The programmer made the blindingly stupid assumption that whatever they tried to do would succeed, and so didn't bother actually writing the tiny bit of error checking code that would give the end user a clue as to what is wrong or how to fix it when what they tried to do failed".

At this point the typical end user is sunk. "Segmentation fault" is not only fatal to the program, but also meaningless to the end user. But then I write software; I can translate. Let me see what I can do. Let's throw in a bit of strace to see what is going on. Running a program through strace reveals the system calls the program makes in the background. Usually the pattern is revealed as "try to do something; ooh, the system returned an error code; then *boom*". In this case, it was looking for a directory that was supposed to form part of the install. I manually create a temporary symbolic link pointing at the most likely place, and voila! the segmentation fault is gone.
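For the record, the missing check is tiny. Here is a minimal sketch in Python (not the language the console launcher is written in, and every name and path in it is made up for illustration) of what "check that the directory exists, and say so if it doesn't" looks like:

```python
import os
import sys


def require_directory(path, purpose):
    """Return path if it is an existing directory, otherwise raise a clear error."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"Cannot start: {purpose} directory not found at {path!r}. "
            "Was this component skipped during installation?"
        )
    return path


def start_console(install_dir):
    """Hypothetical launcher: returns 0 on success, 1 with a readable message on failure."""
    try:
        require_directory(install_dir, "console install")
    except FileNotFoundError as err:
        # Tell the user what is missing instead of crashing later on.
        print(err, file=sys.stderr)
        return 1
    print("Starting console from", install_dir)
    return 0
```

Calling start_console() with a path that does not exist prints one readable line and returns 1. That is the whole difference between a clue and a segmentation fault.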

What does this mean? It means that nobody at Sun has ever tested the option in the installer that said "install the servers later". That's not all they haven't tested.

The console, take two

Let's try that again.

*boom*

Another error message:

Exception in thread "main" java.lang.NoClassDefFoundError: com/netscape/management/client/console/Console

Translated into English, this error means "part of the software is missing". Oops. Well at least "NoClassDefFoundError" gives you a small bit of a clue. To a Java person, the message is clearly understandable, so at least we're one rung further up the ladder than before.

Error messages like these mean only one course of action:

Google it

The Great Oracle Google, all Wise and Knowing, and with an optional "Safe Browsing disabled" feature, revealed this link. A forum! Hosted by Sun! Surely, they would know.


Warning: mysql_connect(): Can't connect to MySQL server on '192.168.2.1' (113) in /var/www/html/sjes/index.php on line 61


Warning: mysql_error(): supplied argument is not a valid MySQL-Link resource in /var/www/html/sjes/index.php on line 61


Warning: mysql_errno(): supplied argument is not a valid MySQL-Link resource in /var/www/html/sjes/index.php on line 61


Fatal error: SQL Error has occurred, please contact the administrator of the forum and have them review the forum's SQL query log in /var/www/html/sjes/index.php on line 49

Oh dear. Seems Sun's forums are down. I mailed the administrator, and got the following back:


Hello Graham,


the Supportform site is experiencing some hardware
issues. We hope to have this resolved as soon as
possible.


Sorry for the inconvience.

Fair enough. Servers give trouble from time to time. It happens. Never mind, the almighty Google cache comes to the rescue! The original forum entry at last! We're saved!

And the advice is simple, install the software again, this time configuring the servers. Oh well, as a temporary measure I can deal with the servers being there, I just won't start them up.

Back to the installer!

So, I run the installer again, and what do I get? It's like the Office Paperclip! The installer wants to be helpful! The installer points out that I have already installed the software. And it won't continue.

java-install-wizard.jpg

No worries, let me get rid of the software quickly:


[root@gatekeeper ~]# rm -rf /opt/sun
[root@gatekeeper ~]# ./installer

Ooh look. Pretty installer. Oh dear. The installer wants to be helpful. The installer is like the Office Paperclip. The installer just told me that the software is already installed. And it won't continue...

java-install-wizard-2.jpg

Let's sum up

The more picky^H^H^H^H^Hastute among you will have noticed that the advice in the forum (assuming that forum entry ever works again) suggested: "Manually deleted each and every JES package and directory, and removed them from the 'productregistry'." The key word here is "productregistry".

A "registry" in this case is simply "a place where settings are stored". Find and delete that registry, and we can start again from scratch. Oops, did we say "find the registry"?

Some more blindingly stupid programming from Sun. The standard way of handling a registry on a Unix box is to place it in the user's home directory, in a folder starting with a dot. Alternatively, if it's temporary as in this case, the registry might be found in a special directory on the machine specially set aside for temporary files, helpfully called /tmp.

And there are some registry directories! There in /tmp. They are a bunch of directories with the name "jes" ("Java Enterprise System"). So they get deleted.

Does it make a difference?

java-install-wizard-2.jpg

It's at this point that the end user starts frothing at the mouth. The ten minute job has just pushed through two hours, and yet again 3am looms closer on the clock dial. Another day, another software failure.

Is it the end user's fault? I dunno. If you iron sideways instead of forward and backward, does your iron fall to pieces? Do your car's doors fall off if you press the "lock car" button twice instead of once? Does your pen spontaneously leak out all the ink if it's used by a left-handed person instead of a right-handed person?

Many software engineers argue that they cannot check for all bugs. Not all problems can be gracefully handled. You could argue this point back and forth, but the simple underlying issue is that the argument is used as an excuse to leave the program unfinished.

"We can't" cry the programmers. "So we won't".

In this case it's perfectly logical for an install program to be run twice (if you run Windows, you're likely to run an installer more than twice). It is perfectly sane for a system whose directories have been physically deleted to say something other than "Compatible version already installed". And it is really blindingly stupid to tell a user "No products selected. Please select a product and click Next" when you've just greyed out the selection options, making it impossible for the user to select a product, and therefore continue.
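As a rough illustration of the alternative, here is a minimal Python sketch of an installer that looks at the disk before believing its registry. The registry format (a plain mapping from product name to install directory), the function names and the wording of the messages are all assumptions made up for this example; nothing here reflects how Sun's installer actually stores its data.

```python
import os


def find_stale_entries(registry):
    """Return the names of registered products whose install directory is gone.

    `registry` is a hypothetical mapping of product name -> install directory.
    """
    return sorted(
        name for name, path in registry.items() if not os.path.isdir(path)
    )


def installer_message(registry):
    """Choose what to tell the user based on the disk, not on the registry alone."""
    if not registry:
        return "Nothing installed. Select a product and click Next."
    stale = find_stale_entries(registry)
    if stale:
        # The registry and the disk disagree: say so, and offer a way forward.
        return (
            "The registry lists these products, but their files are missing: "
            + ", ".join(stale)
            + ". Reinstall them?"
        )
    return "Compatible version already installed."
```

With a registry entry pointing at a directory that has been deleted, installer_message() reports the mismatch and offers a reinstall instead of insisting that everything is already in place.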

Why pick on Sun? They were just the next in the long and depressing list of software manufacturers who let software out the door that shouldn't have left the developer's workstation. Alas, they won't be the last. :(


Comments

Ahhh yes, Sun Directory Server. Had a run-in with it about two months ago, right as I started my new job. I already started laughing as I read the name of the software.... gaze upon this:

We were lucky enough to have a Solaris box with the server pre-installed, and were helpfully pointed to some ldif files by a colleague, that we just "upload and it works". Fine. Sorted.

After a quick look at the ldif it seemed to make sense, and we ran the command line to import the ldif into the directory. (Which I don't remember at this moment), and it came back with no errors, no segfaults, nothing. Clearly success then!

Then we tried to do stuff and realised that the ldif had in fact been broken (first wrong assumption: people that have worked at a place for years know better than the office newbie). But not only that, Sun's tool had assumed the LDIF would be peachy and had performed no sort of error checking whatsoever. Which is weird. Because that was something I'd come to expect from even free tools like, say, OpenLDAP.

So now, the broken ldif had been, by some miracle, imported, in all its brokenness, into the directory.

Okay, so we simply remove the root node of the entry and try again.... oh wait. The server refuses to nuke it, because it's broken. In fact, we can't change anything on the tree, because, as you point out, the software is trying to be helpful, and refuses to operate on broken entries. Which is strange, because it had no problem importing a broken file to begin with. Oh dear...

So we try to uninstall the server. More chaos ensues.... because, you guessed it.... the entire install is now FUBAR. And we try to find EVERYTHING to do with the server on the box, and delete it. I think you can guess what happened at this stage... We can't find everything, and we can't re-install.

Great. Format box, Start over.

Good thing this wasn't a production server, but merely a testing environment. I shudder at the thought.
