Part 2 of a serialized history
Posted 09/13/2007
So there I am. It's 1999. I have been working at The Department of Human Services for about a year.
My unit has been given the task of converting a group of about 40 policy manuals to online format - but we still have to keep printing and distributing them. This involves mailing out only the pages that have changed - with instructions such as "replace pages 8-10 with these pages..."
Also, we are supposed to convert all the agency forms to MS Word™ (they are all in Wordperfect 5.1) and make them available online. A style for the manuals (hereafter referred to as 'handbooks') has been designed by committee, and those handbooks will be deployed as html files. I have proposed generating the html from Xml - and that proposal has been accepted. I have a rough idea how I'm going to pull that off (see previous post), but no one has any idea how to get the forms online.
One idea was to make all the forms in MS Outlook™. I'm not sure how this counted as 'online'. Some contractor in I.T. had read that you could make forms in Outlook™ - so they pushed this idea.
I ended up making one form this way. The horrid "Purchase Request" form. With a total of 450 possible data fields, it was the form you needed to fill out to purchase anything. So it needed to be routed, and forwarded and carbon-copied just like an email. It almost made sense to use Outlook. It also had to integrate with an ancient accounting system that went back to the 1800s. That was a terrible experience, let me tell you. The sad part is it was used for years and years. By everyone.
Believe it or not there are over 1200 of these forms. Some of them are actually just letters, some of them are even envelopes - but most of them are the traditional forms you think of when you think of government forms - like the 1040-EZ. So it's not just a matter of setting up a few pages with a few forms. It has to be a system.
This is all complicated by the fact that handbooks (those things in Xml) in fact contain forms. And not in a straightforward way. You might think HandbookA has forms 1 and 2, HandbookB has forms 2 and 3 and so forth. But any form could be in any handbook.
When it came time to revise forms and distribute them, there was an intricate set of dependencies to trace out. And a revision to a form counted as a new version of a handbook. It got convoluted very quickly.
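To make that dependency rule concrete, here is a minimal sketch in Java (the language this story eventually ends up in). It is not the system described here: the class name, method names and starting version number are all invented for illustration. It only shows the rule itself, that revising one form bumps the version of every handbook containing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: names and structure are illustrative only.
class HandbookInventory {
    // form id -> names of the handbooks that contain that form
    private final Map<String, Set<String>> handbooksByForm = new HashMap<>();
    // handbook name -> current version number (assumed to start at 1)
    private final Map<String, Integer> versions = new HashMap<>();

    // Record that a handbook contains a form; any form can be in any handbook.
    void include(String handbook, String form) {
        versions.putIfAbsent(handbook, 1);
        handbooksByForm.computeIfAbsent(form, k -> new TreeSet<>()).add(handbook);
    }

    // Revising a form counts as a new version of every handbook containing it.
    Set<String> reviseForm(String form) {
        Set<String> affected = handbooksByForm.getOrDefault(form, new TreeSet<>());
        for (String handbook : affected) {
            versions.merge(handbook, 1, Integer::sum);
        }
        return affected;
    }

    int versionOf(String handbook) {
        return versions.getOrDefault(handbook, 0);
    }

    public static void main(String[] args) {
        HandbookInventory inv = new HandbookInventory();
        inv.include("HandbookA", "Form 1");
        inv.include("HandbookA", "Form 2");
        inv.include("HandbookB", "Form 2");
        inv.include("HandbookB", "Form 3");

        System.out.println(inv.reviseForm("Form 2"));   // prints [HandbookA, HandbookB]
        System.out.println(inv.versionOf("HandbookA")); // prints 2
    }
}
```

With 1200 forms and 40 handbooks, the point of keeping this mapping in one place is that nobody has to remember which handbooks a form revision touches.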
So, you have a software budget of $0 and an I.T. department that is too busy with Y2K to help you out. What do you do?
Since I am the technical lead for the project - I have to figure that out. It's not exactly what I was hired to do - but that's okay, because I was hired to make web pages. And that's easy.
Look at the Current Procedure
First step - is it possible to keep doing things the way they are done now? Look at the procedure in place, find the problems.
The procedure at this point was this:
- Someone requests a revision to a form - by filling out another form
- The requested form to be revised is worked on by one of the 2 people in the forms unit
- The "Forms coordinator" updates her master Wordperfect file list of forms when the work is done
- If the form is in a handbook a new version number is created for the handbook
- Someone enters that new version number into another Wordperfect file that has a list of all the handbooks versions
Then every week a 'weekly report' is given to the manager that is a running list of all the jobs you have ever worked on that have not yet been closed out. It doesn't matter if you worked on it that particular week. It still shows up on the 'weekly report'.
Sounds like a good system right?
Well, not really. It worked fine when everything was printed - but the attempt to move everything online was really stressing the system.
The managers had no idea what was going on. They had no way of tracking the conversion process, and they had no way of assessing workloads. How do you know how many forms have been converted? How do you know how many are left to go? How do you accurately and systematically deal with distributing these forms online?
Solution to Problems:
Put it all in a database. So everyone in the unit can view and update the data, and so the managers can get some useful reports. They can gauge how many forms have been converted. They can assign projects based on workload. They can assess what a specific person did in a specific week. They can keep track of the dependencies between handbooks and forms, and the resulting handbook versions. In the meantime I can look into automating things. The Wordperfect file as database approach has no potential for automation.
It seems obvious right? No big deal. But this one idea met with severe resistance from nearly everyone in the unit. Luckily the managers had the vision to see that it would be a good idea. In a large organization like the state, without management behind you, you can accomplish nothing.
First - given any problem - try what you already know. That's the age-old secret, with its good points and bad points. That's why some people get stuck doing the same thing year after year. What I knew was Access and Visual Basic, so that's what I tried.
I made a simple database of forms and handbooks and the dependencies between them, made some data entry screens and distributed it. Simple. Problem solved. Right?
Well, the Access database exhibited inconsistent behavior between computers. Which really irritated me to no end. "I don't know, it works on my computer ..." is something I said a lot. So I abandoned that approach.
Next, I took my own copy of Visual Basic (because the software budget was $0) and tried various things. I decided I wanted something web based - why? Well,
- To make for consistent behavior between machines
- So it would be easy to distribute updates
- Because it's the latest, coolest thing
- We're a web shop
My first approach to this new world of web applications was just to try and make the same kind of Visual Basic application I had written before, except as an ActiveX component.
ActiveX is an old technology, but it was fairly new then. It is an application you can plug in to a web browser, like a Java applet, but without all the security constraints that Java applets have (for instance a Java applet is not physically capable of deleting all the files from your hard drive). It only runs on Internet Explorer - because it's a Microsoft invention.
It seemed like it could be a useful technology to some people, but it was never widely used. Why is a long story. However, every time you open a Word document in Internet Explorer you are using ActiveX technology. So it's ubiquitous even though it's not widely used. If that makes sense.
The problem I had with writing ActiveX components that worked in the web browser was that I found them problematic to update. It was not a technology geared toward experimentation, updated requirements and fast prototyping. Which I was all about, even then. Not so much as a philosophical viewpoint - but because I was a hack and I didn't know what I was doing. But really, the most flexible technology will always win.
Visual Basic Web Application?
The next thing I tried was making web applications in Visual Basic. Visual Basic had added a new 'web application' feature in version 6 that looked promising. You just coded an application like you would a normal application - then when it came time to deploy you registered a *.dll on the server and deployed files.
NOTE: Keep in mind, every one of these represents a failed application. So it's not helping me convince the unit that a database is a worthwhile approach. I'm failing at this point.
Asp to the Rescue
Since I was already making a prototype of the handbooks in asp it made sense to use asp for this new database driven site too. It was experimentation friendly. Easy to distribute and didn't worry the server administrators.
And possibly therein would lie the solution to how to get the forms 'online'. Outlook was not working out. I didn't have a solution to the forms at this point. Maybe they would all populate data to a centralized agency database. Maybe they would be in html. Maybe they would just stay in MS Word™.
So I engrossed myself in this project and spent quite a bit of time conceiving of a project tracking database - with projects, contacts, an inventory of forms and handbooks. And various things the unit needed in a database.
They call that method AJAX now. It's actually an old idea. And it was not that different back then, except it only worked on Internet Explorer. I didn't consider that a problem at the time.
There was only one problem. Everyone hated it. No one wanted to use it. It went on like this for months - the forms coordinator continued updating the Wordperfect list of forms. And if she remembered, she might update the database.
If I detected the data in the Wordperfect file didn't match the database anymore I would ask her about it. This was the typical response:
"Oh, I tried it the other day and it didn't work".
Some Rules of Thumb
So, Rule #1 - if there is something wrong with your application people will not tell you about it. They will just write it off with the phrase "it doesn't work".
But then something magical happened, I added a calendar to the front page. A goofy little thing where you could just add inconsequential things like "Rob is out sick" - or "Rob is on vacation" and it listed all of these "upcoming events" on the front page. I stole the idea from Outlook.
Strange thing is everyone loved it - and soon the database was this indispensable thing - and all woes were forgotten. Not that it was that great - in fact it really sucked. It still didn't work sometimes - and it was a maintenance nightmare.
But that's Rule #2 - All it takes is one cool thing to change someone's entire attitude toward an application.
After 9-12 months of hacking away and refining this thing, I had your typical convoluted asp application much like your typical convoluted php application. Sure I had factored pages out to functions. I had a lot of pages that started like this:
<!-- #include file="library/database.asp" -->
<!-- #include file="library/comboboxes.asp" -->
<!-- #include file="library/lists.asp" -->
<!-- #include file="library/utilities.asp" -->
I had managed to miraculously commandeer a SQL Server database by this time. So I was finally working with a real database - that could deal with concurrent users, that was really fast, and that didn't just randomly fail.
So I migrated the data from Access, and then reworked it; I made sure all data access came through stored procedures for security, and I used triggers extensively. My goal was to get as much business logic as I could in the database. I read all about relational models, normalization etc... All that typical database stuff.
I could have stayed at this level. I could have kept coding this way and with this set of technology forever. Plenty of people do. It works. It's not even that bad really.
But all this time I longed to be able to just write a web application like I used to write a program. I wanted to organize code into libraries. I wanted to be able to create objects and call methods on those objects - just like I would code a Visual Basic application.
Assumptions for me (1998-99)
I had certain assumptions at this point. I didn't even know they were assumptions:
- Programs can only be written using an IDE
- Microsoft is king - so it's a waste of time developing for anything else
- Web programming means a mass of pages littered with half-code and half html
- Databases are where you keep all 'business rules', numbering schemes, text formatting, 'combo-box' lists
- I.T. departments do important mission-critical things with COBOL
- C++ is really, really hard - and if you made one mistake you could possibly destroy your entire computer
Asp sucks - Java to the Rescue
I don't know exactly how I made it from there to Java. Except that I wanted to write a web application like a program - and strangely, Java seemed to have the answer to that. Sort of.
It also seemed to be able to do what C++ promised to do, but that I always lost sight of when I actually tried to use it. With C++ I always got an endless parade of incomprehensible linker errors. I would download code that sounded cool, but would not be able to compile it. I would download code and try to use it as a library, but find out I needed to compile it - and that was always a "cross your fingers..." experience.
But with Java I could component-ize and compartmentalize like crazy - and it always seemed to work. I could download code and actually compile it. I could import and use 3rd party libraries and it actually worked. And I could write web applications, organizing the code in objects and libraries, using all the concepts of object-oriented programming. Much more so than in Visual Basic in fact. So it was the best thing I'd run across.
I hadn't heard much of Python or Ruby or Smalltalk - or a lot of things at the time. So, I wasn't really making a truly informed decision. I was marketed to, and I bought into the marketing.
Post-Java Assumptions (2000-2001)
After I started using Java I was operating on different assumptions:
- Sun invented the concept of the "Virtual Machine". And what a cool idea that was
- There were only 2 programming languages worth bothering with: Java and C++, but everything would eventually be written in Java
- Java had something called "garbage collection" that eliminated the majority of errors found in typical C++ code
- It is anti-establishment to use opensource software, Java is synonymous with opensource, therefore it is anti-establishment to use Java
- Microsoft is synonymous with proprietary, and proprietary is synonymous with bad. Therefore, Microsoft is synonymous with bad
- Standards are good - everything should be standardized
It was around 1999-2000 or so that I started switching everything over to Java, and using Emacs all the time. And using Linux at home. This, in turn, led to my solution for how to get the forms and handbooks online with a $0 budget and no I.T. support. That was the primary goal, remember? All this Asp database SQL Server calendar stuff was secondary to that. Get the forms and handbooks online.
Solution to Primary Goal
So there are always two questions at this point; what do you have to do? and what do you have to work with? I will always remember that scene in the Apollo 13 movie when I think of this. You remember it, I'm sure:
Somebody walks in a small room filled with engineers, dumps replicas of every item aboard the Apollo 13 spacecraft onto a table and tells them to figure out how to combine all the debris into a workable exhaust system:
"We've got to find a way to make this (holding up a square canister)"
"fit into a hole made for this (holding up a round canister)"
"using nothing but that (pointing to the materials on the table)"
That scene sums up the entire field of programming.
So what did I have to work with? Well, I was stuck with Windows servers running IIS (Internet Information Server) and Asp. No one was willing to risk running a Java servlet container at my agency - or god forbid Linux servers with Apache. That would all have to wait until the IBM consultants came to town.
So I was stuck with Asp - and, even though I could use the SQL Server for our own unit's personal project tracking database - no one was willing to shell out the licensing costs to present a database driven application to the entire agency. That was for important I.T. stuff. And that was the realm of I.T. proper. Besides, they couldn't start any project until after this pesky Y2K thing ended.
No Thanks to the I.T. Department
Of course, it took quite a while to find this out. When we first approached the I.T. department with the simple question:
"Can you help us put a database driven web site up?"
The first answer was no answer at all. That's the typical response from most I.T. departments. And if your goal is to make the customer go away - it works 90% of the time.
After some tenacity and belligerence, we forced a meeting with I.T. After that the answer was "we'll look into it, but we're kind of busy now (duh! - Y2k - have you heard of it?). We'll get back with you..." which is actually the last thing we ever heard from them. Period.
It was obviously a very low priority for them. Although I don't understand why; the forms and handbook make up about 95% of the traffic on the servers. You'd think they'd want that kind of business. Go figure.
So given those constraints - forms online ended up meaning links to MS Word™ and Pdf™ files - nothing interactive, nothing about centralized agency data. That's too bad, but any kind of centralized data repository for the agency is still 10 years away. And it certainly won't be a system written by employees of the agency. It will be pieced together by outside consultants and will cost millions of dollars. It's just a prediction. I could be wrong.
Given all that, what I devised was basically a daily on-demand Xml data dump. Whereby the server, at any given time, contained a bunch of xml files that, pieced together, had all the data from our database necessary to run the site. Then I could use Xsl to render the Html.
Crude, but effective.
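For anyone who hasn't seen this kind of pipeline, here is a minimal sketch in Java of the rendering half: one dumped Xml file run through an Xsl stylesheet to get Html. It is not the original code. The element names, attribute names, file names and the second form title are all made up for illustration; only the standard javax.xml.transform API is real.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: renders a dumped Xml file to Html with an Xsl stylesheet.
class FormListRenderer {
    // Invented shape for one dumped Xml file (names are assumptions).
    static final String FORMS_XML =
        "<forms>"
      + "<form number=\"100\" title=\"Purchase Request\" file=\"100.doc\"/>"
      + "<form number=\"101\" title=\"Travel Voucher\" file=\"101.pdf\"/>"
      + "</forms>";

    // Stylesheet that turns the list of forms into a list of links.
    static final String FORMS_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/forms\">"
      + "<ul><xsl:for-each select=\"form\">"
      + "<li><a href=\"{@file}\"><xsl:value-of select=\"@title\"/></a></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the Xml and returns the resulting Html.
    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(FORMS_XML, FORMS_XSL));
    }
}
```

The appeal of this arrangement is that the web server only ever sees static files. Nothing on the public side needs a database license or a database connection.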
This involved running a Java routine - and since nobody wants to run things from the command line - I found myself writing a Swing GUI (Graphical User Interface) for it. And since it was connecting to the database and serializing to Xml I ended up writing an ORM (Object-Relational Mapper) and a generic Xml serializer - because that's what you do in Java - write things that solve possible future problems generically.
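A generic Xml serializer sounds grander than it is. Here is a minimal sketch of the idea, assuming each database row has already been read into a plain map of column name to value (a real version would fill those maps from a JDBC ResultSet). The class and method names are hypothetical and this is not the serializer described above; it just shows one way to write any row out as an element without knowing the table in advance.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: serializes arbitrary rows to Xml, one element per row.
class RowSerializer {
    // Each row becomes a <rowName> element; each non-null column becomes an attribute.
    static String toXml(String rootName, String rowName, List<Map<String, Object>> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(rootName);
            for (Map<String, Object> row : rows) {
                writer.writeStartElement(rowName);
                for (Map.Entry<String, Object> column : row.entrySet()) {
                    // Null columns are simply left out of the element.
                    if (column.getValue() != null) {
                        // writeAttribute escapes &, <, > and " in the value.
                        writer.writeAttribute(column.getKey(), String.valueOf(column.getValue()));
                    }
                }
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialize rows", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Column names are invented; LinkedHashMap keeps them in insertion order.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("number", 100);
        row.put("title", "Purchase Request");
        System.out.println(toXml("forms", "form", List.of(row)));
    }
}
```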
And since there were a lot of things people wanted to use the data in our database for besides just an Xml rendition pushed to web servers - it grew to an application for checking out files, checking in files, verifying files and publishing files. A little mini version-control system.
Note: One thing leads to another in an organic fashion when you're writing automation routines for the people you work with. That's why I think all small units in a large organization should be assigned a programmer to help them do their jobs.
Myriad of Technologies
So now I had a truly schizophrenic system:
- Handbooks are in Xml and are run through an Xsl processor to produce Html and a DSSSL processor to produce Rtf
- Html Files are deployed manually to the servers
- Data is all in SQL Server updated via a monstrous Asp application
- Data is transformed to Xml and pushed to servers via a Java application
- Files are managed and versioned via a Java Swing application
I needed to consolidate things - rethink things, rewrite things. It had gotten to that point. You know that point. It doesn't matter whether you write software or not. Everybody knows about that point when things are too complicated.
1: Switch to MySQL
Getting Java to talk to MS SQL Server was an exercise in pain. Because of that old 'love/hate' relationship between Sun and Microsoft. In fact, let's just call it a 'hate' relationship. And having to deal with a DBA (Database Administrator) was also a pain. The HR department was using the same database as we were - so whenever anything went wrong with their system the database administrator blamed us. She had never worked with databases before so she was operating in pure superstition mode.
She also never understood how to set up permissions or how to enable a TCP/IP connection. In fact there were all sorts of things she didn't understand. This happens a lot. A technically savvy person is put in charge of a database and called a DBA. They don't actually know anything about it - and they are the kind of person that requires training to learn anything, so the whole time they are just as frustrated as everyone else is. "If I could get some training in this maybe I could do my job..." kind of attitude.
So I managed to find a surplus computer (no small feat at a state agency), installed MySQL and moved all the data over to it. Since I was using JDBC I could do the classic 'switch the database out' thing that is always used as a demonstration of the great advantage of using a standardized approach to accessing databases. The only problem is I had made extensive use of stored procedures and triggers - and MySQL didn't have those. So I couldn't just 'flip the switch'.
Stored procedures are just SQL statements - but triggers are a little more complicated. Those are SQL statements that run after a record has been updated, deleted or added.
I ended up building the triggers into my custom ORM - which turned out very useful. I could arbitrarily add function names to an Xml file to be called when something was saved, updated or deleted. And it could be anything. For instance, I could delete files off of an FTP server if a record in the database was deleted. You can't do that with your typical database. I think it took a while for Hibernate and the like to offer that sort of thing. Maybe they still don't. You can always subclass, but I don't know if they offer declarative 'triggers' like that.
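Here is a minimal sketch of what declarative, application-level triggers like that can look like in Java. It is an illustration under assumptions, not the original ORM: the Xml format, the handler names and the class itself are invented, and handlers are looked up in a map by name rather than by whatever mechanism the real code used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of "triggers" that live in the application, not the database.
class TriggerRegistry {
    // Invented config format: which named handler runs for which table and event.
    static final String CONFIG =
        "<triggers>"
      + "<trigger table=\"form\" event=\"delete\" handler=\"removePublishedFile\"/>"
      + "<trigger table=\"form\" event=\"update\" handler=\"bumpHandbookVersion\"/>"
      + "</triggers>";

    private final Map<String, Consumer<Map<String, Object>>> handlers = new HashMap<>();
    private final Map<String, List<String>> handlerNamesByKey = new HashMap<>();

    // Reads the Xml config and remembers which handler names belong to each table/event.
    TriggerRegistry(String configXml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("trigger");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element trigger = (Element) nodes.item(i);
                String key = trigger.getAttribute("table") + ":" + trigger.getAttribute("event");
                handlerNamesByKey.computeIfAbsent(key, k -> new ArrayList<>())
                    .add(trigger.getAttribute("handler"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad trigger config", e);
        }
    }

    // Binds a handler name used in the config to actual code.
    void register(String name, Consumer<Map<String, Object>> handler) {
        handlers.put(name, handler);
    }

    // The ORM would call this after it saves, updates or deletes a record.
    void fire(String table, String event, Map<String, Object> record) {
        for (String name : handlerNamesByKey.getOrDefault(table + ":" + event, List.of())) {
            Consumer<Map<String, Object>> handler = handlers.get(name);
            if (handler != null) {
                handler.accept(record); // names with no registered code are skipped
            }
        }
    }

    public static void main(String[] args) {
        TriggerRegistry registry = new TriggerRegistry(CONFIG);
        // A real handler might delete the file from an FTP server; this one only prints.
        registry.register("removePublishedFile",
            record -> System.out.println("would remove " + record.get("file")));
        registry.fire("form", "delete", Map.of("file", "100.doc"));
    }
}
```

Because the handler is ordinary Java, it can do things a SQL trigger can't reach, which is the whole attraction.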
This all logically leads to a 'pure Java' system. You can see it coming from a mile away.
2: Integrate with Ant
Ant came along - and I figured out if I took that Xml data I was sending to the servers and ran it through Xsl - I could generate a build file that had all the instructions I needed for generation, deployment or anything else. So I gave every item in our inventory an index.xml file and then used that to generate instructions for Ant.
That made it easy to run configure from anywhere to generate various build instructions for any particular project. Since it was all Java I could subsume some of the file routines from the Swing GUI as custom Ant tasks.
I did not foresee any problems with this approach. At the time Ant seemed great. It really was better than ... It wasn't until later that I really came to see build files as little programs, and saw that xml - even xml generated by Xsl - was not the right way to go about composing a syntax for these kinds of things.
In fact I've come to believe that every time anybody writes code they will eventually want all the features of code organization that are available in most programming languages, such as imports and modules. That includes SQL, Html, Xsl and CSS. Which is one of many problems with those languages - but I'll save that rant for another post.
Now that I had an entire system in Java that understood the database, I needed to be able to multi-purpose that code. I had written this crude ORM. It was simple conceptually, but it worked. It was the kind of thing written by thousands of people before me until Hibernate came along.
The only system dependencies were the JDBC driver, an Xml parser, a connection pooling component, and the JDBC 2 extensions. 4 jar files. 4 dependencies. Looking back on it, what a beautiful thing that was. So simple. Later I would build a system with 15,000 fewer SLOC (source lines of code), but 56 more jar files.
Once all this Java structure was in place - I could then use JSP to hook it all up as a web interface. With some straggling Swing components for file routines that just don't work well in a browser. It was almost like writing a program that ran as a web application. Way more so than an Asp application. It was a perfect, complete system. Or so I thought.
"Pure Java" System complete
All in all this took a few years to really perfect. So it's 2002 or so. I have a system running entirely on Java. I had accomplished my mission. It was a 40,000 SLOC monstrosity (and that's not counting the JSP). An under-documented, difficult to modify code base. Yet I had huge plans for it. I was going to open-source it. I was going to call it "Katerpillar" because it was a 'many-legged' creature. Very similar to calling it 'mish-mash', or 'piece o'junk', but with better connotations.
I was also writing my own text editor - because I thought the world needed a Java version of Emacs. That was going to be called "Koffee". Get it?
None of that happened, which is good - because most of what I was writing then was crap.
In the meantime I wrote thousands and thousands of lines of code in a forgotten language called DSSSL, which was a derivative of Scheme. This was a largely un-documented language but somehow I muddled through it - and even wrote some extensions in C++ for the Jade RTF processor. I became all too familiar with the RTF specs. And learned how to use Scheme and C++ to produce the RTF code I needed for page specific revisions and revision side marks and 2 line headers and footers. All the things those $100,000 systems didn't do.
Although I considered this a dead technology and a complete waste of time in 2001, I later viewed this as the most advanced programming I did during that period of time - and all the Java code seemed, in comparison, juvenile and ill-conceived.
To be continued ...