Poor Man's Multi-Table Inheritance
A few tricks with Ruby and Meta-programming
Posted on 04/18/2008
A while ago, I was given the task of converting a working Java web application to Ruby on Rails. The word on the street was that Ruby on Rails was great for a 'green' project, i.e. starting from scratch - but not so good for a legacy application.
However, since the database I was using already had an auto-increment primary-key column 'id' for every table, I didn't run into that many problems. There was, however, one bit of functionality I puzzled over for a while.
The application was a project tracking application: projects with attached assignments, and each assignment was associated with a document of some sort. The document types weren't really related to each other all that much, except for a few common attributes and the fact that they could be assigned. To protect the names of the innocent, I will pretend the types of items were articles, books and blog posts. If you consider those, they don't have a lot in common. No article or blog post needs an isbn, and no book needs a date posted field. They are all pretty distinct conceptually; the commonality is that each is something an editor might need to work on - i.e. they are all coherent bodies of text.
I looked at the existing Java code. The application had classes Article, Book and Post all derived from Document. But there was no 'documents' table - only articles, books and posts.
Technically this is Multi-table inheritance since each type of object gets its own table. This is not something supported by Rails. Rails supports the idea of Single Table Inheritance (STI): put every record in one table. This often works out very well because it is efficient and simple.
However, I didn't want to force all these records into one table, for a variety of reasons. One is that I wanted to make sure no Book was created without an isbn - but if I put them all in one table, the isbn column would have to allow NULL (for articles, for instance). Some would argue this is an application-level concern, but I think if you can guarantee data validity at the database level - and bolster that reliability on the application side - you are much better off than relying on the application alone.
Another reason is that I've found, when rewriting an application in Rails, the longer I can push forward keeping the original data structure intact the better. That way I can run both applications at the same time as I figure out the functionality of the first. In the ideal world, the functionality of the original app would be fully detailed in a spec somewhere. This project, unfortunately, was deeply rooted in the actual world.
The currently working Java application was using an ORM, just like Rails does. The ORM for this project was something from the Apache group called the ObjectRelationalBridge or OJB for short. I looked into the code to see how it was achieving this multi-table inheritance. Basically it was using a simple table to keep track of what the next unique id should be for a given 'type'. Some databases support sequences that do this, but not MySQL 3.
The name of the table was sequences and it looked roughly like this:
| id | type_name | value |
|---|---|---|
| 1 | document | 5825 |
Whenever it was time to create a new Article, Book or Post, OJB went to this table looking for the 'value' in the row with the 'type_name' = 'document', incremented that value, and used the result as the id in the table corresponding to the object. So the next Article would get an id of 5826 which would be stored in the id column of articles, the next Book would get an id of 5827 in the id column of books and so on. Pretty simple.
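The shared counter described above can be sketched in plain Ruby, without a database. This is only an illustration: SequenceTable and the tables hash are hypothetical in-memory stand-ins, not part of OJB or Rails, and the titles are placeholder data.

```ruby
# Hypothetical in-memory stand-in for the 'sequences' table.
class SequenceTable
  def initialize(rows)
    @rows = rows # maps type_name to the last value handed out
  end

  # Increment the stored value for type_name and return the new id.
  def next_id(type_name)
    @rows[type_name] = @rows.fetch(type_name) + 1
  end
end

sequences = SequenceTable.new({ "document" => 5825 })

# Hypothetical stand-ins for the articles and books tables.
tables = { "articles" => [], "books" => [] }

# Both kinds of record draw their id from the same 'document' counter,
# so an id is never reused across the two tables.
tables["articles"] << { :id => sequences.next_id("document"), :title => "An article" }
tables["books"]    << { :id => sequences.next_id("document"), :title => "A book" }

article_id = tables["articles"].first[:id]
book_id    = tables["books"].first[:id]
puts "article id: #{article_id}, book id: #{book_id}"
```

Because every document type shares one counter, a document_id on its own never matches rows in two different tables.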
The other tables were structured something along these lines (this is a simplification):
projects
- id
- subject
- start_date
- end_date
- status
assignments
- id
- document_id
- document_type
- start_date
- end_date
- status
- project_id
articles
- id
- title
- author_id
- published
books
- id
- title
- author_id
- isbn
posts
- id
- title
- author_id
- slug
So how do I go about duplicating the functionality in Rails? Ideally, I'd like not to have to normalize all data into one big table, and ideally, I'd like to do something that doesn't mess with the internals of Rails, simply using the conventions that are already there. Over the years, I've found meddling with the internals of a framework can come back to bite you. Call me paranoid.
First Try
So, working in the most basic Rails idioms, I start off with something like this:
class Project < ActiveRecord::Base
  has_many :assignments
end

class Sequence < ActiveRecord::Base
end

class Article < ActiveRecord::Base
  before_create :generate_key

  # take the next id from the shared 'document' row in sequences
  def generate_key
    type_name = "document"
    key = Sequence.find(:first, :conditions => [ "type_name = ?", type_name])
    new_id = key.value + 1
    key.value = new_id
    key.save
    self.id = new_id
  end
end

class Book < ActiveRecord::Base
  before_create :generate_key

  def generate_key
    type_name = "document"
    key = Sequence.find(:first, :conditions => [ "type_name = ?", type_name])
    new_id = key.value + 1
    key.value = new_id
    key.save
    self.id = new_id
  end
end

class Post < ActiveRecord::Base
  before_create :generate_key

  def generate_key
    type_name = "document"
    key = Sequence.find(:first, :conditions => [ "type_name = ?", type_name])
    new_id = key.value + 1
    key.value = new_id
    key.save
    self.id = new_id
  end
end
That takes care of those Document objects getting their correct id. Then I just have to make the Assignment reference them. This will work fine as a belongs_to since there can only be one match per row.
class Assignment < ActiveRecord::Base
  belongs_to :project
  belongs_to :article, :foreign_key => 'document_id'
  belongs_to :book, :foreign_key => 'document_id'
  belongs_to :post, :foreign_key => 'document_id'
end
That's all fine and good, but I wanted to clean up my code for 3 reasons:
- I've had to copy and paste the generate_key and before_create :generate_key code 3 times already. Not very DRY.
- There is nothing in the code itself to indicate that there is a relationship between a Book, an Article and a Post.
- I have to type in belongs_to for each type of object. I figure there must be a way to make this more succinct.
Second Try
First order of business: fix all those redundant generate_key methods. Well, how do you go about adding both a class method call (before_create) and an instance method to a class? Answer: a Module.
The Rails library likes to use the base.extend(SomeModule) convention by defining ClassMethods and InstanceMethods submodules. This works great and is a good convention to follow. If there are not many instance methods, though, I've found calling module_eval within ClassMethods will work just as well (see below).
module GeneratedKey
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def has_generated_key(object_name)
      # add the before_create hook and make sure
      # we can access the object_name in instance
      # methods; the name is interpolated because a
      # string eval cannot see this method's locals
      class_eval <<-CODE
        before_create :generate_key
        cattr_accessor :object_name
        self.object_name = #{object_name.to_s.inspect}
      CODE
      # add the generate_key instance method
      module_eval <<-CODE
        def generate_key
          type_name = object_name.to_s
          key = Sequence.find(:first, :conditions => [ "type_name = ?", type_name])
          new_id = key.value + 1
          key.value = new_id
          key.save
          self.id = new_id
        end
      CODE
    end
  end
end
This means that when I put has_generated_key :some_value at the top of a class definition, the class will call the generate_key method before it creates a record, generating the id value from the sequences table. Now I can set up my classes using inheritance. The only odd thing is that I have to set the table name on each subclass (otherwise it will always look for a 'documents' table).
class Document < ActiveRecord::Base
  include GeneratedKey
  has_generated_key :document
end

class Article < Document
  set_table_name "articles"
end

class Book < Document
  set_table_name "books"
end

class Post < Document
  set_table_name "posts"
end
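The included/extend convention used by GeneratedKey can be tried outside Rails. The following is a minimal plain-Ruby sketch, not the module from this article: GeneratedKeySketch, DocumentSketch, ArticleSketch, BookSketch and the SEQUENCES hash are all hypothetical names. It swaps the string eval for define_method, which closes over local variables.

```ruby
module GeneratedKeySketch
  # Hypothetical stand-in for the sequences table.
  SEQUENCES = { "document" => 5825 }

  # Ruby calls this hook when a class includes the module; extending
  # the class makes has_generated_key available in the class body.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def has_generated_key(object_name)
      type_name = object_name.to_s
      # define_method closes over type_name, so nothing has to be
      # interpolated into a string of code.
      define_method(:generate_key) do
        SEQUENCES[type_name] += 1
        @id = SEQUENCES[type_name]
      end
    end
  end
end

class DocumentSketch
  include GeneratedKeySketch
  has_generated_key :document
  attr_reader :id
end

# Subclasses inherit generate_key and share the 'document' counter.
class ArticleSketch < DocumentSketch; end
class BookSketch < DocumentSketch; end

article = ArticleSketch.new
book = BookSketch.new
article.generate_key
book.generate_key
puts "article #{article.id}, book #{book.id}"
```

In a real model the before_create callback would trigger generate_key; here it is called by hand because there is no ActiveRecord lifecycle.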
The only thing left is to clean up that belongs_to :book, belongs_to :article etc. stuff in the Assignment class. Once again, I can just factor it out into a module.
module GenericAssociations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def belongs_to_type(object_name, implementations=[])
      # take the array of implementations and do belongs_to for
      # each one
      implementations.each do |implementation|
        code = <<-CODE
          belongs_to :#{implementation},
            :foreign_key => '#{object_name.to_s}_id'
        CODE
        class_eval code
      end
      # build up an if for each type of object i.e.
      #   if self.document_type == "article"
      #     return self.article
      #   end
      #   if self.document_type == "book"
      #     return self.book
      #   end
      #   etc...
      if_loop = ""
      implementations.each do |implementation|
        if_loop += <<-CODE
          if self.#{object_name.to_s}_type == "#{implementation}"
            return self.#{implementation.to_s}
          end
        CODE
      end
      # add the 'document' method
      code = <<-CODE
        def #{object_name.to_s}
          #{if_loop}
        end
      CODE
      module_eval(code)
    end
  end
end
This creates a method belongs_to_type that is called during class definition, taking the type (as a Symbol) and an Array of the allowable subclasses as parameters. It also adds a method whose name is derived from the type (in this case document) that will loop through all the values and pick out which kind it is based on a *_type field value. Now I can just write this code for my Assignment class:
class Assignment < ActiveRecord::Base
  include GenericAssociations
  belongs_to :project, :foreign_key => 'project_id'
  belongs_to_type :document, [:book, :article, :post]
end
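To see what the generated document method does, here is a plain-Ruby sketch of the same dispatch on a *_type value. AssignmentSketch, SketchBook and SketchArticle are hypothetical stand-ins for the ActiveRecord models, the sample values are placeholders, and send replaces the generated if-chain.

```ruby
# Hypothetical stand-ins for rows in the books and articles tables.
SketchBook    = Struct.new(:title, :isbn)
SketchArticle = Struct.new(:title, :published)

class AssignmentSketch
  DOCUMENT_TYPES = %w[book article post]

  attr_accessor :document_type, :book, :article, :post

  def initialize(document_type, attrs = {})
    @document_type = document_type
    attrs.each { |name, value| send("#{name}=", value) }
  end

  # Same outcome as the generated if-chain: return the association
  # named by document_type, or nil when the type is not allowed.
  def document
    return nil unless DOCUMENT_TYPES.include?(document_type)
    send(document_type)
  end
end

assignments = [
  AssignmentSketch.new("book", :book => SketchBook.new("A Book", "placeholder-isbn")),
  AssignmentSketch.new("article", :article => SketchArticle.new("An Article", true))
]

# Duck typing: both kinds of document respond to title.
titles = assignments.map { |assignment| assignment.document.title }
puts titles.inspect
```

The whitelist check matters in the send version, since document_type comes from stored data and should not be able to name an arbitrary method.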
So the only major surgery I might have had to do on the original database design was to add a document_type field to the assignments table. But that's it and, in my case, there was already a field by that name anyway because it was still necessary for the UI. Now I can write code like this to get the Document of an Assignment (and because of duck-typing - any field I need I can just reference when I need it):
project.assignments.each do |assignment|
  print assignment.document
end
I can still retrieve only books, or only articles:
articles = Article.find(:all)
articles.each do |article|
  print article
end
books = Book.find(:all)
books.each do |book|
  print book
end
And I can still create an Article, Book or Post the same way I would any other object:
article = Article.new(:title => "A title")
article.save
Download files here: multi_table_article.zip
Note: The download includes a Rakefile in which you need to set the db, username and password to match your own. Then run rake. The default task runs some tests using a Ruby version of doctest, something from the Python world I'd like to see used in the Ruby world. The basic idea of doctest is that you can copy the results of an interactive interpreter session (in Ruby's case IRB) into comments in your Ruby code and rerun those transcripts as a form of regression testing. For example:
=begin
#doctest Check that 1 + 1 = 2
>> 1 + 1
=> 2
>> 2 + 3
=> 5
=end
I included a slightly modified version so that I could run the test from Rake. For a more official version see http://code.google.com/p/ruby-roger-useful-functions/wiki/DocTest
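The doctest idea itself fits in a few lines. The sketch below is not the DocTest library linked above, and run_transcript is a hypothetical helper. It assumes each ">>" line holds one expression, immediately followed by a "=>" line holding the expected inspect output.

```ruby
# Replay an IRB-style transcript and collect a message for every
# expression whose result no longer matches the recorded one.
def run_transcript(transcript)
  failures = []
  expression = nil
  transcript.each_line do |line|
    line = line.strip
    if line.start_with?(">>")
      expression = line.sub(">>", "").strip
    elsif line.start_with?("=>") && expression
      expected = line.sub("=>", "").strip
      # IRB prints results with inspect, so compare against that.
      actual = eval(expression, TOPLEVEL_BINDING).inspect
      unless actual == expected
        failures << "#{expression}: expected #{expected}, got #{actual}"
      end
      expression = nil
    end
  end
  failures
end

transcript = <<-DOC
>> 1 + 1
=> 2
>> 2 + 3
=> 5
DOC

failures = run_transcript(transcript)
puts failures.empty? ? "all passed" : failures.join("\n")
```

A real runner would also need to pull transcripts out of comment blocks and handle multi-line expressions, which this sketch leaves out.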