Ephes Blog

Miscellaneous things. Mostly Weeknotes and links I stumbled upon.

Date -

Writing my own blog engine: The database model

, Jochen
 Since I’m writing my own blog software, I’ve thought about how to lay out the models in the database. This is the layout I am currently using:

There's the main table of blogposts having a blog_id foreign key column pointing to the blog a blogpost belongs to. Blogposts are also asscociated with the user which created them in a column named author. I omitted all of the other user <-> model relationships to make the entity relationship diagram more simple.

The relationships between blogposts and media entities like images or videos are more interesting. At the moment I'm using many to many relationships for each media type. Galleries of images are considered as a different media type and have another many to many relationship to images. Probably there will be some more of those relationships like audio being added in the future.

Having a many to many relationship for each media type seems to be tedious. Just getting all of the blogposts including their related media models for the list of recent posts requires now complex sql queries. Is this really necessary? I don't now, but all approaches have their advantages and drawbacks and this seems to be the most general approach, so I'll use it until I know that I really don't need that generality. This might sound a little bit like premature optimization, but this is a fun project so it doesn't have to be efficient.

The approach I used before that was to have one many to many relationship between blogposts and a media model which then had a generic relation to the actual media model. It would have been easier to add new media types: Just add a new audio table and relate blogposts to audio content by using the generic foreign key from the media model table. This is also the approach used by generic tagging applications. They can't know in advance for which models tags will be created, so they have a generic relation that could be used for every model. But in my case there's a finite set of media types. There are not hundreds of possibilities but just five to ten. And using generic relations has some bad disadvantages which are nicely summarized in the article Avoid Django’s GenericForeignKey.

Another approach I've thought about but didn't implement is to have just simple one to many relationships between blogposts and media models. So for example you could have a blogpost with many images but not an image that belongs to more than one blogpost. In theory this is wrong, because it should be completely possible that one image appears in more than one blogpost. But this shouldn't happen often and if it happens, it's enough to duplicate the image row in the database, which isn't a problem. The image itself lives in the file system and we could use a hash of it's own content as a filename to avoid duplicate images in the file system. We would have then multiple image models in the database pointing to the same image in the file system, but would be really that bad? The database queries would get a lot simpler then.

And finally I'm not sure whether it's a good idea to have galleries in the database as their own models. It would also be possible to add a json field to blogpost and write information like which image belongs to which gallery to this json field.

It always suprises me how seemingly easy problems like how to model a blog engine turn out to be not so trivial at all on close examination. It seems that The Schainker Converse to Hoare's Law of Large Problem still holds true:  Inside every small problem is a larger problem struggling to get out.