Ephes Blog

Miscellaneous things. Not sure what to put here, yet.

Django 3.1 Async

, Jochen

With version 3.1, you can finally use asynchronous views, middleware and tests in Django. Support for async database queries will follow later. You don’t have to change anything if you don’t want to use the new async features. All of your existing synchronous code will run without modification in Django 3.1.

Async support for Django has been on its way for quite some time now. Django has included ASGI support since version 3.0, but that brought little benefit for end users. The only thing you could do concurrently was file uploads, since uploads don’t reach the view layer, which was not async capable in Django 3.0.

When might you want to use those new features? When you are building applications that have to deal with a high number of tasks simultaneously. Here are some examples:

  • Chat services like Slack
  • Gateway APIs / Proxy Services
  • Games, especially MMOs like Eve Online
  • Applications using Phoenix LiveView - check out the Phoenix Phrenzy results for additional examples
  • A reactive version of Django Admin where model changes are shown interactively
  • A new API frontend for Django REST framework updating list endpoints interactively as new data comes in
  • All kinds of dashboard applications showing currently active connections or requests per second, updating in real time
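What these use cases have in common is that most of the time is spent waiting. The following is a minimal sketch using only the standard library's asyncio, not Django itself: the names handle_connection, serve_all and run_demo are made up for illustration, and the sleep stands in for a slow client or upstream service. It shows many waits overlapping in a single thread.

```python
import asyncio
import time


async def handle_connection(connection_id: int) -> int:
    # Stand-in for waiting on a slow client, a database or an upstream API.
    await asyncio.sleep(0.1)
    return connection_id


async def serve_all(count: int) -> list:
    # All coroutines wait concurrently in one thread; gather keeps input order.
    return await asyncio.gather(*(handle_connection(i) for i in range(count)))


def run_demo(count: int = 1000):
    start = time.perf_counter()
    results = asyncio.run(serve_all(count))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"handled {len(results)} waits in {elapsed:.2f}s")
```

Handled one after another, 1000 waits of 0.1 seconds would take about 100 seconds. Run concurrently, the total should stay close to the duration of a single wait plus scheduling overhead.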

As Tom Christie explained in his talk Sketching out a Django redesign, held at DjangoCon 2019, the core question is this: Do we want to have to switch languages to support those use cases? And while his Starlette project (recently gaining popularity in combination with the FastAPI framework) allows us to do all this in Python, we might also want to keep using Django.

What to Expect from this Article?

  1. A small example of how to use async views, middleware and tests
  2. Why is async such a big deal anyway?
  3. The gory details of multithreading vs async, GIL and other oddities

Estimated read time: 25 minutes
There's also a podcast episode elaborating a little more on this topic (in German).

Scatter plots with density quartiles in Python

, Jochen
The other day I saw an interesting blog post about scatter plots with density quartiles using R.

I liked the idea, but wondered whether a simple kernel density plot would have the same effect. And if not, how difficult would it be to adapt the approach to Python? So I created this little Jupyter notebook:

Writing my own blog engine: The database model

, Jochen
 Since I’m writing my own blog software, I’ve thought about how to lay out the models in the database. This is the layout I am currently using:

There's the main table of blogposts, which has a blog_id foreign key column pointing to the blog a blogpost belongs to. Blogposts are also associated with the user who created them via a column named author. I omitted all of the other user <-> model relationships to keep the entity relationship diagram simple.

The relationships between blogposts and media entities like images or videos are more interesting. At the moment I'm using a many-to-many relationship for each media type. Galleries of images are treated as a separate media type and have another many-to-many relationship to images. More of those relationships, such as one for audio, will probably be added in the future.

Having a many-to-many relationship for each media type seems tedious. Just getting all of the blogposts including their related media models for the list of recent posts now requires complex SQL queries. Is this really necessary? I don't know, but all approaches have their advantages and drawbacks and this seems to be the most general approach, so I'll use it until I know that I really don't need that generality. This might sound a little bit like premature optimization, but this is a fun project so it doesn't have to be efficient.
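To make the cost of this layout more concrete, here is a rough sketch using SQLite from Python's standard library. It is not the actual schema of the blog engine: apart from blog_id and author, all table and column names are assumptions, author is simplified to a text column, and only images and videos are modelled. It shows that every media type needs its own join table and its own branch in the query.

```python
import sqlite3

# Hypothetical schema: one join table per media type.
SCHEMA = """
CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE blogpost (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blog(id),
    author TEXT NOT NULL,  -- simplified; really a relation to a user
    title TEXT NOT NULL
);
CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE video (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE blogpost_image (
    blogpost_id INTEGER NOT NULL REFERENCES blogpost(id),
    image_id INTEGER NOT NULL REFERENCES image(id),
    PRIMARY KEY (blogpost_id, image_id)
);
CREATE TABLE blogpost_video (
    blogpost_id INTEGER NOT NULL REFERENCES blogpost(id),
    video_id INTEGER NOT NULL REFERENCES video(id),
    PRIMARY KEY (blogpost_id, video_id)
);
"""

# One SELECT branch per media type; a new type means a new branch.
POSTS_WITH_MEDIA = """
SELECT p.id, p.title, 'image' AS media_type, i.path
FROM blogpost AS p
JOIN blogpost_image AS bi ON bi.blogpost_id = p.id
JOIN image AS i ON i.id = bi.image_id
UNION ALL
SELECT p.id, p.title, 'video' AS media_type, v.path
FROM blogpost AS p
JOIN blogpost_video AS bv ON bv.blogpost_id = p.id
JOIN video AS v ON v.id = bv.video_id
ORDER BY 1, 3, 4
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO blog (id, title) VALUES (1, 'Demo blog')")
    conn.executemany(
        "INSERT INTO blogpost (id, blog_id, author, title) VALUES (?, ?, ?, ?)",
        [(1, 1, "jochen", "First post"), (2, 1, "jochen", "Second post")],
    )
    conn.execute("INSERT INTO image (id, path) VALUES (1, 'a.png')")
    conn.execute("INSERT INTO video (id, path) VALUES (1, 'talk.mp4')")
    # The same image is attached to both posts.
    conn.executemany(
        "INSERT INTO blogpost_image (blogpost_id, image_id) VALUES (?, ?)",
        [(1, 1), (2, 1)],
    )
    conn.execute("INSERT INTO blogpost_video (blogpost_id, video_id) VALUES (1, 1)")
    return conn


def posts_with_media(conn: sqlite3.Connection) -> list:
    return conn.execute(POSTS_WITH_MEDIA).fetchall()


if __name__ == "__main__":
    for row in posts_with_media(build_demo_db()):
        print(row)
```

Note that this query only returns posts that have at least one media item. Including posts without media would need an additional LEFT JOIN or a separate query, which adds to the complexity.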

The approach I used before that was to have one many-to-many relationship between blogposts and a media model, which then had a generic relation to the actual media model. It would have been easier to add new media types: Just add a new audio table and relate blogposts to audio content by using the generic foreign key from the media model table. This is also the approach used by generic tagging applications. They can't know in advance for which models tags will be created, so they have a generic relation that can be used for every model. But in my case there's a finite set of media types. There are not hundreds of possibilities but just five to ten. And using generic relations has some serious disadvantages, which are nicely summarized in the article Avoid Django’s GenericForeignKey.

Another approach I've thought about but didn't implement is to have just simple one-to-many relationships between blogposts and media models. So for example you could have a blogpost with many images but not an image that belongs to more than one blogpost. In theory this is wrong, because it should be entirely possible for one image to appear in more than one blogpost. But this shouldn't happen often and if it happens, it's enough to duplicate the image row in the database, which isn't a problem. The image itself lives in the file system and we could use a hash of its own content as a filename to avoid duplicate images in the file system. We would then have multiple image models in the database pointing to the same image in the file system, but would that really be so bad? The database queries would get a lot simpler then.
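The content-hash idea could look roughly like the following sketch. The function name store_image, the media_root parameter and the choice of SHA-256 are assumptions made for illustration, not part of the blog engine.

```python
import hashlib
import tempfile
from pathlib import Path


def store_image(data: bytes, media_root: Path, suffix: str = ".png") -> Path:
    # The file name is derived from the content, so identical uploads
    # always map to the same path.
    digest = hashlib.sha256(data).hexdigest()
    target = media_root / f"{digest}{suffix}"
    if not target.exists():
        target.write_bytes(data)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first = store_image(b"same bytes", root)
        second = store_image(b"same bytes", root)
        # Two database rows could now point to this one file.
        print(first == second, len(list(root.iterdir())))
```

Several image rows can then store the same path, while the file itself exists only once on disk.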

And finally, I'm not sure whether it's a good idea to have galleries in the database as their own models. It would also be possible to add a JSON field to blogpost and store information like which image belongs to which gallery in that field.
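As a rough sketch of the JSON variant, assuming a text column named galleries on the blogpost table that maps gallery names to lists of image ids (the column name, the structure and the helper functions are all hypothetical):

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blogpost ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "galleries TEXT NOT NULL DEFAULT '{}')"  # JSON stored as text
    )
    return conn


def create_post(conn: sqlite3.Connection, title: str, galleries: dict) -> int:
    cursor = conn.execute(
        "INSERT INTO blogpost (title, galleries) VALUES (?, ?)",
        (title, json.dumps(galleries)),
    )
    return cursor.lastrowid


def gallery_images(conn: sqlite3.Connection, post_id: int, name: str) -> list:
    row = conn.execute(
        "SELECT galleries FROM blogpost WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        return []
    # Unknown gallery names simply yield an empty list.
    return json.loads(row[0]).get(name, [])


if __name__ == "__main__":
    conn = make_db()
    post_id = create_post(conn, "Holiday", {"beach": [3, 1, 2]})
    print(gallery_images(conn, post_id, "beach"))
```

The trade-off is that the database can no longer check that the referenced image ids exist, so that validation would have to happen in application code.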

It always surprises me how seemingly easy problems, like how to model a blog engine, turn out to be not so trivial after all on close examination. It seems that the Schainker Converse to Hoare's Law of Large Problems still holds true: Inside every small problem is a larger problem struggling to get out.


, Jochen
Probably one of the last days this year we got to sit outside.

Using a CDN

, Jochen
Today I switched from serving media files directly from S3 to CloudFront. The main reason is that S3 only supports HTTP/1.1 while CloudFront also supports HTTP/2, which is much faster because images etc. can be downloaded in parallel.