Ephes Blog

Miscellaneous things. Not sure what to put here, yet.


Weeknotes 2021-06-28

- Jochen
Work on django_fileresponse continues and it's now in a somewhat usable state. Recorded progress in four more streams / videos. We recorded a new podcast episode about DjangoCon Europe 2021. Joined the jazzband organisation to be able to fix projects directly instead of forking them and maybe opening a PR later (after Django releases, for example). After ruling out every other possible source of the strange noises in my guitar sound, I replaced the cables between my guitar and the mixer in my audio setup. It was the cables \o/.

Interesting Articles / Podcast Episodes


Weeknotes 2021-06-21

- Jochen
Worked a little bit on django_fileresponse and started live streaming. At the moment I stream to Twitch and later upload the videos to YouTube, too. The stream is about creating a podcast hosting application as a SaaS product. Being able to serve files directly from the application server is important for a podcast hosting application, so I started building some infrastructure to do this.

For django_fileresponse I used nbdev because I liked the concept and you get a lot of infrastructure (Python package, documentation) for free. But having a package whose name differs from the path you import code from (django_fileresponse vs from fileresponse import x) breaks nbdev_build_docs, so I had to replace this command line utility with a version that patches the settings.ini on the fly. It was probably more effort to adapt nbdev to this than to use a proper Django package template, but now it works :).

Created a Django example project within django_fileresponse and learned a little bit more about magic commands in Jupyter notebooks. Serving files using gunicorn with uvicorn is working now. I had to modify the Django ASGIHandler to make it possible. I'm still not sure what the best way is to swap the default ASGIHandler for a modified version from a project's perspective (being explicit, monkeypatching..).

Serving files asynchronously from MinIO is now also possible, and most of the code that streams data to the client is shared with the class that serves files asynchronously from the filesystem.
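A minimal sketch of how such sharing could look, using only the standard library. This is not the code from django_fileresponse: the class names AsyncFileStreamer, FilesystemStreamer and ObjectStoreStreamer are made up for illustration, and the object store is faked with an in-memory buffer instead of a real MinIO client.

```python
import asyncio
import io


class AsyncFileStreamer:
    """Shared part: turn any async chunk source into a stream of bytes."""

    chunk_size = 4096

    async def read_chunk(self, size):
        # Subclasses decide where the bytes come from.
        raise NotImplementedError

    async def stream(self):
        # Yield chunks until the source is exhausted.
        while True:
            chunk = await self.read_chunk(self.chunk_size)
            if not chunk:
                break
            yield chunk


class FilesystemStreamer(AsyncFileStreamer):
    def __init__(self, fileobj):
        self.fileobj = fileobj

    async def read_chunk(self, size):
        # Blocking read runs in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.fileobj.read, size)


class ObjectStoreStreamer(AsyncFileStreamer):
    """Hypothetical stand-in for an object store; no real client is used."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    async def read_chunk(self, size):
        await asyncio.sleep(0)  # pretend to wait on the network
        return self._buffer.read(size)


async def collect(streamer):
    # Helper for trying it out: join all streamed chunks.
    return b"".join([chunk async for chunk in streamer.stream()])
```

Only read_chunk differs between the two sources; the loop that feeds chunks to the client lives in one place.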

Worked on my streaming setup too and tried to bring some structure into the mess of cables behind my desk. But it didn't improve much. Felt more like an expedition into the cable dungeon. Happy to be back alive. In preparation for the next Python Podcast episode I tried to watch most of the DjangoCon Europe 2021 talks.

Interesting Articles / Podcast Episodes


Django 3.1 Async

- Jochen

With version 3.1, you can finally use asynchronous views, middlewares and tests in Django. Support for async database queries will follow later. You don’t have to change anything if you don’t want to use those new async features. All of your existing synchronous code will run without modification in Django 3.1.

Async support for Django has been on its way for quite some time now. Since version 3.0, support for ASGI has been included. But there was not much benefit for end users. The only thing you could do concurrently was file uploads, since uploads don’t reach the view layer, which was not async capable in Django 3.0.

When might you want to use those new features? When you are building applications that have to deal with a high number of tasks simultaneously. Here are some examples:

  • Chat services like Slack
  • Gateway APIs / Proxy Services
  • Games, especially MMOs like Eve Online
  • Applications using Phoenix LiveView - check out Phoenix Phrenzy results for additional examples
  • A reactive version of Django Admin where model changes are shown interactively
  • A new API frontend for Django REST framework updating list endpoints interactively as new data comes in
  • All kinds of dashboard applications showing currently active connections, requests per second updating in realtime
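
What these use cases have in common is that most of the time is spent waiting. A minimal sketch of why that matters, using only asyncio from the standard library (no Django involved; handle_request and handle_many are made-up names and the 0.1 second delay is an arbitrary stand-in for I/O):

```python
import asyncio
import time


async def handle_request(request_id, delay=0.1):
    # Simulates waiting on I/O (a database, another service, a slow client).
    await asyncio.sleep(delay)
    return request_id


async def handle_many(count):
    # All handlers wait concurrently on one thread instead of one after another.
    return await asyncio.gather(*(handle_request(i) for i in range(count)))


start = time.perf_counter()
results = asyncio.run(handle_many(100))
elapsed = time.perf_counter() - start
print(f"handled {len(results)} requests in {elapsed:.2f}s")
```

Handled one after another, 100 such requests would take about ten seconds; here the waits overlap, so the total should stay close to a single delay.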

As Tom Christie explained in his talk Sketching out a Django redesign, held at DjangoCon 2019, the core question is this: Do we want to have to switch languages to support those use cases? And while his Starlette project (which has recently gained popularity in combination with the FastAPI framework) allows us to do all this in Python, we might also want to keep using Django.

What to Expect from this Article?

  1. A small example on how to use async views, middlewares and tests
  2. Why is async such a big deal anyway?
  3. The gory details of multithreading vs async, GIL and other oddities

Estimated read time: 25 minutes
There's also a podcast episode elaborating a little bit more on this topic (it's in German).


Scatter plots with density quartiles with python

- Jochen
The other day I saw an interesting blog post about scatter plots with density quartiles using R.

I liked the idea, but wondered whether a simple kernel density plot would have the same effect. And if not, how difficult it would be to adapt the approach to Python. So I created this little Jupyter notebook:
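
The basic idea can be sketched without any plotting library. This is not the notebook's code: it assumes that "density quartiles" means grouping points by the quartile of their estimated density, and the function names, the bandwidth of 0.5 and the random sample data are all made up for illustration.

```python
import math
import random
import statistics


def gaussian_kde_2d(points, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every input point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for px, py in points:
        total = 0.0
        for qx, qy in points:
            dist_sq = (px - qx) ** 2 + (py - qy) ** 2
            total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


def density_quartiles(densities):
    """Label each density with its quartile: 0 = sparsest, 3 = densest."""
    cuts = statistics.quantiles(densities, n=4)
    return [sum(d > cut for cut in cuts) for d in densities]


random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
densities = gaussian_kde_2d(points, bandwidth=0.5)
labels = density_quartiles(densities)
```

The labels could then be used as the color of each point in a scatter plot, which is roughly what a plain kernel density plot would not give you per point.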

Writing my own blog engine: The database model

- Jochen
Since I’m writing my own blog software, I’ve thought about how to lay out the models in the database. This is the layout I am currently using:

There's the main table of blogposts with a blog_id foreign key column pointing to the blog a blogpost belongs to. Blogposts are also associated with the user who created them in a column named author. I omitted all of the other user <-> model relationships to keep the entity relationship diagram simple.

The relationships between blogposts and media entities like images or videos are more interesting. At the moment I'm using a many-to-many relationship for each media type. Galleries of images are considered a separate media type and have another many-to-many relationship to images. More of those relationships, for audio for example, will probably be added in the future.

Having a many-to-many relationship for each media type seems tedious. Just getting all of the blogposts including their related media models for the list of recent posts now requires complex SQL queries. Is this really necessary? I don't know, but all approaches have their advantages and drawbacks and this seems to be the most general approach, so I'll use it until I know that I really don't need that generality. This might sound a little bit like premature optimization, but this is a fun project so it doesn't have to be efficient.
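
A minimal sketch of this layout and the kind of query it leads to, using sqlite3 from the standard library. Only blog_id and author are names from the actual layout; every other table and column name (users, blogpost_image, path and so on) and the sample rows are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE blogpost (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blog(id),
    author INTEGER NOT NULL REFERENCES users(id),
    title TEXT NOT NULL
);
CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE video (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE gallery (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One join table per media type, plus one between galleries and images.
CREATE TABLE blogpost_image (
    blogpost_id INTEGER NOT NULL REFERENCES blogpost(id),
    image_id INTEGER NOT NULL REFERENCES image(id),
    PRIMARY KEY (blogpost_id, image_id)
);
CREATE TABLE blogpost_video (
    blogpost_id INTEGER NOT NULL REFERENCES blogpost(id),
    video_id INTEGER NOT NULL REFERENCES video(id),
    PRIMARY KEY (blogpost_id, video_id)
);
CREATE TABLE gallery_image (
    gallery_id INTEGER NOT NULL REFERENCES gallery(id),
    image_id INTEGER NOT NULL REFERENCES image(id),
    PRIMARY KEY (gallery_id, image_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO blog VALUES (1, 'Example blog')")
conn.execute("INSERT INTO users VALUES (1, 'Example author')")
conn.executemany(
    "INSERT INTO blogpost (id, blog_id, author, title) VALUES (?, 1, 1, ?)",
    [(1, "First"), (2, "Second")],
)
conn.execute("INSERT INTO image VALUES (1, 'cat.jpg')")
conn.execute("INSERT INTO video VALUES (1, 'talk.mp4')")
# The same image is used by both posts, which is what many-to-many allows.
conn.executemany("INSERT INTO blogpost_image VALUES (?, ?)", [(1, 1), (2, 1)])
conn.execute("INSERT INTO blogpost_video VALUES (1, 1)")

# Listing posts with their media needs two joins per media type.
rows = conn.execute("""
    SELECT blogpost.title, image.path, video.path
    FROM blogpost
    LEFT JOIN blogpost_image ON blogpost_image.blogpost_id = blogpost.id
    LEFT JOIN image ON image.id = blogpost_image.image_id
    LEFT JOIN blogpost_video ON blogpost_video.blogpost_id = blogpost.id
    LEFT JOIN video ON video.id = blogpost_video.video_id
    ORDER BY blogpost.id
""").fetchall()
```

Every additional media type adds another join table and another pair of joins to this query.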

The approach I used before that was to have one many-to-many relationship between blogposts and a media model which then had a generic relation to the actual media model. It would have been easier to add new media types: Just add a new audio table and relate blogposts to audio content by using the generic foreign key from the media model table. This is also the approach used by generic tagging applications. They can't know in advance for which models tags will be created, so they have a generic relation that can be used for every model. But in my case there's a finite set of media types. There are not hundreds of possibilities but just five to ten. And using generic relations has some serious disadvantages, which are nicely summarized in the article Avoid Django’s GenericForeignKey.
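
A minimal sketch of the generic variant, again with sqlite3 and made-up names (media, content_type, object_id, resolve). It illustrates one well-known drawback of generic relations, namely that the database cannot check the target of the reference; whether the linked article lists exactly this point is not claimed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE blogpost (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE audio (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
-- Generic relation: the target table is stored as data, not as a constraint.
CREATE TABLE media (
    id INTEGER PRIMARY KEY,
    content_type TEXT NOT NULL,
    object_id INTEGER NOT NULL
);
CREATE TABLE blogpost_media (
    blogpost_id INTEGER NOT NULL REFERENCES blogpost(id),
    media_id INTEGER NOT NULL REFERENCES media(id)
);
""")

conn.execute("INSERT INTO blogpost VALUES (1, 'First')")
conn.execute("INSERT INTO audio VALUES (1, 'episode.mp3')")
conn.execute("INSERT INTO media VALUES (1, 'audio', 1)")
# Nothing stops a reference to a row that does not exist.
conn.execute("INSERT INTO media VALUES (2, 'audio', 999)")
conn.execute("INSERT INTO blogpost_media VALUES (1, 1)")

ALLOWED_TYPES = {"image", "audio"}


def resolve(conn, content_type, object_id):
    # The table name comes from a fixed set, never from user input.
    if content_type not in ALLOWED_TYPES:
        raise ValueError(content_type)
    query = f"SELECT path FROM {content_type} WHERE id = ?"
    return conn.execute(query, (object_id,)).fetchone()
```

Adding a new media type only needs a new table and a new content_type value, but resolving a media row takes an extra query per type and a dangling object_id is only noticed at read time.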

Another approach I've thought about but didn't implement is to have just simple one-to-many relationships between blogposts and media models. So, for example, you could have a blogpost with many images but not an image that belongs to more than one blogpost. In theory this is wrong, because it should be entirely possible for one image to appear in more than one blogpost. But this shouldn't happen often, and if it happens, it's enough to duplicate the image row in the database, which isn't a problem. The image itself lives in the file system and we could use a hash of its own content as the filename to avoid duplicate images in the file system. We would then have multiple image models in the database pointing to the same image in the file system, but would that really be so bad? The database queries would get a lot simpler.
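
A minimal sketch of the content hash idea. The function name store_image, the choice of SHA-256 and the .jpg suffix are assumptions for illustration, not something the blog engine is known to use.

```python
import hashlib
import tempfile
from pathlib import Path


def store_image(content: bytes, media_root: Path, suffix: str = ".jpg") -> Path:
    """Write content to a file named after its own hash and return the path."""
    digest = hashlib.sha256(content).hexdigest()
    target = media_root / f"{digest}{suffix}"
    if not target.exists():
        # Identical content maps to the same name, so it is stored only once.
        target.write_bytes(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store_image(b"same bytes", root)
    second = store_image(b"same bytes", root)
    other = store_image(b"different bytes", root)
    stored_files = sorted(p.name for p in root.iterdir())
```

Two image rows in the database can then carry the same path without the file existing twice on disk.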

And finally, I'm not sure whether it's a good idea to have galleries in the database as their own models. It would also be possible to add a JSON field to blogpost and store information like which image belongs to which gallery in this JSON field.
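
A minimal sketch of that alternative, storing the JSON as text in sqlite3. The column name galleries and the structure with title and image_ids are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blogpost (id INTEGER PRIMARY KEY, title TEXT, galleries TEXT)"
)

# Gallery membership and image order live in the post itself.
galleries = [{"title": "Beach", "image_ids": [3, 1, 2]}]
conn.execute(
    "INSERT INTO blogpost VALUES (1, 'Holiday', ?)", (json.dumps(galleries),)
)

(raw,) = conn.execute("SELECT galleries FROM blogpost WHERE id = 1").fetchone()
loaded = json.loads(raw)
```

This saves the gallery tables and their joins, at the price that the database no longer checks whether the referenced image ids exist.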

It always surprises me how seemingly easy problems, like how to model a blog engine, turn out to be not so trivial at all on close examination. It seems that The Schainker Converse to Hoare's Law of Large Problems still holds true: Inside every small problem is a larger problem struggling to get out.