Ephes Blog

PyCon DE Day Two

April 12, 2022, Jochen

PyCon had about 1.6K visitors which is not too crowded for the bcc, but there are lots of people. It's not easy to recognize someone while most people wear masks most of the time. Maybe wearing t-shirts with a picture the wearers face printed on it would have been helpful, but I didn't thought of that. Instead I wore this t-shirt which is also fitting the mask situation pretty well:

I was a little bit late and had to skip the first session. My first talk was conda-forge: supporting the growth of the volunteer-driven, community-based packaging project. The next one was Unclear Code Hurts by Dario Cannone. Watching bad code from a distance can be a lot of fun. A former collegue described it as watching a car crash show. Just make sure to keep a safe distance and don't end up being responsible for cleaning up the mess.

Then I watched 5 Things we've learned building large APIs with FastAPI by Maarten Huijsmans. The room was packed and it's obvious that fastAPI is a hot topic right now. But it's also still really new and people don't have a lot of experience using it. There was a channel established on the conference discord where people gathered that have a shared interest in fastAPI and we met at a table in person a little bit later. This was a lot of fun and I learned some cool things there. For example how people are dealing with the problem that you cannot get a return value back from global dependencies (just attach it to the request in your dependency).

Another talk I attended was What are data unit tests and why we need them by Theodore Meynard which was really interesting for me, because I didn't know libraries like Great Expectations even existed. Efficient data labelling with weak supervision by Maria Mestre was another talk I listened to. And since all things Django sound interesting to me, I also watched Make the most of Django by Paolo Melchiorre which wasn't about Django itself, but how the Django community is a great place to get started with open source etc. - I have to read the abstracts more careful, I guess. The last talk for the second day was 5 Years, 10 Sprints, A scikit-learn Open Source Journey by
Reshama Shaikh. And then the lightning talks which were also great.

After the conference we picked up some lunch in a small group of people and moved to the c-base because there supposedly was a nix meetup going on, which we didn't find.

Weeknotes 2022-04-11

April 11, 2022, Jochen

Recorded and produced a new podcast episode about microservices. For my podcast hosting project I need some place to store podcast episodes and therefore I finally automated the setup of minio. Let see if it will be fast enough :). Did a bugfix release of kptncook. And started to collect information about deployed services which then could be used by other services doing logging, monitoring or backup.

Things I Learned

If you don't want to run your minio installation publicly this ssh-portforwarding command works (you have to change username and ports): ssh -v -L 10000:127.0.0.1:10000 -L 10001:127.0.0.1:10001 minio@staging.wersdoerfer.de

Books

Fluent Python, 2nd Edition | Just released -> instant buy..

Podcasts

Sam Newman: Monolith to Microservices (InfoQ Podcast) | In preparation for own microservices episode
Folge 99 - Sam Newman - Monolith to Microservices (Software Architektur im Stream) | dito
Sam Newman on Information Hiding, Ubiquitous Language, UI Decomposition and Building Microservices (InfoQ Podcast) | dito
WR1347 Bruttointelligenzprodukt (WRINT - Wissenschaft)
WR1349 Embargo: Modelle, Zölle, Anreize (WRINT - Wirtschaftskunde) | What I like most about this podcast is to get a glance at the economic viewpoint, which is usually really foreign to me. Highlights: Maybe just use tariffs and form a price cartel to rob gas/oil from russia (they have to go down with their own price to be able to sell at market price). Those who scream loudest about oil/gas from russia being not substitutable are those who are most afraid of being substituted.
Intelligente Kraken - Aliens mit neun Gehirnen (Sein und Streit) | Octopus cannot see color with their eyes, but with their arms :€ and the information about color and ground structure does not need to be transferred to the central brain, because each arm has one (bigger than the central one). They build "houses" with an individual aesthetic and get angry if you mess it up.
Machtvoll? - Kunstwerke (Das philosophische Radio) | It's weird, every time I listen to Markus Gabriel talking about abstract topics I'm not familiar with, I'm thinking: Hmm, interesting and eloquent. Everytime I hear him say concrete things about topics I know about (computer stuff), I'm like: Wow, this is complete bullshit. Not sure what to make of this. But there were a lot of intereting definitions of art: From having no intent/purpose (Kant) to showing how things really are (which needs to be intentional, Klee) to art is the definition of art (Kosuth). Kept me entertained at least while doing sports.
Andrew Song of Whisper AI on solving hearing loss with AI (The robot brains)
Föderation Mehr als nur Mastodon (Der INNOQ Podcast) | meh
UKW096 Ukraine: Weltkrieg 2.5 (Unsere kleine Welt) | Really long
Verletzlichkeit macht nahbar - über Vulnerabilität und was wir daraus lernen können (Was denkst du denn?) | Meh
Rhetorik zwischen Wahrheit und Propaganda - das Handwerk der Glaubwürdigkeit (Sein und Streit) | Also meh

PyCon DE Day One

April 11, 2022, Jochen

Didn't expect to attend PyCon DE 2022, but somehow it worked out nevertheless. I arrived in Berlin at sunday evening. It's really good to be able to meet people in person again.

On monday morning there was I long line before the conference venue and I only arrived at 09:30, but I had no trouble to get in on time. Florian Bruhin greeted me from the waiting line, but he confused me with someone else. I should have used the opportunity and ask him about recording an podcast episode about pytest, but I was too perplexed.

Some time ago I tried to join the Python Software Verband e.V, but didn't succeed. Filling out the form locally should work now :).

The first session I attended was Building an ORM from scratch by Patrick Schemitz and Jonathan Oberländer which was great. I had trouble following through because it was so fast paced, but I like to be challanged. Here's the code used in the presentation.

After lunch I listened to the keynote of the first day: Beyond the basics: Contributor experience, diversity and culture in open source projects, which was held online by Melissa Weber Mendonça from brasil. This is a really difficult topic, but there's also a lot of room for improvement.

The talk Python 3.10: Welcome to pattern matching by Laysa Uchoa was informative as well as funny.

Seeing the needle AND the haystack: single-datapoint selection for billion-point datasets was also interesting. Never thought about that it might be hard to visualize a dataset that has many more datapoints than the screen has pixels, but it's kind of obvious to me now :)

The last slot I attended on the first day was the lightning talks. There was a lot of interesting stuff, just to hightlight some:

Typed Settings - Safe and flexible settings with types | it uses attrs / cattrs and it's main purpose is to aggregate settings from different sources like password managers etc
Folium - Python data, leaflet.js maps

But the most valuable experience was of course running into people I haven't seen in a long time and learn about the interesting stuff they have been up to in the meantime.

Weeknotes 2022-04-04

April 4, 2022, Jochen

Applied for Prototype Fund with django-cast, I would love to be able to improve documentation and usabiltiy. Finished a command line client for syncing kptncook recipes with my self-hosted mealie instance. Revisited my old "will_it_saturate" project to compare caddy vs uvicorn speed. Good fun.

Things I Learned

You can have validators run befor assigning values in pydantic models, creating a previously non-existing directory, for example.

Articles

Weeklog of shezi | Lots of interesting stuff as usual.
The Concept of the Ruliad | Uff, maybe I have to read it again..

Talks

Wagtail Space US 2022 | Wagtail conference talks

Youtube

Object-Oriented Programming is Bad | Well, I wouldn't say it's all bad, but there are some points in this talk
Object-Oriented Programming is Good* | Same author, same opinion :)

Twitter

The microphone talk | I remember having this conversation more often than I would have expected
I think the 1420 YouTube channel will make for a nice historical reference a few years from now | Really cool youtube channel. Those things are far more interesting to me than the usual media coverage. Previous generations didn't have access to this kind of content. It's not exactly flying cars, but still pretty cool that this is possible today.

Podcasts

Bits und so #784 (Im Watschnbaumwald) (Bits und so) | Well, there are reasons for limiting the password length that are not completely stupid, but most of the time they are..
(113) So Long (Das Coronavirus-Update) | I will be missing this. Interesting numbers from this episode: infection fatality rate did go down from 1.5% (begin) to 0.11% (now) which is comparable with the flue.. of course there are far more covid infections so it's not the same risk.. for every person dying from covid there's on which dies in hospital from a different cause, but was tested covid positiv
Gesund? - Ehrgeiz (Das philosophische Radio) | Good content, really crappy website :/
Überholt? - die Kultur des Maßes (Das philosophische Radio)
Episode 505: Daniel Stenberg on 25 years with cURL (Software Engineering Radio) | Asked what he would tell aspiring programmers who want to achieve what he did, he replied: "Well, I don't know if I'm able to give advice on how to win the lottery." - pretty wise.
Django the Good Parts - James Bennett (Django Chat) | The part about service layers was really interesting.
Revision 523: Wenig TypeScript und viel Vermischtes (Working Draft) | Learned about those two programming youtube videos above in this episode
Frontend & Feldfrüchte Mit Anselm Hannemann (Wo wir sind ist vorne.) | I cannot believe I would ever say this, but: maybe those episodes a little bit too long..
Wege zum Erfolg - Geduld, die lernbare Superpower (Hörsaal) | Hard to swallow pill
Verhaltensökonomie - Selbstkontrolle macht zufriedener (Hörsaal) | More hard to swallow pills - I do need a drink now
NEWS 13/22 – Redis Stack // Safari Technology Preview 142 // Java 18 // Entwickler:innen-Zufriedenheit // node-ipc.js Malware (programmier.bar) | News from other language communities..
Episode 48: The Simulation Hypothesis (Hotel Bar Sessions) | They talked about roko's basilisk, dammit

Benchmarking nginx vs caddy vs uvicorn for serving static files

April 3, 2022, Jochen

Disclaimer: I have no clue what I'm doing here. If you do: pls halp.

Setup

Last year I did a talk about serving files with Django. For demonstration purposes I wrote a little benchmark tool called "will it saturate". Recently I played around with mealie a little bit and noticed that it uses an additional caddy to serve images for recipes. On discord I asked why and was told it's used because caddy is faster at serving images than uvicorn / starlette. So I wondered how much faster it might be and tried to get my old "will_it_saturate" project to test it.

My base assumption is that there should not be a big difference between different web servers serving static files, because there's not much those web servers do when serving files. They just orchestrate operating system syscalls that do the real work and whether you call those via c (nginx), go (caddy) or python (uvicorn) shouldn't matter that much. Well, turns out that this assumption might be wrong. Which is interesting. The webservers used are: nginx, caddy, uvicorn. I included nginx, because it represents the state of the art in serving static files when configured properly, but I'm not sure whether I managed to do that. Caddy and uvicorn are the two servers I wanted to benchmark against each other.

For the benchmark I created 12.5K files, each containing 100KB of random data (similar to a recipe image in mealie) so that downloading them would saturate a gigabit link for about ten seconds. I know it's possible to saturate a gigabit connection with concurrent file downloads serving those files from uvicorn so I didn't repeat that. The question I'm interested this time is: How much faster than uvicorn is caddy? The main metric is transferred bytes per second. The bigger, the better.

Results

Here's a first result running this notebook:

server	client	elapsed	file_size_h	bytes_per_second_h	arch
nginx/minimal	wrk	0.695651	97.66KB	1.67GB	x86_64
nginx/sendfile	wrk	0.718180	97.66KB	1.62GB	x86_64
caddy	wrk	0.880563	97.66KB	1.32GB	x86_64
fastAPI/uvicorn	wrk	6.153709	97.66KB	193.72MB	x86_64

Ok, well. Seems like caddy is a lot faster than uvicorn, wow. The server is an old intel xeon running linux. The client is wrk opening up 20 connections concurrently (more than a browser usually does). Here are some points that surprised me:

Turning on sendfile didn't make nginx faster. I think this is because there's some hard limit on ssd bandwidth or something like that
Had to use multiple workers with uvicorn. Using a single worker yields only 50MB/s.
Caddy is not much slower than nginx despite nginx is using 4 workers and caddy only one. Probably another hint that there is some hardware bottleneck.

Let's see how the numbers change when I run the script on my macbook air running macOS:

server	client	elapsed	file_size_h	bytes_per_second_h	arch
nginx/sendfile	wrk	0.252434	97.66KB	4.61GB	arm64
nginx/minimal	wrk	0.287506	97.66KB	4.05GB	arm64
caddy	wrk	0.481639	97.66KB	2.42GB	arm64
fastAPI/uvicorn	wrk	4.228977	97.66KB	281.89MB	arm64

Ok, also interesting. My macbook air (M1) does not even get warm running the benchmark. Surprising details:

Using nginx with multiple workers is much faster now (no hardware bottleneck?)
Using sendfile is faster - I didn't know macOS even had a sendfile syscall, weird

Conclusion

There's indeed a big difference between nginx and caddy on one side and uvicorn on the other. And I don't know why, so further research is needed :). The reason I tried to benchmark nginx with and without sendfile was to make sure I'm not measuring some form of kernel level io / zero copy tcp vs having to do all the work in userspace, because uvicorn lacks zerocopy send support atm. The results seem to suggest that it's possible to be very fast without sendfile. Which is good, because nowadays we usually serve files from some kind of object store and not from the file system and therefore it's not possible to benefit from sendfile anyway (please correct me if I'm wrong).

Still, those numbers won't make a big difference in practice, because most machines don't have network links faster than one gigabit. It will become much more relevant when this changes to 10Gb, because than uvicorn will be too slow to saturate it.

The next thing I'll be testing is to preload the files in memory to rule out aiofiles being the culprit for being slow. What I also find really interesting is that I had to use wrk as http client for the benchmarks because the python http clients I tried (httpx, aiohttp) were far too slow. With aiohttp being a little bit faster than httpx. They max out at about 80MB/s. Why is that?

Ephes Blog

Filters / Fulltext-Search

Things I Learned

Books

Articles

Software

Twitter

Podcasts

Things I Learned

Articles

Talks

Youtube

Twitter

Podcasts

Setup

Results

Conclusion