Ephes Blog

Miscellaneous things. Not sure what to put here, yet.


Benchmarking nginx vs caddy vs uvicorn for serving static files

, Jochen
Disclaimer: I have no clue what I'm doing here. If you do: pls halp.


Setup

Last year I gave a talk about serving files with Django. For demonstration purposes I wrote a little benchmark tool called "will it saturate". Recently I played around with mealie a little bit and noticed that it uses an additional caddy to serve images for recipes. On Discord I asked why and was told that caddy is faster at serving images than uvicorn / starlette. So I wondered how much faster it might be and tried to use my old "will_it_saturate" project to test it.

My base assumption is that there should not be a big difference between different web servers serving static files, because there's not much those web servers do when serving files. They just orchestrate operating system syscalls that do the real work, and whether you call those via C (nginx), Go (caddy) or Python (uvicorn) shouldn't matter that much. Well, turns out that this assumption might be wrong. Which is interesting. The web servers used are: nginx, caddy, uvicorn. I included nginx because it represents the state of the art in serving static files when configured properly, but I'm not sure whether I managed to do that. Caddy and uvicorn are the two servers I wanted to benchmark against each other.

For the benchmark I created 12.5K files, each containing 100KB of random data (similar to a recipe image in mealie), so that downloading them would saturate a gigabit link for about ten seconds. I know it's possible to saturate a gigabit connection with concurrent file downloads when serving those files from uvicorn, so I didn't repeat that. The question I'm interested in this time is: How much faster than uvicorn is caddy? The main metric is transferred bytes per second. The bigger, the better.
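As a quick sanity check of that sizing, here is a small sketch of the arithmetic. It assumes that 100KB means 100,000 bytes and that a gigabit link moves 10^9 bits per second with no protocol overhead. Both are simplifications, and the constant names are made up for this example.

```python
FILE_COUNT = 12_500             # "12.5K files"
FILE_SIZE_BYTES = 100_000       # assumption: 100KB = 100,000 bytes
LINK_BITS_PER_SECOND = 10**9    # assumption: ideal gigabit link, no overhead

total_bytes = FILE_COUNT * FILE_SIZE_BYTES
link_bytes_per_second = LINK_BITS_PER_SECOND / 8
seconds_to_transfer = total_bytes / link_bytes_per_second

print(f"{total_bytes / 1e9:.2f} GB total, {seconds_to_transfer:.1f} s at line rate")
```

So the whole data set is 1.25 GB, which is where the "about ten seconds" comes from.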


Results

Here's a first result running this notebook:
 
server           client  elapsed   file_size_h  bytes_per_second_h  arch
nginx/minimal    wrk     0.695651  97.66KB      1.67GB              x86_64
nginx/sendfile   wrk     0.718180  97.66KB      1.62GB              x86_64
caddy            wrk     0.880563  97.66KB      1.32GB              x86_64
fastAPI/uvicorn  wrk     6.153709  97.66KB      193.72MB            x86_64
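The bytes_per_second_h column looks like the total payload divided by the elapsed time, formatted with 1024-based units. That is an inference from the numbers, not something taken from the will_it_saturate code, and the `human` helper below is hypothetical. A minimal sketch that reproduces the column under those assumptions:

```python
TOTAL_BYTES = 12_500 * 100_000  # assumption: 12.5K files of 100,000 bytes each


def human(num_bytes: float) -> str:
    """Format a byte count with 1024-based steps and two decimals."""
    units = ("B", "KB", "MB", "GB")
    for unit in units:
        if num_bytes < 1024 or unit == units[-1]:
            return f"{num_bytes:.2f}{unit}"
        num_bytes /= 1024


# Elapsed seconds copied from the x86_64 results.
elapsed = {
    "nginx/minimal": 0.695651,
    "nginx/sendfile": 0.718180,
    "caddy": 0.880563,
    "fastAPI/uvicorn": 6.153709,
}

throughput = {
    server: human(TOTAL_BYTES / seconds) for server, seconds in elapsed.items()
}
for server, value in throughput.items():
    print(f"{server:16} {value}/s")
```

Read this way, the uvicorn number is roughly 1.6 Gbit/s, so it is still above what a gigabit link can carry.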

Ok, well. Seems like caddy is a lot faster than uvicorn, wow. The server is an old Intel Xeon running Linux. The client is wrk opening 20 concurrent connections (more than a browser usually does). Here are some points that surprised me:
 
  • Turning on sendfile didn't make nginx faster. I think this is because there's some hard limit on SSD bandwidth or something like that
  • Had to use multiple workers with uvicorn. Using a single worker yields only 50MB/s.
  • Caddy is not much slower than nginx even though nginx is using 4 workers and caddy only one. Probably another hint that there is some hardware bottleneck.
Let's see how the numbers change when I run the script on my macbook air running macOS:
 
server           client  elapsed   file_size_h  bytes_per_second_h  arch
nginx/sendfile   wrk     0.252434  97.66KB      4.61GB              arm64
nginx/minimal    wrk     0.287506  97.66KB      4.05GB              arm64
caddy            wrk     0.481639  97.66KB      2.42GB              arm64
fastAPI/uvicorn  wrk     4.228977  97.66KB      281.89MB            arm64

Ok, also interesting. My macbook air (M1) does not even get warm running the benchmark. Surprising details:
 
  • Using nginx with multiple workers is much faster now (no hardware bottleneck?)
  • Using sendfile is faster - I didn't know macOS even had a sendfile syscall, weird


Conclusion

There's indeed a big difference between nginx and caddy on one side and uvicorn on the other. And I don't know why, so further research is needed :). The reason I tried to benchmark nginx with and without sendfile was to make sure I'm not measuring some form of kernel level io / zero copy tcp vs having to do all the work in userspace, because uvicorn lacks zerocopy send support atm. The results seem to suggest that it's possible to be very fast without sendfile. Which is good, because nowadays we usually serve files from some kind of object store and not from the file system and therefore it's not possible to benefit from sendfile anyway (please correct me if I'm wrong).
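To make the sendfile vs. userspace distinction a bit more concrete, here is a minimal sketch using only the Python standard library. It is not how nginx or uvicorn are implemented, and all function names are made up for this example. `send_userspace` reads the file into Python and writes the chunks to the socket, while `send_zero_copy` uses `socket.sendfile`, which hands the copy to the kernel via `os.sendfile` where that is available and falls back to plain `send` otherwise. A local socket pair stands in for the network.

```python
import os
import socket
import tempfile
import threading


def receive_all(sock, expected, sink):
    """Drain `expected` bytes from sock and record how many arrived."""
    received = 0
    while received < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    sink.append(received)


def send_userspace(sock, path, chunk_size=65536):
    """Copy the file through userspace: read() into Python, then sendall()."""
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def send_zero_copy(sock, path):
    """Let the kernel move the bytes (os.sendfile under the hood if possible)."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def transfer(sender, path):
    """Send one file over a local socket pair, return (sent, received)."""
    size = os.path.getsize(path)
    left, right = socket.socketpair()
    sink = []
    reader = threading.Thread(target=receive_all, args=(right, size, sink))
    reader.start()
    try:
        sent = sender(left, path)
    finally:
        left.close()
        reader.join()
        right.close()
    return sent, sink[0]


# One file of random data, about the size of a recipe image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(100_000))
try:
    userspace_result = transfer(send_userspace, tmp.name)
    sendfile_result = transfer(send_zero_copy, tmp.name)
finally:
    os.unlink(tmp.name)

print(userspace_result, sendfile_result)
```

Both variants deliver the same bytes. The difference is only where the copying happens, which is what the nginx/minimal vs. nginx/sendfile comparison was meant to isolate.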

Still, those numbers won't make a big difference in practice, because most machines don't have network links faster than one gigabit. It will become much more relevant when this changes to 10Gb, because then uvicorn will be too slow to saturate it.

The next thing I'll be testing is preloading the files into memory to rule out aiofiles as the culprit for the slowness. What I also find really interesting is that I had to use wrk as the http client for the benchmarks, because the python http clients I tried (httpx, aiohttp) were far too slow, with aiohttp being a little bit faster than httpx. They max out at about 80MB/s. Why is that?
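A rough sketch of what such a preloading test could look like. This is not the FastAPI app from the benchmark: it is a bare ASGI callable, so it needs no third-party package to try out, and the names `preload`, `make_app` and `call` are hypothetical. All files are read into a dict once at startup, so no file IO (and no aiofiles) is involved while serving.

```python
import asyncio
import tempfile
from pathlib import Path


def preload(directory):
    """Read every regular file in `directory` into memory, keyed by URL path."""
    return {
        f"/{path.name}": path.read_bytes()
        for path in Path(directory).iterdir()
        if path.is_file()
    }


def make_app(files):
    """Build an ASGI app that answers GET requests from the in-memory dict."""

    async def app(scope, receive, send):
        assert scope["type"] == "http"
        body = files.get(scope["path"])
        status = 200 if body is not None else 404
        if body is None:
            body = b"not found"
        await send({
            "type": "http.response.start",
            "status": status,
            "headers": [
                (b"content-type", b"application/octet-stream"),
                (b"content-length", str(len(body)).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})

    return app


async def call(app, path):
    """Call the app in-process without a server and collect its response."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return messages[0]["status"], messages[1]["body"]


# Tiny demo: one 100,000 byte file, preloaded and served from memory.
with tempfile.TemporaryDirectory() as directory:
    Path(directory, "image.bin").write_bytes(b"x" * 100_000)
    app = make_app(preload(directory))

status, body = asyncio.run(call(app, "/image.bin"))
missing_status, _ = asyncio.run(call(app, "/nope"))
print(status, len(body), missing_status)
```

If an app like this, served by uvicorn, is not noticeably faster than the file-based version, aiofiles is probably not the bottleneck.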


Weeknotes 2022-03-28

, Jochen

Got my kptncook to mealie integration to work. My mealie instance is now running at mealie staging. If you'd like to have an account, drop me a line. Not much progress on other projects.

Things I Learned

  • You can record traffic with your fritzbox using this link - very nice for debugging

Articles

Youtube

Twitter

Software 

Podcasts


Weeknotes 2022-03-21

, Jochen
Caught a cold, not corona, but still bad. Spent most of last week refactoring fastdeploy. There are still two difficult-to-fix issues left, but it's usable anyway. Wrote some release notes this time :).

The second thing I spent time on was mealie. It's a recipe manager implemented with fastAPI and vue, which is a stack I use, too. So I played around with it a little bit and deployed it to my own infrastructure. This was easier than expected, but I found a bug in fastdeploy (big task output breaks on `await proc.stdout.readline()`). Wonder whether people would pay money for a hosted version of mealie?
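One common way for big output to break `readline()` is the buffer limit of asyncio's `StreamReader`, which is 64 KiB by default: a single line longer than that makes `readline()` raise a `ValueError`. Whether this is what actually happened in fastdeploy is an assumption. The sketch below feeds a reader by hand instead of starting a subprocess; for a real subprocess the limit can be raised with the `limit=` argument of `asyncio.create_subprocess_exec`.

```python
import asyncio

LONG_LINE = b"x" * 200_000 + b"\n"  # one line, longer than the 64 KiB default


async def read_line(limit=None):
    """Feed one long line into a StreamReader and try to read it back."""
    if limit is None:
        reader = asyncio.StreamReader()  # default limit: 64 KiB
    else:
        reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(LONG_LINE)
    reader.feed_eof()
    try:
        return len(await reader.readline())
    except ValueError as error:
        return f"ValueError: {error}"


default_result = asyncio.run(read_line())
raised_result = asyncio.run(read_line(limit=1024 * 1024))

print(default_result)
print(raised_result)
```

With the default limit the read fails, with a 1 MiB limit the whole line comes back.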
 

Things I Learned

  • Usually I prefer to run my tests directly with pytest instead of vscode because my console output is nicely colorized and it's much easier to add options like '--full-trace' etc. but I just found out it's at least possible to view the test stdout in vscode using ⌘⇧U and then selecting "Python Test Log" from the dropdown on the upper right. So I might run tests from vscode from time to time now :).
  • You can install optional poetry dependencies with `poetry install -E optional_name`

Articles

Youtube

Twitter

Software 

Podcasts


Weeknotes 2022-03-14

, Jochen
Still working on the sqlmodel removal from fastdeploy and adding some kind of software architecture:
  • Fetching the deploy steps for a service now works again
  • The cli commands work again with the new application architecture
  • All api endpoints work again
  • All the e2e frontend tests work again
So, it's nearly done. There will probably be a new release next week. This whole thing took about 3 weeks (working only in my spare time). That's a bit more than I would have expected, but new things which were not planned got added too (async db support for example). Well, it's probably a good idea to have smaller chunks of work, but that is difficult to do if you rewrite lots of infrastructure stuff. My current rationalization for my time being well spent goes like this: As long as it is helping me shift costs from marginal to fixed, additional effort is fine.

Maybe this bot can help me prevent further doomscrolling.


Things I Learned

Articles

Misc

Youtube

Books

Twitter

Software 

Podcasts


Weeknotes 2022-03-07

, Jochen

*Doomscrolling intensifies*

But there has been some progress on the removal of sqlmodel from fastdeploy task (this issue keeps getting bigger, and it was a slow work week) as well:

  • While I'm introducing my own unit of work pattern and rewriting lots of the database stuff anyway, I thought: well, maybe switching to asyncpg and making the whole database layer async wouldn't be much additional work. This is now done.
  • Rewrote the auth module, because I wanted to get rid of all fastAPI dependencies. The code now looks much nicer, too.
  • Syncing service configurations from the filesystem is now also possible again (I have to admit that writing things like an AbstractFilesystem class quickly brings me to the point of reconsidering my lifestyle choices).

Things I Learned

  • You can use PYTHONPATH=$(pwd) in a Procfile to start jupyter lab/notebooks that keep the project root in pythonpath
  • You can use session.expunge(object) to be able to use mapped python objects after a sqlalchemy session is closed (yeah, I'm just beginning to use sqlalchemy)

Youtube

  • My Voice Over Chain | I often get mocked for being nerdy about audio quality. Well, maybe go and mock this guy instead (or learn something from him and improve your recordings).
  • Have Single-Page Apps Ruined the Web? | Transitional Apps with Rich Harris, NYTimes | Great video. Until a few months ago I would have completely agreed. But seeing what is possible with libraries like htmx I'm currently feeling more like: "Holy shit, SPAs are dead!". And I don't trust this whole notion of "Just use this sparkling framework X and your code will automatically be deployed on an edge CDN node in a V8 vm and everything will be like magic". I like things to be simple, not complicated and magic. The reason for that is: I know that my own interest in keeping my stuff running is much greater than the interest of some bored bigcorp ops staff. Even kubernetes is far too complicated from my point of view. If something goes wrong, I want to be able to fix it. But maybe kubernetes or netlify or fly makes things possible that I need and could not do by myself? My opinion on that is: hell no.
  • Putin, die Ukraine und danach? | Mit offenen Karten Spezial Ukraine | ARTE | Good signal-to-noise ratio

Twitter

Software 

  • pgcli | Postgres command line client with syntax highlighting

Podcasts