Ephes Blog

Miscellaneous things. Not sure what to put here, yet.


Weeknotes 2022-04-25

, Jochen

Released version 0.0.5 of the command line kptncook scraping tool, based on Daniel's work to reverse engineer the kptncook API. It's now possible to fetch recipe metadata just by providing the sharing URL of a recipe from the kptncook app. It's also possible to download all stored favorites and back them up locally or import them into mealie.

After listening to LOV021 Podlove-API mit Dirk Schumann I realised it might be possible to use the podlovers podcast frontend for django-cast as well, because it doesn't depend directly on WordPress anymore. This is very exciting, because having to use PHP/WordPress was the main reason keeping me from looking more closely into the whole podlove ecosystem. The only thing I have to do is write an adapter for the API (and there are some parts missing as well, like transcripts and contributors). But first I have to be able to test and understand the existing API locally. So my aversion to WordPress finally led me to having to use it. In the end it wasn't that difficult at all and I took some notes on how to reproduce that. File handling is not working yet, but I'm sure I'll figure it out next week.






Weeknotes 2022-04-18

, Jochen

Met a lot of people last week. Spent the first half of the week at this year's PyCon DE, even though I thought I wouldn't go. This was really cool (day 1, day 2, day 3). After that I spent the second half of the week attending family appointments. Managed to avoid catching fire / covid, ffp2 ftw. Somebody was more successful than me at reverse engineering the kptncook app and opened a GitHub issue to notify me about that. How awesome is that? As soon as I'm able to spend more time on computer stuff I'm going to enhance my kptncook-scraper accordingly.

Things I Learned

  • auto_error in FastAPI allows you to test for different authentication methods (cookie, bearer token, etc.), but using an authentication middleware as in Django is probably a cleaner solution
  • It's not possible to return values from dependencies declared at router level in FastAPI, but you could attach stuff to the request and use it as a container for additional state






PyCon DE Day Three

, Jochen
Got up early enough today and caught the right bus. Buying a 24h ticket exactly 24h after you previously bought one (same bus) seems to be a very strange use case, because it broke the BVG app. But I got to the bcc on time, which was important, because the first talk was one of the talks I was most looking forward to: Python 3.11 in the Web Browser - A Journey by Christian Heimes. And it was as great as expected.

Another talk I was very eager to see was It is all about files and HTTP by Efe Öge, which was scheduled right after the keynote in the same room, but it was more like a gentle introduction to file serving. Not the hardcore "Why do we even need nginx? Let's do it all in python with uvloop and asyncio! Let's bring zero copy tcp to uvicorn!!1" talk I would have loved to hear. Maybe I have to give this talk myself someday. But not now, I don't have time. No, don't do it.

The next talk I attended was Squirrel - Efficient Data Loading for Large-Scale Deep Learning by Dr. Thomas Wollmann. It was about how to optimize your data ingestion to avoid having your GPUs idle (which is probably a valid use case). Squirrel was said to be an implementation of a data mesh - a pattern which was proposed by Thoughtworks and McKinsey (never heard about it until now). I don't know if it's fair to reject the idea solely based on this evidence (the proponents). I have a podcast episode about data meshes in my inbox, maybe I'll use this as an excuse to finally listen to it. Meh.

After that I listened to On Blocks, Copies and Views: updating pandas' internals by Joris Van den Bossche. This was a really great talk. Pandas is great, but it's often hard to say which operations produce a copy of the data. If you change data in a dataframe that you believe you copied but didn't, bad things might happen. Exactly when a copy is created, and when a change affects data referenced by multiple dataframes, is completely confusing. A clean solution would be some kind of "copy on write" mechanism for dataframes, where each operation behaves as if it yields a copy, but an actual copy is only created when data is changed. But this will probably break a lot of old code, so it's not easy to switch to that solution.
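
To make the "copy on write" idea a bit more concrete, here is a toy sketch in plain Python. It is not how pandas works internally and not code from the talk; CowFrame and its methods are made up for illustration. Selecting columns shares the underlying lists, and the actual copy only happens on the first write.

```python
from typing import Dict, List, Set


class CowFrame:
    """Toy column store with copy-on-write semantics (not pandas' internals)."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self._columns = dict(columns)
        # Columns whose buffer may also be referenced by another frame.
        self._shared: Set[str] = set()

    def select(self, *names: str) -> "CowFrame":
        """Return a new frame that behaves like a copy but shares the buffers."""
        child = CowFrame({name: self._columns[name] for name in names})
        child._shared.update(names)
        self._shared.update(names)
        return child

    def set_value(self, name: str, index: int, value: float) -> None:
        if name in self._shared:
            # The actual copy happens only now, on the first write. This
            # simple bookkeeping may copy once more than strictly necessary.
            self._columns[name] = list(self._columns[name])
            self._shared.discard(name)
        self._columns[name][index] = value

    def column(self, name: str) -> List[float]:
        return list(self._columns[name])

    def shares_buffer_with(self, other: "CowFrame", name: str) -> bool:
        return self._columns[name] is other._columns[name]


df = CowFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
subset = df.select("a")
print(subset.shares_buffer_with(df, "a"))  # True: nothing copied yet

subset.set_value("a", 0, 99.0)             # first write triggers the copy
print(df.column("a"))                      # [1.0, 2.0]: parent is untouched
print(subset.column("a"))                  # [99.0, 2.0]
```

The point is that a write to subset can never leak into df, no matter how subset was created, which is exactly the guarantee that is hard to reason about today.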

After lunch I attended How to Find Your Way Through a Million Lines of Code by Jürgen Gmach, which was also really great. It was about things you can do to get more familiar with a large codebase. For example: If you don't know where to put a new test, set a breakpoint at the location where you want to change something and then run the tests. Then put your new test beside the semantically nearest test you found using this method.
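
In practice this means putting a breakpoint() into the function and running pytest; the debugger's where command then shows which test got you there. The sketch below is a non-interactive variant of the same idea and not something shown in the talk: instead of stopping, it records which test functions are on the call stack. All names (apply_discount, checkout_total, the tests) are made up.

```python
import traceback
from typing import List, Set

# Names of the tests that reached the code we want to change.
tests_reaching_target: Set[str] = set()


def apply_discount(price: float, percent: float) -> float:
    # Temporary instrumentation, standing in for breakpoint(): remember every
    # test function that is on the call stack when this line runs.
    for frame in traceback.extract_stack():
        if frame.name.startswith("test_"):
            tests_reaching_target.add(frame.name)
    return price * (1 - percent / 100)


def checkout_total(prices: List[float], percent: float) -> float:
    return sum(apply_discount(price, percent) for price in prices)


def test_checkout_total() -> None:
    assert checkout_total([200.0, 100.0], 50.0) == 150.0


def test_unrelated() -> None:
    assert "a".upper() == "A"


# Stand-in for a real test runner such as pytest.
for test in (test_checkout_total, test_unrelated):
    test()

print(sorted(tests_reaching_target))  # prints ['test_checkout_total']
```

A new test for apply_discount would then go next to test_checkout_total.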

The last talk of the day was kind of a blast from the past: Transformer based clustering: Identifying product clusters for E-commerce by Sebastian Wanner and Christopher Lennan. I did something similar (use machine learning to solve this problem, not the transformer stuff which wasn't invented yet) 15 years ago working for billiger.de. Overall their approach seemed very similar to ours and their numbers (0.84 F0.5 on shoe offers) looked really good. The room was packed and the questions from the audience were also very good. I would bet there were more people working on the same problem in the room. Too bad nobody cared about this problem back when I was working on it. I have to say I'm a little bit jealous now, *sigh*. Very interesting talk.

So, back to Düsseldorf after three intense days. I think I had one warning in my corona warn app in the last two years, but now it's going crazy. Public transport probably. But since I nearly always wore a mask I'm not really worried.

PyCon DE Day Two

, Jochen
PyCon had about 1.6K visitors, which is not too crowded for the bcc, but it's still a lot of people. It's not easy to recognize someone while most people wear masks most of the time. Maybe wearing t-shirts with a picture of the wearer's face printed on them would have been helpful, but I didn't think of that. Instead I wore this t-shirt, which also fits the mask situation pretty well.

I was a little bit late and had to skip the first session. My first talk was conda-forge: supporting the growth of the volunteer-driven, community-based packaging project. The next one was Unclear Code Hurts by Dario Cannone. Watching bad code from a distance can be a lot of fun. A former colleague described it as watching a car crash show. Just make sure to keep a safe distance and don't end up being responsible for cleaning up the mess.

Then I watched 5 Things we've learned building large APIs with FastAPI by Maarten Huijsmans. The room was packed and it's obvious that FastAPI is a hot topic right now. But it's also still really new and people don't have a lot of experience using it. A channel was set up on the conference discord for people interested in FastAPI, and a little bit later we met in person at a table. This was a lot of fun and I learned some cool things there. For example, how people deal with the problem that you cannot get a return value back from global dependencies (just attach it to the request in your dependency).

Another talk I attended was What are data unit tests and why we need them by Theodore Meynard, which was really interesting for me, because I didn't know libraries like Great Expectations even existed. Efficient data labelling with weak supervision by Maria Mestre was another talk I listened to. And since all things Django sound interesting to me, I also watched Make the most of Django by Paolo Melchiorre, which wasn't about Django itself, but about how the Django community is a great place to get started with open source etc. - I have to read the abstracts more carefully, I guess. The last talk of the second day was 5 Years, 10 Sprints, A scikit-learn Open Source Journey by Reshama Shaikh. And then came the lightning talks, which were also great.

After the conference we picked up some lunch in a small group of people and moved to the c-base because there supposedly was a nix meetup going on, which we didn't find.

PyCon DE Day One

, Jochen
Didn't expect to attend PyCon DE 2022, but somehow it worked out nevertheless. I arrived in Berlin on Sunday evening. It's really good to be able to meet people in person again.

On Monday morning there was a long line in front of the conference venue and I only arrived at 09:30, but I had no trouble getting in on time. Florian Bruhin greeted me from the waiting line, but he confused me with someone else. I should have used the opportunity to ask him about recording a podcast episode about pytest, but I was too perplexed.

Some time ago I tried to join the Python Software Verband e.V., but didn't succeed. Filling out the form locally should work now :).

The first session I attended was Building an ORM from scratch by Patrick Schemitz and Jonathan Oberländer, which was great. I had trouble following along because it was so fast paced, but I like to be challenged. Here's the code used in the presentation.

After lunch I listened to the keynote of the first day: Beyond the basics: Contributor experience, diversity and culture in open source projects, which was held online by Melissa Weber Mendonça from Brazil. This is a really difficult topic, but there's also a lot of room for improvement.

The talk Python 3.10: Welcome to pattern matching by Laysa Uchoa was informative as well as funny.

Seeing the needle AND the haystack: single-datapoint selection for billion-point datasets was also interesting. I never thought about how hard it might be to visualize a dataset that has many more datapoints than the screen has pixels, but it's kind of obvious to me now :)

The last slot I attended on the first day was the lightning talks. There was a lot of interesting stuff, just to highlight some:

But the most valuable experience was of course running into people I hadn't seen in a long time and learning about the interesting stuff they have been up to in the meantime.