Distributed systems class with Aphyr
2017-09-14 22:35
I was in luck. Nuno Job organized a distributed systems class for his YLD crowd and invited friends over to join. The class was taught by Kyle Kingsbury (Aphyr), who is probably best known for Jepsen. I obviously couldn't say no to such an offer.
I really enjoyed it. Although I've been working on a distributed system for most of my career, it was good to get a general overview of distributed systems from the ground up. There is obviously more to it than just databases or what you learn at university.
We touched on many topics, and real-world stories were told along the way. Kyle did a great job, leading seamlessly from one topic to the next. He really knows what he's talking about, is funny, and makes it an overall great experience. You can find the contents of the class on GitHub, but you really want to have Kyle teach it to you.
New things I've learned about
So there was something to learn for everyone. Here are a few things that I need to dig into deeper:
- Formal models for processes: there are more design primitives for concurrency than CSP (communicating sequential processes), which Go's channels are inspired by, and the actor model, which Erlang's concurrency is based on. There are also the ambient calculus and the π-calculus. Neither has seen such widespread use in programming languages.
- When you want to do node discovery with a gossip protocol, have a look at Plumtree (Luis Rodrigues, Jose Pereira, Joao Leitao: Epidemic Broadcast Trees).
- I find it a good idea to call in-memory KV stores like Redis and Memcached "shared heaps".
- I need to check out the paper Peter Alvaro, Joshua Rosen, and Joseph M. Hellerstein: Lineage-driven Fault Injection, which is about testing the robustness of algorithms with fault injection.
- Another interesting (non-technical) essay I need to read is Jo Freeman: The Tyranny of Structurelessness, which is about non-hierarchical groups. I was well aware that there's always an implicit hierarchy, but I never thought about the "unfairness" of it. If you know the right people, you have a big advantage.
- I also got excited about CRDTs (conflict-free replicated data types). They really seem to be a good way to handle distributed changes that should be synced eventually.
- And finally some tooling: distributed tracing to find network latency issues. Tools mentioned were Zipkin (which is based on Google's Dapper paper), OpenTracing, and a third one I can't read/remember/find (Omphalos or so?).
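To make the CRDT idea from the list above a bit more concrete for myself, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is not taken from the class material; the `GCounter` class and the node names "a" and "b" are made up for illustration. Each node only ever increments its own slot, and merging takes the per-node maximum, so replicas converge no matter in which order or how often they sync.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    node_id: str  # hypothetical identifier of the local replica
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A node may only increase its own slot, never another node's.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all known slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Taking the maximum per slot is commutative, associative and
        # idempotent, which is what lets the replicas converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept writes independently ...
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)

# ... and sync later, in any order and any number of times.
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing

print(a.value(), b.value())
```

Both replicas end up with the same total even though neither saw the other's writes when they happened. Supporting decrements or removals needs a richer structure (for example a pair of such counters), which is beyond this sketch.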
Categories: en