Hi folks! Not a very content-ful week, as I’ve been busy recording some videos about EMR on EKS. But here are a few things I did come across!
📊 Apache Druid rears its head again in this article from Reddit about scaling their internal reporting. Given the long list of Druid-powered companies on the Druid site, I’m probably behind the times, but I feel like we’ll see more and more folks moving their reporting systems to Druid.
https://redditblog.com/2021/02/26/scaling-reporting-at-reddit/
🎙 Speaking of Druid, did you know that Wikimedia uses it? Here’s a fantastic interview with Nuria Ruiz, a Principal Engineer who led the Data Engineering team. My own personal favorite quote? “Most of the real hard problems have to do with people, rather than technology.” 😍 The other great thing is that a lot of their work is open (see: Oozie vs. Airflow), which makes it an awesome resource!
https://www.speedwins.tech/posts/some-words-with-nuria-ruiz
⚙️ I just came across Airbyte last night, an open-source EL(T) platform that honestly looks really impressive and makes me a little sad about my own hacking in that space lately. I haven't had a chance to use it firsthand, but it sure looks shiny!
https://github.com/airbytehq/airbyte
⚙️ Finally, while we’re on the topic of flinking data around (hah!), here’s a fun post that shows how to profile data in Kafka using kafkacat and VisiData. I’d never heard of VisiData before, but it sounds awesome! “An open-source multitool for exploring data in the terminal. Like vim for tabular data.” Could you imagine anything better?!
That’s all the news in Lake Data - feel free to pass this on to your friends and ping me on Twitter if you find something you think I’d enjoy. 👋