Hello and welcome to this week’s data links. I’m your host, Damon, and I’ve spent the past couple of weeks buried in EMR and EKS integration. I’ve also been refreshing my knowledge of Apache Airflow on Kubernetes, so it’s been a busy week! Let’s get started…
🧑‍💻 Up first, I found a great post on the Astronomer blog about “Building a scalable analytics architecture with Airflow and dbt.” 😳 If there’s one tool I’ve heard about almost as much as Airflow…it’s dbt. It’s interesting to see the lengths the authors went to in order to integrate these two tools.
https://www.astronomer.io/blog/airflow-dbt-1
🧑‍💼 “Analytics Engineer” is a job role I saw pop up more and more in 2020. Spotify has a good post on what the role looks like and how they built that function. The key takeaway for me, though, comes from one of their goals: “Our first analytics engineering project was to review the landscape of projects and business needs, then consolidate the data into a handful of tables that would cover 80% of the data scientists’ needs.” Less is more.
https://medium.com/spotify-insights/analytics-engineering-at-spotify-f165180a6722
🛠 One of the challenges of picking the “right” analytics solution is that you often have to optimize for different use cases. Apache Pinot is a real-time analytics datastore developed at LinkedIn, and I came across a blog post about a Presto Pinot Connector. As a huge Presto fan, I find the ability to combine the flexibility of Presto with the throughput of Pinot really interesting.
https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307
🧠 I always find it tough to keep a mental model of how distributed systems like Apache Spark or Presto work under the hood. I’ve been doing a lot of work on Spark lately, and I came across a recent post that describes Spark query execution plans. I found it pretty handy.
https://medium.com/the-code-shelf/spark-query-plans-for-dummies-6f3733bab146
That’s it for this week…if you enjoyed this post, feel free to share it with other folks!