
My Data Engineering Study Framework for 2024

How I structure my learnings for the year into my own framework.

This article outlines my personal learning plan for 2024, structured around four key categories: targeted upskilling, observing trends, experimenting with new skills, and avoiding potential pitfalls.

All of these components are dynamic throughout the year, but I try my best to organize my plans around them.

Targeted Upskilling

One of my goals for 2024 is to deepen my knowledge and skills in data engineering, a field I've been working in for almost 10 years and where I always find more interesting things to dive into. There are multiple topics I want to improve in, from data visualization to database internals.

To achieve this goal, I plan to immerse myself in a variety of resources, including:

Books

I usually read a few technical books per year. For 2024, my plans include reading Data Science for Business and Database Reliability Engineering.

For books I've already read and recommend, check out my other blog post.

If you have any recommendations, please reach out!

Articles

Regularly consuming insightful articles from industry publications and blogs will keep me abreast of the latest advancements and best practices in data engineering. I read a lot of blogs. Some of my favorite blogs:

Conference Talks

Attending or watching recordings of relevant conference talks will provide me with exposure to diverse perspectives and cutting-edge approaches in data engineering. I usually enjoy deep technical talks from Data Council and similar conferences, or more targeted talks about certain technologies, such as the ones from Snowflake's conference. Coalesce usually has great talks around the topic of analytics engineering.

Other recommendations:

Documentation

Thoroughly reviewing official documentation from open-source tools and frameworks will ensure I have a comprehensive understanding of their functionality and usage. Most docs are garbage, while some are amazing. If you are interested in a tool, you'll have to deal with whatever its maintainers provide, or, if it's open source, you can take the initiative and make a few contributions!

I have to mention the docs of Modal, which contain an amazing collection of guided examples that teach you how to use their product. All examples are unit-tested against new changes to the code, so you won't run into one that doesn't work with the latest version! How cool is that?!

Writing

I have a whole article about how writing is one of the best tools for learning. Check it out below.

Writing to learn
Writing can be almost as powerful a tool as reading for learning new concepts, especially if you also publish your notes in some form.

Observing Trends

In addition to deepening my expertise in data engineering, I also aim to stay informed about emerging trends that are shaping the future of the field. The biggest one currently is AI, but there are other aspects of the data world that I want to stay up to date on.

These trends include:

  • Large Language Models (LLMs): Understanding the potential and limitations of LLMs in data engineering tasks will be crucial for future-proofing my skill set, not just in data engineering but in all aspects of my life.
  • Data Contracts: Keeping up with the latest best practices around data quality is always worthwhile. While fairly popular nowadays, data contracts are still an emerging (and interesting) concept in my mind, with room to grow.
  • Table formats: I'm interested in seeing how table formats evolve over time. I'm not much of a believer in any "wars" between them: as a consumer, I can only win if they compete to be better than each other.
  • Streaming data: Not quite emerging, as it's been with us for a good while (similarly to table formats, I guess), but it's something I want to stay on top of, even if it won't be the focus of my learning this year.

To keep abreast of these trends, I plan to:

  • Read articles and watch videos: Regularly consuming relevant content from industry experts and thought leaders will provide me with a solid understanding of these emerging areas. The hardest part here is avoiding vendor content: if you read a vendor's technical article on these topics, take everything with a grain of salt and remember that its main purpose is to sell their product.
  • Attend webinars and workshops: Participating in webinars and workshops on specific topics will allow me to engage with experts and gain practical insights. For example, to deepen my knowledge of streaming, I often check what the folks over at Redpanda are up to. They put a lot of effort into creating educational content, although the same disclaimer applies as above.
  • Contribute to open-source projects: Getting involved in open-source projects that are exploring these trends will provide me with hands-on experience and valuable networking opportunities. A lot of the data world is glue: the necessary tooling that connects data producers and consumers. This brings with it the notion of "connectors", i.e. the software implementations of said glue. Many open-source data products, such as Airbyte or Meltano, encourage folks to develop connectors for their tools.
  • Follow emerging companies: Companies such as dbt or Decodable, which put considerable effort into user education, are worth following to stay up to date in their domains. Sometimes it's worth joining their Slack/Discord groups and seeing what the current discussions are about.
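To make the "connector as glue" idea from the open-source bullet a bit more concrete, here is a minimal sketch of the shape most connectors share: a source that reads records from a producer, a destination that writes them to a consumer, and a sync step in between. Everything here (`Source`, `Destination`, `ListSource`, `MemoryDestination`, `sync`) is hypothetical and written for illustration only. It does not follow the actual Airbyte protocol or the Singer specification that Meltano builds on.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Protocol

# A record is just a dict of column name -> value in this sketch.
Record = dict[str, object]


class Source(Protocol):
    """Hypothetical producer side: anything that can yield records."""

    def read(self) -> Iterator[Record]: ...


class Destination(Protocol):
    """Hypothetical consumer side: anything that can persist records."""

    def write(self, records: Iterable[Record]) -> int: ...


@dataclass
class ListSource:
    """Stand-in for an API or database; yields rows from an in-memory list."""

    rows: list[Record]

    def read(self) -> Iterator[Record]:
        yield from self.rows


@dataclass
class MemoryDestination:
    """Stand-in for a warehouse table; collects records in memory."""

    stored: list[Record] = field(default_factory=list)

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.stored.append(record)
            count += 1
        return count


def sync(source: Source, destination: Destination) -> int:
    """The 'glue': stream records from producer to consumer, return how many were written."""
    return destination.write(source.read())


if __name__ == "__main__":
    src = ListSource([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    dst = MemoryDestination()
    print(sync(src, dst))  # prints 2
```

Real connector frameworks layer schema discovery, incremental state, and error handling on top of this; the sketch deliberately leaves those out to show only the producer-to-consumer shape.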

Experimenting with New(-ish) Skills: Exploring Tangential Domains

While data engineering remains my primary focus, I also want to explore new skills that could potentially open up new career opportunities or complement my existing expertise. These skills include:

  • Cloud Computing & Serverless: Although I have a lot of experience with most cloud service providers, I feel it's time to expand my horizons to new, cool tech, such as Modal. Infrastructure will always be close to my heart, and the latest tooling around it can give a good developer superpowers.
  • Web Development: I did some web development years ago, but I only recently realized that to ship my side projects to users, I have a lot of room to improve. Also, apparently, jQuery is not cool anymore!

To experiment with these new skills, I plan to:

  • Enroll in online courses: Completing online courses will provide me with structured learning and hands-on practice. For end-to-end development workflows such as full-stack web development, a well-guided course is a great starting point.
  • Build personal projects: Working on personal projects will allow me to apply my newfound skills in real-world scenarios. I'm a serial side-project-builder. It has pretty much become my go-to way of learning things – the hard part will be to actually see them to completion and ship something!
  • Network with professionals in related fields: Connecting with individuals who are experts in these areas will provide me with mentorship and insights.

Avoiding Potential Pitfalls

In my experience, time spent studying is not the same as time spent studying effectively. In the past, I've gone down rabbit holes that didn't provide any actual value. This is probably the most important component of the framework!

To avoid repeating these mistakes, I try to steer clear of the following:

  • Overreliance on LinkedIn Influencers: While LinkedIn influencers can provide valuable insights, their content should be critically evaluated and not blindly followed. Always keep in mind that their only goal is to generate engagement, which is sometimes easier with bullshit.
  • Focusing on Beginner Content: While introductory resources are helpful, overindulging in beginner content can induce a state of analysis paralysis. On top of that, most beginner content is straight-up garbage: it has the lowest barrier to entry to write, so the internet is flooded with rehashed, stale material that is hard to avoid. Advanced material of similarly poor quality is much rarer, and with recursive learning (looking up prerequisites as you run into them) you can always fill the gaps in your knowledge.
  • Relying on GPT-Generated Content: While GPT models can produce fluent, convincing text, their use in learning should be limited due to potential inaccuracies and biases.

That's all!

Have fun learning this year!