Data Engineering Sucks

We've all been there—a seemingly innocent request for a quick and dirty data solution, only to find ourselves knee-deep in nitpicking and dissatisfaction. "Can you quickly pull this data for me?" they ask, blissfully unaware of the Pandora's box they've just opened. Little do they know, data engineering, much like fine wine, does not age well when rushed.

My relationship with Data Engineering is perfectly summarized using one word (thanks Germany 🇩🇪): Schadenfreude.

Schadenfreude is the experience of pleasure, joy, or self-satisfaction that comes from learning of or witnessing the troubles, failures, pain, or humiliation of another. 

In this article, I'll rant a little about data engineering's worst facets, exploring everything from the quick and dirty dilemma to the longing for the days of software programming. I'll touch on the cycle of massive projects, the specificity struggle, and the ever-elusive refinement conundrum. So, fasten your seatbelts as we navigate the data engineering rollercoaster together.

Ready or not, here we go!

The Quick and Dirty Dilemma

Quick vs. Quality: Why Data Engineering Should Never Be Rushed

Choose one!

In the frenetic world of data engineering, where requests fly in like arrows, the temptation to deliver a quick and dirty solution can be alluring. "Can you do it quick and dirty so I can use it by the end of the week?" they say, their urgency palpable. However, for the seasoned data engineer, this request rings alarm bells louder than a fire drill.

The Lure of Quick Fixes and Their Hidden Costs

It's a scenario familiar to many in the field: the business urgently needs a data solution, and the pressure is on to deliver promptly. The allure of a quick fix is undeniable. After all, why wait until next month when the data is needed now? This seemingly innocent request, though, often unravels into a cascade of issues.

In the quest for speed, the foundational principles of robust data engineering can be compromised. Rushed projects often lack proper documentation, thorough testing, or adherence to best practices. The result? A brittle solution that might work for now but is a ticking time bomb of future complications.

How to Push Back: Triage and Prioritization

To combat the quick and dirty dilemma, a strategic approach is paramount. New requests don't land directly on the data engineer's desk. Instead, they go through a meticulous triage process. The project manager and team prioritize and size them during biweekly sprint planning. This deliberate delay allows for thoughtful consideration and ensures that urgent does not equate to hasty.

When faced with objections to this method, the seasoned data engineer holds their ground. Directing new requests to triage and prioritizing them within the sprint planning cycle becomes a shield against the chaos of urgency. It's not about ignoring speed but about channeling it effectively. If stakeholders are displeased, the product owner becomes the battleground for arguing project prioritization, an argument that, as experience shows, is seldom won against a well-structured prioritization process.
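To make the triage idea concrete, here is a toy backlog scorer. Everything in it is hypothetical (the field names, the weighting, the capacity model); the point is only that stakeholder urgency is one input to prioritization, not a trump card:

```python
from dataclasses import dataclass

@dataclass
class Request:
    title: str
    business_value: int  # 1 (nice-to-have) .. 5 (critical), assigned in triage
    effort: int          # 1 (hours) .. 5 (months), sized by the team
    urgent: bool = False # what the stakeholder says; an input, not an override

    @property
    def score(self) -> float:
        # Higher value and lower effort float to the top; urgency is a
        # small tiebreaker bonus (purely illustrative weighting).
        return self.business_value / self.effort + (0.5 if self.urgent else 0.0)

def plan_sprint(backlog: list[Request], capacity: int) -> list[Request]:
    """Pick the highest-scoring requests that fit the team's capacity."""
    chosen, remaining = [], capacity
    for req in sorted(backlog, key=lambda r: r.score, reverse=True):
        if req.effort <= remaining:
            chosen.append(req)
            remaining -= req.effort
    return chosen

backlog = [
    Request("Quick and dirty extract", business_value=2, effort=1, urgent=True),
    Request("Fix revenue pipeline", business_value=5, effort=3),
    Request("One-off SharePoint hookup", business_value=1, effort=2),
]
print([r.title for r in plan_sprint(backlog, capacity=4)])
```

Note that the urgent quick-and-dirty request still makes the sprint here; triage doesn't mean saying no, it means the request competes on the same terms as everything else.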

The Value of Patience in Data Engineering

Data engineering is a meticulous craft that demands patience. Unlike some software roles where features can be continuously added, improved, and maintained, data engineering often follows a pattern of deliverables and immediate movement to the next task. It's a rhythm that, for better or worse, defines the profession.

In the dance between quick and quality, data engineers must resist the seductive call of immediacy. The quick and dirty dilemma is a siren song that, if heeded, can lead to a cacophony of issues. By embracing the triage and prioritization approach, data engineers not only protect the integrity of their work but also pave the way for a more sustainable and strategic data ecosystem. In data engineering, patience is a virtue, and quality will always trump speed.

The Dashboard Debacle

The Art of Dashboard Discontent: When DEs Become Designers

Hello? Based department?

Data engineers are familiar with navigating the labyrinth of requests, and among the more perplexing demands is the call for dashboards. "Can you turn this into a dashboard?" they ask, perhaps unaware of the can of worms they are opening. The transition from crafting intricate data pipelines to playing the role of a dashboard designer is a journey laden with challenges and unexpected detours.

From Code to Canvas: The Dashboard Shift

On paper, the request sounds simple. The reality often unfolds differently: the data engineer, accustomed to the precision of coding, now grapples with visualization tools, color schemes, and the delicate balance between information overload and user-friendly design. It's a transition that feels like straying from the comfort of code into the uncertainty of canvas.

While some data engineers are okay with dashboard building, it's worth stating clearly that it is not part of the core DE toolset, and expectations should be set accordingly.

As the demand for dashboards persists, data engineers are thrust into a dual role: the architect of data pipelines and the artist of visual narratives. The dashboard debacle unfolds as a technical challenge and a quest for balance. Can data engineers find a way to infuse creativity into their pipelines' precision, move beyond being mere providers of data, and become true collaborators in its interpretation? The journey through the dashboard dilemma is not just about pixels and charts; it's about the evolving identity of the data engineer in a world increasingly hungry for insights and understanding.

The Unspoken Longing for Programming Glory

Tangentially related to unnecessary dashboarding requests, but still deserving of a mention: Amidst the data flow and transformation scripts, there's a whisper of nostalgia for the days of programming software or crafting business logic. The sentiment echoes through the minds of many data engineers, who occasionally yearn for a task beyond data movement—a task with a touch of creativity and problem-solving, much like the artistry involved in software development.

Data Engineering still hasn't found its place in the software world – is it a subset of software engineering? Is it closer to analytics? Is it DevOps? Who knows? But one thing is for sure, most Data Engineers who come from either side of the spectrum want to dive deeper into problems that require coding to be solved.

The Deliverables Dilemma

Deliver, Rinse, Repeat: Data Engineering's Cycle of Massive Projects

The pace is relentless. Projects come and go with a rhythm that resembles more of a sprint than a marathon. The data engineer is often likened to a diligent courier who delivers data solutions swiftly and efficiently. Yet, beneath this apparent efficiency lies a dilemma— a perpetual cycle of massive projects that leaves little room for reflection, refinement, or the luxury of revisiting past endeavors.

The Unending Cycle of Data Delivery

The nature of data engineering work often follows a predictable pattern. A request emerges, the data engineer dives into the intricacies of design and implementation, and soon enough, a deliverable is born. It's a cycle that keeps the gears of the data ecosystem turning and propels the engineer into a perpetual sprint toward the next project.

The engineer pours effort into a project, births a data pipeline, and promptly moves on to the next challenge. The sense of continuity, improvement, or revisitation, common in software development, is a rarity in data engineering.

The Reward: Another Massive Project

As the completion flag is raised on one project, there's little time for celebration before the next behemoth appears on the horizon. The reward for a job well done is not a breather or an opportunity to refine and improve but rather the daunting prospect of another massive undertaking. The deliverables dilemma manifests—data engineers tirelessly produce yet seldom get the chance to perfect.

Data Engineering vs. Application Development

A stark contrast emerges when comparing data engineering to other software roles, particularly application development. While application developers often find themselves immersed in the ongoing process of adding features, maintaining, and refining the same piece of software, data engineers oscillate between distinct projects with finite timelines.

Yearning for Meaningful Utilization

"I feel like I'm moving data from A to B and handing it over to whoever wants to analyze it," laments the data engineer. The desire to engage in something more substantial, like data science, creeps in. The paradox is that the data engineers who facilitate analysis often find themselves far from the analysis itself.

The dichotomy between delivering data and actively participating in its utilization sits at the heart of the deliverables dilemma. It's not just about moving data from A to B; it's about bridging the gap between data movement and meaningful data utilization, a delicate tightrope walk many data engineers navigate.

The Quest for Balance

The deliverables dilemma poses a fundamental question: How can data engineers balance meeting the demands of ongoing projects and allocating time for refinement and improvement? Finding this equilibrium is no small feat in a field where the urgency of data needs dictates the pace.

Check out my other article, The Evolution and Impact of Data Products [A Deep Dive], if you are interested in the details of how Data Engineers can maximize their productivity.

It becomes evident that the rhythm of data engineering is both a strength and a challenge. While swiftly responding to data demands is crucial, a thoughtful pause for refinement is equally vital. The quest for balance lies at the heart of this dilemma: striking it allows data engineers not only to deliver promptly but also to evolve, refine, and leave an enduring mark on the ever-changing landscape of data engineering. The deliverables may be relentless, but the commitment to continuous improvement should be just as relentless.

The Specificity Struggle

From Models to Sharepoint: The Ever-Evolving Requests

Data engineers are the architects of the data realm, crafting intricate models that promise to deliver information to the right place at the right time. However, in the complex dance of data, it's not just about the elegance of the model; it's about the myriad of specific requests that flood in, each a unique challenge requiring a tailored solution.

Most data engineers take pride in sculpting models like artists molding clay. Some data engineers try to stay as far away from modeling as possible. This split in motivation is the main reason the whole discipline is split into two, with one side tending more toward platform engineering and the other toward analytics engineering. I'll explore this in a future article.

The Dichotomy: Generalized Models vs. Specific Solutions

"We build beautiful dimensional models that deliver data right place, right time," proudly declares the data engineer. Yet, the struggle lies in the clash between the elegance of these generalized models and the relentless demand for particular solutions.

No matter the model's beauty, the data engineer inevitably faces a barrage of specific, sometimes quirky, requests. "How can I attach this SharePoint document?" or "My report is not refreshing. Can you help me?" The struggle intensifies when the requests defy the very logic of the painstakingly crafted models.
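To make the contrast concrete, here is a toy sketch of what "generalized" buys you. The tables, keys, and column names below are all invented for illustration; the point is that a well-modeled fact table answers questions nobody has asked yet, while a SharePoint attachment answers exactly one:

```python
# A toy star schema: one fact table plus dimensions, keyed by surrogate IDs.
# All table contents and column names are made up for illustration.
dim_customer = {1: {"name": "Acme Corp", "region": "EMEA"},
                2: {"name": "Globex",    "region": "APAC"}}
dim_date     = {20240101: {"year": 2024, "quarter": "Q1"}}

fact_sales = [
    {"customer_id": 1, "date_id": 20240101, "amount": 120.0},
    {"customer_id": 2, "date_id": 20240101, "amount": 80.0},
    {"customer_id": 1, "date_id": 20240101, "amount": 50.0},
]

def revenue_by(dimension_attr: str) -> dict[str, float]:
    """Any attribute of any dimension becomes a valid grouping, for free."""
    totals: dict[str, float] = {}
    for row in fact_sales:
        # Join the fact row to its dimensions, then group by the chosen attribute.
        attrs = {**dim_customer[row["customer_id"]], **dim_date[row["date_id"]]}
        key = str(attrs[dimension_attr])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals

print(revenue_by("region"))   # slice by customer region
print(revenue_by("quarter"))  # or by fiscal quarter, same model
```

One model, many questions; that is the elegance. The quirky one-off requests are quirky precisely because they fall outside anything a grouping like this can express.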

Handling the 'Chores'

These specific requests are the 'chores' of data engineering—tasks that seem menial compared to the grandeur of designing data architectures. Whether it's troubleshooting a Python script gone awry in Power BI or unraveling the mystery of some obscure data manipulation, the data engineer is summoned to untangle the intricacies.

No matter how well-constructed the model, the "but how?" questions persist. "I've created this very obscure DAX script to run in Power BI over your model as a composite model, but it doesn't work anymore; please help." The specificity struggle emerges not only as a technical challenge but as a diplomatic art: finding solutions to requests that defy the laws of data engineering physics.

Obviously, the beauty of data engineering lies not just in crafting generalized elegance but in the ability to adapt to the quirks of specific demands. In their journey, the data engineer becomes a navigator, skillfully balancing the precision of a model with the adaptability needed to address the ever-evolving, sometimes perplexing, requests. The specificity struggle is not just a technical challenge; it's an art—a dance between precision and generality that defines the essence of data engineering.


Before you start contemplating a career switch to interpretive dance or underwater basket weaving, let's take a collective breath.

Data engineering, with all its idiosyncrasies, remains an awe-inspiring odyssey. It's a realm where every data hiccup, every "specificity struggle", and every "deliverables dilemma" is a riddle waiting to be solved. It's a journey full of plot twists in the grand narrative of digital adventures.

As you navigate the complexities of your data realm, remember this: Every challenge is an opportunity in disguise, and every specific request is a chance to showcase your data knowledge.

Here's to the joy of turning chaos into coherence. Data engineering, you quirky, challenging, and utterly amazing field, we wouldn't trade you for anything else—even if interpretive dance does have its allure. Until our next data-driven adventure, keep coding, laughing, and keep engineering the future! 🚀✨