Data engineering is a fast-growing field with no shortage of cloud platforms to build projects on. But how do you choose between options like AWS, GCP, and Azure as a beginner?

In a recent podcast, host Mohammad Arshad sat down with data engineering expert Pooja Jain to get her insight. With over 5 years of experience across banking, healthcare, and ecommerce, Pooja has worked on everything from predictive modeling to building data pipelines.

Key Factors to Consider

When evaluating cloud platforms, Pooja explains that the most important factor is identifying the specific data challenges you need to solve:

“Our focus should be more towards the activities, because let’s say I want to do the orchestration, or I want to store the data, or I want to do cleaning, or I want to focus more on streaming data, right? So each platform has their own set of services.”

Some key considerations include:

While Pooja has worked extensively with both AWS and GCP, she explains that:

“I feel more comfortable working with GCP the we are always comfortable with.”

But ultimately the right platform depends on the architecture and data infrastructure needed for your specific project.

Must-Have Skills for Aspiring Data Engineers

Besides just knowing Python and SQL, Pooja emphasizes the importance of conceptual knowledge:

“They have to understand the basic concepts of data engineering, the concepts of Big Data, how the Hadoop ecosystem has evolved, why we are moving towards the cloud, and what challenges we are facing.”

Here are some key skills she recommends focusing on:

Understanding these data engineering concepts will ensure you can design the right solutions.

Open Source Tools to Build Your Skills

While cloud platforms handle a lot of built-in complexity, Pooja suggests leveraging open source tools like:

Open source tools give beginners:

Starting with these tools allows young professionals to showcase relevant skills when applying for roles.

Getting Started with Data Engineering Projects

We all know working on projects is one of the fastest ways to skill up. So where should beginners focus their efforts?

Pooja outlines three key phases of any data engineering project:

  1. Ingesting and collecting data (extract/load)
  2. Processing and transforming data (transform)
  3. Consuming clean data for reporting, analytics or ML models (visualize/predict)
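The three phases above can be sketched end to end in a few lines. This is only a minimal illustration using Python's standard library, not a method described in the podcast: the sample CSV, the column names, and the function names (`ingest`, `transform`, `load_and_report`) are all hypothetical stand-ins for whatever dataset and tools you pick.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an open dataset (assumption).
RAW_CSV = """order_id,city,amount
1,Pune,120.50
2,pune ,80
3,Mumbai,
4,Mumbai,200.00
"""


def ingest(raw_text):
    """Phase 1: read the raw file as-is into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Phase 2: drop rows with a missing amount and normalize the fields."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        city = row["city"].strip().title()
        clean.append((int(row["order_id"]), city, float(amount)))
    return clean


def load_and_report(clean_rows):
    """Phase 3: load clean rows into a table and run a reporting query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
    totals = conn.execute(
        "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return totals


totals = load_and_report(transform(ingest(RAW_CSV)))
print(totals)
```

In a real project each function would likely be replaced by a dedicated tool or cloud service, but keeping the phases as separate steps makes it easier to show each skill in a portfolio.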

Open datasets provide great fodder for mock projects. But Pooja offers an alternative idea for scenarios where additional data is needed:

“There are two ways: one is you can utilize the existing data sets and get them into your raw zone…the other is to just utilize the open source tools and technologies and then try to do.”

Rather than focus on analyzing insights, she suggests showcasing your ability to:

This develops expertise needed to shine in any data engineering interview.

Using Generative AI as an Asset

No conversation about the future of technology is complete without discussing red-hot trends like generative AI and ChatGPT. So what impact do tools like these have?

“Generative AI is not something separate. You can use it as a tool to do whatever you are doing in a better manner; that is the starting point.”

Mohammad points out that relying on AI alone eliminates the practice that is critical for building expertise. But Pooja sees it becoming an asset that eliminates rote tasks:

“It gives a very uh good compactor you know set of services that we can invent use it we don’t have to exclusively call or do anything.”

The key is finding opportunities for generative AI to augment human creativity rather than replace it completely.

Big Data Skills Still Needed for Cloud Migration

It’s easy to dismiss legacy ecosystems like Hadoop and MapReduce as dated. But Pooja argues that some of these skills remain relevant today:

“If a person doesn’t know Big Data, either Hadoop systems, HDFS, Hive, and all those things, how are they going to move it to the cloud? The data migration, there used to be Oozie workflows at that time.”

Understanding these frameworks helps with challenges like:

So while specifics around coding algorithms are less relevant, the architectural concepts around handling big data are still enormously important.

Growing Your Audience on LinkedIn

With over 25,000 followers, Pooja has seen great success using LinkedIn to share data engineering knowledge. So what lessons can help beginners find their own audience?

“It’s important to consume the content. It’s important to understand what is actually happening on the planet, and what it is that the experienced professionals are actually trying to convey.”

She suggests 3 simple steps to get started:

  1. Identify relevant professionals to follow and engage with. Comment thoughtfully on their posts when you have something useful to contribute.
  2. Share articles or posts you come across that provide value to the community.
  3. Summarize key learnings or insights to start establishing your expertise.

Rather than contributing noise, focus on understanding community needs first.

The Importance of Patience and Consistency

Like any skill, building influence takes time and focused effort. Pooja closes with an important reminder for anyone starting their data journey:

“We have to give some time and stay consistent. We cannot expect, you know, immediate results.”

Stay motivated by finding little ways to add value every single day, whether:

Analyzing data teaches the importance of systematic thinking. Apply that mindset to your own career growth by favoring consistent progress over quick hacks.

Start Your Data Engineering Journey Today

Mohammad and Pooja covered several key concepts relevant to anyone getting started in data engineering. Hopefully their insights have piqued your interest!

Here are three simple action steps you can take right away:

  1. Join online communities centered around big data, cloud platforms, AI, and related topics. Follow professionals like Pooja who actively share their expertise.
  2. Identify an open dataset that aligns with your industry interests. Use tools listed in this article to start building a portfolio project showcasing core data skills.
  3. Set up a LinkedIn profile to establish your personal brand as you continue learning.

What resonated with you most from today’s conversation? Share your key takeaways in the comments!
