Data engineering is a fast-growing field with no shortage of cloud platforms to build projects on. But how do you choose between options like AWS, GCP, and Azure as a beginner?
In a recent podcast, host Mohammad Arshad sat down with data engineering expert Pooja Jain to get her insight. With over 5 years of experience across banking, healthcare, and ecommerce, Pooja has worked on everything from predictive modeling to building data pipelines.
Key Factors to Consider
When evaluating cloud platforms, Pooja explains that the most important factor is identifying the specific data challenges you need to solve:
“Our focus should be more towards the activities. Let’s say I want to do the orchestration, or I want to store the data, or I want to do cleaning, or I want to focus more on streaming data. Each platform has its own set of services.”
Some key considerations include:
- Type of data you need to process: batch, streaming, structured, unstructured, etc.
- Tools and services required for pipeline orchestration, ETL, data storage, etc.
- Cost optimization – pricing and free tier services differ across platforms
- Ease of use – GCP tends to have a gentler learning curve according to Pooja
- Team skills & experience – leverage existing knowledge
While Pooja has worked extensively with both AWS and GCP, she explains that:
“I feel more comfortable working with GCP; it is the one we are always comfortable with.”
But ultimately the right platform depends on the architecture and data infrastructure needed for your specific project.
Must-Have Skills for Aspiring Data Engineers
Besides just knowing Python and SQL, Pooja emphasizes the importance of conceptual knowledge:
“They have to understand the basic concepts of data engineering: the concepts of Big Data, how the Hadoop ecosystem has evolved, why we are moving towards cloud, and what challenges we are facing.”
Here are some key skills she recommends focusing on:
- Relational databases and SQL
- Non-relational (NoSQL) databases
- ETL vs ELT pipelines (see the sketch below)
- Data warehousing and data lakes
- Hadoop ecosystems
- Cloud migration challenges
Understanding these data engineering concepts will ensure you can design the right solutions.
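To make the ETL vs ELT distinction concrete, here is a minimal Python sketch. Everything in it is illustrative: the input file, the table names, and the `load_row` / `run_sql` helpers are hypothetical stand-ins for whatever storage and warehouse you use.

```python
import csv

# ETL: transform in your own code *before* loading into the warehouse.
def etl(path, load_row):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row["email"] = row["email"].strip().lower()  # transform first...
            load_row("users_clean", row)                 # ...then load

# ELT: load the raw rows as-is, then transform *inside* the warehouse,
# typically with SQL (this is where a tool like dbt fits in).
def elt(path, load_row, run_sql):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            load_row("users_raw", row)                   # load first...
    run_sql("""
        CREATE TABLE users_clean AS
        SELECT LOWER(TRIM(email)) AS email FROM users_raw
    """)                                                 # ...transform later
```

The trade-off to notice: ETL keeps only clean data in the warehouse but couples transformation to your pipeline code, while ELT keeps the raw history and pushes transformation to where the compute and SQL tooling live.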
Open Source Tools to Build Your Skills
While cloud platforms hide a lot of this complexity behind managed services, Pooja suggests leveraging open source tools like:
- Apache Spark for processing huge datasets (see the sketch after this list)
- Kafka for streaming pipelines
- Airflow for pipeline orchestration
- dbt for data transformation
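As a taste of what a first step with these tools looks like, here is a minimal batch aggregation in PySpark (assuming `pip install pyspark`; the CSV file and column names are placeholders):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("first-spark-job").getOrCreate()

# Read a batch dataset; Spark infers column types from the header and data.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Aggregate: total revenue per country, largest first.
(orders.groupBy("country")
       .agg(F.sum("amount").alias("revenue"))
       .orderBy(F.desc("revenue"))
       .show())
```

The same API scales from a CSV on your laptop to a multi-node cluster, which is exactly why Spark shows up so often in job descriptions.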
Open source tools give beginners:
- Exposure to common industry frameworks
- Ability to replicate real-world engineering challenges
- Flexibility to use across any platform or infrastructure
- Hands-on practice applying engineering concepts learned
Starting with these tools allows young professionals to showcase relevant skills when applying for roles.
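For the orchestration piece specifically, a first Airflow pipeline can be small. Below is a minimal sketch using the TaskFlow API from Airflow 2.4+; the task bodies are placeholders for real extract, transform, and load logic:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_pipeline():
    @task
    def extract():
        # Placeholder: pull rows from an API, file drop, or database.
        return [{"amount": "10.5"}, {"amount": "3.2"}]

    @task
    def transform(rows):
        # Placeholder: clean and aggregate the extracted rows.
        return sum(float(r["amount"]) for r in rows)

    @task
    def load(total):
        # Placeholder: write the result to a warehouse table.
        print(f"daily total: {total}")

    load(transform(extract()))  # Airflow infers task dependencies from this

daily_pipeline()
```

Dropping a file like this into Airflow’s `dags/` folder gives you scheduling, retries, and a monitoring UI for free, which is the kind of production concern interviewers like to see beginners understand.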
Getting Started with Data Engineering Projects
We all know working on projects is one of the fastest ways to skill up. So where should beginners focus their efforts?
Pooja outlines three key phases of any data engineering project (see the sketch after the list):
- Ingesting and collecting data (extract/load)
- Processing and transforming data (transform)
- Consuming clean data for reporting, analytics or ML models (visualize/predict)
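Those three phases map directly onto code. Here is a single-file sketch using pandas (assuming pandas and pyarrow are installed; the file and column names are hypothetical):

```python
import pandas as pd

# Phase 1: ingest -- land the source data in your raw zone unchanged.
raw = pd.read_csv("open_dataset.csv")  # placeholder source file
raw.to_parquet("events_raw.parquet")

# Phase 2: process/transform -- fix types, drop bad and duplicate rows.
df = pd.read_parquet("events_raw.parquet")
df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
clean = df.dropna(subset=["ts"]).drop_duplicates()
clean.to_parquet("events_clean.parquet")

# Phase 3: consume -- reports, dashboards, or ML models read the clean data.
daily_counts = clean.set_index("ts").resample("D").size()
print(daily_counts.tail())
```

In a real portfolio project each phase would grow into its own orchestrated task, but the shape stays the same.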
Open datasets provide great fodder for mock projects. But Pooja offers an alternative idea for scenarios where additional data is needed:
“There are two ways: one is you can utilize the existing datasets and get them into your raw zone… the other is to just utilize the open source tools and technologies and then try to do [it].”
Rather than focusing on the analytical insights themselves, she suggests showcasing your ability to:
- Build a reliable, scalable pipeline
- Handle various data formats
- Perform ETL operations with open source tools
- Follow best practices around monitoring, testing, and validation (see the sketch below)
This develops expertise needed to shine in any data engineering interview.
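On that last point, even lightweight validation makes a portfolio pipeline stand out. Here is a minimal sketch in plain Python, with illustrative column names and thresholds, checking the cleaned file from the earlier sketch:

```python
import pandas as pd

def validate(df):
    """Return a list of human-readable data-quality failures."""
    failures = []
    if df.empty:
        failures.append("no rows loaded")
    if df.duplicated().any():
        failures.append("duplicate rows after cleaning")
    if "amount" in df.columns and (df["amount"] < 0).any():
        failures.append("negative amounts")
    null_rate = df["ts"].isna().mean()
    if null_rate > 0.01:  # illustrative 1% threshold
        failures.append(f"ts null rate {null_rate:.1%} exceeds 1%")
    return failures

failures = validate(pd.read_parquet("events_clean.parquet"))
if failures:
    raise ValueError("; ".join(failures))  # fail the run loudly, not silently
```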
Using Generative AI as an Asset
No conversation about the future of technology is complete without discussing red-hot trends like generative AI and ChatGPT. So what impact do tools like these have?
“Generative AI is not something separate. You can use it as a tool to do whatever you are doing in a better manner. That is the starting point.”
Mohammad points out that relying on AI alone eliminates the practice that is critical for building expertise. But Pooja sees it becoming an asset that eliminates rote tasks:
“It gives a very good, compact set of services that we can use. We don’t have to exclusively code or do everything ourselves.”
The key is finding opportunities for generative AI to augment human creativity rather than replace it completely.
Big Data Skills Still Needed for Cloud Migration
It’s easy to dismiss legacy ecosystems like Hadoop and MapReduce as dated. But Pooja argues that much of that skill set remains relevant today:
“If a person doesn’t know Big Data, the Hadoop systems, HDFS, Hive, and all those things, how are they going to move it to the cloud, to do the data migration? There used to be easy workflows at that time.”
Understanding these frameworks helps with challenges like:
- Knowing limitations of data pipelines
- Replicating data architecture in the cloud
- Optimization for specific use cases
So while the specifics of coding those algorithms are less relevant, the architectural concepts around handling big data are still enormously important.
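One concrete example of that carry-over: in Spark, moving a job from on-prem HDFS to cloud object storage is often little more than a change of storage URI (given the right connector is installed), while the partitioning and file-format decisions learned on Hadoop transfer as-is. The paths below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("migration-demo").getOrCreate()

# On-prem Hadoop: the data lives in HDFS.
events = spark.read.parquet("hdfs://namenode:8020/warehouse/events/")

# After migration: the same job, with the storage URI swapped for the cloud.
# events = spark.read.parquet("gs://my-bucket/warehouse/events/")   # GCP (GCS)
# events = spark.read.parquet("s3a://my-bucket/warehouse/events/")  # AWS (S3)

# Architectural choices (partition column, columnar format) carry over intact.
events.write.mode("overwrite").partitionBy("event_date").parquet("out/events/")
```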
Growing Your Audience on LinkedIn
With over 25,000 followers, Pooja has seen great success using LinkedIn to share data engineering knowledge. So what lessons can help beginners find their own audience?
“It’s important to consume the content. It’s important to understand what is actually happening on the planet, what it is that the experienced professionals are actually trying to convey.”
She suggests three simple steps to get started:
- Identify relevant professionals to follow and engage with. Comment thoughtfully on their posts when you have something useful to contribute.
- Share articles or posts you come across that provide value to the community.
- Summarize key learnings or insights to start establishing your expertise.
Rather than contributing noise, focus on understanding community needs first.
The Importance of Patience and Consistency
Like any skill, building influence takes time and focused effort. Pooja closes with an important reminder for anyone starting their data journey:
“We have to give it some time and stay consistent. We cannot expect immediate results.”
Stay motivated by finding little ways to add value every single day, whether that means:
- Helping someone new learn concepts you know
- Testing out new tools for first-hand experience
- Building your online credibility through engagement
Analyzing data teaches the importance of systems thinking. Apply that mindset to your own career growth by prioritizing consistent progress over quick hacks.
Start Your Data Engineering Journey Today
Mohammad and Pooja covered several key concepts relevant to anyone getting started in data engineering. Hopefully their insights have piqued your interest!
Here are three simple action steps you can take right away:
- Join online communities centered around big data, cloud platforms, AI etc. Follow professionals like Pooja actively sharing their expertise.
- Identify an open dataset that aligns with your industry interests. Use the tools listed in this article to start building a portfolio project that showcases core data skills.
- Set up a LinkedIn profile to establish your personal brand as you continue learning.
What resonated with you most from today’s conversation? Share your key takeaways in the comments!