Monday, July 6, 2020

BECOME A BIG DATA ENGINEER IN 3 MONTHS with less than $100 investment

INTRO:

Below is my guide to becoming a data engineer, based on current job market demands (08/08/2020). I have outlined the TOP 5 foundational skills needed to be successful. For each one, I list useful resources that I have reviewed and found sufficient, along with a realistic time allocation to absorb the material and its estimated cost. Most of the courses are on Udemy, but feel free to use any other website. I have another blog post about using YouTube.
Please note that practice is crucial when self-learning, so budget 3 times the duration of the course videos.

1. Good Foundational Knowledge of SQL Programming (SQL Query writing)

The Complete SQL Bootcamp 2020: Go from Zero to Hero (on Udemy)
This is a good first step to get you from beginner to intermediate in SQL.

Cost: 11-15 dollars

Course time: 9 hrs

Learning Time: 3 weeks (Spending 10 hrs a week)


SQL - Beyond The Basics
This course focuses on advanced concepts that are crucial in getting through most interviews these days.

Cost: 11-15 dollars
Course time: 5 hrs
Learning Time: 1.5 weeks
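
To give a feel for what "beyond the basics" can mean, here is a minimal sketch of a window function, a topic that often comes up in SQL interviews. Picking window functions is my assumption; the course description above does not list its topics. The sales table and its columns are made up for illustration, and the query runs on Python's built-in sqlite3 module (it needs SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical sample data: table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Ann", 500),
        ("East", "Bob", 300),
        ("West", "Cid", 700),
        ("West", "Dee", 900),
    ],
)

# RANK() restarts at 1 inside each region (the PARTITION BY group).
query = """
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, rank_in_region
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Unlike GROUP BY, the window function keeps every row and adds the rank next to it, which is why it is handy for "top N per group" questions.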


2. Knowledge of Python Programming (only applies if you do not have any programming background)


a. Beginner to Intermediate Python Course:

Complete Python Bootcamp: From Zero to Hero in Python:
This will give you a good grasp on some fundamentals of coding in Python and object-oriented programming. 

https://www.udemy.com/course/complete-python-bootcamp/

Cost: 15-20 dollars
Course Time: 24 hrs
Learning Time: 1 month

b. Python for Data Analysis: NumPy, Pandas DataFrame

These free YouTube videos are very comprehensive. They go over the most popular Python libraries used in the real world for data analysis, such as Pandas and NumPy. Feel free to skip the 4 hr course and jump straight to Pandas if you don't have time.


Numpy + Pandas 4 hr course
https://youtu.be/r-uOLxNrNk8

Pandas 1 hr course
https://youtu.be/vmEHCJofslg

Pandas Advanced concepts 1 hr course
https://youtu.be/P_t8LO-KgWM

c. Python Algorithms and Data Structures (for Mid to Senior Data Engineers)

Python for Data Structures algorithms and interviews
This course is crucial for understanding the fundamentals of software engineering. Please note you have to be at an intermediate level before taking this course. This is essential to get through most coding interviews for mid or senior roles.

https://www.udemy.com/course/python-for-data-structures-algorithms-and-interviews/

Cost: 12-15 dollars
Course Time: 17 hrs
Learning Time: 1 month
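
As a taste of the kind of problem coding interviews tend to ask, here is a small sketch of the classic "two sum" exercise solved with a dictionary (hash map). The choice of problem and the function name two_sum are mine, not taken from the course.

```python
def two_sum(numbers, target):
    """Return indices (i, j) with i < j and numbers[i] + numbers[j] == target, or None."""
    seen = {}  # value -> index where that value was first seen
    for j, value in enumerate(numbers):
        complement = target - value
        if complement in seen:
            return seen[complement], j
        # Keep the earliest index for each value.
        seen.setdefault(value, j)
    return None


print(two_sum([2, 7, 11, 15], 9))
```

The dictionary lookup replaces an inner loop, so the list is scanned once instead of comparing every pair. Explaining that kind of trade-off is usually what the interviewer is looking for.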

3. Good understanding of an ETL Computing Engine for Big Data: Spark/Databricks


a. Create a Databricks community edition account so you can have a platform to practice

b. Understand Spark architecture and the overall capabilities of Spark (Scala-based course):

I have not watched this video course, but it promises to go over the in-depth architecture of Spark and Scala (which is the primary language of Spark). Don't worry about Scala: Spark also supports SQL and Python, so you don't need to be proficient in it.

Spark Essentials
Cost: 11-15 dollars
Course Time: 7.5 hrs
Course Learning Time: 3 weeks

c. Optional: PySpark Tutorial. Knowledge of SQL and Python will make learning PySpark very easy

PySpark for Spark
If your SQL is really strong, Spark SQL will be sufficient to work in Spark for most Data Warehousing use cases. You mainly need PySpark for Spark streaming and machine learning use cases. These can be learned on the job by searching Google, or you can take this course:

https://www.udemy.com/course/spark-and-python-for-big-data-with-pyspark/
Cost: 11-15 dollars
Course Time: 11 hrs
Course Learning Time: 3 weeks

d. Databricks /Spark Optimization: this is important because a lot of interviews ask about this


Note that if you have good knowledge of SQL and Python you can work a lot with Spark

Video time: 1 hr

e. Learn a simple ETL tool in Azure: Azure Data Factory

Comprehensive overview playlist
https://www.youtube.com/watch?v=Mc9JAra8WZU&list=PLMWaZteqtEaLTJffbbBzVOv9C0otal1FO

Advanced Data Factory concepts (Parameterization)
https://youtu.be/K5Ak4IdtBCo

4. Cloud Knowledge (AWS or Azure); Get Certified if possible

This will help your resume if you don't have experience. This knowledge is a bonus, but it is also crucial for hitting the ground running on various projects. Most jobs right now involve migrating data analytics solutions from on-premises to the cloud, so a good grasp of cloud architecture is important.

I would start with one of these:
Azure AZ-900: Fundamentals of Azure

AWS Solutions Architect: Fundamentals of AWS

Pick one
Total Course Hours: 6-10 hrs
My estimated Learning Time: 3 weeks

PS: I don't have much knowledge of AWS. I only know Azure



5. Basics of Data Modelling: For Data Engineers who work more with Business Intelligence use cases

Data warehousing and data modelling knowledge helps data engineers deliver efficient analytics solutions. Normally, you gain this knowledge with experience. However, if you really want to differentiate yourself on the job, it is good to learn it.

Data Modelling Fundamentals
https://www.udemy.com/course/mastering-data-modeling-fundamentals/
Cost 13 dollars
Course Time: 3hrs


Data Warehousing Fundamentals

https://www.youtube.com/watch?v=J326LIUrZM8
https://youtu.be/lWPiSZf7-uQ

Total Hours: 8 hrs
My estimated Learning hours: 2 weeks
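
To make the idea concrete, here is a minimal sketch of a star schema, a common data warehouse modelling pattern: one fact table holding measurements and one dimension table describing them. The table names, columns, and rows are hypothetical, and I am assuming the resources above cover this pattern. It uses Python's built-in sqlite3 module so you can run it without installing anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes, one row per product.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: one row per sale, pointing at the dimension by key.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 20.0),
        (2, 1, 1, 10.0),
        (3, 2, 1, 60.0);
""")

# A typical BI query: join the fact to its dimension, then aggregate.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(revenue_by_category)
```

Keeping the numbers in the fact table and the descriptions in the dimension table is what lets reporting tools slice the same measures by different attributes.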


6. Relational Database Concepts and Fundamentals

Database Lessons
This YouTube playlist goes over the core concepts of a relational database, such as ACID properties and indexes. It is important for engineers working with data stored in relational databases.

https://www.youtube.com/playlist?list=PL1LIXLIF50uXWJ9alDSXClzNCMynac38g
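
As a small illustration of the "A" in ACID (atomicity), here is a sketch of a money transfer where both updates succeed together or not at all. The accounts table and the transfer function are hypothetical, and the example uses Python's built-in sqlite3 module rather than anything shown in the playlist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move amount between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            # Fails the CHECK constraint if the sender would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The first update (crediting bob) is undone when the second one fails, so the data never ends up in a half-finished state.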

Senior/Advanced level Data Engineers need the below as well (note that I am not there yet).

7. Hadoop Architecture and Ecosystem


Hadoop Developer In Real World
This is a really good course that covers the essentials for working as a data engineer in the big data space. It covers the most useful parts of the Hadoop ecosystem that are used in the real world, for example big data file formats, Hive, and Spark.

https://www.udemy.com/course/hadoopinrealworld/


8. Build real-time Analytics Pipeline with Kafka/Event Hubs and Spark Streaming

https://www.udemy.com/course/kafka-streams-real-time-stream-processing-master-class/

9. Massively Parallel Processing Databases (Snowflake, Azure DW, Redshift, Netezza, Teradata)


https://youtu.be/NUGcAUyQY-k

10. Working with NOSQL database (MongoDB, CosmosDB)

https://www.udemy.com/course/learn-mongodb-leading-nosql-database-from-scratch/

11. Working with Graph Databases

https://www.udemy.com/course/neo4j-foundations/

12. Python for Data Science and Machine Learning

This course is important for Data Engineering roles that also involve working as a Data Scientist. It goes over machine learning models and techniques.

https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Cost: 12-15 dollars
Course Time: 24 hrs
Learning Time: 3 weeks (skip the parts that are not relevant)

Summary

Getting into a Data Engineering career is not easy, but I believe hard work and dedication can get you there. If you dedicate 3 months of absolute focus to learning, 4 to 5 hrs a day or 30 hrs a week, you can master most of these fundamental skills and start getting entry-level jobs.

I would spend 1 month on SQL, 1 month on Python, and the last month on Spark, Cloud, and Data Modelling.


Please reach out to me on LinkedIn with any questions, and follow me as well.

There is a lot more to learn, and I have another blog post that comprehensively lists the various tools and technologies a Data Engineer could use: http://plsqlandssisheadaches.blogspot.com/2020/01/how-i-transitioned-to-data-engineer-as.html

#dataengineer #datascience #bigdataengineer #dataengineering




