Thursday, January 16, 2020

How to Grow into a Data Engineer from an ETL, DBA, or Analyst Background at No Cost




This post highlights the core skills that anyone interested in becoming a data engineer needs to develop. I have added some reference material that I actually used for my own studies.

The skills below are ranked in order of priority.

1. Advanced SQL query knowledge
     https://youtu.be/HXV3zeQKqGY
    https://youtu.be/2Fn0WAyZV0E
2. Intermediate to Advanced understanding of Relational Databases
     https://youtu.be/ztHopE5Wnpc
3. Intermediate understanding of Data Modelling (Star Schema, Snowflake schema)
     https://youtu.be/tR_rOJPiEXc
     https://youtu.be/lWPiSZf7-uQ
4. Extract Transform Load basics
     https://youtu.be/7MOU1l30lXs
5. Data Warehousing
     https://youtu.be/lWPiSZf7-uQ
     https://youtu.be/CHYPF7jxlik

Most ETL developers, DBAs, and business analysts like me should already have the skills listed above.
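To make the first five topics concrete, here is a minimal sketch of a tiny ETL job that loads a few rows into a star schema and then queries it with SQL. It uses Python's built-in sqlite3 module so it runs anywhere at no cost. The table names (dim_product, dim_date, fact_sales), the sample rows, and the helper function names are my own assumptions for illustration, not taken from the videos above.

```python
import sqlite3

# Hypothetical raw extract: (date, product, category, quantity, unit_price)
RAW_SALES = [
    ("2020-01-14", "Laptop", "Electronics", 1, 900.0),
    ("2020-01-14", "Mouse", "Electronics", 2, 20.0),
    ("2020-01-15", "Desk", "Furniture", 1, 150.0),
    ("2020-01-15", "Laptop", "Electronics", 1, 900.0),
]


def build_star_schema(conn, rows):
    """Create two dimension tables and one fact table, then load the rows."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id   INTEGER PRIMARY KEY,
            full_date TEXT UNIQUE
        );
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            quantity   INTEGER,
            amount     REAL
        );
    """)
    for full_date, name, category, qty, price in rows:
        # Load: add each dimension member once (duplicates are ignored).
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (name, category),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_date (full_date) VALUES (?)",
            (full_date,),
        )
        # Transform: compute the sale amount and look up the surrogate keys.
        conn.execute(
            """
            INSERT INTO fact_sales (product_id, date_id, quantity, amount)
            SELECT p.product_id, d.date_id, ?, ?
            FROM dim_product p, dim_date d
            WHERE p.name = ? AND d.full_date = ?
            """,
            (qty, qty * price, name, full_date),
        )
    conn.commit()


def revenue_by_category(conn):
    """Typical warehouse query: join the fact to a dimension and aggregate."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount) AS revenue
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY revenue DESC
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn, RAW_SALES)
    for category, revenue in revenue_by_category(conn):
        print(category, revenue)
```

A real warehouse would use a proper database engine and bulk loads rather than row-by-row inserts, but the shape (dimensions, facts, join, aggregate) is the same.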

Additional Core Skills for Data Engineering (I had to learn these)
1. Deep understanding of the fundamentals of Big Data and distributed systems
     https://youtu.be/tpspO9K28PM
     https://youtu.be/Y6Ev8GIlbxc
2. Apache Spark architecture and programming in Spark
     https://youtu.be/CF5Ewk0GxiQ
     https://youtu.be/GFC2gOL1p9k
     https://youtu.be/dq73Ghk3MQg
     Note: the videos above might not be comprehensive, so feel free to go deeper. Also note that the RDD API is no longer in common use; focus instead on Spark SQL and the PySpark or Scala APIs.
3. Intermediate-level Python programming
    https://youtu.be/rfscVS0vtbw
    https://youtu.be/mkv5mxYu0Wk (Python for Data Science)
    https://youtu.be/vmEHCJofslg (learn Pandas library)
    https://www.youtube.com/watch?v=K8L6KVGG-7o
4. Cloud computing fundamentals (Azure, AWS)
     https://www.youtube.com/playlist?list=PL-V4YVm6AmwWLTTwZdI7hcpKqTpFUIKUE (Azure)
5. Hadoop Distributed File System (HDFS) architecture
     https://youtu.be/pY0Wgbe712o
   
6. Big Data File Formats
     https://youtu.be/UXhyENkYokw
     https://youtu.be/rVC9F1y38oU
7. Hive
    https://youtu.be/AcpGl0TQIRM 
8. Optimization Techniques for all the above systems or topics

Additional Skills that are also needed but not a priority
1. Kafka and streaming tools (such as Spark Streaming)
2. NoSQL databases
3. Continuous Integration and Continuous Delivery (CI/CD) practices
4. Data Lake basics
5. Cloud ETL tools
6. Graph Databases
7. Machine Learning
8. Microservices
     https://youtu.be/j1gU2oGFayY
9. MapReduce
10. Unix file system and shell scripting basics
11. Regex
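
For a feel of two of the items above, MapReduce and regex, here is a small sketch of the classic word count written in plain Python. It only imitates the map, shuffle, and reduce phases in a single process; a real framework distributes these phases across many machines. The function names and the tokenizing pattern are my own assumptions for illustration.

```python
import re
from collections import defaultdict
from functools import reduce

# Regex that treats runs of lowercase letters, digits, and apostrophes as words.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_phase(line):
    """Map: emit a (word, 1) pair for every word found in one line."""
    return [(word, 1) for word in WORD_RE.findall(line.lower())]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in grouped.items()
    }


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["Big data, big systems", "Data engineering"]))
```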
