Thursday, January 16, 2020
How to grow into a Data Engineer from an ETL, DBA, or Analyst background at no cost
This post highlights the core skills that anyone interested in becoming a data engineer needs to develop. I have added the reference material that I actually used for my own studies.
The skills below are ranked in order of priority.
1. Advanced SQL query knowledge
https://youtu.be/HXV3zeQKqGY
https://youtu.be/2Fn0WAyZV0E
2. Intermediate to Advanced understanding of Relational Databases
https://youtu.be/ztHopE5Wnpc
3. Intermediate understanding of Data Modelling (Star Schema, Snowflake schema)
https://youtu.be/tR_rOJPiEXc
https://youtu.be/lWPiSZf7-uQ
4. Extract Transform Load basics
https://youtu.be/7MOU1l30lXs
5. Data Warehousing
https://youtu.be/lWPiSZf7-uQ
https://youtu.be/CHYPF7jxlik
Most ETL developers, DBAs, and Business Analysts like me should already have the skills in the list above.
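To make the list above a bit more concrete, here is a minimal sketch that touches SQL, a star schema, and a basic extract-transform-load step, using only Python's built-in sqlite3 module. The table names (dim_product, fact_sales), the function names, and the sample rows are all made up for illustration; a real warehouse would have more dimensions and far more data.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system (extract).
# Columns: sale date, product name, category, quantity, unit price.
raw_sales = [
    ("2020-01-01", "Laptop", "Electronics", 2, 1000.0),
    ("2020-01-01", "Desk", "Furniture", 1, 300.0),
    ("2020-01-02", "Laptop", "Electronics", 1, 1000.0),
]


def build_warehouse(rows):
    """Load raw rows into a tiny star schema: one dimension, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE,
            category   TEXT
        );
        CREATE TABLE fact_sales (
            sale_date  TEXT,
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity   INTEGER,
            amount     REAL
        );
        """
    )
    for sale_date, name, category, quantity, unit_price in rows:
        # Each product is stored once in the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (name, category),
        )
        product_id = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)
        ).fetchone()[0]
        # Transform: derive the sale amount, then load the fact row.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (sale_date, product_id, quantity, quantity * unit_price),
        )
    conn.commit()
    return conn


def revenue_by_category(conn):
    """Typical warehouse query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY revenue DESC
        """
    ).fetchall()


conn = build_warehouse(raw_sales)
print(revenue_by_category(conn))
```

The point of the sketch is the shape: descriptive attributes live in the dimension table, measures live in the fact table, and reporting queries join the two.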
Additional Core Skills for Data Engineering (I had to learn these)
1. Deep understanding of the fundamentals of Big Data and distributed systems
https://youtu.be/tpspO9K28PM
https://youtu.be/Y6Ev8GIlbxc
2. Apache Spark architecture and programming in Spark
https://youtu.be/CF5Ewk0GxiQ
https://youtu.be/GFC2gOL1p9k
https://youtu.be/dq73Ghk3MQg
Note: the above videos might not be comprehensive, so feel free to go deeper. Also note that the RDD API is no longer in common use; focus instead on Spark SQL and the PySpark or Scala APIs.
3. Python programming intermediate level
https://youtu.be/rfscVS0vtbw
https://youtu.be/mkv5mxYu0Wk (Python for Data Science)
https://youtu.be/vmEHCJofslg (learn Pandas library)
https://www.youtube.com/watch?v=K8L6KVGG-7o
4. Cloud computing fundamentals (Azure, AWS)
https://www.youtube.com/playlist?list=PL-V4YVm6AmwWLTTwZdI7hcpKqTpFUIKUE (Azure)
5. Hadoop Distributed File System (HDFS) architecture
https://youtu.be/pY0Wgbe712o
6. Big Data File Formats
https://youtu.be/UXhyENkYokw
https://youtu.be/rVC9F1y38oU
7. Hive
https://youtu.be/AcpGl0TQIRM
8. Optimization techniques for all of the above systems and topics
Additional Skills that are also needed but not a priority
1. Kafka and streaming tools like Spark Streaming
2. NoSQL Databases
3. Continuous Integration and Continuous Delivery (CI/CD) practices
4. Data Lake basics
5. Cloud ETL tools
6. Graph Databases
7. Machine Learning
8. Microservices
https://youtu.be/j1gU2oGFayY
9. MapReduce
10. Unix file system and shell scripting basics
11. Regex
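MapReduce from the list above is easier to grasp with a toy example. The sketch below imitates the map, shuffle, and reduce phases for a word count in plain Python on a single machine. The function names and sample lines are my own illustration and not the Hadoop API; a real framework runs these phases in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input lines for illustration.
lines = ["spark hive spark", "hive kafka"]
print(word_count(lines))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across machines, which is the idea that distributed engines build on.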