Wednesday, January 15, 2020

Common Data Engineer Interview Questions

Azure Cloud

1. What is the difference between Azure Data Lake Storage Gen1 and Gen2?
2. What types of roles exist in Azure Data Factory, and how do they differ?
3. What is needed in Data Factory to copy data from an on-premises database to Azure storage (ADLS)?
4. What are integration runtimes in Data Factory, and what types are there?
5. What are the different ways to connect Azure Databricks to Azure Data Lake Storage?

Open-Source Distributed Systems

    Spark
      1. Explain your understanding of the Spark architecture.
      2. What are broadcast variables and accumulators?
      3. What is the difference between Spark cluster mode and client mode?
      4. What is the difference between checkpointing and caching?
      5. What are the types of caching (storage levels) in Spark, and how does cache() differ from persist()?
      6. What types of transformations does Spark have?
      7. What are Spark jobs, stages, and tasks, and how do they differ?
      8. What determines the number of stages created in a Spark job?
      9. What is Delta Lake?
      10. What is VACUUM in Delta Lake, and what is time travel?
      11. Which Spark optimization techniques help with joining large tables and with skewed data?
      12. Explain Spark partitioning. How is parallelism achieved in Spark?
      13. How does Spark ensure fault tolerance?
      14. What are bucketing and partition pruning in Spark?
      15. What is lazy evaluation (lazy execution) in Spark?
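
Question 11 mentions skewed data. One common technique for a skewed join is key salting: each row on the large, skewed side gets a random salt value, the small side is replicated once per salt value, and the join runs on (key, salt) instead of key alone. The sketch below is a pure-Python illustration, not Spark code: plain lists stand in for DataFrames, and the names NUM_SALTS, salt_large_side, explode_small_side, and hash_join are hypothetical. In Spark, the same idea would typically be expressed by adding a salt column to the large DataFrame and exploding the small one before the join.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical salt factor; in practice tuned to the degree of skew


def salt_large_side(rows, num_salts, rng):
    """Give each (key, value) row of the skewed side a random salt: ((key, salt), value)."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts):
    """Replicate each (key, value) row of the small side once per possible salt."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Inner-join two lists of (join_key, value) pairs on join_key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


# Toy data: "hot" is heavily skewed compared with "cold".
large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
salted_large = salt_large_side(large, NUM_SALTS, rng)
salted_small = explode_small_side(small, NUM_SALTS)

# Join on (key, salt), then drop the salt to recover the original join key.
salted_result = [
    (key, lv, rv) for (key, _salt), lv, rv in hash_join(salted_large, salted_small)
]

# The salted join returns the same rows as a plain join on key alone.
plain_result = hash_join(large, small)
assert sorted(salted_result) == sorted(plain_result)

# The 1000 "hot" rows are now split across up to NUM_SALTS distinct join keys.
hot_salt_counts = Counter(salt for (key, salt), _ in salted_large if key == "hot")
print(hot_salt_counts)
```

With hash partitioning, every row that shares a join key lands in the same partition, so the "hot" key would normally sit in one partition. Salting lets a shuffle on (key, salt) spread it across up to NUM_SALTS partitions, at the cost of replicating the small side NUM_SALTS times. Spark 3.x can also split skewed partitions automatically through adaptive query execution (spark.sql.adaptive.skewJoin.enabled).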



     Hive
      1. What is HCatalog in Hive?

     

      File Formats
       1. What are the advantages of Parquet files?
       2. What types of Parquet files are there?


