Intertwined

Training Curriculum


Day 1

  • Introduction to Apache Hadoop

  • Hadoop Distributed File System (HDFS)

  • MapReduce Concepts and Execution
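
The map, shuffle, and reduce phases covered on Day 1 can be sketched in plain Python. This is a conceptual illustration only, not Hadoop code; the function names are invented for the sketch:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
```

In real Hadoop the map and reduce functions run on different nodes and the shuffle moves data across the network; the data flow, however, is exactly this.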


Day 2

  • Introduction to Apache Spark

  • PySpark Basics: RDDs (Resilient Distributed Datasets)

  • DataFrames and SQL in PySpark
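
A central Day 2 idea is that Spark transformations are lazy (they only build a plan) while actions trigger actual computation. A toy pure-Python sketch of that distinction, with an invented `MiniRDD` class standing in for a real RDD:

```python
class MiniRDD:
    """Toy stand-in for an RDD: transformations record a plan, actions run it."""

    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []  # deferred transformations, not yet executed

    def map(self, fn):
        # Transformation: nothing executes yet; we just record the step.
        return MiniRDD(self.data, self.ops + [("map", fn)])

    def filter(self, pred):
        # Also a transformation: returns a new plan, leaves the data untouched.
        return MiniRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):
        # Action: only now does the recorded pipeline actually run.
        items = iter(self.data)
        for kind, fn in self.ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

rdd = MiniRDD([1, 2, 3, 4, 5]).map(lambda x: x * x).filter(lambda x: x > 5)
result = rdd.collect()  # computation happens here, not at map/filter time
```

In PySpark the calls look the same (`rdd.map(...).filter(...).collect()`), but the plan is optimized and executed across a cluster.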


Day 3

  • Advanced PySpark Concepts: Transformations and Actions

  • PySpark Streaming

  • Data Analytics with PySpark
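
Spark streaming processes data in small micro-batches while maintaining running state between them. A minimal pure-Python sketch of that micro-batch model (not PySpark's streaming API; the names are invented for illustration):

```python
from collections import Counter

def process_stream(batches):
    """Consume micro-batches and keep a running word count, like a
    streaming aggregation whose state is updated after every batch."""
    state = Counter()
    for batch in batches:          # each batch is a list of lines
        for line in batch:
            state.update(line.split())
        yield dict(state)          # emit the updated result per batch

batches = [["spark streaming"], ["spark sql", "streaming demo"]]
snapshots = list(process_stream(batches))
```

Each snapshot reflects everything seen so far, which is the behavior of a running-aggregation query in Spark's streaming model.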


Day 4

  • Integration of PySpark and Prefect

  • Building Data Pipelines with PySpark and Prefect

  • Real-World Use Cases and Best Practices
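
Prefect organizes a pipeline into tasks composed inside a flow. The extract/transform/load structure such a flow would orchestrate can be sketched in plain Python (so it runs without Prefect installed); all function names and data here are invented for the example:

```python
# Each function plays the role of one pipeline task; in Prefect these would
# be decorated with @task and composed inside a @flow function.

def extract():
    # Stand-in for reading source data (in practice, e.g. a Spark read).
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": 25}]

def transform(rows):
    # Stand-in for a Spark transformation: flag the large orders.
    return [dict(row, large=row["amount"] > 20) for row in rows]

def load(rows):
    # Stand-in for writing results; here we just return a summary.
    return {"loaded": len(rows), "large_orders": sum(r["large"] for r in rows)}

def pipeline():
    """Run the tasks in dependency order, as an orchestrator would."""
    return load(transform(extract()))

summary = pipeline()
```

What an orchestrator like Prefect adds on top of this plain composition is scheduling, retries, logging, and visibility into each task run.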


Day 5

  • Comprehensive Project: Applying PySpark and Hadoop

  • Q&A and Course Wrap-Up
