Data Engineering Using PySpark

About the Course

Welcome to the Data Engineering Using PySpark 5-day course! This comprehensive training is designed to provide you with in-depth knowledge and hands-on experience in using Apache Spark and Hadoop for big data processing and analytics.

Throughout this course, you will gain a solid understanding of the core concepts and practical techniques required to work effectively with PySpark, the Python API for Apache Spark, and Hadoop, a framework for distributed storage and processing of large datasets.
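
To give a sense of the style of code you will write during the course, below is a minimal, illustrative PySpark sketch of a distributed aggregation. The file name and column names ("events.csv", "user_id", "amount") are placeholders for illustration only, not course material.

    # A minimal PySpark sketch: read a CSV into a distributed DataFrame
    # and run a simple aggregation. File and column names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("IntroExample").getOrCreate()

    # Load the CSV, inferring column types from the data.
    events = spark.read.csv("events.csv", header=True, inferSchema=True)

    # Group by user and total the amount column; Spark distributes this
    # work across the cluster's executors.
    totals = events.groupBy("user_id").agg(F.sum("amount").alias("total_amount"))

    totals.show()
    spark.stop()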

Course Benefits

  • Gain hands-on experience in processing and analyzing big data using PySpark and Hadoop

  • Learn from industry experts with extensive experience in the field

  • Understand the fundamental concepts and practical techniques for distributed data processing

  • Apply your knowledge through real-world use cases and hands-on lab exercises

  • Enhance your career prospects in the rapidly growing field of big data analytics

Who Should Attend

This training is suitable for:

  • Data engineers and developers

  • Data analysts and scientists

  • Software engineers

  • IT professionals

  • Anyone interested in learning PySpark and Hadoop for big data processing

Course Schedule

The course spans 5 days and covers the following topics:

  • Day 1: Introduction to Apache Hadoop and HDFS

  • Day 2: Introduction to Apache Spark and PySpark Basics

  • Day 3: Advanced PySpark Concepts and PySpark Streaming

  • Day 4: Integration of PySpark and Prefect

  • Day 5: Comprehensive Project and Q&A

Course Fee

  • Instructor-led In-person: $2999

  • Instructor-led Online: $1999
