Data Engineering Using PySpark
About the Course

Welcome to the 5-day Data Engineering Using PySpark course. This training provides in-depth knowledge and hands-on experience in using Apache Spark and Hadoop for big data processing and analytics.
Throughout the course, you will build a solid understanding of the core concepts and practical techniques needed to work effectively with PySpark, the Python API for Apache Spark, and Hadoop, a framework for distributed storage and processing.
​
Course Benefits

- Gain hands-on experience in processing and analyzing big data using PySpark and Hadoop
- Learn from industry experts with extensive experience in the field
- Understand the fundamental concepts and practical techniques for distributed data processing
- Apply your knowledge through real-world use cases and hands-on lab exercises
- Enhance your career prospects in the rapidly growing field of big data analytics
​
Who Should Attend

This training is suitable for:
- Data engineers and developers
- Data analysts and scientists
- Software engineers
- IT professionals
- Anyone interested in learning PySpark and Hadoop for big data processing
​
Course Schedule

The course spans 5 days and covers the following topics:
- Day 1: Introduction to Apache Hadoop and HDFS
- Day 2: Introduction to Apache Spark and PySpark Basics
- Day 3: Advanced PySpark Concepts and PySpark Streaming
- Day 4: Integration of PySpark and Prefect
- Day 5: Comprehensive Project and Q&A
​
Course Fee
- Instructor-led In-person: $2999
- Instructor-led Online: $1999