Course Outline
Introduction:
- Apache Spark within the Hadoop Ecosystem
- Brief introduction to Python and Scala
Fundamentals (Theory):
- Architecture Overview
- RDD Concepts
- Transformations and Actions
- Stages, Tasks, and Dependencies
Practical Workshop: Mastering Basics in the Databricks Environment:
- Exercises using the RDD API
- Essential action and transformation functions
- PairRDDs
- Join operations
- Optimizing with caching strategies
- Exercises using the DataFrame API
- Spark SQL
- DataFrame operations: select, filter, group, sort
- User Defined Functions (UDFs)
- Introduction to the Dataset API
- Streaming capabilities
Practical Workshop: Deployment in the AWS Environment:
- Fundamentals of AWS Glue
- Comparing AWS EMR and AWS Glue
- Example jobs across both environments
- Analysis of pros and cons
Additional Topics:
- Introduction to Apache Airflow for orchestration
Requirements
Programming skills (preferably in Python or Scala)
Basic knowledge of SQL
21 Hours
Testimonials (3)
Having hands-on sessions / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Course - Apache Spark in the Cloud
Get to learn Spark Streaming, Databricks and AWS Redshift