Course Overview
Apache Spark is powerful, but only when optimized. Most Spark performance issues boil down to a handful of root causes: shuffle, skew, spill, serialization, and storage. In this two-day course, you’ll learn to diagnose and resolve these issues using the Spark UI, targeted optimization techniques, and features introduced in Spark 3.x.
The course also covers how to optimize query execution, tune shuffle partitions, and structure data using Delta Lake, partitioning strategies, and data skipping. You’ll apply these skills hands-on to improve the performance of real-world workloads and to design better clusters. Whether you’re tuning SQL queries or preparing large-scale machine learning pipelines, this course will help you get the most out of Spark and Databricks.
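As a small taste of the Spark 3.x features covered, the sketch below shows how Adaptive Query Execution and shuffle-partition settings might be enabled when building a session. This is a minimal illustration under assumptions of our own; the application name and partition count are placeholders, not course defaults.

```python
# Minimal PySpark sketch: enabling Spark 3.x adaptive features.
# The app name and partition count below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("perf-tuning-demo")
    # AQE lets Spark coalesce shuffle partitions and split skewed
    # join partitions at runtime instead of relying on static settings.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Static starting point for shuffle parallelism; AQE may reduce it.
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)
```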
Course Objectives
By the end of this course, you’ll be able to identify and fix common Spark performance bottlenecks. You’ll also understand how to apply Spark 3.x features and cluster design strategies to improve efficiency.
- Diagnose skew, spill, shuffle, storage, and serialization issues
- Use the Spark UI to investigate performance bottlenecks
- Apply performance tuning techniques during data ingestion
- Use features like Z-ordering, bucketing, and Adaptive Query Execution (AQE), as sketched after this list
- Configure a Databricks cluster for optimal Spark performance
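To give the Z-ordering and bucketing objectives above some shape, here is a hedged sketch of what those operations can look like in a Databricks notebook. The table and column names (events, user_id, events_bucketed) are hypothetical placeholders, not course datasets.

```python
# Hedged data-layout examples; table and column names are hypothetical.

# Z-order a Delta table so files are co-located by a frequently filtered
# column, improving data skipping (Delta Lake OPTIMIZE on Databricks).
spark.sql("OPTIMIZE events ZORDER BY (user_id)")

# Bucket a table on the join key at write time so later joins on
# user_id can avoid a full shuffle.
(
    spark.table("events")
    .write
    .format("parquet")
    .bucketBy(16, "user_id")
    .sortBy("user_id")
    .saveAsTable("events_bucketed")
)
```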