Data Engineering with Databricks Overview
Data professionals from all walks of life will benefit from this comprehensive introduction to the components of the Databricks Lakehouse Platform that directly support putting ETL pipelines into production. This course teaches learners how to leverage SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources to power analytic applications and dashboards in the Lakehouse.
Through hands-on instruction, you'll explore key features of the platform, including the Databricks Data Science & Engineering Workspace, Databricks SQL, Delta Live Tables, Databricks Repos, Databricks Task Orchestration, and Unity Catalog. By the end of the course, you’ll be prepared to apply these skills in a real-world environment and sit for the Databricks Certified Data Engineer Associate exam.
Course Objectives
By the end of this course, you'll understand the essential components of the Databricks Lakehouse Platform used for building and managing ETL pipelines. You’ll learn how to use SQL and Python to define pipelines that process data incrementally and power real-time dashboards and analytic applications. You will gain hands-on experience building production-grade workflows using Delta Live Tables, scheduling and orchestrating tasks, managing version control with Databricks Repos, and applying governance best practices with Unity Catalog. This course also prepares you to take the Databricks Certified Data Engineer Associate exam, equipping you with the foundational skills needed to succeed in data engineering roles.
- Top-rated instructors: Our crew of subject matter experts has an average instructor rating of 4.8 out of 5 across thousands of reviews.
- Authorized content: We maintain more than 35 Authorized Training Partnerships with the top players in tech, ensuring your course materials contain the most relevant and up-to-date information.
- Interactive classroom participation: Our virtual training includes live lectures, demonstrations, and virtual labs, so you can join discussions with your instructor and classmates and get real-time feedback.
- Post-class resources: Review your class content, catch up on any material you may have missed, or perfect your new skills with access to resources after your course is complete.
- Private Group Training: Let our world-class instructors deliver exclusive training courses just for your employees. Our private group training is designed to promote your team’s shared growth and skill development.
- Tailored Training Solutions: Our subject matter experts can customize the class to specifically address the unique goals of your team.
Data Engineering with Databricks Agenda
Day 1
- Delta Lake
- Relational entities on Databricks
- ETL with Spark SQL
- Incremental data processing with Structured Streaming and Auto Loader
Day 2
- Medallion architecture in the data lakehouse
- Delta Live Tables
- Task orchestration with Databricks Jobs
- Databricks SQL
- Managing permissions in the lakehouse
- Productionizing dashboards and queries on Databricks SQL