This three-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers who want a thorough, hands-on overview of the Apache Spark platform.
The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the platform, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.
Each topic includes slide and lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. All class code is directly usable with pure open-source Spark or any commercial Spark distribution.
After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with SparkSQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs
This course is aimed primarily at:
• Data Analysts
• Software Developers
Prerequisites:
• Basic Python or Scala required
• SQL is helpful but not required
Topics covered:
• Spark Overview
• RDD Fundamentals
• SparkSQL and DataFrames
• Spark Job Execution
• Cluster Architectures for Spark
• Intro to Spark Streaming
• Machine Learning Basics
Attendees who pass the exam receive the “Certified Spark Developer” certificate, issued jointly by Databricks and O’Reilly.