Apache Spark Big Data Development Training Course


This three-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers who want a thorough, hands-on overview of the Apache Spark platform.

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the platform, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Each topic includes slide and lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. All class code is directly usable with pure open-source Spark or any commercial Spark distribution.

After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with Spark SQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs

Who should attend:
• Data Analysts
• Software Developers

Prerequisites:
• Basic Python or Scala required
• SQL is helpful but not required

Topics covered:
• Spark Overview
• RDD Fundamentals
• Spark SQL and DataFrames
• Spark Job Execution
• Cluster Architectures for Spark
• Intro to Spark Streaming
• Machine Learning Basics


Participants who pass the exam receive the “Certified Spark Developer” certificate, issued jointly by Databricks and O’Reilly.

