Apache Spark Big Data Development Training Course

Course Overview:
This three-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers who want a thorough, hands-on overview of the Apache Spark platform.

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the platform, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Each topic includes slide and lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. All class code is directly usable with pure open-source Spark or any commercial Spark distribution.

Learning Objectives:
After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with Spark SQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs

Audience:
• Data Analysts
• Software Developers

Prerequisites:
• Basic Python or Scala is required
• SQL is helpful but not required

Course Outline:
• Spark Overview
• RDD Fundamentals
• Spark SQL and DataFrames
• Spark Job Execution
• Cluster Architectures for Spark
• Intro to Spark Streaming
• Machine Learning Basics

Duration:
3 days

Certification Exam:
Candidates who pass the exam receive the "Certified Spark Developer" certificate, issued jointly by Databricks and O’Reilly.
