Apache Spark is an open-source data processing engine. Its ecosystem includes Spark SQL, Spark Streaming, MLlib, and GraphX, and Scala is the primary programming language used for processing data. Apache Kafka is a widely used big data platform for streaming data, and it integrates with Spark through Java APIs.
After completing the Apache Spark training, you will be able to:
- Understand Scala and its implementation
- Install Spark and implement Spark operations on Spark Shell
- Understand the role of Spark RDD
- Implement Spark applications on YARN (Hadoop)
- Learn the Spark Streaming API
- Implement machine learning algorithms using the Spark MLlib API
- Analyse Hive and Spark SQL architecture
- Understand the Spark GraphX API and implement graph algorithms
- Understand Kafka and its components
- Deploy Kafka clusters on Hadoop and YARN
- Understand real-time Kafka streaming
- Integrate Kafka with real-time streaming systems such as Spark Streaming
- Explore the Kafka API
This course is intended for:
- Professionals aspiring to work in Big Data Analytics
- Spark Developers
- Data Scientists
- Individuals looking for a career change
- Project Managers and Messaging and Queuing System professionals
Basic knowledge of big data, HDFS, and a programming language such as Java or Python is helpful, but not mandatory.
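To give a flavour of the Spark Shell and RDD operations covered in the course, here is a minimal word-count sketch in Scala. It assumes a spark-shell session, where the SparkContext `sc` is predefined; the input path is a placeholder for illustration.

```scala
// Word count on an RDD, as run inside spark-shell where `sc` is predefined.
// The HDFS path below is a placeholder, not a real dataset.
val lines  = sc.textFile("hdfs:///data/input.txt")   // read a text file into an RDD
val counts = lines
  .flatMap(_.split("\\s+"))                          // split each line into words
  .map(word => (word, 1))                            // pair each word with a count of 1
  .reduceByKey(_ + _)                                // sum the counts per word
counts.take(10).foreach(println)                     // print a small sample of results
```

Transformations such as `flatMap`, `map`, and `reduceByKey` are lazy; computation only happens when an action such as `take` is called.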
The curriculum covers the following modules:
- Introduction to Spark and getting started
- Resilient Distributed Datasets (RDDs) and DataFrames
- Spark application programming
- Introduction to the Spark ecosystem (Spark SQL)
- Spark Streaming
- Spark MLlib
- Spark GraphX
- Basic Object-Oriented Programming
- Case Objects and Classes
- Idiomatic Scala
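The Scala modules above cover case classes and idiomatic pattern matching. A minimal, self-contained sketch of both (the `Shape` hierarchy here is an illustrative example, not course material):

```scala
// Case classes and pattern matching: core idiomatic Scala.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object ShapeDemo {
  // Pattern matching deconstructs each case class in a type-safe way;
  // the sealed trait lets the compiler warn about missing cases.
  def area(s: Shape): Double = s match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  def main(args: Array[String]): Unit = {
    println(area(Circle(1.0)))
    println(area(Rectangle(2.0, 3.0)))
  }
}
```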
- Introduction to Apache Kafka
- Kafka Command Line
- Kafka Producer Java API
- Kafka Consumer Java API
- Kafka Connect and Spark Streaming
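As a taste of the Kafka Producer Java API module, here is a minimal producer written in Scala against the Java client. It is a sketch only: it assumes the `kafka-clients` library is on the classpath, a broker reachable at `localhost:9092`, and a topic named `demo-topic` (all placeholders).

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal Kafka producer using the Java client API from Scala.
// Broker address and topic name are illustrative placeholders.
object ProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // send() is asynchronous; the record is batched and sent in the background
      producer.send(new ProducerRecord[String, String]("demo-topic", "key", "hello kafka"))
    } finally {
      producer.close() // flush pending records and release resources
    }
  }
}
```

The consumer side follows the same pattern with `KafkaConsumer`, deserializers, and a poll loop, which the Kafka Consumer Java API module covers.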