Apache Spark Advanced
This test challenges your expertise in cluster management, Spark job tuning, and Spark integration with Hadoop and Kafka. To deepen your knowledge, review the Spark Cluster Guide and watch the in-depth Advanced Spark Tutorial.
1
What is the purpose of the Tungsten engine?
2
What is the difference between Spark on YARN and Spark Standalone?
3
What is the purpose of the Catalyst optimizer?
4
What is the difference between Spark Streaming and Structured Streaming?
5
What is the purpose of whole-stage code generation (WholeStageCodegen)?
6
What is the difference between DataFrameWriter and DataFrameReader?
7
What is the purpose of the Cost-Based Optimizer (CBO)?
8
What is the difference between broadcast and non-broadcast joins?
9
What is the purpose of Adaptive Query Execution (AQE)?
10
What is the difference between mapPartitions and map?
11
What is the purpose of Dynamic Partition Pruning (DPP)?
12
What is the difference between Spark on Kubernetes and Spark on YARN?
13
What is the purpose of RDD lineage?
14
What is the difference between Spark SQL and Hive?
15
What is the purpose of the Spark History Server?
16
What is the difference between Spark MLlib and Spark ML?
17
What is the purpose of the Spark Thrift Server?
18
What is the difference between Spark on Mesos and Spark on YARN?
19
What is the purpose of the Spark UI?
20
What is the difference between Spark Streaming and Flink?
21
What is the purpose of the Spark REST API?
22
What is the difference between Spark and MapReduce?
23
What is the purpose of the Spark SQL CLI?
24
What is the difference between Spark and Pandas?
25
What is the purpose of the GraphX library?