Data Parallel Computing with Spark
Last updated on 2024-07-26
Estimated time 60 minutes
Overview
Questions
- How can Spark be used for data-parallel computing?
Objectives
- Launch a Spark application from a Jupyter notebook and run a simple data analytics task.
Hands-on: Data analytics in Spark
- Download the Movie Dataset.
- Unzip the movie data file.
- Open a terminal.
- Activate the `pyspark` conda environment, then launch Jupyter Notebook:

```bash
$ conda activate pyspark
$ jupyter notebook
```

- Create a new notebook using the `pyspark` kernel, then change the notebook's name to `spark-2`.
- Copy the code from `spark-1` to set up and launch a Spark application.
{% include links.md %}