Data Parallel Computing with Spark

Last updated on 2024-07-26

Estimated time: 60 minutes

Overview

Questions

  • How can Spark be used to run data-parallel analyses?

Objectives

  • Launch a Spark application from a Jupyter notebook and run a data-parallel analysis.


Hands-on: Data analytics in Spark

  • Download the Movie Dataset.
  • Unzip the movie data file.
  • Open a terminal.
  • Activate the pyspark conda environment, then launch Jupyter notebook
$ conda activate pyspark
$ jupyter notebook
  • Create a new notebook using the pyspark kernel, then change the notebook’s name to spark-2.
  • Copy the code from spark-1 to set up and launch a Spark application.

{% include links.md %}