Data Parallel Computing with Spark

Last updated on 2024-07-26

Estimated time: 60 minutes

Overview

Questions

  • How can Spark be used to run data-parallel analyses?

Objectives

  • Launch a Spark application from a Jupyter notebook and run a data-parallel analysis.


Hands-on: Data analytics in Spark

  • Download the Movie Dataset.
  • Unzip the movie data file.
  • Open a terminal.
  • Activate the pyspark conda environment, then launch Jupyter notebook
$ conda activate pyspark
$ jupyter notebook
  • Create a new notebook using the pyspark kernel, then change the notebook’s name to spark-2.
  • Copy the code from spark-1 to set up and launch a Spark application.

{% include links.md %}