Introduction to OpenMP
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is the OpenMP model?
Objectives
Know the target hardware architecture for OpenMP.
Be able to compile and run an OpenMP program.
Understand the concept of parallel regions.
Target hardware
- Single computing node, multiple sockets, multiple cores.
- Dell PowerEdge M600 Blade Server.
- Intel Sandy Bridge CPU.
- In summary
- Node with up to four sockets.
- Each socket has up to 60 cores.
- Each core is an independent CPU.
- Each core has access to all the memory on the node.
Target software
- Provides wrappers for threads and the fork/join model of parallelism.
- Program originally runs in sequential mode.
- When parallelism is activated, multiple threads are forked from the original process/thread (master thread).
- Once the parallel tasks are done, threads are joined back to the original process, and execution returns to sequential mode.
- The threads have access to all data in the master thread. This is shared data.
- The threads also have their own private memory stack.
Basic requirements to write, compile, and run an OpenMP program
- Source code (C) needs the line #include <omp.h>.
- The compile command needs the -fopenmp flag.
- Specify the environment variable OMP_NUM_THREADS.
OMP directives
- OpenMP must be told when to parallelize.
- For C/C++, pragma is used to annotate:
#pragma omp somedirective clause(value, othervalue)
  parallel statement;
- or
#pragma omp somedirective clause(value, othervalue)
{
  parallel statement 1;
  parallel statement 2;
  ...
}
Hands-on 1: Setup directory
- Create a directory named csc466 inside your home directory, then change into that directory.
- Next, create a directory called openmp, and change into that directory.
$ cd
$ mkdir csc466
$ cd csc466
$ mkdir openmp
$ cd openmp
Hands-on 2: Create hello_omp.c
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type hello_omp.c as the file name and hit Enter.
- Enter the following source code in the editor window.
- Save the file when you are done: Ctrl-S for Windows/Linux, Command-S for Macs.
- Memorize your key-combos!
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
  /* Fork a team of threads giving them their own copies of variables */
  #pragma omp parallel
  {
    /* Obtain thread number */
    int tid = omp_get_thread_num();
    printf("Hello World from thread = %d\n", tid);

    /* Only master thread does this */
    if (tid == 0) {
      int nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  } /* All threads join master thread and disband */
}
- Line 1: Include omp.h to get the declarations of the OpenMP library functions.
- Line 7: Declare the beginning of the parallel region. Pay attention to how the curly bracket is set up compared to the other curly brackets: it starts on the line after the directive.
- Line 10: omp_get_thread_num gets the ID assigned to the thread, which is then stored in a variable named tid of type int.
- Line 15: omp_get_num_threads gets the number of threads in the current team (here, the value assigned to OMP_NUM_THREADS) and returns it to a variable named nthreads of type int.
What’s important?
- tid and nthreads.
- They allow us to coordinate the parallel workloads.
- Specify the environment variable OMP_NUM_THREADS.
$ export OMP_NUM_THREADS=4
Example: trapezoidal
- Problem: estimate the integral of a function (x^2 in the code below) on an interval [a, b] using the trapezoidal rule with four threads.
- With 4 threads: nthreads=4.
- How to decide which thread will handle which segment?
- How to get all results back together?
Hands-on 3: Trapezoid implementation
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type trapezoid.c as the file name and hit Enter.
- Enter the following source code in the editor window.
- Save the file when you are done: Ctrl-S for Windows/Linux, Command-S for Macs.
- Memorize your key-combos!
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* command-line arguments: a b N nthreads (N is assumed divisible by nthreads) */
int main (int argc, char *argv[]) {
  //init parameters and evaluators
  double a = atof(argv[1]);
  double b = atof(argv[2]);
  int N = atoi(argv[3]);
  int nthreads = atoi(argv[4]);
  double partial_sum[nthreads];
  double h = ((b - a) / nthreads);

  omp_set_num_threads(nthreads);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    /* number of trapezoids per thread */
    int partial_n = N / nthreads;
    double delta = (b - a)/N;
    double local_a = a + h * tid;
    double local_b = local_a + delta;
    /* the array is not zeroed automatically, so clear this thread's slot */
    partial_sum[tid] = 0;
    for (int i = 0; i < partial_n; i++) {
      partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
      local_a = local_b;
      local_b += delta;
    }
  }
  double sum = 0;
  for (int i = 0; i < nthreads; i++) {
    sum += partial_sum[i];
  }
  printf("The integral is: %.4f\n", sum);
  return 0;
}
Hands-on 4: A bit more detailed
- Modify trapezoid.c so that it looks like the code below.
- Save the file when you are done: Ctrl-S for Windows/Linux, Command-S for Macs.
- Memorize your key-combos!
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* command-line arguments: a b N nthreads (N is assumed divisible by nthreads) */
int main (int argc, char *argv[]) {
  //init parameters and evaluators
  double a = atof(argv[1]);
  double b = atof(argv[2]);
  int N = atoi(argv[3]);
  int nthreads = atoi(argv[4]);
  double partial_sum[nthreads];
  double h = ((b - a) / nthreads);

  omp_set_num_threads(nthreads);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    /* number of trapezoids per thread */
    int partial_n = N / nthreads;
    double delta = (b - a)/N;
    double local_a = a + h * tid;
    double local_b = local_a + delta;
    /* the array is not zeroed automatically, so clear this thread's slot */
    partial_sum[tid] = 0;
    for (int i = 0; i < partial_n; i++) {
      partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
      local_a = local_b;
      local_b += delta;
    }
    printf("Thread %d calculates a partial sum of %.4f from %.4f to %.4f\n",
           tid, partial_sum[tid], a + h*tid, local_a);
  }
  double sum = 0;
  for (int i = 0; i < nthreads; i++) {
    sum += partial_sum[i];
  }
  printf("The integral is: %.4f\n", sum);
  return 0;
}
Challenge 1:
Alter the trapezoid.c code so that the parallel region invokes a function to calculate the partial sum.
Solution
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

double trap(double a, double b, int N, int nthreads, int tid) {
  double h = ((b - a) / nthreads);
  int partial_n = N / nthreads;
  double delta = (b - a)/N;
  double local_a = a + h * tid;
  double local_b = local_a + delta;
  double p_sum = 0;
  for (int i = 0; i < partial_n; i++) {
    p_sum += (local_a * local_a + local_b * local_b) * delta / 2;
    local_a = local_b;
    local_b += delta;
  }
  return p_sum;
}

int main (int argc, char *argv[]) {
  //init parameters and evaluators
  double a = atof(argv[1]);
  double b = atof(argv[2]);
  int N = atoi(argv[3]);
  int nthreads = atoi(argv[4]);
  double partial_sum[nthreads];

  omp_set_num_threads(nthreads);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    partial_sum[tid] = trap(a, b, N, nthreads, tid);
  }
  double sum = 0;
  for (int i = 0; i < nthreads; i++) {
    sum += partial_sum[i];
  }
  printf("The integral is: %.4f\n", sum);
  return 0;
}
Challenge 2:
- Write a program called sum_series.c that takes an integer N as a command line argument and calculates the sum of the first N non-negative integers. (The solution below also reads the number of threads as a second argument.)
- Speed up the summation portion by using OpenMP.
- Assume N is divisible by the number of threads.
Solution
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum of this thread's share of 0, 1, ..., N-1 (the first N non-negative integers) */
int sum(int N, int nthreads, int tid) {
  int count = N / nthreads;
  int start = count * tid;
  int p_sum = 0;
  for (int i = start; i < start + count; i++) {
    p_sum += i;
  }
  return p_sum;
}

/* command-line arguments: N nthreads */
int main (int argc, char *argv[]) {
  int N = atoi(argv[1]);
  int nthreads = atoi(argv[2]);
  int partial_sum[nthreads];

  omp_set_num_threads(nthreads);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    partial_sum[tid] = sum(N, nthreads, tid);
  }
  /* named total so that it does not hide the function sum */
  int total = 0;
  for (int i = 0; i < nthreads; i++) {
    total += partial_sum[i];
  }
  /* the result is an int, so print it with %d */
  printf("The sum of series is: %d\n", total);
  return 0;
}
Challenge 3:
- Write a program called sum_series_2.c that takes an integer N as a command line argument and calculates the sum of the first N non-negative integers. (The solution below also reads the number of threads as a second argument.)
- Speed up the summation portion by using OpenMP.
- There is no assumption that N is divisible by the number of threads.
Solution
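#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum of this thread's share of 0, 1, ..., N-1 (the first N non-negative integers) */
int sum(int N, int nthreads, int tid) {
  int count = N / nthreads;
  int remainder = N % nthreads;
  int start = count * tid;
  int end = start + count;
  int p_sum = 0;
  for (int i = start; i < end; i++) {
    p_sum += i;
  }
  /* the leftover terms count*nthreads, ..., N-1 go one each to the first threads */
  if (tid < remainder) {
    p_sum += count * nthreads + tid;
  }
  return p_sum;
}

/* command-line arguments: N nthreads */
int main (int argc, char *argv[]) {
  int N = atoi(argv[1]);
  int nthreads = atoi(argv[2]);
  int partial_sum[nthreads];

  omp_set_num_threads(nthreads);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    partial_sum[tid] = sum(N, nthreads, tid);
  }
  /* named total so that it does not hide the function sum */
  int total = 0;
  for (int i = 0; i < nthreads; i++) {
    total += partial_sum[i];
  }
  /* the result is an int, so print it with %d */
  printf("The sum of series is: %d\n", total);
  return 0;
}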
Hands-on 5: Trapezoid implementation with timing
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type trapezoid_time.c as the file name and hit Enter.
- Enter the following source code in the editor window (you can copy the contents of trapezoid.c with the function from Challenge 1 as a starting point).
- Save the file when you are done: Ctrl-S for Windows/Linux, Command-S for Macs.
- Memorize your key-combos!
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* command-line arguments: a b N nthreads (N is assumed divisible by nthreads) */
int main (int argc, char *argv[]) {
  //init parameters and evaluators
  double a = atof(argv[1]);
  double b = atof(argv[2]);
  int N = atoi(argv[3]);
  int nthreads = atoi(argv[4]);
  double partial_sum[nthreads];
  double h = ((b - a) / nthreads);
  clock_t start, end;

  omp_set_num_threads(nthreads);
  /* clock() reports processor time used by the program, not elapsed time */
  start = clock();
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    /* number of trapezoids per thread */
    int partial_n = N / nthreads;
    double delta = (b - a)/N;
    double local_a = a + h * tid;
    double local_b = local_a + delta;
    /* the array is not zeroed automatically, so clear this thread's slot */
    partial_sum[tid] = 0;
    for (int i = 0; i < partial_n; i++) {
      partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
      local_a = local_b;
      local_b += delta;
    }
  }
  end = clock();
  double sum = 0;
  for (int i = 0; i < nthreads; i++) {
    sum += partial_sum[i];
  }
  printf("The integral is: %.4f\n", sum);
  printf("The run time is: %.4f\n", ((double) (end - start)) / CLOCKS_PER_SEC);
  return 0;
}
- How’s the run time?
Key Points