Introduction to OpenMP
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is the OpenMP model?
Objectives
Know the target hardware architecture for OpenMP
Be able to compile and run an OpenMP program.
Understand the concept of parallel regions.
Target hardware
- Single computing node, multiple sockets, multiple cores.
- Dell PowerEdge M600 Blade Server.
- Intel Sandy Bridge CPU.
- In summary:
  - Node with up to four sockets.
  - Each socket has up to 60 cores.
  - Each core is an independent CPU.
  - Each core has access to all the memory on the node.
Target software
- OpenMP provides wrappers for threads and the fork/join model of parallelism.
- The program originally runs in sequential mode.
- When parallelism is activated, multiple threads are forked from the original process/thread (the master thread).
- Once the parallel tasks are done, the threads are joined back to the original process, which returns to sequential execution.
- The threads have access to all data in the master thread. This is shared data.
- The threads also have their own private memory stack.
Basic requirements to write, compile, and run an OpenMP program
- The source code (C) needs to include the OpenMP header: #include <omp.h>
- The compile command needs the -fopenmp flag.
- Specify the number of threads with the environment variable OMP_NUM_THREADS.
OMP directives
- OpenMP must be told when to parallelize.
- For C/C++, pragma is used to annotate:

    #pragma omp somedirective clause(value, othervalue)
      parallel statement;

- or

    #pragma omp somedirective clause(value, othervalue)
    {
      parallel statement 1;
      parallel statement 2;
      ...
    }
Hands-on 1: Setup directory
- Create a directory named csc466 inside your home directory, then change into that directory.
- Next, create a directory called openmp, and change into that directory.

    $ cd
    $ mkdir csc466
    $ cd csc466
    $ mkdir openmp
    $ cd openmp
Hands-on 2: Create hello_omp.c
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type hello_omp.c as the file name and hit Enter.
- Enter the following source code in the editor window:
- Save the file when you are done:
  - Ctrl-S for Windows/Linux
  - Command-S for Macs
- Memorize your key-combos!
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[]) {
      /* Fork a team of threads giving them their own copies of variables */
      #pragma omp parallel
      {
        /* Obtain thread number */
        int tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
          int nthreads = omp_get_num_threads();
          printf("Number of threads = %d\n", nthreads);
        }
      } /* All threads join master thread and disband */
    }
- Line 1: Include omp.h to get access to the OpenMP library functions.
- Line 7: Declare the beginning of the parallel region. Pay attention to how the curly bracket is set up compared to the other curly brackets: it starts on the line after the pragma.
- Line 10: omp_get_thread_num gets the ID assigned to the thread and assigns it to a variable named tid of type int.
- Line 15: omp_get_num_threads gets the number of threads in the current team (by default, the value assigned to OMP_NUM_THREADS) and assigns it to a variable named nthreads of type int.
What’s important?
- tid and nthreads.
- They allow us to coordinate the parallel workloads.
- Specify the environment variable OMP_NUM_THREADS.
$ export OMP_NUM_THREADS=4
Example: trapezoidal
- Problem: estimate the integral of f(x) = x^2 on [a, b] using the trapezoidal rule.
- With 4 threads: nthreads=4.
- How to decide which thread will handle which segment?
- How to get all results back together?
Hands-on 3: Trapezoid implementation
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type trapezoid.c as the file name and hit Enter.
- Enter the following source code in the editor window:
- Save the file when you are done:
  - Ctrl-S for Windows/Linux
  - Command-S for Macs
- Memorize your key-combos!
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[]) {
      //init parameters and evaluators
      double a = atof(argv[1]);
      double b = atof(argv[2]);
      int N = atoi(argv[3]);
      int nthreads = atoi(argv[4]);
      double partial_sum[nthreads];
      double h = ((b - a) / nthreads);

      omp_set_num_threads(nthreads);
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        /* number of trapezoids per thread */
        int partial_n = N / nthreads;
        double delta = (b - a)/N;
        double local_a = a + h * tid;
        double local_b = local_a + delta;
        /* the array starts with indeterminate values, so zero this thread's slot first */
        partial_sum[tid] = 0;
        for (int i = 0; i < partial_n; i++) {
          partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
          local_a = local_b;
          local_b += delta;
        }
      }
      /* back in sequential mode: combine the partial results */
      double sum = 0;
      for (int i = 0; i < nthreads; i++) {
        sum += partial_sum[i];
      }
      printf("The integral is: %.4f\n", sum);
      return 0;
    }
Hands-on 4: A bit more detailed
- Modify trapezoid.c so that it looks like the code below.
- Save the file when you are done:
  - Ctrl-S for Windows/Linux
  - Command-S for Macs
- Memorize your key-combos!
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[]) {
      //init parameters and evaluators
      double a = atof(argv[1]);
      double b = atof(argv[2]);
      int N = atoi(argv[3]);
      int nthreads = atoi(argv[4]);
      double partial_sum[nthreads];
      double h = ((b - a) / nthreads);

      omp_set_num_threads(nthreads);
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        /* number of trapezoids per thread */
        int partial_n = N / nthreads;
        double delta = (b - a)/N;
        double local_a = a + h * tid;
        double local_b = local_a + delta;
        /* the array starts with indeterminate values, so zero this thread's slot first */
        partial_sum[tid] = 0;
        for (int i = 0; i < partial_n; i++) {
          partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
          local_a = local_b;
          local_b += delta;
        }
        /* report the segment this thread covered */
        printf("Thread %d calculated a partial sum of %.4f from %.4f to %.4f\n",
               tid, partial_sum[tid], a + h*tid, local_a);
      }
      double sum = 0;
      for (int i = 0; i < nthreads; i++) {
        sum += partial_sum[i];
      }
      printf("The integral is: %.4f\n", sum);
      return 0;
    }
Challenge 1:
Alter the trapezoid.c code so that the parallel region invokes a function to calculate the partial sum.
Solution
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    double trap(double a, double b, int N, int nthreads, int tid) {
      double h = ((b - a) / nthreads);
      int partial_n = N / nthreads;
      double delta = (b - a)/N;
      double local_a = a + h * tid;
      double local_b = local_a + delta;
      double p_sum = 0;
      for (int i = 0; i < partial_n; i++) {
        p_sum += (local_a * local_a + local_b * local_b) * delta / 2;
        local_a = local_b;
        local_b += delta;
      }
      return p_sum;
    }

    int main (int argc, char *argv[]) {
      //init parameters and evaluators
      double a = atof(argv[1]);
      double b = atof(argv[2]);
      int N = atoi(argv[3]);
      int nthreads = atoi(argv[4]);
      double partial_sum[nthreads];

      omp_set_num_threads(nthreads);
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        partial_sum[tid] = trap(a, b, N, nthreads, tid);
      }
      double sum = 0;
      for (int i = 0; i < nthreads; i++) {
        sum += partial_sum[i];
      }
      printf("The integral is: %.4f\n", sum);
      return 0;
    }
Challenge 2:
- Write a program called sum_series.c that takes an integer N as a command line argument and calculates the sum of the first N positive integers (the solution also reads the number of threads as a second argument).
- Speed up the summation portion by using OpenMP.
- Assume N is divisible by the number of threads.
Solution
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sum of the block of count consecutive integers owned by thread tid */
    int sum(int N, int nthreads, int tid) {
      int count = N / nthreads;
      int start = count * tid + 1;
      int p_sum = 0;
      for (int i = start; i < start + count; i++) {
        p_sum += i;
      }
      return p_sum;
    }

    int main (int argc, char *argv[]) {
      int N = atoi(argv[1]);
      int nthreads = atoi(argv[2]);
      int partial_sum[nthreads];

      omp_set_num_threads(nthreads);
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        partial_sum[tid] = sum(N, nthreads, tid);
      }
      /* named total so that it does not hide the function sum() */
      int total = 0;
      for (int i = 0; i < nthreads; i++) {
        total += partial_sum[i];
      }
      /* total is an int, so print it with %d */
      printf("The sum of series is: %d\n", total);
      return 0;
    }
Challenge 3:
- Write a program called sum_series_2.c that takes an integer N as a command line argument and calculates the sum of the first N positive integers (the solution also reads the number of threads as a second argument).
- Speed up the summation portion by using OpenMP.
- There is no assumption that N is divisible by the number of threads.
Solution
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int sum(int N, int nthreads, int tid) {
      int count = N / nthreads;
      int remainder = N % nthreads;
      /* each thread first sums its own block of count consecutive integers */
      int start = count * tid + 1;
      int end = start + count;
      int p_sum = 0;
      for (int i = start; i < end; i++) {
        p_sum += i;
      }
      /* the first remainder threads each pick up one of the leftover integers */
      if (tid < remainder) {
        p_sum += count * nthreads + tid + 1;
      }
      return p_sum;
    }

    int main (int argc, char *argv[]) {
      int N = atoi(argv[1]);
      int nthreads = atoi(argv[2]);
      int partial_sum[nthreads];

      omp_set_num_threads(nthreads);
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        partial_sum[tid] = sum(N, nthreads, tid);
      }
      /* named total so that it does not hide the function sum() */
      int total = 0;
      for (int i = 0; i < nthreads; i++) {
        total += partial_sum[i];
      }
      /* total is an int, so print it with %d */
      printf("The sum of series is: %d\n", total);
      return 0;
    }
Hands-on 5: Trapezoid implementation with timing
- In the EXPLORER window, right-click on csc466/openmp and select New File.
- Type trapezoid_time.c as the file name and hit Enter.
- Enter the following source code in the editor window (you can copy the contents of trapezoid.c with the function from Challenge 1 as a starting point):
- Save the file when you are done:
  - Ctrl-S for Windows/Linux
  - Command-S for Macs
- Memorize your key-combos!
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main (int argc, char *argv[]) {
      //init parameters and evaluators
      double a = atof(argv[1]);
      double b = atof(argv[2]);
      int N = atoi(argv[3]);
      int nthreads = atoi(argv[4]);
      double partial_sum[nthreads];
      double h = ((b - a) / nthreads);
      clock_t start, end;

      omp_set_num_threads(nthreads);
      start = clock();
      #pragma omp parallel
      {
        int tid = omp_get_thread_num();
        /* number of trapezoids per thread */
        int partial_n = N / nthreads;
        double delta = (b - a)/N;
        double local_a = a + h * tid;
        double local_b = local_a + delta;
        /* the array starts with indeterminate values, so zero this thread's slot first */
        partial_sum[tid] = 0;
        for (int i = 0; i < partial_n; i++) {
          partial_sum[tid] += (local_a * local_a + local_b * local_b) * delta / 2;
          local_a = local_b;
          local_b += delta;
        }
      }
      end = clock();
      double sum = 0;
      for (int i = 0; i < nthreads; i++) {
        sum += partial_sum[i];
      }
      printf("The integral is: %.4f\n", sum);
      printf("The run time is: %.4f\n", ((double) (end - start)) / CLOCKS_PER_SEC);
      return 0;
    }
- How’s the run time?
Key Points








