
Opened 14 years ago

Closed 14 years ago

#131 closed enhancement (fixed)

Parallelize helicity loop via OpenMP

Reported by: Christian Speckner
Owned by: ohl
Priority: P2
Milestone: v2.3.1
Component: omega
Version: 2.0.0beta
Severity: normal
Keywords:
Cc:

Description

The helicity loop in the amplitude code can be parallelized via OpenMP on multicore / multi-CPU systems. Unfortunately, the code I wrote for 1.9x is a Perl postprocessor for the amplitude and cannot be reused in W2, where the parallelization should be implemented properly in the O'Mega Fortran backend. As an example of how this might be done, I've attached a parallelized amplitude for gg -> ggg in the SM. It works and can be used as a drop-in replacement for the "normal" one, although it must be compiled by hand because my Perl code splits the amplitudes into chunks. This was originally done with ifort, but I've checked that it also works with gfortran via the "-fopenmp" option. For this process, the speed gain from running two threads on two cores (via OMP_NUM_THREADS=2) is only a factor of ~1.6, but for more complicated processes with more nonzero helicity combinations, it is pretty close to 2.
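To illustrate the idea only, here is a minimal Python sketch of splitting a helicity sum into chunks and combining the partial sums, which is roughly what an OpenMP reduction over the helicity loop does. It is not the attached Fortran code: `toy_amplitude`, `partial_sum` and `summed_amp2` are hypothetical names, and the "amplitude" is an arbitrary stand-in with no physics content. Python threads also give no real speedup for CPU-bound work in standard CPython, so this shows the decomposition, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_GLUONS = 5  # gg -> ggg: five external gluons, two helicities each


def toy_amplitude(helicities):
    """Hypothetical stand-in for one helicity amplitude (no physics content)."""
    return complex(sum(helicities), helicities[0])


def partial_sum(chunk):
    """Sum |amplitude|^2 over one chunk of helicity combinations."""
    total = 0.0
    for helicities in chunk:
        amp = toy_amplitude(helicities)
        total += amp.real ** 2 + amp.imag ** 2
    return total


def summed_amp2(n_threads):
    """Distribute the helicity combinations over n_threads workers and reduce."""
    combos = list(product((-1, 1), repeat=N_GLUONS))  # 2**5 = 32 combinations
    # Round-robin chunks: each worker gets every n_threads-th combination.
    chunks = [combos[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"#threads = {n}, amp2 = {summed_amp2(n)}")
```

The summed result does not depend on the number of workers, which is the same property the "amp2" column of a threaded test run is meant to confirm.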

Attachments (3)

test.f90 (51.9 KB) - added by Christian Speckner 14 years ago.
test_decl.f90 (12.2 KB) - added by Christian Speckner 14 years ago.
test_pamp_21_21_21_21_21.f90 (41.2 KB) - added by Christian Speckner 14 years ago.


Change History (5)

Changed 14 years ago by Christian Speckner

Attachment: test.f90 added

Changed 14 years ago by Christian Speckner

Attachment: test_decl.f90 added

Changed 14 years ago by Christian Speckner

Attachment: test_pamp_21_21_21_21_21.f90 added

comment:1 Changed 14 years ago by ohl

Status: new → assigned

As of r1917, I've started to add OpenMP directives.

Usage:

  • use --enable-openmp in configure (easier than hacking WHIZARD options)
  • add -fopenmp to gfortran in configure (e.g. FC="/archive/ohl/tools64/bin/gfortran -fopenmp")
  • set the OMP_NUM_THREADS environment variable.

Caveats:

  • it might run slower
  • it might burn more cycles
  • it might do both (currently, it does).
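One way to check both caveats is to run the same command with different OMP_NUM_THREADS values and compare wall-clock time with the CPU time consumed by the child process. The sketch below assumes a Unix-like system (child CPU times from `os.times()`); `run_with_threads` is a hypothetical helper, and `DEMO_CMD` is a stand-in that only echoes the variable, to be replaced by the actual OpenMP-enabled executable.

```python
import os
import subprocess
import sys
import time


def run_with_threads(cmd, n_threads):
    """Run cmd with OMP_NUM_THREADS=n_threads; return (wall, cpu, stdout)."""
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    # CPU seconds (user + system) accumulated by finished child processes.
    cpu = ((cpu_after.children_user - cpu_before.children_user)
           + (cpu_after.children_system - cpu_before.children_system))
    return wall, cpu, result.stdout


# Stand-in command (assumption): prints the thread count it was given.
DEMO_CMD = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]

if __name__ == "__main__":
    for n in (1, 2, 4):
        wall, cpu, out = run_with_threads(DEMO_CMD, n)
        print(f"OMP_NUM_THREADS={n}: wall {wall:.3f} s, cpu {cpu:.3f} s, saw {out.strip()}")
```

If wall time does not drop as threads are added while CPU time grows, the run is both slower and burning more cycles.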

comment:2 Changed 14 years ago by ohl

Resolution: fixed
Status: assigned → closed

Works and scales linearly (tested up to 4 cores) in standalone O'Mega:

 max. threads:   4
       dynamic: elapsed   1.4087 seconds, elapsed * #threads:   5.6347 seconds, amp2 = 0.8660E+04
 #threads =  1, elapsed   5.2956 seconds, elapsed * #threads:   5.2956 seconds, amp2 = 0.8660E+04
 #threads =  2, elapsed   2.6757 seconds, elapsed * #threads:   5.3514 seconds, amp2 = 0.8660E+04
 #threads =  3, elapsed   1.8633 seconds, elapsed * #threads:   5.5899 seconds, amp2 = 0.8660E+04
 #threads =  4, elapsed   1.3568 seconds, elapsed * #threads:   5.4274 seconds, amp2 = 0.8660E+04
STOP 0
PASS: test_openmp
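The scaling claim can be checked directly from the elapsed times above. This short sketch computes speedup (time on one thread divided by time on n threads) and parallel efficiency (speedup divided by n); the `scaling` helper is a hypothetical name, and the numbers are copied from the fixed-thread-count lines of the output.

```python
# Elapsed wall-clock seconds per thread count, copied from the test_openmp output.
elapsed = {1: 5.2956, 2: 2.6757, 3: 1.8633, 4: 1.3568}


def scaling(times):
    """Return {n_threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = times[1]
    return {n: (base / t, base / (t * n)) for n, t in times.items()}


if __name__ == "__main__":
    for n, (speedup, efficiency) in sorted(scaling(elapsed).items()):
        print(f"#threads = {n}: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these figures the speedup on four threads is roughly 3.9, and the efficiency stays above 0.94 for every thread count, consistent with near-linear scaling.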

To reproduce, pass --with-openmp to the O'Mega configure and run make check.
