whizard is hosted by Hepforge, IPPP Durham

Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#380 closed task (duplicate)

Parallelization

Reported by: Juergen Reuter Owned by: kilian, cnspeckn, trudewind
Priority: P2 Milestone: v2.3.1
Component: core Version: 2.0.3
Severity: major Keywords:
Cc:

Description


Change History (5)

comment:1 Changed 14 years ago by Christian Speckner

After running a couple of processes and measuring the execution time as a function of OMP_NUM_THREADS, I attach the results here. The tests were done on a quad-core Core2 @ 2.66GHz using my branches/speckner/openmp_v2 OpenMP implementation. They demonstrate that the speedup is highly dependent on the process under consideration, ranging from a factor of about 3 (for four threads) to no improvement at all. While the results indicate that this piece of parallelization really pays off for high-multiplicity (final state >= 4) processes with big flavor sums (which is good imho), they also suggest that the complexity of phasespace generation (generating points, not the maps) grows faster with the number of external legs than that of the matrix elements. Also, for some processes, we seem to be entirely dominated by phasespace.

Interestingly, this conclusion is also bolstered by monitoring the CPU time consumed by the threads in top: for processes with a significant speedup, each thread uses between 80% and 100% of a single core, while for others it can be less than 10% (I am not sure about the user value given by time in the results below; I have a feeling it might be off). I think detailed profiling for different processes would be very interesting to confirm whether those assertions are indeed correct.
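For anyone who wants to repeat the measurement, here is a minimal sketch of a timing loop in Python. It is not the script used for the numbers in this ticket: the helper name `time_command` is made up for illustration, the default command is a harmless placeholder, and the WHIZARD invocation mentioned in the comment uses a hypothetical input file name. It relies on the Unix-only `resource` module.

```python
import os
import resource
import subprocess
import sys
import time


def time_command(cmd, threads):
    """Run cmd with OMP_NUM_THREADS=threads and return (real, user, sys) in seconds."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)  # raises if the command fails
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time is summed over all threads of the child, so user can exceed real.
    return real, after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; pass the real one on the
    # command line, e.g. "whizard test.sin" (test.sin is a hypothetical file name).
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for threads in (4, 2, 1):
        real, user, sys_time = time_command(cmd, threads)
        print(f"OMP_NUM_THREADS = {threads} :")
        print(f"real {real:.2f}")
        print(f"user {user:.2f}")
        print(f"sys {sys_time:.2f}")
```

Because the user time is accumulated over all threads, a user value of several times the real value is expected for a well-parallelized run.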

At the moment, the parallelization is still a bit cumbersome to use: you have to build WHIZARD with -fopenmp (this is vital especially for O'Mega, as all functions called in the parallel section have to be reentrant, which is guaranteed by -fopenmp). When running WHIZARD, intercept it immediately after it calls O'Mega, paste the O'Mega command into the terminal with -target:openmp appended, and then rerun WHIZARD with --recompile. The current implementation does not have a hardcoded limit on the number of threads anymore. It doesn't hurt to check the number of threads actually running via top :).

I think that this implementation could be merged into the trunk after I expose the functionality in a more convenient manner (i.e. a ?omega_support_openmp-like flag in sindarin); what's your opinion on that? Also, JR, if you're interested, it would be interesting to see how the heavier processes scale with the number of threads on the mighty DESY multicore machines...

OK, now for those results:

Process 1

model = SM                                                                                                                                               
process test = "e+", "e-" => "W+", "W-", A, A                                                                                                            
compile                                                                                                                                                  
                                                                                                                                                         
seed = 0                                                                                                                                                 
sqrts = 500 GeV                                                                                                                                          
cuts = all Pt > 5 GeV [A] and                                                                                                                            
   all abs (cos (Theta)) < 0.95 [A, "W+":"W-"] and                                                                                                       
   all abs (cos (Theta)) < 0.95 [A]                                                                                                                      
integrate (test)

OMP_NUM_THREADS = 4 :

real 73.71                                                                                                                                               
user 291.18                                                                                                                                              
sys 0.0

OMP_NUM_THREADS = 2 :

real 125.89                                                                                                                                              
user 249.24                                                                                                                                              
sys 0.12

OMP_NUM_THREADS = 1 :

real 226.97                                                                                                                                              
user 225.52                                                                                                                                              
sys 0.09

Process 2

model = SM                                                                                                                                               
alias parton = u:ubar:d:dbar:g                                                                                                                           
process test = parton, parton => parton, parton, parton, parton                                                                                          
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
sqrts = 500 GeV                                                                                                                                          
seed = 0                                                                                                                                                 
cuts = all Pt > 10 GeV [parton] and                                                                                                                      
   all -0.95 < cos (Theta) < 0.95 [parton, parton] and                                                                                                   
   all -0.95 < cos (Theta) < 0.95 [parton]                                                                                                               
integrate (test)

OMP_NUM_THREADS = 4 :

real 5803.28                                                                                                                                             
user 22600.91                                                                                                                                            
sys 1.73

OMP_NUM_THREADS = 2 :

real 8604.38                                                                                                                                             
user 17000.73                                                                                                                                            
sys 1.07                                                                                      

OMP_NUM_THREADS = 1 :

real 12603.54                                                                                                                                            
user 12589.63                                                                                                                                            
sys 1.72

Process 3

model = SM                                                                                                                                               
process test = "e+", "e-" => "e+", nue, "e-", nuebar                                                                                                     
alias lepton = "e+":"e-"                                                                                                                                 
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 500 GeV                                                                                                                                          
cuts = all Pt > 5 GeV [lepton]                                                                                                                           
integrate (test)

OMP_NUM_THREADS = 4 :

real 50.53                                                                                                                                               
user 196.19                                                                                                                                              
sys 0.15

OMP_NUM_THREADS = 2 :

real 55.26                                                                                                                                               
user 107.32                                                                                                                                              
sys 0.17

OMP_NUM_THREADS = 1 :

real 63.60                                                                                                                                               
user 61.80                                                                                                                                               
sys 0.12

Process 4

model = SM                                                                                                                                               
process test = u:ubar:d:dbar,u:ubar:d:dbar => "e+", nue, "e-", nuebar                                                                                    
alias lepton = "e+":"e-"                                                                                                                                 
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 14000 GeV                                                                                                                                        
beams = p, p => pdf_builtin                                                                                                                              
cuts = all Pt > 5 GeV [lepton]                                                                                                                           
integrate (test)

OMP_NUM_THREADS = 4 :

real 37.50                                                                                                                                               
user 144.88                                                                                                                                              
sys 0.14

OMP_NUM_THREADS = 2 :

real 43.86                                                                                                                                               
user 84.89                                                                                                                                               
sys 0.13

OMP_NUM_THREADS = 1 :

real 50.53                                                                                                                                               
user 48.88                                                                                                                                               
sys 0.09

Process 5

model = SM                                                                                                                                               
alias parton = u:ubar:d:dbar:g                                                                                                                           
alias lepton = "e+":"e-":"mu+":"mu-"                                                                                                                     
process test = parton, parton => lepton, lepton, parton, parton                                                                                          
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 14000 GeV                                                                                                                                        
beams = p, p => pdf_builtin                                                                                                                              
cuts = all Pt > 5 GeV [lepton:parton] and                                                                                                                
   all abs (cos (Theta)) < 0.95 [parton] and                                                                                                             
   all abs (cos (Theta)) < 0.95 [parton,parton]                                                                                                          
integrate (test)

OMP_NUM_THREADS = 4 :

real 348.73                                                                                                                                              
user 1379.92                                                                                                                                             
sys 0.19

OMP_NUM_THREADS = 2 :

real 548.58                                                                                                                                              
user 1090.55                                                                                                                                             
sys 0.21

OMP_NUM_THREADS = 1 :

real 940.45                                                                                                                                              
user 937.52                                                                                                                                              
sys 0.29

Process 6

model = SM                                                                                                                                               
alias parton = u:ubar:d:dbar:g                                                                                                                           
alias lepton = "e+":"e-":"mu+":"mu-"                                                                                                                     
process test = "e+", "e-" => lepton, lepton, parton, parton, A, A                                                                                        
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 500 GeV                                                                                                                                          
cuts = all Pt > 5 GeV [lepton:parton:A] and                                                                                                              
   all abs (cos (Theta)) < 0.95 [parton:A] and                                                                                                           
   all abs (cos (Theta)) < 0.95 [parton,parton] and                                                                                                      
   all abs (cos (Theta)) < 0.95 [A,parton:lepton]                                                                                                        
integrate (test)

OMP_NUM_THREADS = 4 :

real 2932.39                                                                                                                                             
user 6196.92                                                                                                                                             
sys 1.19

OMP_NUM_THREADS = 2 :

real 2985.87                                                                                                                                             
user 4110.38                                                                                                                                             
sys 1.02

OMP_NUM_THREADS = 1 :

real 3072.80                                                                                                                                             
user 3042.17                                                                                                                                             
sys 0.76

Process 7

model = SM                                                                                                                                               
alias parton = u:ubar:d:dbar:g                                                                                                                           
alias lepton = "e+":"e-":"mu+":"mu-"                                                                                                                     
process test = parton, parton => lepton, lepton, parton, parton, A, A                                                                                    
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 500 GeV                                                                                                                                          
cuts = all Pt > 5 GeV [lepton:parton:A] and                                                                                                              
   all abs (cos (Theta)) < 0.95 [parton:A] and                                                                                                           
   all abs (cos (Theta)) < 0.95 [parton,parton] and                                                                                                      
   all abs (cos (Theta)) < 0.95 [A,parton:lepton]                                                                                                        
integrate (test)

OMP_NUM_THREADS = 4 :

real 11403.65                                                                                                                                            
user 23107.70                                                                                                                                            
sys 3.09

OMP_NUM_THREADS = 2 :

real 14134.74                                                                                                                                            
user 20695.98                                                                                                                                            
sys 2.51

OMP_NUM_THREADS = 1 :

real 17787.65                                                                                                                                            
user 17695.38                                                                                                                                            
sys 2.55

Process 8

model = SM                                                                                                                                               
alias parton = u:ubar:d:dbar:g                                                                                                                           
alias lepton = "e+":"e-":"mu+":"mu-"                                                                                                                     
process test = parton, parton => lepton, lepton, parton, parton, parton, A                                                                               
$fcflags = "-fopenmp -O0"                                                                                                                                
compile                                                                                                                                                  
seed = 0                                                                                                                                                 
sqrts = 500 GeV                                                                                                                                          
cuts = all Pt > 5 GeV [lepton:parton:A] and                                                                                                              
   all abs (cos (Theta)) < 0.95 [parton:A] and                                                                                                           
   all abs (cos (Theta)) < 0.95 [parton,parton] and                                                                                                      
   all abs (cos (Theta)) < 0.95 [A,parton:lepton]                                                                                                        
integrate (test)

OMP_NUM_THREADS = 4 :

real 34728.93                                                                                                                                            
user 98228.01                                                                                                                                            
sys 8.68

OMP_NUM_THREADS = 2 :

real 45609.35                                                                                                                                            
user 78104.33                                                                                                                                            
sys 8.18

OMP_NUM_THREADS = 1 :

real 61143.80                                                                                                                                            
user 61002.22                                                                                                                                            
sys 9.80
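To make the numbers above easier to compare, here is a small Python sketch that turns the real times into speedups and parallel efficiencies. The timings are copied from the results above; the function names are made up for illustration. The last column is only a rough estimate: it assumes Amdahl's law, S = 1 / ((1 - p) + p / n), describes the scaling, which nobody has verified for these runs.

```python
# Wall-clock ("real") seconds copied from the results above,
# keyed by process number, then by OMP_NUM_THREADS.
REAL_SECONDS = {
    1: {1: 226.97, 2: 125.89, 4: 73.71},
    2: {1: 12603.54, 2: 8604.38, 4: 5803.28},
    3: {1: 63.60, 2: 55.26, 4: 50.53},
    4: {1: 50.53, 2: 43.86, 4: 37.50},
    5: {1: 940.45, 2: 548.58, 4: 348.73},
    6: {1: 3072.80, 2: 2985.87, 4: 2932.39},
    7: {1: 17787.65, 2: 14134.74, 4: 11403.65},
    8: {1: 61143.80, 2: 45609.35, 4: 34728.93},
}


def speedup(times, threads):
    """Single-thread wall-clock time divided by the time with `threads` threads."""
    return times[1] / times[threads]


def efficiency(times, threads):
    """Speedup per thread; 1.0 would be perfect scaling."""
    return speedup(times, threads) / threads


def amdahl_parallel_fraction(times, threads):
    """Parallel fraction p obtained by solving S = 1 / ((1 - p) + p / n) for p.

    Assumes Amdahl's law applies, which is an assumption of this sketch.
    """
    s = speedup(times, threads)
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / threads)


if __name__ == "__main__":
    print("proc  S(2)  S(4)  eff(4)  p(4)")
    for proc, times in sorted(REAL_SECONDS.items()):
        print(
            f"{proc:4d}  {speedup(times, 2):4.2f}  {speedup(times, 4):4.2f}"
            f"  {efficiency(times, 4):6.2f}  {amdahl_parallel_fraction(times, 4):4.2f}"
        )
```

With four threads, process 1 comes out at a speedup of about 3.1 and process 6 at about 1.05, which are the two extremes of this set.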

comment:2 Changed 14 years ago by Christian Speckner

With r3104, branches/speckner/openmp_v2 is merged into the trunk. OpenMP support can now be enabled by setting ?omega_openmp = true. This also works on a per-process basis; changing the setting for a process will cause WHIZARD to rebuild the corresponding matrix element, but will not invalidate existing phasespace, grids or events.

Keep in mind that you have to compile both the matrix element and WHIZARD with OpenMP flags (-fopenmp for gfortran) for the parallelization to work correctly; enabling the flags only for the matrix element means certain doom (a corresponding warning is printed when matrix elements with OpenMP support are generated). For the moment, I have disabled the configure options referring to the previous OpenMP implementation, but the code is just commented out and still there if we choose to reactivate it in the future.

The runtime memory consumption scales with the number of threads, as the OpenMP implementation has to duplicate the wavefunctions and brakets for each thread, but it shouldn't be significantly higher than that of the serial version. However, compiling with -fopenmp will cause more data to end up on the stack to ensure reentrance; this can trigger segfaults for very complicated processes, which then require the stack limit to be raised via ulimit -s (happened to me once).
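If WHIZARD is launched from a script rather than an interactive shell, the stack limit can be inspected and raised from there as well. The following is a minimal Python sketch using the standard `resource` module (Unix only); the function names are made up for illustration, and how large a limit a given process needs is something you have to find out by trying.

```python
import resource


def stack_limit_kib():
    """Soft stack limit in KiB (the unit `ulimit -s` reports), or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft // 1024


def raise_stack_limit(kib):
    """Try to raise the soft stack limit to `kib` KiB, capped at the hard limit.

    Never lowers the limit. Returns the resulting soft limit in KiB (None if
    unlimited). Child processes started afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    wanted = kib * 1024
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.setrlimit(resource.RLIMIT_STACK, (wanted, hard))
    return stack_limit_kib()


if __name__ == "__main__":
    print("current soft stack limit (KiB):", stack_limit_kib())
```

Calling `raise_stack_limit` before starting WHIZARD through `subprocess` has the same effect as running ulimit -s in the shell first, since the limit is inherited by child processes.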

Documentation on this feature is still missing in the manual, but I will add it next week.

comment:3 Changed 14 years ago by Juergen Reuter

Owner: changed from kilian to kilian, cnspeckn, trudewind

comment:4 Changed 14 years ago by Juergen Reuter

Resolution: duplicate
Status: changed from new to closed

As someone has opened another (more specialized) ticket, I am closing this one as a duplicate.

comment:5 Changed 14 years ago by kilian

@JR: Oops, you were faster than me. Here are my comments:

The helicity loop is done and works nicely.

Now that we have a clearer picture of what might be useful, I split this up into new tickets #414, #415, #415. Closing this one.
