Opened 14 years ago
Closed 14 years ago
#391 closed task (fixed)
Profiling
Reported by: | Juergen Reuter | Owned by: | kilian |
---|---|---|---|
Priority: | P3 | Milestone: | v2.0.6 |
Component: | core | Version: | 2.0.4 |
Severity: | critical | Keywords: | speed |
Cc: |
Description
Profiling is necessary for large LHC 2->6 processes, both in the SM and, especially, in BSM models. Specifically, I am using the following process, which can be tested with the svn tree.
model = NMSSM_Hgg
alias tau_b = e3:E3:b:B
alias proton = u:d:s:c:U:D:S:C:g
alias fjet = proton:tau_b
process pp_4t = proton, proton => fjet, fjet, e3, E3, e3, E3
compile
?slha_read_decays = true
read_slha("nmssm.slha")
cuts = all Pt >= 10 GeV [fjet]
  and all -5 <= Eta <= 5 [fjet]
  and all Dist > 0.4 [fjet,fjet]
sqrts = 14 TeV
beams = p,p => lhapdf
integrate (pp_4t) { iterations = 5:50000,15:100000 }
Code generation takes on the order of an hour; compilation and phase space generation take 1-2 days. The first iteration has now been running for 2 days, and I am still waiting for the first number.
Change History (3)
comment:1 Changed 14 years ago by
Owner: | changed from ALL to kilian |
---|---|
Status: | new → assigned |
comment:3 Changed 14 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
After some wrangling with static executables, profiling works, so I am closing the ticket. Individual profile results may be attached to the tickets where they are useful, e.g., #389, #380.
Here's a howto:
- configure Whizard with --enable-fc-profiling
- make a static executable, say whizard-static, with the Sindarin command
  compile as "whizard-static"
- run this executable instead of whizard
This should set -pg flags wherever needed, and the run produces gmon.out, which can be interpreted as usual:
gprof whizard-static > gprof.out
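To see at a glance where the time goes, the flat profile at the top of gprof.out can be summarized with a short script. The following is only a sketch in Python: it assumes the usual text layout of gprof's flat profile (a header row beginning with "time" and ending with "name", followed by one row per function), and the sample rows and function names in it are invented for illustration, not WHIZARD output.

```python
import re

# One data row of the flat profile: three leading floats
# (% time, cumulative seconds, self seconds), then the remaining columns.
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(.+)$")


def parse_flat_profile(text):
    """Return (percent_time, self_seconds, name) tuples from gprof text output."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if not in_table:
            # The table starts after the header row "time ... name".
            fields = line.split()
            in_table = bool(fields) and fields[0] == "time" and fields[-1] == "name"
            continue
        match = ROW.match(line)
        if match is None:
            break  # blank line or explanatory text: end of the flat profile
        percent, _cumulative, self_seconds, rest = match.groups()
        fields = rest.split()
        # Rows with a call count carry three extra columns before the name.
        if len(fields) >= 4 and fields[0].isdigit():
            name = " ".join(fields[3:])
        else:
            name = " ".join(fields)
        rows.append((float(percent), float(self_seconds), name))
    return rows


# Invented sample in the assumed layout; names and numbers are hypothetical.
SAMPLE = """\
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 80.00      8.00     8.00    50000     0.00     0.00  hypothetical_matrix_element
 15.00      9.50     1.50    50000     0.00     0.00  hypothetical_phase_space
  5.00     10.00     0.50                             hypothetical_sampler

 %         the percentage of the total running time of the
"""

for percent, seconds, name in parse_flat_profile(SAMPLE):
    print(f"{percent:6.2f}%  {seconds:8.2f}s  {name}")
```

For a real run, replace SAMPLE with the contents of gprof.out, e.g. open("gprof.out").read().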
To make profiling possible, I fixed the configure option --enable-fc-profiling in -r2995 and added --enable-fc-static, which is also needed here. Note that useful profiling requires creating a standalone executable and then running it in place of whizard.
For a simpler 2->6 process I verified that most of the running time is in the matrix element, as it should be, so the only likely speedup would be parallelization. But I'll check the present example as well.
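How much parallelization could gain depends on what fraction of the run time actually sits in the parallelizable part. As a rough guide, Amdahl's law gives the ideal speedup; the sketch below evaluates it in Python. The 95% fraction is a hypothetical placeholder, not a measured value, and should be replaced by the matrix-element share read off the profile.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the run time scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical input: 95% of the run time spent in the matrix element.
for workers in (1, 2, 4, 8, 16):
    print(workers, round(amdahl_speedup(0.95, workers), 2))
```

The serial remainder bounds the gain: with a parallel fraction p, the speedup can never exceed 1/(1-p), however many workers are used.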