whizard is hosted by Hepforge, IPPP Durham

Opened 8 years ago

Closed 8 years ago

#783 closed defect (fixed)

OVM segmentation fault with quadruple precision

Reported by: Juergen Reuter Owned by: bcho
Priority: P0 Milestone: v2.3.0
Component: omega Version: 2.2.8
Severity: critical Keywords:
Cc:

Description

The WHIZARD functional test method_ovm_1 fails with a segmentation fault when WHIZARD is built with quadruple precision, producing the following backtrace:

   1        784     1.767E+03   3.39E+00    0.19    0.05    39.2

Program received signal SIGSEGV, Segmentation fault.
0x00007fffebd3de7a in omegavm95::vm_run (vm=0x7fffed69edc0 <opr_method_ovm_1_2_i1_mp_vm_>, mom=..., flavor=<error reading variable: Cannot access memory at address 0x0>, 
    color=<error reading variable: Cannot access memory at address 0x0>, helicity=...) at omegavm95.f90:620
620	        vm%momenta(i) = mom(:, i)            ! outgoing
(gdb) bt
#0  0x00007fffebd3de7a in omegavm95::vm_run (vm=0x7fffed69edc0 <opr_method_ovm_1_2_i1_mp_vm_>, mom=..., flavor=<error reading variable: Cannot access memory at address 0x0>, 
    color=<error reading variable: Cannot access memory at address 0x0>, helicity=...) at omegavm95.f90:620
#1  0x00007fffebd5844c in omegavm95::vm_new_event (vm=0x7fffed69edc0 <opr_method_ovm_1_2_i1_mp_vm_>, p=...) at omegavm95.f90:1867
#2  0x00007fffed3f15da in opr_method_ovm_1_2_i1::new_event (p=...) at method_ovm_1_2_i1.f90:160
#3  0x00007fffed3df831 in method_ovm_1_2_i1_new_event (p=...) at default_lib.f90:665
#4  0x00007ffff4f68b92 in prc_omega::prc_omega_compute_amplitude (amp=(<invalid float value>,<invalid float value>), object=0xd36ab0, j=1, p=..., f=1, h=1, c=1, fac_scale=0, ren_scale=0, alpha_qcd_forced=0x0, core_state=0xd59560)
    at prc_omega.f90:1010
#5  0x00007ffff4a6d69d in processes::term_instance_evaluate_interaction (term=0xd5df70, component=...) at processes.f90:4919
#6  0x00007ffff4aa83b5 in processes::process_instance_evaluate_trace (instance=0xd55f80) at processes.f90:5981
#7  0x00007ffff4ac24bf in processes::process_instance_evaluate_sqme (instance=0xd55f80, channel=1, x=...) at processes.f90:6279
#8  0x00007ffff4ac2b99 in processes::process_instance_evaluate (sampler=0xd55f80, c=1, x_in=..., val=<invalid float value>, x=..., f=...) at processes.f90:6317
#9  0x00007ffff4c780d5 in mci_base::mci_instance_evaluate (mci=0xd35e10, sampler=0xd55f80, c=1, x=...) at mci_base.f90:646
#10 0x00007ffff4cb6f9e in mci_vamp::vamp_sampling_function (f=<invalid float value>, xi=..., data=0xd6cde0, weights=..., channel=1, grids=...) at mci_vamp.f90:1838
#11 0x00007ffff33d6a33 in vamp_rest::vamp_sample_grid0 (rng=..., g=..., data=0xd6cde0, channel=1, weights=..., grids=..., exc=..., negative_weights=.FALSE.) at vamp.f90:738
#12 0x00007ffff33f5a4c in vamp_rest::vamp_sample_grids (rng=..., g=..., data=0xd6cde0, iterations=1, integral=0, std_dev=0, avg_chi2=<error reading variable: Cannot access memory at address 0x0>, 
    accuracy=<error reading variable: Cannot access memory at address 0x0>, history=..., histories=..., exc=..., eq=..., warn_error=<error reading variable: Cannot access memory at address 0x0>, negative_weights=.FALSE.)
    at vamp.f90:1941
#13 0x00007ffff4cb43cd in mci_vamp::mci_vamp_instance_sample_grids (instance=0xd35e10, rng=0xd57ee0, sampler=0xd55f80, eq=...) at mci_vamp.f90:1722
#14 0x00007ffff4caaeea in mci_vamp::mci_vamp_integrate (mci=0xd580f0, instance=0xd35e10, sampler=0xd55f80, n_it=3, n_calls=1000, results=0xd57bd0, pacify=4294967295) at mci_vamp.f90:1130
#15 0x00007ffff49f8c1c in processes::process_mci_entry_integrate (mci_entry=0xd57a90, instance=..., n_it=3, n_calls=1000, adapt_grids=4294967295, adapt_weights=4294967295, final=.FALSE., pacify=4294967295, i_component=1)
    at processes.f90:2675
#16 0x00007ffff49c43f7 in processes::process_integrate (process=0xcec260, instance=..., i_mci=1, n_it=3, n_calls=1000, adapt_grids=4294967295, adapt_weights=4294967295, final=.FALSE., pacify=4294967295) at processes.f90:1467
#17 0x00007ffff3f821b6 in integrations::integration_evaluate (intg=0x7fffffffaa70, process_instance=..., i_mci=1, pass=1, it_list=..., pacify=4294967295) at integrations.f90:486
#18 0x00007ffff3f97737 in integrations::integration_integrate (intg=0x7fffffffaa70, local=..., eff_reset=<error reading variable: Cannot access memory at address 0x0>) at integrations.f90:671
#19 0x00007ffff3fab082 in integrations::integrate_process (process_id=..., local=..., global=..., local_stack=<error reading variable: Cannot access memory at address 0x0>, 
    init_only=<error reading variable: Cannot access memory at address 0x0>, eff_reset=<error reading variable: Cannot access memory at address 0x0>) at integrations.f90:814
#20 0x00007ffff40e6709 in commands::cmd_integrate_execute (cmd=0xc5d1c0, global=...) at commands.f90:3022
#21 0x00007ffff41aac27 in commands::command_list_execute (cmd_list=0x7fffffffaeb0, global=...) at commands.f90:5762
#22 0x00007ffff41d8ab2 in whizard::whizard_process_stream (whizard=0x805950, stream=..., lexer=..., quit=.FALSE., quit_code=0) at whizard.f90:357
#23 0x00007ffff41d3fd5 in whizard::whizard_process_file (whizard=0x805950, file=..., quit=.FALSE., quit_code=0) at whizard.f90:332
#24 0x00007ffff7bcad89 in main () at main.f90:416
#25 0x000000000040bc9e in main ()

Change History (9)

comment:1 Changed 8 years ago by Juergen Reuter

My suspicion is that ifort doesn't like the declaration

   real(default), dimension(0:3, *), intent(in) :: mom

comment:2 Changed 8 years ago by Bijan Chokoufe Nejad

But isn't this the exact declaration that is also in the Fortran written by OMega?

The backtrace doesn't really make sense to me. Could the mom array go out of scope somewhere in the interface? Is it intent(in) everywhere? And what happens if you replace the declaration with dimension(:,:)? (But I think there was a reason why it isn't done this way.)

comment:3 Changed 8 years ago by Bijan Chokoufe Nejad

Please also attach the configure call you used. Didn't you say it doesn't happen with backtrace?

comment:4 Changed 8 years ago by Juergen Reuter

It now also happens with backtrace.

comment:5 Changed 8 years ago by Juergen Reuter

It seems to be very volatile. Maybe we could get Heinz Bast involved?

comment:6 Changed 8 years ago by Juergen Reuter

Priority: P3 → P0
Severity: normal → critical

This happens basically all the time when the test suite is run with the Intel compiler and quadruple precision. Pretty annoying.

comment:7 Changed 8 years ago by Bijan Chokoufe Nejad

Cannot reproduce:

Making check in functional_tests
...
============================================================================
Testsuite summary for WHIZARD 2.3.0
============================================================================
# TOTAL: 202
# PASS:  197
# SKIP:  2
# XFAIL: 3
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

with this configure

/scratch/bcho/trunk/configure --prefix=/scratch/bcho/trunk/_install/ifort-quadruple FC=ifort FCFLAGS="-O2 -g -traceback" FFLAGS="-g -traceback" --enable-fastjet --enable-gosam --enable-lcio --enable-openloops --with-openloops=/data/bcho/OpenLoops/ --with-precision=quadruple

and

make -s V=0 check -j 10

Please give the exact configure options needed to reproduce the failure.

comment:8 Changed 8 years ago by Juergen Reuter

It happens both for

../configure --prefix=/home/reuter/local/packages/whizard/trunk/inst_ifort16_quad FC=ifort2016 F77=ifort2016 --enable-hepmc HEPMC_DIR=/home/reuter/local --enable-lcio LCIO_DIR=/home/reuter/local --enable-pythia8 --enable-fastjet --enable-gosam --enable-openloops --enable-distribution --with-precision=quadruple

and

../configure --prefix=/home/reuter/local/packages/whizard/trunk/inst_ifort16_quad_backtrace FC=ifort2016 F77=ifort2016 --enable-hepmc HEPMC_DIR=/home/reuter/local --enable-lcio LCIO_DIR=/home/reuter/local --enable-pythia8 --enable-fastjet --enable-gosam --enable-openloops --enable-distribution --with-precision=quadruple FCFLAGS="-g -traceback" FFLAGS="-g -traceback"

but not in every run. When using gdb, I was able to sometimes trigger it with the first setup, but never with the second (yet).
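
Since the crash does not show up in every run, one way to chase it is to repeat the failing command until it breaks. The following is only a minimal sketch: run_until_fail is a hypothetical helper, not part of WHIZARD or its test suite, and the command to repeat is whatever reproduces the problem in your build directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of WHIZARD): rerun a command until it
# fails or until max_runs attempts have passed.
# Usage: run_until_fail MAX_RUNS COMMAND [ARGS...]
run_until_fail() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        status=0
        # Capture the exit status without aborting under 'set -e'.
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status"
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}
```

For example, run_until_fail 20 make -s V=0 check would repeat the check command from comment:7 up to 20 times and stop at the first failing run. Note that make reports its own exit status, so the failing test still has to be identified from the test suite log.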

comment:9 Changed 8 years ago by Juergen Reuter

Resolution: fixed
Status: new → closed

This was an error in omegalib that caused the wrong memory to be read. Repaired in r7507.
