Opened 15 years ago

Closed 15 years ago

#203 closed defect (fixed)

Compile clears process information.

Reported by: kilian
Owned by: Christian Speckner
Priority: P3
Milestone: v2.0.1
Component: core
Version: 2.0.0rc3
Severity: normal
Keywords:
Cc:

Description (last modified by Christian Speckner)

Calling compile wipes all process information.

Change History (14)

comment:1 Changed 15 years ago by sschmidt

I'm not sure if this is what you meant, but I'll write it here nevertheless:

with r1771 W2 gives a segfault for the input file

process eemm = e1, E1 -> e2, E2

compile("eemm")

sqrts = 500 GeV
beams = e1, E1

integrate (eemm) { iterations = 3:5000, 2:5000 }

because of the 'compile("eemm")' command, like the one in test/qedtest.sin.

comment:2 Changed 15 years ago by Juergen Reuter

Sebastian's part of the ticket was caused by an unassociated pointer in cmd_compile_execute, which occurred if the process library with the corresponding name had not been declared/defined. Fixed in r1965. For Wolfgang's case, the very thing he always demands is missing :D:D:D (guess: a test file).

comment:3 Changed 15 years ago by Juergen Reuter

Milestone: changed from v2.0.0final to v2.0-rc4

comment:4 Changed 15 years ago by Juergen Reuter

Milestone: changed from v2.0-rc4 to v2.0.0final

comment:5 Changed 15 years ago by Christian Speckner

The segfault depends on the order of execution of the statements; e.g., the Sindarin snippet

model = SM

process test1 = "e+", "e-" => "W+", "W-"

compile

process test2 = "e+", "e-" => Z, Z

compile

sqrts = 500 GeV

integrate (test1)
integrate (test2)

works fine while

model = SM

process test1 = "e+", "e-" => "W+", "W-"

compile

sqrts = 500 GeV
integrate (test1)

process test2 = "e+", "e-" => Z, Z

compile
integrate (test2)

triggers a segfault at the end. Furthermore, the crash only happens if the compilation actually takes place; if the library is only loaded without recompiling, all is well. The backtrace for the crash is

(gdb) bt
#0  0x7adc0124 in ?? ()
#1  0xb7dc5ffd in hard_interaction_final (hi=...) at hard_interactions.f90:349
#2  0xb7def343 in process_final (process=...) at processes.f90:1106
#3  0xb7dd88b4 in process_store_final () at processes.f90:3157
#4  0xb7e6b6e1 in whizard_final () at whizard.f90:145
#5  0xb7f722be in MAIN__ () at main.f90:275
#6  0xb7f731bd in main (argc=3, argv=0xbfffea73 "/home/pestix/physik/local/svn/whizard/trunk/install/bin/whizard") at main.f90:30
#7  0x7ac30a51 in __libc_start_main () from /lib/libc.so.6
#8  0x08048511 in _start ()

comment:6 Changed 15 years ago by Christian Speckner

Update: if the crash is triggered (e.g. by whizard -r nonworking.sin), the pointer to the finalizer is obviously pointing into nirvana (the disassembly is trash), while the pointer is indeed valid and the disassembly correct if everything works well.

comment:7 Changed 15 years ago by Christian Speckner

Priority: changed from P4 to P3
Severity: changed from minor to critical
Summary: changed from "WHIZARD crashes upon finalization if compile is executed twice." to "Segfaults if compile is executed twice."

Update: This is a WHIZARD bug which is not related to the dlopen mechanism: at each compile statement, the process library is reloaded, but if the actual compilation takes place, hard_interaction_data_init is not called again after reloading, leaving the procedure pointers for test1 in my example undefined. Furthermore, the issue is not necessarily restricted to finalization; the code

model = SM

process test1 = "e+", "e-" => "W+", "W-"

compile

sqrts = 500 GeV
integrate (test1)

process test2 = "e+", "e-" => Z, Z

compile
?rebuild_grids = true
integrate (test1)
integrate (test2)

leads to a segfault after the second integrate (test1), so I am changing the description (and the ranking).

comment:8 Changed 15 years ago by Christian Speckner

OK, turns out my last assessment was not quite correct; final diagnosis: the segfault happens at the end of the chain

process_store_init_process -> process_store_get_fresh_process_pointer -> process_final -> hard_interaction_final -> hi%data%final

The reason for the different behavior in the two testcases is that dlopen seems to either load the library to the same location in memory or uses the already loaded copy if the library file has not been touched between opening and closing (the pointers thus staying valid), while, after a true recompilation, the library is loaded at a different location. To fix this properly, the finalizers must be called _before_ the library is unloaded, not after reloading. I'll try to find a fix later or tomorrow :)
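The ordering problem described above can be illustrated with a small toy model. This is only a sketch in Python, not WHIZARD code: `ToyLibrary`, `StaleHandle` and both `recompile_*` functions are hypothetical names, and a raised exception stands in for the segfault. The model assumes every reload maps the library at a new location, which corresponds to the "true recompilation" case; the case where dlopen reuses the old mapping is not modelled.

```python
class StaleHandle(RuntimeError):
    """Stands in for the segfault caused by a dangling procedure pointer."""


class ToyLibrary:
    """Hypothetical model of a dynamically loaded process library."""

    def __init__(self):
        self.generation = 0   # bumped on every load, i.e. a new mapping
        self.loaded = False

    def load(self):
        self.generation += 1
        self.loaded = True

    def unload(self):
        self.loaded = False

    def get_finalizer(self):
        """Return a 'procedure pointer' bound to the current mapping."""
        bound_generation = self.generation

        def finalizer(log):
            # The pointer is only valid for the mapping it was taken from.
            if not self.loaded or bound_generation != self.generation:
                raise StaleHandle("finalizer points into an unloaded mapping")
            log.append("finalized")

        return finalizer


def recompile_finalize_late(lib, log):
    """Problematic order: unload, reload, then call the old finalizer."""
    final = lib.get_finalizer()
    lib.unload()
    lib.load()
    final(log)


def recompile_finalize_first(lib, log):
    """Order proposed above: finalize first, then unload and reload."""
    final = lib.get_finalizer()
    final(log)
    lib.unload()
    lib.load()
```

In this model, `recompile_finalize_first` completes and records the finalization, while `recompile_finalize_late` raises `StaleHandle`.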

comment:9 Changed 15 years ago by kilian

Owner: changed from kilian to Christian Speckner

Thanks for finding this, must have been a hard one!

comment:10 Changed 15 years ago by Christian Speckner

Resolution: fixed
Status: changed from new to closed

Fixed in r2254 by calling process_store_final when the process library is unloaded. However, the solution is less elegant than one might wish, as calling this directly from process_library_unload would introduce a circular dependency between processes and process_libraries. I worked around this by introducing a procedure pointer into process_library_t which can be set to an optional, additional destructor and which is set by process_store_init_process. However, when W2 is rewritten in a more object-oriented manner at some point, this should be solved more elegantly ;)
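The workaround described above is essentially a callback hook: the lower layer holds an optional destructor that the higher layer installs, so the lower layer never has to reference the higher one. A minimal sketch in Python follows. It is not the code from r2254; the class and attribute names (`ProcessLibrary`, `ProcessStore`, `extra_destructor`) are hypothetical stand-ins for process_library_t, the process store and the procedure pointer.

```python
from typing import Callable, List, Optional


class ProcessLibrary:
    """Toy stand-in for process_library_t; knows nothing about processes."""

    def __init__(self) -> None:
        self.loaded = True
        # Optional, additional destructor installed by a higher layer.
        self.extra_destructor: Optional[Callable[[], None]] = None

    def unload(self) -> None:
        # Run the hook while the library is still loaded, then unload.
        if self.extra_destructor is not None:
            self.extra_destructor()
        self.loaded = False


class ProcessStore:
    """Toy stand-in for the process store; depends on ProcessLibrary only."""

    def __init__(self) -> None:
        self.processes: List[str] = []

    def init_process(self, name: str, lib: ProcessLibrary) -> None:
        self.processes.append(name)
        # Register the store's finalizer with the library.
        lib.extra_destructor = self.final

    def final(self) -> None:
        # Drops all stored processes.
        self.processes.clear()
```

The dependency runs in one direction only: `ProcessStore` refers to `ProcessLibrary`, while `ProcessLibrary` sees just an opaque callable.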

comment:11 Changed 15 years ago by Christian Speckner

Description: modified (diff)
Milestone: changed from v2.0.0final to v2.0.1
Resolution: fixed
Severity: changed from critical to normal
Status: changed from closed to reopened
Summary: changed from "Segfaults if compile is executed twice." to "Compile clears process information."
Version: changed from 2.0rc2 to 2.0rc3

I've got the feeling that my fix is a bit too radical: it turns out that now, on executing compile, all the process information (integral etc.) is wiped by process_store_final. I'll devise a more elegant solution which just reloads the procedure pointers, but I don't think I'll manage before the pending release, so I'd say we should simply document this as a known pitfall in 2.0 and improve it in 2.0.1. At least it's better than leaving the pointers in an undefined and possibly invalid state ;).
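The difference between the two approaches mentioned above (wiping everything versus only reloading the procedure pointers) can be sketched as follows. This is an illustrative Python model under assumed names, not the eventual fix: `Process`, `wipe_on_compile`, `rebind_on_compile` and the `symbols` mapping are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Process:
    name: str
    integral: Optional[float] = None                      # result worth keeping
    matrix_element: Optional[Callable[[], float]] = None  # "procedure pointer"


def wipe_on_compile(store: Dict[str, Process]) -> None:
    """Radical variant: drop every process, results included."""
    store.clear()


def rebind_on_compile(store: Dict[str, Process],
                      symbols: Dict[str, Callable[[], float]]) -> None:
    """Gentler variant: keep results, refresh only the procedure pointers.

    `symbols` maps process names to the entry points of the freshly
    loaded library (an assumption of this sketch).
    """
    for proc in store.values():
        proc.matrix_element = symbols[proc.name]
```

With `rebind_on_compile`, a previously computed `integral` survives the reload, whereas `wipe_on_compile` leaves the store empty.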

comment:12 Changed 15 years ago by Christian Speckner

Mostly fixed for good in r2279. A rare segfault remains when copies of process_t are involved in the game, but I'll fix that tomorrow.

comment:13 Changed 15 years ago by Christian Speckner

Uuups, meant r2280.

comment:14 Changed 15 years ago by Christian Speckner

Resolution: fixed
Status: changed from reopened to closed

Finally fixed for good in r2283. I'll add a test for this kind of trouble to the testsuite after the release has been rolled out.