Opened 15 years ago
Closed 15 years ago
#203 closed defect (fixed)
Compile clears process information.
Reported by: | kilian | Owned by: | Christian Speckner |
---|---|---|---|
Priority: | P3 | Milestone: | v2.0.1 |
Component: | core | Version: | 2.0.0rc3 |
Severity: | normal | Keywords: | |
Cc: |
Description (last modified by )
Calling compile wipes all process information.
Change History (14)
comment:1 Changed 15 years ago by
comment:2 Changed 15 years ago by
Sebastian's part of the ticket was caused by an unassociated pointer in cmd_compile_execute, which occurred if the process library with the corresponding name had not been declared/defined. Done in r1965. For Wolfgang's case, the very thing he always demands is missing :D:D:D (guess: a test file).
comment:3 Changed 15 years ago by
Milestone: | v2.0.0final → v2.0-rc4 |
---|
comment:4 Changed 15 years ago by
Milestone: | v2.0-rc4 → v2.0.0final |
---|
comment:5 Changed 15 years ago by
The segfault depends on the order of execution of the statements; e.g., the sindarin snippet
model = SM
process test1 = "e+", "e-" => "W+", "W-"
compile
process test2 = "e+", "e-" => Z, Z
compile
sqrts = 500 GeV
integrate (test1)
integrate (test2)
works fine while
model = SM
process test1 = "e+", "e-" => "W+", "W-"
compile
sqrts = 500 GeV
integrate (test1)
process test2 = "e+", "e-" => Z, Z
compile
integrate (test2)
triggers a segfault at the end. Furthermore, the crash only happens if the compilation actually takes place --- if the library is only loaded without recompiling, all is well. The backtrace for the crash is
(gdb) bt
#0 0x7adc0124 in ?? ()
#1 0xb7dc5ffd in hard_interaction_final (hi=...) at hard_interactions.f90:349
#2 0xb7def343 in process_final (process=...) at processes.f90:1106
#3 0xb7dd88b4 in process_store_final () at processes.f90:3157
#4 0xb7e6b6e1 in whizard_final () at whizard.f90:145
#5 0xb7f722be in MAIN__ () at main.f90:275
#6 0xb7f731bd in main (argc=3, argv=0xbfffea73 "/home/pestix/physik/local/svn/whizard/trunk/install/bin/whizard") at main.f90:30
#7 0x7ac30a51 in __libc_start_main () from /lib/libc.so.6
#8 0x08048511 in _start ()
comment:6 Changed 15 years ago by
Update: if the crash is triggered (e.g. by whizard -r nonworking.sin), the pointer to the finalizer is obviously pointing into nirvana (the disassembly is trash), while the pointer is indeed valid and the disassembly correct if everything works well.
comment:7 Changed 15 years ago by
Priority: | P4 → P3 |
---|---|
Severity: | minor → critical |
Summary: | WHIZARD crashes upon finalization if compile is executed twice. → Segfaults if compile is executed twice. |
Update: This is a WHIZARD bug which is not related to the dlopen mechanism: at each compile statement, the process library is reloaded, but if the actual compilation takes place, hard_interaction_data_init is not called again after reloading, leaving the procedure pointers for test1 in my example undefined. Furthermore, the issue is not necessarily restricted to finalization; the code
model = SM
process test1 = "e+", "e-" => "W+", "W-"
compile
sqrts = 500 GeV
integrate (test1)
process test2 = "e+", "e-" => Z, Z
compile
?rebuild_grids = true
integrate (test1)
integrate (test2)
leads to a segfault after the second integrate (test1) -> changing the description (and the ranking).
comment:8 Changed 15 years ago by
OK, turns out my last assessment was not quite correct; final diagnosis: the segfault happens at the end of the chain
process_store_init_process
-> process_store_get_fresh_process_pointer
-> process_final
-> hard_interaction_final
-> hi%data%final
The reason for the different behavior in the two test cases is that dlopen seems either to load the library to the same location in memory or to use the already loaded copy if the library file has not been touched between opening and closing (the pointers thus staying valid), while, after a true recompilation, the library is loaded at a different location. To fix this properly, the finalizers must be called _before_ the library is unloaded, not after reloading. I'll try to find a fix later or tomorrow :)
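The ordering problem can be illustrated with a small Python sketch. This is not WHIZARD code: `FakeLibrary` and both functions are hypothetical stand-ins, and the "procedure pointer" is simply a closure that stops working once the library object it came from has been unloaded.

```python
class FakeLibrary:
    """Hypothetical stand-in for a dlopen'ed process library."""

    def __init__(self):
        self.loaded = True

    def resolve_finalizer(self):
        # The returned callable is only valid while this particular
        # library image is loaded, like a pointer into its code segment.
        def finalizer():
            if not self.loaded:
                raise RuntimeError("call through dangling procedure pointer")
            return "finalized"
        return finalizer

    def unload(self):
        self.loaded = False


def finalize_after_reload():
    """Failing order: unload, reload, then call the old finalizer."""
    lib = FakeLibrary()
    final = lib.resolve_finalizer()
    lib.unload()         # recompilation: the old image goes away
    lib = FakeLibrary()  # the new image is a different object
    return final()       # stale pointer, raises RuntimeError


def finalize_before_unload():
    """Working order: call the finalizer while the image is still loaded."""
    lib = FakeLibrary()
    final = lib.resolve_finalizer()
    result = final()
    lib.unload()
    return result
```

In the real program the stale call does not raise a clean error but jumps to an arbitrary address, which is what the backtrace in comment:5 shows.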
comment:9 Changed 15 years ago by
Owner: | changed from kilian to Christian Speckner |
---|
Thanks for finding this, must have been a hard one!
comment:10 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in r2254 by calling process_store_final when the process library is unloaded. However, the solution is less elegant than one might wish, as calling this directly from process_library_unload would introduce a circular dependency between processes and process_libraries. I worked around this by introducing a procedure pointer into process_library_t which can be set to an optional, additional destructor and which is set by process_store_init_process. However, when W2 is rewritten in a more object-oriented manner at some point, this should be solved more elegantly ;)
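A rough Python sketch of this hook pattern follows. It is an illustration only: the class and attribute names (`ProcessLibrary`, `ProcessStore`, `extra_destructor`) are hypothetical and do not reproduce the Fortran types. The point is that the library only holds an optional callback, so it never has to import anything from the process store.

```python
from typing import Callable, Optional


class ProcessLibrary:
    """Hypothetical stand-in for process_library_t."""

    def __init__(self, name: str):
        self.name = name
        self.loaded = True
        # Optional additional destructor; the library does not know who set it.
        self.extra_destructor: Optional[Callable[[], None]] = None

    def unload(self) -> None:
        if self.extra_destructor is not None:
            # Run the hook first, while the library is still loaded.
            self.extra_destructor()
        self.loaded = False


class ProcessStore:
    """Hypothetical stand-in for the process store."""

    def __init__(self):
        self.processes = []
        self.finalized_while_loaded = None

    def init_process(self, library: ProcessLibrary, process_name: str) -> None:
        self.processes.append(process_name)

        def finalize() -> None:
            # Record whether the library was still loaded at finalization.
            self.finalized_while_loaded = library.loaded
            self.processes.clear()

        # The store registers itself; the dependency points one way only.
        library.extra_destructor = finalize
```

Calling `unload()` on a library after `init_process` has registered the hook finalizes the store before the library goes away. Note that the hook in this sketch clears every stored process.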
comment:11 Changed 15 years ago by
Description: | modified (diff) |
---|---|
Milestone: | v2.0.0final → v2.0.1 |
Resolution: | fixed |
Severity: | critical → normal |
Status: | closed → reopened |
Summary: | Segfaults if compile is executed twice. → Compile clears process information. |
Version: | 2.0rc2 → 2.0rc3 |
I've got the feeling that my fix is a bit too radical: it turns out that now, on executing compile, all the process information (integral etc.) is wiped by process_store_final. I'll devise a more elegant solution which just reloads the procedure pointers, but I don't think I'll manage before the pending release, so I'd say we should simply document this as a known pitfall in 2.0 and improve it in 2.0.1. At least it's better than leaving the pointers in an undefined and possibly invalid state ;).
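The "just reload the procedure pointers" idea might look roughly like the following Python sketch. Everything here is an assumption made for illustration: `Process`, `load_library` and `rebind` are hypothetical names, the symbol table is a plain dictionary, and the integral value used in the usage note is a placeholder, not a physics result.

```python
class Process:
    """Hypothetical process record: stored results plus a library pointer."""

    def __init__(self, name):
        self.name = name
        self.integral = None        # result that should survive a recompile
        self.matrix_element = None  # "procedure pointer" into the library


def load_library(names, build):
    """Hypothetical loader: returns a fresh symbol table for one build."""
    return {name: (lambda n=name: f"{n} (build {build})") for name in names}


def rebind(processes, symbols):
    """Re-resolve every process's pointer by name; leave its data alone."""
    for process in processes:
        process.matrix_element = symbols.get(process.name)
```

As a usage example, create `p = Process("test1")`, set `p.integral = 1.23`, and call `rebind([p], load_library(["test1"], 1))`. After a second `rebind([p], load_library(["test1", "test2"], 2))`, `p.matrix_element` points into the new build while `p.integral` is untouched.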
comment:12 Changed 15 years ago by
Mostly fixed for good in r2279. A rare segfault remains when copies of process_t are involved in the game, but I'll fix that tomorrow.
comment:14 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Finally fixed for good in r2283. I'll add a test for this kind of trouble to the testsuite after the release has been rolled out.
I'm not sure if this is what you meant, but I'll write it here nevertheless:
with r1771 W2 gives a segfault for the input file
because of the 'compile("eemm")' command line in test/qedtest.sin.