Opened 15 years ago
Closed 15 years ago
#251 closed defect (fixed)
Multiple beams statements cause unit leak
Reported by: | Juergen Reuter | Owned by: | Christian Speckner |
---|---|---|---|
Priority: | P2 | Milestone: | v2.0.1 |
Component: | core | Version: | 2.0.0rc3 |
Severity: | normal | Keywords: | compiler runtime error |
Cc: |
Description
With gfortran 4.5.0, I get the following runtime error in mappings.f90 using the Sindarin file for the nmssm_ext-dd.sin test. There are 120 processes in it, and the runtime error occurs on 64-bit Linux for process prc87.
************************************************************************
* Integrating d,D -> sb1,sb1c @ sqrt(s) = 5000 GeV
************************************************************************
sqrts = 5000.0000000000000
Beam data (collision):
d (mass = 0.0000000 GeV)
dbar (mass = 0.0000000 GeV)
sqrts = 5000.00000000000 GeV
| Integrating process 'prc87'
At line 91 of file mappings.f90
Fortran runtime error: Bad unit number in OPEN statement
If there is a limitation (at least a practical one), this should be documented. I'll test with NAG on 32 bit.
Change History (12)
comment:1 Changed 15 years ago by
Component: | configure → core |
---|---|
Keywords: | compiler runtime error added |
Milestone: | → v2.0.0final |
Owner: | changed from ALL to kilian |
Version: | → 2.0rc3 |
comment:2 Changed 15 years ago by
comment:3 Changed 15 years ago by
Here is the backtrace from the NAG compiler, but it's not really enlightening:
| Integrating process 'prc87'
Runtime Error: Unit number -1 out of range
Program terminated by fatal I/O error
mappings.f90, line 91: Error occurred in MAPPINGS:MAPPING_DEFAULTS_MD5SUM
commands.f90, line 3795: Called by COMMANDS:CMD_INTEGRATE_EXECUTE
commands.f90, line 1868: Called by COMMANDS:COMMAND_EXECUTE
commands.f90, line 5933: Called by COMMANDS:COMMAND_LIST_EXECUTE
whizard.f90, line 201: Called by WHIZARD:WHIZARD_PROCESS_STREAM
whizard.f90, line 177: Called by WHIZARD:WHIZARD_PROCESS_FILE
main.f90, line 217: Called by MAIN
Aborted (core dumped)
comment:4 Changed 15 years ago by
It is always prc87, also for the input file nmssm_ext-ee.sin. Why not 42?
comment:5 follow-up: 6 Changed 15 years ago by
The point is the hard-coded limit for output units from 11 to 99 in limits.f90. The FORTRAN standard strictly demands units to be only in that range, as units over 100 cannot be guaranteed to be supported by all processors. So what can we do about it? Flush in between and restart if possible, or leave this limit and just document it in the manual?
comment:6 Changed 15 years ago by
Priority: | P1 → P2 |
---|---|
Severity: | blocker → major |
Replying to jr_reuter:
The point is the hard-coded limit for output units from 11 to 99 in limits.f90 So what can we do about it? [...] leave this limit and just document it in the manual?
I would document it in the manual and abort with a meaningful error message when free_unit () fails. I doubt that any user will use testsuite.m4 to create validation suites with O(100) processes ...
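A minimal sketch of that behaviour, written in Python purely as a model (WHIZARD itself is Fortran, and nothing below is its actual code). The range 11 to 99 is taken from comment:5. The names `UnitPool` and `open_or_abort`, the wording of the message, and the -1 return value are assumptions for illustration; the -1 is merely suggested by the "Unit number -1 out of range" message in the NAG backtrace.

```python
MIN_UNIT = 11  # lower bound quoted in this ticket
MAX_UNIT = 99  # upper bound quoted in this ticket


class UnitPool:
    """Toy model of a bounded pool of I/O unit numbers (hypothetical)."""

    def __init__(self, lo=MIN_UNIT, hi=MAX_UNIT):
        self.lo = lo
        self.hi = hi
        self.in_use = set()

    def free_unit(self):
        """Return the lowest unused unit number, or -1 if none is left."""
        for unit in range(self.lo, self.hi + 1):
            if unit not in self.in_use:
                return unit
        return -1

    def open(self, unit):
        """Mark a unit as in use."""
        self.in_use.add(unit)

    def close(self, unit):
        """Give a unit back to the pool."""
        self.in_use.discard(unit)


def open_or_abort(pool, purpose):
    """Reserve a unit, or fail with a message that names the limit."""
    unit = pool.free_unit()
    if unit < 0:
        raise RuntimeError(
            f"No free I/O unit left for {purpose}: "
            f"all units {pool.lo}..{pool.hi} are in use"
        )
    pool.open(unit)
    return unit
```

The point of the wrapper is that the failure is reported where the unit is requested, with the limit spelled out, rather than surfacing later as a bad unit number in an OPEN statement.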
comment:7 follow-up: 8 Changed 15 years ago by
Summary: | Fortran runtime error in mappings → Limitation in number of output units |
---|
Well, I split the m4 test macros. The question is where to put the check: do we forbid defining more than 80 processes in the SINDARIN file, or do we only forbid running more than 80? Just emitting an error message when no free unit is left would really make the user angry: after having run 86 processes, they get an error saying: Sorry, guys!
comment:8 Changed 15 years ago by
Replying to jr_reuter:
Just emitting an error message when no free unit is left would really make the user angry: after having run 86 processes, they get an error saying: Sorry, guys!
That's right. For the time being, I would limit the number of processes (50 sounds about right). For 2.0.1, we could lift this restriction by opening and closing the log file just before and after the process is run.
comment:9 Changed 15 years ago by
Milestone: | v2.0.0final → v2.0.1 |
---|---|
Severity: | major → normal |
In r2021 I limited the number of processes to 75. The rest will be done for v2.0.1.
comment:10 Changed 15 years ago by
Owner: | changed from kilian to Christian Speckner |
---|
Actually, I just ran 456 - 1 = 455 catpiss tests without running into any problem whatsoever, and during the whole run, WHIZARD behaved decently according to lsof: all opened files were closed again once the data was written, so the "unit leak" described in this ticket did not happen. I'll investigate further once I'm done with catpiss + FeynRules, but to me this looks like an issue with the gfortran runtime on 64 bit (or some other commit fixed this one on the fly).
comment:11 Changed 15 years ago by
Summary: | Limitation in number of output units → Multiple beams statements cause unit leak |
---|
Found the real issue: every beams statement triggers the allocation of a scratch unit, leading to a unit leak.
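To illustrate the mechanism, here is a toy model in Python (not WHIZARD's Fortran code). It assumes a pool of units 11 to 99 as quoted in comment:5; `ScratchUnits`, `beams_leaky`, `beams_fixed` and `statements_until_failure` are hypothetical names. The leaky variant reserves a scratch unit per beams statement and never gives it back, so the pool runs dry after a bounded number of statements. The other variant releases the unit when it is done and can be called indefinitely.

```python
class ScratchUnits:
    """Toy pool of unit numbers lo..hi; not WHIZARD's implementation."""

    def __init__(self, lo=11, hi=99):
        self.free = list(range(lo, hi + 1))

    def acquire(self):
        """Hand out a unit number, or -1 when the pool is exhausted."""
        return self.free.pop(0) if self.free else -1

    def release(self, unit):
        """Return a unit number to the pool."""
        self.free.append(unit)


def beams_leaky(pool):
    """Model of the bug: the scratch unit is never given back."""
    return pool.acquire()


def beams_fixed(pool):
    """Model of a fix: the scratch unit is released after use."""
    unit = pool.acquire()
    if unit >= 0:
        pool.release(unit)
    return unit


def statements_until_failure(beams, limit=1000):
    """Count successful beams statements before acquire() returns -1."""
    pool = ScratchUnits()
    for n in range(limit):
        if beams(pool) < 0:
            return n
    return limit


print(statements_until_failure(beams_leaky))  # bounded by the pool size
print(statements_until_failure(beams_fixed))  # reaches the loop limit
```

The count for the leaky variant is simply the size of the modelled pool; it is not meant to reproduce the exact failure point at prc87.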
comment:12 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Turns out the culprit was the MD5 calculation for structure functions. Fixed in r2127 (and the restriction on the number of processes is gone again).
Also reproduced with the NAG compiler on 32 bit: the error happens at exactly the same point (no difference between 32 and 64 bit!). Error message from NAG: