whizard is hosted by Hepforge, IPPP Durham

Opened 15 years ago

Closed 15 years ago

#251 closed defect (fixed)

Multiple beams statements cause unit leak

Reported by: Juergen Reuter
Owned by: Christian Speckner
Priority: P2
Milestone: v2.0.1
Component: core
Version: 2.0.0rc3
Severity: normal
Keywords: compiler runtime error
Cc:

Description

With gfortran 4.5.0, I get the following runtime error in mappings.f90 when running the Sindarin file for the nmssm_ext-dd.sin test. It contains 120 processes, and the runtime error occurs on 64-bit Linux for process prc87.

************************************************************************
* Integrating d,D -> sb1,sb1c    @ sqrt(s) = 5000 GeV
************************************************************************
sqrts =    5000.0000000000000
Beam data (collision):
 d     (mass = 0.0000000 GeV)
 dbar  (mass = 0.0000000 GeV)
 sqrts = 5000.00000000000 GeV
| Integrating process 'prc87'
At line 91 of file mappings.f90
Fortran runtime error: Bad unit number in OPEN statement

If there is a limitation (at least a practical one), it should be documented. I'll test with NAG on 32-bit.

Change History (12)

comment:1 Changed 15 years ago by Juergen Reuter

Component: changed from configure to core
Keywords: compiler runtime error added
Milestone: v2.0.0final
Owner: changed from ALL to kilian
Version: 2.0rc3

comment:2 Changed 15 years ago by Juergen Reuter

Also reproduced with the NAG compiler on 32-bit; the error happens at exactly the same point (no difference between 32 and 64 bit!). Error message from NAG:

Runtime Error: Unit number -1 out of range
Program terminated by fatal I/O error
Aborted (core dumped)

comment:3 Changed 15 years ago by Juergen Reuter

Here is the backtrace from the NAG compiler, but it's not really enlightening:

| Integrating process 'prc87'
Runtime Error: Unit number -1 out of range
Program terminated by fatal I/O error
mappings.f90, line 91: Error occurred in MAPPINGS:MAPPING_DEFAULTS_MD5SUM
commands.f90, line 3795: Called by COMMANDS:CMD_INTEGRATE_EXECUTE
commands.f90, line 1868: Called by COMMANDS:COMMAND_EXECUTE
commands.f90, line 5933: Called by COMMANDS:COMMAND_LIST_EXECUTE
whizard.f90, line 201: Called by WHIZARD:WHIZARD_PROCESS_STREAM
whizard.f90, line 177: Called by WHIZARD:WHIZARD_PROCESS_FILE
main.f90, line 217: Called by MAIN
Aborted (core dumped)

comment:4 Changed 15 years ago by Juergen Reuter

It is always prc87, also for the input file nmssm_ext-ee.sin. Why not 42?

comment:5 Changed 15 years ago by Juergen Reuter

The point is the hard-coded limit for output units from 11 to 99 in limits.f90. The Fortran standard leaves the range of valid unit numbers processor-dependent, so units above 99 cannot be guaranteed to be supported by all processors. So what can we do about it? Flush in between and restart if possible, or keep this limit and just document it in the manual?
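Assuming free_unit () simply scans the 11..99 range with INQUIRE and returns -1 when every unit is taken (an assumption inferred from NAG's "Unit number -1 out of range" message; the module and constant names below are hypothetical, not copied from limits.f90), the failure mode can be sketched as follows. Once all 89 units are connected, the -1 sentinel is handed straight to OPEN, which gfortran reports as "Bad unit number".

```fortran
! Hypothetical sketch of a free_unit () search over an assumed 11..99 range.
module unit_search
  implicit none
  private
  public :: free_unit
  integer, parameter, public :: MIN_UNIT = 11, MAX_UNIT = 99
contains
  ! Return the first unit in [MIN_UNIT, MAX_UNIT] that is not connected,
  ! or the sentinel -1 if every unit in the range is already open.
  function free_unit () result (unit)
    integer :: unit, u
    logical :: is_open
    unit = -1
    do u = MIN_UNIT, MAX_UNIT
       inquire (unit = u, opened = is_open)
       if (.not. is_open) then
          unit = u
          return
       end if
    end do
  end function free_unit
end module unit_search
```

A caller that passes the result to OPEN without checking it will fail exactly once the 90th unit is requested.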

comment:6 in reply to:  5 Changed 15 years ago by ohl

Priority: changed from P1 to P2
Severity: changed from blocker to major

Replying to jr_reuter:

The point is the hard-coded limit for output units from 11 to 99 in limits.f90. So what can we do about it? [...] leave this limit and just document it in the manual?

I would document it in the manual and abort with a meaningful error message when free_unit () fails. I doubt that any user will use testsuite.m4 to create validation suites with O(100) processes ...
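The "abort with a meaningful error message" suggestion might look like the following minimal sketch, assuming the same 11..99 scan as free_unit (); the routine name free_unit_checked, its purpose argument, and the message text are hypothetical. Instead of returning -1 to a caller that passes it on to OPEN, it stops with an explanation.

```fortran
! Hypothetical sketch: stop with an explanation instead of returning -1.
module checked_units
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  private
  public :: free_unit_checked
  integer, parameter :: MIN_UNIT = 11, MAX_UNIT = 99  ! assumed range
contains
  ! Return a free unit for the file described by `purpose`, or terminate
  ! the program with a readable message if the whole range is in use.
  function free_unit_checked (purpose) result (unit)
    character(*), intent(in) :: purpose
    integer :: unit, u
    logical :: is_open
    unit = -1
    do u = MIN_UNIT, MAX_UNIT
       inquire (unit = u, opened = is_open)
       if (.not. is_open) then
          unit = u
          return
       end if
    end do
    write (error_unit, "(A)") "*** Error: no free I/O unit in 11..99 for " // purpose
    write (error_unit, "(A)") "*** Too many files are open at the same time."
    error stop 1
  end function free_unit_checked
end module checked_units
```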

comment:7 Changed 15 years ago by Juergen Reuter

Summary: changed from Fortran runtime error in mappings to Limitation in number of output units

Well, I split the m4 test macros. The question is where to put the check: do we forbid defining more than 80 processes in the SINDARIN file, or only running more than 80? Just put an error message for having no free unit really makes the user angry: after having run 86 processes get an error saying: Sorry, guys!

comment:8 in reply to:  7 Changed 15 years ago by ohl

Replying to jr_reuter:

Just put an error message for having no free unit really makes the user angry: after having run 86 processes get an error saying: Sorry, guys!

That's right. For the time being, I would limit the number of processes (50 sounds about right). For 2.0.1, we could lift this restriction by opening and closing the log file just before and after the process is run.
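Opening and closing the log file around each process run can be sketched as a loop that holds a unit only while one process is active, so even 200 processes need just one unit at a time. This is a hypothetical sketch: the routine names, the prc<N>.log file names, and the 11..99 range are assumptions, and the files are deleted on CLOSE only so the demo does not litter the working directory.

```fortran
! Hypothetical sketch of per-process log handling.
module per_process_log
  implicit none
  private
  public :: run_processes
  integer, parameter :: MIN_UNIT = 11, MAX_UNIT = 99  ! assumed range
contains
  ! Run n_prc (dummy) processes, connecting the log unit only for the
  ! duration of each one. Returns how many processes completed.
  function run_processes (n_prc) result (n_done)
    integer, intent(in) :: n_prc
    integer :: n_done, i_prc, u
    character(len=32) :: fname
    n_done = 0
    do i_prc = 1, n_prc
       u = free_unit ()
       if (u < 0) return                 ! the old failure mode
       write (fname, "(A,I0,A)") "prc", i_prc, ".log"
       open (unit = u, file = trim (fname), action = "write", status = "replace")
       write (u, "(A,I0)") "Integrating process prc", i_prc
       close (u, status = "delete")      ! delete only to keep the demo tidy
       n_done = n_done + 1
    end do
  end function run_processes

  function free_unit () result (unit)
    integer :: unit, u
    logical :: is_open
    unit = -1
    do u = MIN_UNIT, MAX_UNIT
       inquire (unit = u, opened = is_open)
       if (.not. is_open) then
          unit = u
          return
       end if
    end do
  end function free_unit
end module per_process_log
```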

comment:9 Changed 15 years ago by Juergen Reuter

Milestone: changed from v2.0.0final to v2.0.1
Severity: changed from major to normal

In r2021 I limited the number of processes to 75. The rest will be done for v2.0.1.

comment:10 Changed 15 years ago by Christian Speckner

Owner: changed from kilian to Christian Speckner

Actually, I just ran 456 - 1 = 455 catpiss tests without running into any problem whatsoever, and during the whole run WHIZARD behaved decently according to lsof: all opened files were closed again once the data was written, so the "unit leak" described in this ticket did not happen. I'll investigate further once I'm done with catpiss + FeynRules, but to me this looks like an issue with the gfortran runtime on 64 bit (or some other commit fixed this one on the fly).

comment:11 Changed 15 years ago by Christian Speckner

Summary: changed from Limitation in number of output units to Multiple beams statements cause unit leak

Found the real issue: every beams statement triggers the allocation of a scratch unit, leading to a unit leak.
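The ticket does not show the offending code, so the following is only a hypothetical reconstruction of the pattern: each beams statement opens a scratch unit (for instance to serialize data before computing a checksum) and, in the buggy variant, never closes it. Routine names and the 11..99 range are assumptions. With leak = .true. the 90th call finds no free unit; with leak = .false. any number of calls succeeds.

```fortran
! Hypothetical reconstruction of a scratch-unit leak and its fix.
module beams_scratch
  implicit none
  private
  public :: process_beams, count_open_units
  integer, parameter :: MIN_UNIT = 11, MAX_UNIT = 99  ! assumed range
contains
  ! Handle one (hypothetical) beams statement. ok = .false. means no free
  ! unit was left. With leak = .true. the scratch unit is never closed.
  subroutine process_beams (sqrts, leak, ok)
    real, intent(in) :: sqrts
    logical, intent(in) :: leak
    logical, intent(out) :: ok
    integer :: u
    u = free_unit ()
    ok = (u >= 0)
    if (.not. ok) return
    open (unit = u, status = "scratch", action = "readwrite")
    write (u, *) sqrts            ! stand-in for the data being checksummed
    if (.not. leak) close (u)     ! the fix: release the unit afterwards
  end subroutine process_beams

  ! Number of units in the assumed range that are currently connected.
  function count_open_units () result (n)
    integer :: n, u
    logical :: is_open
    n = 0
    do u = MIN_UNIT, MAX_UNIT
       inquire (unit = u, opened = is_open)
       if (is_open) n = n + 1
    end do
  end function count_open_units

  function free_unit () result (unit)
    integer :: unit, u
    logical :: is_open
    unit = -1
    do u = MIN_UNIT, MAX_UNIT
       inquire (unit = u, opened = is_open)
       if (.not. is_open) then
          unit = u
          return
       end if
    end do
  end function free_unit
end module beams_scratch
```

Because a scratch file is deleted when its unit is closed, the single CLOSE is enough to make the number of beams statements irrelevant.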

comment:12 Changed 15 years ago by Christian Speckner

Resolution: fixed
Status: changed from new to closed

Turns out the culprit was the MD5 calculation for structure functions. Fixed in r2127 (and the restriction on the number of processes is gone again).
