Opened 12 years ago

Closed 12 years ago

#483 closed defect (fixed)

Discuss support for gfortran 4.6

Reported by: kilian Owned by: kilian
Priority: P0 Milestone: v2.2.0
Component: core Version: 2.1.1
Severity: critical Keywords:
Cc:

Description

With the latest changes, gfortran 4.5 is probably out of the game (no/broken OO support, etc.). We should aim at supporting 4.6. Currently, 4.6.3 is ok (some bugs, but workarounds possible).

Check how severe the problems with 4.6.0 are.

Attachments (6)

process_libraries_6.out (1.9 KB) - added by Juergen Reuter 12 years ago.
Process_libraries logfile
omega_interface_1.out (2.5 KB) - added by Juergen Reuter 12 years ago.
omega interfaces log
prclib_interfaces_4.out (1.2 KB) - added by Juergen Reuter 12 years ago.
prclib 4
prclib_interfaces_5.out (1.2 KB) - added by Juergen Reuter 12 years ago.
prclib 5
prclib_interfaces_6.out (1.2 KB) - added by Juergen Reuter 12 years ago.
prclib 6
fptr.f90 (746 bytes) - added by kilian 12 years ago.

Change History (36)

comment:1 Changed 12 years ago by Juergen Reuter

Unfortunately, 4.6.0 turns out to be disastrous :((((

prclib_interfaces.f90:410.65:

    write (unit, "(2x,9A)")  "use ", char (writer%get_module_name (id)), &
                                                                 1
Error: 'get_module_name' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:436.39:

    name = record%writer%get_c_procname (record%id, feature)
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:445.33:

       call writer%write_use_line (unit, record%id, feature)
                                 1
Error: 'write_use_line' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:864.36:

         char (writer%get_c_procname (id, var_str ("md5sum"))), " ())"
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:874.36:

         char (writer%get_c_procname (id, var_str ("get_md5sum"))), " ()"
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1003.36:

         char (writer%get_c_procname (id, feature)), &
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1016.36:

         char (writer%get_c_procname (id, feature)), " (", char (feature), ")"
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1069.36:

         char (writer%get_c_procname (id, var_str ("col_state"))), &
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1087.36:

         char (writer%get_c_procname (id, var_str ("col_state"))), &
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1137.36:

         char (writer%get_c_procname (id, var_str ("color_factors"))), " (cf)"
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1148.36:

         char (writer%get_c_procname (id, var_str ("color_factors"))), &
                                    1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1160.39:

            char (writer%get_c_procname (id, var_str ("get_md5sum"))), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1166.39:

            char (writer%get_c_procname (id, var_str ("get_md5sum")))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1171.39:

            char (writer%get_c_procname (id, feature)), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1177.39:

            char (writer%get_c_procname (id, feature))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1182.39:

            char (writer%get_c_procname (id, feature)), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1188.39:

            char (writer%get_c_procname (id, feature))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1193.39:

            char (writer%get_c_procname (id, feature)), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1200.39:

            char (writer%get_c_procname (id, feature))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1205.39:

            char (writer%get_c_procname (id, feature)), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1214.39:

            char (writer%get_c_procname (id, feature))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1219.39:

            char (writer%get_c_procname (id, feature)), &
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1230.39:

            char (writer%get_c_procname (id, feature))
                                       1
Error: 'get_c_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1591.48:

       char (id), "_", char (writer%get_procname (feature)), &
                                                1
Error: 'get_procname' at (1) is not a member of the 'prc_writer_t' structure
prclib_interfaces.f90:1681.35:

       char (writer%get_module_name (id)), "_", &
                                   1
Error: 'get_module_name' at (1) is not a member of the 'prc_writer_t' structure
Fatal Error: Error count reached limit of 25.
make[2]: *** [prclib_interfaces.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

comment:2 Changed 12 years ago by Juergen Reuter

Maybe, to veto 4.5.0, the OO check in configure should test for something that is implemented in 4.6.0 but not yet in 4.5.0. Is there something like this? Such a test would be more robust and compiler-independent than checking the version number. Of course, we must be sure that 4.5.1-4 do not have this feature either. In any case, I guess we should then switch off the 4.5.0 check on Jenkins.
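A feature-based probe of this kind could look roughly like the sketch below. It is only an illustration: the module, type and procedure names are hypothetical and not taken from WHIZARD, and which construct actually separates 4.5.x from 4.6.x is not established in this ticket and would have to be found by testing. The program exercises an abstract type, a deferred type-bound procedure and polymorphic allocation, and signals failure through a non-zero exit status.

```fortran
! Hypothetical configure probe (all names are illustrative assumptions).
! It should compile and exit with status 0 only if the compiler handles
! abstract types, deferred type-bound procedures and polymorphic allocation.
module oo_probe
  implicit none
  private
  public :: base_t, ext_t

  type, abstract :: base_t
   contains
     procedure (get_id_iface), deferred :: get_id
  end type base_t

  abstract interface
     function get_id_iface (object) result (id)
       import :: base_t
       class(base_t), intent(in) :: object
       integer :: id
     end function get_id_iface
  end interface

  type, extends (base_t) :: ext_t
     integer :: id = 42
   contains
     procedure :: get_id => ext_get_id
  end type ext_t

contains

  function ext_get_id (object) result (id)
    class(ext_t), intent(in) :: object
    integer :: id
    id = object%id
  end function ext_get_id

end module oo_probe

program conftest
  use oo_probe
  implicit none
  class(base_t), allocatable :: object
  allocate (ext_t :: object)
  ! Dynamic dispatch must reach ext_get_id; otherwise report failure
  ! to the calling script via the exit status.
  if (object%get_id () /= 42)  stop 1
end program conftest
```

A configure script would compile and run such a program and reject the compiler if either step fails, so no version number needs to be parsed.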

comment:3 Changed 12 years ago by Juergen Reuter

One more piece of information: with gfortran 4.6.2 everything compiles, but one of the tests fails, namely the one for the process libraries:

FAIL: process_libraries.run (exit: 139)
=======================================

Running script ./process_libraries.run
| ============================================================================
| Running self-test: process_libraries
| ----------------------------------------------------------------------------
/afs/desy.de/group/theorie/software/packages/whizard/build/test/run_whizard.sh: line 24: 24653 Segmentation fault      (core dumped) ../src/whizard --logfile $basename.log --library lib$libname --rebuild $* $script.sin

comment:4 Changed 12 years ago by Juergen Reuter

4.6.0 should be checked by Jenkins, right? And 4.7.0 as well? I'm going to check 4.7.1/2 in the next few days ...

Changed 12 years ago by Juergen Reuter

Attachment: process_libraries_6.out added

Process_libraries logfile

comment:5 Changed 12 years ago by Juergen Reuter

Component: changed from configure to core
Owner: changed from ALL to kilian
Priority: changed from P1 to P0
Severity: changed from normal to blocker

With gfortran 4.7.2 it compiles, but ALL tests fail! For process_libraries, test 6 fails (file attached). For omega_interfaces, see the attached file. For prclib_interfaces, tests 4, 5 and 6 fail.

Changed 12 years ago by Juergen Reuter

Attachment: omega_interface_1.out added

omega interfaces log

Changed 12 years ago by Juergen Reuter

Attachment: prclib_interfaces_4.out added

prclib 4

Changed 12 years ago by Juergen Reuter

Attachment: prclib_interfaces_5.out added

prclib 5

Changed 12 years ago by Juergen Reuter

Attachment: prclib_interfaces_6.out added

prclib 6

comment:6 Changed 12 years ago by kilian

Thanks!

If the tests fail but produce output, it's not too bad .. earlier gfortrans died with segfault or memory corruption.

Let's see ...

comment:7 Changed 12 years ago by kilian

OK, this is trivial: JR, on your system the filename of a shared lib is

libname.0.so

instead of

libname.so.0

which I assumed. So the reference output is not portable. Otherwise everything ok. Which system was that?

comment:8 Changed 12 years ago by Juergen Reuter

Hahahaha, no, actually it was Mac OS X, and the name is indeed something like libname.0.so. Actually, I am completely confused, because I thought that on Mac OS X it is _always_ libname.dylib! Somehow WK seems to have overruled the Mac OS X convention, which is certainly not autoconf/make-compliant, and that makes me kinda nervous, actually!

comment:9 Changed 12 years ago by Juergen Reuter

Moreover, I got a LaTeX error, but maybe it is only spurious.

comment:10 Changed 12 years ago by Juergen Reuter

4.7.1 seems to behave identically compared to 4.7.2.

comment:11 Changed 12 years ago by Juergen Reuter

The same holds for 4.7.0, so it seems we can tackle the problem for all 4.7.x.

comment:12 Changed 12 years ago by Juergen Reuter

Actually, 4.6.3 shows exactly the same errors as 4.7.x.

comment:13 Changed 12 years ago by Juergen Reuter

For 4.6.2 it is almost the same, except that I get a bus error/segfault for the process libraries test. On Linux, the omega and prclib tests actually work, so it is probably indeed only the incompatibility between hardwired library assumptions and Mac OS X. Which is a problem, though!

comment:14 Changed 12 years ago by Juergen Reuter

Finally, 4.6.1 shows the same error as 4.6.0 reported above. To be discussed!

comment:15 Changed 12 years ago by Juergen Reuter

This is the backtrace I get for the seg fault in the process_libraries test:

(gdb) bt
#0  0x00007ffff6a76d20 in process_libraries::process_def_list_append (list=..., entry=0x60ca50) at process_libraries.f90:727
#1  0x00007ffff6a8a7c6 in process_libraries::process_libraries_2 (u=12) at process_libraries.f90:1327

comment:16 Changed 12 years ago by Juergen Reuter

OK, a full backtrace with a program compiled in debug mode yields:

Running test: process_libraries_1
Program received signal SIGSEGV, Segmentation fault.
0x0000000000409e7e in iso_varying_string::len_ (string=<error reading variable: Cannot access memory at address 0x100000000>) at iso_varying_string.f90:1009
1009        if(ALLOCATED(string%chars)) then
(gdb) bt
#0  0x0000000000409e7e in iso_varying_string::len_ (string=<error reading variable: Cannot access memory at address 0x100000000>) at iso_varying_string.f90:1009
#1  0x00000000005dac9a in process_libraries::process_def_write (object=..., unit=12) at process_libraries.f90:527
#2  0x00000000005d53b8 in process_libraries::process_def_list_write (object=..., unit=12) at process_libraries.f90:690
#3  0x00000000005cd6a0 in process_libraries::process_libraries_1 (u=12) at process_libraries.f90:1248
#4  0x0000000000413435 in unit_tests::test (test_proc=0x5cd513 <process_libraries::process_libraries_1>, name=<error reading variable: Cannot access memory at address 0x614d78>,
    description=<error reading variable: Cannot access memory at address 0x614d66>, u_log=11, results=..., _name=19, _description=18) at unit_tests.f90:145
#5  0x00000000005cd7bc in process_libraries::process_libraries_test (u=11, results=...) at process_libraries.f90:1222
#6  0x00000000005f30d5 in whizard::whizard_check (check=..., lhapdf_present=.TRUE., results=...) at whizard.f90:122
#7  0x00000000005f7f4d in MAIN__ ()
#8  0x00000000005f8e44 in main ()
(gdb)

comment:17 Changed 12 years ago by kilian

Priority: changed from P0 to P2
Severity: changed from blocker to critical

Summary of the present situation:

All gfortran versions from 4.6.3 on are ok.

We don't support versions prior to 4.6.0.

4.6.0 to 4.6.2 have problems. Since the current tests work with later versions, these are compiler bugs. We still have to check whether it is feasible to work around those bugs. I rank this down now, but the issue has to be resolved before the next release.

comment:18 Changed 12 years ago by Juergen Reuter

Priority: changed from P2 to P0

For the distribution that is very important and remains my most urgent task.

comment:19 Changed 12 years ago by kilian

With Janus' response to JR's enquiry, the 4.6.x issues may be solvable. Will check this asap, provided the Siegen network is up and running again.

comment:20 Changed 12 years ago by kilian

Given the new failure with r4050, I'm inclined to trash 4.6.x altogether ... arggh!

Not yet giving up ...

Changed 12 years ago by kilian

Attachment: fptr.f90 added

comment:21 Changed 12 years ago by kilian

Attachment fptr.f90 isolates the bug in 4.6.3.

The problem occurs when I extend a basic type, such that the extended type contains a procedure pointer. The target is a function.

Calling the function, after correctly assigning the procedure pointer, results in segfault. The problem doesn't occur with a non-polymorphic type.

JR: Maybe you could find a corresponding bugzilla entry (Janus?)? The bug appears to be fixed in 4.7 and later, so it should be a known problem.

Apparently, a workaround is to change the function into a subroutine. Pointers to subroutines work (at least in a short test). This is not nice, but maybe I should do it this way ... it would be more natural to get the OMega amplitude as a function.
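For readers without access to the attachment, the pattern described above can be sketched as follows. This is not the attached fptr.f90 but a hypothetical reconstruction from the description: all names are made up, and whether this exact form triggers the segfault in 4.6.3 is untested. It shows an extended type holding procedure pointer components, accessed through a polymorphic variable, once with a function target (the case reported to fail) and once with a subroutine target (the workaround).

```fortran
! Hypothetical reconstruction of the pattern; not the attached fptr.f90.
module fptr_sketch
  implicit none

  abstract interface
     function amp_fun () result (amp)
       real :: amp
     end function amp_fun
     subroutine amp_sub (amp)
       real, intent(out) :: amp
     end subroutine amp_sub
  end interface

  type :: base_t
  end type base_t

  ! The extended type carries the procedure pointers.
  type, extends (base_t) :: driver_t
     procedure(amp_fun), nopass, pointer :: get_amp_fun => null ()
     procedure(amp_sub), nopass, pointer :: get_amp_sub => null ()
  end type driver_t

contains

  function my_amp_fun () result (amp)
    real :: amp
    amp = 1.0
  end function my_amp_fun

  subroutine my_amp_sub (amp)
    real, intent(out) :: amp
    amp = 1.0
  end subroutine my_amp_sub

end module fptr_sketch

program main
  use fptr_sketch
  implicit none
  class(base_t), allocatable :: object
  real :: amp

  allocate (driver_t :: object)
  select type (object)
  type is (driver_t)
     ! Workaround variant: the pointer target is a subroutine.
     object%get_amp_sub => my_amp_sub
     call object%get_amp_sub (amp)
     print *, "subroutine pointer:", amp
     ! Variant reported as problematic: the pointer target is a function.
     object%get_amp_fun => my_amp_fun
     amp = object%get_amp_fun ()
     print *, "function pointer:  ", amp
  end select
end program main
```

The subroutine variant returns the value through an intent(out) argument, which is the less natural interface mentioned above but avoids the function call through the pointer.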

comment:22 Changed 12 years ago by Juergen Reuter

I sent an email to Janus. I'm completely confused about the status of the different compilers at the moment. Because of SL7 and other distributions (like Debian), I would definitely insist on keeping 4.6.x (let's see how small x can really be).

comment:23 Changed 12 years ago by kilian

So I worked around the issue with 4.6.3 in r4060. The matrix element code is now accessed only via subroutines, not via functions. (This doesn't affect the O'Mega-generated code, only the automatically generated driver code.)

The tests with 4.6.3 are successful, again.

comment:24 Changed 12 years ago by kilian

r4066 revealed another bug in gfortran 4.6.3.

This time, unrelated to OO stuff. Instead, it is triggered by an allocatable scalar containing an allocatable array. (The latter being in reality the ISO varying string type.)

Here is a minimal example:

module objects

  type :: data_t
     real(4) :: number = 0
     character, dimension(:), allocatable :: chars
  end type data_t

  type :: object_t
     type(data_t), allocatable :: data
  end type object_t


contains

  subroutine sub
    type(object_t) :: object
    call do_something (object)
  end subroutine sub
  
  subroutine do_something (object)
    type(object_t), intent(in) :: object
  end subroutine do_something

end module objects


program main

  use objects

  call sub

end program main

With gfortran 4.6.3, this compiles, but segfaults immediately when run. Note that there is no real code executed, it is just the memory layout.

This is REALLY annoying. gfortran 4.6 is much worse than 4.5 in that respect: too many bugs in supported features. Fortunately, 4.7 is in a much better shape, but what shall we do?

comment:25 Changed 12 years ago by Juergen Reuter

Is there a simple and painless way to work around this!?

comment:26 Changed 12 years ago by kilian

Yes, in the concrete case it's trivial. But I fear that issues like this surface every other day ... and in general allocatable scalars are mandatory for data abstraction.

comment:27 Changed 12 years ago by Juergen Reuter

For gfortran 4.6.2 the problem is the following: WK starts the test with an empty process_def_list, whose entries first and last are set to => null (). However, gfortran 4.6.2 incorrectly interprets them as being associated. I don't see a simple way to program around this!? If that problem is really unavoidable and persists in 4.6.2, it does not make much sense to try to get gfortran 4.6.[1-2] to work, right? I'm fine with vetoing 4.6.[0-2] from now on if WK gives his final statement/opinion on this. WK, do you know a way to work around this problem with 4.6.2?
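To make the failing pattern concrete, here is a minimal sketch of a linked list whose first and last pointers are default-initialized to null (). The type and procedure names are hypothetical and only mimic the situation described; this is not the code from process_libraries.f90, and it has not been checked against 4.6.2. A standard-conforming compiler should report the pointers of a fresh list as not associated.

```fortran
! Hypothetical sketch of the list pattern; names are assumptions.
module list_sketch
  implicit none

  type :: entry_t
     type(entry_t), pointer :: next => null ()
  end type entry_t

  type :: list_t
     type(entry_t), pointer :: first => null ()
     type(entry_t), pointer :: last => null ()
  end type list_t

contains

  subroutine list_append (list, entry)
    type(list_t), intent(inout) :: list
    type(entry_t), pointer, intent(in) :: entry
    ! If 'last' is wrongly seen as associated for an empty list,
    ! the first branch dereferences an invalid pointer.
    if (associated (list%last)) then
       list%last%next => entry
    else
       list%first => entry
    end if
    list%last => entry
  end subroutine list_append

end module list_sketch

program main
  use list_sketch
  implicit none
  type(list_t) :: list
  type(entry_t), pointer :: entry

  print *, "empty list, first associated:  ", associated (list%first)
  allocate (entry)
  call list_append (list, entry)
  print *, "after append, first associated:", associated (list%first)
end program main
```

If the association status of a default-initialized pointer cannot be trusted, every branch like the one in list_append becomes unreliable, which is why a local workaround is hard to see.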

comment:28 Changed 12 years ago by Juergen Reuter

!?

comment:29 Changed 12 years ago by Juergen Reuter

Summary: changed from "Discuss support for gfortran 4.5, 4.6" to "Discuss support for gfortran 4.6"

Discussion about gfortran 4.5 closed, 4.6 still pending ...

comment:30 Changed 12 years ago by Juergen Reuter

Resolution: set to fixed
Status: changed from new to closed

After the discussions today we decided that supporting gfortran 4.6.0/1/2 is not worth the effort, as those early versions of the compiler have severe bugs and deficiencies. At the moment, the strategy for the future is to keep gfortran 4.6.3+ on board as long as possible, as this will be the default compiler for the next Debian (correct?). I have already removed the gfortran 4.6.0 tests from Jenkins so that they do not bother us any longer. The vetoing of gfortran 4.6.0/1/2 is done in r4090.
