Difference between local allocatable and automatic arrays

I am interested in the difference between alloc_array and automatic_array in the following example:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(n)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(n))
...[code]...

      

I'm familiar enough with the basics of the language (not so much the advanced features) to know that allocation lets you resize the array in the middle of the code (as pointed out in this question), but here I am interested in a case where you don't need to resize the array. The arrays may be passed to other routines to work on, but the only purpose of both variables, in this code and in any routine, is to hold the data of an array of size n

(and possibly change the data, but not the size).

(1) Is there a difference in memory usage? I am not an expert on low-level details, but I know just enough to understand that they matter and can come back to bite you at a higher level (the kind of experience I'm talking about: once, trying to run a large code in Fortran, I got an error I didn't understand; the sysadmin told me "oh yeah, you are probably saturating the stack, try adding this line in your startup script", and that fixed it). So anything that gives me an idea of how to deal with this while actually coding, rather than having to correct it later, is welcome. I have been told that it might depend on many other things, such as the compiler or the architecture, but from those answers I got the impression that the people saying so didn't know exactly how. Is it entirely dependent on many factors, or is there a default/intended behavior in the code itself, which can then be overridden by compiler flags or system settings?

(2) Do the routines have different interface requirements? Again, not an expert, but this has happened to me before: because of the way I declared the dummy arguments of some subroutines, I ended up having to put the subroutines in a module. I have been given to understand that this can differ depending on whether I use features specific to allocatable variables. I am thinking of a case where everything done with the variables could be done with both allocatables and automatics, rather than intentionally using anything specific to allocatables (other than allocating before use, that is).

Finally, in case this is helpful: the reason I ask is that we are developing code as a group, and we recently noticed that different people use these two declarations in different ways. We need to determine whether this is something that can be left to personal preference, or whether there are reasons to set clear criteria (and how to justify those criteria). I don't need detailed answers; I am trying to determine whether I need to be careful about how we use these, and if so, which aspects I should research.

While I would be interested to hear about "interesting tricks" that can be done with allocatables, since they are not directly related to the need for a variable size I leave them for a possible future follow-up question and focus here on strictly functional differences (meaning: what I am directly telling the compiler to do with my code). The two points above are what I could come up with from previous experience, but if there is anything else important that I am missing and should consider, please mention it.

+3




2 answers


For clarity, I will first briefly settle the terminology. Both arrays are local variables and rank-1 arrays.

  • alloc_array is an allocatable array;
  • automatic_array is an automatic object (an explicit-shape array).

Again, as in the linked question, after the allocate statement both arrays have size n. I will answer considering them as they are, rather than as they could be: an allocatable array can, of course, have its allocation status changed and its allocation moved around. I'll leave both of those (mostly) outside the scope of this answer. Naturally, the allocatable array needn't have these things changed once it has been set up.

Memory usage

What was partly contentious about a previous revision of the question is how poorly defined the concept of memory usage is. The Fortran standard tells us that both arrays are the same size, will have the same storage layout, and are both contiguous. Beyond that, much falls under the terms you'll hear a lot: implementation specific and processor dependent.

In a comment, you expressed your interest in ifort. So that I don't wander too far, I'll just stick with this one compiler and point out what to consider.

Often, ifort places automatic objects and array temporaries on the stack. There is a (default) compiler option -no-heap-arrays, described as having the effect

The compiler places automatic arrays and temporary arrays in the stack storage area.

Using the alternative -heap-arrays instead gives you some control over this:

This option places automatic arrays and arrays created for temporary computations on the heap instead of the stack.

It is possible to control the size thresholds for which the heap / stack is selected (when known at compile time):

If the compiler cannot determine the size at compile time, it always puts the automatic array on the heap.

Since n is not a constant expression, you would expect automatic_array to be on the heap with this option, regardless of the size specified.
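As an illustration (the file name is just a placeholder), the option takes an optional threshold in kilobytes; a compile line like the one below should send compile-time-sized automatic arrays and temporaries larger than 10 KB, plus anything whose size is unknown at compile time (like automatic_array(n)), to the heap, while leaving small compile-time-sized ones on the stack:

ifort -heap-arrays 10 mysub.f90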



There's probably more to say, but this answer would get too long if I tried.

Interface requirements

There is nothing special about the interface requirements of the subroutine mysub itself: local variables have no impact on that, and any caller would be happy even with an implicit interface. What you are really asking is how the two local arrays can themselves be used, which pretty much boils down to what they can be passed to.

If the dummy argument of a second procedure has the allocatable attribute, then only the allocatable array may be passed to that procedure, and that procedure must also have an explicit interface available to the caller. This is true whether or not the procedure changes the allocation.

Of course, both arrays can be passed as actual arguments to a dummy argument without the allocatable attribute, and then there is no difference in interface requirements.
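As a minimal sketch (the module and procedure names below are made up for illustration): takes_alloc accepts only the allocatable array and must be known through an explicit interface, while takes_either accepts either array.

module demo_mod
  implicit none
contains
  ! a dummy argument with the allocatable attribute: only an allocatable
  ! actual argument may be passed, and the caller needs this explicit
  ! interface (for example via "use demo_mod")
  subroutine takes_alloc(a)
    integer, allocatable, intent(inout) :: a(:)
    a = a + 1
  end subroutine takes_alloc

  ! a plain explicit-shape dummy argument: alloc_array and automatic_array
  ! may both be passed, with the same interface requirements
  subroutine takes_either(a, n)
    integer, intent(in)    :: n
    integer, intent(inout) :: a(n)
    a = a + 1
  end subroutine takes_either
end module demo_mod

Inside mysub, takes_either(alloc_array, n) and takes_either(automatic_array, n) are both fine, but takes_alloc(automatic_array) is not allowed.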

Anyway, why would one pass an actual argument to an allocatable dummy when the allocation status is not going to change? There are good reasons:

  • there may be a code path in the procedure that does change the allocation (controlled by a switch, say);
  • allocatable dummy arguments also pass their bounds;
  • etc.

That second point is more obvious if the subroutine used a non-default lower bound:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(2:n+1)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(2:n+1))

      

With an allocatable dummy argument, those bounds (2:n+1) travel with the array; with an ordinary dummy argument they do not.

Finally, an automatic object has fairly strict conditions on what its size may be: n is explicitly allowed here, but things don't have to get much more complicated before allocation is the only plausible route, unless you are willing to play with block constructs.
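For instance (just an illustrative sketch, not from the question), a block construct lets an automatic-style array take a size that is only computed inside the procedure body, which a declaration at the top of the subroutine cannot express:

subroutine mysub2(n)
  integer, intent(in) :: n
  integer :: m

  m = 2*n + 1           ! size known only after some computation in the body
  block
    integer :: work(m)  ! automatic object declared inside the block
    work = 0
    ! ... use work ...
  end block
end subroutine mysub2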

Taking also a comment from IanH: if n is very large, the automatic object can simply crash and burn. With an allocatable array you can use the stat= specifier of the allocate statement to come to a friendly agreement with your compiler's runtime.
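A minimal sketch of that last point (the message and the recovery choice are just illustrative):

subroutine mysub3(n)
  integer, intent(in)  :: n
  integer, allocatable :: alloc_array(:)
  integer :: ierr

  allocate(alloc_array(n), stat=ierr)
  if (ierr /= 0) then
    print *, "could not allocate", n, "elements"
    return              ! fail gracefully instead of blowing the stack on entry
  end if
  ! ... use alloc_array ...
end subroutine mysub3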

+1




Since gfortran or ifort on Linux (x86_64) are among the most popular combinations used for HPC, I did a performance comparison between local allocatable and automatic arrays for these combinations. The processor is a Xeon E5-2650 v2 @ 2.60GHz, and the compilers are gfortran 4.8.2 and ifort 14.0. The test program is as follows.

In test.f90:

!------------------------------------------------------------------------           
subroutine use_automatic( n )
    integer :: n

    integer :: a( n )   !! local automatic array (with unknown size at compile-time)
    integer :: i

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )
end

!------------------------------------------------------------------------           
subroutine use_alloc( n )
    integer :: n

    integer, allocatable :: a( : )  !! local allocatable array                      
    integer :: i

    allocate( a( n ) )

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )

    deallocate( a )  !! not necessary for modern Fortran but for clarity                  
end

!------------------------------------------------------------------------           
program main
    implicit none
    integer :: i, nsizemax, nsize, nloop, foo
    common /dummy/ foo

    nloop = 10**7
    nsizemax = 10

    do i = 1, nloop
        nsize = mod( i, nsizemax ) + 1

        call use_automatic( nsize )
        ! call use_alloc( nsize )                                                   
    enddo

    print *, "foo = ", foo   !! to check if sub() is really called
end

In sub.f90:

!------------------------------------------------------------------------
subroutine sub( a )
    integer a( * )
    integer foo
    common /dummy/ foo

    foo = a( 1 )
end

In the above program, I tried to avoid compiler optimizations that would eliminate the array a(:) itself (i.e., turn the loop into a no-op) by placing sub() in another file and keeping the interface implicit. First, I compiled the program with gfortran as

gfortran -O3 test.f90 sub.f90

      

and checked various nsizemax values, keeping nloop = 10^7. The result is shown in the following table (times in seconds, measured several times with the time command).

nsizemax    use_automatic()    use_alloc()
10          0.30               0.31               # average result
50          0.48               0.47
500         1.0                0.90
5000        4.3                4.2
100000      75.6               75.7

      

So the total time is almost the same for the two calls when -O3 is used (but see the Edit below for different options). Then I compiled with ifort as

[O3]  ifort -O3 test.f90 sub.f90
or
[O3h] ifort -O3 -heap-arrays test.f90 sub.f90

      

In the former case [O3] the automatic array is stored on the stack, while in the latter case [O3h] it is stored on the heap. The result is

nsizemax use_automatic()    use_alloc()
         [O3]    [O3h]      [O3]    [O3h]
10       0.064   0.39       0.48    0.48
50       0.094   0.56       0.65    0.66
500      0.45    1.03       1.12    1.12
5000     3.8     4.4        4.4     4.4
100000   74.5    75.3       76.5    75.5

      

So, for ifort, using automatic arrays seems beneficial when relatively small arrays dominate. gfortran with -O3, on the other hand, shows no difference because both arrays are treated the same way (see the Edit below for more details).



Additional comparison:

Below is the result for the Oracle Fortran 12.4 compiler on Linux (used with f90 -O3). The general trend seems similar: automatic arrays are faster for small n, which suggests internal stack usage.

nsizemax    use_automatic()    use_alloc()
10          0.16               0.45
50          0.17               0.62
500         0.37               0.97
5000        2.04               2.67
100000      65.6               65.7

      


Edit

Thanks to Vladimir's comment, it turns out that gfortran -O3 places automatic arrays (with size unknown at compile time) on the heap. This explains why use_automatic() and use_alloc() showed no difference above. So I made another comparison between the options below:

[O3]  gfortran -O3
[O5]  gfortran -O5
[O3s] gfortran -O3 -fstack-arrays
[Of]  gfortran -Ofast                   # this includes -fstack-arrays

      

The -fstack-arrays option means that the compiler puts all local arrays of unknown size on the stack. Note that this flag is enabled by default with -Ofast. The result is

nsizemax    use_automatic()               use_alloc()
            [Of]   [O3s]  [O5]  [O3]     [Of]  [O3s]  [O5]  [O3]
10          0.087  0.087  0.29  0.29     0.29  0.29   0.29  0.29
50          0.15   0.15   0.43  0.43     0.45  0.44   0.44  0.45
500         0.57   0.56   0.84  0.84     0.92  0.92   0.92  0.92
5000        3.9    3.9    4.1   4.1      4.2   4.2    4.2   4.2
100000      75.1   75.0   75.6  75.6     75.6  75.3   75.7  76.0

      

where the average of ten measurements is shown. The table shows that when -fstack-arrays is enabled, the execution time for small n becomes shorter. This trend is consistent with the ifort results above.

It should be noted, however, that the above comparison is probably a "best case" scenario that highlights the difference between the two, so the time difference could be much smaller in practice. For example, I compared timings with the above options using some other programs (involving both small and large arrays) and the results were not much affected by the stack options. The result should also depend on the machine architecture as well as the compilers, of course. So your mileage may vary.

+4








