Allocating a stack of `isbits` types in Julia

Summary of questions and answers

Objects of a specific type, say

type Foo
    a::A
    b::B
end

      

can be stored in one of two ways:

  • Inlined (aka by value): In this case, the statement "variable foo::Foo is stored at location x" effectively means that we have a variable foo.a::A at location x and a variable foo.b::B at location x + sizeof(A) (technically, addresses can be a little more complicated, but that doesn't matter for our purposes).

  • Reference (aka by reference): "foo::Foo is stored at location x" means that location x contains a pointer fooptr::Ptr{Foo}, so there is a variable foo.a::A at location fooptr and a variable foo.b::B at location fooptr + sizeof(A).
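The two layouts can be observed from within Julia. The following is a minimal sketch, assuming a 1.x version of Julia, where `struct` and `mutable struct` are the current spellings of `immutable` and `type`. The concrete field types `Int64` and `Float64` are chosen here purely for illustration and stand in for the placeholders `A` and `B`; `Bar` is a hypothetical mutable counterpart of `Foo`.

```julia
# `struct` / `mutable struct` are the Julia 1.x spellings of `immutable` / `type`.
struct Foo                 # immutable, both fields are plain bits
    a::Int64
    b::Float64
end

mutable struct Bar         # mutable, so instances are handled by reference
    a::Int64
    b::Float64
end

# For the inline case, field b sits right after field a.
offset_a = fieldoffset(Foo, 1)
offset_b = fieldoffset(Foo, 2)
@assert offset_a == 0
@assert offset_b == sizeof(Int64)

# A Vector{Foo} stores the values themselves back to back ...
inline_bytes = sizeof(Vector{Foo}(undef, 3))
@assert inline_bytes == 3 * sizeof(Foo)

# ... while a Vector{Bar} stores one pointer per element.
pointer_bytes = sizeof(Vector{Bar}(undef, 3))
@assert pointer_bytes == 3 * sizeof(Ptr{Cvoid})
```

With fields of different sizes, alignment padding can place a field further along than `sizeof(A)`, which is the "a little more complicated" caveat mentioned above.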

Unlike other languages (I'm looking at you, C/C++), Julia decides for itself whether to store variables inline or by reference, and this decision is based on the following rules:

  • mutable types -> by reference,
  • immutable types -> by reference if at least one of their fields is stored by reference, otherwise inline.
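The condition in this rule (immutable and free of references) is what `isbitstype` reports. Below is a small sketch in Julia 1.x syntax; the types `Point`, `Counter` and `Wrapper` are hypothetical names made up for this illustration.

```julia
struct Point               # immutable, no reference fields
    x::Float64
    y::Float64
end

mutable struct Counter     # mutable
    n::Int
end

struct Wrapper             # immutable, but holds a reference to a mutable
    c::Counter
end

# Collect the classification for a few types, including tuples.
candidates = (Point, Counter, Wrapper, Tuple{Int, Float64}, Tuple{Int, String})
classification = Dict(T => isbitstype(T) for T in candidates)

for T in candidates
    println(rpad(string(T), 24), " isbits: ", classification[T])
end
```

Only `Point` and `Tuple{Int, Float64}` qualify; the others are either mutable or contain a field that is itself a reference.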

There are at least two reasons for this rule:

  • StefanKarpinski's answer: The garbage collector must be able to find all pointers to heap-allocated objects on the stack. Julia currently ensures this by keeping all such pointers in a separate "shadow stack", but if we allowed composite types containing pointers to be allocated on the stack, such a neat separation would no longer be possible. Instead, pointers would have to be found in among other variables, which creates technical difficulties.

  • Yuyichao's answer: Julia requires the inline / reference decision to be made both as a matter of principle and in isolation, which means that the hypothetical type

    immutable A
        a::A
    end
    
          

    would have to be infinitely large if we insisted on inlining it. Therefore, we would either have to forbid such recursive immutable types, or we could inline at most the non-recursive immutable types.
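To make the recursion argument concrete, here is a sketch of a self-referential immutable in Julia 1.x syntax. `Node` and `total` are hypothetical names for this illustration, and the `Union{Node, Nothing}` field is an assumption added so that a finite list can actually be constructed.

```julia
# A self-referential immutable: `next` has to be a reference, otherwise a Node
# would contain a Node, which would contain a Node, and so on.
struct Node
    value::Int
    next::Union{Node, Nothing}
end

# Build the list 1 -> 2 -> 3 and walk it through the references.
list = Node(1, Node(2, Node(3, nothing)))

function total(node::Union{Node, Nothing})
    s = 0
    while node !== nothing
        s += node.value
        node = node.next
    end
    return s
end

@assert !isbitstype(Node)   # contains a reference, so it is not a plain-bits type
@assert total(list) == 6
```

The type has a finite size only because the cycle is broken by a reference at the `next` field.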


Original question

My understanding of memory management in Julia:

  • mutable types -> heap allocated,
  • immutable types and tuples -> stack allocated, unless one of their fields is heap allocated (i.e. mutable).

However, I do not fully understand the reason for this behavior. I read somewhere that the problem with stack-allocating immutables that contain pointers to mutables is that the garbage collector may consider the mutables unreachable and destroy them prematurely. On the other hand, if we put the immutable on the heap, then there will still be a pointer to the mutables, so it might seem like we avoided the problem, but in fact we just moved it: now we have to make sure that the immutable itself is not destroyed.

Could someone explain this to someone who has only a very superficial knowledge of how garbage collection works?


2 answers


The problem with stack allocation of objects that reference other objects is knowing how to keep track of them during garbage collection. The easiest way to do this is what Julia does: heap-allocate the objects and "root" them using a "shadow stack" that is pushed and popped in sync with the actual stack. This introduces a fair bit of overhead and forces these objects to be heap allocated.
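The shadow stack idea can be sketched as a toy model. This is not Julia's actual runtime data structure; `SHADOW_STACK` and `with_roots` are hypothetical names, and the sketch only shows roots being pushed on entry and popped on exit in step with the call.

```julia
# Hypothetical stand-in for the runtime's root list; not Julia's real data structure.
const SHADOW_STACK = Any[]

# Run `f` with `objs` registered as GC roots for the duration of the call.
function with_roots(f, objs...)
    depth = length(SHADOW_STACK)
    append!(SHADOW_STACK, objs)          # push roots on entry
    try
        return f()
    finally
        resize!(SHADOW_STACK, depth)     # pop on exit, in sync with the call stack
    end
end

inner_depth = with_roots([1, 2, 3], "text") do
    length(SHADOW_STACK)                 # both objects are visible to the "GC" here
end

outer_depth = length(SHADOW_STACK)       # roots are gone once the call has returned
```

A collector that only consults such a list never has to guess which stack slots are pointers, which is the separation this approach buys at the cost of extra bookkeeping.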

A more sophisticated approach that avoids the shadow stack and heap allocation overhead is to stack-allocate those objects and then scan the stack during garbage collection, following references from objects on the stack to objects on the heap. However, this requires knowing which values on the stack are pointers to objects on the heap; in general, non-heap objects are not guaranteed to remain intact or contiguous across registers and the stack. One way to do this is called "conservative stack scanning", in which the GC assumes that any value on the stack that looks like it could be a pointer to an object on the heap is one. This approach has been used successfully in applications such as Safari's JavaScript engine, but it is not without its problems. We considered using conservative stack scanning in Julia and an initial attempt was made, but the effort was never completed.
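A conservative scan can also be illustrated with a toy model. Everything here is an assumption made for the sketch: the "heap" is just a set of object addresses, the "stack" is a vector of raw words, and `conservative_roots` is a hypothetical helper, not an API of any real collector.

```julia
# Toy model: the "heap" is a set of object start addresses, the "stack" is raw words.
heap_objects = Set{UInt}([0x1000, 0x2000, 0x3000])

# Integers and pointers are mixed, and the scanner cannot tell them apart.
# Suppose the last word is an integer that merely happens to equal 0x3000.
stack_words = UInt[0x002a, 0x2000, 0xdeadbeef, 0x3000]

# Conservative scan: treat every word that matches a heap address as a live reference.
conservative_roots(stack, heap) = Set{UInt}(w for w in stack if w in heap)

roots = conservative_roots(stack_words, heap_objects)

# Only objects that no stack word points at may be collected.
garbage = setdiff(heap_objects, roots)
```

In this model the object at `0x3000` is kept alive even if the matching word was only an integer, which is one of the known drawbacks of scanning conservatively.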





There are several issues / concepts that are often conflated whenever this comes up.



  • Being mutable, or being an immutable that is not pointer-free, does not necessarily mean heap allocation. We already have optimization passes that elide some of these allocations, and we keep improving them.

  • The object layout / ABI is user-visible behavior, not something an optimization pass can easily change (unless it can prove that the local optimization it wants to do does not escape). The current ABI is that only isbits immutables are stored inline (and "stack allocated" when used as a local variable). There is a fundamental limitation on removing the reference-free requirement for an inlined object, i.e. the need to handle recursive types. It is not possible to store all the types in a reference cycle inline, and the cycle must be broken somewhere if we want to inline some of them. I believe we can have a consistent and predictable model for doing this, although whether this is desirable is a different issue.

    This is somewhat related to performance, but not always. Storing inline means more copying, so it is hard to make sure there is no regression if we make the switch.

    Edit: And I should also mention that being pointer-free is a sufficient condition for being cycle-free and is easier to compute, which is partly why we currently use it to break inlining cycles.

  • GC support. This is the easiest part. It is very easy to get the GC to recognize pointers on the stack. It just needs to be done if we decide to change the object layout / ABI.

    Edit: And I should add that "GC support" is needed because we currently only support a limited / simple stack layout for object references (i.e. an array of pointers). This needs to be improved.
