Python izip and memory iteration

I have nested function calls, one of which uses multiprocessing. izip, repeat, or something else seems to be making copies of the objects instead of passing them by reference, and the arguments are also packed and unpacked along the way.

Here is the structure in order of call:

from itertools import izip, repeat
from multiprocessing import Pool

def main():
    print 'Rel_list id main: %s' % str(id(rel_list))
    par_objective(folder.num_proc, batch, r, folder.d, vocab_len,
                  rel_list, lambdas)

def par_objective(num_proc, data, params, d, len_voc, rel_list, lambdas):
    pool = Pool(processes=num_proc)

    # non-data params, identical for every task
    oparams = [params, d, len_voc, rel_list]

    print 'Rel_list id paro: %s' % str(id(rel_list))
    # split_data: `data` divided into per-process chunks (construction not shown)
    result = pool.map(objective_and_grad, izip(repeat(oparams), split_data))

def objective_and_grad(par_data):
    (params, d, len_voc, rel_list), data = par_data

    print 'Rel_list id obag: %s' % str(id(rel_list))

Output:

ID IN MAIN
Rel_list id main: 140694049352088
ID IN PAR_OBJECTIVE
Rel_list id paro: 140694049352088
IDs IN OBJECTIVE_AND_GRAD (24 Processes):
Rel_list id obag: 140694005483424
Rel_list id obag: 140694005481840
Rel_list id obag: 140694311306232
Rel_list id obag: 140694048889168
Rel_list id obag: 140694057601144
Rel_list id obag: 140694054472232
Rel_list id obag: 140694273611104
Rel_list id obag: 140693878744632
Rel_list id obag: 140693897912976
Rel_list id obag: 140693753182328
Rel_list id obag: 140694282174976
Rel_list id obag: 140693900442800
Rel_list id obag: 140694271314328
Rel_list id obag: 140694276073736
Rel_list id obag: 140694020435696
Rel_list id obag: 140693901952208
Rel_list id obag: 140694694615376
Rel_list id obag: 140694271773512
Rel_list id obag: 140693899163264
Rel_list id obag: 140694047135792
Rel_list id obag: 140694276808432
Rel_list id obag: 140694019346088
Rel_list id obag: 140693897455016
Rel_list id obag: 140694067166024
Rel_list id obag: 140694278467024
Rel_list id obag: 140694010924280
Rel_list id obag: 140694026060576

BACK TO MAIN, RINSE AND REPEAT
Rel_list id main: 140694049352088
Rel_list id paro: 140694049352088

      

As you can see, the list's id is the same in main() and par_objective(), but it changes once the list is passed to the multiprocessing pool.
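One quick way to rule out izip and repeat as the source of the copies is to check object identity on the task tuples before they ever reach the pool. This is a minimal sketch written for Python 3, where itertools.izip became the built-in zip; rel_list, oparams and split_data are hypothetical stand-ins for the question's data. itertools.repeat yields the same object every time, so every task should reference the original list:

```python
from itertools import repeat

rel_list = [("a", "b"), ("c", "d")]    # hypothetical stand-in for the question's rel_list
oparams = [0.5, 10, 100, rel_list]     # hypothetical values for params, d, len_voc
split_data = [[1, 2], [3, 4], [5, 6]]  # hypothetical per-process chunks

# zip() plays the role of Python 2's itertools.izip here
tasks = list(zip(repeat(oparams), split_data))

# Each task tuple holds a reference to the *same* oparams list (and so the same rel_list)
same_object = all(task_params is oparams and task_params[3] is rel_list
                  for task_params, _ in tasks)
print(same_object)  # True
```

If this holds, the new objects are being created somewhere after the task tuples are built, i.e. at the pool boundary rather than inside izip or repeat.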

multiprocessing forks with copy-on-write semantics, and the list is never modified. So does the changed id mean the memory was copied, or did only the id change?

Is there a way to check if the memory has been copied? If they are copies, why are they copied?
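A minimal sketch of one check, assuming CPython (where id() returns the object's memory address) on a POSIX system that provides os.fork(): an object that a child merely inherits through fork stays at the same virtual address, and therefore keeps the same id, in the child. The helper forked_child_sees_same_id is hypothetical, written in Python 3 for this illustration:

```python
import os

def forked_child_sees_same_id(obj):
    """Fork, then report whether id(obj) in the child equals id(obj) in the parent.
    POSIX only: os.fork() is not available on Windows."""
    parent_id = id(obj)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: obj was inherited through fork, never pickled
        os.close(read_fd)
        os.write(write_fd, b"1" if id(obj) == parent_id else b"0")
        os._exit(0)  # leave immediately, skipping the child's interpreter cleanup
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return answer == b"1"

rel_list = [("a", "b"), ("c", "d")]  # hypothetical stand-in
print(forked_child_sees_same_id(rel_list))
```

Under that assumption, the differing ids printed in objective_and_grad suggest each worker holds a newly created object rather than the parent's list. Equal ids would not prove the underlying page was never physically copied by copy-on-write, though; id() cannot see that.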


1 answer


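Your Python objects are mutable in memory: every additional reference you create to an object changes the reference count stored inside that object. That write touches the memory page the object lives on, so the OS copy-on-write mechanism copies the page.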
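The reference-count effect can be observed directly in CPython with sys.getrefcount(): binding one more name to an object increments the count field stored in that object. A minimal Python 3 sketch; extra_reference_delta and its list are hypothetical, for illustration only:

```python
import sys

def extra_reference_delta():
    """Return how much sys.getrefcount() grows when one more name refers to an object."""
    rel_list = [("a", "b"), ("c", "d")]  # hypothetical stand-in for the question's rel_list
    before = sys.getrefcount(rel_list)
    alias = rel_list  # one new reference: CPython increments rel_list's stored refcount
    after = sys.getrefcount(rel_list)
    del alias
    return after - before

print(extra_reference_delta())  # 1
```

In a forked child, that increment is a write into the page holding the object, which is what makes copy-on-write duplicate the page even though the list's contents never change.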



Any Python object that a subprocess accesses needs a reference count independent of the main process's. Since this kind of Python multiprocessing never shares the same memory area between processes, a copy is always needed.
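For Pool.map specifically, the task arguments do not reach the workers through fork at all: multiprocessing pickles each task in the parent, sends the bytes to a worker over a pipe, and the worker unpickles them into new objects. The sketch below imitates only that serialization step with the pickle module (Python 3); simulate_pool_transfer and the sample values are hypothetical stand-ins, not the pool's internal code:

```python
import pickle

def simulate_pool_transfer(task):
    """Serialize in the 'parent' and deserialize in the 'worker', as Pool.map does
    for each task; unpickling always builds brand-new objects."""
    payload = pickle.dumps(task, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(payload)

rel_list = [("a", "b"), ("c", "d")]  # hypothetical stand-in for the question's rel_list
oparams = [0.5, 10, 100, rel_list]   # hypothetical values for params, d, len_voc

received = simulate_pool_transfer((oparams, [1, 2, 3]))
(params, d, len_voc, received_rel_list), data = received

print(received_rel_list == rel_list)  # True: same contents
print(received_rel_list is rel_list)  # False: a different object
```

The received list compares equal to the original but is a separate object with its own id, which matches the pattern of distinct ids printed in objective_and_grad.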
