Copying files on a distributed system so that all servers have a copy of all files

Full disclosure: this is an interview question.

There are M machines. We need to copy datasets between these M machines so that each machine has a copy of all datasets. What is the optimal algorithm for this?

I know I can solve this problem in O(MN) (where N is the average number of datasets on each machine) by iterating through each server. Is there a better approach?



1 answer

How about a self-replication system?

E.g., if you have M = 100 machines, then for each dataset you will have:

tick 1: 1 machine with the data
tick 2: 2 machines with the data
tick 3: 4 machines with the data
tick 4: 8 machines with the data
tick 5: 16 machines with the data
tick 6: 32 machines with the data
tick 7: 64 machines with the data
tick 8: all 100 machines with the data (64 senders could reach up to 128)
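
The doubling idea can be sketched in Python for a single dataset. This is only an illustration under assumptions that are not in the question: machines are numbered 0..M-1, every machine can send one copy per round, and all copies in a round run in parallel. The function name replication_schedule and its parameters are made up for the example.

```python
def replication_schedule(num_machines, source=0):
    """Plan the copy rounds for one dataset that starts on `source`.

    Returns a list of rounds; each round is a list of (sender, receiver)
    pairs that are assumed to run in parallel.
    """
    have = [source]                                   # machines holding the dataset
    missing = [m for m in range(num_machines) if m != source]
    rounds = []
    while missing:
        # Each holder sends to one machine that still lacks the dataset,
        # so the number of holders at most doubles per round.
        pairs = list(zip(have, missing))
        have = have + [receiver for _, receiver in pairs]
        missing = missing[len(pairs):]
        rounds.append(pairs)
    return rounds


schedule = replication_schedule(100)
holders = 1
for i, pairs in enumerate(schedule, start=1):
    holders += len(pairs)
    print(f"after round {i}: {holders} machines hold the dataset")
```

Here a round is one copy step, so the initial state (one machine holding the data) is not counted as a round. For M = 100 the sketch plans 7 rounds and 99 copies in total; the copies themselves are not saved, only run side by side.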


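I think this scales better than O(MN): each dataset reaches all M machines in about log2(M) parallel rounds instead of M - 1 sequential copies. The total number of copies is unchanged; the gain comes from many machines copying at the same time.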


