Copying files on a distributed system so that all servers have a copy of all files

Question


Full disclosure: this is an interview question.

There are M machines. We need to copy the datasets on these M machines to each other so that every server ends up with a copy of every dataset. What is the optimal algorithm for this?

I know I can solve this problem in O(MN) (where N is the average number of datasets on each machine) by iterating through each server. Is there a better approach?
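A minimal sketch of that server-by-server approach, assuming each machine's datasets can be modeled as an in-memory set of ids (`naive_copy` and this data layout are hypothetical; no real network transfer happens). Each receiving server scans the datasets of the other M − 1 servers, which is the O(MN) loop per server; when every dataset starts on exactly one machine, the total number of copies is (M − 1) times the number of distinct datasets.

```python
def naive_copy(machines):
    """Copy datasets server by server until every server holds all of them.

    `machines` maps a machine id to the set of dataset ids it holds
    (hypothetical model); the sets are updated in place.
    Returns the number of individual copy operations performed.
    """
    copies = 0
    for dst in machines:                      # visit each receiving server in turn
        for src, datasets in machines.items():
            if src == dst:
                continue
            for ds in datasets:               # about N datasets per source server
                if ds not in machines[dst]:   # only copy what dst is missing
                    machines[dst].add(ds)
                    copies += 1
    return copies


# 4 machines, each starting with 2 datasets of its own (8 distinct datasets).
machines = {m: {f"d{m}_{i}" for i in range(2)} for m in range(4)}
print(naive_copy(machines))                               # 24 = 8 datasets x 3 other machines
print(all(len(held) == 8 for held in machines.values()))  # True
```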


synchronization algorithm concurrency data-structures

Arnold.Kern Apr 22 15 at 23:16


1 answer

mayo · Answer 1 · 2015-05-09T22:55:37+0000

How about a self-replication system?

http://en.wikipedia.org/wiki/Self-replication#A_self-reproducing_computer_program

For example, with M = 100 machines, where every machine that already has a dataset copies it to one more machine each tick, each dataset spreads as follows:

tick 1: 1 machine with the data
tick 2: 2 machines with the data
tick 3: 4 machines with the data
tick 4: 8 machines with the data
tick 5: 16 machines with the data
tick 6: 32 machines with the data
tick 7: 64 machines with the data
tick 8: 100 machines with the data (doubling would give 128, but there are only 100 machines)
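A small simulation of this doubling pattern, as a sketch under an assumption the answer implies but does not spell out: in every tick each machine that already holds the dataset sends it to exactly one machine that does not, and all sends in a tick run in parallel. `doubling_schedule` is a hypothetical helper; machines are numbered 0 to M − 1 and machine 0 starts with the data.

```python
def doubling_schedule(m):
    """Plan how one dataset spreads from machine 0 to machines 0..m-1.

    Assumption: each round, every current holder sends to one new machine,
    and all sends in a round happen in parallel.
    Returns a list of rounds, each a list of (sender, receiver) pairs.
    """
    holders = [0]                        # machine 0 starts with the dataset
    rounds = []
    while len(holders) < m:
        next_id = len(holders)           # receivers are filled in id order
        sends = []
        for sender in holders:
            if next_id == m:             # everyone is covered; stop mid-round
                break
            sends.append((sender, next_id))
            next_id += 1
        holders.extend(receiver for _, receiver in sends)
        rounds.append(sends)
    return rounds


rounds = doubling_schedule(100)
count = 1
print("tick 1:", count)                  # the starting state
for tick, sends in enumerate(rounds, start=2):
    count += len(sends)
    print(f"tick {tick}:", count)
```

Under that assumption all 100 machines hold the dataset at tick 8, and the last round needs only 36 of the 64 holders to send. Exactly M − 1 = 99 copies are made in total, the same as one machine sending to every other machine, so the saving is in the number of sequential rounds rather than in the number of copies.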

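I think this takes fewer steps than O(MN): with every holder copying in parallel, each dataset reaches all M machines in about log2(M) rounds.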
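The round count can be worked out directly, assuming each holder forwards the dataset to one new machine per round and counting tick 1 as the starting state. This is a count for a single dataset; it ignores link contention when many datasets are spreading at the same time.

```latex
\text{holders after } r \text{ rounds} = \min\left(2^{r}, M\right)
\quad\Rightarrow\quad
r_{\min} = \left\lceil \log_2 M \right\rceil,
\qquad
M = 100:\; 2^{6} = 64 < 100 \le 128 = 2^{7}
\;\Rightarrow\; r_{\min} = 7 \text{ rounds (tick 8).}
```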
