Copying files on a distributed system so that all servers have a copy of all files
Full disclaimer: this is an interview question.
There are M machines. We need to copy datasets from these M machines to each other so that each server has a copy of all datasets. What is the optimal algorithm for this?
I know I can solve this problem in O(MN) (where N is the average number of datasets on each machine) by iterating through each server. Is there a better approach?
1 answer
How about a self-replication system?
http://en.wikipedia.org/wiki/Self-replication#A_self-reproducing_computer_program
E.g., if you have M = 100 machines and every machine that already holds a dataset copies it to one more machine on each tick, then for each dataset you will have:
Tick 1: 1 machine with the data
Tick 2: 2 machines with the data
Tick 3: 4 machines with the data
Tick 4: 8 machines with the data
Tick 5: 16 machines with the data
Tick 6: 32 machines with the data
Tick 7: 64 machines with the data
Tick 8: all 100 machines with the data (64 doubled is 128, which is more than 100)
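I think this takes less time than O(MN): each dataset reaches all M machines in about log2(M) rounds of parallel copying, although the total number of copies made stays the same.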
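Here is a minimal sketch of the doubling idea in Python. It assumes that on every tick each machine already holding the dataset copies it to exactly one machine that does not have it yet, and that all copies within a tick run in parallel. The function name `replication_schedule` is made up for illustration, and the code only counts machines; it does not model network bandwidth or dataset sizes.

```python
def replication_schedule(num_machines: int) -> list:
    """Return the number of machines holding the dataset at each tick.

    Tick 1 is the starting state: only the source machine has the data.
    Assumption: every holder copies to one new machine per tick.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")

    holders = 1
    schedule = [holders]
    while holders < num_machines:
        # Each holder seeds one new machine, so the count doubles,
        # capped at the total number of machines.
        holders = min(2 * holders, num_machines)
        schedule.append(holders)
    return schedule


if __name__ == "__main__":
    for tick, holders in enumerate(replication_schedule(100), start=1):
        print(f"tick {tick}: {holders} machines with the data")
```

The length of the returned list grows roughly with log2(M), which is where the saving in elapsed time comes from.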