Selecting random files from a folder tree

My folders are organized like this:

root/folder_1/file1_1 --up to-- file_5693
root/folder_2/file2_1 --up to-- file_100
root/folder_3/file3_1 --up to-- file_600
root/folder_4/file4_1 --up to-- file_689

      

I would like to select a fixed number of random files (say 1000) from each folder and put them all together in one output folder. For folders with fewer than 200 files, I would like to copy all the files.

root_2/output:
file1_350
.
.
.
file2_1 --> file2_100
.
.
.
etc

      

How can I do this?

I tried listing all the folder names in a directory with the dir command, but the folder numbers are not sequential. Any help?



1 answer


I may be misunderstanding, but I see no reason to order the folder names, since you will copy from every folder anyway. Below is a script that copies files from each subfolder of the root directory into a single output folder.

You only need to change the following four variables: ROOT_DIR, OUT_DIR, THRESHOLD_COPY and N_RANDOM_COPY.



% Define the settings
ROOT_DIR = './'; % where the subdirectories are located
OUT_DIR = './root2'; % copy destination (best kept outside ROOT_DIR)
THRESHOLD_COPY = 200; % folders with fewer files than this are copied whole
N_RANDOM_COPY = 100; % number of files to copy from the larger folders

if ~exist(OUT_DIR, 'dir')
  mkdir(OUT_DIR); % make sure the destination folder exists
end

dirList = dir(ROOT_DIR);
% Keep subdirectories only, skipping the '.' and '..' entries by name
dirs = dirList([dirList.isdir] & ~ismember({dirList.name}, {'.', '..'}));
for dirIterator = transpose(dirs)
  subdirList = dir(fullfile(ROOT_DIR, dirIterator.name));
  subfileList = subdirList(~[subdirList.isdir]); % struct array of files only
  nFiles = numel(subfileList);
  if nFiles >= THRESHOLD_COPY
    % random sample without replacement, capped at the folder size
    copyIndices = randperm(nFiles);
    copyIndices = copyIndices(1:min(N_RANDOM_COPY, nFiles));
  else
    copyIndices = 1:nFiles;
  end

  for copyIndex = copyIndices
    copyfile(fullfile(ROOT_DIR, dirIterator.name, subfileList(copyIndex).name),...
      fullfile(OUT_DIR, subfileList(copyIndex).name),...
      'f');
  end
end
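
The selection rule can also be checked on its own, without touching any files. The snippet below is only a sketch: the folder sizes are the ones from the question, the variable names (folderSizes, threshold, nRandom, picked) are made up for illustration, and it assumes a MATLAB release where randperm accepts a second argument for the sample size.

```matlab
% Illustrative values only: folder sizes from the question,
% nRandom = 1000 as the question asks, threshold = 200.
folderSizes = [5693 100 600 689];
threshold = 200;
nRandom = 1000;

picked = cell(size(folderSizes));
for k = 1:numel(folderSizes)
  nFiles = folderSizes(k);
  if nFiles < threshold
    picked{k} = 1:nFiles;                               % small folder: take every file
  else
    picked{k} = randperm(nFiles, min(nRandom, nFiles)); % unique random indices, capped at nFiles
  end
end

nPicked = cellfun(@numel, picked); % how many files each folder contributes
disp(nPicked)
```

Capping the sample size with min matters here because some folders hold fewer files than the requested 1000; asking randperm for more values than the range contains is an error.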

      
