Randomly select files from different directories and sort them

I have a lot of text files spread across multiple directories. I would like to sort all of the files and create a list of filenames (in a text file), but in a specific order. My initial thought is to randomly select the first file (*1.txt) from these directories, then repeat the process (*2.txt, *3.txt, etc.) until all of the filenames are included in the list. How do I do this in bash?

The basics:

Randomly selects a file from 1 directory:

shuf -n1 -e *

      

Selects the first file from 1 directory:

ls | sort -n | head -1

      

EXAMPLE:

UPDATED: directory structure and real filename format (these are just a few files; there are several hundred)

Initial order:

media/sf_linux_sandbox/papers/
|-- semester_1
|   |-- cs630-linux_research_paper-fname_lname-001.txt
|   |-- cs635-progamming_languages-fname_lname-002.txt
|   |-- cs645-java_programming_paper-fname_lname-003.txt
|   `-- cs900-computer_robotics_capstone-fname_lname-004.txt
|-- semester_2
|   |-- cs650-software_methodologies-fname_lname-001.txt
|   |-- cs675-nosql_db_research-fname_lname-002.txt
|   |-- cs700-artificial_intelligence_reasearch-fname_lname-003.txt
|   |-- cs800-algorithms_and_computational_complexity-fname_lname-004.txt
|   |-- cs825-database_systems_internals-fname_lname-005.txt
|   `-- cs850-computer_graphics-fname_lname-006.txt
`-- semester_3
    |-- cs725-web_programming_technologies-fname_lname-001.txt
    |-- cs750-data_programming-fname_lname-002.txt
    `-- cs775-hardware_software_interface_paper-fname_lname-003.txt

      

The result I'm looking to generate (files shuffled arbitrarily, but with the sequence numbers kept in order):

results.txt
/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs725-web_programming_technologies-fname_lname-001.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt

      

+3




5 answers


This shuffles all of the files in the original tree, then does a partial sort on the numeric part only, using a stable sort so that files with the same number remain shuffled.

$ target=~/tmp/shuf
$ destination=/filepath/
$ tree $target
~/tmp/shuf
`-- papers
    |-- semester_1
    |   |-- cs630-linux_research_paper-fname_lname-001.txt
    |   |-- cs635-progamming_languages-fname_lname-002.txt
    |   |-- cs645-java_programming_paper-fname_lname-003.txt
    |   `-- cs900-computer_robotics_capstone-fname_lname-004.txt
    |-- semester_2
    |   |-- cs650-software_methodologies-fname_lname-001.txt
    |   |-- cs675-nosql_db_research-fname_lname-002.txt
    |   |-- cs700-artificial_intelligence_reasearch-fname_lname-003.txt
    |   |-- cs800-algorithms_and_computational_complexity-fname_lname-004.txt
    |   |-- cs825-database_systems_internals-fname_lname-005.txt
    |   `-- cs850-computer_graphics-fname_lname-006.txt
    `-- semester_3
        |-- cs725-web_programming_technologies-fname_lname-001.txt
        |-- cs750-data_programming-fname_lname-002.txt
        `-- cs775-hardware_software_interface_paper-fname_lname-003.txt

4 directories, 13 files
$ find $target -type f -iname "*.txt" \
   | shuf \
   | awk -F- '{printf("%s:%s\n", $0, $NF)}' \
   | sort -t : -k 2 -s \
   | cut -d : -f 1 \
   | xargs -n1 basename \
   | sed "s,^,$destination,"
/filepath/cs725-web_programming_technologies-fname_lname-001.txt
/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt

      



To store the result in a file named filename, you can redirect:

$ find $target -type f -iname "*.txt" \
   | shuf \
   | awk -F- '{printf("%s:%s\n", $0, $NF)}' \
   | sort -t : -k 2 -s \
   | cut -d : -f 1 \
   | xargs -n1 basename \
   | sed "s,^,$destination," \
   > filename
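
The pipeline above follows a decorate, sort, undecorate pattern: each shuffled path is tagged with its trailing sequence number, stably sorted on that tag alone, and then untagged. Below is a minimal sketch of the same idea wrapped in a function. The name shuffle_by_sequence is hypothetical, and the sketch uses a tab as the tag separator and sorts on the first field rather than the second, so it is a variation rather than the exact commands above. It assumes GNU shuf and sort, and filenames that end in -NNN.txt (zero-padded) and contain no tabs or newlines.

```shell
#!/bin/sh
# Hypothetical helper: read paths on stdin and print them ordered by their
# trailing sequence number, shuffled within each number.
# Assumes names end in "-NNN.txt" and contain no tabs or newlines.
shuffle_by_sequence() {
    shuf |
        # Decorate: prefix each line with the text after its last "-".
        awk -F- '{ printf "%s\t%s\n", $NF, $0 }' |
        # Stable sort on the tag only, so ties keep their shuffled order.
        sort -s -t "$(printf '\t')" -k1,1 |
        # Undecorate: drop the tag, leaving the original path.
        cut -f2-
}
```

It could be used as: find "$target" -type f -iname '*.txt' | shuffle_by_sequence > filename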

      

+4




I'm not sure I fully understand what you are asking. Are you trying to sort numerically based on the numbers in the filenames? If so, you will need to provide exact specifications for your filenames so that the correct regex can be used to extract the digits and sort from there. For example, are your filenames always [a-z][1-9], or are there multiple characters, special characters, etc.? If you can provide the real paths you are using, as well as the exact expected result, it will probably make things easier.

To answer the question "randomly select a file from different directories" ... Here are two very similar methods for displaying the path of one random file from each subdirectory of your current directory.

while IFS= read -r dir; do
    find "$dir" -maxdepth 1 -type f | shuf -n1
done < <(find -type d) > results.txt

      

Or...

shopt -s globstar
for dir in ./**/; do
    find "$dir" -maxdepth 1 -type f | shuf -n1
done > results.txt
shopt -u globstar

      



If you want to use the base name of each random file (rather than the full path), you can replace the inner find command like this:

random="$(find "$dir" -maxdepth 1 -type f | shuf -n1)"
[[ -n $random ]] && echo "${random##*/}"

      

If you only want random txt files, just add the parameter -name '*.txt' to the end of the inner find command.
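
Putting the two adjustments above together (base names only, and only .txt files), a sketch of the whole loop might look like the following. The function name random_txt_per_dir and its optional root argument are hypothetical; the sketch assumes bash plus GNU find and shuf, and filenames without newlines.

```shell
#!/bin/bash
# Sketch: for every directory under a root, print the base name of one
# randomly chosen *.txt file. Directories without a match are skipped.
random_txt_per_dir() {
    local root=${1:-.} dir random
    while IFS= read -r -d '' dir; do
        random=$(find "$dir" -maxdepth 1 -type f -name '*.txt' | shuf -n1)
        # ${random##*/} strips everything up to the last "/".
        if [[ -n $random ]]; then
            printf '%s\n' "${random##*/}"
        fi
    done < <(find "$root" -type d -print0)
}
```

For example, random_txt_per_dir papers > results.txt would write one base name per directory that contains at least one .txt file.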

Note that I used the shuf command since you mentioned it in your question, but it could probably be solved just as easily using $RANDOM.
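
As a rough illustration of the $RANDOM approach mentioned above, here is a minimal sketch that picks one regular file from a single directory without shuf. The function name pick_random_file is hypothetical, and the glob skips hidden files unless dotglob is set.

```shell
#!/bin/bash
# Hypothetical helper: print one randomly chosen regular file from a
# directory using $RANDOM. Returns 1 if the directory has no regular files.
pick_random_file() {
    local dir=$1 f files=()
    for f in "$dir"/*; do
        # Keep regular files only; this also skips the unexpanded glob
        # that is left behind when the directory is empty.
        [[ -f $f ]] && files+=("$f")
    done
    (( ${#files[@]} > 0 )) || return 1
    # $RANDOM ranges over 0..32767, so the modulo is slightly biased when
    # the file count does not divide 32768; acceptable for a sketch.
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

Calling pick_random_file "$dir" in place of the inner find and shuf pipeline would give the same kind of result, one path per directory.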

+1




I was trying to match the output you provided in your post at the time when I wrote the answer.

#!/bin/bash
usage_exit () {
    echo "usage: $0 <target-directory>" >&2
    exit 1
}

# Require exactly one argument: the directory to search.
if [ $# -ne 1 ] ; then
    usage_exit
fi

# The pattern below searches files in the range 000 through 199.
# You can change the pattern to match your needs.
for n in {0..1}{0..9}{0..9}; do
    # Shuffle the files that share the same sequence number.
    find "$1" -type f -name "*$n.txt" | shuf
done
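
The fixed 000 through 199 range in the script above runs find 200 times whether or not a given number exists. As a sketch of one way to adapt the pattern, the sequence numbers could be discovered from the files themselves. The function name shuffle_by_existing_numbers is hypothetical; the sketch assumes names end in -NNN.txt with zero-padded numbers of equal width, no newlines in names, and GNU find and shuf.

```shell
#!/bin/bash
# Hypothetical variant: loop over the sequence numbers actually present
# under a directory instead of a fixed range.
shuffle_by_existing_numbers() {
    local target=$1 n
    # List each distinct numeric suffix once, in ascending order
    # (lexicographic order, which is why equal-width padding is assumed).
    find "$target" -type f -name '*-[0-9]*.txt' |
        sed -n 's/.*-\([0-9][0-9]*\)\.txt$/\1/p' |
        sort -u |
    while IFS= read -r n; do
        # Shuffle the files that share this sequence number.
        find "$target" -type f -name "*-$n.txt" | shuf
    done
}
```

For example: shuffle_by_existing_numbers papers > results.txt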

      

+1




Another alternative stores the results of a find for directories (-type d) in an array. It then finds the largest number of regular files in any directory in the array and uses that as the upper bound of the loop ((i=1; i<=max; i++)). In each iteration of the loop it shuffles the array, then walks through it, printing the $i'th file of each directory if it exists and nothing if it doesn't (i.e. if the directory has fewer than $i files).

#!/bin/bash

#shuffle function taken from http://mywiki.wooledge.org/BashFAQ/026
shuffle() {
   local i tmp size max rand

   # $RANDOM % (i+1) is biased because of the limited range of $RANDOM
   # Compensate by using a range which is a multiple of the array size.
   size=${#array[*]}
   max=$(( 32768 / size * size ))

   for ((i=size-1; i>0; i--)); do
      while (( (rand=$RANDOM) >= max )); do :; done
      rand=$(( rand % (i+1) ))
      tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
   done
}

destination=/filepath
max=0
shopt -s nullglob dotglob
while IFS= read -d $'\0' -r dir ; do
  array+=("$dir")
  # Count the .txt files directly in this directory (same filter as the loop below).
  count=$(find "$dir" -maxdepth 1 -type f -iname '*.txt' | wc -l)
  ((count>max)) && max=$count
done < <(find . -mindepth 1 -type d -print0)

for ((i=1; i<=max; i++)); do
  shuffle
  for dir in "${array[@]}"; do
    file=$(find "$dir" -maxdepth 1 -type f -iname '*.txt' | sort -n | awk "NR==$i")
    # Print only the base name under $destination, as in the example output.
    [[ -n $file ]] && echo "$destination/${file##*/}"
  done
done

      

Example

> tree
.
├── script
├── semester_1
│   ├── cs630-linux_research_paper-fname_lname-001.txt
│   ├── cs635-progamming_languages-fname_lname-002.txt
│   ├── cs645-java_programming_paper-fname_lname-003.txt
│   └── cs900-computer_robotics_capstone-fname_lname-004.txt
├── semester_2
│   ├── cs650-software_methodologies-fname_lname-001.txt
│   ├── cs675-nosql_db_research-fname_lname-002.txt
│   ├── cs700-artificial_intelligence_reasearch-fname_lname-003.txt
│   ├── cs800-algorithms_and_computational_complexity-fname_lname-004.txt
│   ├── cs825-database_systems_internals-fname_lname-005.txt
│   └── cs850-computer_graphics-fname_lname-006.txt
└── semester_3
    ├── cs725-web_programming_technologies-fname_lname-001.txt
    ├── cs750-data_programming-fname_lname-002.txt
    └── cs775-hardware_software_interface_paper-fname_lname-003.txt

3 directories, 14 files
> ./script
/filepath/cs725-web_programming_technologies-fname_lname-001.txt 
/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt
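
Where GNU shuf is available, the hand-written shuffle function above could be swapped for a much shorter one. This is a sketch of that substitution, not part of the original script: the name shuffle_with_shuf is hypothetical, it operates on the same global array, and it assumes bash 4 or later (for mapfile) and directory names without newlines.

```shell
#!/bin/bash
# Hypothetical replacement for shuffle(): reorder the global "array"
# in place by piping its elements through GNU shuf.
shuffle_with_shuf() {
    # Nothing to do for zero or one element.
    (( ${#array[@]} > 1 )) || return 0
    # Print one element per line, shuffle the lines, read them back.
    mapfile -t array < <(printf '%s\n' "${array[@]}" | shuf)
}
```

Calling shuffle_with_shuf instead of shuffle inside the main loop would leave the rest of the script unchanged.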

      

+1




I think this might be helpful:

dir='some/directory'
file=$(/bin/ls -1 "$dir" | sort --random-sort | head -n 1)
path=$(readlink --canonicalize "$dir/$file") # Converts to full path
echo "The randomly-selected file is: $path"

      

Consider the following question .

Hope it helps.

Clemencio Morales Lucas.

-1

