Unix bash: find directories that contain files with 2 specific file extensions

I am trying to create a small bash script that traverses a directory containing hundreds of subdirectories. SOME of these subdirectories contain textfile.txt and htmlfile.html, where the names textfile and htmlfile are variable.

I only really care about the sub-directories that have both .txt and .html in them, all other sub-directories can be ignored.

Then I want to list all the .html and .txt files that are in the same subdirectory.

It seems like a pretty simple problem to solve, but I'm at a loss. All I can really get is a line of code that outputs every .html or .txt file, without linking them to the actual subdirectory they are in, and I'm pretty new to bash scripting, so I can't go any further.

Any help would be greatly appreciated.

#!/bin/bash

files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"

for file in $files
do 
echo $file

done

      



5 answers


The following find command checks each subdirectory and, if it contains both html and txt files, lists them all:

find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;

      

Explanation:



  • find . -type d

    This finds all subdirectories of the current directory (including the current directory itself).

  • -exec env d={} bash -c '...' \;

    This sets the environment variable d to the path of the found subdirectory and then runs the bash command enclosed in single quotes (see below).

  • ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}

    This is the bash command that is executed. It consists of three statements joined with && (logical and). The first checks whether there are any html files in the directory d. If so, the second statement runs and checks whether there are any txt files. If so, the last statement runs and lists all the html and txt files in the directory d.

This command is safe for all file and directory names that contain spaces, tabs, or other complex characters.
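
A possible variation on the same idea, offered as a sketch rather than a drop-in replacement: pass the directories to the inner shell as positional parameters and end -exec with + so that one shell handles many directories instead of one shell per directory. The function names list_pairs and has_match are made up for this example. The check relies on the fact that an unmatched glob is left as a literal pattern, which the -e test then rejects.

```shell
# Sketch only: list_pairs and has_match are hypothetical helper names.
# Prints the .html and .txt files of every directory that contains both kinds.
list_pairs() {
    find "${1:-.}" -type d -exec sh -c '
        # True if the first expanded argument exists. When the glob matched
        # nothing, $1 is the literal pattern and the test fails.
        has_match() { [ -e "$1" ]; }

        # Loop over the directories that find passed as positional parameters.
        for d do
            if has_match "$d"/*.html && has_match "$d"/*.txt; then
                printf "%s\n" "$d"/*.html "$d"/*.txt
            fi
        done
    ' sh {} +
}
```

Called as list_pairs ~/file (or with no argument for the current directory), it prints one path per line. Like the ls version, the globs skip hidden files such as .notes.txt.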



You can do this by searching recursively with the globstar option:



shopt -s globstar
for file in **; do
    if [[ -d $file ]]; then
        for sub_file in "$file"/*; do
            case "$sub_file" in
                *.html)
                    html=1;;
                *.txt)
                    txt=1;;
            esac
        done
        [[ $html && $txt ]] && echo "$file"
        html=""
        txt=""
    fi
done
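
The loop above prints the directory name. If the goal is to print the files themselves, one way to adapt it is sketched below. It assumes bash 4 or later (for globstar), and list_pairs_glob is a hypothetical name chosen for the example. With nullglob set, a pattern that matches nothing expands to an empty array, so the array lengths tell us whether both kinds of file are present.

```shell
#!/usr/bin/env bash
# Sketch only: list_pairs_glob is a hypothetical helper name.
# The function body is a subshell ( ... ), so the shopt settings and the cd
# do not leak into the calling shell.
list_pairs_glob() (
    shopt -s globstar nullglob
    cd -- "${1:-.}" || exit

    # "**/" expands to every subdirectory (with a trailing slash);
    # "." adds the starting directory itself.
    for dir in . **/; do
        dir=${dir%/}
        html=("$dir"/*.html)
        txt=("$dir"/*.txt)
        # Both arrays are non-empty only if the directory holds both kinds.
        if (( ${#html[@]} && ${#txt[@]} )); then
            printf '%s\n' "${html[@]}" "${txt[@]}"
        fi
    done
)
```

For example, list_pairs_glob ~/file prints paths relative to ~/file, one per line.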

      



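You can use -o (logical OR) to match either extension:

#!/bin/bash

# Parentheses group the two -name tests so that -type f applies to both
files=$(find ~/file/ -type f \( -name '*.txt' -o -name '*.html' \))

# Note: unquoted $files is split on whitespace, so names containing spaces will break
for file in $files
do
    echo "$file"
done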
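
One detail worth knowing about -o: it binds more loosely than the implicit AND between find tests, so writing -type f -name '*.txt' -o -name '*.html' without parentheses applies -type f only to the .txt branch. The following sketch illustrates the difference on a throwaway tree created with mktemp; demo_precedence and the file names are invented for the demonstration.

```shell
# Sketch only: demo_precedence is a hypothetical name; the tree is temporary.
demo_precedence() {
    demo_dir=$(mktemp -d) || return
    mkdir "$demo_dir/dir.html"            # a directory whose name ends in .html
    touch "$demo_dir/a.txt" "$demo_dir/b.html"

    echo "ungrouped:"
    # Parsed as: ( -type f AND -name '*.txt' ) OR -name '*.html'
    (cd "$demo_dir" && find . -type f -name '*.txt' -o -name '*.html' | sort)

    echo "grouped:"
    # With parentheses, -type f applies to both -name tests
    (cd "$demo_dir" && find . -type f \( -name '*.txt' -o -name '*.html' \) | sort)

    rm -rf "$demo_dir"
}
```

Running demo_precedence should show ./dir.html under the ungrouped heading only, since the directory slips past -type f in that form.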

      



#!/bin/bash

# A quick peek into a dir to see if there is at least one file that matches pattern
dir_has_file() {
  local dir="$1" pattern="$2"
  [ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}

# Assumes there are no newline characters in the filenames, but will behave correctly with subdirectories that match *.html or *.txt
find "$1" -type d |
  while IFS= read -r d
  do
    dir_has_file "$d" '*.txt' &&
    dir_has_file "$d" '*.html' &&
    # Now print all the matching files (the parentheses make -type f apply to both patterns)
    find "$d" -maxdepth 1 -type f \( -name '*.txt' -o -name '*.html' \)
  done

      

This script takes the root directory to be searched as the first argument ($1).



The test command is what you need to check for the presence of both files in each of the subdirectories:

find . -type d -exec sh -c 'if test -f "$1/$2" && test -f "$1/$3"; then ls "$1"/*.txt "$1"/*.html; fi' sh {} "$file1" "$file2" \;

      

where $file1 and $file2 are the names of the .txt and .html files you are looking for.
