Bash: remove a set of words in variable A from variable B?
I am trying to do the following. What is the most effective way to achieve it?
#!/bin/bash
# Remove DOGS from CATSNDOGS to give CATS
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"
CATS="" #?? How do I do this?
The comm answer is inventive, but it is certainly not the only method. You can also do this exclusively in bash, without using any additional tools.
#!/bin/bash
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"
# make an associative array (requires bash 4 or later)...
declare -A dogs_a
for dog in $DOGS; do
    dogs_a[$dog]=1
done
CATS=""
# step through everything
for beast in $CATSNDOGS; do
    # if it is not a dog...
    if [ -z "${dogs_a[$beast]}" ]; then
        CATS="$CATS $beast"
    fi
done
# left unquoted on purpose: word splitting drops the leading space
echo $CATS
Note that this also relies on spaces as field delimiters, and you should read up on why variables should normally be enclosed in quotes when programming in bash.
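As a rough sketch of the same idea with every expansion quoted, the strings can be split into arrays first. The names dogs, beasts, is_dog and cats are illustrative and not from the answer above, and this still assumes the words are separated by whitespace.

```shell
#!/bin/bash
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"

# Split each string into an array explicitly instead of relying on
# unquoted expansion in a for loop.
read -r -a dogs <<< "$DOGS"
read -r -a beasts <<< "$CATSNDOGS"

# Use the dog names as keys of an associative array (bash 4 or later).
declare -A is_dog
for dog in "${dogs[@]}"; do
    is_dog["$dog"]=1
done

cats=()
for beast in "${beasts[@]}"; do
    # ${is_dog[$beast]+x} expands to x only when the key exists
    [[ -n "${is_dog[$beast]+x}" ]] || cats+=("$beast")
done

# "${cats[*]}" joins the elements with the first character of IFS,
# which is a space by default.
CATS="${cats[*]}"
echo "$CATS"
```

Because the result is built as an array and joined once at the end, there is no leading space to strip, so the final echo can stay quoted.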
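You can do this with the comm program. The options -1 and -3 suppress the lines (not words) that appear only in the first input and the lines common to both inputs, so only the lines unique to the second input remain. The inputs need to be sorted, so there is a little more to it. Something like this:
comm -13 <(echo "$DOGS" | tr ' ' '\n' | sort) <(echo "$CATSNDOGS" | tr ' ' '\n' | sort)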
To keep the original input format (space-separated) and avoid creating temporary files, we convert the spaces to newlines, sort both inputs, and pass them to comm as "virtual" file arguments via process substitution.
Edit: I didn't capture the output; it just gets printed to stdout. You could write CATS=$(...) to keep it, although you may need to massage it a little to get it back into space-separated form.
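One possible way to do that massaging, as a sketch: capture the output with command substitution and join the lines back together with paste. This version assumes comm -13 (rather than plain -3) so that the surviving lines are not tab-indented, and it uses here-strings instead of echo; both are choices made for this example.

```shell
#!/bin/bash
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"

# -1 drops lines found only in the first input (dogs that are not in the list),
# -3 drops lines common to both, leaving lines unique to the second input.
# paste -s joins those lines into one line, separated by single spaces.
CATS=$(comm -13 <(tr ' ' '\n' <<< "$DOGS" | sort) \
                <(tr ' ' '\n' <<< "$CATSNDOGS" | sort) | paste -s -d ' ' -)
echo "$CATS"
```

The result comes out in sorted order, not in the order the words had in $CATSNDOGS.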
In one command, keeping the order of the cats, but using complex sed logic:
sed -e 'N;s/^/ /;s/$/ /;s/\n/ \n /;bbegin' \
-e ':begin;s/ \(.*\) \(.*\)\n\(.*\) \1 / \2\n\3 /;tbegin' \
-e 's/^ //;s/ \n //' << EOF
$CATSNDOGS
$DOGS
EOF
The logic is as follows:
- Put $CATSNDOGS and $DOGS into the same pattern space, separated by a newline (\n).
- Add a space before and after both $CATSNDOGS and $DOGS to simplify the matching that follows.
- If a word is found both before and after the newline, remove it.
- Repeat the previous step until no such word is left.
- Remove the leading space, and the trailing space and newline, before printing.
Edit: I realize that the above breaks if a dog is not in $CATSNDOGS, or if a dog appears twice in $CATSNDOGS. Improved version:
sed -e 'N;s/^/ /;s/$/ /;s/\n/ \n /;bbegin' \
-e ':begin;s/ \(.*\) \(.*\)\n\(.*\) \1 / \2\n\3 \1 /;tbegin' \
-e 's/^ //;s/ \n.*//' << EOF
$CATSNDOGS
$DOGS
EOF
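To store the result and to try the two edge cases mentioned in the edit, the improved command could be wrapped in a small function. This is only a sketch: subtract_words is a hypothetical name, printf replaces the here-document, and the syntax (semicolons after labels, \n in the pattern) assumes GNU sed.

```shell
#!/bin/bash
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"

# subtract_words (hypothetical helper): print the words of $1 that do not
# occur in $2, keeping their original order. Assumes GNU sed.
subtract_words() {
    printf '%s\n%s\n' "$1" "$2" |
    sed -e 'N;s/^/ /;s/$/ /;s/\n/ \n /;bbegin' \
        -e ':begin;s/ \(.*\) \(.*\)\n\(.*\) \1 / \2\n\3 \1 /;tbegin' \
        -e 's/^ //;s/ \n.*//'
}

CATS=$(subtract_words "$CATSNDOGS" "$DOGS")
echo "$CATS"

# Edge cases from the edit: "rex" appears twice in the first list,
# and "ghost" does not appear in it at all.
subtract_words "rex tom rex felix" "rex ghost"
```

Because the matched word is kept on the second line after each substitution, a repeated word in the first list is removed on a later pass, and a word that only exists in the second list is simply discarded with that line at the end.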
This is a job for join, using its option for printing unpairable lines (-a). We then keep only the lines that end with a space and remove that space. To avoid temporary files, we use bash process substitution.
join -a 1 -j 1 -o 1.1,2.1 \
<(tr " " "\n" <<< "$CATSNDOGS" | sort) \
<(tr " " "\n" <<< "$DOGS" | sort) | sed -e '/ $/!d;s/ //'
This loses the original order of $CATSNDOGS, but we could easily add cat -n and a sort to restore the original order.
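A sketch of that order-preserving variant follows. It numbers the words with cat -n, joins on the word, and sorts by the number afterwards. Note that it swaps in join -v 1 (print only the unpairable lines from the first file) in place of the -a/-o/sed combination above, which keeps the pipeline shorter; the exact pipeline is an illustration, not the original author's command.

```shell
#!/bin/bash
DOGS="fido rover oscar bowwow spike max"
CATSNDOGS="bowwow figaro pussy oscar boots rover kitty max spike meowser fluffles fido"

# File 1: "number word" lines, sorted on the word (field 2, leading blanks ignored).
# File 2: the dogs, one per line, sorted.
# join -v 1 prints the unpairable lines of file 1 as "word number".
# The remaining steps sort by that number, drop it, and rejoin with spaces.
CATS="$(join -1 2 -2 1 -v 1 \
        <(tr ' ' '\n' <<< "$CATSNDOGS" | cat -n | sort -k2b,2) \
        <(tr ' ' '\n' <<< "$DOGS" | sort) |
    sort -k2,2n | cut -d ' ' -f 1 | paste -s -d ' ' -)"
echo "$CATS"
```

The extra sort on the line number is what puts the cats back in the order they had in $CATSNDOGS.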
To store the result in a variable, do:
CATS="$(join -a 1 -j 1 -o 1.1,2.1 \
<(tr " " "\n" <<< "$CATSNDOGS" | sort) \
<(tr " " "\n" <<< "$DOGS" | sort) | sed -e '/ $/!d;s/ //' | paste -s -d " ")"