Find lines starting with "t", followed by a vowel, with a total length of 4
I have a file that contains over 300 words. I need to find the lines that start with "t", continue with a vowel, and have a total length of 4. I also need to convert the file to a format with one word per line.
tr -s "[[:blank:]]" "\n" < file | grep .
With this I can format the file, but I cannot figure out how to select the words that meet the above requirement. I am stuck :/
For example, my file contains "tera train chair tola mourn". I need to format it like this:
tera
train
chair
tola
mourn
and then find the words that start with "t", continue with a vowel, and have a total length of 4. The result should be:
tera
tola
You can use grep for this. If you only want to match the first word of each line:
grep -Eow '^t[aeiou]\S{2}' file > formatted_file
If the entire line must match:
grep -Eow '^t[aeiou]\S{2}$' file > formatted_file
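The difference between the two commands is easiest to see on a line that holds several words. This is a minimal sketch, assuming GNU grep (needed for \S); the function names are hypothetical and the sample words come from the question.

```shell
#!/bin/sh
# Assumes GNU grep: \S inside an -E pattern is a GNU extension.
# first_word_match and whole_line_match are hypothetical helper names.

# Variant 1: print the first word of a line if it qualifies.
first_word_match() {
    grep -Eow '^t[aeiou]\S{2}'
}

# Variant 2: print a line only if the whole line is the 4-letter word.
whole_line_match() {
    grep -Eow '^t[aeiou]\S{2}$'
}

printf 'tola chair tera\n' | first_word_match                       # prints: tola
printf 'tola chair tera\n' | whole_line_match || echo '(no match)'  # prints: (no match)
printf 'tera\ntrain\ntola\n' | whole_line_match                     # prints tera, then tola
```

The first variant never reports "tera" in that line, because ^ only looks at the start of the line. To catch every word, split the input into one word per line first.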
- ^ anchors the match at the beginning of the line.
- t matches exactly the letter "t".
- [aeiou] matches any one of the characters between [ and ].
- \S{2} matches exactly 2 non-whitespace characters (\S is a GNU grep extension).
- $ matches the end of the line.
- -w makes grep match only whole words, which effectively limits the match to the exact number of characters specified in PATTERN.
- -o makes grep output only the matched text (in this case, your 4-letter word).
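To see why -w matters, compare the same pattern with and without it on the longer word "terra". A minimal sketch, again assuming GNU grep; the function names are hypothetical.

```shell
#!/bin/sh
# Assumes GNU grep. without_w and with_w are hypothetical helper names.
without_w() { grep -Eo  '^t[aeiou]\S{2}'; }
with_w()    { grep -Eow '^t[aeiou]\S{2}'; }

# Without -w, -o prints the first four characters of the longer word.
printf 'terra\n' | without_w                     # prints: terr
# With -w, "terr" is rejected because the next character ("a") is a word character.
printf 'terra\n' | with_w || echo '(no match)'   # prints: (no match)
```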
EDIT: You can also add the -i option if you want grep to ignore case (match both upper- and lower-case letters).
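A short sketch of the -i variant, assuming GNU grep and input with one word per line; filter_t_words_ci is a hypothetical name.

```shell
#!/bin/sh
# Assumes GNU grep and one word per line on standard input.
# filter_t_words_ci is a hypothetical helper name.
filter_t_words_ci() {
    # -i lets both the "t" and the vowel class match upper-case letters too.
    grep -Eiow '^t[aeiou]\S{2}$'
}

printf 'Tola\nTERA\ntrain\n' | filter_t_words_ci   # prints Tola, then TERA
```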
The following Perl one-liner collects all matching words and prints them on a single line:
perl -nle 'push @A,$_ for /\bt[aeiou][a-z]{2}\b/gi;END{print"@A"}' <file
It is not clear whether a single input line can contain several words, or whether all output words must be on one line. To print each word on its own line instead:
perl -nle 'print for /\bt[aeiou][a-z]{2}\b/gi' <file
If the file already has one word per line, this grep is enough:
grep -i '^t[eaiou][a-z][a-z]$' <file
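Combining this with the tr command from the question gives a complete pipeline. A minimal sketch; t_words is a hypothetical name, and tr is given the class as '[:blank:]', since tr does not use the extra outer brackets that a grep bracket expression needs.

```shell
#!/bin/sh
# t_words is a hypothetical helper name.
# Step 1: tr turns every run of spaces/tabs into a single newline (one word per line).
# Step 2: grep keeps 4-letter words starting with "t" plus a vowel, ignoring case.
t_words() {
    tr -s '[:blank:]' '\n' | grep -i '^t[aeiou][a-z][a-z]$'
}

printf 'tera train chair tola mourn\n' | t_words   # prints tera, then tola
```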