How can I use grep to search for specific words in fields in a flat file database?

I need this grep call:

grep "field3=highland" data_file

      

to return both the lines where "field3 = highland" and those where "field3 = chicago highland". How can I rewrite the grep call to cover both cases?

+2




5 answers


You can use the .* wildcard:



grep "field3=.*highland" data_file

      

+2




GE,

My advice would be to spend significantly more effort in writing your question.

You mention the "grep tool (Linux)" and the "SQL LIKE operator" ... in the thread ... and then pose a frankly obscure question that seems to be about matching two different variants of an input string.

You will get answers that can only guess at what your real question is.

I think this is the question:

"I have data that contains some strings like: field3=highland

and field3=other stuff highland

, and I want to match all of those strings (filtering out everything else).

The simplest regex that could work would be:

grep "field3=.*highland

      

... but this will also match things like "field3=highlands" and "field3=thighland" and "myfield3=..." etc. It will also fail to match "field3 = ..." (with a space between the field name and the equals sign).

Is "field3" supposed to be at the beginning of the line? Is the highland supposed to be anchored at the end of the line? If "highland" only matches if it is not a substring in the longer "word" (ie, if the character before "h" and after "d" is not an alphabet)?



There are many open questions about your expected inputs and desired results ... which will have a significant impact on which regexes do or do not match.

References to SQL LIKE and its % token are mostly useless here. For the most part, the % token in an SQL LIKE expression is equivalent to the ".*" regular expression. If you have a piece of SQL that works (on the same input) and are trying to find a functionally equivalent regex ... then you should take the time to include that working SQL expression in your question.
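For instance (an illustration only, not taken from the question), a condition such as field3 LIKE '%highland%' would translate roughly to:

grep 'field3=.*highland' data_file

with the '%' of LIKE playing roughly the role of '.*' in the regex.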

This question isn't particularly specific to grep (Linux or otherwise). It would be better tagged as a regex question.

In general, there are three or four common abstractions for pattern matching in text: regular expressions (with many variations), glob and wildmat patterns (as used by shells and MS-DOS), and SQL LIKE expressions.

Of these, regexes are the ones most used by programmers ... and by far the most complex. They range from the oldest, simplest variants (including the one in the historical UNIX line editor ed, from which grep was originally derived), through the more powerful "extended" versions (egrep or grep -E), up to the insanely elaborate "Perl-compatible regular expressions" (now widely available to other programming languages as PCRE libraries).

Glob patterns are much simpler. They started with the shell wildcards ... initially just ? and * (any single character and any number of characters, respectively). Later enhancements supported by modern shells and other tools added character classes (for example, [0-9] for any digit and [a-zA-Z] for any letter, etc.). Some of them also support negated character classes.

Because glob patterns use special characters (? and *) that look like regular expression syntax, albeit with different meanings ... and because they use nearly identical syntax to describe character classes and their complements, glob patterns are often mistaken for regular expressions. When I teach sysadmin classes, I usually have to wean the students off the sloppy terminology that is so common.
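A small illustration of the difference (the file names are made up):

ls *highland*               # glob: a bare '*' means "any string here"
ls | grep 'highland'        # regex: a plain substring already matches anywhere
ls | grep '.*highland.*'    # regex spelling of the same idea: '*' needs an atom such as '.' before it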

The old MS-DOS "wildmat" or "wildcard matching" patterns can be thought of as a variation on the original glob patterns. They support only the ? and * metacharacters ... with much the same semantics as UNIX shell globbing. However, my advice is not to think of them that way. The underlying semantics of how the MS-DOS command line handles arguments containing these patterns are quite different, and it is a pitfall to think of them as "globs". (A command like COPY *.TXT *.BAK is fine in MS-DOS, while a UNIX command like cp *.txt *.bak is wrong in almost any reasonable situation.)

Obviously, as I described above, the SQL LIKE expression is very similar to a UNIX glob. Most basic SQL LIKE implementations have only two "special" or "meta" characters: % (like *) and _ (like ?).

Note the hedging words. I will not claim that % behaves exactly like the glob * nor that _ behaves exactly like the glob ? character. There might be corner cases (how they behave at the beginning or end of strings, or next to spaces, etc.). There might be differences between SQL implementations, and there might even be some cruftier versions of the UNIX/Linux fnmatch (globbing) libraries that behave differently if you try to rely on such equivalences.
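As a rough side-by-side (with the above caveats about corner cases and implementations):

#  SQL LIKE pattern    shell glob    regular expression (anchored)
#  'high%'             high*         ^high.*$
#  'high_land'         high?land     ^high.land$
#  '%land'             *land         ^.*land$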

+2




$ grep 'f=h\|f=c h' << eof
> f=c h
> f=h
> not
> going f= to
> match
> eof
f=c h
f=h
$ 

      

Or, if the idea is that c could be anything, perhaps something like:

$ grep 'f=.*h' 

      

+1




If you want to get all lines with "field3=" followed by any characters followed by "highland", you need:

grep 'field3=.*highland' data_file

      

'.' means any single character, and '*' means zero or more occurrences of the preceding pattern. Thus '.*' matches virtually any string, including the empty one.
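For example, against 'field3=.*highland' (sample strings, not taken from the question's data):

# matches:
#   field3=highland            (".*" matches the empty string)
#   field3=chicago highland
#   field3=xhighlander         (no anchors or word boundaries, so this matches too)
# does not match:
#   field3 = highland          (space before the "=")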

+1




If you want to compare the third whitespace-separated field of each line against a string (rather than match the literal text "field3=highland"), then grep is not the right tool for you. In this case, consider awk:

awk '$3=="highland" { print $0 }' <input file>

      

for an exact match or

awk '$3~".*highland.*" { print $0 }' <input file>

      

to match the regex.

Note that awk uses whitespace as the field separator by default, but you can change it on the command line with "-F <field separator>", e.g.:

awk -F : '$1~".*oo.*" {print $0}' /etc/passwd

      

grabs the root line from the password file.
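Applied to the question's data, a sketch (assuming each line holds a single name=value pair, with no spaces around the "="):

awk -F '=' '$1 == "field3" && $2 ~ /highland/ { print }' data_file

This prints both "field3=highland" and "field3=chicago highland" while ignoring lines for other fields.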

0

