How to limit search and replace to only one column in CSV?

I have a CSV file with 4 columns, for example:

0001 @ fish @ animal @ eats worms

      

I am using sed to find and replace in a file, but I need to limit the search and replace to the text inside column 3.

How can I find and replace just one column?



3 answers


Do you really want to use sed? What about csvfix? Is your CSV nice and simple, with no quotes or embedded commas or other nasty things that make regexes a less than satisfactory way of dealing with a general CSV file? I am assuming that @ is the "comma" in your format.

Consider awk instead of sed:

awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'

      

Arguably, you should have a BEGIN block that sets OFS once. For a single line of input, it makes no difference (and you would probably be hard-pressed to measure a difference on a million lines of input either):

$ echo "pattern @ pattern @ pattern @ pattern" | 
> awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'
pattern @ pattern @replace@ pattern
$
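A minimal sketch of the BEGIN variant, assuming the same @-separated layout as in the question. The function name replace_col3 and the replacement word creature are made up for illustration. Here sub() rewrites only the matched text inside field 3, so the spaces around it survive:

```shell
# Hypothetical helper: replace "animal" with "creature" in column 3 only.
replace_col3() {
    # FS and OFS are set once in BEGIN, so a rebuilt line keeps "@" as its separator.
    awk 'BEGIN { FS = OFS = "@" } { sub(/animal/, "creature", $3); print }'
}

printf '%s\n' \
    '0001 @ fish @ animal @ eats worms' \
    '0002 @ animal @ plant @ eats light' | replace_col3
```

The second input line has animal only in column 2, so it should come through unchanged.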

      

If sed still seems appealing, then:

sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'

      

For example (and note the slightly different input and output; you can adjust it to handle the same input as the awk version if need be):



$ echo "pattern@pattern@pattern@pattern" |
> sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'
pattern@pattern@replace@pattern
$

      

The first regex looks for the start of a line, a field of non-@ characters, an at sign, another field of non-@ characters, and remembers the lot; it then looks for an at sign, the pattern (which must be in the third field, since the first two fields have already been matched), another at sign, and then the rest of the line. When a line matches, it replaces the line with the first two fields (unchanged, as required), then adds the replacement third field and the remainder of the line (unchanged, as required).
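One way to adapt the sed command to the space-padded input used in the awk example is to let the regex absorb optional spaces around the third field. This is only a sketch: the function name replace_third is hypothetical, and it assumes the third field consists of exactly the word pattern plus optional spaces.

```shell
# Hypothetical helper: replace a third field that is exactly "pattern",
# allowing optional spaces on either side of it.
replace_third() {
    sed 's/^\([^@]*@[^@]*@\) *pattern *\(@.*\)$/\1 replace \2/'
}

echo "pattern @ pattern @ pattern @ pattern" | replace_third
```

The first two fields and their separators are captured as \1, the rest of the line from the third @ onward as \2, and the replacement is written back with a single space on each side.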

If you need to edit rather than simply replace the third field, consider using awk, Perl, or Python. If you are still wedded to sed, then you will need to explore using the hold space to hold part of the line while you manipulate the other part in the pattern space, and finally reassemble the desired output line from the hold space and pattern space before printing the line. It's nearly as messy as it sounds; in fact, possibly even messier than it sounds. I'd go with Perl (because I learned it a long time ago and it does this sort of thing quite easily), but you can use whichever non-sed tool you like.


Perl editing the third field. Note that the default output is $_, which has to be reassembled from the auto-split fields in the array @F.

$ echo "pattern@pattern@pattern@pattern" | sh -x xxx.pl
> perl -pa -F@ -e '$F[2] =~ s/\s*pat(\w\w)rn\s*/ prefix-$1-suffix /; $_ = join "@", @F; ' "$@"
pattern@pattern@ prefix-te-suffix @pattern
$

      

Explanation. The -p means "loop, reading lines into $_ and printing $_ at the end of each iteration". The -a means "auto-split $_ into the array @F". The -F@ means the field separator is @. The -e is followed by the Perl program. Arrays are indexed from 0 in Perl, so the third field is $F[2] (the sigil, @ or $, changes depending on whether you are working with a value from the array or with the array as a whole). The =~ is the match operator; it applies the regex on the RHS to the value on the LHS. The substitution pattern recognizes zero or more spaces \s*, followed by pat, then two "word" characters which are remembered in $1, then rn and zero or more spaces again; maybe there should be a ^ and $ in there to anchor it to the start and end of the field. The replacement is a space, 'prefix-', the remembered pair of letters, '-suffix', and a space. The $_ = join "@", @F; reassembles the input line $_ from the possibly modified separate fields, and then -p prints that out. It is not as tidy as it could be (there is probably a better way to do it), but it works. And you can do arbitrary transformations on arbitrary fields in Perl without much difficulty. Perl also has a module Text::CSV (and a high-speed C version, Text::CSV_XS) which can handle really complex CSV files.



Essentially, split the line into three parts, with the pattern you're looking for in the middle. Then keep the outer pieces and replace the middle.

/^\([^@]*@[^@]*@[^@]*\)pattern\([^@]*@.*\)/s//\1replacement\2/

^\([^@]*@[^@]*@[^@]*\) - collects everything before the pattern, starting at the beginning of the line and including the second @ and any text in the third field before the match; this becomes \1

pattern - the thing you are looking for

\([^@]*@.*\) - collects everything after the pattern; this becomes \2

Then replace the line with \1, then the replacement, then everything after the pattern, which is \2.
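As a sketch of this three-part split applied to the sample line from the question, using a form of the expression anchored at the start of the line so that only the third field can match. The function name swap_in_col3 and the words animal and creature are illustrative assumptions.

```shell
# Hypothetical helper: replace "animal" with "creature", but only inside column 3.
swap_in_col3() {
    # \1 = columns 1-2 plus any column-3 text before the match; \2 = everything after it.
    sed '/^\([^@]*@[^@]*@[^@]*\)animal\([^@]*@.*\)/s//\1creature\2/'
}

printf '%s\n' \
    '0001 @ fish @ animal @ eats worms' \
    '0002 @ animal @ plant @ eats light' | swap_in_col3
```

The empty regex in s// reuses the address regex, so the capture groups defined in the address are the ones referenced in the replacement. Note that the expression expects at least one more @ after the third field.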



This might work for you:

echo 0001 @ fish @ animal @ eats worms|
sed 's/@/&\n/2;s/@/\n&/3;h;s/\n@.*//;s/.*\n//;y/a/b/;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/'
0001 @ fish @ bnimbl @ eats worms

      

Explanation:

  • Define the field to process (in this case the third) and insert a newline (\n) immediately before and after it: s/@/&\n/2;s/@/\n&/3

  • Save the line in the hold space: h

  • Delete the fields on either side: s/\n@.*//;s/.*\n//

  • Now process the field, i.e. change all a's to b's: y/a/b/

  • Now append the original line: G

  • Substitute the new field for the old field (removing the newlines as well): s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/

NB In step 4, the pattern space contains only the target field, so any number of commands can be run here and the result will not affect the rest of the line.
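To illustrate the note about step 4, here is a sketch that keeps the same scaffolding and swaps the y/a/b/ step for a substitution. It assumes GNU sed (for \n in the replacement and inside bracket expressions); the function name edit_third and the words animal and creature are made up for the example.

```shell
# Hypothetical helper, GNU sed assumed: isolate column 3, edit it, then splice it back.
edit_third() {
    # Only the step between "s/.*\n//" and "G" differs from the command in the answer.
    sed 's/@/&\n/2;s/@/\n&/3;h;s/\n@.*//;s/.*\n//;s/animal/creature/;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/'
}

printf '%s\n' \
    '0001 @ fish @ animal @ eats worms' \
    '0002 @ animal @ plant @ eats animal' | edit_third
```

Because the substitution runs while the pattern space holds nothing but the third field, occurrences of animal in the other columns of the second line are left alone.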
