How to search / replace a "binary" file from the command line

I have some data files to import into a database with some "unique" delimiters:

Field Separator (FS): SOH (ASCII character 1)

Record Separator (RS): STX (ASCII character 2) + \n

I would like to import the files into Postgres using the COPY command. While I can specify my own field separator, COPY cannot handle the custom record separator.

I can't just strip the \002 from the data, because if there is (and there is) a newline in one of the fields, COPY would wrongly treat it as the start of a new record when in fact it is not.

Note that it doesn't matter whether newlines inside fields are preserved; converting them to spaces is fine.

With this in mind, I was thinking of using something like sed to convert newlines to spaces and then convert \002 to newlines. However, since sed is a line-oriented tool, it never sees the newline at the end of each line, so it cannot search/replace on newlines.

Are there any other unix command line tools that could do the job?

EDIT: I guess what I'm really asking for is a unix utility that can search/replace in a file as a single "binary" blob, without splitting it into lines.


2 answers


Based on Patrick's suggestion, I was able to do it using Perl:



cat file | perl -pe 's/\002\n/\002\002/g' | perl -pe 's/\n//g' | perl -pe 's/\002\002/\n/g'



Can you do multiple passes through the file? Pass 1 converts all \002\n to \002\002. Pass 2 converts all \n to spaces. Pass 3 converts all \002\002 back to \n.


