Find all matches of a two-character string in a text file and swap them

I am searching a text file for an underscore preceded by a punctuation mark, [.?!;:]_, and I want to swap the order of the two characters.

For example, given the line

On this _line,_ I show an example. !_

I want to change it to:

On this _line_, I show an example. _!

I can find all cases with, say, The Silver Searcher or ripgrep:

rg '[.?!;:]_' myfile.txt

but I'm not sure how those two characters can then be replaced and written back in place or to a new file.

I could just use sed for each punctuation mark, for example:

sed -ie 's/,_/_,/g' myfile.txt

then

sed -ie 's/\._/_\./g' myfile.txt

then ...

but it would be nice to accomplish this with a single command.

Is it possible to reference the found instance and use it in the ripgrep option -r ARG? Or am I barking up the wrong tree, and would I be wiser to use another tool?

2 answers


sed supports backreferences: the replacement argument of the s command can refer to capture groups defined in its regex (bash here-string (<<<) syntax is used for brevity):

$ sed -E 's/([.?!;:])_/_\1/g' <<<'On this _line,_ I show an example. !_'
On this _line,_ I show an example. _!

      

\1 refers to the first capture group ( (...) ) in the regex.
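The question's example also swaps a comma, which the class [.?!;:] does not contain. A minimal sketch, assuming the comma should simply be added to the bracket expression (the variable names line and swapped are only illustrative):

```shell
# Assumption: ',' is added to the class so that ",_" is swapped as well.
line='On this _line,_ I show an example. !_'

# \1 re-inserts whatever punctuation mark the capture group matched.
swapped=$(printf '%s\n' "$line" | sed -E 's/([.,?!;:])_/_\1/g')

printf '%s\n' "$swapped"
# prints: On this _line_, I show an example. _!
```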

Note that -E was used to enable extended regular expressions, which use the modern syntax; both GNU sed and BSD / macOS sed support this option.




Generally, you don't need sed's -e option unless you are passing the sed script in multiple parts, in which case each part must be prefixed with -e.
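For illustration, here is a small sketch of a script passed in two parts, reusing the two substitutions from the question; the sample input and the variable name result are made up for the example:

```shell
# Each part of the script gets its own -e; the parts are applied in order.
result=$(printf '%s\n' 'a,_ b._' | sed -e 's/,_/_,/g' -e 's/\._/_./g')

printf '%s\n' "$result"
# prints: a_, b_.
```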

Regarding an in-place update of the original file:

-ie probably doesn't do (exactly) what you want: while it updates the input file (replacing it with a new file with the updated content), it also creates a backup file with the suffix e, because the e is interpreted as the option-argument of -i.

Unless the goal is to create a backup file, the syntax - sadly - differs depending on which sed implementation you're using:

  • GNU sed: sed -i ...
    • -i must not be followed by any other characters.
  • BSD / macOS sed: sed -i '' ...
    • -i must be followed by '' as the next, separate argument.
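If keeping a backup is acceptable, attaching a suffix directly to -i is accepted by both implementations. A hedged sketch, run against a throwaway file so nothing real is modified (the temporary directory, the .bak suffix and the comma added to the class are assumptions for the example):

```shell
# Work on a scratch copy; the file name mirrors the question's myfile.txt.
tmpdir=$(mktemp -d)
file="$tmpdir/myfile.txt"
printf '%s\n' 'On this _line,_ I show an example. !_' > "$file"

# -i.bak (suffix attached, no space) edits in place and keeps myfile.txt.bak.
sed -i.bak -E 's/([.,?!;:])_/_\1/g' "$file"

result=$(cat "$file")        # updated content
backup=$(cat "$file.bak")    # untouched original
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The backup can be deleted afterwards once the result has been checked.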


Here's one way to do it with one line:

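sed 's/\([^[:alnum:]_[:space:]]\)\(_\)/\2\1/g' test.txt

You are essentially looking for two adjacent characters and swapping them.

s/ - runs a substitution.

\( \) - escaped parentheses define a capture group. In basic regular expressions they have to be escaped, even if it's ugly.

[ ] - defines a character class (bracket expression).

^ - negates the class when it is the first character inside the brackets.

[:alnum:] and [:space:] - POSIX classes for letters and digits, and for whitespace. sed does not treat the shorthands \w and \s as classes inside a bracket expression ([^\w\s] would only exclude the literal characters \, w and s), so the POSIX classes are used instead.

[^[:alnum:]_[:space:]] - any character that is not a letter, digit, underscore or whitespace (e.g. punctuation).

Then we move on to the next part of the match, the underscore.

\(_\) - first find the punctuation and capture it as group 1, then find the underscore right after it and capture it as group 2.

/\2\1/ - now write groups 1 and 2 back in swapped order.

/g - do it globally, i.e. for every match on each line.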

The end. Now you can redirect the output to another file or use a different sed switch (-i) to change the file in place.
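As a rough sketch of the "output to another file" route: the pattern below is a variant written with POSIX character classes, and the scratch directory and the file name swapped.txt are assumptions for the example.

```shell
# Create a scratch input file named like the one in the answer.
workdir=$(mktemp -d)
printf '%s\n' 'On this _line,_ I show an example. !_' > "$workdir/test.txt"

# Without -i, sed writes to stdout, so redirect it to a new file.
sed 's/\([^[:alnum:]_[:space:]]\)_/_\1/g' "$workdir/test.txt" > "$workdir/swapped.txt"

output=$(cat "$workdir/swapped.txt")   # swapped content
original=$(cat "$workdir/test.txt")    # input file is left unchanged
printf '%s\n' "$output"

rm -rf "$workdir"
```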
