grep ambiguity with nested square brackets

sample.txt contains:

abcde
abde

      

Can someone explain the output of the following commands -

  • grep '[[ab]]' sample.txt

    - no output
  • grep '[ab[]]' sample.txt

    - no output
  • grep '[ab[]' sample.txt

    - output: abcde, abde

  • grep '[ab]]' sample.txt

    - no output

And what do [(ab)] and [^(ab)] mean? Are they the same as [ab] and [^ab]?



2 answers


First of all, understand that within a character class, most regex metacharacters lose their special meaning and match literally. For example, * will match a literal * and will not mean "zero or more repetitions". Likewise, () will match ( and ) and will not create a capture group.
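A quick way to convince yourself of this is to try it. The sketch below is only an illustration: the three input lines are made up for the example and are not from the question.

```shell
# Hypothetical input: one line with a literal '*', one with parentheses, one with neither.
input=$(printf '%s\n' 'a*b' 'a(b)' 'aab')

# Inside brackets '*' is an ordinary character, so only the line containing a star matches.
star_lines=$(printf '%s\n' "$input" | grep '[*]')

# Outside brackets 'a*' means "zero or more a", which every line satisfies.
repeat_count=$(printf '%s\n' "$input" | grep -c 'a*')

# '(' and ')' inside brackets are ordinary characters too.
paren_lines=$(printf '%s\n' "$input" | grep '[()]')

printf 'star: %s\nparens: %s\nlines matching a*: %s\n' \
    "$star_lines" "$paren_lines" "$repeat_count"
```

Only the line with a real star matches '[*]', while the unbracketed 'a*' matches all three lines.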

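Now, if a ] is found in a character class, it automatically closes the character class, and any further characters are not part of that class. (The one exception in POSIX bracket expressions is a ] placed immediately after the opening [ or [^, which is taken literally.) Now let's look at what's going on above: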


In 1, 2 and 4, your character class ends at the first closing ]. So the last closing bracket, ], is not part of the character class. It has to be matched separately. So your patterns are equivalent to something like this:

'[[ab]]' is same as '(\[|a|b)(\])'  // The last `]` has to match.
'[ab[]]' is same as '(a|b|\[)(\])'  // Again, the last `]` has to match.
'[ab]]'  is same as '(a|b)(\])'     // Same, the last `]` has to match.
    ^
    ^---- Character class closes here.

      

Now, since neither of the two lines in the file contains a ], no match is found.
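To check this, you can add a line that does contain a ] and see all three patterns start matching. This is a rough sketch; the extra line 'ab]cd' is an assumption added for the test, not part of the original sample.txt.

```shell
# The two lines from sample.txt plus one hypothetical line containing a ']'.
lines=$(printf '%s\n' 'abcde' 'abde' 'ab]cd')

# Each pattern is a bracket expression followed by a literal ']'.
m1=$(printf '%s\n' "$lines" | grep '[[ab]]')   # one of [ a b, then ]
m2=$(printf '%s\n' "$lines" | grep '[ab[]]')   # one of a b [, then ]
m4=$(printf '%s\n' "$lines" | grep '[ab]]')    # one of a b, then ]

# Without the extra line nothing matches; grep -c prints 0 and exits non-zero.
none=$(printf '%s\n' 'abcde' 'abde' | grep -c '[ab]]' || true)

printf '1: %s\n2: %s\n4: %s\noriginal file matches: %s\n' "$m1" "$m2" "$m4" "$none"
```

All three patterns pick out only the line where an a, b or [ is immediately followed by ].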

Whereas in the third pattern, your character class is closed only by the final ], so everything before it is inside the character class.

'[ab[]' means match string that contains 'a', or 'b', or '['

      



which is a perfectly valid pattern and matches both lines in the file.


And what do [(ab)] and [^(ab)] mean?

[(ab)] means match any one of (, a, b, ). Remember that within a character class, these regex metacharacters have no special meaning. Thus, you cannot create groups within a character class.

[^(ab)] means the exact opposite of [(ab)]. It matches any single character that is not one of the specified characters, so grep prints every line that contains at least one such character.
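One subtlety worth trying out: a negated class matches a single character, so grep prints a line as soon as one character falls outside the set. If what you want is lines containing none of those characters, inverting the match with -v on the positive class is one option. The input lines below are made up for the illustration.

```shell
# Hypothetical input lines for the comparison.
lines=$(printf '%s\n' 'abcde' 'ab' '(ab)' 'xyz')

# Lines with at least one character other than ( a b ).
# 'abcde' qualifies because of c, d and e; 'ab' and '(ab)' do not.
negated=$(printf '%s\n' "$lines" | grep '[^(ab)]')

# Lines that contain none of ( a b ) at all: invert the positive class.
none_of=$(printf '%s\n' "$lines" | grep -v '[(ab)]')

printf 'negated class:\n%s\n\ninverted match:\n%s\n' "$negated" "$none_of"
```

Here the negated class prints abcde and xyz, while the inverted match prints only xyz.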


Is it the same as [ab] and [^ab]?

No. Those two do not include ( and ) in the set, so they are slightly different.



I'll give it a try:

grep '[[ab]]'  - matches a line that has one of "[,a,b" followed by a "]" char
grep '[ab[]]'  - matches a line that has one of "a,b,[" followed by a "]" char
grep '[ab[]'   - matches a line that has one of "a,b,["
grep '[ab]]'   - matches a line that has one of "a,b" followed by a "]" char
grep '[(ab)]'  - matches a line that has one of "(,a,b,)"
grep '[^(ab)]' - matches a line that has at least one char other than "(,a,b,)"
grep '[ab]'    - matches a line that has one of "a,b"
grep '[^ab]'   - matches a line that has at least one char other than "a,b"

      

You can walk through these grep commands with this example:



# create a file with the lines below:
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde

      

Run each command against this file, compare the results with my explanations above, and you will see the difference.
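If you want to run the whole walk-through in one go, something like the following should do. It is only a sketch: the temporary file from mktemp stands in for the example file, and the counts in the comments are what I would expect from tracing the patterns by hand.

```shell
# Write the example lines to a temporary file (the path is chosen by mktemp).
sample=$(mktemp)
cat > "$sample" <<'EOF'
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde
EOF

# Print how many lines each pattern matches.
for pattern in '[[ab]]' '[ab[]]' '[ab[]' '[ab]]' '[(ab)]' '[^(ab)]' '[ab]' '[^ab]'; do
    printf '%-10s %s\n' "$pattern" "$(grep -c -- "$pattern" "$sample")"
done

# The lines that tell the first and fourth patterns apart.
with_bracket=$(grep '[[ab]]' "$sample")     # expect: abc[]foo and [ab]cdef
ab_then_bracket=$(grep '[ab]]' "$sample")   # expect: [ab]cdef only
all_count=$(grep -c '[ab[]' "$sample")      # every line has an a, b or [

rm -f "$sample"
```

The interesting line is abc[]foo: it matches '[[ab]]' because the [ is in the class and is followed by ], but it does not match '[ab]]' because no a or b sits directly before the ].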
