Confusion about nested square brackets in grep
sample.txt contains:
abcde
abde
Can someone explain the output of the following commands?
grep '[[ab]]' sample.txt
- no output -
grep '[ab[]]' sample.txt
- no output -
grep '[ab[]' sample.txt
- output: abcde, abde -
grep '[ab]]' sample.txt
- no output
And what do [(ab)] and [^(ab)] mean? Are they the same as [ab] and [^ab]?
First of all, understand that within a character class, most regex metacharacters lose their special meaning and match literally (the exceptions are ^ at the very start, - between two characters, and the closing ]). For example, * matches a literal * and does not mean repetition (0 or more times). Likewise, ( and ) match a literal ( and ) and do not create a capture group.
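A quick way to check this yourself is the sketch below. The three test lines are made up for illustration and are not from the question.

```shell
# Hypothetical test data: one line with a literal *, one with parentheses, one with neither.
tmp=$(mktemp)
printf '%s\n' 'a*b' 'f(x)' 'plain' > "$tmp"

# Inside brackets, * is a literal asterisk, not a repetition operator.
star_hits=$(grep '[*]' "$tmp")

# Inside brackets, ( and ) are literal parentheses, not a group.
paren_hits=$(grep '[()]' "$tmp")

echo "$star_hits"    # a*b
echo "$paren_hits"   # f(x)
rm -f "$tmp"
```

Only the line that actually contains the literal character is printed in each case.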
Now, if a ] is found inside a character class, it closes the character class, and any characters after it are not part of that class. (The one exception is a ] placed first in the class, as in []ab], which is taken literally.) Now let's look at what's going on above:
In patterns 1, 2 and 4, your character class ends at the first closing ]. So the last closing bracket, ], is not part of the character class. It has to be matched separately. So your patterns are equivalent to something like this:
'[[ab]]' is same as '([|a|b)(])'   // The last `]` has to match.
'[ab[]]' is same as '(a|b|[)(])'   // Again, the last `]` has to match.
'[ab]]'  is same as '(a|b)(])'     // Same, the last `]` has to match.
    ^
    ^---- Character class closes here.
Since neither line in your file contains a ], no match is found.
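If you want to see this for yourself, here is a small sketch. It recreates sample.txt in a temporary file; the extra line 'ab]cde' is my own addition, used only to show what these patterns do match.

```shell
sample=$(mktemp)
printf '%s\n' 'abcde' 'abde' > "$sample"

# grep -c prints the number of matching lines; it exits with status 1
# when that number is 0, hence the "|| true".
n1=$(grep -c '[[ab]]' "$sample" || true)
n2=$(grep -c '[ab[]]' "$sample" || true)
n4=$(grep -c '[ab]]' "$sample" || true)
echo "$n1 $n2 $n4"     # 0 0 0

# A line with a ] directly after an a or b does match pattern 4.
with_bracket=$(printf '%s\n' 'abcde' 'ab]cde' | grep '[ab]]')
echo "$with_bracket"   # ab]cde
rm -f "$sample"
```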
In the third pattern, however, your character class is closed only by the last ]. So everything before it is part of the character class.
'[ab[]' means: match a string that contains 'a', or 'b', or '['
which is a perfectly valid pattern and matches both lines of your file.
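As a rough check, the sketch below feeds the third pattern the two lines from the question plus two made-up lines ('[xyz' and 'xyz'), to show that a lone [ is enough for a match.

```shell
# '[xyz' and 'xyz' are hypothetical extra lines, not part of sample.txt.
out=$(printf '%s\n' 'abcde' 'abde' '[xyz' 'xyz' | grep '[ab[]')
echo "$out"
# abcde
# abde
# [xyz
```

The line 'xyz' is the only one dropped, since it contains none of a, b or [.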
And what do [(ab)] and [^(ab)] mean?
[(ab)] matches any one of the characters (, a, b, ). Remember that within a character class, regex metacharacters have no special meaning. Thus, you cannot create groups within a character class.
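[^(ab)] is the exact opposite of [(ab)]: it matches any single character that is not one of (, a, b, ). Note that grep prints every line containing at least one such character, so a line does not have to be free of those characters to match.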
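One thing that is easy to get wrong here: a negated class matches a single character, so grep prints any line that has at least one character outside the set. If you want the lines that contain none of the characters, grep -v with the plain class is one option. A small sketch, with made-up input lines:

```shell
# Hypothetical input lines for illustration.
lines=$(printf '%s\n' 'abcde' 'ab' '(ab)' 'xyz')

# Lines with at least one character that is NOT one of ( a b )
neg=$(printf '%s\n' "$lines" | grep '[^(ab)]')
echo "$neg"
# abcde
# xyz

# Lines that contain NONE of ( a b ): invert the match with -v
inv=$(printf '%s\n' "$lines" | grep -v '[(ab)]')
echo "$inv"
# xyz
```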
Are they the same as [ab] and [^ab]?
No. Those two do not include ( and ). Hence, they are slightly different.
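The difference only shows up on input that contains parentheses. Here is a sketch with two made-up lines; the matches helper is just something I wrote for this illustration.

```shell
# matches PATTERN LINE: prints "yes" if grep finds PATTERN in LINE, else "no".
# (Hypothetical helper, only for this illustration.)
matches() {
  if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

r1=$(matches '[(ab)]' '(x)')     # yes: ( is in the class
r2=$(matches '[ab]' '(x)')       # no: neither a nor b occurs
r3=$(matches '[^ab]' '(ab)')     # yes: ( and ) count as "not a or b"
r4=$(matches '[^(ab)]' '(ab)')   # no: every character is excluded
echo "$r1 $r2 $r3 $r4"           # yes no yes no
```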
I'll give it a try:
grep '[[ab]]' - matches a line which has one of "[,a,b" followed by a "]" char
grep '[ab[]]' - matches a line which has one of "a,b,[" followed by a "]" char
grep '[ab[]' - matches a line which has one of "a,b,["
grep '[ab]]' - matches a line which has one of "a,b" followed by a "]" char
grep '[(ab)]' - matches a line which has one of "(,a,b,)"
grep '[^(ab)]' - matches a line which has at least one char other than "(,a,b,)"
grep '[ab]' - matches a line which has one of "a,b"
grep '[^ab]' - matches a line which has at least one char other than "a,b"
You can walk through these grep commands with this example:
# create a file with the lines below:
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde
You will see the differences; compare the results against my comments / explanations above.
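If it helps, here is one way to run the first four patterns against that file in one go. This is only a sketch: the temporary file and the count helper are my own scaffolding, and I print line counts rather than the lines themselves to keep it short.

```shell
f=$(mktemp)
printf '%s\n' 'abcde' 'abde' '[abcd' 'abcd]' 'abc[]foo' 'abc]bar' '[ab]cdef' 'a(b)cde' > "$f"

# count PATTERN: number of lines in the file that match PATTERN.
# (Hypothetical helper; "|| true" because grep -c exits 1 on zero matches.)
count() { grep -c -- "$1" "$f" || true; }

c1=$(count '[[ab]]')   # 2: abc[]foo and [ab]cdef
c2=$(count '[ab[]]')   # 2: same set of characters, same lines
c3=$(count '[ab[]')    # 8: every line contains an a
c4=$(count '[ab]]')    # 1: only [ab]cdef has a or b right before ]
echo "$c1 $c2 $c3 $c4" # 2 2 8 1

only_line=$(grep '[ab]]' "$f")
echo "$only_line"      # [ab]cdef
rm -f "$f"
```

Note that 'abcd]' and 'abc]bar' do not match patterns 1, 2 or 4, because the character right before the ] is d or c, which is not in the class.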