Regular expression to extract the beginning of a string or character
I am writing a function to find the value of an attribute from a given string and a given attribute name.
The input string looks like this:
sip:+19999999999@trunkgroup2:5060;user=phone
<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;user=phone;transport=udp>
<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;transport=udp>
<sip:+19999999999;tgrp=SMPPDIN;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;transport=udp>
After a few hours I came up with this regex: /(\Wsip[:,+,=]+)(\w+)/g
but it doesn't work for the first example, since there is no non-word character in front of the attribute name.
How can I fix this expression to match both cases, <sip... and sip..., only when it is at the beginning of a line?
I am using this function to retrieve the values of sip and tgrp.
Replace \W with \b and use
\b(sip[:+=]+)(\w+)
Or, to match at the beginning of the line:
^\W?(sip[:+=]+)(\w+)
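As a quick check, here is a minimal sketch that runs both patterns against the four sample strings from the question. The variable names are only illustrative, and the g flag is left out because only the first match per string is needed here.

```javascript
// The four sample inputs from the question.
const samples = [
  "sip:+19999999999@trunkgroup2:5060;user=phone",
  "<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;user=phone;transport=udp>",
  "<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;transport=udp>",
  "<sip:+19999999999;tgrp=SMPPDIN;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;transport=udp>",
];

// Word-boundary version: can match "sip" anywhere in the string.
const anywhere = /\b(sip[:+=]+)(\w+)/;
// Anchored version: "sip" must be at the start, after at most one non-word char.
const atStart = /^\W?(sip[:+=]+)(\w+)/;

// Collect Group 2 (the value) for each pattern, or null when there is no match.
const results = samples.map((s) => {
  const a = anywhere.exec(s);
  const b = atStart.exec(s);
  return { anywhere: a && a[2], atStart: b && b[2] };
});

console.log(results); // every entry holds "19999999999" for both patterns
```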
Since \W is a consuming pattern that matches any non-word char (a char other than a letter, digit, or _), you will not get a match at the beginning of the string. A \b word boundary matches both at the beginning of the string and wherever the char before s is not a word char.
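The difference is easy to see on the first sample string, which starts directly with sip. A small sketch (the variable names are made up for this illustration):

```javascript
const input = "sip:+19999999999@trunkgroup2:5060;user=phone";

// \W must consume one non-word char, and there is none before the leading "s".
const withNonWord = /\Wsip[:+=]+(\w+)/;
// \b is zero-width, so it also succeeds at the very start of the string.
const withBoundary = /\bsip[:+=]+(\w+)/;

const nonWordMatch = withNonWord.exec(input);
const boundaryMatch = withBoundary.exec(input);

console.log(nonWordMatch);     // null
console.log(boundaryMatch[1]); // "19999999999"
```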
If you literally need to find a match at the beginning of a string after an optional non-word char, replace \W with ^\W?, where ^ matches the beginning of the string and \W? matches 1 or 0 non-word chars.
Also note that a , within a character class is matched as a literal comma. If you used it to separate the enumerated characters, you must remove it.
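To illustrate the comma point, this short sketch compares the character class from the question with the cleaned-up one. The ^ and $ anchors are added here only so that the whole test string has to consist of characters from the class.

```javascript
// Character class as written in the question: the commas are members of the class.
const withCommas = /^[:,+,=]+$/;
// Character class from the answer: only ":", "+" and "=" are allowed.
const withoutCommas = /^[:+=]+$/;

const commaAccepted = withCommas.test(",");    // true: "," is matched literally
const commaRejected = withoutCommas.test(","); // false: "," is not in the class
const separatorsOk = withCommas.test(":+=") && withoutCommas.test(":+=");

console.log(commaAccepted, commaRejected, separatorsOk); // true false true
```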
Pattern details:
- \b - word boundary
OR
- ^ - beginning of the string (or of a line, with the multiline flag)
- \W? - 1 or 0 (due to the ? quantifier) non-word characters (i.e. characters other than letters, digits, and _)
- (sip[:+=]+) - Group 1: the substring sip followed by one or more :, + or = characters
- (\w+) - Group 2: one or more word characters.
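Putting the pieces together, the lookup function from the question could be sketched roughly as follows. getAttributeValue and escapeRegExp are hypothetical names chosen for this example, not something from the original code, and the sketch assumes the \b variant is the one you want (so the attribute may appear anywhere in the string).

```javascript
// Hypothetical helper: escape regex metacharacters in the attribute name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical lookup: returns the value after the attribute name, or null.
function getAttributeValue(input, name) {
  // Same shape as \b(sip[:+=]+)(\w+), with the attribute name substituted in.
  const pattern = new RegExp("\\b(" + escapeRegExp(name) + "[:+=]+)(\\w+)");
  const match = pattern.exec(input);
  return match ? match[2] : null;
}

const line =
  "<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca@10.10.10.100:8000;user=phone;transport=udp>";

console.log(getAttributeValue(line, "sip"));  // "19999999999"
console.log(getAttributeValue(line, "tgrp")); // "0180401"
console.log(getAttributeValue("sip:+19999999999@trunkgroup2:5060;user=phone", "tgrp")); // null
```

Since \w+ stops at the first non-word character, a value containing dots or hyphens would be cut short; the second group would need a wider character class in that case.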