Regular expression for IPv6 addresses

I have the following regex definitions for IPv6 addresses:

IPV4ADDRESS      [ \t]*(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))[ \t]*
x4               ([[:xdigit:]]{1,4})
xseq             ({x4}(:{x4}){0,7})
xpart            ({xseq}|({xseq}::({xseq}?))|::{xseq})
IPV6ADDRESS      [ \t]*({xpart}(":"{IPV4ADDRESS})?)[ \t]*

      

It matches all IPv6 address formats correctly, including

1) non-compressed IPv6 addresses
2) compressed IPv6 addresses
3) IPv6 addresses in the legacy format (with an embedded IPv4 address)

      

Typical examples of legacy IPv6 addresses would be

2001:1234::3210:5.6.7.8

or

2001:1234:1234:5432:4578:5678:5.6.7.8

As you can see above, the uncompressed form has 10 groups separated by either ":" or ".", unlike the 8 groups of a regular IPv6 address. This is because the last 4 groups, separated by ".", are 8 bits each and together make up the least significant 32 bits of the IPv6 address. So 10 groups are needed to reach 128 bits.

However, consider the following address:

   2001:1234:4563:3210:5.6.7.8

      

Here, each group separated by ":" represents 16 bits, and each of the last four groups, separated by ".", represents 8 bits. The total is 64 + 32 = 96 bits, so 32 bits are missing.
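The arithmetic can be checked with a small Python sketch. The helper name `explicit_bits` is made up for illustration, and it only handles addresses written without "::" (a compressed address hides some of its groups, so counting them this way would not be meaningful).

```python
def explicit_bits(address: str) -> int:
    """Count the bits spelled out in an address written WITHOUT '::'."""
    bits = 0
    for group in address.split(":"):
        if "." in group:
            # Dotted-quad tail: each octet contributes 8 bits.
            bits += 8 * len(group.split("."))
        else:
            # Hexadecimal group: 16 bits.
            bits += 16
    return bits


print(explicit_bits("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # 6*16 + 4*8 = 128
print(explicit_bits("2001:1234:4563:3210:5.6.7.8"))            # 4*16 + 4*8 = 96
```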

My regex accepts this as a valid IPv6 address. I can't figure out how to fix the regex to reject values like this. Any help is greatly appreciated.
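To reproduce the behaviour outside the scanner, here is a rough Python translation of the definitions from the top of the question. It is an approximation, not the original code: it assumes `[[:digit:]]` maps to `[0-9]`, `[[:xdigit:]]` to `[0-9A-Fa-f]`, and the quoted `"."` to an escaped dot. The function name `matches_original` is hypothetical.

```python
import re

# Assumed Python equivalents of the lex-style definitions in the question.
IPV4ADDRESS = r"[ \t]*(?:(?:[0-9]{1,3}\.){3}[0-9]{1,3})[ \t]*"
X4 = r"[0-9A-Fa-f]{1,4}"
XSEQ = "(?:" + X4 + "(?::" + X4 + "){0,7})"
XPART = "(?:" + XSEQ + "|" + XSEQ + "::(?:" + XSEQ + ")?|::" + XSEQ + ")"
IPV6ADDRESS = re.compile(
    r"[ \t]*(?:" + XPART + "(?::" + IPV4ADDRESS + r")?)[ \t]*"
)


def matches_original(text: str) -> bool:
    """True if the whole string matches the translated pattern."""
    return IPV6ADDRESS.fullmatch(text) is not None


for candidate in (
    "2001:1234::3210:5.6.7.8",                # compressed, legacy tail
    "2001:1234:1234:5432:4578:5678:5.6.7.8",  # full 128 bits
    "2001:1234:4563:3210:5.6.7.8",            # only 96 bits, should be rejected
):
    print(candidate, matches_original(candidate))
```

All three candidates match, because `xseq` allows anywhere from 1 to 8 hex groups and nothing ties that count to the presence of the IPv4 tail.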



1 answer


Here's the grammar of IPv6 addresses as given in RFC 3986 and subsequently confirmed in RFC 5954:

 IPv6address   =                             6( h16 ":" ) ls32
                /                       "::" 5( h16 ":" ) ls32
                / [               h16 ] "::" 4( h16 ":" ) ls32
                / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                / [ *4( h16 ":" ) h16 ] "::"              ls32
                / [ *5( h16 ":" ) h16 ] "::"              h16
                / [ *6( h16 ":" ) h16 ] "::"

 h16           = 1*4HEXDIG
 ls32          = ( h16 ":" h16 ) / IPv4address
 IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
 dec-octet     = DIGIT                 ; 0-9
                / %x31-39 DIGIT         ; 10-99
                / "1" 2DIGIT            ; 100-199
                / "2" %x30-34 DIGIT     ; 200-249
                / "25" %x30-35          ; 250-255

      

Translating this grammar rule by rule gives a standards-conforming regex for IPv6 addresses:



dec_octet      ([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])
ipv4address    ({dec_octet}"."){3}{dec_octet}
h16            ([[:xdigit:]]{1,4})
ls32           ({h16}:{h16}|{ipv4address})
ipv6address    (({h16}:){6}{ls32}|::({h16}:){5}{ls32}|({h16})?::({h16}:){4}{ls32}|(({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|(({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|(({h16}:){0,3}{h16})?::{h16}:{ls32}|(({h16}:){0,4}{h16})?::{ls32}|(({h16}:){0,5}{h16})?::{h16}|(({h16}:){0,6}{h16})?::)

      

Disclaimer: unverified.
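As a partial check, here is a Python sketch of the same pattern, built alternative by alternative from the grammar. It is a translation under assumptions, not the flex code itself: `[[:xdigit:]]` is written as `[0-9A-Fa-f]`, the quoted `"."` as an escaped dot, and the names `is_ipv6` and `prefix` are invented for illustration. A handful of passing cases does not amount to full verification.

```python
import re

DEC_OCTET = r"(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
IPV4ADDRESS = "(?:" + DEC_OCTET + r"\.){3}" + DEC_OCTET
H16 = r"[0-9A-Fa-f]{1,4}"
LS32 = "(?:" + H16 + ":" + H16 + "|" + IPV4ADDRESS + ")"


def prefix(n: int) -> str:
    """Optional [ *n( h16 ":" ) h16 ] part that precedes '::'."""
    return "(?:(?:" + H16 + ":){0," + str(n) + "}" + H16 + ")?"


# One entry per alternative of the IPv6address rule, in grammar order.
ALTERNATIVES = [
    "(?:" + H16 + ":){6}" + LS32,
    "::(?:" + H16 + ":){5}" + LS32,
    prefix(0) + "::(?:" + H16 + ":){4}" + LS32,
    prefix(1) + "::(?:" + H16 + ":){3}" + LS32,
    prefix(2) + "::(?:" + H16 + ":){2}" + LS32,
    prefix(3) + "::" + H16 + ":" + LS32,
    prefix(4) + "::" + LS32,
    prefix(5) + "::" + H16,
    prefix(6) + "::",
]
IPV6ADDRESS = re.compile("(?:" + "|".join(ALTERNATIVES) + ")")


def is_ipv6(text: str) -> bool:
    """True if the whole string matches the grammar-derived pattern."""
    return IPV6ADDRESS.fullmatch(text) is not None


print(is_ipv6("2001:1234::3210:5.6.7.8"))                # True
print(is_ipv6("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # True
print(is_ipv6("2001:1234:4563:3210:5.6.7.8"))            # False: only 96 bits
```

Without "::", only the first alternative can apply, and it demands exactly six hex groups before the 32-bit tail, which is why the 96-bit address from the question no longer matches.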
