Regular expression for IPv6 addresses
I have the following regex for IPv6 addresses, written as lex/flex definitions:
IPV4ADDRESS [ \t]*(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))[ \t]*
x4 ([[:xdigit:]]{1,4})
xseq ({x4}(:{x4}){0,7})
xpart ({xseq}|({xseq}::({xseq}?))|::{xseq})
IPV6ADDRESS [ \t]*({xpart}(":"{IPV4ADDRESS})?)[ \t]*
It handles all of these IPv6 address formats correctly:
1) non-compressed IPv6 addresses
2) compressed IPv6 addresses
3) IPv6 addresses in the legacy format (with an embedded IPv4 address).
Valid examples of legacy IPv6 addresses would be
2001:1234::3210:5.6.7.8
OR
2001:1234:1234:5432:4578:5678:5.6.7.8
As you can see in the second (uncompressed) example, there are 10 groups separated by either ":" or ".", unlike the 8 groups in a regular IPv6 address. This is because the last 4 groups, separated by ".", are 8 bits each and together form the least significant 32 bits of the IPv6 address. Six 16-bit groups plus four 8-bit groups give 96 + 32 = 128 bits, so 10 groups are needed.
However, if I use the following address format
2001:1234:4563:3210:5.6.7.8
Here, each of the four groups separated by ":" represents 16 bits, and each of the last four groups, separated by ".", represents 8 bits. The total is 64 + 32 = 96 bits, so 32 bits are missing.
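The bit arithmetic above can be checked with a small Python sketch. The helper name `explicit_bits` is hypothetical and only handles uncompressed addresses (no "::"), since a compressed address hides some of its groups.

```python
def explicit_bits(address: str) -> int:
    """Count the bits spelled out in an uncompressed address (no "::")."""
    if "::" in address:
        raise ValueError("compressed addresses hide some groups")
    bits = 0
    for group in address.split(":"):
        if "." in group:
            # Dotted-quad tail: each decimal octet contributes 8 bits.
            bits += 8 * len(group.split("."))
        else:
            # Hexadecimal group: 16 bits.
            bits += 16
    return bits


print(explicit_bits("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # 128
print(explicit_bits("2001:1234:4563:3210:5.6.7.8"))            # 96
```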
Yet the regex accepts it as a valid IPv6 address. I can't figure out how to fix the regex to reject values like this. Any help is greatly appreciated.
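To make the problem reproducible outside a lex scanner, here is a rough Python translation of the definitions above. It is a sketch under assumptions: the surrounding `[ \t]*` whitespace is dropped, `[[:xdigit:]]` is spelled out as `[0-9A-Fa-f]`, and the upper-case names are mine. It suggests why the short address gets through: `xseq` allows anywhere from 1 to 8 groups, and nothing ties that count to whether an IPv4 tail follows.

```python
import re

X4 = r"[0-9A-Fa-f]{1,4}"
XSEQ = rf"(?:{X4}(?::{X4}){{0,7}})"                 # 1 to 8 hex groups
XPART = rf"(?:{XSEQ}|{XSEQ}::(?:{XSEQ})?|::{XSEQ})"
IPV4 = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
LOOSE_IPV6 = re.compile(rf"{XPART}(?::{IPV4})?")

for candidate in (
    "2001:1234::3210:5.6.7.8",
    "2001:1234:1234:5432:4578:5678:5.6.7.8",
    "2001:1234:4563:3210:5.6.7.8",   # only 96 bits, still accepted
):
    print(candidate, LOOSE_IPV6.fullmatch(candidate) is not None)
```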
Here's the grammar of IPv6 addresses as given in RFC 3986 and subsequently confirmed in RFC 5954:
IPv6address =                            6( h16 ":" ) ls32
            /                       "::" 5( h16 ":" ) ls32
            / [               h16 ] "::" 4( h16 ":" ) ls32
            / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
            / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
            / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
            / [ *4( h16 ":" ) h16 ] "::"              ls32
            / [ *5( h16 ":" ) h16 ] "::"              h16
            / [ *6( h16 ":" ) h16 ] "::"
h16         = 1*4HEXDIG
ls32        = ( h16 ":" h16 ) / IPv4address
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet   = DIGIT                 ; 0-9
            / %x31-39 DIGIT         ; 10-99
            / "1" 2DIGIT            ; 100-199
            / "2" %x30-34 DIGIT     ; 200-249
            / "25" %x30-35          ; 250-255
Using this, we can write a regex for IPv6 addresses that follows the standard grammar, one alternative per grammar rule:
dec_octet ([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])
ipv4address ({dec_octet}"."){3}{dec_octet}
h16 ([[:xdigit:]]{1,4})
ls32 ({h16}:{h16}|{ipv4address})
ipv6address (({h16}:){6}{ls32}|::({h16}:){5}{ls32}|({h16})?::({h16}:){4}{ls32}|(({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|(({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|(({h16}:){0,3}{h16})?::{h16}:{ls32}|(({h16}:){0,4}{h16})?::{ls32}|(({h16}:){0,5}{h16})?::{h16}|(({h16}:){0,6}{h16})?::)
Disclaimer: unverified.
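One way to exercise these definitions without building a scanner is to translate them into a Python `re` pattern. This is only a sketch under assumptions: `[[:xdigit:]]` becomes `[0-9A-Fa-f]`, the upper-case names and the `is_ipv6` helper are mine, and `fullmatch` stands in for whatever anchoring the real scanner provides.

```python
import re

DEC_OCTET = r"(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
IPV4 = rf"(?:{DEC_OCTET}\.){{3}}{DEC_OCTET}"
H16 = r"[0-9A-Fa-f]{1,4}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"

# One alternative per line of the RFC 3986 IPv6address rule.
ALTERNATIVES = [
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"(?:{H16})?::(?:{H16}:){{4}}{LS32}",
    rf"(?:(?:{H16}:){{0,1}}{H16})?::(?:{H16}:){{3}}{LS32}",
    rf"(?:(?:{H16}:){{0,2}}{H16})?::(?:{H16}:){{2}}{LS32}",
    rf"(?:(?:{H16}:){{0,3}}{H16})?::{H16}:{LS32}",
    rf"(?:(?:{H16}:){{0,4}}{H16})?::{LS32}",
    rf"(?:(?:{H16}:){{0,5}}{H16})?::{H16}",
    rf"(?:(?:{H16}:){{0,6}}{H16})?::",
]
IPV6_RE = re.compile("|".join(ALTERNATIVES))


def is_ipv6(text: str) -> bool:
    """Return True if the whole string matches the translated grammar."""
    return IPV6_RE.fullmatch(text) is not None


print(is_ipv6("2001:1234::3210:5.6.7.8"))                # True
print(is_ipv6("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # True
print(is_ipv6("2001:1234:4563:3210:5.6.7.8"))            # False
```

With this translation the 96-bit address from the question is rejected, because without "::" the only alternative that applies requires exactly six hex groups before the IPv4 tail.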