Extract phone numbers in R

Given phone numbers like these:

ll <- readLines(textConnection("(412) 573-7777 opt 1
563.785.1655 x1797
(567) 523-1534 x7753
(567) 483-2119 x 477
(451) 897-MALL
(342) 668-6255 ext 7
(317) 737-3377 Opt 4
(239) 572-8878 x 3
233.785.1655 x1776
(138) 761-6877 x 4
(411) 446-6626 x 14
(412) 337-3332x19
412.393.3177 x24
327.961.1757 ext.4"))

      

What regex should I write to get:

xxx-xxx-xxxx

      

I tried the following:

gsub('[(]([0-9]{3})[)] ([0-9]{3})[-]([0-9]{4}).*','\\1-\\2-\\3',ll)

      

It does not cover all the cases. I could handle them with several regex patterns, but I think a single regex should be enough.
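To see exactly which inputs the attempt above misses, each line can be tested with grepl. This is a minimal sketch, with the sample input written as a plain character vector; the names attempt and missed are only illustrative.

```r
# Sample input from the question, written as a character vector.
ll <- c("(412) 573-7777 opt 1", "563.785.1655 x1797", "(567) 523-1534 x7753",
        "(567) 483-2119 x 477", "(451) 897-MALL", "(342) 668-6255 ext 7",
        "(317) 737-3377 Opt 4", "(239) 572-8878 x 3", "233.785.1655 x1776",
        "(138) 761-6877 x 4", "(411) 446-6626 x 14", "(412) 337-3332x19",
        "412.393.3177 x24", "327.961.1757 ext.4")

# The pattern tried above: it requires the exact shape "(ddd) ddd-dddd".
attempt <- '[(]([0-9]{3})[)] ([0-9]{3})[-]([0-9]{4}).*'

# Lines the pattern does not match; gsub returns these unchanged.
missed <- ll[!grepl(attempt, ll)]
print(missed)
```

The misses fall into two groups: numbers separated by periods with no parentheses, and the one number whose last block is spelled with letters.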

1 answer


If you also want to extract numbers that are partly written with letters (such as 897-MALL), you can use the following regex in gsub:

gsub('[(]?([0-9]{3})[)]?[. -]([A-Z0-9]{3})[. -]([A-Z0-9]{4}).*','\\1-\\2-\\3',ll)

      

See IDEONE demo



To match digits only, remove A-Z from the character classes.
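As a rough illustration of the digits-only variant, the sketch below drops A-Z from both character classes and applies the result to a few of the sample lines. The variable names digits_only and res are assumptions made for this example.

```r
# A shortened sample of the input from the question.
ll <- c("(412) 573-7777 opt 1", "563.785.1655 x1797",
        "(451) 897-MALL", "(412) 337-3332x19", "327.961.1757 ext.4")

# Same pattern as above, with A-Z removed from the last two character classes.
digits_only <- '[(]?([0-9]{3})[)]?[. -]([0-9]{3})[. -]([0-9]{4}).*'

# Lines that do not match are returned unchanged by gsub.
res <- gsub(digits_only, '\\1-\\2-\\3', ll)
print(res)
```

With this variant "(451) 897-MALL" no longer matches, so it comes back untouched rather than reformatted.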

REGEX

  • [(]? - an optional (
  • ([0-9]{3}) - 3 digits (group 1)
  • [)]? - an optional )
  • [. -] - a period, a space, or a hyphen
  • ([A-Z0-9]{3}) - 3 digits or uppercase letters (group 2)
  • [. -] - a period, a space, or a hyphen
  • ([A-Z0-9]{4}) - 4 digits or uppercase letters (group 3)
  • .* - any remaining characters up to the end of the line, which drops the extension
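Putting the pieces of the pattern together, the following sketch applies it to the full sample and marks any line that fails to match as NA, since gsub alone would pass such lines through unchanged. The last step, which maps letters to digits with chartr, is an addition that assumes the standard telephone keypad layout; the names phones and digits are illustrative.

```r
# Sample input from the question, written as a character vector.
ll <- c("(412) 573-7777 opt 1", "563.785.1655 x1797", "(567) 523-1534 x7753",
        "(567) 483-2119 x 477", "(451) 897-MALL", "(342) 668-6255 ext 7",
        "(317) 737-3377 Opt 4", "(239) 572-8878 x 3", "233.785.1655 x1776",
        "(138) 761-6877 x 4", "(411) 446-6626 x 14", "(412) 337-3332x19",
        "412.393.3177 x24", "327.961.1757 ext.4")

pattern <- '[(]?([0-9]{3})[)]?[. -]([A-Z0-9]{3})[. -]([A-Z0-9]{4}).*'

# Rewrite matching lines as xxx-xxx-xxxx; flag non-matching lines as NA.
phones <- ifelse(grepl(pattern, ll),
                 gsub(pattern, '\\1-\\2-\\3', ll),
                 NA_character_)

# Assumption: letters follow the standard keypad (ABC = 2, ..., WXYZ = 9).
digits <- chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                 "22233344455566677778889999", phones)
print(digits)
```

Every line of the sample matches, so phones contains no NA values, and "451-897-MALL" becomes "451-897-6255" after the keypad mapping.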