Extract house number from (address) string using R

I want to split addresses into HouseNumber and Streetname, and then write the extracted values to new columns (shops$HouseNumber and shops$Streetname).

So, let's say I have a data frame called "shops":

> shops
      Name                 city        street
 1    Something            Fakecity    New Street 3
 2    SomethingOther       Fakecity    Some-Complicated-Casestreet 1-3
 3    SomethingDifferent   Fakecity    Fake Street 14a

      

Is there a way to split the street column into two vectors, one with street names and one with house numbers (including cases like "1-3" and "14a"), so that the result can be assigned back to the data frame and look like this?

 > shops
      Name                 city        Streetname                    HouseNumber
 1    Something            Fakecity    New Street                    3
 2    SomethingOther       Fakecity    Some-Complicated-Casestreet   1-3
 3    SomethingDifferent   Fakecity    Fake Street                   14a 

      

Example: Easyfakestreet 5 → Easyfakestreet, 5

This is complicated slightly by the fact that some of my street strings contain hyphens (in the street name or the house number) and some house numbers have non-numeric components.

Examples:
New Street 3 → ['New Street', '3']         
Some-Complicated-Casestreet 1-3 → ['Some-Complicated-Casestreet', '1-3']  
Fake Street 14a → ['Fake Street', '14a']

I would be grateful for your help!



4 answers


Here's a possible solution using tidyr:

library(tidyr)
extract(shops, "street", c("Streetname", "HouseNumber"), "(\\D+)(\\d.*)")
#                 Name     city                   Streetname HouseNumber
# 1          Something Fakecity                  New Street            3
# 2     SomethingOther Fakecity Some-Complicated-Casestreet          1-3
# 3 SomethingDifferent Fakecity                 Fake Street          14a
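One detail worth checking: because `\\D` also matches a space, the first capture group in the pattern above appears to keep the blank that precedes the house number, and the right-aligned printout hides it. A minimal base R sketch of the same pattern with `sub()`, assuming the sample `street` values from the question (the names `name_raw`, `name_trimmed`, and `number` are made up for illustration):

```r
# Sample values copied from the question.
street <- c("New Street 3", "Some-Complicated-Casestreet 1-3", "Fake Street 14a")

pat <- "(\\D+)(\\d.*)"

# Group 1: everything before the first digit, including the trailing space.
name_raw <- sub(pat, "\\1", street)

# trimws() strips leading and trailing whitespace.
name_trimmed <- trimws(name_raw)

# Group 2: from the first digit to the end of the string.
number <- sub(pat, "\\2", street)
```

If the trailing space matters, the same `trimws()` call could be applied to the `Streetname` column after `extract()`.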

      



You may try:

shops$Streetname <- gsub("(.+)\\s[^ ]+$","\\1", shops$street)
shops$HouseNumber <- gsub(".+\\s([^ ]+)$","\\1", shops$street)

data

shops$street
#[1] "New Street 3"                    "Some-Complicated-Casestreet 1-3" "Fake Street 14a" 

results

shops$Streetname
#[1] "New Street"                  "Some-Complicated-Casestreet" "Fake Street" 

shops$HouseNumber
#[1] "3"   "1-3" "14a"

      



Create a pattern with capture groups (backreferences) that match both the street and the number, then use sub to replace the whole match with each backreference in turn. No packages are required:

pat <- "(.*) (\\d.*)"
transform(shops,
   street = sub(pat, "\\1", street), 
   HouseNumber = sub(pat, "\\2", street)
)

      

giving:

                Name     city                      street  HouseNumber
1          Something Fakecity                  New Street            3
2     SomethingOther Fakecity Some-Complicated-Casestreet          1-3
3 SomethingDifferent Fakecity                 Fake Street          14a
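Both pieces can also be pulled out in one pass instead of calling `sub()` twice. The following is a sketch with base R's `regexec()` and `regmatches()`, assuming every street string matches the pattern (a non-matching string yields an empty element, which the `rbind` step would silently drop); the variable names are illustrative only:

```r
# Sample values copied from the question.
street <- c("New Street 3", "Some-Complicated-Casestreet 1-3", "Fake Street 14a")

pat <- "(.*) (\\d.*)"

# One list element per input string: the full match, then each capture group.
m <- regmatches(street, regexec(pat, street))

# Stack into a character matrix: column 1 = full match, 2 = street, 3 = number.
parts <- do.call(rbind, m)

Streetname  <- parts[, 2]
HouseNumber <- parts[, 3]
```

The two vectors could then be assigned to `shops$Streetname` and `shops$HouseNumber`, provided the row count still matches.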

      

Here's a visualization of pat:

(.*) (\d.*)

[Figure: regular expression visualization (Debuggex demo)]

Notes:

1) We used this for shops:

shops <-
structure(list(Name = c("Something", "SomethingOther", "SomethingDifferent"
), city = c("Fakecity", "Fakecity", "Fakecity"), street = c("New Street 3", 
"Some-Complicated-Casestreet 1-3", "Fake Street 14a")), .Names = c("Name", 
"city", "street"), class = "data.frame", row.names = c(NA, -3L))

      

2) David Arenburg's pattern could be used here as an alternative; just set pat to it. The pattern above has the advantage that it allows street names that contain embedded numbers, whereas David's has the advantage that it does not require a space before the house number.
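To make that trade-off concrete, here is a small sketch comparing the two patterns on two made-up addresses that are not part of the original data: one with a number inside the street name, and one with no space before the house number. The names `extra`, `pat_space`, and `pat_digit` are hypothetical.

```r
# Hypothetical edge cases, not from the question's data.
extra <- c("Highway 66 12", "Mainstreet5")

pat_space <- "(.*) (\\d.*)"    # pattern from this answer
pat_digit <- "(\\D+)(\\d.*)"   # pattern from the tidyr answer

# Space-based pattern: splits at the last space that is followed by a digit.
space_name <- sub(pat_space, "\\1", extra)
space_num  <- sub(pat_space, "\\2", extra)

# Digit-based pattern: splits at the first digit; trimws() drops the
# trailing space that \\D+ captures.
digit_name <- trimws(sub(pat_digit, "\\1", extra))
digit_num  <- sub(pat_digit, "\\2", extra)
```

With these inputs the space-based pattern keeps "Highway 66" together but leaves "Mainstreet5" unchanged, because `sub()` returns the input as is when nothing matches. The digit-based pattern splits "Mainstreet5" correctly but cuts "Highway 66 12" at the first digit, giving "Highway" and "66 12".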



You can also use the unglue package:

library(unglue)
unglue_unnest(shops, street, "{street} {value=\\d.*}")
#>                 Name     city                      street value
#> 1          Something Fakecity                  New Street     3
#> 2     SomethingOther Fakecity Some-Complicated-Casestreet   1-3
#> 3 SomethingDifferent Fakecity                 Fake Street   14a

      

Created on 2019-10-08 by the reprex package (v0.3.0)
