Change string in DF with hive command and mutate with sparklyr

Using the Hive command regexp_extract

I am trying to change the following lines:

201703170455 to 2017-03-17:04:55

      

and from:

2017031704555675 to 2017-03-17:04:55.0010

      

I am doing it in sparklyr trying to use this code which works with gsub in R:

  newdf<-df%>%mutate(Time1 = regexp_extract(Time, "(....)(..)(..)(..)(..)", "\\1-\\2-\\3:\\4:\\5"))

      

and this code:

newdf<-df%>mutate(TimeTrans = regexp_extract("(....)(..)(..)(..)(..)(....)", "\\1-\\2-\\3:\\4:\\5.\\6"))

      

but doesn't work at all. Any suggestions on how to do this using regexp_extract?

+1


source to share


1 answer


Apache Spark uses Java regex dialogs, not R, and groups should be referenced by $

. It is also regexp_replace

used to retrieve a single group using a numeric index .

You can use regexp_replace

:

df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df)

sdf %>% 
  mutate(time1 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)$", "$1-$2-$3 $4:$5" )) %>%
  mutate(time2 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)(....)$", "$1-$2-$3 $4:$5.$6"))

      



Source:   query [2 x 3]
Database: spark connection master=local[8] app=sparklyr local=TRUE

# A tibble: 2 x 3
              time            time1                 time2
             <chr>            <chr>                 <chr>
1     201703170455 2017-03-17 04:55          201703170455
2 2017031704555675 2017031704555675 2017-03-17 04:55.5675

      

+5


source







All Articles