Convert chr values to num in multiple columns?
I have this dataframe and I want to change the chr values to num:
> dput(Df)
structure(list(`@MeasurementDateGMT` = c("2016-09-01 00:00:00",
"2016-09-01 01:00:00", "2016-09-01 02:00:00", "2016-09-01 03:00:00",
"2016-09-01 04:00:00", "2016-09-01 05:00:00", "2016-09-01 06:00:00",
"2016-09-01 07:00:00", "2016-09-01 08:00:00", "2016-09-01 09:00:00",
"2016-09-01 10:00:00", "2016-09-01 11:00:00", "2016-09-01 12:00:00",
"2016-09-01 13:00:00", "2016-09-01 14:00:00", "2016-09-01 15:00:00",
"2016-09-01 16:00:00", "2016-09-01 17:00:00", "2016-09-01 18:00:00",
"2016-09-01 19:00:00", "2016-09-01 20:00:00", "2016-09-01 21:00:00",
"2016-09-01 22:00:00", "2016-09-01 23:00:00"), `@Value` = c("10.9",
"9.8", "9.9", "14.1", "13.6", "16.5", "15", "18.5", "18", "17",
"16.6", "12", "12.1", "18.1", "15.9", "15.9", "16.9", "21.6",
"23.5", "40.7", "16.6", "12.7", "12.4", "12.2")), .Names = c("@MeasurementDateGMT",
"@Value"), class = "data.frame", row.names = c(NA, 24L))
Conversion code:
columns <- sapply(Df, is.factor)
Df[, columns] <- lapply(Df[, columns, drop = FALSE], function(x) as.numeric(as.character(x)))
Result:
> str(Df)
'data.frame': 24 obs. of 2 variables:
$ @MeasurementDateGMT: chr "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
$ @Value : chr "10.9" "9.8" "9.9" "14.1" ...
They are still chr. What did I miss? Any ideas?
We can use type.convert:
Df[] <- lapply(Df, function(x) type.convert(x, as.is = TRUE))
str(Df)
#'data.frame': 24 obs. of 2 variables:
#$ @MeasurementDateGMT: chr "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
#$ @Value : num 10.9 9.8 9.9 14.1 13.6 16.5 15 18.5 18 17 ...
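For reference, the question's code left both columns as chr because sapply(Df, is.factor) is FALSE for character columns, so no column was selected and nothing was converted. A minimal base R sketch of the same selection approach using is.character instead; it rebuilds two rows of the question's data, and leaving the datetime column out by name is a choice made for this sketch, not something from the answer.

```r
# Two rows of the question's data; check.names = FALSE keeps the "@" in the names
Df <- data.frame(
  `@MeasurementDateGMT` = c("2016-09-01 00:00:00", "2016-09-01 01:00:00"),
  `@Value` = c("10.9", "9.8"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# The question's selector: neither column is a factor, so nothing is selected
print(sapply(Df, is.factor))  # both FALSE

# Select the character columns instead, but keep the datetime column out by name
columns <- sapply(Df, is.character)
columns["@MeasurementDateGMT"] <- FALSE

# Same assignment pattern as the question; as.character() is unnecessary for chr input
Df[, columns] <- lapply(Df[, columns, drop = FALSE], as.numeric)
str(Df)
```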
If we also need to convert the datetime column (the first column),
Df[[1]] <- as.POSIXct(Df[[1]], tz = "GMT")  # the column name says the times are GMT
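A slightly more explicit sketch of the datetime step, with a format string and a time zone, applied to two rows of the question's data. Both additions are assumptions made here: tz = "GMT" is inferred from the column name @MeasurementDateGMT, and the format string simply matches the timestamps shown in the dput output.

```r
# Two rows of the question's data; check.names = FALSE keeps the "@" in the names
Df <- data.frame(
  `@MeasurementDateGMT` = c("2016-09-01 00:00:00", "2016-09-01 01:00:00"),
  `@Value` = c("10.9", "9.8"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Parse the timestamps with an explicit format; tz = "GMT" is an assumption from the name
Df[["@MeasurementDateGMT"]] <- as.POSIXct(Df[["@MeasurementDateGMT"]],
                                          format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

# Convert the value column on its own
Df[["@Value"]] <- type.convert(Df[["@Value"]], as.is = TRUE)
str(Df)
```

Passing format explicitly makes a malformed timestamp come back as NA instead of being guessed at by the default tryFormats.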
Since all the columns in the OP's post are character, we don't need to convert them to character before applying type.convert; otherwise (e.g. for factor columns) use type.convert(as.character(x), ...).
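To see why the as.character() step matters for factor columns, here is a small base R illustration built from two of the question's value strings: as.numeric() on a factor returns the internal level codes rather than the values.

```r
# A factor built from two of the question's value strings
f <- factor(c("10.9", "9.8", "10.9"))

# Levels are sorted as strings, so "10.9" comes before "9.8"
print(levels(f))

# as.numeric() on a factor returns the level codes, not the values
print(as.numeric(f))                # 1 2 1

# Going through as.character() first recovers the actual numbers
print(as.numeric(as.character(f)))  # 10.9 9.8 10.9

# type.convert() on the character version gives the same numbers
print(type.convert(as.character(f), as.is = TRUE))
```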
If we need to do this with dplyr:
library(dplyr)
res <- Df %>%
  # funs() is deprecated since dplyr 0.8.0; a ~ lambda does the same job
  mutate_all(~ type.convert(as.character(.), as.is = TRUE))
str(res)
#'data.frame': 24 obs. of 2 variables:
#$ @MeasurementDateGMT: chr "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
#$ @Value : num 10.9 9.8 9.9 14.1 13.6 16.5 15 18.5 18 17 ...
Another option is data.table:
library(data.table)
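# := updates Df by reference; without it the converted table is only returned, not stored
setDT(Df)[, (names(Df)) := lapply(.SD, function(x) type.convert(x, as.is = TRUE))]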
You can use dplyr::mutate_if, which applies a function (in this case as.numeric) to all columns that satisfy a predicate function (in this case is.character).
library(dplyr)
Df %>%
  janitor::clean_names() %>%  # removes the "@" from names since that messes up mutate_if
  tibble::as_tibble() %>%     # just for the nice printing
  mutate_if(is.character, as.numeric)
#> Warning in eval(substitute(expr), envir, enclos): NAs introduced by
#> coercion
#> # A tibble: 24 x 2
#>    x_measurementdategmt x_value
#>                   <dbl>   <dbl>
#>  1                   NA    10.9
#>  2                   NA     9.8
#>  3                   NA     9.9
#>  4                   NA    14.1
#>  5                   NA    13.6
#>  6                   NA    16.5
#>  7                   NA    15.0
#>  8                   NA    18.5
#>  9                   NA    18.0
#> 10                   NA    17.0
#> # ... with 14 more rows
But this doesn't work for the first column, because it holds datetimes: as.numeric sets every value to NA since the strings contain non-numeric characters. You should probably convert it to a datetime variable instead.
Df %>%
  janitor::clean_names() %>%
  tibble::as_tibble() %>%
  mutate(x_measurementdategmt = lubridate::as_datetime(x_measurementdategmt)) %>%
  mutate_if(is.character, as.numeric)
#> # A tibble: 24 x 2
#>    x_measurementdategmt x_value
#>                  <dttm>   <dbl>
#>  1  2016-09-01 04:00:00    10.9
#>  2  2016-09-01 05:00:00     9.8
#>  3  2016-09-01 06:00:00     9.9
#>  4  2016-09-01 07:00:00    14.1
#>  5  2016-09-01 08:00:00    13.6
#>  6  2016-09-01 09:00:00    16.5
#>  7  2016-09-01 10:00:00    15.0
#>  8  2016-09-01 11:00:00    18.5
#>  9  2016-09-01 12:00:00    18.0
#> 10  2016-09-01 13:00:00    17.0
#> # ... with 14 more rows
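Without dplyr, janitor or lubridate, the predicate idea behind mutate_if can be sketched in base R with a stricter test that only selects character columns whose values all parse as numbers, so the datetime column is never turned into NA. parses_as_number is a hypothetical helper written for this example, not part of any package, and the three rows are taken from the question's data.

```r
# Hypothetical helper: TRUE for character columns whose non-missing values all parse as numbers
parses_as_number <- function(x) {
  if (!is.character(x)) return(FALSE)
  vals <- x[!is.na(x)]
  # suppressWarnings() hides the "NAs introduced by coercion" warning for failed parses
  !anyNA(suppressWarnings(as.numeric(vals)))
}

# Rows 1, 2 and 20 of the question's data; check.names = FALSE keeps the "@" names
Df <- data.frame(
  `@MeasurementDateGMT` = c("2016-09-01 00:00:00", "2016-09-01 01:00:00",
                            "2016-09-01 19:00:00"),
  `@Value` = c("10.9", "9.8", "40.7"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Evaluate the predicate per column, then convert only the selected columns
keep <- vapply(Df, parses_as_number, logical(1))
print(keep)
Df[keep] <- lapply(Df[keep], as.numeric)

# The timestamp column is left as chr; parse it separately if a datetime is needed
str(Df)
```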