Split an irregular character string in R on a space
I have read many posts about splitting strings in R. However, I ran into an error that I think is related to how the variable was read into R: in some cases there is trailing whitespace after the date, because the identifier is shorter. I am trying to split the character variable "VESSELID" into two new variables: "vesselID" and "DATE". Below is a subset of my dataset.
> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L,
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L,
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L), VESSELID = c("6830 2002/08/13 ",
"6830 2002/08/12 ", "6830 2002/08/15 ", "105372 2002/08/23",
"105372 2002/08/23", "104234 2002/07/20", "104234 2002/07/20",
"104234 2002/07/20", "104234 2002/07/21", "104234 2002/07/21",
"104234 2002/07/21", "104234 2002/07/22", "104234 2002/07/22",
"5744 2002/08/14 ", "5744 2002/08/14 ", "5744 2002/08/14 ",
"5744 2002/08/14 ", "5744 2002/08/13 ", "5744 2002/08/13 ",
"5744 2002/08/13 ")), .Names = c("SETID", "VESSELID"), row.names = c(1L,
2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L), class = "data.frame")
I tried the following:
library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split= " ",names=c("vesselID","DATE")))
However, I am getting this error message:
Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) :
unused argument(s) (split = " ")
The split argument doesn't seem to be working as expected. I don't know how to fix my character string.
The argument name is not split, it is pattern:
test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))
gives:
SETID VESSELID vesselID DATE
1 24153 6830 2002/08/13 6830 2002/08/13
2 24187 6830 2002/08/12 6830 2002/08/12
3 24215 6830 2002/08/15 6830 2002/08/15
10 31990 105372 2002/08/23 105372 2002/08/23
11 31990 105372 2002/08/23 105372 2002/08/23
12 31995 104234 2002/07/20 104234 2002/07/20
13 31995 104234 2002/07/20 104234 2002/07/20
14 31995 104234 2002/07/20 104234 2002/07/20
15 31996 104234 2002/07/21 104234 2002/07/21
16 31996 104234 2002/07/21 104234 2002/07/21
17 31996 104234 2002/07/21 104234 2002/07/21
18 31997 104234 2002/07/22 104234 2002/07/22
19 31997 104234 2002/07/22 104234 2002/07/22
20 32002 5744 2002/08/14 5744 2002/08/14
21 32002 5744 2002/08/14 5744 2002/08/14
22 32002 5744 2002/08/14 5744 2002/08/14
23 32002 5744 2002/08/14 5744 2002/08/14
24 32003 5744 2002/08/13 5744 2002/08/13
25 32003 5744 2002/08/13 5744 2002/08/13
26 32003 5744 2002/08/13 5744 2002/08/13
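Since some VESSELID values end in blanks, it can be worth checking for and trimming that padding before splitting. A minimal base R sketch, using a small made-up vector x rather than the full data (trimws() needs R 3.2.0 or later):

```r
# Hypothetical sample mirroring the padded and unpadded values in the question
x <- c("6830 2002/08/13 ", "105372 2002/08/23", "5744 2002/08/14 ")

# Flag values that end in whitespace
has_trailing <- grepl("\\s+$", x)

# Remove leading and trailing whitespace before splitting
x_clean <- trimws(x)

# Split at the first space: the part before it is the ID, the rest is the date
vessel_id <- sub(" .*$", "", x_clean)
date_chr  <- sub("^[^ ]* ", "", x_clean)
```

The trimmed vector x_clean could equally be passed to colsplit() in place of the raw column.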
I would just use read.table on this column, as below. Assuming your dataset is called "mydata":
mydata.new <- cbind(mydata[-2],
read.table(text = as.character(mydata$VESSELID),
strip.white=TRUE, header = FALSE))
names(mydata.new)[2:3] <- c("VesselID", "Date")
mydata.new
# SETID VesselID Date
# 1 24153 6830 2002/08/13
# 2 24187 6830 2002/08/12
# 3 24215 6830 2002/08/15
# 10 31990 105372 2002/08/23
# 11 31990 105372 2002/08/23
# 12 31995 104234 2002/07/20
# 13 31995 104234 2002/07/20
# 14 31995 104234 2002/07/20
# 15 31996 104234 2002/07/21
# 16 31996 104234 2002/07/21
# 17 31996 104234 2002/07/21
# 18 31997 104234 2002/07/22
# 19 31997 104234 2002/07/22
# 20 32002 5744 2002/08/14
# 21 32002 5744 2002/08/14
# 22 32002 5744 2002/08/14
# 23 32002 5744 2002/08/14
# 24 32003 5744 2002/08/13
# 25 32003 5744 2002/08/13
# 26 32003 5744 2002/08/13
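By default read.table() treats any run of whitespace as a separator (sep = ""), which is why the trailing blanks cause no trouble here. The Date column still arrives as text (or as a factor in R versions before 4.0.0). A minimal sketch of converting it with as.Date(), using a small made-up data frame in place of mydata:

```r
# Hypothetical three-row stand-in for the data in the question
mydata <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13 ", "105372 2002/08/23", "5744 2002/08/14 "),
  stringsAsFactors = FALSE
)

# Parse the combined column as whitespace-separated fields
parts <- read.table(text = mydata$VESSELID, header = FALSE,
                    col.names = c("VesselID", "Date"),
                    stringsAsFactors = FALSE)

mydata.new <- cbind(mydata["SETID"], parts)

# Convert the text dates to Date class; the format string matches "2002/08/13"
mydata.new$Date <- as.Date(mydata.new$Date, format = "%Y/%m/%d")

str(mydata.new)
```

With a real Date column, sorting and date arithmetic work directly instead of relying on string comparison.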
Try:
do.call("rbind", strsplit(df$VESSELID, " "))
This should return something like:
[,1] [,2] [,3]
[1,] "6830" "2002/08/13" ""
[2,] "6830" "2002/08/12" ""
[3,] "6830" "2002/08/15" ""
[4,] "105372" "2002/08/23" "105372"
[5,] "105372" "2002/08/23" "105372"
[6,] "104234" "2002/07/20" "104234"
[7,] "104234" "2002/07/20" "104234"
[8,] "104234" "2002/07/20" "104234"
[9,] "104234" "2002/07/21" "104234"
[10,] "104234" "2002/07/21" "104234"
[11,] "104234" "2002/07/21" "104234"
[12,] "104234" "2002/07/22" "104234"
[13,] "104234" "2002/07/22" "104234"
[14,] "5744" "2002/08/14" ""
[15,] "5744" "2002/08/14" ""
[16,] "5744" "2002/08/14" ""
[17,] "5744" "2002/08/14" ""
[18,] "5744" "2002/08/13" ""
[19,] "5744" "2002/08/13" ""
[20,] "5744" "2002/08/13" ""
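Take the first two columns from there. The third column is an artifact: rows with trailing blanks produce an empty string, and rbind recycles the shorter two-element rows to fill the gap (with a warning).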
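A variant that yields exactly two columns can be sketched by trimming first and splitting on runs of spaces. This assumes every value is an ID, a space, and a date, possibly padded with blanks; the vector x is a made-up stand-in for df$VESSELID:

```r
# Hypothetical sample; x stands in for df$VESSELID
x <- c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  ")

# Trim the padding, then split on one or more spaces
pieces <- strsplit(trimws(x), " +")

# Stack the two-element vectors into a character matrix
m <- do.call(rbind, pieces)

# Build typed columns from the matrix
result <- data.frame(vesselID = as.integer(m[, 1]),
                     DATE = as.Date(m[, 2], format = "%Y/%m/%d"))
```

Because every element of pieces has the same length, rbind has nothing to recycle and no warning is raised.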