Split a ragged character string on spaces in R

I have read many posts about splitting strings in R. However, I ran into an error that I think is related to how the variable was read into R: in some cases there are trailing spaces after the date, because the identifier is shorter. I am trying to split the character variable "VESSELID" into two new variables, "shipID" and "DATE". Below is a subset of my dataset.

> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L, 
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L, 
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L), VESSELID = c("6830 2002/08/13  ", 
"6830 2002/08/12  ", "6830 2002/08/15  ", "105372 2002/08/23", 
"105372 2002/08/23", "104234 2002/07/20", "104234 2002/07/20", 
"104234 2002/07/20", "104234 2002/07/21", "104234 2002/07/21", 
"104234 2002/07/21", "104234 2002/07/22", "104234 2002/07/22", 
"5744 2002/08/14  ", "5744 2002/08/14  ", "5744 2002/08/14  ", 
"5744 2002/08/14  ", "5744 2002/08/13  ", "5744 2002/08/13  ", 
"5744 2002/08/13  ")), .Names = c("SETID", "VESSELID"), row.names = c(1L, 
2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L), class = "data.frame")

      

I tried the following:

library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split= " ",names=c("vesselID","DATE")))

      

However, I am getting this error message:

Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) : 
      unused argument(s) (split = " ")

      

The split argument doesn't seem to be working as expected. I don't know how to fix my character string.



3 answers


The argument name is not split, it is pattern:

test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))

      



gives:

   SETID          VESSELID vesselID         DATE
1  24153 6830 2002/08/13       6830 2002/08/13  
2  24187 6830 2002/08/12       6830 2002/08/12  
3  24215 6830 2002/08/15       6830 2002/08/15  
10 31990 105372 2002/08/23   105372   2002/08/23
11 31990 105372 2002/08/23   105372   2002/08/23
12 31995 104234 2002/07/20   104234   2002/07/20
13 31995 104234 2002/07/20   104234   2002/07/20
14 31995 104234 2002/07/20   104234   2002/07/20
15 31996 104234 2002/07/21   104234   2002/07/21
16 31996 104234 2002/07/21   104234   2002/07/21
17 31996 104234 2002/07/21   104234   2002/07/21
18 31997 104234 2002/07/22   104234   2002/07/22
19 31997 104234 2002/07/22   104234   2002/07/22
20 32002 5744 2002/08/14       5744 2002/08/14  
21 32002 5744 2002/08/14       5744 2002/08/14  
22 32002 5744 2002/08/14       5744 2002/08/14  
23 32002 5744 2002/08/14       5744 2002/08/14  
24 32003 5744 2002/08/13       5744 2002/08/13  
25 32003 5744 2002/08/13       5744 2002/08/13  
26 32003 5744 2002/08/13       5744 2002/08/13  
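
Note that in the output above, the DATE values for the shorter identifiers still carry their trailing spaces. Here is a minimal base-R sketch of one way to clean that up. It assumes the two-element sample vector x stands in for df$VESSELID; the names x, x_trim and out are made up for illustration, and converting to integer and Date is my own choice rather than something the question asks for.

```r
# Sample values copied from the question; x stands in for df$VESSELID
x <- c("6830 2002/08/13  ", "105372 2002/08/23")

# Strip leading/trailing whitespace first
x_trim <- trimws(x)

# Everything before the first space is the identifier,
# everything after it is the date
out <- data.frame(
  vesselID = as.integer(sub(" .*$", "", x_trim)),
  DATE     = as.Date(sub("^[^ ]* ", "", x_trim), format = "%Y/%m/%d")
)
out
```

Calling trimws() on the DATE column after colsplit() should have the same effect if you prefer to stay with reshape2.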

      



I would just use read.table on this column, as shown below. Assuming your dataset is called "mydata":



mydata.new <- cbind(mydata[-2], 
                    read.table(text = as.character(mydata$VESSELID), 
                               strip.white=TRUE, header = FALSE))
names(mydata.new)[2:3] <- c("VesselID", "Date")
mydata.new
#    SETID VesselID       Date
# 1  24153     6830 2002/08/13
# 2  24187     6830 2002/08/12
# 3  24215     6830 2002/08/15
# 10 31990   105372 2002/08/23
# 11 31990   105372 2002/08/23
# 12 31995   104234 2002/07/20
# 13 31995   104234 2002/07/20
# 14 31995   104234 2002/07/20
# 15 31996   104234 2002/07/21
# 16 31996   104234 2002/07/21
# 17 31996   104234 2002/07/21
# 18 31997   104234 2002/07/22
# 19 31997   104234 2002/07/22
# 20 32002     5744 2002/08/14
# 21 32002     5744 2002/08/14
# 22 32002     5744 2002/08/14
# 23 32002     5744 2002/08/14
# 24 32003     5744 2002/08/13
# 25 32003     5744 2002/08/13
# 26 32003     5744 2002/08/13
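
If you would rather set the column names and types in the same call, read.table also accepts col.names and colClasses. This is a sketch under assumptions: mydata here is a two-row stand-in built from the question's data, and keeping the identifier as character while converting the date is my own choice.

```r
# Two rows from the question, standing in for the full data set
mydata <- data.frame(
  SETID = c(24153L, 31990L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23"),
  stringsAsFactors = FALSE
)

# With the default sep = "", read.table splits on runs of whitespace,
# so the trailing spaces do not produce an extra column
parts <- read.table(text = mydata$VESSELID, header = FALSE,
                    col.names = c("VesselID", "Date"),
                    colClasses = c("character", "character"))

# Convert the date text to a proper Date
parts$Date <- as.Date(parts$Date, format = "%Y/%m/%d")

mydata.new <- cbind(mydata["SETID"], parts)
str(mydata.new)
```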

      



Try:

do.call("rbind", strsplit(df$VESSELID, " "))

This should return something like:

      [,1]     [,2]         [,3]    
 [1,] "6830"   "2002/08/13" ""      
 [2,] "6830"   "2002/08/12" ""      
 [3,] "6830"   "2002/08/15" ""      
 [4,] "105372" "2002/08/23" "105372"
 [5,] "105372" "2002/08/23" "105372"
 [6,] "104234" "2002/07/20" "104234"
 [7,] "104234" "2002/07/20" "104234"
 [8,] "104234" "2002/07/20" "104234"
 [9,] "104234" "2002/07/21" "104234"
[10,] "104234" "2002/07/21" "104234"
[11,] "104234" "2002/07/21" "104234"
[12,] "104234" "2002/07/22" "104234"
[13,] "104234" "2002/07/22" "104234"
[14,] "5744"   "2002/08/14" ""      
[15,] "5744"   "2002/08/14" ""      
[16,] "5744"   "2002/08/14" ""      
[17,] "5744"   "2002/08/14" ""      
[18,] "5744"   "2002/08/13" ""      
[19,] "5744"   "2002/08/13" ""      
[20,] "5744"   "2002/08/13" ""      

      

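Take the first two columns from there. The third column is not real data: it is an empty string where the input has trailing spaces, and a recycled copy of the first value where it does not (rbind fills the shorter rows by recycling, with a warning).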
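
A variant that avoids the spurious third column is to trim each string and split on runs of spaces, so every row yields exactly two pieces. This is a sketch: x is a two-element stand-in for df$VESSELID, and the names pieces and m are made up for illustration.

```r
# Sample values copied from the question; x stands in for df$VESSELID
x <- c("6830 2002/08/13  ", "105372 2002/08/23")

# Trim the ends, then split on one or more spaces
pieces <- strsplit(trimws(x), " +")

# Guard: every row should have exactly two pieces before binding
stopifnot(all(lengths(pieces) == 2))

# Bind into a character matrix with one row per input string
m <- do.call(rbind, pieces)
colnames(m) <- c("shipID", "DATE")
m
```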
