Fill in missing column values based on the values in the previous row

I have a huge table that looks something like this:

A  B  C  D  E  F
A  B        &
A  B  C  D     $
A  B  C  @

      

The processed version should look like this:

A  B  C  D  E  F
A  B  B& B& B& B&
A  B  C  D  D$ D$
A  B  C  C@ C@ C@

      

The challenge is to concatenate the value from the last non-blank cell with the value from the previous non-blank cell (in the same row) and use the new value to fill in the blank cells in the same row.

Any suggestions on how to do this in R?



3 answers


Here is one option that loops over the rows of the dataset with apply. For each row, we subset the elements that are not blank ('x1'), paste the last two non-blank elements of 'x1' together ('x2'), and then concatenate all values except the last ( head(x1,-1) ) with 'x2', which is replicated based on the number of columns of 'df1' and the length of 'x1'. The result can be transposed ( t ) and converted to a data.frame.

 m1 <- t(apply(df1, 1, function(x) {
          x1 <- x[x!=''] #elements that are not-blank
          x2 <- paste(tail(x1,2), collapse='') #paste  the last two non-blank
          if(any(x=='')) #if there is any blank value
          c(head(x1,-1), rep(x2, ncol(df1)-length(x1)+1)) #concatenate
          else x #else return the row
           }))

 as.data.frame(m1, stringsAsFactors=FALSE)
 #  V1 V2 V3 V4 V5 V6
 #1  A  B  C  D  E F
 #2  A  B B& B& B& B&
 #3  A  B  C  D D$ D$
 #4  A  B  C C@ C@ C@
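To make the per-row step easier to follow, here is a minimal sketch of the same logic applied to a single character vector. The name fill_row is hypothetical (it does not appear in the answer above), and the sketch assumes every row has at least one non-blank value.

```r
# Hypothetical helper (not part of the answer above): the same per-row logic,
# written for one character vector so it can be tried in isolation.
# Assumes the row contains at least one non-blank value.
fill_row <- function(x) {
  x1 <- x[x != ""]                          # non-blank elements
  if (length(x1) == length(x)) return(x)    # nothing to fill
  x2 <- paste(tail(x1, 2), collapse = "")   # last two non-blank values pasted together
  # keep everything except the last non-blank value, then repeat x2 to the end of the row
  c(head(x1, -1), rep(x2, length(x) - length(x1) + 1))
}

row2 <- c("A", "B", "", "", "&", "")
fill_row(row2)
# "A"  "B"  "B&" "B&" "B&" "B&"
```

The "+1" in the rep() count is there because the last non-blank value is dropped by head(x1, -1) and its position is filled by the pasted value as well.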

      



data

 df1 <- structure(list(v1 = c("A", "A", "A", "A"), v2 = c("B", "B", "B", 
 "B"), v3 = c("C", "", "C", "C"), v4 = c("D", "", "D", "@"), v5 = c("E", 
 "&", "", ""), v6 = c("F", "", "$", "")), .Names = c("v1", "v2", 
 "v3", "v4", "v5", "v6"), class = "data.frame", row.names = c(NA, -4L))

      



This problem cries out for na.locf from zoo (below, x is the data frame):

library(zoo)

First replace "" with NA:

x[sapply(x,function(y)y=="")]<-NA

Strip symbols:

x.no.sym<-x
x.no.sym[sapply(x.no.sym,function(y)!y%in%LETTERS)]<-NA

      

Fill in the letters:



x.no.sym.fill<-t(apply(x.no.sym,1,na.locf))
     V1  V2  V3  V4  V5  V6 
[1,] "A" "B" "C" "D" "E" "F"
[2,] "A" "B" "B" "B" "B" "B"
[3,] "A" "B" "C" "D" "D" "D"
[4,] "A" "B" "C" "C" "C" "C"

      

Now fill in the symbols and remove the letters:

x.sym.fill<-t(apply(x,1,function(y)na.locf(na.locf(y,fromLast=TRUE,na.rm=FALSE),na.rm=FALSE)))
x.sym.fill[sapply(x.sym.fill,function(y)y%in%LETTERS)]<-""
     V1 V2 V3  V4  V5  V6 
[1,] "" "" ""  ""  ""  "" 
[2,] "" "" "&" "&" "&" "&"
[3,] "" "" ""  ""  "$" "$"
[4,] "" "" ""  "@" "@" "@"

      

Now connect:

matrix(paste0(x.no.sym.fill,x.sym.fill),ncol=ncol(x))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "B"  "C"  "D"  "E"  "F" 
[2,] "A"  "B"  "B&" "B&" "B&" "B&"
[3,] "A"  "B"  "C"  "D"  "D$" "D$"
[4,] "A"  "B"  "C"  "C@" "C@" "C@"
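If zoo is not available, the carry-forward step can be approximated in base R. The following is only a sketch of last-observation-carried-forward, not zoo's implementation; locf_base is a hypothetical name, and it is meant to behave like na.locf(v, na.rm = FALSE) on a plain vector.

```r
# locf_base is a hypothetical name; it carries the last non-NA value forward
# and leaves leading NAs in place (like na.locf(v, na.rm = FALSE)).
locf_base <- function(v) {
  idx <- cummax(seq_along(v) * !is.na(v))  # position of the latest non-NA seen so far
  idx[idx == 0] <- NA                      # leading NAs have nothing to carry forward
  v[idx]
}

locf_base(c("A", "B", NA, NA, "&", NA))
# "A" "B" "B" "B" "&" "&"
```

Filling backward (the fromLast = TRUE case) can be done with the same helper as rev(locf_base(rev(v))).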

      



This sounds fun. I used "" for the blanks in the data frame, and I called the data frame df.

fill = apply(df, 1, function(x) { 
  x = x[x != ""]
  paste(tail(x, 2), collapse = "")
})

df[df == ""] = matrix(fill, ncol = ncol(df), nrow = nrow(df))[df == ""]

      

Find a unique filler value for each row, make a matrix of the same structure as the original one from the fill values, then cherry-pick the values you need to replace.

df = structure(list(A = c("A", "A", "A"), B = c("B", "B", "B"), C = c("", 
"C", "C"), D = c("", "D", "@"), E = c("&", "", ""), F = c("", 
"$", "")), .Names = c("A", "B", "C", "D", "E", "F"), row.names = c(NA, 
-3L), class = "data.frame")
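The matrix step works because matrix() fills column by column and recycles its input, so a vector with one filler per row ends up repeated across every column of that row. A small sketch, using the filler values that the three example rows would produce:

```r
# One filler per row, as computed by the apply() call in the answer
fill <- c("B&", "D$", "C@")

# matrix() fills column-wise and recycles fill, so each column is a copy of fill
# (this relies on length(fill) being equal to the number of rows)
m <- matrix(fill, nrow = 3, ncol = 6)

m[2, ]  # every cell of row 2 holds the filler for row 2
```

Indexing this matrix with df == "" then picks, for each blank cell, the filler belonging to that cell's row.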

      
