Fill in missing column values based on the values in the previous row

Question

I have a huge table that looks something like this:

A  B  C  D  E  F
A  B        &
A  B  C  D     $
A  B  C  @

The processed version should look like this:

A  B  C  D  E  F
A  B  B& B& B& B&
A  B  C  D  D$ D$
A  B  C  C@ C@ C@

The challenge is to concatenate the last non-blank cell in a row with the non-blank cell before it (so B and & become B&) and use the new value to fill in the blank cells in the same row.

Any suggestions on how to do this in R?

+3

r row fill value

iggy 31 jul. 15 at 15:29

3 answers

This problem cries out for na.locf from the zoo package.

First replace "" with NA:

library(zoo)
x[sapply(x,function(y)y=="")]<-NA

Strip symbols:

x.no.sym<-x
x.no.sym[sapply(x.no.sym,function(y)!y%in%LETTERS)]<-NA

Fill in the letters:

x.no.sym.fill<-t(apply(x.no.sym,1,na.locf))
     V1  V2  V3  V4  V5  V6 
[1,] "A" "B" "C" "D" "E" "F"
[2,] "A" "B" "B" "B" "B" "B"
[3,] "A" "B" "C" "D" "D" "D"
[4,] "A" "B" "C" "C" "C" "C"

Now fill in the symbols and remove the letters:

x.sym.fill<-t(apply(x,1,function(y)na.locf(na.locf(y,fromLast=TRUE,na.rm=FALSE),na.rm=FALSE)))
x.sym.fill[sapply(x.sym.fill,function(y)y%in%LETTERS)]<-""
     V1 V2 V3  V4  V5  V6 
[1,] "" "" ""  ""  ""  "" 
[2,] "" "" "&" "&" "&" "&"
[3,] "" "" ""  ""  "$" "$"
[4,] "" "" ""  "@" "@" "@"

Now paste the two together:

matrix(paste0(x.no.sym.fill,x.sym.fill),ncol=ncol(x))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "B"  "C"  "D"  "E"  "F" 
[2,] "A"  "B"  "B&" "B&" "B&" "B&"
[3,] "A"  "B"  "C"  "D"  "D$" "D$"
[4,] "A"  "B"  "C"  "C@" "C@" "C@"
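For readers without zoo installed, the carry-forward step can be sketched in base R. This is only an illustrative stand-in, not zoo's implementation: the helper name locf is hypothetical, and like na.locf(..., na.rm = FALSE) it leaves leading NA values in place.

```r
# Hypothetical base-R helper (an assumption for illustration, not zoo's code):
# carry the last non-NA value forward along a vector.
locf <- function(v) {
  # Position of the most recent non-NA element at each index (0 if none yet)
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  idx[idx == 0L] <- NA   # leading NAs have nothing to carry forward
  v[idx]
}

# Second row of the example after blanks and symbols were set to NA
row2 <- c("A", "B", NA, NA, NA, NA)
filled <- locf(row2)
print(filled)
```

Applying the helper to each row with apply() and transposing the result would play the same role as the t(apply(x.no.sym,1,na.locf)) step above.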

+1

MichaelChirico 31 jul. 15 at 17:45

This looks like fun. I represented the blanks in the data frame as "" and called the data frame df.

fill = apply(df, 1, function(x) { 
  x = x[x != ""]
  paste(tail(x, 2), collapse = "")
})

df[df == ""] = matrix(fill, ncol = ncol(df), nrow = nrow(df))[df == ""]

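This finds a single filler value for each row, builds a matrix of the same shape as the original from those values, and then cherry-picks the values needed to replace the blanks. Note that only the blank cells are replaced: the symbol cell itself keeps its original value (for example "&" rather than "B&"), so it has to be overwritten as well to match the requested output exactly.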

df = structure(list(A = c("A", "A", "A"), B = c("B", "B", "B"), C = c("", 
"C", "C"), D = c("", "D", "@"), E = c("&", "", ""), F = c("", 
"$", "")), .Names = c("A", "B", "C", "D", "E", "F"), row.names = c(NA, 
-3L), class = "data.frame")
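As a hedged sketch, here is a variant of the same fill-matrix idea that also overwrites the trailing symbol cell, so that the result matches the table requested in the question. The names dat, fill_mat, blank and last are hypothetical, and the data is the four-row example from the question with blanks stored as "".

```r
# Example data from the question; blanks are stored as "" (an assumption)
dat <- data.frame(
  A = c("A", "A", "A", "A"), B = c("B", "B", "B", "B"),
  C = c("C", "", "C", "C"),  D = c("D", "", "D", "@"),
  E = c("E", "&", "", ""),   F = c("F", "", "$", ""),
  stringsAsFactors = FALSE
)

# One filler per row: the last two non-blank cells pasted together
fill <- apply(dat, 1, function(x) {
  x <- x[x != ""]
  paste(tail(x, 2), collapse = "")
})

# matrix() fills column by column and recycles `fill`,
# so row i of fill_mat holds fill[i] in every column
fill_mat <- matrix(fill, nrow = nrow(dat), ncol = ncol(dat))

blank <- dat == ""   # logical matrix marking the blank cells

# In rows that contain blanks, also mark the last non-blank cell (the symbol)
last <- unname(apply(!blank, 1, function(r) max(which(r))))
rows <- unname(which(rowSums(blank) > 0))
blank[cbind(rows, last[rows])] <- TRUE

dat[blank] <- fill_mat[blank]
print(dat)
```

Rows without any blank cell are left untouched, because nothing in them is marked for replacement.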

0

Akhil Nair 31 jul. 15 at 15:56

akrun · Accepted Answer · 2015-07-31T15:39:31+0000

Here is one option that loops over the rows of the dataset. We subset each row to the elements that are not blank ('x1'), paste the last two non-blank elements of 'x1' together ('x2'), and then concatenate all values except the last ( head(x1,-1) ) with 'x2', replicated according to the number of columns of 'df1' and the length of 'x1'. The result can be transposed ( t ) and converted to a data.frame.

 m1 <- t(apply(df1, 1, function(x) {
          x1 <- x[x!=''] #elements that are not-blank
          x2 <- paste(tail(x1,2), collapse='') #paste  the last two non-blank
          if(any(x=='')) #if there is any blank value
          c(head(x1,-1), rep(x2, ncol(df1)-length(x1)+1)) #concatenate
          else x #else return the row
           }))

 as.data.frame(m1, stringsAsFactors=FALSE)
 #  V1 V2 V3 V4 V5 V6
 #1  A  B  C  D  E F
 #2  A  B B& B& B& B&
 #3  A  B  C  D D$ D$
 #4  A  B  C C@ C@ C@

data

 df1 <- structure(list(v1 = c("A", "A", "A", "A"), v2 = c("B", "B", "B", 
 "B"), v3 = c("C", "", "C", "C"), v4 = c("D", "", "D", "@"), v5 = c("E", 
 "&", "", ""), v6 = c("F", "", "$", "")), .Names = c("v1", "v2", 
 "v3", "v4", "v5", "v6"), class = "data.frame", row.names = c(NA, -4L))
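To make the arithmetic inside the row function concrete, here is a small trace for the second row only. The intermediate names n_col, kept, n_rep and row_out are hypothetical and exist only for this illustration. Note that the reconstruction puts the kept values at the start of the row, so it assumes the non-blank cells before the symbol form a contiguous prefix, as they do in the example data.

```r
# Second row of the example; "" marks a blank cell
x <- c(v1 = "A", v2 = "B", v3 = "", v4 = "", v5 = "&", v6 = "")
n_col <- length(x)                       # plays the role of ncol(df1)

x1 <- x[x != ""]                         # non-blank cells: "A" "B" "&"
x2 <- paste(tail(x1, 2), collapse = "")  # last two pasted together: "B&"
kept <- head(x1, -1)                     # everything but the last: "A" "B"
n_rep <- n_col - length(x1) + 1          # 6 - 3 + 1 = 4 cells to fill

row_out <- unname(c(kept, rep(x2, n_rep)))
print(row_out)
```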
