Create a list of all combinations by replacing a symbol with many possibilities
I would like to create a new data frame column containing a list of all combinations obtained by replacing a symbol with each of its possible values. For example:
I have a table generated with this code:
x <- expand.grid(rep(list(c('a', 'g', 't', 'c', 'n')), 3))
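# stringsAsFactors = TRUE keeps the factor column that droplevels() expects (the default changed in R 4.0.0)
xx <- data.frame(do.call(paste0, x), stringsAsFactors = TRUE)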
tabcomb <- droplevels(xx[grep('n',xx[,1]),,drop=TRUE])
data.frame(tabcomb)
This gives me a table of 61 rows, here are the first 10 rows:
> head(tabcomb,10)
tabcomb
1 naa
2 nga
3 nta
4 nca
5 ana
6 gna
7 tna
8 cna
9 nna
10 nag
The letter n can be any of 'a', 'c', 't' or 'g'. I would like to create a second column containing a list of all combinations obtained by replacing each letter n, to get a table in this format:
tabcomb all
1 naa aaa caa taa gaa
2 nga aga cga tga gga
3 nta ata cta tta gta
4 nca aca cca tca gca
5 ana aaa aca ata aga
6 gna gaa ...
7 tna taa ....
8 cna ........
9 nna aaa taa gaa caa aaa aca aga ata .....
10 nag .......
11 nnn ...............................
PS: the spaces between the combinations in the second column are optional (I put them in the example for clarity).
I think this works the way you wanted (strings with two or three n's give correspondingly longer lists):
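df <- data.frame(tabcomb)
df$A <- sapply(as.character(df$tabcomb), function(S) {
  # for each of the 3 positions: all four bases if it is "n", else the letter itself
  v <- lapply(1:3, function(i) ifelse(substr(S, i, i) == "n", list(c('a', 'g', 't', 'c')), list(substr(S, i, i))))
  # every combination of the per-position choices
  z <- expand.grid(v[[1]][[1]], v[[2]][[1]], v[[3]][[1]])
  # glue each row into one string, then join the strings with spaces
  zz <- paste(do.call(paste0, z), collapse = " ")
  return(zz)
})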
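The `1:3` and `v[[1]]` to `v[[3]]` above tie the code to three-letter strings. As a minimal sketch of the same idea for strings of any length, the per-position choices can be passed to `expand.grid` as one list. The helper name `expand_n` and its `bases` argument are assumptions made for illustration, not part of the original answer:

```r
# Hypothetical helper (not from the original answer): expand every "n" in one string
expand_n <- function(s, bases = c("a", "g", "t", "c")) {
  chars <- strsplit(s, "")[[1]]
  # each position contributes either all four bases or just its own letter
  choices <- lapply(chars, function(ch) if (ch == "n") bases else ch)
  # one row per combination; the first position varies fastest
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  # glue each row into one string, then join the strings with spaces
  paste(do.call(paste0, grid), collapse = " ")
}

expand_n("nga")  # "aga gga tga cga"
expand_n("an")   # "aa ag at ac"
```

The order of the combinations follows the order of `bases`, so it can be changed there if a different ordering is preferred.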
df <- data.frame(tabcomb)
df$tabcomb <- as.character(df$tabcomb)
myfun <- function(x) {
  # split into single characters; an "n" becomes the four possible bases
  a1 <- lapply(as.list(strsplit(x, '')[[1]]), function(y) {
    if (y == 'n') { y <- c('a', 'c', 't', 'g') }
    y
  })
  # one row per combination, pasted back into a string
  apply(expand.grid(a1), 1, paste, collapse = '')
}
sapply(df$tabcomb, myfun)
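Note that `sapply` here returns a list of character vectors (one per row) rather than adding a column. Below is a rough sketch of storing that result in the data frame as a space-separated string, with a sanity check on the counts. It assumes a small stand-in `df` instead of the full 61-row table, restates `myfun` in compact form so the block runs on its own, and takes the column name `all` from the question's example:

```r
# myfun as in the answer above, restated compactly so this block is self-contained
myfun <- function(x) {
  a1 <- lapply(strsplit(x, "")[[1]], function(y) if (y == "n") c("a", "c", "t", "g") else y)
  apply(expand.grid(a1), 1, paste, collapse = "")
}

# small stand-in for the real table (assumption for illustration)
df <- data.frame(tabcomb = c("naa", "nga", "nna", "nnn"), stringsAsFactors = FALSE)

# list of character vectors, one element per row
combos <- lapply(df$tabcomb, myfun)

# collapse each vector into a single space-separated string
df$all <- vapply(combos, paste, character(1), collapse = " ")

# a string with k n's should expand to 4^k combinations
n_count <- nchar(gsub("[^n]", "", df$tabcomb))
stopifnot(lengths(combos) == 4^n_count)

df$all[1]  # "aaa caa taa gaa"
```

Keeping `combos` as a list (for example via `df$all <- combos`) is an alternative if the individual combinations need to be processed further rather than just displayed.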