Need help extending a function and loop in R

Question

I have the following function with a for loop:

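# Note: peptides.df is read from the global environment.
getSequences <- function(input.seq) {
  peptide.result <- NULL
  # seq_len() is safe even if peptides.df has zero rows
  for (i in seq_len(nrow(peptides.df))) {
    # Extract the substring between the start and end positions
    peptide.seq <- substr(input.seq, peptides.df$StartAA[i], peptides.df$EndAA[i])
    peptide.info <- data.frame(peptide.name = peptides.df$Name[i],
                               peptide.sequence = peptide.seq,
                               stringsAsFactors = FALSE)
    peptide.result <- rbind(peptide.result, peptide.info)
  }
  return(peptide.result)
}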

test.results <- getSequences(input.seq)

The function takes an amino acid sequence and, using a data frame of peptides with start and end positions (peptides.df), extracts the substring at each position range to generate a set of peptides.

Example amino acid sequence:

input.seq <- ("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE")

The first few lines of peptides.df look like this:

   Name StartAA EndAA
peptide_1   25    48
peptide_2   33    56
peptide_3   41    64

Current result:

peptide.name    peptide.sequence
peptide_1   QNYWEHPYQNSDVYRPINEHREHP
peptide_2   QNSDVYRPINEHREHPKEYEYPLH
peptide_3   INEHREHPKEYEYPLHQEHTYQQE

How can I extend it to take a data frame containing a sample ID and an input sequence for each sample? For each sample and its sequence, I want to generate a set of peptides as in the example above.

New input: a data frame, sample_sequences (200 samples with their input sequences)

sample1     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample2     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample3     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
...
sample200   MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE

New output: sample_peptides

sample1 peptide_1   QNYWEHPYQNSDVYRPINEHREHP
sample1 peptide_2   QNSDVYRPINEHREHPKEYEYPLH
sample1 peptide_3   INEHREHPKEYEYPLHQEHTYQQE
sample2 peptide_1   QNYWEHPYQNSDVYRPINEHREHP
sample2 peptide_2   QNSDVYRPINEHREHPKEYEYPLH
sample2 peptide_3   INEHREHPKEYEYPLHQEHTYQQE
sample3 peptide_1   QNYWEHPYQNSDVYRPINEHREHP
sample3 peptide_2   QNSDVYRPINEHREHPKEYEYPLH
sample3 peptide_3   INEHREHPKEYEYPLHQEHTYQQE
...
sample200   peptide_1   QNYWEHPYQNSDVYRPINEHREHP
sample200   peptide_2   QNSDVYRPINEHREHPKEYEYPLH
sample200   peptide_3   INEHREHPKEYEYPLHQEHTYQQE

function for-loop r dataframe

tkh86 June 26, 2017 at 13:44

1 answer

Pierre lapointe · Answer 1 · 2017-06-26T14:02:40+0000

You can avoid loops with tidyr and dplyr. Use crossing to pair every row of sample_sequences with every row of peptides.df. After that it's just a simple mutate with substr:

library(dplyr);library(tidyr)
peptides.df <- read.table(text="   Name StartAA EndAA
peptide_1   25    48
peptide_2   33    56
peptide_3   41    64",header=TRUE,stringsAsFactors=FALSE)

sample_sequences <-read.table(text=" sample sequence
sample1     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample2     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample3     MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE",header=TRUE,stringsAsFactors=FALSE)

crossing(sample_sequences,peptides.df)%>%
  mutate(peptide.sequence=substr(sequence, StartAA, EndAA))

   sample                                                         sequence      Name StartAA EndAA         peptide.sequence
1 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
2 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
3 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
4 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
5 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
6 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
7 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
8 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
9 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
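
If you would rather not depend on tidyr and dplyr, the same idea can be sketched in base R. This is only a rough sketch: the function name getSamplePeptides is made up for illustration, and it assumes the column names sample, sequence, Name, StartAA and EndAA from the example data above, with unique sample IDs and peptide names. It uses merge() with by = NULL to build every sample/peptide pair, then relies on substr() being vectorised.

```r
# Hypothetical helper (name chosen for illustration only).
# samples.df:  data frame with columns "sample" and "sequence"
# peptides.df: data frame with columns "Name", "StartAA" and "EndAA"
getSamplePeptides <- function(samples.df, peptides.df) {
  # by = NULL makes merge() return the Cartesian product of the two inputs,
  # i.e. one row per (sample, peptide) pair
  all.pairs <- merge(samples.df, peptides.df, by = NULL)

  # substr() is vectorised over the string and over start/stop positions
  all.pairs$peptide.sequence <- substr(all.pairs$sequence,
                                       all.pairs$StartAA,
                                       all.pairs$EndAA)

  # Restore the input order: by sample first, then by peptide
  ord <- order(match(all.pairs$sample, samples.df$sample),
               match(all.pairs$Name, peptides.df$Name))
  result <- all.pairs[ord, c("sample", "Name", "peptide.sequence")]
  rownames(result) <- NULL
  result
}

# Example data copied from the question (first three samples only)
peptides.df <- data.frame(
  Name    = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA   = c(48, 56, 64),
  stringsAsFactors = FALSE
)

input.seq <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  sample   = c("sample1", "sample2", "sample3"),
  sequence = rep(input.seq, 3),
  stringsAsFactors = FALSE
)

sample_peptides <- getSamplePeptides(sample_sequences, peptides.df)
print(sample_peptides)
```

The result keeps only the three columns asked for in the question (sample, Name, peptide.sequence). With the tidyr/dplyr version, a final select(sample, Name, peptide.sequence) should give the same shape.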
