Need help extending a function and loop in R
I have the following function with a for loop:
getSequences <- function(input.seq){
  peptide.result <- c()
  for (i in 1:nrow(peptides.df)) {
    peptide.seq <- substr(input.seq, peptides.df$StartAA[i], peptides.df$EndAA[i])
    peptide.info <- data.frame(cbind(peptide.name = peptides.df$Name[i], peptide.seq))
    peptide.result <- rbind(peptide.result, peptide.info)
  }
  return(peptide.result)
}
test.results <- getSequences(input.seq)
The function takes an amino acid sequence and, using a data frame of peptides with start and end positions (peptides.df), extracts the substring at each pair of positions to generate a set of peptides.
Example amino acid sequence:
input.seq <- ("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE")
The first few lines of peptides.df look like this:
Name StartAA EndAA
peptide_1 25 48
peptide_2 33 56
peptide_3 41 64
Current output:
peptide.name peptide.sequence
peptide_1 QNYWEHPYQNSDVYRPINEHREHP
peptide_2 QNSDVYRPINEHREHPKEYEYPLH
peptide_3 INEHREHPKEYEYPLHQEHTYQQE
How can I extend it to take a data frame containing a sample ID and its input sequence? For each sample and its sequence, I want to generate a set of peptides as in the example.
New input: data frame sample_sequences (200 samples with input sequences)
sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
...
sample200 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
New output: data frame sample_peptides
sample1 peptide_1 QNYWEHPYQNSDVYRPINEHREHP
sample1 peptide_2 QNSDVYRPINEHREHPKEYEYPLH
sample1 peptide_3 INEHREHPKEYEYPLHQEHTYQQE
sample2 peptide_1 QNYWEHPYQNSDVYRPINEHREHP
sample2 peptide_2 QNSDVYRPINEHREHPKEYEYPLH
sample2 peptide_3 INEHREHPKEYEYPLHQEHTYQQE
sample3 peptide_1 QNYWEHPYQNSDVYRPINEHREHP
sample3 peptide_2 QNSDVYRPINEHREHPKEYEYPLH
sample3 peptide_3 INEHREHPKEYEYPLHQEHTYQQE
...
sample200 peptide_1 QNYWEHPYQNSDVYRPINEHREHP
sample200 peptide_2 QNSDVYRPINEHREHPKEYEYPLH
sample200 peptide_3 INEHREHPKEYEYPLHQEHTYQQE
You can avoid loops with tidyr and dplyr. Use crossing to pair every row of sample_sequences with every row of peptides.df; after that it is a simple mutate with substr:
library(dplyr)
library(tidyr)
peptides.df <- read.table(text = "Name StartAA EndAA
peptide_1 25 48
peptide_2 33 56
peptide_3 41 64", header = TRUE, stringsAsFactors = FALSE)
sample_sequences <- read.table(text = "sample sequence
sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE", header = TRUE, stringsAsFactors = FALSE)
crossing(sample_sequences, peptides.df) %>%
  mutate(peptide.sequence = substr(sequence, StartAA, EndAA))
sample sequence Name StartAA EndAA peptide.sequence
1 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1 25 48 QNYWEHPYQNSDVYRPINEHREHP
2 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2 33 56 QNSDVYRPINEHREHPKEYEYPLH
3 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3 41 64 INEHREHPKEYEYPLHQEHTYQQE
4 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1 25 48 QNYWEHPYQNSDVYRPINEHREHP
5 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2 33 56 QNSDVYRPINEHREHPKEYEYPLH
6 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3 41 64 INEHREHPKEYEYPLHQEHTYQQE
7 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1 25 48 QNYWEHPYQNSDVYRPINEHREHP
8 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2 33 56 QNSDVYRPINEHREHPKEYEYPLH
9 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3 41 64 INEHREHPKEYEYPLHQEHTYQQE
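The result above still carries the full sequence and the start/end positions, while the question asks for only the sample, the peptide name, and the peptide sequence. A minimal sketch of trimming it to that shape with select follows; the two-sample input is a shortened stand-in for the real 200-row data frame, and storing the result as sample_peptides follows the name used in the question.

```r
library(dplyr)
library(tidyr)

# Stand-in inputs built from the example data (only two samples, for brevity).
peptides.df <- data.frame(
  Name    = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA   = c(48, 56, 64),
  stringsAsFactors = FALSE
)
sample_sequences <- data.frame(
  sample   = c("sample1", "sample2"),
  sequence = rep("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE", 2),
  stringsAsFactors = FALSE
)

sample_peptides <- crossing(sample_sequences, peptides.df) %>%
  # One row per sample/peptide pair; cut the peptide out of the sequence.
  mutate(peptide.sequence = substr(sequence, StartAA, EndAA)) %>%
  # Keep only the requested columns, renaming Name to peptide.name.
  select(sample, peptide.name = Name, peptide.sequence)

print(sample_peptides)
```

Note that crossing de-duplicates and sorts its inputs, so with 200 samples the rows may come back in string order (sample10 before sample2). If the original row order matters, tidyr's expand_grid can be used in its place, since it does not reorder.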
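If you would rather stay in base R and keep a function close to the original, one possible sketch is below. It is an alternative to the tidyr approach, not part of it. It passes peptides.df as an argument instead of reading it from the global environment, and uses substring (which is vectorised over the start and end positions) in place of the inner for loop. The wrapper name getSamplePeptides is made up for this illustration, and the two-sample input is again a shortened stand-in.

```r
# Stand-in inputs built from the example data (only two samples, for brevity).
peptides.df <- data.frame(
  Name    = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA   = c(48, 56, 64),
  stringsAsFactors = FALSE
)
sample_sequences <- data.frame(
  sample   = c("sample1", "sample2"),
  sequence = rep("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE", 2),
  stringsAsFactors = FALSE
)

# All peptides for a single sequence; substring() recycles input.seq
# across every StartAA/EndAA pair, so no explicit loop is needed.
getSequences <- function(input.seq, peptides.df) {
  data.frame(
    peptide.name     = peptides.df$Name,
    peptide.sequence = substring(input.seq, peptides.df$StartAA, peptides.df$EndAA),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: apply getSequences() to each sample and stack the results.
getSamplePeptides <- function(sample_sequences, peptides.df) {
  per.sample <- lapply(seq_len(nrow(sample_sequences)), function(i) {
    data.frame(
      sample = sample_sequences$sample[i],
      getSequences(sample_sequences$sequence[i], peptides.df),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per.sample)
}

sample_peptides <- getSamplePeptides(sample_sequences, peptides.df)
print(sample_peptides)
```

Collecting the per-sample data frames in a list and calling rbind once at the end avoids growing the result row by row inside a loop, which tends to get slow as the number of samples increases.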