R: Using the sort function in a data frame based on multiple columns

I am a cardiologist and love coding in R - I am having a real problem sorting a data frame and I suspect the solution is very simple!

I have a data frame with summary estimates from several studies (df$study). Most of the studies have only one summary estimate (df$summary). However, as you can see, study A has three summary estimates, numbered in df$no.of.estimate. See below:

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6 ,7 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

      

So, I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want to keep that study's rows together and order them by the no.of.estimate column.

So essentially the desired output is

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3 ,6 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

      



2 answers


You may try

library(dplyr)
df %>% 
     mutate(study=factor(study, levels=unique(study))) %>%
     arrange(study,no.of.estimate)
  #  study no.of.estimate summary
  #1     E              1       1
  #2     A              1       7
  #3     A              2       2
  #4     A              3       5
  #5     F              1       3
  #6     B              1       6
  #7     C              1       8
  #8     D              1       9

      

Or a base R approach

df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]
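
Both versions rest on the same idea: unique(study) lists the studies in order of first appearance, and ordering on that keeps each study's rows together. As a minimal sketch of the same ordering without converting study to a factor, match() can supply the first-appearance position directly. The names first_seen and sorted are only illustrative, and the data are rebuilt from the question so the block runs on its own.

```r
# Question's data, rebuilt so the example is self-contained
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# Position of each study's first appearance (illustrative helper name)
first_seen <- match(df$study, unique(df$study))

# Sort by first appearance, then by estimate number within a study
sorted <- df[order(first_seen, df$no.of.estimate), ]
sorted
```

This leaves the study column as it was, which may be preferable if the factor levels are not wanted later on.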

      



data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

      

Expected dataset

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

      



Here's my data.table attempt, which leaves your columns as they are and creates a new index (although see my comment first). The main advantage is that you update your dataset by reference instead of creating new copies.



library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
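
One caveat that seems worth spelling out: unique() and .GRP both number the studies by first appearance, so the answers here work because the question's rows are already in ascending order of summary. If that cannot be assumed, a rough base R sketch is to compute each study's smallest summary with ave() and sort on that first. The shuffled row order and the names shuffled, min_summary and result are made up for illustration.

```r
# Question's data with the rows deliberately shuffled (illustrative order)
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9)
)
shuffled <- df[c(8, 3, 6, 1, 5, 2, 7, 4), ]

# Smallest summary within each study, repeated on every row of that study
min_summary <- ave(shuffled$summary, shuffled$study, FUN = min)

# Sort studies by their smallest summary, keep each study's rows together,
# then order by estimate number within the study
result <- shuffled[order(min_summary, shuffled$study, shuffled$no.of.estimate), ]
rownames(result) <- NULL
result
```

Including study as the second sort key only matters if two studies share the same smallest summary; it stops their rows from interleaving.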

      
