Split data frame and sort split values based on specific column
I have the following data frame:
tdf <- structure(list(GO = c("Cytokine-cytokine receptor interaction",
"Cytokine-cytokine receptor interaction|Endocytosis", "I-kappaB kinase/NF-kappaB signaling",
"NF-kappa B signaling pathway", "NF-kappaB import into nucleus",
"T cell chemotaxis"), PosCount = c(17, 18, 4, 5, 1, 2), shortgo = structure(c(7L,
7L, 18L, 18L, 18L, 21L), .Label = c("TNF", "adaptive", "alpha",
"apop", "beta", "chemokine", "cytokine", "death", "defense",
"gamma", "immune response", "infla", "interleukin-1 ", "interleukin-10 ",
"interleukin-12 ", "interleukin-18 ", "interleukin-6 ", "kappa",
"migration", "stress", "taxis", "wound"), class = "factor")), .Names = c("GO",
"PosCount", "shortgo"), class = "data.frame", row.names = c(NA,
6L))
It looks like this:
> tdf
                                                  GO PosCount  shortgo
1             Cytokine-cytokine receptor interaction       17 cytokine
2 Cytokine-cytokine receptor interaction|Endocytosis       18 cytokine
3                I-kappaB kinase/NF-kappaB signaling        4    kappa
4                       NF-kappa B signaling pathway        5    kappa
5                      NF-kappaB import into nucleus        1    kappa
6                                  T cell chemotaxis        2    taxis
What I want to do is split the data frame according to shortgo and then sort the GO values within each group by PosCount (largest first), getting this (handcrafted):
$cytokine
[1] Cytokine-cytokine receptor interaction|Endocytosis
[2] Cytokine-cytokine receptor interaction
$kappa
[1] NF-kappa B signaling pathway
[2] I-kappaB kinase/NF-kappaB signaling
[3] NF-kappaB import into nucleus
$taxis
[1] T cell chemotaxis
I am stuck with this:
> split(tdf$GO,tdf$shortgo)
Error in split.default(tdf$GO, tdf$hsortgo) :
group length is 0 but data length > 0
How can I do this?
You can order your data frame first, before splitting:
library(dplyr)
tdf <- tdf %>% arrange(shortgo, desc(PosCount))
Then the split:
ldf <- split(tdf$GO, tdf$shortgo, drop=TRUE)
Which gives the desired (ordered) output:
> ldf
$cytokine
[1] "Cytokine-cytokine receptor interaction|Endocytosis"
[2] "Cytokine-cytokine receptor interaction"
$kappa
[1] "NF-kappa B signaling pathway"
[2] "I-kappaB kinase/NF-kappaB signaling"
[3] "NF-kappaB import into nucleus"
$taxis
[1] "T cell chemotaxis"
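The drop=TRUE argument matters here because shortgo is a factor with many more levels than actually occur in the data. A minimal base R sketch of the difference, assuming a cut-down stand-in for tdf (tdf_demo is a hypothetical name, and its factor keeps only four of the original levels, one of them unused):

```r
# Cut-down stand-in for tdf: same rows, but only four factor levels.
# "wound" is kept on purpose as an example of a level that never occurs.
tdf_demo <- data.frame(
  GO = c("Cytokine-cytokine receptor interaction",
         "Cytokine-cytokine receptor interaction|Endocytosis",
         "I-kappaB kinase/NF-kappaB signaling",
         "NF-kappa B signaling pathway",
         "NF-kappaB import into nucleus",
         "T cell chemotaxis"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("cytokine", "cytokine", "kappa", "kappa", "kappa", "taxis"),
                   levels = c("cytokine", "kappa", "taxis", "wound")),
  stringsAsFactors = FALSE
)

# Without drop = TRUE every level gets a list element, even empty ones.
with_empty <- split(tdf_demo$GO, tdf_demo$shortgo)
names(with_empty)   # includes "wound", which holds character(0)

# With drop = TRUE only the levels that actually occur are kept.
no_empty <- split(tdf_demo$GO, tdf_demo$shortgo, drop = TRUE)
names(no_empty)
```

With the full tdf from the question, leaving out drop=TRUE would produce an empty element for each of the unused levels.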
If you want to split your data frame into a list of data frames, you can use:
ldf <- split(tdf, tdf$shortgo, drop=TRUE)
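Once you have a list of data frames, the sorting can also be done after the split, one group at a time. A minimal base R sketch, assuming a simplified copy of the question's data (tdf_demo, pieces and sorted_go are hypothetical names):

```r
# Simplified copy of the question's data; only the levels in use are kept.
tdf_demo <- data.frame(
  GO = c("Cytokine-cytokine receptor interaction",
         "Cytokine-cytokine receptor interaction|Endocytosis",
         "I-kappaB kinase/NF-kappaB signaling",
         "NF-kappa B signaling pathway",
         "NF-kappaB import into nucleus",
         "T cell chemotaxis"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("cytokine", "cytokine", "kappa", "kappa", "kappa", "taxis")),
  stringsAsFactors = FALSE
)

# Split into one data frame per shortgo value.
pieces <- split(tdf_demo, tdf_demo$shortgo, drop = TRUE)

# Within each piece, return GO ordered by decreasing PosCount.
sorted_go <- lapply(pieces, function(d) d$GO[order(-d$PosCount)])
sorted_go
```

This does not depend on the row order of the input, at the cost of one order() call per group.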
Solution with base R (provided by @Henrik in the comments):
# reorder the values and the grouping factor with the same index
o <- order(tdf$shortgo, -tdf$PosCount)
split(tdf$GO[o], tdf$shortgo[o], drop=TRUE)
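One detail to watch with index-based ordering in base R: the values and the grouping factor have to be permuted by the same index, otherwise the groups only line up when the rows already happen to be sorted by shortgo. A minimal sketch on deliberately shuffled rows, assuming a simplified copy of the question's data (tdf_demo, shuffled, o and res are hypothetical names):

```r
# Simplified copy of the question's data.
tdf_demo <- data.frame(
  GO = c("Cytokine-cytokine receptor interaction",
         "Cytokine-cytokine receptor interaction|Endocytosis",
         "I-kappaB kinase/NF-kappaB signaling",
         "NF-kappa B signaling pathway",
         "NF-kappaB import into nucleus",
         "T cell chemotaxis"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("cytokine", "cytokine", "kappa", "kappa", "kappa", "taxis")),
  stringsAsFactors = FALSE
)

# Shuffle the rows with a fixed permutation so the groups are interleaved.
shuffled <- tdf_demo[c(5, 1, 6, 3, 2, 4), ]

# Order by group, then by decreasing PosCount within each group.
o <- order(shuffled$shortgo, -shuffled$PosCount)

# Apply the same index to both the values and the grouping factor.
res <- split(shuffled$GO[o], shuffled$shortgo[o], drop = TRUE)
res
```

Since split() keeps elements in their incoming order within each group, the result follows PosCount regardless of how the input rows were arranged.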
Using data.table, you can use setorder() to reorder the data table by reference and then group like this:
require(data.table)
ans = setorder(setDT(tdf), shortgo, -PosCount)[, .(GO_list = list(GO)), by=shortgo]
I would recommend keeping it like this so that you can do further calculations on it. But if you insist on your final structure, you can do:
ans = setattr(ans$GO_list, 'names', as.character(ans$shortgo))
If you don't want to change the order of the original data by reference, you can do:
ans = setDT(tdf)[order(shortgo, -PosCount), .(GO_list = list(GO)), by=shortgo]