Stacked bar plots of one two-level variable against several factors in R

First of all, I am still a beginner. I am trying to draw a stacked bar plot with R and interpret it. I have already looked at several answers, but some were not specific to my case and others I just didn't understand.

I have a dataset dvl that has five columns: Variant, Region, Time, Person, and PrecededByPrep. I would like to do a multivariate comparison of Variant with the other four predictors. Each column can have one of two possible values:

  • Variant: elk or ieder
  • Region: VL or NL
  • Time: time or no time
  • Person: person or no person
  • PrecededByPrep: 1 or 0

Here's the logistic regression

From the answers I gathered that ggplot2 might be the best plotting library. I've read its documentation, but for the life of me I can't figure out how to do this: how can I compare Variant to the other factors?

It took me a while, but I made a mock-up of what I want in Photoshop (fictitious values!):

[Image: Photoshop mock-up of the desired plot]

Dark gray / light gray: the possible values of Variant
y-axis: frequency
x-axis: each column, divided into its possible values

I know how to make individual plots, both stacked and grouped, but I don't know how to combine several stacked bar plots into one figure. ggplot2 doesn't have to be used, but if it can be done with it, I would prefer that.

I think this can serve as a sample dataset, although I'm not entirely sure; I'm new to R and have been reading about creating sample data.

t <- data.frame(Variant = sample(c("iedere","elke"),size = 50, replace = TRUE),
            Region = sample(c("VL","NL"),size = 50, replace = TRUE),
            PrecededByPrep = sample(c("1","0"),size = 50, replace = TRUE),
            Person = sample(c("person","no person"),size = 50, replace = TRUE),
            Time = sample(c("time","no time"),size = 50, replace = TRUE))
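A more compact way to generate the same kind of random sample is to list the two possible values of each column once and sample from every entry. This is only a sketch: the names `levels_list` and `sample_df` are hypothetical, and `set.seed()` is added so the random draw is reproducible.

```r
# Two possible values per column (taken from the sample data above)
levels_list <- list(
  Variant        = c("iedere", "elke"),
  Region         = c("VL", "NL"),
  PrecededByPrep = c("1", "0"),
  Person         = c("person", "no person"),
  Time           = c("time", "no time")
)

set.seed(123)  # make the random sample reproducible

# draw 50 values (with replacement) for every column, then combine into a data frame
sample_df <- as.data.frame(
  lapply(levels_list, sample, size = 50, replace = TRUE),
  stringsAsFactors = FALSE
)
str(sample_df)
```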

I would like the plot to be aesthetically pleasing. Specifically:

  • Chart color (i.e. for the bars): col=c("paleturquoise3", "palegreen3")
  • Bold axis labels (font.lab=2), but not the value labels (for example, Region in bold, but VL and NL not in bold)
  • #404040 as font, axis and line color
  • Axis labels: x: factors, y: frequency

3 answers


Here is one possibility: start with an unaggregated data frame (one row per observation), melt it to long format, plot it with geom_bar in ggplot2 (which does the counting for each group), and split the plot by variable using facet_wrap.

Create toy data:

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
           Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
           PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
           Person = sample(c("person", "no person"), size = 50, replace = TRUE),
           Time = sample(c("time", "no time"), size = 50, replace = TRUE))

Reshape the data to long format:

library(reshape2)
df2 <- melt(df, id.vars = "Variant")
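For readers who prefer to avoid the reshape2 dependency, the same long format can be built in base R. This sketch recreates the toy `df` so that it runs on its own; the names `factors` and `df2_base` are hypothetical. Each original row appears once per factor, with the factor name in `variable` and its level in `value`, which is the shape `melt(df, id.vars = "Variant")` produces.

```r
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

# every column except the one we compare against
factors <- setdiff(names(df), "Variant")

# stack the factor columns on top of each other, repeating Variant for each block
df2_base <- data.frame(
  Variant  = rep(df$Variant, times = length(factors)),
  variable = factor(rep(factors, each = nrow(df)), levels = factors),
  value    = unlist(lapply(df[factors], as.character), use.names = FALSE),
  stringsAsFactors = FALSE
)
head(df2_base)
```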

Plot:

library(ggplot2)
ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_fill_grey(start = 0.5) +
  theme_bw()

[Image: resulting plot]



There are many options for customizing the plot, such as the order of factor levels, rotated axis labels, wrapping labels over two lines (e.g. for the long variable name "PrecededByPrep"), or changing the spacing between panels.
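As an illustration of the first of those options: facet order follows the levels of `variable`, and bar order within a facet follows the levels of `value`, so both can be fixed before plotting. The tiny data frame `df2_demo` and the chosen orders are hypothetical; with the real `df2` the same two `factor()` calls apply. Because `factor()` applied to an existing factor keeps its level order, an `aes(factor(value), ...)` mapping respects it.

```r
# Hypothetical long-format data shaped like df2 (columns Variant, variable, value)
df2_demo <- data.frame(
  Variant  = c("elke", "iedere", "elke", "iedere"),
  variable = c("Region", "Region", "Time", "Time"),
  value    = c("VL", "NL", "time", "no time"),
  stringsAsFactors = FALSE
)

# facet order follows the levels of `variable` (illustrative order: Time first)
df2_demo$variable <- factor(df2_demo$variable, levels = c("Time", "Region"))

# x-axis order follows the levels of `value`; set it explicitly
# instead of relying on the default alphabetical order
df2_demo$value <- factor(df2_demo$value, levels = c("VL", "NL", "time", "no time"))

levels(df2_demo$variable)
levels(factor(df2_demo$value))  # factor() keeps the chosen order
```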

Customization (following updates to the question and comments from the OP):

# labeller used in facet_grid() to wrap "PrecededByPrep" over two lines;
# as_labeller() turns a character -> character function into a labeller
# (the older function(variable, value) labeller style is deprecated since ggplot2 2.0)
# see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
my_lab <- as_labeller(function(value) {
  ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
})

ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
  theme_bw() +
  theme(axis.text = element_text(face = "bold"), # axis tick labels bold
        axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
        line = element_line(colour = "gray25"), # line colour gray25 = #404040
        strip.text = element_text(face = "bold")) + # facet labels bold
  xlab("factors") + # set axis labels
  ylab("frequency")

[Image: resulting plot]
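The comment `gray25 = #404040` in the plotting code can be checked in base R: `col2rgb()` returns the red, green and blue components of a named colour, and `rgb()` turns them back into a hex string. This is only a sanity check; passing the hex string `"#404040"` directly as `colour` to `element_line()` or `element_text()` works as well.

```r
# red, green and blue components of R's named colour "gray25"
m <- col2rgb("gray25")
m

# back to a hex string
hex <- rgb(m[1], m[2], m[3], maxColorValue = 255)
hex
```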

Adding counts to each bar (edit following comments from the OP):

The basic principle for calculating the y coordinates is explained in this Q&A. Here I use dplyr to calculate the count for each bar segment (i.e. the label in geom_text) and its y coordinate, but this can of course also be done in base R, plyr or data.table.

# calculate counts (i.e. labels for geom_text) and their y positions
library(dplyr)
df3 <- df2 %>%
  group_by(variable, value, Variant) %>%
  summarise(n = n(), .groups = "drop_last") %>% # result stays grouped by bar (variable, value)
  # ggplot2 >= 2.2.0 stacks the first Variant level on top, so a segment's midpoint is
  # the total of the later levels below it plus half its own count
  # (for older ggplot2 versions use cumsum(n) - 0.5 * n)
  mutate(y = rev(cumsum(rev(n))) - 0.5 * n)

# plot
ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
  geom_bar() +
  geom_text(data = df3, aes(y = y, label = n)) +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
  theme_bw() +
  theme(axis.text = element_text(face = "bold"), # axis tick labels bold
        axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
        line = element_line(colour = "gray25"), # line colour gray25 = #404040
        strip.text = element_text(face = "bold")) + # facet labels bold
  xlab("factors") + # set axis labels
  ylab("frequency")

[Image: resulting plot with counts]
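As noted, the counts and label positions can also be computed without dplyr. Here is a base R sketch on a tiny hypothetical data set (`df2_demo`, one facet with two bars), using `table()` for the counts and `ave()` for one midpoint per bar segment. It assumes ggplot2 >= 2.2.0, where the first `Variant` level is drawn at the top of each stack; in older ggplot2 versions the first level is at the bottom and `cumsum(n) - 0.5 * n` gives the midpoints instead.

```r
# Hypothetical long-format data shaped like df2: one facet ("Region"), two bars
df2_demo <- data.frame(
  Variant  = c("elke", "elke", "iedere", "elke", "iedere", "iedere"),
  variable = "Region",
  value    = c("VL", "VL", "VL", "NL", "NL", "NL")
)

# count every (variable, value, Variant) combination, i.e. every bar segment
df3_base <- as.data.frame(
  table(variable = df2_demo$variable, value = df2_demo$value, Variant = df2_demo$Variant),
  responseName = "n"
)
df3_base <- df3_base[df3_base$n > 0, ]  # drop combinations that never occur
df3_base <- df3_base[order(df3_base$variable, df3_base$value, df3_base$Variant), ]

# midpoint of each segment: counts of all later Variant levels in the same bar
# (they sit below it) plus half of its own count
df3_base$y <- ave(as.numeric(df3_base$n), df3_base$variable, df3_base$value,
                  FUN = function(n) rev(cumsum(rev(n))) - 0.5 * n)
df3_base
```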

Here is my suggestion for a solution with the base R function barplot:

1. Calculate the counts

# one Variant-by-level table per factor, bound side by side into a 2 x 8 matrix
l_count_df <- lapply(colnames(t)[-1], function(nomcol) table(t$Variant, t[, nomcol]))
count_df <- l_count_df[[1]]
for (i in 2:length(l_count_df)) {
  count_df <- cbind(count_df, l_count_df[[i]])
}
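To see what `count_df` looks like, here is the same construction on a tiny hypothetical data set `d` with the question's column names, using `do.call(cbind, ...)` in place of the loop (an equivalent, loop-free way to bind the tables side by side). Each factor contributes two columns, one per level, and the rows are the two `Variant` values.

```r
# Small hypothetical dataset with the same columns as `t` in the question
d <- data.frame(
  Variant        = c("elke", "iedere", "iedere", "elke"),
  Region         = c("VL", "NL", "VL", "VL"),
  PrecededByPrep = c("1", "0", "0", "1"),
  Person         = c("person", "no person", "person", "person"),
  Time           = c("time", "time", "no time", "no time")
)

# one Variant-by-level table per factor, bound side by side without a loop
count_d <- do.call(cbind, lapply(names(d)[-1], function(col) table(d$Variant, d[[col]])))
count_d
```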

2. Draw the bar plot without axis names, keeping the bar coordinates

par(las = 1, col.axis = "#404040", mar = c(5, 4.5, 4, 2), mgp = c(3.5, 1, 0))
# axisnames = FALSE: the bars are labelled by hand with mtext() in step 3
bp <- barplot(count_df, width = 1.2, space = rep(c(1, 0.3), 4), col = c("paleturquoise3", "palegreen3"),
              border = "#404040", axisnames = FALSE, ylab = "Frequency",
              legend.text = row.names(count_df), ylim = c(0, max(colSums(count_df)) * 1.2))
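The `space` argument is a fraction of the bar width, and `bp` is kept because it holds the bar midpoints that steps 3 and 4 use to place text. The arithmetic can be seen with `barplot(..., plot = FALSE)`, which returns the midpoints without drawing anything. This sketch assumes the same `width` and `space` as above; the matrix `m` is a stand-in for `count_df`, since only its 8 columns matter here.

```r
m <- matrix(1:16, nrow = 2)  # stand-in for count_df
w <- 1.2                     # bar width
s <- rep(c(1, 0.3), 4)       # gap before each bar, as a fraction of the width

bp <- barplot(m, width = w, space = s, plot = FALSE)

# barplot() multiplies `space` by the (mean) bar width; the right edge of bar i is
# the running total of (gap + width), and its midpoint is half a width to the left
manual <- cumsum(s * w + w) - w / 2
all.equal(bp, manual)

# centre of each pair of bars, where the bold factor names go
pair_mid <- (bp[seq(1, 8, by = 2)] + bp[seq(2, 8, by = 2)]) / 2
pair_mid
```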

3. Label the bars



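# level names under each bar, then the bold factor name centred under each pair of bars
mtext(side = 1, line = 0.8, at = bp, text = colnames(count_df))
mtext(side = 1, line = 2, at = (bp[seq(1, 8, by = 2)] + bp[seq(2, 8, by = 2)]) / 2, text = colnames(t)[-1], font = 2)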

4. Add the values inside the bars

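# barplot() stacks the first row (elke) at the bottom, so each label is
# placed at the middle of its own segment
for (i in 1:ncol(count_df)) {
  val_elke <- count_df[1, i]
  val_iedere <- count_df[2, i]
  text(bp[i], val_elke / 2, val_elke)
  text(bp[i], val_elke + val_iedere / 2, val_iedere)
}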

This is what I get (with my random data):

[Image: resulting bar plot]
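The label loop in step 4 can also be written without a loop: for each column, `cumsum(col) - col / 2` gives the middle of every segment, because `barplot()` stacks the first row at the bottom. A sketch on a small hypothetical 2 x 2 count matrix `count_m`:

```r
count_m <- matrix(c(3, 5, 4, 2), nrow = 2,
                  dimnames = list(c("elke", "iedere"), c("VL", "NL")))

# middle of each stacked segment, one column per bar
mid_y <- apply(count_m, 2, function(col) cumsum(col) - col / 2)

bp <- barplot(count_m, col = c("paleturquoise3", "palegreen3"), border = "#404040")
# rep(bp, each = ...) lines the x positions up with the column-major order of mid_y
text(rep(bp, each = nrow(count_m)), as.vector(mid_y), labels = as.vector(count_m))
```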

I am basically answering a different question. I suppose this can be seen as perverse on my part, but I really don't like bar plots of almost any kind. They always seem to waste space, and the numeric information they convey is less useful than a well-formatted table. The vcd package offers an extended mosaic plot function, mosaic, which I think deserves the name "multidimensional bar plot" more than any of the plots I've seen so far. It requires you to first build a contingency table, for which the xtabs function seems perfectly suited.
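For comparison, the contingency table that `mosaic()` draws can also be printed directly, which is the well-formatted-table alternative. A sketch on a tiny hypothetical data set `d` with the question's column names: `xtabs()` builds the multi-way table and `ftable()` flattens it for printing.

```r
d <- data.frame(
  Variant        = c("elke", "iedere", "iedere", "elke"),
  Region         = c("VL", "NL", "VL", "VL"),
  PrecededByPrep = c("1", "0", "0", "1"),
  Person         = c("person", "no person", "person", "person"),
  Time           = c("time", "time", "no time", "no time")
)

# two-way table: Variant by Region
tab2 <- xtabs(~ Variant + Region, data = d)
tab2

# three-way table, flattened so that Variant (the last variable) runs across the columns
tab3 <- xtabs(~ Region + Time + Variant, data = d)
ftable(tab3)
```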

install.packages("vcd")
library(vcd)
help(mosaic, package = vcd)
col <- c("paleturquoise3", "palegreen3")
# t is the sample data frame from the question
vcd::mosaic(xtabs(~ Variant + Region + PrecededByPrep + Time, data = t),
            highlighting = "Variant", highlighting_fill = col)

[Image: 4-way mosaic plot]

That was a 4-way plot; this is a 5-way plot:

png()
vcd::mosaic(xtabs(~ Variant + Region + PrecededByPrep + Person + Time, data = t),
            highlighting = "Variant", highlighting_fill = col)
dev.off()

[Image: 5-way mosaic plot]
