R: Divide data in ggplot based on another factor
I am new to R, so I have little experience. I ran into an issue when trying to split a scatterplot into groups based on infection status. In my example, the dataset contains log-transformed antibody levels (logapfhap2 and logbpfhap2). Infection status, any_Pf_inf, is coded as "Yes" or "No" and indicates whether someone became infected during the subsequent period. I am plotting time points (x) against antibody levels (y). For time points 1 and 14, I would like to make two groups by infection status.
This is the bulk of the code I use to plot the data without splitting into groups:
ggplot() +
geom_jitter(data=data2, aes(x='1', y=logapfhap2, colour='PfHAP2A')) +
geom_jitter(data=data2,aes(x='14', y=logbpfhap2, colour='PfHAP2B')) +
geom_jitter(data=TRC, aes(x='C', y=PfHAP2, colour='PfHAP2C'))
which results in this graph:
Then I tried to split it by infection status (I'm only showing the first time point here), which returns an error.
ggplot() +
geom_jitter(data=data2[data2$any_Pf_inf=='Yes'],
aes(x='1inf', y=logapfhap2[data2$any_Pf_inf=='Yes'],
colour='PfHAP2A')) +
geom_jitter(data=data2[data2$any_Pf_inf=='No'],
aes(x='1un', y=logapfhap2[data2$any_Pf_inf=='No'],
colour='PfHAP2B'))
I wanted to create this graph, but I am getting this error:
Error: boolean index vector length must be 1 or 55, received: 482
I hope this is clear! Can anyone help me with this problem? Thanks!
EDIT Not sure if this makes it clearer, but this is what my data looks like:
I just tried other things and I solved it now!
ggplot()+
geom_jitter(data=data2[data2$any_Pf_inf=='Yes',],
aes(x='1inf', y=logapfhap2,
colour='PfHAP2A')) +
geom_jitter(data=data2[data2$any_Pf_inf=='No',],
aes(x='1un', y=logapfhap2,
colour='PfHAP2B'))
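Evidently, you need a comma after the condition, as in data2[data2$any_Pf_inf=='Yes',], so that it selects rows instead of columns. Without the comma, the logical vector (one value per row, 482 here) is presumably used to index the columns (55 here), which is what the error message complains about.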
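To illustrate the row-versus-column point, here is a minimal base R sketch. The data frame `toy` and its values are made up for illustration; only the column names are borrowed from the question, and the real data2 may behave slightly differently (for example, if it is a tibble).

```r
# Hypothetical toy data: 4 rows, 2 columns (a stand-in for data2)
toy <- data.frame(
  any_Pf_inf = c("Yes", "No", "Yes", "No"),
  logapfhap2 = c(1.2, 0.4, 2.1, 0.9)
)

# Logical vector with one element per ROW
keep <- toy$any_Pf_inf == "Yes"

# With the comma: the logical vector selects rows, all columns are kept
infected <- toy[keep, ]
print(infected)

# Without a comma, a single index selects COLUMNS instead.
# Here a logical vector of length 2 (one element per column) keeps column 1.
first_col <- toy[c(TRUE, FALSE)]
print(names(first_col))

# So toy[keep] would try to use a row-length vector as a column index,
# which is the kind of length mismatch reported in the question.
```

The same pattern applies to the "No" group: `toy[toy$any_Pf_inf == "No", ]`.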
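As an alternative to subsetting inside every geom_jitter() call, the grouping can be handed to ggplot itself. The sketch below is only one possible approach, not the method used above: it assumes the ggplot2 package is installed, and it uses a made-up data frame `toy` with the column names from the question (any_Pf_inf, logapfhap2, logbpfhap2). The names `long`, `timepoint`, `infected` and `level` are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for data2; only the column names come from the question
set.seed(1)
toy <- data.frame(
  any_Pf_inf = rep(c("Yes", "No"), each = 10),
  logapfhap2 = rnorm(20, mean = 2),
  logbpfhap2 = rnorm(20, mean = 3)
)

# Stack the two time points into one long data frame
long <- rbind(
  data.frame(timepoint = "1",  infected = toy$any_Pf_inf, level = toy$logapfhap2),
  data.frame(timepoint = "14", infected = toy$any_Pf_inf, level = toy$logbpfhap2)
)

# One x position per combination of time point and infection status
p <- ggplot(long, aes(x = interaction(timepoint, infected),
                      y = level, colour = timepoint)) +
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "time point . infection status", y = "log antibody level")

print(p)
```

With the data in long format, a single geom_jitter() layer covers all four groups, and adding another time point only means stacking one more block of rows.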