Labeling outliers from a boxplot in Gnuplot
I've been learning Gnuplot for about a day now and I would like to use a boxplot to spot outliers in a dataset at a glance.
So let's say I am running an experiment:
- With 10 subjects
- Each subject repeats a task 100 times, trying to reach 3 specific targets.
- I record how many times they reach Target1, Target2, Target3.
These results are collected in the data_File_new.dat file shown below:
Name Target1 Target2 Target3
subject1 10 30 50
subject2 11 31 51
subject3 9 29 49
subject4 12 32 52
subject5 8 28 48
subject6 13 33 53
subject7 7 27 47
subject8 50 34 54
subject9 6 50 46
subject10 15 35 20
Now I create a boxplot from this data:
file = 'data_File_new.dat'
header = system('head -1 '.file);
N=words(header)
set title 'BoxPlot Subject Success'
set ylabel 'Number Of Success'
set xtics border in scale 0,0 nomirror norotate offset character 0, 0, 0 autojustify
set xtics norangelimit
set xtics rotate -45
set xtics ('' 2)
set for [i=2:N] xtics add (word(header, i) i)
set style data boxplot
plot for [i=2:N] file using (i):i
The result is a boxplot with the outliers plotted as solid dots (I wanted to post the image, but I need 10 reputation to do that). This tells me whether there are outliers or not. However, I want to know more: I want to know which subjects are the outliers, that is:
- subject8 - outlier for Target1
- subject9 - outlier for Target2
- subject10 - outlier for Target3
Since Gnuplot knows these points are outliers, I expect Gnuplot to store them in some kind of list. I would like to tell Gnuplot to display the outliers and label them with the word from the first column (subjectX) of the row they belong to.
Then, when I look at the plot, I can determine at a glance not only that there are outliers, but also who they are.
Does anyone know how to do this? I looked at the forum and saw that some people are doing this in R, but not in Gnuplot.
This is not the nicest bit of gnuplot code, but it can be done!
Gnuplot's stats command can be used to get the upper and lower quartiles that are used to draw the box. Then you can use some conditional code to plot the out-of-range points with labels. The tricky part is that the plot command is built up as a string and then eval'ed at the end. As I said, not too pretty!
file = 'data_File_new.dat'
header = system('head -1 '.file)
N=words(header)
set title 'BoxPlot Subject Success'
set ylabel 'Number Of Success'
set xtics border in scale 0,0 nomirror norotate offset character 0, 0, 0 autojustify
set xtics norangelimit
set xtics rotate -45
set xtics ('' 2)
set for [i=2:N] xtics add (word(header, i) i)
r = 1.5
set style boxplot range r
unset key
cmd = "plot for [i=2:N] file using (i):i with boxplot"
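do for [i=2:N] {
    # quartiles of column i, skipping the header line
    stats file using i every ::1 nooutput
    lq = STATS_lo_quartile
    uq = STATS_up_quartile
    ir = uq - lq
    # points outside [min, max] are treated as outliers
    min = lq - r * ir
    max = uq + r * ir
    # %f keeps fractional thresholds; %d would truncate them
    cmd = cmd . sprintf(", file using (%d):($%d < %f || $%d > %f ? $%d : 1/0):1 every ::1 with labels offset 5,0", i, i, min, i, max, i)
}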
eval cmd
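To sanity-check which points will end up labelled, it can help to print the thresholds that the loop computes for each column. The following is a minimal sketch, assuming the same data file and range factor as above and that the three target columns are columns 2 to 4; it only prints values and does not plot anything.

```gnuplot
# Sketch: print the outlier thresholds for each target column.
# Assumes data_File_new.dat has a header line and targets in columns 2..4.
file = 'data_File_new.dat'
r = 1.5

do for [i=2:4] {
    # same stats call as in the plotting script (header line skipped)
    stats file using i every ::1 nooutput
    iqr = STATS_up_quartile - STATS_lo_quartile
    lower = STATS_lo_quartile - r * iqr
    upper = STATS_up_quartile + r * iqr
    print sprintf("column %d: lower threshold %.2f, upper threshold %.2f", i, lower, upper)
}
```

Any value in a column that falls below the lower threshold or above the upper threshold is one that the labelled plot should mark with the subject name.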