Labeling outliers in a Gnuplot boxplot

I've been learning Gnuplot for about a day now, and I would like to use a boxplot to spot the outliers in a dataset at a glance.

So let's say I am running an experiment:

  • There are 10 subjects.
  • Each subject repeats a task 100 times, trying to reach 3 specific targets.
  • I record how many times each subject reaches Target1, Target2 and Target3.

The results are stored in the file data_File_new.dat:

    Name       Target1  Target2  Target3
    subject1   10       30       50
    subject2   11       31       51
    subject3   9        29       49
    subject4   12       32       52
    subject5   8        28       48
    subject6   13       33       53
    subject7   7        27       47
    subject8   50       34       54
    subject9   6        50       46
    subject10  15       35       20

I create a boxplot from this data with the following script:

   file = 'data_File_new.dat'
   header = system('head -1 '.file);
   N=words(header)
   set title 'BoxPlot Subject Success'
   set ylabel 'Number Of Success'
   set xtics border in scale 0,0 nomirror norotate  offset character 0, 0, 0 autojustify
   set xtics norangelimit
   set xtics rotate -45
   set xtics ('' 2)
   set for [i=2:N] xtics add (word(header, i) i)
   set style data boxplot
   plot for [i=2:N] file using (i):i

The result is a boxplot in which the outliers are drawn as solid dots (I wanted to post the image, but I need 10 reputation to do so). This tells me whether there are outliers or not. However, I want to know more: I want to know which points are the outliers, that is:

  • subject8 - outlier for Target1
  • subject9 - outlier for Target2
  • subject10 - outlier for Target3

Since Gnuplot knows these points are outliers, I expect it to keep them in some kind of list. I would like Gnuplot to display the outliers and label each one with the word in the first column (subjectX) of the row it belongs to.

Then, when I open the plot, I can see at a glance not only that there are outliers, but also who they are.

Does anyone know how to do this? I searched the forum and found people doing this in R, but not in Gnuplot.



1 answer


This is not the nicest bit of gnuplot code, but it can be done!

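Gnuplot's stats command can be used to get the lower and upper quartiles that define each box. From them you can compute the limits beyond which a point counts as an outlier (r times the interquartile range below the lower quartile or above the upper quartile, with r = 1.5 as in the boxplot style), and then use some conditional code to plot only the out-of-range points with labels. The tricky part is that the plot command is built up as a string and then eval'ed at the end. As I said, not too pretty!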
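The outlier rule itself is easy to check outside gnuplot. Below is a minimal Python sketch (standard library only, Python 3.8+ for statistics.quantiles) that applies the same r × IQR limits to the question's data and reports the subject names. The inlined DATA string and the find_outliers helper are illustrative, not part of the answer, and Python's default quartile interpolation may differ slightly from gnuplot's stats, so on other data the limits can shift a little.

```python
import statistics

# Illustrative inline copy of data_File_new.dat (header row + one row per subject)
DATA = """\
Name Target1 Target2 Target3
subject1 10 30 50
subject2 11 31 51
subject3 9 29 49
subject4 12 32 52
subject5 8 28 48
subject6 13 33 53
subject7 7 27 47
subject8 50 34 54
subject9 6 50 46
subject10 15 35 20
"""


def find_outliers(text, r=1.5):
    """Map each target column to the subjects lying outside its r*IQR limits."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    result = {}
    for col in range(1, len(header)):
        values = [float(row[col]) for row in body]
        # Lower/upper quartiles; Python's default 'exclusive' method is an
        # assumption and may not match gnuplot's quartile convention exactly
        lq, _, uq = statistics.quantiles(values, n=4)
        iqr = uq - lq
        low, high = lq - r * iqr, uq + r * iqr
        # Keep the name from column 1 for every value outside [low, high]
        result[header[col]] = [row[0] for row, v in zip(body, values)
                               if v < low or v > high]
    return result


if __name__ == "__main__":
    for target, names in find_outliers(DATA).items():
        print(target, names)
```

Run as a script, it prints Target1 ['subject8'], Target2 ['subject9'] and Target3 ['subject10'], which are the three outliers listed in the question.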



file = 'data_File_new.dat'
header = system('head -1 '.file)
N=words(header)
set title 'BoxPlot Subject Success'
set ylabel 'Number Of Success'
set xtics border in scale 0,0 nomirror norotate  offset character 0, 0, 0 autojustify
set xtics norangelimit
set xtics rotate -45
set xtics ('' 2)
set for [i=2:N] xtics add (word(header, i) i)
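r = 1.5                         # whisker range in interquartile ranges (gnuplot's default)
set style boxplot range r
unset key
# Start with the boxplots themselves; one labels plot per column is appended below
cmd = "plot for [i=2:N] file using (i):i with boxplot"
do for [i=2:N] {
    # Quartiles of column i, skipping the header line
    stats file using i every ::1 nooutput
    lq = STATS_lo_quartile
    uq = STATS_up_quartile
    ir = uq - lq
    # Points outside [min, max] are treated as outliers (the r*IQR boxplot rule)
    min = lq - r * ir
    max = uq + r * ir
    # Plot only out-of-range points (1/0 skips the rest), labeled with column 1;
    # %g instead of %d so that fractional limits are not truncated to integers
    cmd = cmd . sprintf(", file using (%d):($%d < %g || $%d > %g ? $%d : 1/0):1 every ::1 with labels offset 5,0", i, i, min, i, max, i)
}
eval cmd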

(Image: final plot with labeled outliers)
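Because the plot command is assembled as a string before eval runs it, it can help to see what that string looks like. This is a minimal Python sketch that reproduces the answer's sprintf pattern; build_plot_command is a hypothetical helper, and the limits passed in are illustrative values, not output taken from gnuplot.

```python
def build_plot_command(fences):
    """Assemble the gnuplot plot string that the answer's loop passes to eval.

    fences: hypothetical mapping {column number: (min, max)}.
    """
    cmd = "plot for [i=2:N] file using (i):i with boxplot"
    for i, (lo, hi) in sorted(fences.items()):
        # Same format string as the gnuplot sprintf call, argument order i, i, min, i, max, i
        cmd += (", file using (%d):($%d < %g || $%d > %g ? $%d : 1/0):1"
                " every ::1 with labels offset 5,0" % (i, i, lo, i, hi, i))
    return cmd


if __name__ == "__main__":
    print(build_plot_command({2: (-0.875, 22.125)}))
```

For column 2 with limits (-0.875, 22.125) this yields `plot for [i=2:N] file using (i):i with boxplot, file using (2):($2 < -0.875 || $2 > 22.125 ? $2 : 1/0):1 every ::1 with labels offset 5,0`. With %d in place of %g, Python would print the limits as 0 and 22: harmless for integer counts like these, but it shifts the limits for fractional data.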
