Log-log cumulative probability plot in R

I'm sure this is easy, but I've been tearing my hair out trying to figure out how to do it in R.

I have some data that I am trying to fit to a power-law distribution. To check the fit, you plot the data as a cumulative probability plot on log-log axes: the y-axis is the log of the cumulative frequency (or log probability, if you like) and the x-axis is the log of the values. If the points fall on a straight line, the data follow a power law, and the slope determines the power-law exponent.
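
A minimal sketch of that straight-line check, on simulated data rather than the real profits. It assumes a continuous power law with xmin = 1 and exponent alpha = 2.5 (both made-up values) and plots the complementary CDF, P(X >= x). For a power law, log P(X >= x) is linear in log x with slope 1 - alpha, whereas log P(X <= x) is not linear. The least-squares slope is only a rough visual check, not a recommended estimator of alpha.

```r
set.seed(42)

# Assumed parameters for the simulated data (illustrative only)
xmin  <- 1
alpha <- 2.5
n     <- 10000

# Inverse-transform sampling from a continuous power law (Pareto)
# with density proportional to x^(-alpha) for x >= xmin
x <- xmin * runif(n)^(-1 / (alpha - 1))

# Empirical complementary CDF: the i-th smallest value has
# (n - i + 1) of the n observations at or above it
xs   <- sort(x)
ccdf <- (n - seq_len(n) + 1) / n

# On log-log axes the points should fall close to a straight line
plot(xs, ccdf, log = "xy", xlab = "x", ylab = "P(X >= x)")

# Least-squares line through log(ccdf) vs log(x); its slope should be near 1 - alpha
fit   <- lm(log(ccdf) ~ log(xs))
slope <- unname(coef(fit)[2])
lines(xs, exp(fitted(fit)), col = 2)
slope
```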

To get the cumulative frequency of the data, I can just use the ecdf() function.

My dataset is called Profits.negative. It's just a long list of trade profits that were less than zero (I converted them all to positive numbers to avoid problems with taking logs later).
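
That preprocessing step might look like the sketch below; profits is a hypothetical vector of made-up numbers standing in for the real trade data.

```r
# Hypothetical trade profits (made-up numbers for illustration)
profits <- c(120.5, -35.2, 10.0, -8.75, -140.0, 55.3, -0.4, 0)

# Keep only the losing trades and flip their sign so that log() is defined;
# zero is excluded because log(0) is -Inf
Profits.negative <- -profits[profits < 0]
Profits.negative
```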

So I can plot:

plot(ecdf(Profits.negative))

And I get a handy empirical CDF. Now all I have to do is put both axes on a logarithmic scale. I can do the x-axis:

Profits.negative.logs <- log(Profits.negative)
plot(ecdf(Profits.negative.logs))
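
Logging the data before calling ecdf() gives the same curve as putting the x-axis on a log scale, because log() is strictly increasing. A quick check on simulated data (v, F_raw, F_log and cutoffs are illustrative names):

```r
set.seed(1)
# Simulated positive data standing in for Profits.negative
v <- rlnorm(500, meanlog = 3, sdlog = 1)

F_raw <- ecdf(v)        # ECDF of the original values
F_log <- ecdf(log(v))   # ECDF of the logged values

# log() is strictly increasing, so P(log V <= log t) equals P(V <= t)
cutoffs <- c(5, 20, 50, 200)
all.equal(F_log(log(cutoffs)), F_raw(cutoffs))  # TRUE
```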

Almost there! I just need to figure out how to log the y-axis! But I can't seem to do it, and I can't figure out how to extract the values from the ecdf object. Can anyone please help?

I know there is a power.law.fit function, but that only estimates the parameters; I want to plot the data and see whether it falls on a straight line.

3 answers

You can fit and plot power laws using the poweRlaw package. Here's an example. First, we generate some data from a heavy-tailed distribution:

set.seed(1)
x = round(rlnorm(100, 3, 2)+1)

Then we load the package and create a discrete power-law (displ) object from the data:

library(poweRlaw)
m = displ$new(x)

We can also estimate xmin and the scaling parameter:

est = estimate_xmin(m)

and then set the parameters:

m$setXmin(est[[2]])
m$setPars(est[[3]])
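
estimate_xmin() chooses xmin by a goodness-of-fit criterion. As far as I know, poweRlaw follows the approach of Clauset, Shalizi and Newman: for each candidate xmin, fit the exponent by maximum likelihood to the tail x >= xmin, and keep the candidate whose fitted CDF has the smallest Kolmogorov-Smirnov distance to the data. The base-R sketch below illustrates that idea for a continuous power law. The helper names (fit_alpha, ks_distance, estimate_xmin_sketch) and the min_tail cut-off are hypothetical, and because displ is a discrete model, poweRlaw's estimates will not match these numbers.

```r
# Hypothetical helper: maximum-likelihood exponent of a continuous
# power law fitted to the tail x >= xmin
fit_alpha <- function(x, xmin) {
  x_tail <- x[x >= xmin]
  1 + length(x_tail) / sum(log(x_tail / xmin))
}

# Hypothetical helper: Kolmogorov-Smirnov distance between the tail's
# empirical CDF and the fitted power-law CDF 1 - (x / xmin)^(1 - alpha)
ks_distance <- function(x, xmin) {
  x_tail <- sort(x[x >= xmin])
  alpha  <- fit_alpha(x, xmin)
  fitted <- 1 - (x_tail / xmin)^(1 - alpha)
  i      <- seq_along(x_tail)
  n      <- length(x_tail)
  # check both sides of each step of the empirical CDF
  max(abs(i / n - fitted), abs((i - 1) / n - fitted))
}

# Hypothetical helper: try each observed value as xmin (keeping at least
# min_tail points in the tail) and return the one with the smallest KS distance
estimate_xmin_sketch <- function(x, min_tail = 500) {
  candidates <- sort(unique(x))
  keep <- vapply(candidates, function(xm) sum(x >= xm) >= min_tail, logical(1))
  candidates <- candidates[keep]
  d <- vapply(candidates, function(xm) ks_distance(x, xm), numeric(1))
  best <- candidates[which.min(d)]
  list(xmin = best, alpha = fit_alpha(x, best), ks = min(d))
}

# Usage on simulated data: continuous power law with xmin = 1, alpha = 2.5
set.seed(2)
x_sim <- runif(2000)^(-1 / 1.5)
est_sketch <- estimate_xmin_sketch(x_sim)
est_sketch$xmin
est_sketch$alpha
```

The min_tail guard is a design choice of this sketch: without it, very small tails can produce spuriously small KS distances and unstable exponent estimates.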

Then plot the data and add the fitted line:

plot(m)
lines(m, col=2)

To obtain:

[Figure: the resulting plot of the data with the fitted power-law line]

First, some generated data (you already have your own, of course ;)):

set.seed(1)
Profits.negative <- runif(1e3, 50, 100) + rnorm(1e3, 5, 5)

Taking logs and computing the ecdf:

Profits.negative.logs <- log(Profits.negative)
fn <- ecdf(Profits.negative.logs)

ecdf returns a function, so if you want to extract something from it, it's a good idea to look into the function's closure (its environment):

ls(environment(fn))
# [1] "f"      "method" "n"      "nobs"   "x"      "y"      "yleft"  "yright"

So now we can access x and y:

x <- environment(fn)$x
y <- environment(fn)$y
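
As an alternative to reaching into the closure, base R's knots() returns the jump points of a step function such as an ecdf, and calling the ecdf on those points gives the corresponding cumulative probabilities. A small check on simulated data (v and fn_check are illustrative names) comparing the two routes:

```r
set.seed(1)
# Simulated stand-in for the logged profits
v <- log(runif(1e3, 50, 100) + rnorm(1e3, 5, 5))
fn_check <- ecdf(v)

x_knots <- knots(fn_check)     # sorted unique values (the jump points)
y_knots <- fn_check(x_knots)   # P(V <= x) at each jump point

# Both agree with the values stored in the function's environment
all.equal(x_knots, environment(fn_check)$x)  # TRUE
all.equal(y_knots, environment(fn_check)$y)  # TRUE
```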

This is probably what you need. Indeed, plot(fn) and plot(x,y,type="l") show practically the same result. To log the y-axis, you just need:

plot(x,log(y),type="l")
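
For a power law, the cumulative curve that should come out straight on log-log axes is the complementary one, P(X >= x), rather than P(X <= x). A minimal sketch of getting it from an ecdf object and letting plot() do the log scaling through log = "xy". It uses simulated, rounded data so that tied values occur (values, fn_tail, x_k, p_le and p_ge are illustrative names):

```r
set.seed(1)
# Simulated heavy-tailed, rounded data (so there are ties)
values  <- round(rlnorm(100, 3, 2) + 1)
fn_tail <- ecdf(values)

x_k  <- knots(fn_tail)   # sorted unique values
p_le <- fn_tail(x_k)     # P(X <= x_k)

# P(X >= x_k) = 1 - P(X <= x_{k-1}); it equals 1 at the smallest value
p_ge <- 1 - c(0, head(p_le, -1))

# Log-scale both axes directly instead of taking logs by hand
plot(x_k, p_ge, log = "xy", xlab = "x", ylab = "P(X >= x)")
```

Unlike 1 - P(X <= x), this never reaches zero at the largest value, so no point is lost when the y-axis is logged.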

Here's an approach using ggplot2:

library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format() and math_format()

# data
set.seed(1)
x <- round(rlnorm(100, 3, 2) + 1)

# organize data into a df: values in decreasing order, their rank k,
# and k / n, the empirical P(X >= x) (exact when there are no ties)
df <- data.frame(x = sort(x, decreasing = TRUE),
                 k = seq_along(x))
df$pk <- df$k / nrow(df)

# plot P(X >= x) against x on log-log axes
ggplot(df, aes(x = x, y = pk)) + geom_point(alpha = 0.5) +
  coord_trans(x = 'log10', y = 'log10') +
  scale_x_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  scale_y_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))