Log-log probability chart in R
I'm sure it's easy, but I was ripping my hair out trying to figure out how to do it in R.
I have some data that I am trying to fit to a power-law distribution. To check this, you plot the data on a log-log cumulative probability chart: the y-axis is the log of the cumulative frequency of the data (or log probability, if you like) and the x-axis is the log of the values. If the points form a straight line, the data follow a power law, and the slope determines the power-law parameter.
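The straight-line check is usually done on the complementary CDF P(X >= x) = 1 - F(x): for a continuous power law with density p(x) proportional to x^(-alpha) for x >= xmin, P(X >= x) = (x / xmin)^(-(alpha - 1)), so the slope on log-log axes is -(alpha - 1). A minimal sketch on simulated data; alpha = 2.5 and xmin = 1 are assumed values, not taken from the question's data:

```r
# Sketch on simulated data: a continuous power law (Pareto) with assumed
# exponent alpha and lower cut-off xmin, drawn by inverse-transform sampling.
set.seed(1)
alpha <- 2.5   # assumed: density p(x) proportional to x^(-alpha)
xmin  <- 1     # assumed lower cut-off
n     <- 10000
x <- xmin * runif(n)^(-1 / (alpha - 1))

# Empirical CCDF P(X >= x) at the sorted sample points: n/n, (n-1)/n, ..., 1/n
xs   <- sort(x)
ccdf <- (n:1) / n

# log = "xy" puts both axes on a log scale
plot(xs, ccdf, log = "xy", pch = ".", xlab = "x", ylab = "P(X >= x)")

# Least-squares line in log10-log10 coordinates; on a log-scaled plot,
# abline() reads intercept and slope in log10 units, so the fit overlays directly
fit <- lm(log10(ccdf) ~ log10(xs))
abline(fit, col = 2)
slope <- unname(coef(fit)[2])
slope   # should be close to -(alpha - 1) = -1.5
```

A least-squares line through these points is fine as a visual check, but for actually estimating alpha the usual recommendation is maximum likelihood rather than regression on the log-log plot.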
To get the cumulative frequency of the data, I can just use the ecdf() function.
My dataset is called Profits.negative and it's just a long list of trade profits that were less than zero (and I basically converted them all to positive numbers to avoid log issues later).
So I can plot:
plot(ecdf(Profits.negative))
And I get a handy empirical CDF. Now all I have to do is put both axes on a logarithmic scale. I can do the x-axis:
Profits.negative.logs <- log(Profits.negative)
plot(ecdf(Profits.negative.logs))
Almost there! I just need to figure out how to log the y-axis! But I can't seem to do it, and I can't figure out how to extract the values from the ecdf object. Can anyone please help?
I know there is a power.law.fit function, but that just estimates the parameters; I want to plot the data and see if it lines up.
You can fit and plot power laws using the poweRlaw package. Here's an example. First, we generate some data from a heavy-tailed distribution:
set.seed(1)
x = round(rlnorm(100, 3, 2)+1)
Then we load the package and create a discrete power-law (displ) object:
library(poweRlaw)
m = displ$new(x)
We can also estimate xmin and the scaling parameter:
est = estimate_xmin(m)
and set the parameters:
m$setXmin(est[[2]])
m$setPars(est[[3]])
Then plot the data and add the fitted line:
plot(m)
lines(m, col=2)
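For intuition about the scaling parameter: for a continuous power law with xmin known, the maximum-likelihood estimate has the closed form alpha_hat = 1 + n / sum(log(x_i / xmin)), with standard error roughly (alpha_hat - 1) / sqrt(n). The discrete power law used by displ has no such closed form, so this sketch uses a continuous Pareto sample (alpha = 2.5 and xmin = 1 are assumed values) to illustrate the idea; it is not what estimate_xmin() computes internally.

```r
# Sketch: closed-form MLE of the exponent of a continuous power law
# p(x) proportional to x^(-alpha) for x >= xmin, with xmin treated as known.
set.seed(1)
alpha_true <- 2.5   # assumed value for the simulation
xmin <- 1           # assumed lower cut-off
x <- xmin * runif(10000)^(-1 / (alpha_true - 1))  # Pareto sample via inverse transform

# Only the tail x >= xmin enters the estimate
tail_x <- x[x >= xmin]
n <- length(tail_x)
alpha_hat <- 1 + n / sum(log(tail_x / xmin))

# Approximate standard error of alpha_hat
se <- (alpha_hat - 1) / sqrt(n)
c(alpha_hat = alpha_hat, se = se)
```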
Generating some data first (you have your own, of course ;)):
set.seed(1)
Profits.negative <- runif(1e3, 50, 100) + rnorm(1e2, 5, 5)
Taking logs and computing the ecdf:
Profits.negative.logs <- log(Profits.negative)
fn <- ecdf(Profits.negative.logs)
ecdf returns a function, so if you want to extract something from it, it's a good idea to look inside the function's closure (its enclosing environment):
ls(environment(fn))
# [1] "f" "method" "n" "nobs" "x" "y" "yleft" "yright"
So now we can access x and y:
x <- environment(fn)$x
y <- environment(fn)$y
This is probably what you need. Indeed, plot(fn) and plot(x, y, type = "l") show practically the same result. To put the y-axis on a log scale as well, you just need:
plot(x,log(y),type="l")
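An alternative sketch that avoids reaching into the closure: knots() is the stats generic for step functions (ecdf objects included) and returns the jump points, and fn() evaluated there gives the cumulative probabilities. Passing log = "xy" to plot() then puts both axes on a log scale, so there is no need to take logs by hand. This reuses the same made-up data as above.

```r
# Same made-up data as in this answer
set.seed(1)
Profits.negative <- runif(1e3, 50, 100) + rnorm(1e2, 5, 5)
fn <- ecdf(Profits.negative)

v  <- knots(fn)   # sorted unique data values (the jump points of the ecdf)
Fv <- fn(v)       # empirical CDF at those values, ending at 1

# Let plot() do the log scaling on both axes instead of logging by hand
plot(v, Fv, log = "xy", type = "s", xlab = "value", ylab = "F(value)")
```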
Here's an approach using ggplot2:
library(ggplot2)
library(scales)  # for trans_breaks(), trans_format() and math_format()
# data
set.seed(1)
x = round(rlnorm(100, 3, 2)+1)
# organize data into a df: values sorted in decreasing order, their empirical CDF, and their rank
xs <- sort(x, decreasing = TRUE)
df <- data.frame(x = xs,
pk = ecdf(xs)(xs),
k = seq_along(xs))
# plot the empirical CDF against the values, both axes on log10 scales
ggplot(df, aes(x = x, y = pk)) + geom_point(alpha = 0.5) +
coord_trans(x = 'log10', y = 'log10') +
scale_x_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
scale_y_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))