Measure how spread out data is in an array

I have an array of zeros and ones and I need to know if the data is spread across columns or concentrated in clumps.

For example:

If I have an array x with the following values:

Column 1 values: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1

Column 2 values: 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1

If we count the ones, both columns have the same number (six), but the ones in column 2 are much better spread out than those in column 1.

I am trying to build a score that gives a high value if the spread is good and a low value if the spread is bad. Any ideas?

Sample data:

1 0 0 0 5 0 -2 -3  0 0 1
1 0 0 0 0 0  0  0  0 0 1
2 0 0 0 0 0  0  3 -3 1 0
1 2 3 0 5 0  2 13  4 5 1
1 0 0 0 0 0 -4 34  0 0 1

      



2 answers


I think you are trying to measure the variance of the number of 0s between the 1s, i.e.:

f = @(x)std(diff(find(x)))
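For readers outside MATLAB, here is a minimal Python translation of that one-liner, a sketch using only the standard library. The function name `gap_spread` is hypothetical. Like MATLAB's `std`, `statistics.stdev` normalizes by n - 1, and the 0-based indices from `enumerate` do not change the distances between ones.

```python
import statistics

def gap_spread(x):
    """Sample std of the gaps between successive nonzeros in x.

    Mirrors MATLAB's std(diff(find(x))): find -> positions of the ones,
    diff -> distances between consecutive ones, std -> sample std (n - 1).
    Lower values mean more evenly spaced ones.
    """
    idx = [i for i, v in enumerate(x) if v != 0]   # find(x), 0-based
    gaps = [q - p for p, q in zip(idx, idx[1:])]    # diff(...)
    if not gaps:
        return float("nan")  # fewer than two ones: no gaps to measure
    if len(gaps) == 1:
        return 0.0           # MATLAB's std of a single value is 0
    return statistics.stdev(gaps)

# Column 1 from the question: two ones, 18 zeros, four ones
A = [1, 1,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1]
# Column 2 from the question: each row below starts at a 1
B = [1, 0, 0,
     1, 0, 0, 0, 0, 0, 0, 0,
     1, 0, 0, 0, 0,
     1, 0, 0,
     1, 0, 0, 0,
     1]

print(round(gap_spread(A), 4))  # 8.0498
print(round(gap_spread(B), 4))  # 2.0736
```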

      

So for your data:

a = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
b = [1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1]

f(a)
ans = 8.0498

f(b)
ans = 2.0736
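For reference, this is the arithmetic behind those two numbers. Here d is the vector of gaps between successive ones (1-based positions as returned by `find`), and MATLAB's `std` normalizes by n - 1 by default:

```latex
\begin{aligned}
a &: \text{ones at } 1,2,21,22,23,24,\quad d = (1,19,1,1,1),\quad \bar d = 23/5 = 4.6\\
s_a &= \sqrt{\frac{4(1-4.6)^2 + (19-4.6)^2}{5-1}} = \sqrt{\frac{51.84 + 207.36}{4}} = \sqrt{64.8} \approx 8.0498\\
b &: \text{ones at } 1,4,12,17,20,24,\quad d = (3,8,5,3,4),\quad \bar d = 23/5 = 4.6\\
s_b &= \sqrt{\frac{2(3-4.6)^2 + (8-4.6)^2 + (5-4.6)^2 + (4-4.6)^2}{5-1}} = \sqrt{\frac{17.2}{4}} = \sqrt{4.3} \approx 2.0736
\end{aligned}
```

Both columns have the same mean gap, because the mean of the gaps is always (last position - first position) / (number of gaps) = 23/5 here. Only the variability of the gaps differs, which is exactly what `std` picks up.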

      

That said, I suspect what you are really trying to measure is the disorder of the system, which I imagine is what entropy measures do, but I don't know exactly how to apply them here.
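The answer leaves the entropy idea open. One possible way to make it concrete (an assumption, not something the answer specifies) is to split the column into equal-width bins, count the ones in each bin, and take the Shannon entropy of those proportions: evenly spread ones push the entropy toward log2(n_bins), while clumps lower it. Unlike the std score, this one is high when the spread is good, which matches the question. The function name `bin_entropy` and the default of 4 bins are hypothetical choices, and the score depends on the bin count.

```python
import math

def bin_entropy(x, n_bins=4):
    """Shannon entropy (bits) of how the ones in x fall into n_bins
    equal-width bins. Higher = more even spread; max is log2(n_bins)."""
    n = len(x)
    counts = [0] * n_bins
    for i, v in enumerate(x):
        if v:
            counts[i * n_bins // n] += 1   # bin index of position i
    total = sum(counts)
    if total == 0:
        return 0.0                         # no ones at all
    h = 0.0
    for c in counts:
        if c:                              # empty bins contribute 0
            p = c / total
            h -= p * math.log2(p)
    return h

# Columns 1 and 2 from the question
A = [1, 1,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1]
B = [1, 0, 0,
     1, 0, 0, 0, 0, 0, 0, 0,
     1, 0, 0, 0, 0,
     1, 0, 0,
     1, 0, 0, 0,
     1]

print(round(bin_entropy(A), 4))  # 0.9183
print(round(bin_entropy(B), 4))  # 1.9183
```

With 4 bins of 6 entries, column 1 has bin counts [2, 0, 0, 4] and column 2 has [2, 1, 1, 2], so column 2 scores higher, out of a maximum of 2 bits.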



Note that this gives a low value if the "spread" is good and a high value if it is bad (i.e. the opposite of what you asked for).

Also, if you want it for each column of a matrix, it gets a little more complicated:

f = @(x)arrayfun(@(y)std(diff(find(x(:,y)))), 1:size(x,2))
data = [a', b'];
f(data)
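A Python sketch of the per-column version, assuming the matrix is stored as a list of rows; `zip(*matrix)` plays the role of looping over `x(:,y)` in the `arrayfun` call. The names `gap_spread` and `gap_spread_per_column` are hypothetical.

```python
import statistics

def gap_spread(col):
    """Sample std of the gaps between successive nonzeros (std(diff(find(col))))."""
    idx = [i for i, v in enumerate(col) if v != 0]
    gaps = [q - p for p, q in zip(idx, idx[1:])]
    if len(gaps) < 2:
        return 0.0 if gaps else float("nan")
    return statistics.stdev(gaps)

def gap_spread_per_column(matrix):
    """Apply gap_spread to every column of a row-major matrix."""
    return [gap_spread(col) for col in zip(*matrix)]

# Columns 1 and 2 from the question
A = [1, 1,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1]
B = [1, 0, 0,
     1, 0, 0, 0, 0, 0, 0, 0,
     1, 0, 0, 0, 0,
     1, 0, 0,
     1, 0, 0, 0,
     1]

data = [list(row) for row in zip(A, B)]   # like [a', b']: A and B become columns
print([round(v, 4) for v in gap_spread_per_column(data)])  # [8.0498, 2.0736]
```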

      

WARNING: This method largely ignores leading and trailing 0s. I don't know whether that is a problem for you, but basically f([0; 0; 0; 1; 1; 1; 0; 0; 0]) returns 0, whereas f([1; 0; 0; 1; 0; 1; 0; 0; 0]) returns a positive value, (incorrectly) indicating that the first case is more spread out. One possible fix could be to prepend and append a row of ones to the matrix...
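A minimal Python sketch of that fix for a single column. Padding each column with a 1 at both ends is the same as prepending and appending a row of ones to the matrix. The name `padded_gap_spread` is hypothetical, and padding is the answer's suggestion, checked here only on its own two examples.

```python
import statistics

def gap_spread(x):
    """Sample std of the gaps between successive nonzeros (std(diff(find(x))))."""
    idx = [i for i, v in enumerate(x) if v != 0]
    gaps = [q - p for p, q in zip(idx, idx[1:])]
    if len(gaps) < 2:
        return 0.0 if gaps else float("nan")
    return statistics.stdev(gaps)

def padded_gap_spread(x):
    """Pad x with a 1 at both ends so leading/trailing zeros count as gaps."""
    return gap_spread([1, *x, 1])

clumped = [0, 0, 0, 1, 1, 1, 0, 0, 0]
spread = [1, 0, 0, 1, 0, 1, 0, 0, 0]

# Unpadded: the clumped column wrongly looks better (lower) than the spread one
print(gap_spread(clumped), round(gap_spread(spread), 4))   # 0.0 0.7071
# Padded: the ordering flips, so the spread column now scores lower
print(round(padded_gap_spread(clumped), 4), round(padded_gap_spread(spread), 4))  # 1.7321 1.291
```

Padding turns the clumped example's gaps into (4, 1, 1, 4) and the spread example's into (1, 3, 2, 4), so the outer runs of zeros now contribute to the score.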



I think you will need an interval to measure the "spread" locally; otherwise sample 1 (named Column 1 in the question) will look well spread because of the long run of zeros between its 2nd and 3rd ones.

So, following this theory and assuming input_array is the input array, you can try this approach:

intv = 10; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc(diff_loc<=intv)) %// desired output/score
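The same idea as a Python sketch. The function name `spread_factor_sum` is hypothetical, and `intv` defaults to the answer's value of 10. Indices are 0-based here, which does not change the distances between ones.

```python
def spread_factor_sum(x, intv=10):
    """Sum of the distances between successive ones that are <= intv.

    Distances longer than intv (e.g. one big hole between two clumps)
    are dropped, and tightly packed ones only add small distances.
    """
    idx = [i for i, v in enumerate(x) if v != 0]    # find(input_array)
    gaps = [q - p for p, q in zip(idx, idx[1:])]     # diff(...)
    return sum(g for g in gaps if g <= intv)         # sum(diff_loc(diff_loc<=intv))

# Samples 1 and 2 (columns 1 and 2 from the question)
A = [1, 1,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1]
B = [1, 0, 0,
     1, 0, 0, 0, 0, 0, 0, 0,
     1, 0, 0, 0, 0,
     1, 0, 0,
     1, 0, 0, 0,
     1]

print(spread_factor_sum(A), spread_factor_sum(B))  # 4 23
```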

      

For sample 1, spread_factor gives 4, and for sample 2, it gives 23.




Another theory you could use: fix an interval and count how many distances between successive ones are greater than or equal to it. That would lead to code like this:

intv = 3; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc>=intv)
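A Python sketch of this counting variant. The function name `spread_factor_count` is hypothetical, and `intv` defaults to the answer's value of 3.

```python
def spread_factor_count(x, intv=3):
    """Number of distances between successive ones that are >= intv."""
    idx = [i for i, v in enumerate(x) if v != 0]    # find(input_array)
    gaps = [q - p for p, q in zip(idx, idx[1:])]     # diff(...)
    return sum(1 for g in gaps if g >= intv)         # sum(diff_loc>=intv)

# Samples 1 and 2 (columns 1 and 2 from the question)
A = [1, 1,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1]
B = [1, 0, 0,
     1, 0, 0, 0, 0, 0, 0, 0,
     1, 0, 0, 0, 0,
     1, 0, 0,
     1, 0, 0, 0,
     1]

print(spread_factor_count(A), spread_factor_count(B))  # 1 5
```

Because this counts distances instead of summing them, its maximum is the number of ones minus one, reached when no two ones are closer than `intv`.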

      

With this new approach, spread_factor is 1 for sample 1 and 5 for sample 2.
