The certainty of the data distribution profile when performing a sorting operation

Suppose we have some data structure such as an array of n records, and for the arguments, let's assume the data has limited numeric values.

Is there a way to define the profile of the data, e.g. monotonically increasing, decreasing, etc. reasonably, perhaps with a specific z-value having k checked records in the data structure?

+3


source to share


2 answers


Assuming we have an array of size N, this means we are mapping N-1 between each adjacent element in the array. Let M = N-1. M represents the number of relationships. The probability that the array is not in the correct order is

1/M

      

If you choose a subset of the K ratios to define monotonically increasing or descending, the theoretical probability of confidence is

K / M

      



Since these are two linear equations, it is easy to see that if you want to be sure .9, you will need to check about 90% of the entries.

This only takes into account the assumptions in your question. If you know about the probability distribution, then using statistics, you can accidentally check a small percentage of the array.

If you don't care that the array is in relative order (for example, between [0,10], most will be close to the beginning.), That's another question. An algorithm that does this, not just sorting, must have a high cost for the items in exchange and a cheap cost for comparison. Otherwise, there will be no payback from performing a complex verification processing algorithm.

It's important to note that this is theoretical. I do not accept any distribution in an array.

0


source


A simpler task is to test the likelihood of such ordered behavior colliding with random data.

Eg. If the numbers are randomly placed, then p = 0.5, which is less than the second number (we'll get to the repetition case later). Now, if you produce k pairs and in each case the first number is less than the second number, the probability of seeing it is 2 ^ (- k).



Returning to reps, keep track of observed reps and account for them. For example. If the probability of repetition is q, the probability of non-observance of repetitions is (1-q), the probability of observing an increase or equality is q + (1-q) / 2, so exponentially with k to get the problem.

0


source







All Articles