Efficient memory structure for keeping track of a subset of an array in original order

I am using C#, but I don't think this is a language-specific question.

I am working on a data structure to keep track of a subset of a large array. For example, I have a changing array of characters and I want to keep track of the vowels in it. I want to keep track of them in such a way that their original order is maintained.

To illustrate, let's say that the character array is currently: [A, B, D, C, I, A, E, F]. The vowel subset I want would be [A, I, A, E]. If after some time the character array changes to [T, B, D, C, I, A, E, F] (the first element changed from A to T), the vowel subset will then become [I, A, E].

The vowel subset will often be randomly accessed, as if it were an array: vowels[0], vowels[3], etc.

So I can summarize the requirements for my data structure:

1) memory efficiency - both the base array and the subset can be large; we are talking about a million records.

2) the original order of elements in the underlying array must be maintained in the subset.

3) quick access speed. I will use the subset the same way I would use an array.

4) removal and insertion must be efficient. I get change notifications from the underlying array - e.g. when the i-th character in the underlying array changes, I get a notification like "the i-th element has changed from A to B". In response, I need to insert or remove the corresponding element in the subset.

5) if it matters, I prefer a faster delete and can give up insert performance. The nature of our application has shown me that subset inserts are much less frequent than deletions, and usually happen at the tail. Deletions, which happen a lot, are always at the head or tail of the subset.
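To make requirement (4) concrete, here is a naive baseline sketch: the subset is a plain `List<char>`, and the position lookup scans the prefix in O(n). The names `OnChanged` and `SubsetPositionOf` are hypothetical, my own illustration; the whole point of the question is to beat this O(n) mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class VowelSubsetDemo
{
    static readonly char[] Base = { 'A', 'B', 'D', 'C', 'I', 'A', 'E', 'F' };
    static readonly List<char> Vowels = Base.Where(IsVowel).ToList();

    static bool IsVowel(char c) => "AEIOU".IndexOf(char.ToUpperInvariant(c)) >= 0;

    // Naive O(n) mapping from a base-array index to a subset position:
    // count the vowels strictly before index i.
    static int SubsetPositionOf(int i) => Base.Take(i).Count(IsVowel);

    // Handle the notification "element i changed from Base[i] to newC".
    static void OnChanged(int i, char newC)
    {
        bool wasVowel = IsVowel(Base[i]);
        bool isVowel = IsVowel(newC);
        int pos = SubsetPositionOf(i);

        if (wasVowel && !isVowel) Vowels.RemoveAt(pos);      // left the subset
        else if (!wasVowel && isVowel) Vowels.Insert(pos, newC); // joined it
        else if (wasVowel && isVowel) Vowels[pos] = newC;    // stays, value updated

        Base[i] = newC;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Vowels)); // A,I,A,E
        OnChanged(0, 'T');                           // first element A -> T
        Console.WriteLine(string.Join(",", Vowels)); // I,A,E
    }
}
```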

PS. I've seen a clever way to quickly delete an array element: keep a count of the number of elements in the array, and when deleting an element, overwrite it with the last element and decrement the counter. This makes deletion an O(1) operation. It wastes memory by never shrinking the array, but I'm satisfied with that, since the data structure is just an array - it's quite compact. The only problem with this approach is that it violates requirement (2): the order of the elements in the subset changes after a deletion.
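The swap-with-last trick looks like the following minimal sketch (the class name and layout are my own illustration); the demo shows how a single delete breaks the original order:

```csharp
using System;
using System.Linq;

class CompactArray
{
    private readonly char[] items;
    private int count;

    public CompactArray(char[] initial)
    {
        items = (char[])initial.Clone();
        count = initial.Length;
    }

    public int Count => count;

    // O(1) delete: overwrite the removed slot with the last live element
    // and shrink the logical size. Memory is never reclaimed, and the
    // original order is NOT preserved -- this violates requirement (2).
    public void RemoveAt(int index)
    {
        items[index] = items[count - 1];
        count--;
    }

    public override string ToString() => string.Join(",", items.Take(count));
}

class Demo
{
    static void Main()
    {
        var a = new CompactArray(new[] { 'A', 'I', 'A', 'E' });
        a.RemoveAt(0);          // delete the first vowel in O(1)
        Console.WriteLine(a);   // E,I,A -- order is broken
    }
}
```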

Edit: After reading several answers, I realize that I can ask the question in a more interesting way (at least I think it's more interesting :)):

I definitely agree that a counted B-tree would be a working solution. But I don't need to support: 1) finding items - e.g. I don't need to find where the first "A" is in my subset; 2) sorting - all I want is to keep the original order.

It appears I don't need any element comparisons at all. I know that most sorted data structures are based on element comparisons, and that for those the optimal complexity is O(log n). I am wondering whether it is possible to improve the complexity of any of the three operations (random access, insert, delete), or reduce the memory complexity, given that I don't need any comparisons?

+3




1 answer


I think you need an order-statistic balanced binary tree, since it maintains the order of elements and supports insertion and deletion in O(log n). All search, insert, and delete operations are O(log n).

Algorithm:



1. Store the required values in the tree as <index, vowel> pairs.
2. Keep the index as the key of each tree node.
3. You can look up the nth element of the tree in O(log n).
4. You can delete an element in O(log n).
5. You can insert an element in O(log n).
6. The space requirement is O(n), plus extra memory for the subtree-size variables.
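One concrete way to build the tree described above is a size-augmented treap (an "implicit treap"): each node stores its subtree size, so the k-th element is found by rank alone, with no value comparisons at all, which also speaks to the edit in the question. This is a minimal sketch under my own class and method names, with expected (randomized) O(log n) bounds, not production code:

```csharp
using System;

// Order-statistic tree as an implicit treap: nodes carry subtree sizes,
// so access/insert/delete by position are O(log n) in expectation.
class OrderStatTree<T>
{
    class Node
    {
        public T Value;
        public int Priority = Rng.Next(); // random heap priority keeps balance
        public int Size = 1;
        public Node Left, Right;
        public Node(T v) { Value = v; }
    }

    static readonly Random Rng = new Random(12345); // fixed seed: illustration only
    Node root;

    static int SizeOf(Node n) => n?.Size ?? 0;
    static void Update(Node n) { if (n != null) n.Size = 1 + SizeOf(n.Left) + SizeOf(n.Right); }

    public int Count => SizeOf(root);

    // Split t into (first k elements, the rest). No value comparisons:
    // navigation is purely by subtree size.
    static (Node, Node) Split(Node t, int k)
    {
        if (t == null) return (null, null);
        if (SizeOf(t.Left) >= k)
        {
            var (l, r) = Split(t.Left, k);
            t.Left = r; Update(t);
            return (l, t);
        }
        else
        {
            var (l, r) = Split(t.Right, k - SizeOf(t.Left) - 1);
            t.Right = l; Update(t);
            return (t, r);
        }
    }

    static Node Merge(Node a, Node b)
    {
        if (a == null) return b;
        if (b == null) return a;
        if (a.Priority > b.Priority) { a.Right = Merge(a.Right, b); Update(a); return a; }
        b.Left = Merge(a, b.Left); Update(b); return b;
    }

    public void InsertAt(int pos, T value)
    {
        var (l, r) = Split(root, pos);
        root = Merge(Merge(l, new Node(value)), r);
    }

    public void RemoveAt(int pos)
    {
        var (l, rest) = Split(root, pos);
        var (_, r) = Split(rest, 1); // drop the single removed node
        root = Merge(l, r);
    }

    public T this[int pos] // O(log n) access by rank
    {
        get
        {
            var n = root;
            while (true)
            {
                int left = SizeOf(n.Left);
                if (pos < left) n = n.Left;
                else if (pos == left) return n.Value;
                else { pos -= left + 1; n = n.Right; }
            }
        }
    }
}

class Demo
{
    static void Main()
    {
        var vowels = new OrderStatTree<char>();
        foreach (var c in "AIAE") vowels.InsertAt(vowels.Count, c); // A,I,A,E
        vowels.RemoveAt(0);                                         // first A changed to T
        Console.WriteLine($"{vowels[0]},{vowels[1]},{vowels[2]}");  // I,A,E
    }
}
```

Because positions in the base array shift on insert/delete, ranking by subtree size (rather than storing explicit indices as keys) avoids having to re-key the whole tree on every change.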


+2








