Data structure for a random access linked list

I need a data structure that can give the previous and next neighbors for a given int that is part of the structure.

Some criteria that I have set for myself:

  • write once, read many times
  • hold from 100 to 1000 ints
  • be efficient: on the order of O(1) per lookup
  • be memory efficient (ideally the size of the ints plus some housekeeping bits)
  • implemented in pure Java (no libraries; I want to work this out myself)
  • unique elements
  • no concurrency requirements
  • ints are ordered externally; this order will most likely not be a natural ordering, and it must be preserved (i.e. there is no contract regarding the difference in value between two adjacent ints; any int may be greater or less than the int that precedes it)

This is in Java, and mostly theoretical, since I have already started using the solution described below.

Things I have considered:

  • LinkedHashSet : element lookup is very fast, O(1), and so is getting the next neighbor. There is no obvious way to get the previous neighbor without iterating the set in reverse. Stores boxed Integer objects only.
  • int [] : very compact in memory because no boxing is needed, and getting the previous and next neighbor is very fast. Element lookup is O(n), though, because the index is unknown and the array has to be traversed, and that is not acceptable.

Now I am using a combination of int [] and HashMap :

  • HashMap to retrieve the index of a specific int in the int []
  • int [] to retrieve the neighbors of that int
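The combination above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code; the names (NeighborIndex, prev, next) are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// int[] holds the values in their external order; the HashMap maps each
// value to its position in that array.
class NeighborIndex {
    private final int[] values;                               // external order
    private final Map<Integer, Integer> indexOf = new HashMap<>();

    NeighborIndex(int[] externallyOrdered) {
        this.values = externallyOrdered.clone();
        for (int i = 0; i < values.length; i++) {
            indexOf.put(values[i], i);   // boxes key and value, as noted later
        }
    }

    /** Previous neighbor in external order; caller must not pass the first element. */
    int prev(int v) {
        return values[indexOf.get(v) - 1];
    }

    /** Next neighbor in external order; caller must not pass the last element. */
    int next(int v) {
        return values[indexOf.get(v) + 1];
    }
}
```

Lookup is one hash probe plus one array read, which matches the constant-time behavior described above.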

What I like:

  • finding a neighbor is effectively O(1): one map lookup plus one array access
  • int [] does no boxing
  • theoretical performance is very good

What I do not like:

  • HashMap does boxing twice (key and value)
  • ints are stored twice (both in the map and in the array)
  • theoretical memory usage could be improved quite a bit.

I would be interested to know about better solutions.

2 answers


One solution is to sort the array as you add items. That way the previous element is always at i-1, and to find a value you can use binary search, which is O(log N).
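A sketch of that sorted-array idea, using the standard library's binary search. Note that sorting only satisfies the question's "external order" requirement when that order happens to coincide with the natural (sorted) order.

```java
import java.util.Arrays;

// Neighbors of a value in a sorted array sit at index i-1 and i+1,
// where i is found by binary search in O(log N).
class SortedNeighbors {
    // Assumes v is present and not the first element.
    static int prev(int[] sorted, int v) {
        return sorted[Arrays.binarySearch(sorted, v) - 1];
    }

    // Assumes v is present and not the last element.
    static int next(int[] sorted, int v) {
        return sorted[Arrays.binarySearch(sorted, v) + 1];
    }
}
```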

The next obvious candidate is a balanced binary tree. For this structure insertion is somewhat expensive, but lookup is again O(log N).
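In the standard library that would be java.util.TreeSet, which is backed by a red-black (balanced binary) tree; its lower() and higher() methods return neighbors in O(log N). As with the sorted array, neighbors come back in sorted order, so this only fits the question if the external order is the natural one.

```java
import java.util.TreeSet;

// Thin illustration over TreeSet's navigation methods.
class TreeNeighbors {
    static Integer prev(TreeSet<Integer> set, int v) {
        return set.lower(v);   // greatest element strictly less than v, or null
    }

    static Integer next(TreeSet<Integer> set, int v) {
        return set.higher(v);  // least element strictly greater than v, or null
    }
}
```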

If the values do not span the full 32-bit range, you can speed up lookup with a second array, where each entry is an index into the first array and the entry's position is the value you are looking for.
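A sketch of that second-array idea, assuming the values fall in a small known range [0, maxValue]: an auxiliary array indexed by value stores each value's position, so lookup is a single array read, O(1), with no boxing at all.

```java
import java.util.Arrays;

class RangeIndex {
    // Builds posOf such that posOf[v] is the position of v in values,
    // or -1 if v is not present. Memory cost is one int per possible value.
    static int[] build(int[] values, int maxValue) {
        int[] posOf = new int[maxValue + 1];
        Arrays.fill(posOf, -1);                 // -1 marks "not present"
        for (int i = 0; i < values.length; i++) {
            posOf[values[i]] = i;
        }
        return posOf;
    }
}
```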

More options: you could look at bitsets, but again that depends on the range the values might have.



Commons Lang has a hash map that uses primitive int values as keys: http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.6/org/apache/commons/lang/IntHashMap.java but the class is package-private, so you have to copy the code to use it.

This means you do not have to deal with autoboxing at all (and unboxing is cheap).
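To show what such a primitive-keyed map buys you, here is a minimal open-addressing int-to-int map. This is only an illustration of the technique, not the Commons Lang IntHashMap; for brevity it assumes a fixed capacity larger than the element count and supports neither removal nor resizing.

```java
// Open addressing with linear probing; keys and values are plain ints,
// so no boxing happens on put or get.
class IntIntMap {
    private final int[] keys;
    private final int[] vals;
    private final boolean[] used;

    IntIntMap(int capacity) {
        keys = new int[capacity];
        vals = new int[capacity];
        used = new boolean[capacity];
    }

    void put(int key, int value) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;          // linear probing
        }
        keys[i] = key;
        vals[i] = value;
        used[i] = true;
    }

    /** Returns the mapped value, or -1 if the key is absent. */
    int get(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i]) {
            if (keys[i] == key) return vals[i];
            i = (i + 1) % keys.length;
        }
        return -1;
    }
}
```

Used as the value-to-index map in the question's setup, this removes both the boxing and one of the two copies of each int.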



ints are ordered externally, this ordering will most likely not be a natural ordering, and this ordering must be preserved (i.e. there is no contract regarding the difference in value between two adjacent ints).

This tells me "tree". As Aaron said: an expensive insert, but the efficient lookup you need if you write once and read many times.

EDIT: Thinking about this a bit more: since each value has only one predecessor and one successor, and given all your other requirements, I think a plain ArrayList will work just fine. It's simple and very fast, even though lookup is O(n). But if your dataset grows, you are probably better off with the Map + List combination.
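The plain-List suggestion can be sketched as follows; indexOf is a linear O(n) scan, which for 100 to 1000 elements is often fast enough in practice.

```java
import java.util.List;

class ListNeighbors {
    // Both methods assume v is present and not at the relevant end of the list.
    static int prev(List<Integer> ordered, int v) {
        return ordered.get(ordered.indexOf(v) - 1);
    }

    static int next(List<Integer> ordered, int v) {
        return ordered.get(ordered.indexOf(v) + 1);
    }
}
```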



Be aware that with these structures the theoretical O() performance does not always match real-world performance. You have to consider the size of your dataset and the overall environment. One example: ArrayList versus HashMap. In theory the list is O(n) for unsorted lookup while the map is O(1), but there is enough overhead in creating and managing map entries that on small sets the map can actually be slower than the list.

Since you say you do not have hard memory constraints, I would stay away from a raw array: the complexity of managing its size is not worth it for the dataset size you specify.
