What is the best way to model an unordered list (i.e. a Set)?


What's the most natural way to model a group of objects that form a set? For example, you might have many custom objects that are subscribers to a mailing list.

Obviously you could model this as an array, but then you have to order the elements, and whoever is using your interface might be confused as to why you are encoding arbitrary ordering data.

You can use a hash where the members are keys that map to "1" or "true", but in most languages there are restrictions on what data types a hash key can be.

What's the standard way to do this in modern languages (PHP, Perl, Ruby, Python, etc.)?

0

language-agnostic

Tom Lehman Dec 11 '08 at 18:59

7 replies

C# has a HashSet<T> generic collection.

public class EmailAddress  // probably needs to override Equals() and GetHashCode()
{
   ...
}

var addresses = new HashSet<EmailAddress>();

+1

tvanfosson Dec 11 '08 at 19:05

Most modern languages will have some form of Set data structure. Java has a HashSet that implements Set.

In PHP, you can use an array to store your data. Either search the array before adding a new element, or use array_unique to remove duplicates after inserting all elements.

+1

Bill the Lizard Dec 11 '08 at 19:07

In C, as a stand-in for thinking directly in terms of the machine:

For small, discrete and well-defined ranges: use a bitmap to indicate the presence of each possible element (bit set means present, bit clear means absent).
Use a hash table for all other cases.

Write functions to implement adding and removing elements, testing for presence or absence, testing subsets, etc. as needed.
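
As a rough illustration of the bitmap idea, here is a minimal sketch in Python rather than C. The class name `BitmapSet` and its methods are hypothetical, and it assumes the elements are small non-negative integers in a range known up front:

```python
class BitmapSet:
    """Set of integers in range(size), stored as one bit per possible element."""

    def __init__(self, size):
        self.size = size
        self.bits = 0  # a Python int is arbitrary precision, so it can hold every bit

    def _check(self, item):
        if not 0 <= item < self.size:
            raise ValueError(f"{item} is outside range(0, {self.size})")

    def add(self, item):
        self._check(item)
        self.bits |= 1 << item          # set the bit: element present

    def discard(self, item):
        self._check(item)
        self.bits &= ~(1 << item)       # clear the bit: element absent

    def __contains__(self, item):
        return 0 <= item < self.size and bool((self.bits >> item) & 1)

    def issubset(self, other):
        # Subset test: no bit is set here that is clear in `other`.
        return (self.bits & ~other.bits) == 0


weekdays = BitmapSet(7)
for day in (0, 1, 2, 3, 4):
    weekdays.add(day)

midweek = BitmapSet(7)
midweek.add(2)
```

With this sketch, `2 in weekdays` is true, `6 in weekdays` is false, and `midweek.issubset(weekdays)` is true. In C the same layout would typically be an array of unsigned integers indexed by `item / word_size`.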

However, as the other answers say, if you just want to use a set, use a built-in language feature or a third-party library that is already well debugged.

0

dmckee Dec 11 '08 at 19:38

A lot of the time, hash-based sets are the right thing to use, but if you don't need to look elements up by key and don't care about enforcing unique values, a vector or list is fine. After all, a hash table has overhead.

You seem to be concerned that people will think the vector's order is significant, but I believe this is a common enough usage that, with documentation, people should not be confused.

It really depends on how you want to access and use the data.

0

David Norman Dec 11 '08 at 20:12

An array is usually the simplest way to store data with no other requirements. Other data types are usually chosen for specific reasons (you want to append data, you want constant-time lookup, you need to compute unions or intersections quickly, etc.). If your only concern is the abstraction, you can wrap the array in some kind of unordered facade.
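
One way such a facade might look, sketched in Python. The class name `UnorderedCollection` is made up for illustration. It stores elements in a plain list but exposes no indexing, and it compares equal regardless of insertion order. Because it relies only on `==`, it also accepts elements that cannot be hash keys, at the cost of linear-time lookups:

```python
class UnorderedCollection:
    """Hypothetical facade: list storage inside, no ordering exposed outside."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def add(self, item):
        if item not in self._items:      # linear scan, uses == only
            self._items.append(item)

    def discard(self, item):
        if item in self._items:
            self._items.remove(item)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iteration order is an implementation detail, not part of the contract.
        return iter(self._items)

    def __eq__(self, other):
        if not isinstance(other, UnorderedCollection):
            return NotImplemented
        # Both sides are duplicate-free, so equal size plus containment
        # means the same members.
        return len(self) == len(other) and all(x in other for x in self)


a = UnorderedCollection([{"email": "a@example.com"}, {"email": "b@example.com"}])
b = UnorderedCollection([{"email": "b@example.com"}, {"email": "a@example.com"}])
```

Here `a == b` holds even though the elements were added in a different order, and the elements are dicts, which a built-in `set` would reject as unhashable.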

0

Jimmy Dec 11 '08 at 20:19

In Perl, I would definitely use a hash. In other languages, I have complained about the lack of a hash.

0

skiphoppy Dec 11 '08 at 20:24

Greg Hewgill · Accepted Answer · 2008-12-11T19:02:25+0000

In Python, you would use the set datatype. A set supports any hashable object, so if you have a custom class that you need to store in a set and the default hashing behavior is not appropriate, you can implement __hash__ to get the behavior you want.
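
A minimal sketch of what that could look like for the mailing-list example from the question. The `Subscriber` class and its fields are hypothetical, and treating addresses as case-insensitive is an assumption made only for illustration. Note that `__eq__` is defined alongside `__hash__`, since a set needs the two to agree:

```python
class Subscriber:
    """Hypothetical subscriber, identified by a case-insensitive email address."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def _key(self):
        return self.email.lower()

    def __eq__(self, other):
        if not isinstance(other, Subscriber):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: objects that compare equal need equal hashes.
        return hash(self._key())


subscribers = {
    Subscriber("Ann", "ann@example.com"),
    Subscriber("Ann B.", "ANN@example.com"),   # same address, different case
    Subscriber("Bob", "bob@example.com"),
}
```

The set ends up with two members, because the two "Ann" entries compare equal. One caveat with this approach: the `email` attribute should not be changed while the object is in a set, since that would change its hash.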
