Collections containing different types at the same time

I have traditionally programmed in C++ and Java, and I am now starting to learn Ruby.

My question is: how do languages like Ruby internally implement their array and hash data structures so that they can hold values of any type at the same time? I know that in Java the fact that every class derives from Object is one way to implement it, but I was wondering if there is another way. For example, in C++, if I wanted to implement a dynamic array that could store multiple unrelated value types at the same time, how could I do that?

To clarify, I don't mean generic programming or templates as they just create a new collection interface for a type. I mean a structure like:

array = [1, "hello", someClass];

+3




6 answers


Most of them do roughly what you would do in C++ by creating a vector (or list, deque, etc.) of boost::any or something similar.

That is, they basically attach some kind of tag to each type of object that is stored in memory. When they store an object, they store the tag with it. When they read an object, they look at the tag to figure out what kind of object it is. Of course, they also handle most of this internally, so you don't need to write any code to figure out exactly which kind of object you just retrieved from the collection.

In case it's not clear: the "tag" is just a unique number assigned to each type. If the system you are dealing with has primitive types, it will usually pre-assign a type number to each of them. Likewise, every class you create is assigned a unique number.



To do this in C++, you usually create a central tag registry. When you register a type, you get a unique number that you use to tag objects of that type. When the language supports this directly, it automates the type registration process and picks a unique tag for each type.
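As a rough illustration of such a registry, here is a minimal sketch. The names next_tag, type_tag, Tagged, make_tagged and tagged_cast are all made up for this example, and a real interpreter would do considerably more (ownership, thread safety, and so on).

```cpp
#include <cstddef>

// Hypothetical registry: hands out the next unused tag number.
inline std::size_t next_tag() {
    static std::size_t counter = 0;
    return counter++;
}

// Each distinct T gets its own static `tag`, initialized exactly once
// on first use, so every registered type ends up with a unique number.
template <typename T>
std::size_t type_tag() {
    static const std::size_t tag = next_tag();
    return tag;
}

// A stored value carries its tag next to an untyped pointer to the data.
struct Tagged {
    std::size_t tag;
    void* data;
};

// Tags a pointer with the number registered for T (does not take ownership).
template <typename T>
Tagged make_tagged(T* object) {
    return Tagged{type_tag<T>(), object};
}

// Returns the object if the stored tag matches T, or nullptr otherwise.
template <typename T>
T* tagged_cast(const Tagged& t) {
    return t.tag == type_tag<T>() ? static_cast<T*>(t.data) : nullptr;
}
```

A std::vector<Tagged> can then hold pointers to unrelated types, and tagged_cast<T> checks the tag before handing the pointer back.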

While this is probably the most common method of doing things like this, it is definitely not the only one. For example, you can also specify specific storage ranges for specific types. When you allocate an object of a certain type, it is always allocated from that range of addresses of that type. When you create a collection of "objects" you are not really storing the objects themselves, but instead storing something that contains the address of the object. Since objects are separated by address, you can determine the type of the object based on the value of the pointer.
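The address-range idea can be sketched as below, assuming one fixed-size pool per type. Pool, int_pool, string_pool and kind_of are hypothetical names, and the pools here never free anything; the point is only that the type is recovered from the pointer value.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class Kind { Int, String, Unknown };

// Hypothetical fixed-size pool: every T is allocated from one contiguous block.
template <typename T, std::size_t N>
class Pool {
public:
    // Copies value into the next free slot, or returns nullptr when full.
    T* allocate(const T& value) {
        if (used_ == N) return nullptr;
        slots_[used_] = value;
        return &slots_[used_++];
    }
    // True if p points into this pool's storage.
    bool owns(const void* p) const {
        auto addr  = reinterpret_cast<std::uintptr_t>(p);
        auto begin = reinterpret_cast<std::uintptr_t>(&slots_[0]);
        auto end   = reinterpret_cast<std::uintptr_t>(&slots_[0] + N);
        return addr >= begin && addr < end;
    }
private:
    T slots_[N] = {};
    std::size_t used_ = 0;
};

inline Pool<int, 64>& int_pool() { static Pool<int, 64> p; return p; }
inline Pool<std::string, 64>& string_pool() { static Pool<std::string, 64> p; return p; }

// The type is recovered from the address alone; no per-object tag is stored.
inline Kind kind_of(const void* p) {
    if (int_pool().owns(p)) return Kind::Int;
    if (string_pool().owns(p)) return Kind::String;
    return Kind::Unknown;
}
```

A collection would then store plain void* values and call kind_of on each element before casting it.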

+5




In the MRI interpreter, a Ruby value is stored as a pointer-sized type that points to a data structure holding the class of the value and any data associated with it. Since pointers are always the same size (usually sizeof(unsigned long)), the array can store them uniformly whatever they point to. To answer your question about C++: it is not possible in C++ to determine the class of an object given only its location in memory, so this would not be possible unless you had something like this:

enum object_class { STRING, ARRAY, MAP /* , etc. */ };

struct TaggedObject {
  enum object_class klass;  // which kind of value this is
  void *value;              // pointer to the actual data
};

      



and passed TaggedObject * values around. This is pretty much what Ruby does internally.
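To show how code would consume such a tagged value, here is a small sketch. The describe function and the choice of std::string, std::vector and std::map as the underlying storage are assumptions made for this example, not what MRI actually uses.

```cpp
#include <map>
#include <string>
#include <vector>

enum object_class { STRING, ARRAY, MAP };

struct TaggedObject {
    enum object_class klass;
    void *value;
};

// Assumption for this sketch: STRING points to a std::string, ARRAY to a
// std::vector<TaggedObject *> and MAP to a std::map<std::string, TaggedObject *>.
std::string describe(const TaggedObject *obj) {
    switch (obj->klass) {
    case STRING:
        // The tag tells us which cast is safe.
        return "string: " + *static_cast<std::string *>(obj->value);
    case ARRAY: {
        auto *items = static_cast<std::vector<TaggedObject *> *>(obj->value);
        return "array of " + std::to_string(items->size());
    }
    case MAP: {
        auto *entries = static_cast<std::map<std::string, TaggedObject *> *>(obj->value);
        return "map of " + std::to_string(entries->size());
    }
    }
    return "unknown";
}
```

Every operation on a value starts with a check of klass like this; the void * is only cast after the tag has been inspected.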

+4




There are several ways to do this:

You can define a common interface for all elements and create a container from them. For example:

class Common { /* ... */ };  // the common interface.

      

You can use a container of void*:

vector<void*> common;        // this is rather low level:
                             // you have to cast on nearly every access.

      

The best approach, I think, is to use an "any" class such as boost::any:

vector<boost::any> v;

      

+2




You're looking for something called type erasure. The easiest way to do it in C++ is with boost::any:

std::vector<boost::any> stuff;
stuff.push_back(1);
stuff.push_back(std::string("hello"));
stuff.push_back(someClass);

      

Of course, with any, you are extremely limited in what you can do with your stuff, since you have to remember yourself what you put into it.
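To make that limitation concrete, here is a sketch using std::any, the C++17 standard counterpart of boost::any (swapped in here so the example needs no external library). The describe and make_stuff helpers are hypothetical. Note how the reader has to probe for each type it expects.

```cpp
#include <any>
#include <string>
#include <vector>

// Hypothetical helper: turns one element into text by probing its type.
// The pointer form of std::any_cast returns nullptr on a type mismatch.
std::string describe(const std::any& item) {
    if (const int* i = std::any_cast<int>(&item)) {
        return "int " + std::to_string(*i);
    }
    if (const std::string* s = std::any_cast<std::string>(&item)) {
        return "string " + *s;
    }
    return "something else";
}

// Builds a container holding three unrelated types.
std::vector<std::any> make_stuff() {
    std::vector<std::any> stuff;
    stuff.push_back(1);
    stuff.push_back(std::string("hello"));
    stuff.push_back(2.5);  // a type describe() does not know about
    return stuff;
}
```

Anything describe was not written to expect falls through to the last branch, which is exactly the "you must remember what you put in" problem.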

A more common use case for heterogeneous containers is a series of callbacks. The standard class std::function<R(Args...)> is actually a type-erased functor:

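#include <functional>
#include <vector>

void foo() { /* ... */ }

struct SomeClass {
    void operator()() { /* ... */ }
};

std::vector<std::function<void()>> callbacks;
callbacks.push_back(foo);            // plain function pointer
callbacks.push_back(SomeClass{});    // function object
callbacks.push_back([]{ /* ... */ }); // lambda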

      

Here we add three objects of different types (a void(*)(), a SomeClass and a lambda) to one container, which we do by erasing the type. Therefore, we can still do:

for (auto& func : callbacks) {
    func();
}

      

And it will do the right thing for each of the three objects ... no virtuals needed!
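For the curious, here is a sketch of one common way to build such a type-erased wrapper by hand. Callback, Concept and Model are names invented for this example, and this is not necessarily how any particular standard library implements std::function. The caller writes no virtual functions; the single virtual call is hidden inside the wrapper.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for std::function<void()> (move-only).
class Callback {
public:
    // Accepts any callable F and hides its concrete type behind Concept.
    template <typename F>
    Callback(F f) : impl_(std::make_unique<Model<F>>(std::move(f))) {}

    void operator()() { impl_->call(); }

private:
    // Internal interface: the only place where the erased type is invoked.
    struct Concept {
        virtual ~Concept() = default;
        virtual void call() = 0;
    };
    // One Model<F> is generated per stored callable type.
    template <typename F>
    struct Model : Concept {
        explicit Model(F f) : fn(std::move(f)) {}
        void call() override { fn(); }
        F fn;
    };
    std::unique_ptr<Concept> impl_;
};
```

A std::vector<Callback> can then hold function pointers, function objects and lambdas side by side, and the calling loop looks the same as the one above.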

+2




Others have explained how you can do this in C++.

There are various ways to solve this problem. To answer your question about how languages like Ruby solve it, without going into the details of how Ruby specifically does it: they use a structure containing type information. For example, we could do it in C++ something like this:

enum TypeKind { None, Int, Float, String }; // May need a few more?

class TypeBase
{
   protected:
     TypeKind kind;
   public:
     TypeBase(TypeKind k) : kind(k) { }
     virtual ~TypeBase() {}
     TypeKind type() const { return kind; }
};


class TypeInt : public TypeBase
{
   private: 
      int value;
   public:
      TypeInt(int v) : TypeBase(Int), value(v) {}  // base is initialized first
};

class TypeFloat : public TypeBase
{
   private: 
      double value;
   public:
      TypeFloat(double v) : TypeBase(Float), value(v) {}
};

class TypeString : public TypeBase
{
   private: 
      std::string value;
   public:
      TypeString(std::string v) : TypeBase(String), value(v) {}
};

      

(To make this useful we'd probably need a few more methods in the TypeXxx classes, but I don't want to type for another hour ... ;))

And then somewhere the type gets determined, for example:

Token t = getNextToken();
TypeBase *ty = nullptr;   // stays null for unrecognized tokens
if (t.type == IntegerToken)
{
   ty = new TypeInt(stoi(t.str));
}
else if (t.type == FloatToken)
{
   ty = new TypeFloat(stod(t.str));
}
else if (t.type == StringToken)
{
   ty = new TypeString(t.str);
}

      

Of course, we would also need to deal with variables and various other scenarios, but the point is that the language can keep track of (and sometimes mutate) the value that is stored.

Most languages in the general category that Ruby, PHP, Python, etc. belong to have a mechanism like this, and all variables are stored in some indirect way. The above is just one possible solution; I can think of at least half a dozen other ways to do this, but they are all variations on "store data along with type information".

(And by the way, boost::any also does something along the lines of the above, more or less ....)
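To show how such a hierarchy could back a mixed collection, here is a compact, self-contained sketch. It repeats a trimmed version of the classes above; the get accessors and the to_text function are hypothetical additions for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum TypeKind { None, Int, Float, String };

class TypeBase {
public:
    explicit TypeBase(TypeKind k) : kind(k) {}
    virtual ~TypeBase() = default;
    TypeKind type() const { return kind; }
protected:
    TypeKind kind;
};

class TypeInt : public TypeBase {
public:
    explicit TypeInt(int v) : TypeBase(Int), value(v) {}
    int get() const { return value; }  // hypothetical accessor
private:
    int value;
};

class TypeString : public TypeBase {
public:
    explicit TypeString(std::string v) : TypeBase(String), value(std::move(v)) {}
    const std::string& get() const { return value; }  // hypothetical accessor
private:
    std::string value;
};

// Converts any stored value to text by switching on its kind tag.
std::string to_text(const TypeBase& v) {
    switch (v.type()) {
    case Int:    return std::to_string(static_cast<const TypeInt&>(v).get());
    case String: return static_cast<const TypeString&>(v).get();
    default:     return "?";
    }
}
```

A std::vector<std::unique_ptr<TypeBase>> can then play the role of the Ruby array: every element has the same static type, and the kind tag says what it really holds.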

+2




In Ruby, the answer is quite simple: the array does not contain values of different types; they are all of the same type. They are all objects.

Ruby is dynamically typed, so the idea of an array statically limited to storing elements of a single type doesn't even make sense.

For a statically typed language, the question is: how much do you want it to be like Ruby? Do you want it to actually be dynamically typed? Then you need to implement a dynamic type in your language (if one doesn't already exist, like C#'s dynamic).

Otherwise, if you want a statically typed heterogeneous list, such a thing is usually called an HList. There's a very good implementation for Scala in the Shapeless library, for example.
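In C++ terms, the closest standard counterpart is probably std::tuple, which is fixed-length and carries every element's type in its own type. A minimal sketch, with SomeClass, Row and make_row as made-up names for this example:

```cpp
#include <string>
#include <tuple>

struct SomeClass { int id; };  // hypothetical stand-in for the question's someClass

// The element types are part of the tuple's type, so each access below
// is checked at compile time rather than through a runtime tag.
using Row = std::tuple<int, std::string, SomeClass>;

inline Row make_row() {
    return Row{1, "hello", SomeClass{42}};
}
```

std::get<1>(row) is known to be a std::string at compile time, and asking for an index that does not exist is a compile error.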

0








