B + tree, * * vs *

I am writing a B + tree for a variety of reasons and I came here to ask a question about the implementation of its nodes. Currently my nodes look like this:

struct BPlusNode
{
public:
    //holds the list of keys
    keyType **keys;
    //stores the number of slots used
    size_t size;
    //holds the array of pointers to lower nodes NULL if this is a leaf node
    BPlusNode **children;
    //holds the pointer to the next load to the 'left'
    BPlusNode *next;
    //Data page pointers NULL if this is a branch node
    Bucket **pages;
};

      

As you can see, my current implementation is using * * where I am wondering whether to use * * or *.

I am well aware that * * requires two dereferencing operations and is therefore slower than just using *, however this class uses a lot of recursion and it is much more convenient to pass pointers to sub-selections of recursive functions. To do this with *, I would need to do pointer arithmetic and pass the resulting pointer.

FROM **

someFunction(BPlusNode* currNode)
{
    ......
    someFunction(currNode->children[ChildIndex]);
}

      



from *

someFunction(BPlusNode* currNode)
{
    ......
    someFunction((currNode->children) + ChildIndex);
}

      

I see there is an extra memory read to get the pointer desired in the * * version, but the * * version is also easier to think about me (it more closely matches the way I see the diagrams drawn in The Art of Computer Programming and Wikipedia) ...

Does anyone have any thoughts anyway? Suggestions for the third option? Proof of why a person is superior to another? etc.?

Edit:
I could post this as an answer below, but I just realized that with the * * schema, I don't have to copy the entire contents of each subnode or bucket if I want to insert it into the middle of the array (i.e. resize the array). If there are 20 subnodes for the schema *, when I reallocate the array, I will need to copy 20 * sizeof (BPlusNode) bytes, as opposed to 20 * sizeof (BPlusNode *) bytes for the * * schema.

On the other hand, it occurred to me that since I am doing all of the paging and insertions, it might be more efficient to execute them, and the benefits of * over * * in lookups outweigh it.

+2


source to share


3 answers


I would define a different structure for the key and pointer data. I would instruct you to use fixed size nodes that should fit your structure on disk. This makes it easier to map memory to a tree.

Your BPlusNode structure becomes a descriptor class that points to these mapped data nodes and synthesizes things like the previous and next pointers by reading siblings as it descends the tree.



It might look something like this:

enum BPlusNodeType {
    LEAF, BRANCH
};

struct BPlusNodeData {
    static const size_t max_size = 511; // Try to fit into 4K? 8K?
    uint16_t size;
    uint16_t type;
    keyType key[max_size];
    union {
        Bucket* data[max_size];
        BPlusNodeData* children[max_size];
    };
};

      

+2


source


Using **

, you need an extra selection step to hold each child pointer BPlusNode*

. Or, you can allocate a block of them and just point each pointer in children

to sequential elements BPlusNode*

within that block, but it's another extra heap when creating a node (and the corresponding extra deletion step on destruction). So I would absolutely recommend using one *

. If you write

someFunction((currNode->children) + ChildIndex);

      

hurts you, you can rewrite it as



someFunction(&currNode->children[ChildIndex]);

      

which I find clearer.

+1


source


Could you use STL ' vector<keyType *> keys

' and ' vector<BPlusNode *> children

' etc.?

This may be overly simplistic, but my impression is that double-indirect binding is often unnecessary in C ++ (and not everything that is often in C, although more often than in C ++).

0


source







All Articles