Migrating data between shards

For a project I am working on, we need each customer's data in a database located close to that customer. For this reason, we have adopted the new Elastic Scale solution from Microsoft. It hides much of the complexity of sharding and still lets you scale globally.

At the moment I am facing a rather important problem: I need to move data from one shard to another. There is a sample application (Split/Merge) that does something like this, but it only works with ranges (1..100, 101..400, etc.). The database I'm working on uses GUIDs as keys, so we can't use the sample code.

I wrote a move/merge tool myself, but I ran into a problem. At first I wanted to insert all objects and their dependencies through the ORM. Because of some circular key dependencies, I cannot do this easily, so I now generate a SQL script instead. The script is about 130 MB and contains only INSERT statements.

This all has to run in a single transaction, because the migration must not be left half done. If there is an error, everything should be rolled back.

Running this 130 MB script gives me errors: both my local development machine and SQL Azure run out of memory. SQL Azure reports:

Not enough memory in the buffer pool

and locally:

There is not enough system memory in the default resource pool to run this request.

I tried disabling the indexes so that they are not rebuilt on every INSERT, but that didn't fix anything.

Any suggestions on how to proceed? I cannot split the script, because all the data must be inserted at once. An SSIS package is not an option, I guess.

Building my own transaction system on top of the database seems likely to cause a lot of bugs and errors.

Besides the INSERT script, I also need to run a DELETE script on the "old" shard/database, so the solution should work for that script too. I would like to run the INSERT and DELETE scripts in one transaction, but that is not yet possible on SQL Azure (no distributed transactions).


1 answer


The Split/Merge version in the current preview of Azure SQL DB Elastic Scale has a known limitation: it only works with range shard maps. I am assuming you are currently using a list shard map for your GUIDs. We are working on making list shard map support in Split/Merge available in an update to the Elastic Scale preview; in the meantime, there is a workaround I would recommend you try. It might be easier than writing your own infrastructure to move data between shards, and (hopefully) save you a ton of effort.

Here's what I suggest:

  • Replace the list shard map with a range shard map of type GUID.
  • Make each point in your data a range of its own: use the GUID value directly as the left boundary, and use the GUID value incremented by 1 in its binary representation as the right boundary (remember that the right boundary is exclusive and the left is inclusive). You can use the RawValue property of the ShardKey class to easily get the binary representation of the left boundary.
  • Point your Split/Merge service at the new range shard map.
  • Use the Split/Merge Move operation on the range shard map to move a given point from one shard to another.

Let me know how it works out. If you run into any problems with this, in particular with incrementing the points, give me a shout at torrent (at) microsoft (dot) com.



Best, Torsten

Here is a code snippet that may help you with incrementing the GUID values.

    static void CreateMappings()
    {
        ShardKey guid1 = new ShardKey(new Guid("<yourguid1>"));
        ShardKey guid2 = new ShardKey(new Guid("<yourguid2>"));

        ShardKey guid1_next = NextShardKeyForGuid(guid1);
        ShardKey guid2_next = NextShardKeyForGuid(guid2);

        _map.CreateRangeMapping(new Range<Guid>(guid1.GetValue<Guid>(), guid1_next.GetValue<Guid>()), _shard1);
        _map.CreateRangeMapping(new Range<Guid>(guid2.GetValue<Guid>(), guid2_next.GetValue<Guid>()), _shard2);
    }

    static ShardKey NextShardKeyForGuid(ShardKey shardkey)
    {
        int len = 16;
        byte[] b = new byte[len];

        shardkey.RawValue.CopyTo(b, 0);

        // Add 1 to the last byte and carry leftward for as long as a byte wraps to 0
        while (--len >= 0 && ++b[len] == 0) ;

        // Handle overflow if the current key value is the maximum in the domain
        if (len < 0)
        {
            return new ShardKey(ShardKeyType.Guid, null);
        }
        else
        {
            return ShardKey.FromRawValue(ShardKeyType.Guid, b);
        }
    }
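
To see the carry behaviour in isolation, here is a minimal sketch that needs no Elastic Scale reference. It assumes the raw key is a 16-byte array that orders from the first byte to the last, which is what the loop in NextShardKeyForGuid implies. GuidKeyRange and NextRawKey are hypothetical names used only for illustration; they are not part of the library.

```csharp
using System;

public static class GuidKeyRange
{
    // Returns a copy of the 16-byte key incremented by one, treating the
    // array as a big-endian unsigned integer (assumption, see above).
    // Returns null when every byte is 0xFF, i.e. no larger key exists.
    public static byte[] NextRawKey(byte[] raw)
    {
        if (raw == null || raw.Length != 16)
            throw new ArgumentException("Expected a 16-byte raw key.", nameof(raw));

        byte[] next = (byte[])raw.Clone();   // leave the caller's array untouched
        for (int i = next.Length - 1; i >= 0; i--)
        {
            // unchecked: 0xFF + 1 wraps to 0x00 and the carry moves one byte left
            next[i] = unchecked((byte)(next[i] + 1));
            if (next[i] != 0)
                return next;                 // no further carry needed
        }
        return null;                         // overflow: input was the maximum key
    }
}
```

For a key whose last byte is 0xFF and whose other bytes are zero, NextRawKey returns a key ending in 01-00: the last byte wraps and the carry lands in the byte before it. The pair (key, NextRawKey(key)) is then the inclusive left and exclusive right boundary of a range that holds exactly one GUID.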
