Automatic key creation in DynamoDB using Django (Python)

I am using the NoSQL database DynamoDB for my project. How can I automatically generate a key that I can then use for queries?

import uuid
from datetime import datetime

import boto.dynamodb
from django.conf import settings


def DynamoDB_view(tableName, campaign_tag_app_group_map_id, campaign_id,
                  tag_id, tag_type, app_id, group_id, group_p, tenant_id,
                  insertion_timestamp, insertion_user_id):
    print "in func DynamoDB_view"

    def insert_to_dynamo(conn, tableName, campaign_tag_app_group_map_id,
                         campaign_id, tag_id, tag_type, app_id, group_id,
                         group_p, tenant_id, insertion_timestamp,
                         insertion_user_id):
        print "in Insert"
        print tableName
        # Used random data as the key just for now; this is inappropriate.
        data = uuid.uuid4().hex[:16]
        table = conn.get_table(tableName)
        item_data = {
            'campaign_id': str(campaign_id),
            'tag_id': tag_id,
            'tag_type': tag_type,
            'app_id': app_id,
            'group_id': str(group_id),
            'group_p': group_p,
            'tenant_id': str(tenant_id),
            'insertion_timestamp': str(datetime.now()),
            'insertion_user_id': str(insertion_user_id),
        }
        item = table.new_item(
            # Randomly generated hash key; the range key is a placeholder.
            hash_key=data,
            range_key='Check this out!',
            attrs=item_data,
        )
        item.put()

    def connection_dynamo(tableName, campaign_tag_app_group_map_id,
                          campaign_id, tag_id, tag_type, app_id, group_id,
                          group_p, tenant_id, insertion_timestamp,
                          insertion_user_id):
        conn = boto.dynamodb.connect_to_region(
            'us-east-1',
            aws_access_key_id=settings.ACCESS_KEY,
            aws_secret_access_key=settings.PASS_KEY)

        insert_to_dynamo(conn, tableName, campaign_tag_app_group_map_id,
                         campaign_id, tag_id, tag_type, app_id, group_id,
                         group_p, tenant_id, insertion_timestamp,
                         insertion_user_id)

    connection_dynamo(tableName, campaign_tag_app_group_map_id, campaign_id,
                      tag_id, tag_type, app_id, group_id, group_p, tenant_id,
                      insertion_timestamp, insertion_user_id)

      



1 answer


Here's a link to some docs:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html

For a query, you must supply a hash key, and it is matched by equality only. A range key is optional, and you can perform a wider variety of operations on it than just equality checks. For performance, you want to avoid a hot key for your hash key (using the same key value all the time).
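For example, with the same boto layer the question uses, a query might look like the sketch below. The table name, key schema (tenant_id as hash key, insertion_timestamp as range key), and the literal values are assumptions for illustration only, not the questioner's actual schema:

import boto.dynamodb
from boto.dynamodb.condition import BEGINS_WITH

# Credentials fall back to the environment / boto config here.
conn = boto.dynamodb.connect_to_region('us-east-1')
table = conn.get_table('campaign_tags')  # hypothetical table

# The hash key must be an exact value; the range key condition is
# optional and supports richer operators (BEGINS_WITH, BETWEEN, GT, ...).
for item in table.query(hash_key='tenant-42',
                        range_key_condition=BEGINS_WITH('2014-06')):
    print item['campaign_id']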

Much of the answer boils down to what you will have in hand when you execute the query, and whether you need to worry about hot keys or not. Automatically generating something random will save you from the hot-key problem, but you won't be able to reproduce those values when you go back to query your data (even if you always use the same seed for the RNG, your head would explode before you iterated your way back to the hash key you want). This can lead to a situation where you perform a scan instead of a query, which is usually not desirable.

Will you have the campaign_id, group_id, tenant_id, etc. fields available to you at query time? If the answer is yes, you have at least one candidate for your hash key. You still have to think about how much data you will have in the table and how many items will share the same group_id, for example. If you have both group_id and tenant_id at query time, and there is much more variety in the tenant_id values, use those. You can also combine the two IDs into a single key value if that helps spread the data, as sketched below.
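As a sketch of that last idea (the separator and the ordering of the IDs here are arbitrary choices, not a prescribed format), you derive the hash key from the two IDs you will have in hand at query time, which keeps the key both well distributed and reproducible:

def make_hash_key(tenant_id, group_id):
    # Deterministic composite key: reproducible at query time from the
    # same two IDs, but with more distinct values than group_id alone.
    return '%s:%s' % (tenant_id, group_id)

hash_key = make_hash_key('tenant-42', 'group-7')  # -> 'tenant-42:group-7'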



If you only have group_id and there are only a small number of groups, adding some randomness to the end of the group_id to avoid hot keys won't help you. From a query-execution standpoint, you end up in the same situation, with a bunch of keys that are essentially unrecoverable. In that case, maybe the least painful option is to have a table per group_id, use completely random keys for good distribution, and just accept that your data is forcing you to do a scan.
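A scan under that kind of scheme might look like this sketch (the per-group table name and the filter attribute are assumptions); unlike a query, a scan reads every item in the table:

import boto.dynamodb
from boto.dynamodb.condition import EQ

conn = boto.dynamodb.connect_to_region('us-east-1')
table = conn.get_table('group_7_events')  # hypothetical per-group table

# scan() walks the whole table; the scan_filter only trims the results
# returned, not the read cost.
for item in table.scan(scan_filter={'campaign_id': EQ('summer-sale')}):
    print item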

If you can get a good hash key, your most common queries may dictate your choice of range key. So if you typically request records from the past 24 hours, insertion_timestamp might be a good choice. If some other factor shows up in many queries, use that instead; for example, if you restrict query results to specific campaigns and those campaigns don't have completely random names. Or, if you have three common queries that depend on different ranges/criteria, you might want to add some local secondary indexes (Difference between local and global indexes in DynamoDB).
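For instance, the past-24-hours case might look like the following sketch, assuming insertion_timestamp is the range key and is stored as a lexicographically sortable string (the 'YYYY-MM-DD HH:MM:SS' format produced by str(datetime.now()) in the question's code sorts correctly); table and key values are again hypothetical:

from datetime import datetime, timedelta

import boto.dynamodb
from boto.dynamodb.condition import GT

conn = boto.dynamodb.connect_to_region('us-east-1')
table = conn.get_table('campaign_tags')  # hypothetical table

# 'YYYY-MM-DD HH:MM:SS' strings sort lexicographically in time order,
# so GT on the string behaves like GT on the timestamp itself.
cutoff = str(datetime.now() - timedelta(hours=24))
for item in table.query(hash_key='tenant-42',
                        range_key_condition=GT(cutoff)):
    print item['insertion_user_id']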

To come back to what you might actually be asking: if you have nothing in hand when you go to look up the data, then you may well be stuck doing a scan to get it back. In that case, using something as random as possible for the hash key will at least be kind to your writes and ensure your data is well distributed.

Sorry, I rambled a bit; I hope there is something useful in here. If I have completely misunderstood, or there is some other unstated constraint, please edit your question to reflect it.
