Loading data from Amazon DynamoDB into Redshift

We have a DynamoDB table in production that is constantly being updated, and we want to load all of its records into Redshift.

We tried the COPY command, but since new records are constantly being inserted into the table, the COPY runs forever.

We want to know the best way to load data from a live DynamoDB table into Redshift.

+4




3 answers


Consider a solution based on DynamoDB Streams. Streams provides an ordered log of the data-plane events that occur on each DynamoDB partition (events for a given primary key are strictly ordered). You can use the Kinesis Client Library together with the DynamoDB Streams Kinesis Adapter to process the stream and load it into Redshift.
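Whatever the consumer (KCL-based or otherwise), records on a DynamoDB stream carry items in DynamoDB's typed attribute-value format, which usually needs flattening before a Redshift load. A minimal sketch of that step, assuming an "orders" table with made-up attribute names (the record shape follows the DynamoDB Streams API):

```python
import json

def from_dynamodb_value(av):
    """Convert one DynamoDB typed attribute value (e.g. {"S": "x"},
    {"N": "42"}) into a plain Python value."""
    (type_tag, value), = av.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB serializes all numbers as strings
        return float(value) if "." in value else int(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_value(v) for v in value]
    if type_tag == "M":
        return {k: from_dynamodb_value(v) for k, v in value.items()}
    raise ValueError(f"unhandled DynamoDB type: {type_tag}")

def flatten_stream_record(record):
    """Turn one stream record's NewImage into a flat dict suitable for
    a Redshift COPY (e.g. as newline-delimited JSON staged in S3)."""
    new_image = record["dynamodb"]["NewImage"]
    return {k: from_dynamodb_value(v) for k, v in new_image.items()}

# Hypothetical stream record for an "orders" table:
record = {
    "eventName": "INSERT",
    "dynamodb": {
        "NewImage": {
            "order_id": {"S": "o-123"},
            "amount": {"N": "19.99"},
            "items": {"L": [{"S": "sku-1"}, {"S": "sku-2"}]},
        }
    },
}
print(json.dumps(flatten_stream_record(record)))
```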



DynamoDB Streams is currently in preview, but should be generally available soon.

+4




You can use the following pattern:

DynamoDB Streams → AWS Lambda → Amazon Kinesis Firehose → Amazon Redshift.

Diagram from the AWS article DynamoDB Streams Use Cases and Design Patterns.

See also the answer here, AWS DynamoDB Stream on Redshift .

DynamoDB Streams are essentially the same as a Kinesis Data Stream, but records are generated automatically from new or changed data in DynamoDB. This lets applications be notified when data is added to or changed in a DynamoDB table.



Kinesis Data Firehose can automatically output the stream to Redshift (among other destinations).
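When a Firehose delivery stream targets Redshift, it stages incoming records in S3 and then issues a COPY into your table on your behalf; you supply the table name, columns, and COPY options when configuring the stream. A hedged sketch of what that COPY amounts to (the table, bucket path, and IAM role ARN below are placeholders):

```sql
-- Firehose runs roughly this against Redshift after staging files in S3.
-- Table, bucket, and role names here are hypothetical.
COPY orders (order_id, amount, items)
FROM 's3://my-firehose-bucket/redshift-staging/'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/firehose-redshift'
JSON 'auto';
```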

AWS Lambda runs code without provisioning or managing servers. You pay only for the compute time you consume; you are not charged when your code is not running. You can run code for virtually any type of application or backend service, all with zero administration.

Lambda is useful for inspecting data as it comes through the stream. For example, it can reshape the data format or skip records that are not required.

Putting it all together: adding or modifying data in DynamoDB emits a stream record containing the change. An AWS Lambda function can validate the data and transform or discard the message, then push it to Kinesis Data Firehose, which automatically inserts the data into Amazon Redshift.
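The Lambda step of that pipeline might look roughly like this sketch: a handler that forwards inserts and updates from the stream event to Firehose as newline-delimited JSON. The delivery stream name is a placeholder, and the Firehose client is injectable so the logic can be exercised without AWS credentials:

```python
import json

DELIVERY_STREAM = "ddb-to-redshift"  # hypothetical Firehose stream name

def wanted(record):
    """Keep only inserts and updates; skip deletes and anything else."""
    return record.get("eventName") in ("INSERT", "MODIFY")

def to_firehose_record(record):
    """Serialize one stream record's NewImage as a newline-delimited
    JSON line, a format Redshift's COPY ... JSON can ingest."""
    image = record["dynamodb"]["NewImage"]
    # The typed DynamoDB values are kept as-is here; a real pipeline
    # might flatten them first.
    return {"Data": (json.dumps(image) + "\n").encode("utf-8")}

def lambda_handler(event, context, firehose=None):
    batch = [to_firehose_record(r) for r in event["Records"] if wanted(r)]
    if not batch:
        return {"forwarded": 0}
    if firehose is None:   # inside Lambda, build the real client
        import boto3       # assumption: boto3 is available in the runtime
        firehose = boto3.client("firehose")
    firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM,
                              Records=batch)
    return {"forwarded": len(batch)}
```

In a unit test you can pass a stub object with a `put_record_batch` method in place of the real client; in Lambda itself, leave `firehose` unset.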


0




Try HevoData.com

It might fit your use case. It's an easy-to-use tool that doesn't require much code to move data.

0

