Amazon Redshift

We are planning to migrate to Amazon Redshift for our data warehousing solution. We need to set up an incremental pipeline from MySQL to Redshift that also handles updates. What is the most efficient way to do this?



6 answers


To sync data from MySQL to Redshift, you can try using AWS Data Pipeline.





You can use one of the existing solutions on the market, such as http://www.bryte.com.au/solutions/amazon-redshift-integration/ . Otherwise, you will need to implement it yourself with triggers and AWS Data Pipeline.





Option 1: A periodic background job that reads from the MySQL tables and writes to Redshift. The periodic reads will load the MySQL DB, so it will be slower for other online users while the job runs.

Option 2: The same as Option 1, but better: change the MySQL table schemas to add flag and timestamp columns, so a multi-threaded background program can read only the changed rows instead of scanning whole tables.

Option 3: A cost-effective way using S3 as the staging area: change the program that writes to MySQL so that it also writes to an S3 location, and run a custom Java program in the background that periodically syncs from S3 to Redshift. Moving to Data Pipeline would be the more expensive option, and extracting directly from MySQL would still cause a load spike.

Option 4: Cloud cloud lighting

Option 5: AWS Data Pipeline

Option 6: AWS Lambda functions
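Options 2 and 3 above can be combined into a simple sketch: extract only the rows changed since the last sync (assuming the MySQL tables gain an `updated_at` column, per Option 2), stage the extract in S3, and COPY it into Redshift (Option 3). The table, bucket, and IAM role names below are hypothetical, and the code only assembles the statements; it does not connect to either database.

```python
import datetime


def build_sync_statements(table, s3_prefix, last_sync, iam_role):
    """Assemble one incremental sync cycle as two SQL statements:
    a MySQL extract of rows changed since `last_sync` (assumes an
    `updated_at` column exists), and a Redshift COPY that loads the
    extract staged under `s3_prefix`."""
    extract_sql = (
        f"SELECT * FROM {table} "
        f"WHERE updated_at > '{last_sync.isoformat(sep=' ')}'"
    )
    copy_sql = (
        f"COPY {table}_staging "
        f"FROM '{s3_prefix}/{table}/' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS CSV GZIP"
    )
    return extract_sql, copy_sql


extract_sql, copy_sql = build_sync_statements(
    "orders",                                        # hypothetical table
    "s3://my-bucket/exports",                        # hypothetical bucket
    datetime.datetime(2024, 1, 1, 0, 0),
    "arn:aws:iam::123456789012:role/RedshiftCopy",   # hypothetical role
)
print(extract_sql)
print(copy_sql)
```

In a real pipeline the extract would be written as gzipped CSV to the S3 prefix before running the COPY, and `last_sync` would be persisted between runs.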



I would suggest keeping everything as simple as possible. If your MySQL DB is small, you can run mysqldump, load each table dump into a staging table, and then run an INSERT/UPDATE/DELETE process against the target table. If your MySQL DB is too large for regular full dumps, you will need to do selective extracts of the changed data.
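The INSERT/UPDATE/DELETE step is usually done as delete-then-insert inside one transaction, which is the common merge pattern on Redshift. A minimal sketch that only builds the SQL (table and key names are placeholders):

```python
def build_merge_sql(target, staging, key):
    """Merge a staging table into its target with the Redshift
    delete-then-insert pattern: remove target rows whose key appears
    in the staging table, then insert every staged row (covers both
    new and updated rows)."""
    return "\n".join([
        "BEGIN;",
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
        # TRUNCATE commits the transaction it runs in on Redshift,
        # so clear the staging table after the merge commits.
        f"TRUNCATE {staging};",
    ])


merge_sql = build_merge_sql("orders", "orders_staging", "order_id")
print(merge_sql)
```

Deleted source rows need separate handling (e.g. a deletion flag in the extract), since they never reach the staging table.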

We use a hybrid of the two (from SQL Server): selective extracts for huge append-only tables, and full dumps of the smaller tables where data is updated. We do this hourly and can handle hundreds of GB per day with no problem.

Alternatively, you can try one of the commercial ETL tools that claim to "sync" your database with Redshift (e.g., Informatica Cloud and Attunity CloudBeam). We found that these tools were unable to support some of the transformations we needed between the live database and Redshift.



You can use one of the following solutions:



You can use AWS Data Pipeline to do this, or try ironBeast, a service that helps you move data into Redshift and maintain it once it is there (setting expiry, running VACUUMs, fixing stl_load_errors, etc.):

http://www.ironsrc.com/ironbeast

Disclosure: I lead the team that develops this solution.
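The housekeeping tasks mentioned here can also be scripted yourself: `stl_load_errors` is the Redshift system table that records rows rejected by COPY, and VACUUM/ANALYZE reclaim space and refresh statistics after heavy delete-and-insert churn. A sketch that only prepares the statements (the table name is a placeholder):

```python
# Columns below exist in Redshift's stl_load_errors system table.
RECENT_LOAD_ERRORS_SQL = (
    "SELECT starttime, filename, line_number, colname, err_reason "
    "FROM stl_load_errors ORDER BY starttime DESC LIMIT 20;"
)


def maintenance_sql(table):
    """Re-sort/reclaim space after merge churn, then refresh
    planner statistics for the table."""
    return [f"VACUUM {table};", f"ANALYZE {table};"]


print(RECENT_LOAD_ERRORS_SQL)
for stmt in maintenance_sql("orders"):
    print(stmt)
```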







