Using Google Cloud Dataflow to Merge Flat Files and Import to Cloud SQL
We need to read data from two CSV files, join them on a single column, and load the result into Cloud SQL using Google Cloud Dataflow.
We can already read the CSV files, but we are stuck on the next steps. Please provide information or links on the following:
- Joining / merging the flat files on one column, or on a condition involving multiple columns
- Copying the merged PCollection to a Cloud SQL database
1 answer
Here are some pointers that might be helpful:
- https://cloud.google.com/dataflow/model/joins describes how to join PCollections in a pipeline.
- There is currently no built-in sink for writing to Cloud SQL. However, you can process the results of your join with a ParDo that writes each record individually or in batches (flushing periodically or in finishBundle()). If your needs are more complex than that, consider writing a custom Cloud SQL sink; see https://cloud.google.com/dataflow/model/sources-and-sinks
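
To make the two pointers concrete, here is a minimal plain-Python sketch of the pattern, not Dataflow code. It assumes an inner join on a shared `id` column and uses the standard-library `sqlite3` module as a stand-in for a Cloud SQL connection. Every name in it (`key_by`, `inner_join`, `BatchedSqlWriter`, the `merged` table, the sample data) is hypothetical. In a real pipeline the grouping step would typically be done by CoGroupByKey, and the writer logic would live inside the function you pass to ParDo.

```python
import csv
import io
import sqlite3
from collections import defaultdict


def read_csv(text):
    """Parse CSV text into a list of row dicts (stand-in for the file read)."""
    return list(csv.DictReader(io.StringIO(text)))


def key_by(rows, column):
    """Turn each row into a (key, row) pair, as a keying step would."""
    return [(row[column], row) for row in rows]


def inner_join(left, right):
    """Group both keyed inputs by key, then emit one merged row per match."""
    grouped = defaultdict(lambda: ([], []))
    for key, row in left:
        grouped[key][0].append(row)
    for key, row in right:
        grouped[key][1].append(row)
    return [
        {**l, **r}
        for lefts, rights in grouped.values()
        for l in lefts
        for r in rights
    ]


class BatchedSqlWriter:
    """Buffers rows in process() and flushes them in batches.

    Mirrors the bundle lifecycle mentioned above: flush when the buffer
    is full, and once more in finish_bundle() so nothing is left behind.
    """

    def __init__(self, connection, batch_size=500):
        self.connection = connection
        self.batch_size = batch_size
        self.buffer = []

    def process(self, row):
        self.buffer.append((row["id"], row["name"], row["amount"]))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def finish_bundle(self):
        self.flush()

    def flush(self):
        if self.buffer:
            self.connection.executemany(
                "INSERT INTO merged (id, name, amount) VALUES (?, ?, ?)",
                self.buffer,
            )
            self.connection.commit()
            self.buffer = []


# Hypothetical sample data standing in for the two CSV files.
customers = read_csv("id,name\n1,Ann\n2,Bob\n3,Cy\n")
orders = read_csv("id,amount\n1,10\n2,20\n2,25\n4,99\n")

merged = inner_join(key_by(customers, "id"), key_by(orders, "id"))

# sqlite3 in-memory database used only as a stand-in for Cloud SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merged (id TEXT, name TEXT, amount TEXT)")
writer = BatchedSqlWriter(conn, batch_size=2)
for row in merged:
    writer.process(row)
writer.finish_bundle()
```

For a join on multiple columns, the same idea should carry over by making the key a tuple of those columns. Batching the inserts keeps the number of round trips to the database low, which matters when many workers write at once.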