Auto-detect BigQuery schema in a data stream?

Can the equivalent of --autodetect be used in Dataflow?

i.e. can we load data into a BQ table without specifying a schema, equivalent to how we can load data from CSV using --autodetect?

(potentially related question)


2 answers


If you are using protocol buffers as the objects in your PCollections (which should perform well in Dataflow streaming), you may be able to use a utility I wrote a while back. It parses the protobuf descriptor into a BigQuery schema at runtime.

I quickly uploaded it to GitHub; it's a work in progress, but you could use it as-is, or take it as inspiration to write something similar using Java reflection (I might do this myself at some point).

You can use the utility like this:



TableSchema schema = ProtobufUtils.makeTableSchema(ProtobufClass.getDescriptor());

enhanced_events.apply(BigQueryIO.Write.to(tableToWrite)
    .withSchema(schema)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

where the write location will create the table with the specified schema if it does not exist, and ProtobufClass is the class generated from your protobuf schema by the proto compiler.
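
In case the GitHub link goes stale: the core idea is just to walk the protobuf Descriptor and map each field onto a BigQuery column. Here is a rough sketch of that conversion (the class name, method name, and exact type mapping are my own illustration rather than the actual utility's code), assuming the google-api-services-bigquery model classes and the protobuf Java runtime:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import java.util.ArrayList;
import java.util.List;

public class ProtoSchemaSketch {

    // Walk the proto descriptor and build the equivalent BigQuery schema.
    public static TableSchema makeTableSchema(Descriptor descriptor) {
        List<TableFieldSchema> columns = new ArrayList<>();
        for (FieldDescriptor field : descriptor.getFields()) {
            TableFieldSchema column = new TableFieldSchema()
                .setName(field.getName())
                .setType(toBigQueryType(field))
                .setMode(field.isRepeated() ? "REPEATED"
                    : field.isRequired() ? "REQUIRED" : "NULLABLE");
            if (field.getJavaType() == FieldDescriptor.JavaType.MESSAGE) {
                // Nested messages become RECORD columns with their own sub-fields.
                column.setFields(makeTableSchema(field.getMessageType()).getFields());
            }
            columns.add(column);
        }
        return new TableSchema().setFields(columns);
    }

    // Rough mapping from protobuf field types to BigQuery column types.
    private static String toBigQueryType(FieldDescriptor field) {
        switch (field.getJavaType()) {
            case INT:
            case LONG:
                return "INTEGER";
            case FLOAT:
            case DOUBLE:
                return "FLOAT";
            case BOOLEAN:
                return "BOOLEAN";
            case MESSAGE:
                return "RECORD";
            default: // ENUM, STRING, BYTE_STRING
                return "STRING";
        }
    }
}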



I'm not sure about reading from BQ, but for writes, I think something like this will work with the latest Java SDK.



.apply("WriteBigQuery", BigQueryIO.Write
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    .to(outputTableName));


Note: with CREATE_NEVER the table must already exist (so BigQuery already knows its schema), and the table name must be of the form <project_name>:<dataset_name>.<table_name>.
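
For context, here is a fuller end-to-end sketch of the same idea, written against the newer Apache Beam Java SDK (BigQueryIO.writeTableRows) rather than the 1.x Dataflow SDK shown above; the input data, column names, and table name are placeholders. Because the create disposition is CREATE_NEVER, the table must already exist, which is exactly why no schema has to be declared in the pipeline:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class WriteWithoutSchema {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadInput", Create.of("alice,30", "bob,25"))   // placeholder input
         .apply("ToTableRow", ParDo.of(new DoFn<String, TableRow>() {
             @ProcessElement
             public void processElement(ProcessContext c) {
                 String[] parts = c.element().split(",");
                 // Column names must match the existing table's schema.
                 c.output(new TableRow()
                     .set("name", parts[0])
                     .set("age", Integer.parseInt(parts[1])));
             }
         }))
         .apply("WriteBigQuery", BigQueryIO.writeTableRows()
             // CREATE_NEVER: the table must already exist, so no schema is supplied here.
             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
             .to("my-project:my_dataset.my_table"));            // placeholder table

        p.run();
    }
}

As far as I know, BigQueryIO has no direct --autodetect equivalent, so if Dataflow has to create the table you need to supply a schema (or build one at runtime, as in the other answer).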

      
