Cannot insert new value into BigQuery table after updating with new column using streaming API

I am seeing strange behavior with my BigQuery table. I just added a new column to the table; it looks good in the frontend, and I get the updated schema via the API.

But when I insert a value into the new column, I get the following error:

{
  "insertErrors" : [ {
    "errors" : [ {
      "message" : "no such field",
      "reason" : "invalid"
    } ],
    "index" : 0
  } ],
  "kind" : "bigquery#tableDataInsertAllResponse"
}


I am using the Java client and the streaming API; the only thing I added is:

tableRow.set("server_timestamp", 0);

Without this line, it works correctly :(

Do you see anything wrong with it? (The column name is server_timestamp, and it is defined as INTEGER.)
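
For context, here is a minimal sketch of the kind of call I'm making, assuming the legacy com.google.api.services.bigquery Java client (the project/dataset/table IDs are placeholders, and bigquery is an already-authorized client):

import java.util.Collections;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
import com.google.api.services.bigquery.model.TableRow;

// Build the row; server_timestamp is the newly added INTEGER column.
TableRow tableRow = new TableRow();
tableRow.set("server_timestamp", 0);

// Wrap the row and stream it via tabledata().insertAll().
TableDataInsertAllRequest.Rows row = new TableDataInsertAllRequest.Rows()
    .setJson(tableRow);
TableDataInsertAllRequest request = new TableDataInsertAllRequest()
    .setRows(Collections.singletonList(row));
TableDataInsertAllResponse response = bigquery.tabledata()
    .insertAll("my-project", "my_dataset", "my_table", request)
    .execute();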

+3




2 answers


An update to this answer, as the BigQuery streaming system has seen significant changes since August 2014, when this question was originally answered.


The BigQuery streaming system caches the table schema for up to 2 minutes. When you add a field to the schema and then immediately stream new rows to the table, you may encounter this error.

The best way to avoid this error is to wait 2 minutes after changing the table before streaming rows that contain the new field.



If this is not possible, you have a few more options:

  • Use the ignoreUnknownValues parameter. This flag tells the insert operation to ignore unknown fields and accept only the fields it recognizes. Setting this flag lets you start streaming records with the new field immediately while avoiding the "no such field" error during the 2 minute window, but note that the new field's values will be silently dropped until the cached table schema updates!

  • Use the skipInvalidRows parameter. This flag tells the insert operation to insert as many valid rows as it can, rather than failing the entire request when a single invalid row is present. During the caching window, rows that contain the new field are rejected while the remaining rows are inserted, so no values are silently dropped (in contrast to ignoreUnknownValues, which accepts the rows but discards the unrecognized field). A sketch of setting both flags follows this list.
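
Here is a minimal sketch of setting these flags with the legacy Java client, assuming the same TableDataInsertAllRequest as in the question (setIgnoreUnknownValues and setSkipInvalidRows are the corresponding setters on that request object):

// Opt into one or both of the behaviors described above.
TableDataInsertAllRequest request = new TableDataInsertAllRequest()
    .setRows(Collections.singletonList(row))
    // Drop unrecognized fields instead of failing with "no such field".
    .setIgnoreUnknownValues(true)
    // Insert the valid rows even when some rows in the batch are rejected.
    .setSkipInvalidRows(true);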

If you have to capture all the values and cannot wait 2 minutes, you can create a new table with the updated schema and stream to that table. The downside of this approach is that you have to manage the multiple tables it generates. Note that you can easily query these tables together with TABLE_QUERY, and you can run periodic cleanup queries (or table copies) to consolidate your data into a single table.
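
For illustration, a legacy-SQL TABLE_QUERY query run through the same Java client can read all of the generated tables at once; the dataset name and table-ID prefix below are made up:

import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;

// Query every table in my_dataset whose ID contains "events_".
String sql =
    "SELECT * FROM TABLE_QUERY(my_dataset, 'table_id CONTAINS \"events_\"')";
QueryResponse queryResponse = bigquery.jobs()
    .query("my-project", new QueryRequest().setQuery(sql))
    .execute();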

Historical note: a previous version of this answer suggested stopping streaming, moving the existing data to another table, re-creating the streaming table, and restarting streaming. Due to the complexity of that approach and the now-shorter schema cache window, the BigQuery team no longer recommends it.

+6




I ran into this error. It turned out that I was building the insert object as if I were in "raw" mode, but forgot to set the raw: true flag. This made the BigQuery client take my insert data and nest it again under a json: {} node.

In other words, I did this:

table.insert({
    insertId: 123,
    json: {
        col1: '1',
        col2: '2',
    }
});


when I should have done this:



table.insert({
    insertId: 123,
    json: {
        col1: '1',
        col2: '2',
    }
}, {raw: true});


The BigQuery Node library didn't realize the payload was already in raw mode, and so it tried to insert this:

{
    insertId: '<generated value>',
    json: {
        insertId: 123,
        json: {
            col1: '1',
            col2: '2',
        }
    }
}


So in my case, the errors were saying that the insert expected my schema to have two columns in it (insertId and json).

+2



