Import log files from Google Cloud Storage to BigQuery

We are developing a project that uploads log files (example below) from logstach to Google Cloud Storage. Then let App Engine import the log data into BigQuery. The problem is

  • BigQuery does not accept filenames like @timestamp generated by logstach in the log file. How can I handle this. Can App Engine do anything to fix this problem?

  • How to define BigQuery schema for nested JSON (geoip :)?

    {"UUID": "8806ceef34123122cdd009063f301a34158252f53b9a7d3147639fb71f68b585", "item_id": 1234, "member_id": 1234, "admin_id": 0, "cate_id": 131, "3 note" listing_status ": "," txn_type ":" edit "," ip_address ":" 13.89.42.18 "," email ":" xxxx@gmail.com "," post_name ":" "," user_agent ":" COM Mozilla / 5.0 (Windows NT 6.1; rv: 31.0) Gecko / 20100101 Firefox / 31.0 "," timestamp ":" 2014-08-22 06:38:53 "," http_host ":" EA1-ZoneS1 ", " @ version " :" 1 " , " @label": "2014-08-21T23: 38: 59.737Z","type": "Redis", "ua.name": "Firefox", "ua.os": "Windows 7", "ua.os_name": "Windows 7", "ua.device": "Other", "ua.major": "31", "ua.minor": "0","GeoIP": {"f": "13.89.42.18", "country_code2": "XX", "country_code3": "XXX", "COUNTRY_NAME": "XXXXXXX", "continent_code": "AS" "REGION_NAME": "40", "city_name": "XXXX", "latitude": 123.45, "longitude": 123.45, "time zone": "Asia / Bangkok", "real_region_name": "XXXXXX", "location" : [123.45,123.45]} }

Wish I am a beginner. I could not add an image.

Please give me a suggestion

Thank.

+3


source to share


1 answer


1) You cannot use symbols @

or .

in names. You will need to remove them by running the data through something like this.

line = line.replace("@", "_")    
line = line.replace("ua.", "ua_")

      



2) You may need to change some types, but I was able to load your example data (with the above changes) using this JSON constructor:

[{
    "name": "uuid",
    "type": "STRING"
}, {
    "name": "item_id",
    "type": "INTEGER"
}, {
    "name": "member_id",
    "type": "INTEGER"
}, {
    "name": "admin_id",
    "type": "INTEGER"
}, {
    "name": "cate_id",
    "type": "INTEGER"
}, {
    "name": "listing_status",
    "type": "INTEGER"
}, {
    "name": "monitor_status",
    "type": "INTEGER"
}, {
    "name": "note",
    "type": "STRING"
}, {
    "name": "txn_type",
    "type": "STRING"
}, {
    "name": "ip_address",
    "type": "STRING"
}, {
    "name": "email",
    "type": "STRING"
}, {
    "name": "post_name",
    "type": "STRING"
}, {
    "name": "user_agent",
    "type": "STRING"
}, {
    "name": "timestamp",
    "type": "TIMESTAMP"
}, {
    "name": "http_host",
    "type": "STRING"
}, {
    "name": "_version",
    "type": "STRING"
}, {
    "name": "_timestamp",
    "type": "TIMESTAMP"
}, {
    "name": "type",
    "type": "STRING"
}, {
    "name": "ua_name",
    "type": "STRING"
}, {
    "name": "ua_os",
    "type": "STRING"
}, {
    "name": "ua_os_name",
    "type": "STRING"
}, {
    "name": "ua_device",
    "type": "STRING"
}, {
    "name": "ua_major",
    "type": "STRING"
}, {
    "name": "ua_minor",
    "type": "STRING"
}, {
    "name": "geoip",
    "type": "RECORD",
    "fields": [{
        "name": "ip",
        "type": "STRING"
    }, {
        "name": "country_code2",
        "type": "STRING"
    }, {
        "name": "country_code3",
        "type": "STRING"
    }, {
        "name": "country_name",
        "type": "STRING"
    }, {
        "name": "continent_code",
        "type": "STRING"
    }, {
        "name": "region_name",
        "type": "STRING"
    }, {
        "name": "city_name",
        "type": "STRING"
    }, {
        "name": "latitude",
        "type": "FLOAT"
    }, {
        "name": "longitude",
        "type": "FLOAT"
    }, {
        "name": "timezone",
        "type": "STRING"
    }, {
        "name": "real_region_name",
        "type": "STRING"
    }, {
        "name": "location",
        "type": "FLOAT",
        "mode": "REPEATED"
    }]
}]

      

+5


source







All Articles