Hive JSON SerDe: nested struct field whose name starts with $ returns NULL

I am trying to load a large amount of JSON data with a nested structure into Hive using a JSON SerDe. Some field names inside the nested structure start with $. I map those names to the Hive field names using SERDEPROPERTIES, but when I query the table, the field starting with $ comes back as NULL. I have tried different syntax, with no luck.

JSON example:

{
    "_id" : "319FFE15FF90",
    "SomeThing" : 
    {
            "$SomeField"     : 22,
            "AnotherField"   : 2112,
            "YetAnotherField":    1
    }
 . . . etc . . . .

      

Using the schema as follows:

create table testSample
( 
    `_id` string, 
    something struct
    <
        $somefield:int,
        anotherfield:bigint, 
        yetanotherfield:int
    >
) 
row format serde 'org.openx.data.jsonserde.JsonSerDe' 
with serdeproperties
(
    "mapping.somefield" = "$somefield"
);

      

This schema builds OK, but somefield (the one starting with $) in the above table always returns NULL; all other values exist and are correct.

We have tried many syntactic combinations, but to no avail.

Does anyone know a trick to get a nested field with a leading $ in its name?



2 answers


You almost had it. Try creating the table like this. The mistake is in how the mapping and the schema fit together: the SerDe property "mapping.somefield" = "$somefield" says "when looking for the Hive column named somefield, look for the JSON field $somefield". In your Hive schema, however, you declared the field itself with a dollar sign ($somefield), so the mapping has no field named somefield to apply to. A column name with a dollar sign, if not outright illegal, is certainly not good Hive practice.

create table testSample
(
`_id` string,
something struct
<
    somefield:int,
    anotherfield:bigint,
    yetanotherfield:int
  >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
"mapping.somefield" = "$somefield"
);

      



I tested it with some test data:

{ "_id" : "123", "something": { "$somefield": 12, "anotherfield":13,"yetanotherfield":100}}
hive> select something.somefield from testSample;
OK
12
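
To reproduce that check end to end, something along these lines should work. It is only a sketch: the jar path and the sample file path are placeholders that do not come from the original post, and the jar name depends on the SerDe build you have.

```sql
-- Hypothetical paths: adjust to wherever the SerDe jar and the sample file live.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Assumes the file holds one JSON document per line, like the sample row above.
LOAD DATA LOCAL INPATH '/tmp/testSample.json' INTO TABLE testSample;

-- The Hive-side field name has no dollar sign; the mapping property
-- points it at the JSON key "$somefield".
SELECT something.somefield, something.anotherfield FROM testSample;
```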

      



I'm seeing this problem too, but for regular column names as well (no special characters like $).

I am populating an external table (Temp) from another internal table (Table2) and want the Temp table's output in JSON format. I want camel-case column names in the output JSON file, so I also use SERDEPROPERTIES on the Temp table to map to the correct names. However, when I run SELECT * on the Temp table, it returns NULL for the columns whose names were used in the mapping.

I am running Hive 0.13. Here are the commands:

Create table command:

CREATE EXTERNAL TABLE Temp (
    data STRUCT<
        customerId:BIGINT, region:STRING, marketplaceId:INT, asin:ARRAY<STRING>>
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
    'mapping.customerid' = 'customerId',
    'mapping.marketplaceid' = 'marketplaceId'
) 
LOCATION '/output'; 

INSERT INTO TABLE Temp
    SELECT 
        named_struct ('customerId',customerId, 'region', region, 'marketplaceId', marketplaceId, 'asin', asin) 
    FROM Table2;

      

Select * from Temp:

{"customerid":null,"region":"EU","marketplaceid":null,"asin":["B000FC1PZC"]}
{"customerid":null,"region":"EU","marketplaceid":null,"asin":["B000FC1C9G"]}

      



Note that customerid and marketplaceid are NULL. The generated JSON file:

{"data":{"region":"EU","asin":["B000FC1PZC"]}}
{"data":{"region":"EU","asin":["B000FC1C9G"]}}

      

Now, if I remove the WITH SERDEPROPERTIES clause, the table returns all values:

{"customerid":1,"region":"EU","marketplaceid":4,"asin":["B000FC1PZC"]}
{"customerid":2,"region":"EU","marketplaceid":4,"asin":["B000FC1C9G"]}

      

And the JSON file is then created like this:

{"data":{"region":"EU","marketplaceid":4,"asin":["B000FC1PZC"],"customerid":1}}
{"data":{"region":"EU","marketplaceid":4,"asin":["B000FC1C9G"],"customerid":2}}

      
