Return elements of Redshift JSON array on separate lines

Question

Return elements of Redshift JSON array on separate lines

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{"pet":"dog"},{"country":"uk"}]
 2  | [{"pet":"cat"}]
 3  | []
 4  | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]

All array elements have only one field.
There is no guarantee that any field will be present in any of the array elements.
The field name can be repeated in an array
Array elements can be in any order

I want to return a table that looks like this:

 id |   field   |  value
------------------------
 1  | pet       | dog
 1  | country   | uk
 2  | pet       | cat
 4  | country   | germany
 4  | education | masters
 4  | country   | belgium

Then I can combine this with my queries in the rest of the input table.

I've tried playing around with Redshift JSON functions, but without being able to write functions / use loops / have variables in Redshift, I really don't see a way to do it!

Please let me know if I can clarify anything else.

+3

json amazon-redshift

Serenthia May 12 '15 at 15:48

source to share

2 answers

There is a universal version for CREATE VIEW seq_0_to_3

. Let's call it CREATE VIEW seq_0_to_n

. This can be generated

CREATE VIEW seq_0_to_n AS (  
    SELECT row_number() over (
                          ORDER BY TRUE)::integer - 1 AS i
    FROM <insert_large_enough_table> LIMIT <number_less_than_table_entries>);

This helps in creating large sequences as a presentation.

+1

Dheeraj MR 02 dec. 16 at 11:33

source to share

Serenthia · Accepted Answer · 2015-05-12T18:26:18+0000

Thanks to this inspired blog post, I was able to develop a solution. It:

Create a lookup table to efficiently iterate over the elements of each array. The number of rows in this table is equal to or greater than the maximum number of array elements. Let's say it's 4 (it can be calculated with SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table

):

CREATE VIEW seq_0_to_3 AS
    SELECT 0 AS i UNION ALL                                      
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL    
    SELECT 3          
);

From this, one string per JSON element can be generated:

WITH exploded_array AS (                                                                          
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
  )
SELECT *
FROM exploded_array;

Production:

 id | json
------------------------------
 1  | {"pet":"dog"}
 1  | {"country":"uk"}
 2  | {"pet":"cat"}
 4  | {"country":"germany"}
 4  | {"education":"masters"}
 4  | {"country":"belgium"}

However, I needed to extract the names / values of the fields. Since I don't see a way to extract the JSON field names using Redshift's restricted functionality, I'll do it using a regex:

WITH exploded_array AS (                                                                                       
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field)
FROM (
    SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
    FROM exploded_array
);

Return elements of Redshift JSON array on separate lines

More articles: