Reconcile file with table in postgres and determine compliance level
I have a table with 4 columns and it is loaded with data. Data can range from 1000 to a maximum of 2 million.
I am getting a file (say a tab) as part of a daily process with data for 4 columns. I have to prepare a report where for every column match I have to increase the match level.
eg:
The file.col1 data exists in db.col1; then matchlevel = 1
The files file.col2 exist in db.col2; then matchlevel = 2
The file.col3 data exists in db.col3; then matchlevel = 3
The file.col4 data exists in db.col4; then matchlevel = 4
Example of data in the database: (all fields are strings)
1367 37991 11111 sometext1
1365 37993 11112 sometext2
1369 34521 sample1 sometext1
1359 76583 sample2 sometext2
Sample data in file: (all fields are strings)
1367 37991 11111 sometext1
1365 8993 sample3 sometext5
1369 34521 sample4 sometext6
1359 76583 sample2 sometext7
1651 875637 notpresentindb notpresentindb
The way out should be
id from file Match Level
1367 Max (all fields match in db)
1365 Low (only column1 matches
1369 Med (column1 and 2 match)
1359 High (First 3 columns match)
1651 No Match (no columns match)
I am currently doing this Java. Declared 4 arraylists and copied all the data in the columns into the arraylists and used the apache collections to determine if the data of each column exists in each of the 4 arraylists for testing. But this is redflagged because we need to load 4 columns of data with 1 million records and all the server memory can be used.
Sample code below: Arraylist1, Arraylist2, Arraylist3, Arraylist4 are defined before the code below and loaded with data from db.
while ((sCurrentLine = br.readLine()) != null) {
String[] temp;
temp = sCurrentLine.split("\\t");
value = "no match";
valueInt = 0;
if(arraylist1.contains(temp[0])){
value = "low";
int retval=arraylist1.indexOf(temp[0]);
if (arraylist2.get(retval).equals(temp[1])) {
value = "med";
if (arraylist3.get(retval).equals(temp[2])){
value = "High";
if (arraylist4.get(retval).equals(temp[3])) {
value = "Max";
}
}
}
}
source to share
Since Postgresql 9.1 you can use external data wrappers to view files as tables. http://www.postgresql.org/docs/9.4/static/file-fdw.html
Once you see your Csv as a table, you can use the power of SQL to generate a report.
CREATE TABLE t (
id SERIAL PRIMARY KEY,
c1 text,
c2 text,
c3 text
) ;
CREATE EXTENSION file_fdw;
CREATE SERVER my_csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE TABLE csv (
id integer,
c1 text,
c2 text,
c3 text
)
SERVER my_csv_server
OPTIONS ( filename '/home/me/data.csv', format 'csv' );
SELECT
id,
CASE
WHEN t.c1 = csv.c1 AND t.c2 = csv.c2 AND t.c3 = csv.c3 THEN
'Max'
WHEN t.c1 = csv.c1 AND t.c2 = csv.c2 THEN
'High'
WHEN t.c1 = csv.c1 THEN
'Mid'
ELSE
'Low' -- only id match
END as report
FROM
t JOIN csv USING (id)
source to share
If you want to do the processing in the database, you must first download the contents of the file first. I can think of two ways to do this:
- create another table and import the contents of the file into it eg. using command
copy
- use external data wrapper to directly access file contents.
Once you can access the contents of the file via SQL, you can do so using a simple outer join:
select fc.col1,
case
when fc.col1 = bt.col1 and fc.col2 = bt.col2 and fc.col3 = bt.col3 and fc.col4 = bt.col4 then 'Max'
when fc.col1 = bt.col1 and fc.col2 = bt.col2 and fc.col3 = bt.col3 then 'High'
when fc.col1 = bt.col1 and fc.col2 = bt.col2 then 'Med'
when fc.col1 = bt.col1 then 'Low'
else 'No Match'
end as match_level
from file_content fc
left join base_table bt on fc.col1 = bt.col1;
If you also need to identify lines that are in base_table
but not contained in the file, you need full outer join
, not the left join:
select fc.col1,
case
when fc.col1 = bt.col1 and fc.col2 = bt.col2 and fc.col3 = bt.col3 and fc.col4 = bt.col4 then 'Max'
when fc.col1 = bt.col1 and fc.col2 = bt.col2 and fc.col3 = bt.col3 then 'High'
when fc.col1 = bt.col1 and fc.col2 = bt.col2 then 'Med'
when fc.col1 = bt.col1 then 'Low'
else
case when bt.col1 is null then 'Not in database'
else 'Not in file'
end
end as match_level
from file_content fc
full join base_table bt on fc.col1 = bt.col1;
SQLFiddle example: http://sqlfiddle.com/#!15/1ff38/2
source to share