Validate Sqoop using QUERY and WHERE clauses

I am streamlining a data import process that takes data from an existing database and splits it into a schema on HDFS. The import runs with Sqoop's default of four map tasks, and I now have it scheduled as a daily job via Apache Oozie.

Since Oozie is DAG-oriented, is it possible to create a validation step in the Oozie workflow that does the following?

  • Runs a Hive query on the newly imported data to return the row count
  • Runs a SQL query to return the row count from the original data source
  • Compares the two values
  • If they don't match, returns FAIL and kills the job; if they match, returns TRUE and OK
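To make the intended step concrete, here is a minimal sketch of a script that an Oozie shell action could run. It is only an illustration under several assumptions: the table names (`imported_orders`, `orders`), the date columns, and the JDBC URL are hypothetical placeholders, credentials are omitted, and taking the last integer printed on stdout as the row count is a simplification that may need adjusting for real `hive -e` and `sqoop eval` output. A shell action that exits with a non-zero status follows its error transition, which can point at a kill node.

```python
import re
import subprocess
import sys


def parse_count(output):
    """Return the last standalone integer in a command's stdout (assumed to be the count)."""
    numbers = re.findall(r"\b\d+\b", output)
    if not numbers:
        raise ValueError("no row count found in command output")
    return int(numbers[-1])


def run_and_count(command, runner=subprocess.run):
    """Run a command (given as an argument list) and parse a row count from its stdout."""
    result = runner(command, capture_output=True, text=True, check=True)
    return parse_count(result.stdout)


def validate(hive_command, source_command, runner=subprocess.run):
    """Return 0 when both row counts agree and 1 otherwise (usable as an exit status)."""
    hive_count = run_and_count(hive_command, runner)
    source_count = run_and_count(source_command, runner)
    print(f"hive={hive_count} source={source_count}")
    return 0 if hive_count == source_count else 1


if __name__ == "__main__" and len(sys.argv) == 2:
    load_date = sys.argv[1]  # the date the import was split on, e.g. passed in by the workflow
    # Hypothetical table and column names; replace with the real ones.
    hive_cmd = [
        "hive", "-S", "-e",
        f"SELECT COUNT(*) FROM imported_orders WHERE load_date = '{load_date}'",
    ]
    # Hypothetical JDBC URL; username and password options are left out of this sketch.
    source_cmd = [
        "sqoop", "eval",
        "--connect", "jdbc:mysql://dbhost/sales",
        "--query", f"SELECT COUNT(*) FROM orders WHERE order_date = '{load_date}'",
    ]
    sys.exit(validate(hive_cmd, source_cmd))
```

The `runner` parameter exists only so the comparison logic can be exercised without a cluster; in the workflow the default `subprocess.run` would be used.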

I understand that Sqoop has a built-in validation option, but as far as I can tell it does not apply here because I am not importing a single whole table (each of my Sqoop imports is split on a specific date).

Is it possible?

sql hadoop hdfs oozie




No one has answered this question yet



