Validate Sqoop imports that use QUERY and WHERE clauses

I am streamlining a data import process that takes data from an existing database and loads it into HDFS with Sqoop. By default, each import is split into four map tasks, and I now have the job scheduled to run daily via Apache Oozie.

Since Oozie is DAG oriented, is it possible to create a validation step in the Oozie workflow that does the following:

  • Run a Hive query on the newly imported data to return the row count
  • Run a SQL query to return the row count in the original data source
  • Compare the two values
  • If they don't match, return FAIL and kill the job; if they match, return TRUE and OK
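
The comparison in the last two steps could be sketched roughly as below. This is only an illustration under assumptions: the two counts are taken to be already captured as raw text output from the Hive query and the source SQL query, and the function names (`parse_count`, `validation_exit_code`) are hypothetical. It relies on the fact that an Oozie shell action fails when its script exits with a non-zero code, which lets the workflow follow its error transition to a kill node.

```python
def parse_count(raw):
    """Parse a row count from raw query output such as '1024\n'."""
    return int(raw.strip())


def validation_exit_code(hive_raw, source_raw):
    """Return 0 if both counts parse and match, otherwise 1.

    hive_raw and source_raw are assumed to hold the text output of the
    Hive count query and the source database count query.
    """
    try:
        hive_count = parse_count(hive_raw)
        source_count = parse_count(source_raw)
    except ValueError:
        # Unparseable output is treated as a failed validation.
        return 1
    return 0 if hive_count == source_count else 1
```

A wrapper script run by the workflow would end with something like `sys.exit(validation_exit_code(hive_out, source_out))`, so that a mismatch surfaces as a failed action.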

I understand that Sqoop has a built-in validation process, but since I am not importing a single whole table, I believe it does not apply (each of my Sqoop imports is split on a specific date).

Is it possible?
