Foreign keys in Hive?

I am new to Hive. I've searched various sites, but none of them gave me a clear answer to the following:

A> Foreign keys: Hive never mentions foreign keys, so how do we enforce referential constraints? (I am aware of the JOIN ... ON syntax. Does joining two tables this way imply a primary key / foreign key relationship between them?) Is there a larger design reason for not supporting foreign keys?

B> Equality comparison of floats: There seems to be a problem with this. For example, to check whether A = 3.5, should I write "A > 3.49 and A < 3.51"? Is that correct?

Are there any links / materials that can help with the implementation of HQL?

Appreciate any help

Thanks -Shiree

+3




4 answers


Hive is schema-on-read, so it performs no referential integrity checks of its own on the data it reads. Integrity must instead be enforced by the originating system and, more importantly, by the queries that run on Hive.
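As a rough illustration of enforcing integrity in a query, the sketch below looks for child rows whose key has no match in the parent table. The tables orders and customers and their columns are hypothetical names chosen for this example, not anything from the question.

```sql
-- Hypothetical tables: orders(order_id, customer_id) and customers(customer_id).
-- Returns the "orphan" orders, i.e. rows a foreign key constraint would reject.
SELECT o.order_id, o.customer_id
FROM orders o
LEFT OUTER JOIN customers c
  ON o.customer_id = c.customer_id
WHERE o.customer_id IS NOT NULL   -- ignore rows that have no reference at all
  AND c.customer_id IS NULL;      -- keep rows whose reference found no parent
```

An empty result suggests the relationship currently holds. Since nothing stops later loads from breaking it, a check like this would typically be rerun after each load.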



+5




Hive does not currently support FK / PK constraints.

But this may change in the future. Constraints would give the Hive CBO (cost-based optimizer) more information to make better cardinality estimates and to rewrite queries more effectively:

https://issues.apache.org/jira/browse/HIVE-13019

https://issues.apache.org/jira/browse/HIVE-6905



In response to Mo K's answer, constraints don't necessarily mean overhead. Oracle, for example, has "RELY NOVALIDATE" constraints: the CBO (or the Hive CBO in this case) relies on the constraint to optimize queries but does not validate it.

Edit 02/18/2016: I created https://issues.apache.org/jira/browse/HIVE-13076 , please vote up if you are interested in this feature.

Edit 07/25/2016: https://issues.apache.org/jira/browse/HIVE-13076 was resolved in 06/2016 and should land in Hive 2.1. I don't see any updates in the official documentation yet.
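For reference, a minimal sketch of what such informational constraints look like in Hive 2.1+ DDL, as far as I understand the syntax; check it against the documentation for your version. The table, column, and constraint names are hypothetical.

```sql
-- Hypothetical tables. DISABLE NOVALIDATE means Hive records the constraint
-- but neither enforces it on writes nor checks the existing data.
CREATE TABLE customers (
  customer_id INT,
  name STRING,
  PRIMARY KEY (customer_id) DISABLE NOVALIDATE
);

CREATE TABLE orders (
  order_id INT,
  customer_id INT,
  amount DOUBLE
);

-- RELY tells the optimizer it may trust the constraint even though it is not validated.
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
  DISABLE NOVALIDATE RELY;
```

Because nothing is validated, keeping the data consistent with the declared constraint remains the job of whatever loads the tables.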

+3




Generally, the best practice in a data warehouse is not to enforce referential integrity, because enforcement adds overhead. If the need arises, you can enforce it in your queries instead.

0




Support for primary / foreign key constraints is available in Hive 2.1.0. See the 2.1.0 release notes.

0








