Azure Data Factory HDI Compliance On Demand

We have an Azure Data Factory with about 40 pipelines in it. Each pipeline has an activity that runs a Hive script using the HDInsight on-demand linked service.

We've since added a custom metastore that uses an Azure SQL instance as a linked service. The only thing we have set for this is HcatalogLinkedServiceName in the Data Factory ARM template.
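For context, here is a minimal sketch of how an on-demand linked service with a metastore reference might be put together, assuming the Data Factory v1 JSON layout. The linked service names (`StorageLinkedService`, `MetastoreSqlLinkedService`), the cluster size and the time-to-live are hypothetical placeholders, not our actual values, and the helper function is purely illustrative. The definition is built in Python only so it can be printed as JSON.

```python
import json


def on_demand_linked_service(name, storage_ls, metastore_ls, version="3.2"):
    """Build an HDInsightOnDemand linked service definition.

    Assumes the Data Factory v1 JSON layout; all names passed in are placeholders.
    """
    return {
        "name": name,
        "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
                "version": version,
                "osType": "Linux",
                "clusterSize": 4,            # hypothetical value
                "timeToLive": "00:15:00",    # hypothetical value
                "linkedServiceName": storage_ls,
                # The only metastore-related setting we configure:
                "hcatalogLinkedServiceName": metastore_ls,
            },
        },
    }


definition = on_demand_linked_service(
    "HDIOnDemand", "StorageLinkedService", "MetastoreSqlLinkedService"
)
print(json.dumps(definition, indent=2))
```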

Since then we have noticed an increase in failures, and performance has deteriorated (possibly because the Azure SQL tier we used previously was hitting 100% DTU usage).

Activities fail with: Error in Activity: The request was aborted: The request was canceled.

This seems to happen fairly consistently when using a custom metastore and doesn't seem to be an issue when using the default one.

Is there anything else required to set up a custom metastore that we may have missed, or is there something about the configuration and usage described here that might explain the significant increase in failures with the above error?

HDI on-demand cluster: OS Linux, version 3.2.

We have also seen more serious errors, but I think those could be because we connected a 3.3 HDI cluster to the same metastore as our 3.2 on-demand cluster. The following article supports this:

https://blogs.msdn.microsoft.com/azuredatalake/2017/03/24/hive-metastore-in-hdinsight-tips-tricks-best-practices/

Ensure that the Metastore created for one HDInsight cluster version is not shared across different HDInsight cluster versions. This is due to different Hive versions has different schemas. Example – Hive 1.2 and Hive 2.1 clusters trying to use same Metastore.
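The rule quoted above can be checked mechanically. The following is only a sketch under assumed inputs: the cluster names, the metastore name and the dictionary layout are hypothetical, and nothing here calls an Azure API. It groups clusters by metastore and reports any metastore used by more than one HDInsight version.

```python
from collections import defaultdict


def find_shared_metastores(clusters):
    """Return {metastore: sorted versions} for metastores used by more than one version."""
    versions_by_metastore = defaultdict(set)
    for cluster in clusters:
        versions_by_metastore[cluster["metastore"]].add(cluster["version"])
    return {
        metastore: sorted(versions)
        for metastore, versions in versions_by_metastore.items()
        if len(versions) > 1
    }


# Hypothetical inventory mirroring the situation described in this question.
clusters = [
    {"name": "adf-on-demand", "version": "3.2", "metastore": "hive-metastore-db"},
    {"name": "standalone", "version": "3.3", "metastore": "hive-metastore-db"},
]

conflicts = find_shared_metastores(clusters)
print(conflicts)  # {'hive-metastore-db': ['3.2', '3.3']}
```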

The component versioning page below also seems to suggest that there are differences between 3.2 and 3.3 for Hive and HCatalog.

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning
