Hive ParseException on drop table statement
I am using Python, specifically the pyodbc module, to run Hive queries on Hadoop. The part of the code that causes the problem looks like this:
import pyodbc
import pandas
oConnexionString = 'Driver={ClouderaHive};[...]'
oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setencoding(encoding='utf-8')
oQueryParameter = "select * from my_db.my_table;"
oParameterData = pandas.read_sql(oQueryParameter, oConnexion)
oCursor = oConnexion.cursor()
for oRow in oParameterData.index:
    sTableName = oParameterData.loc[oRow,'TableName']
    oQueryDeleteTable = 'drop table if exists my_db.' + sTableName + ';'
    print(oQueryDeleteTable)
    oCursor.execute(oQueryDeleteTable)
The print gives the following: drop table if exists dl_audit_data_quality.hero_context_start_gamemode;
But oCursor.execute raises the following error:
pyodbc.Error: ('HY000', "[HY000] [Cloudera] [HiveODBC] (80) Syntax or semantic parsing error caused by the server when executing an execurint request. error message from the server: Error compiling statement: FAILED: ParseException line 1 : 44 character '(80) (SQLExecDirectW) ")
Note that when I copy the printed statement and run it manually in Hue, it works fine. I assume it has something to do with the encoding of the variable sTableName, but I can't figure out how to fix it.
Thanks.
The query failed because the variable sTableName was decoded incorrectly. Printing the query displays the text correctly, so the problem is not visible there. Example with the print above:
>>> print(oQueryDeleteTable)
'drop table if exists dl_audit_data_quality.hero_context_start_gamemode;'
But inspecting the value in the original data frame showed that it contains null characters between the letters, like this:
>>> print(oParameterData.loc[oRow,'TableName'])
'h\x00e\x00r\x00o\x00_c\x00o\x00n\x00t\x00e\x00x\x00t\x00'
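A `\x00` after every character is the pattern you would typically see when UTF-16-LE encoded text is read one byte per character. The sketch below only reproduces that pattern in plain Python. It is an illustration under that assumption, not a trace of what the driver actually does, and the variable names are made up for the example.

```python
# Illustration only: assumes the text was UTF-16-LE encoded and then
# decoded with a single-byte encoding (latin-1 is used here as a stand-in).
original = "hero_context"

raw_bytes = original.encode("utf-16-le")   # each ASCII character becomes 2 bytes
misdecoded = raw_bytes.decode("latin-1")   # every byte becomes its own character

# repr() makes the hidden null characters visible; a plain print() may not.
print(repr(misdecoded))

# Removing the null characters recovers the original text in this toy case.
repaired = misdecoded.replace("\x00", "")
```

Using `repr()` when debugging values like `sTableName` helps, because it shows characters that a normal print hides.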
The issue was solved by setting the decoding on the connection, as described here: Python dictionary contains encoded values
import pyodbc
import pandas
oConnexionString = 'Driver={ClouderaHive};[...]'
oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
oConnexion.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
oConnexion.setencoding(encoding='utf-8')
oQueryParameter = "select * from my_db.my_table;"
oParameterData = pandas.read_sql(oQueryParameter, oConnexion)
oCursor = oConnexion.cursor()
for oRow in oParameterData.index:
    sTableName = oParameterData.loc[oRow,'TableName']
    oQueryDeleteTable = 'drop table if exists my_db.' + sTableName + ';'
    print(oQueryDeleteTable)
    oCursor.execute(oQueryDeleteTable)
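As an optional safeguard, the table name could be checked before it is concatenated into the statement, so that hidden characters fail loudly in Python rather than as a ParseException on the server. This is a minimal sketch: `build_drop_statement` is a hypothetical helper, not part of pyodbc or of the original code, and it assumes identifiers contain only ASCII letters, digits, and underscores.

```python
import re

# Assumption: valid database and table names use only letters, digits, underscores.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_drop_statement(database, table_name):
    """Return a drop statement, or raise if a name contains unexpected characters."""
    for name in (database, table_name):
        if not _IDENTIFIER.fullmatch(name):
            # {name!r} shows hidden characters such as \x00 in the error message.
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    return f"drop table if exists {database}.{table_name};"
```

In the loop above, the result of `build_drop_statement('my_db', sTableName)` could then be passed to `oCursor.execute` in place of the concatenated string.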