Hive managed table vs. external table: LOCATION directory

I am going through some books and tutorials on Hive. One of the books, Hadoop in Practice, says:

When you create an external (unmanaged) table, Hive keeps the data in the directory specified by the LOCATION keyword intact. But if you were to issue the same CREATE command and drop the EXTERNAL keyword, the table would be a managed table, and Hive would move the contents of the LOCATION directory to /user/hive/warehouse/stock, which may not be the behavior you expect.
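To make the two cases in the quote concrete, here is a sketch of the kind of statements it contrasts. The table names, columns, comma delimiter and /data/... paths are hypothetical; apart from the name and path, the only difference between the two definitions is the EXTERNAL keyword, and DESCRIBE FORMATTED reports which kind of table was created and where it points.

```sql
-- Hypothetical external table: Hive records the LOCATION but does not own the files.
CREATE EXTERNAL TABLE stock_ext (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stock_ext';

-- Same definition without EXTERNAL: a managed table with an explicit LOCATION.
CREATE TABLE stock_managed (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stock_managed';

-- "Table Type" shows EXTERNAL_TABLE or MANAGED_TABLE; "Location" shows the path in use.
DESCRIBE FORMATTED stock_ext;
DESCRIBE FORMATTED stock_managed;
```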

I created a managed table with the LOCATION keyword. Then I loaded data into the table from a file in HDFS. But I couldn't see any directory created under /user/hive/warehouse; instead, a new directory was created at the LOCATION path. So I think that if I create a MANAGED table with LOCATION, nothing is created in the Hive warehouse directory. Is this understanding correct?
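The experiment described in the question can be reproduced roughly as follows, run from the Hive CLI (which accepts dfs commands). The table name stock_test, the path /data/stock_test and the source file /tmp/stock.csv are hypothetical, and /user/hive/warehouse is assumed to be the configured warehouse directory.

```sql
-- Hypothetical managed table with an explicit LOCATION (no EXTERNAL keyword).
CREATE TABLE stock_test (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stock_test';

-- Load a hypothetical file that is already in HDFS.
LOAD DATA INPATH '/tmp/stock.csv' INTO TABLE stock_test;

-- The "Location" row should show the LOCATION path rather than the warehouse.
DESCRIBE FORMATTED stock_test;

-- List both directories to see where the loaded file actually ended up.
dfs -ls /data/stock_test;
dfs -ls /user/hive/warehouse;
```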

Also, if the input file for the LOAD command is in HDFS, then both internal (managed) and external tables will move the data into their own location. Is this understanding correct?
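For reference, LOAD DATA has two forms that treat the source file differently. With INPATH (a path in HDFS) the file is moved into the table's directory, so it disappears from its original path; with LOCAL INPATH (a path on the client's local filesystem) the file is copied. This holds for managed and external tables alike. The table name stock and the file paths below are hypothetical.

```sql
-- HDFS source: the file is MOVED into the table's directory.
LOAD DATA INPATH '/tmp/stock.csv' INTO TABLE stock;

-- Local source: the file is COPIED from the client machine.
LOAD DATA LOCAL INPATH '/home/user/stock.csv' INTO TABLE stock;

-- OVERWRITE replaces the table's existing files instead of adding to them.
LOAD DATA INPATH '/tmp/stock_new.csv' OVERWRITE INTO TABLE stock;
```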



3 answers


In both cases (managed or external), LOCATION is optional. Whenever you specify a LOCATION, the data is stored under that HDFS path, no matter which kind of table you create (managed or external). If you do not use LOCATION, the default path specified in the hive-site.xml file (the hive.metastore.warehouse.dir property) is used.
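A sketch of how to inspect that default from a Hive session and see the path a table gets when no LOCATION is given. The database mydb and the table t_default are hypothetical, and the printed value depends on your installation (commonly /user/hive/warehouse).

```sql
-- Print the effective warehouse directory (normally the value from hive-site.xml).
SET hive.metastore.warehouse.dir;

-- A table created without LOCATION in database mydb defaults to
-- <warehouse dir>/mydb.db/t_default, unless the database itself has another location.
CREATE DATABASE IF NOT EXISTS mydb;
CREATE TABLE mydb.t_default (id INT);
DESCRIBE FORMATTED mydb.t_default;
```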





When you create a table with the LOCATION keyword, the table points to that location. LOCATION defines the HDFS path for the table's data files.

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.contacts (
  name         STRING,
  -- ... other variables
  city         STRING
)
LOCATION '/user/hive/warehouse/mydb.db/contacts';

      



When you provide a location, you need to make sure that you place your data files there. In the above example, we are explicitly telling Hive where the data for the external table lives. If we did not specify a location, the table would use the default location shown below; this is true for any table created without a LOCATION clause, unless your sysadmin changes the default.

/user/hive/warehouse/databasename.db/contacts
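Because Hive does not put files into an external table's LOCATION for you, one way to populate it is to copy files there directly and then query the table. A sketch from the Hive CLI for the mydb.contacts table above; the local file /home/user/contacts.txt is hypothetical and is assumed to match the table's row format.

```sql
-- Make sure the directory exists, then copy a (hypothetical) local data file into it.
dfs -mkdir -p /user/hive/warehouse/mydb.db/contacts;
dfs -put /home/user/contacts.txt /user/hive/warehouse/mydb.db/contacts/;

-- No LOAD needed: Hive reads whatever files are in the table's LOCATION.
SELECT * FROM mydb.contacts LIMIT 10;
```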

      



First of all, when you create a managed table with the LOCATION keyword, Hive does not create a directory at that location; rather, it gives you the exception FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://path/of/the/given/location is not a directory or unable to create one).
This means that when your DDL specifies a location, the directory must already exist; otherwise the above exception is thrown.
Once it exists, you can run the DDL with that location.
Then you can use select * from <table> to view the data (without having to load it).
But when you drop that table, your data is also removed from HDFS (unlike with an external table), and the metadata is removed as well.
This is the main point about a managed table with the LOCATION keyword: it behaves partly as an external table and partly as a managed table.
External, in that you don't need to load data; you just specify the location.
Managed, in that when you drop the table, the data is also deleted.
Hope this makes sense.
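The drop behavior described in this answer can be checked with a sketch like the following, run from the Hive CLI. The table names and /data/... paths are hypothetical. Dropping the managed table should remove its LOCATION directory together with the files (possibly into the HDFS trash, depending on configuration), while dropping the external table removes only the metadata.

```sql
-- Per this answer, the directories should exist before the DDL runs.
dfs -mkdir -p /data/managed_loc;
dfs -mkdir -p /data/external_loc;

CREATE TABLE managed_loc (line STRING) LOCATION '/data/managed_loc';
CREATE EXTERNAL TABLE external_loc (line STRING) LOCATION '/data/external_loc';

DROP TABLE managed_loc;   -- metadata AND files under /data/managed_loc are removed
DROP TABLE external_loc;  -- only metadata is removed; the files stay

-- external_loc should still be listed here, managed_loc should not.
dfs -ls /data;
```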
