Save the chimney outlet to the beehive table with a hive sink
I am trying to set up a tray using Hive to save the tray output to a beehive table using the Hive Sink type. I have one cluster node. I am using mapr hadoop distribution.
Here is my flume.conf
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
agent1.sources.source1.type = exec
agent1.sources.source1.command = cat /home/andrey/flume_test.data
agent1.sinks.sink1.type = hive
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hive.metastore = thrift://127.0.0.1:9083
agent1.sinks.sink1.hive.database = default
agent1.sinks.sink1.hive.table = flume_test
agent1.sinks.sink1.useLocalTimeStamp = false
agent1.sinks.sink1.round = true
agent1.sinks.sink1.roundValue = 10
agent1.sinks.sink1.roundUnit = minute
agent1.sinks.sink1.serializer = DELIMITED
agent1.sinks.sink1.serializer.delimiter = ","
agent1.sinks.sink1.serializer.serdeSeparator = ','
agent1.sinks.sink1.serializer.fieldnames = id,message
agent1.channels.channel1.type = FILE
agent1.channels.channel1.transactionCapacity = 1000000
agent1.channels.channel1.checkpointInterval 30000
agent1.channels.channel1.maxFileSize = 2146435071
agent1.channels.channel1.capacity 10000000
agent1.sources.source1.channels = channel1
My data is flume_test.data p>
1,AAAAAAAA 2,BBBBBBB 3,CCCCCCCC 4,DDDDDD 5,EEEEEEE 6,FFFFFFFFFFF 7,GGGGGG 8,HHHHHHH 9,IIIIII 10,JJJJJJ 11,KKKKKK 12,LLLLLLLL 13,MMMMMMMMM 14,NNNNNNNNN 15,OOOOOOOO 16,PPPPPPPPPP 17,QQQQQQQ 18,RRRRRRR 19,SSSSSSSS
This is how I create my table in Hive
create table flume_test(id string, message string)
clustered by (message) into 1 buckets
STORED AS ORC tblproperties ("orc.compress"="NONE");
When I only use 1 bucket, select * from the flume_test command in the hive shell returns me only "OK" status, no data. If I use more than 1 bucket it gives me error messages back.
Errors, for example, with five cells after the beehive table table:
hive> select * from flume_test;
OK
2015-06-18 10:04:57,6909 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6941 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6976 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,7044 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
Time taken: 0.914 seconds
The flume table data is stored in the / user / hive / storage / flume_test directory and it is not empty.
-rwxr-xr-x 3 andrey andrey 4 2015-06-17 16:28 /user/hive/warehouse/flume_test/_orc_acid_version
drwxr-xr-x - andrey andrey 2 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400
the delta directory contains
-rw-r--r-- 3 andrey andrey 991 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000
-rwxr-xr-x 3 andrey andrey 8 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000_flush_length
I cannot read the file / read / user / hive / warehouse / flume_test / delta_0004301_0004400 / bucket_00000 orc even with the help of a pig.
Also I tried to set these vars after creating the table in the hive, but it didn't work.
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 2;
I found some examples on the internet but they are not complete and I am new to flume so I cannot understand them)
source to share