Loading CSV data into an HBase table in multiple columns using Flume

Spool directory CSV file format (sample.csv):

8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603


My flume.conf configuration:

agent.sources  = spool
agent.channels = fileChannel2
agent.sinks    = sink2

agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/cloudera
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = ^([^,]+),([^,]+),([^,]+),([^,]+)$
#agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory


I can load the data into a single column using Flume, but I cannot load it into multiple columns with the regex. Any help loading it into multiple columns in HBase would be appreciated. Thanks.


2 answers


I found my answer: there was a regex problem in my code above.

Here is the configuration with the regex fixed:



agent.sources  = spool
agent.channels = fileChannel2
agent.sinks    = sink2

agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/cloudera

#agent.sources.spool.type = exec
#agent.sources.spool.command = tail -F /home/cloudera/cloudera/data.csv
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink2.serializer.regex = (.+),(.+),(.+),(.+),(.+)
agent.sinks.sink2.serializer.rowKeyIndex = 0
agent.sinks.sink2.serializer.colNames = ROW_KEY,col2,col3,col4,col5
agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink2.batchSize = 100

agent.channels.fileChannel2.type = memory
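
Note that HBaseSink does not create the target table, so it has to exist before the agent starts. A minimal sketch, taking the table and column-family names from the config above; the flume-ng paths and the agent name (agent) are placeholders that must match your own setup:

# In the HBase shell: create the table and column family referenced by the sink
create 'sample', 's1'

# From the command line: start the agent (adjust --conf and --conf-file to your installation)
flume-ng agent --name agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/flume.conf

# Back in the HBase shell: check that the rows arrived
scan 'sample'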



Something like this works for me:

agent.sinks.s1.type = hbase 
agent.sinks.s1.table = test
agent.sinks.s1.columnFamily = r 
agent.sinks.s1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.s1.serializer.rowKeyIndex = 0 
agent.sinks.s1.serializer.regex = ^(\\S+),(\\d+),(\\d+),(\\d)$
agent.sinks.s1.serializer.colNames = ROW_KEY,r:colA,r:colB,r:colC


And if you want to specify a row key instead of a random one, you can use:



agent.sinks.s1.serializer.rowKeyIndex = 0 
agent.sinks.s1.serializer.colNames = ROW_KEY,r:colA,r:colB,r:colC
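
As an illustration (a sketch, assuming the table name sample and the sample data from the question above): with rowKeyIndex = 0 the first capture group becomes the HBase row key rather than a generated one, so a single row can be fetched directly by its CSV key:

# HBase shell: the first CSV field is used as the row key
get 'sample', '8600000US00601'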


Here is a link if you want more flexibility: http://www.rittmanmead.com/2014/05/trickle-feeding-log-data-into-hbase-using-flume/

In short, the problem was that the regex expression was not correct.
