Number of records in a column family in an HBase table

I'm looking for an HBase wrapper command that will count the number of records in a specified column family. I know I can run:

echo "scan 'table_name'" | hbase shell | grep column_family_name | wc -l  

      

However, this runs much slower than the standard counting command (because of CACHE => 50000):

count 'table_name', CACHE => 50000

      

Worse, the scan pipeline doesn't return the actual number of records, but something like the total number of cells (if I'm not mistaken) in the specified column family. I need something like:

count 'table_name' , CACHE => 50000 , {COLUMNS => 'column_family_name'}

      

Thanks in advance
Michael



1 answer


Here is some Ruby code I wrote for exactly this need, with comments. It provides an HBase shell command, count_table. The first parameter is the name of the table and the second is a hash of options, the same as for the scan shell command.

Direct answer to your question:

count_table 'your.table', { COLUMNS => 'your.family' }

      

I also recommend setting scanner caching, just as you would for a scan:



count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }
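
As a rough illustration of why caching matters, here is a minimal sketch that estimates scanner round trips. It assumes each round trip returns at most CACHE rows. The helper name estimated_round_trips is hypothetical, and real scans are also bounded by result-size limits, so treat the numbers as an approximation only.

```ruby
# Hypothetical helper: approximate number of scanner round trips.
# Assumption: each round trip returns up to `cache` rows; server-side
# size limits are ignored, so this is only a rough estimate.
def estimated_round_trips(row_count, cache)
  raise ArgumentError, 'cache must be positive' unless cache > 0
  # Integer ceiling division
  (row_count + cache - 1) / cache
end

puts estimated_round_trips(1_000_000, 1)       # one row per trip
puts estimated_round_trips(1_000_000, 10_000)  # far fewer trips
```

A larger CACHE value trades client and server memory for fewer round trips, so very large values are not automatically better.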

      

And here is the source:

# Arguments are the same as for the scan command.
# Examples:
#
# count_table 'test.table', { COLUMNS => 'f:c1' }
# --- Counts rows in 'test.table' that have a cell in column f:c1.
#
# count_table 'other.table', { COLUMNS => 'f' }
# --- Counts rows in 'other.table' that have a cell in family 'f'.
#
# count_table 'test.table', { CACHE => 1000 }
# --- Counts all rows, with scanner caching.
#
def count_table(tablename, args = {})

    table = @shell.hbase_table(tablename)

    # Run the scanner
    scanner = table._get_scanner(args)

    count = 0
    iter = scanner.iterator

    # Each iteration yields one row, so this counts rows, not cells
    while iter.hasNext
        iter.next
        count += 1
    end

    # Return the counter
    return count
ensure
    # Release the scanner if one was opened
    scanner.close if scanner
end
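
To make the rows-versus-cells distinction from the question concrete, here is a small self-contained sketch that runs in plain Ruby, without HBase. SAMPLE_ROWS, count_cells and count_rows are hypothetical names, and the hash is only an in-memory stand-in for a table. Counting matching lines of scan output behaves roughly like count_cells, while a scan restricted to a family yields one result per row, like count_rows.

```ruby
# Hypothetical stand-in for a table: row key => { 'family:qualifier' => value }
SAMPLE_ROWS = {
  'row1' => { 'f:c1' => 'a', 'f:c2' => 'b', 'g:c1' => 'x' },
  'row2' => { 'f:c1' => 'c' },
  'row3' => { 'g:c1' => 'y' }
}.freeze

# Number of cells in the family (what grepping scan output approximates)
def count_cells(rows, family)
  rows.values.sum { |cells| cells.keys.count { |col| col.start_with?("#{family}:") } }
end

# Number of rows with at least one cell in the family
def count_rows(rows, family)
  rows.count { |_key, cells| cells.keys.any? { |col| col.start_with?("#{family}:") } }
end

puts count_cells(SAMPLE_ROWS, 'f')  # 3 cells: two in row1, one in row2
puts count_rows(SAMPLE_ROWS, 'f')   # 2 rows: row1 and row2
```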

      
