HBase – Quick Guide to key commands
If you work in the Big Data space, you will sooner or later find yourself working with a NoSQL database. HBase has a large user base and fits most NoSQL use cases. So, if you are getting started with HBase in production, the following HBase commands will come in handy.
Create an employee HBase table named "emp" with the column families "personal" and "professional", then follow this walkthrough. In the HBase shell:
create 'emp', 'personal', 'professional'
Get by Row Key:
get 'emp', '52568'
Returns the row stored under the given row key.
Row Prefix Filter
scan 'emp', {FILTER => "PrefixFilter('32')"}
PrefixFilter returns all rows whose row key starts with the specified prefix. The command above returns the rows whose keys start with 32.
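A prefix match can also be expressed as a row-key range, which lets the scan start at the prefix instead of at the beginning of the table. The sketch below shows one way to derive the exclusive stop row from a prefix. The helper name stop_row_for_prefix is hypothetical and not part of the HBase shell. Recent shell versions also accept ROWPREFIXFILTER => '32' as a scan option, which may be simpler if your version supports it.

```ruby
# Hypothetical helper (not part of the HBase shell): compute the exclusive
# stop row for a prefix by incrementing its last byte that is not 0xFF.
def stop_row_for_prefix(prefix)
  bytes = prefix.bytes
  # Trailing 0xFF bytes cannot be incremented, so drop them first.
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  # An empty stop row means "scan to the end of the table".
  return '' if bytes.empty?
  bytes[-1] += 1
  bytes.pack('C*')
end

start_row = '32'
stop_row  = stop_row_for_prefix(start_row)

# Print the equivalent range scan for the 'emp' table.
puts "scan 'emp', {STARTROW => '#{start_row}', STOPROW => '#{stop_row}'}"
```

Every key that starts with 32 sorts at or after "32" and before "33", so the range scan covers the same rows as the prefix filter.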
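SingleColumnValue Filter
scan 'emp', {COLUMNS => ['personal:name', 'personal:age', 'professional:desig'], FILTER => "SingleColumnValueFilter('professional', 'desig', =, 'substring:engineer')"}
To select rows based on the value in a particular cell, use SingleColumnValueFilter. It takes a column family, a qualifier, a compare operator and a comparator. The command above returns name and age from the HBase table emp for employees whose designation contains "engineer". The tested column, professional:desig, is listed in COLUMNS as well, because the filter can only test columns that the scan fetches; if it is left out, the filter treats the column as missing and, by default, lets every row through.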
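Because the shell is a Ruby interpreter, the filter string can be assembled by a small method instead of being typed by hand. The sketch below is one way to do that. The method name is hypothetical. The two trailing booleans are the optional filterIfMissing and latestVersionOnly arguments of the filter language; passing true for the first one drops rows that lack the tested column.

```ruby
# Hypothetical helper: build a SingleColumnValueFilter string for the
# FILTER option of scan. The two trailing booleans are the filter
# language's optional filterIfMissing and latestVersionOnly arguments.
def single_column_value_filter(family, qualifier, operator, comparator,
                               filter_if_missing = false, latest_version_only = true)
  "SingleColumnValueFilter('#{family}', '#{qualifier}', #{operator}, " \
    "'#{comparator}', #{filter_if_missing}, #{latest_version_only})"
end

# Rows without professional:desig are skipped because filter_if_missing is true.
filter = single_column_value_filter('professional', 'desig', '=',
                                    'substring:engineer', true)
puts filter
```

In the shell, the result can be passed directly, for example scan 'emp', {FILTER => filter}.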
Row Count for particular time range
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> --starttime=[start] --endtime=[end]
HBase launches a MapReduce job that counts the number of rows in the specified time range. Run this from the operating system command line, not from inside the HBase shell.
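HBase cell timestamps are milliseconds since the Unix epoch by default, so the start and end values are given in that unit. The sketch below converts two UTC dates and prints the resulting command line. The helper name epoch_millis and the chosen dates are illustrative assumptions.

```ruby
# Hypothetical helper: convert a UTC calendar time to epoch milliseconds.
def epoch_millis(year, month, day, hour = 0, min = 0, sec = 0)
  Time.utc(year, month, day, hour, min, sec).to_i * 1000
end

table    = 'emp'
start_ms = epoch_millis(2024, 1, 1)  # 2024-01-01 00:00:00 UTC
end_ms   = epoch_millis(2024, 1, 2)  # 2024-01-02 00:00:00 UTC

# Print the RowCounter invocation for that one-day window.
puts "hbase org.apache.hadoop.hbase.mapreduce.RowCounter #{table} " \
     "--starttime=#{start_ms} --endtime=#{end_ms}"
```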
List Regions
list_regions 'emp'
Lists all the regions of a particular table.
Get Row Key based on pattern
scan 'emp', {FILTER => "RowFilter(=, 'regexstring:^1.*')"}
RowFilter combined with the regexstring comparator fetches the rows whose row keys match a regex pattern. In the example above, the filter returns all rows whose row key starts with 1.
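It can help to try a pattern against a few sample keys before running a full scan. The sketch below does this in plain Ruby and then builds the filter string from the same pattern. The sample keys are made up, and Ruby's regex dialect differs from the Java one HBase uses in some details, so treat this as a rough check for simple patterns only.

```ruby
# Made-up sample row keys for a local dry run.
sample_keys = %w[1001 1234 2001 3100]
pattern = /^1.*/

# Keep only the keys the pattern matches.
matching = sample_keys.select { |key| key =~ pattern }
puts matching.inspect

# Build the filter string for scan from the same pattern source.
row_filter = "RowFilter(=, 'regexstring:#{pattern.source}')"
puts row_filter
```

Note that 3100 is not selected: it contains a 1, but the ^ anchor requires the key to start with it.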
Get the records from the last 15 minutes
cTime = java.lang.System.currentTimeMillis
sTime = cTime - 15 * 60 * 1000
scan 'emp', {TIMERANGE => [sTime, cTime]}
To get the rows written in the last 15 minutes (or in any other time range), pass the start and end timestamps, in milliseconds since the epoch, to the TIMERANGE option of scan. Table names are case-sensitive, so the table is referenced as emp here as well.
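The window arithmetic can be wrapped in a small method so that the same scan works for any number of minutes. This is a sketch with a hypothetical method name. It uses Ruby's own Time class, so it runs both in plain Ruby and in the JRuby-based shell. The range is start-inclusive and end-exclusive.

```ruby
# Hypothetical helper: return [start_ms, end_ms] covering the last `minutes`
# minutes. now_ms can be passed in explicitly to make the result reproducible.
def last_minutes_range(minutes, now_ms = (Time.now.to_f * 1000).to_i)
  [now_ms - minutes * 60 * 1000, now_ms]
end

s_time, c_time = last_minutes_range(15)

# Print the scan for the 15-minute window ending now.
puts "scan 'emp', {TIMERANGE => [#{s_time}, #{c_time}]}"
```

Inside the shell, the pair can be passed directly: scan 'emp', {TIMERANGE => last_minutes_range(15)}.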
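Get specific columns based on the values of two columns
scan 'emp', {COLUMNS => ['personal:name', 'personal:age', 'professional:desig', 'professional:exp'], FILTER => "SingleColumnValueFilter('professional', 'desig', =, 'substring:engineer') AND SingleColumnValueFilter('professional', 'exp', =, 'binary:5')"}
Combine filters with AND to narrow the result down to specific records. This query returns the name and age columns from the HBase table emp for employees whose designation contains "engineer" and whose experience is 5 years. The two tested columns are listed in COLUMNS too, because each filter can only test columns that the scan fetches. The binary:5 comparator compares against the bytes of the string "5", which matches values written as text, for example with a shell put.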
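When several conditions are involved, building the combined string from named parts keeps the quoting manageable. The sketch below joins any number of filter strings with AND. The method name all_of is hypothetical. The filter language also has an OR operator, which could be handled the same way.

```ruby
# Hypothetical helper: join filter strings with AND and wrap them in
# parentheses so the result can be nested inside a larger expression.
def all_of(*filters)
  '(' + filters.join(' AND ') + ')'
end

desig = "SingleColumnValueFilter('professional', 'desig', =, 'substring:engineer')"
exp   = "SingleColumnValueFilter('professional', 'exp', =, 'binary:5')"

combined = all_of(desig, exp)
puts combined
```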
Joins
You may run into a scenario where you need to join two HBase tables. HBase itself has no join operation, so a practical solution is to create a table in Hive or Impala over each HBase table and perform the join on the Hive/Impala tables.
Please refer to this link for HBase – Hive Integration