- These notes were written while I was reading HBase: The Definitive Guide. Do not reprint this article in any form without permission.
- Remote procedure calls (RPCs) in HBase are expensive. To reduce their number, HBase provides a write buffer for the put interface. The write buffer lives on the client: puts are collected there and sent to the servers together instead of one RPC per put.
- Get can read multiple rows in one call as well as a single row.
- Complete process call for get:
- The getValue operation does not take a timestamp. Instead, it returns the newest version of the cell.
- Get and put are built on top of the Hadoop Distributed File System (HDFS). They are no more than a layer of encapsulation over it.
- A put operation is atomic at the row level.
- Delete supports narrowing what is removed: a whole row, a column family, a single column, or specific versions.
- HBase offers batch operations on data, exposed through the batch() call. More details:
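- A row lock ensures that the corresponding row can only be updated by the client that owns the lock. However, put, delete, and checkAndPut do not require the client to acquire an explicit row lock; the server locks the row on the client's behalf for the duration of each call.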
- Unless you use the old form of get that takes an explicit lock, a get does not acquire a row lock. Instead, a concurrency control mechanism (multiversion concurrency control) ensures that reads are atomic. Note: if a row is being updated, a get only sees its previous, fully written state.
- A scan includes the start row but excludes the stop row.
- By default, a scan fetches rows one at a time, and each fetch is a separate RPC. This is very time-consuming, so a scan performs poorly when a large number of rows is required.
- Example for ResultScanner:
- When scanner caching is off, each next operation fetches a single row, which means an RPC is issued for every row. Therefore, turn scanner caching on to improve performance.
- If one row contains too much data, you can use batching to have the scanner return it in several pieces (one Result per piece). You decide how many columns each piece holds. The last piece may hold fewer columns than the others when the column count is not evenly divisible by the batch size.
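
The last few notes (one RPC per row without scanner caching, and splitting a wide row into pieces of a fixed column count) can be made concrete with a little arithmetic. The sketch below is a toy model in plain Java, not HBase client code: the class `ScanCostSketch` and its methods are hypothetical names made up for illustration. It assumes that each piece of a row counts as one unit toward the caching limit, and it ignores the extra calls needed to open and close a scanner.

```java
import java.util.Arrays;

// Toy model only: all names here are hypothetical, none come from the HBase API.
public class ScanCostSketch {

    // Number of pieces one row is split into when each piece
    // may hold at most `batch` columns (ceiling division).
    public static int piecesPerRow(int columns, int batch) {
        if (columns < 0 || batch <= 0) {
            throw new IllegalArgumentException("columns must be >= 0 and batch > 0");
        }
        return (columns + batch - 1) / batch;
    }

    // Column count of each piece of a single row, in order.
    // Only the last piece can be smaller than `batch`.
    public static int[] pieceSizes(int columns, int batch) {
        int[] sizes = new int[piecesPerRow(columns, batch)];
        int remaining = columns;
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = Math.min(batch, remaining);
            remaining -= sizes[i];
        }
        return sizes;
    }

    // Data-carrying round trips needed to scan `rows` rows of `columns`
    // columns each, assuming every round trip carries up to `caching` pieces.
    public static int dataRpcs(int rows, int columns, int batch, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        int pieces = rows * piecesPerRow(columns, batch);
        return (pieces + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // A row with 10 columns and 4 columns per piece: the last piece is short.
        System.out.println(Arrays.toString(pieceSizes(10, 4)));
        // 10 rows of 20 columns, fetched one piece per round trip.
        System.out.println(dataRpcs(10, 20, 10, 1));
        // The same data with whole rows and 5 rows per round trip.
        System.out.println(dataRpcs(10, 20, 100, 5));
    }
}
```

In this model, raising the caching value cuts the number of round trips, while lowering the batch size raises the number of pieces per row. Both settings also change how much data each round trip has to carry, so larger values are a trade-off against client memory rather than a free win.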