2.10.12

HAcid: multi-row transactions in HBase

HAcid is a client library that adds multi-row ACID transactions to HBase applications. Transactions run under Snapshot Isolation; Serializability is also supported, but currently only as an unstable feature.

The HAcid open-source repository is hosted at bitbucket.org.


HAcid is the end result of my Master's thesis at Aalto University. It was inspired by Google Percolator, but differs from it in many ways. Whereas Percolator controls concurrency with locks, HAcid is lock-free because it employs optimistic concurrency control.
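
To make the optimistic approach concrete, here is a minimal in-memory sketch of first-committer-wins conflict detection. It is not HAcid's algorithm: the class and method names are invented for illustration, it keeps only the latest version of each key (so it does not provide snapshot reads), and it serializes the commit check with `synchronized` purely for brevity, which HAcid itself does not do.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names come from HAcid.
class OccSketch {
    // Latest committed value plus the logical time at which it was committed.
    private static final class Versioned {
        final String value;
        final long commitTs;
        Versioned(String value, long commitTs) {
            this.value = value;
            this.commitTs = commitTs;
        }
    }

    private final Map<String, Versioned> store = new HashMap<>();
    private long clock = 0; // logical clock, advanced on every successful commit

    final class Txn {
        private final long startTs;
        private final Map<String, String> writes = new HashMap<>();

        private Txn(long startTs) { this.startTs = startTs; }

        // Writes are buffered privately; nothing is locked while the transaction runs.
        void put(String key, String value) { writes.put(key, value); }

        // Returns true if committed, false if aborted because of a conflict.
        boolean commit() { return OccSketch.this.commit(this); }
    }

    synchronized Txn begin() { return new Txn(clock); }

    synchronized String get(String key) {
        Versioned v = store.get(key);
        return v == null ? null : v.value;
    }

    private synchronized boolean commit(Txn t) {
        // Validation: abort if any key we wrote was committed by someone else
        // after this transaction started.
        for (String key : t.writes.keySet()) {
            Versioned v = store.get(key);
            if (v != null && v.commitTs > t.startTs) {
                return false;
            }
        }
        long commitTs = ++clock;
        for (Map.Entry<String, String> e : t.writes.entrySet()) {
            store.put(e.getKey(), new Versioned(e.getValue(), commitTs));
        }
        return true;
    }
}
```

If two transactions begin at the same time and both write the same key, the first call to `commit()` succeeds and the second returns `false`. Neither one ever waits on a lock held by the other while it is running.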

Using HAcid is straightforward:
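// HBase classes used below
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import fi.aalto.hacid.*;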

Configuration conf = HBaseConfiguration.create();
HAcidClient client = new HAcidClient(conf);

// To enable transactions, use this wrapper instead of HTable
HAcidTable table = new HAcidTable(conf, "mytable");

// Start a new transaction
HAcidTxn txn = new HAcidTxn(client);

// Use HAcidGet instead of HBase's Get
HAcidGet g = new HAcidGet(table, Bytes.toBytes("row1"));
g.addColumn(Bytes.toBytes("fam1"), Bytes.toBytes("qual1"));
Result r = txn.get(g);

// Use HAcidPut instead of Put
HAcidPut p = new HAcidPut(table, Bytes.toBytes("row1"));
p.add(Bytes.toBytes("fam1"), 
      Bytes.toBytes("qual1"), 
      Bytes.toBytes("value"));
txn.put(p);

// Commit the transaction
boolean outcome = txn.commit(); 
// true is "committed", false "aborted"
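
Because `commit()` reports an abort by returning `false`, a caller will often want to retry the whole transaction. The helper below is a small sketch of that pattern, not part of HAcid: `TxnRetry` and `runWithRetry` are hypothetical names, and it assumes an aborted transaction can safely be re-run from the start.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the HAcid API.
class TxnRetry {
    // Runs `attempt` until it returns true or maxAttempts is reached.
    // Each attempt is expected to start a fresh transaction, redo its
    // reads and writes, and return the outcome of commit().
    static boolean runWithRetry(BooleanSupplier attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true; // committed
            }
            // Aborted: loop around and try again with a new transaction.
        }
        return false; // every attempt aborted
    }
}
```

With HAcid, the supplier passed in would create a new `HAcidTxn`, repeat the gets and puts shown above, and return `txn.commit()`. Whether retrying is appropriate, and how many attempts to allow, depends on the application.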

The algorithms in the HAcid library rely heavily on HBase's single-row transactions, in particular the atomic checkAndPut operation. One of the tricks employed was already explained on this blog. My thesis documents the system in full.
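
For readers unfamiliar with checkAndPut, the sketch below emulates its semantics in memory: a write is applied only if the cell currently holds an expected value, and an expected value of `null` means the cell must not exist yet. This only illustrates the primitive. The class name, the cell naming, and the state strings in the usage note are invented here, and the way HAcid actually uses checkAndPut is described in the thesis, not in this example.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a single-cell compare-and-set; not HBase code.
class CheckAndPutSketch {
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    // Atomically writes newValue only if the cell currently equals `expected`.
    // A null `expected` means "only write if the cell is absent".
    boolean checkAndPut(String cell, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(cell, newValue) == null;
        }
        return cells.replace(cell, expected, newValue);
    }

    String get(String cell) {
        return cells.get(cell);
    }
}
```

For example, if a hypothetical cell `txn42:state` holds `"active"`, then `checkAndPut("txn42:state", "active", "committed")` succeeds once, and any later attempt to move it from `"active"` to another state fails. That kind of atomic, single-row decision is what makes lock-free coordination possible.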

The license is Apache 2.0.
Feel free to comment, ask questions, or fork the project on Bitbucket.

6 comments:

  1. I notice that there's no HTablePool support. What's the best option for pooling? Do I need to create my own HAcidTablePool using the Apache Commons Pool or something similar?

    1. Hi Mat, good to know that you are using HAcid.

      There is indeed no HTablePool support; I had not anticipated that use case.
      As far as I can see, you don't need to implement something like HAcidTablePool. You can implement your own factory class through the HTableInterfaceFactory interface. See this book chapter: http://my.safaribooksonline.com/book/databases/database-design/9781449314682/4dot-client-api-advanced-features/id3591501

      You also need to implement the HTableInterface interface, for instance by extending or modifying HAcidTable. In fact, HAcidTable is a thin wrapper over HTable: https://bitbucket.org/staltz/hacid/src/62c4589532cee23fc4a088f197656e28774bcef9/src/fi/aalto/hacid/HAcidTable.java?at=default

      The book chapter seems like a good resource to implement this. Let me know if you run into problems.

  2. Implementing this is proving more awkward than it should be. I could, for example, modify your code by putting this Lombok annotation on htable in the HAcidTable class and making HAcidTable implement HTableInterface, but I'd really rather use your code as is. Would you be able to make this change so that your code is easier to use?

    @Delegate(types=HTableInterface.class)
    protected HTable htable;

    http://projectlombok.org/features/Delegate.html

    (If you prefer not to use Lombok in your source for other users, you can use delombok to convert @Delegate into all the relevant methods.)

  3. Hi Mat,

    Thanks for the suggestion. Before modifying the source code, I recommend first trying to create a new class that extends HAcidTable and then using the Lombok annotation on htable. That should work because htable is a protected field.

    Let me know if this approach solves your problem.
