Delivering a truly distributed cross cluster transaction manager on Hadoop


So far, Hadoop has been extensively leveraged for Business Intelligence and Analytics workloads.  HBase offers a low latency NoSQL Big Table solution on Hadoop to host operational workloads. But HBase’ s built in atomic operation guarantee is not suitable for many workloads consisting of composite operations. With the advent of transactional SQL-on-HBase solutions, the Hadoop ecosystem offers enterprises a real opportunity to run full-fledged transactional workloads on Hadoop as well.  The motivations behind this paradigm shift include:

  • Realize lower Total Cost of Ownership for applications in terms of both software and specialized hardware
  • Provide schema flexibility for applications needing to deal with dynamic changes
  • Exploit the elastic scalability of Hadoop to meet the high volume and velocity of Big Data
  • Integrate semi-structured and unstructured data hosted on the Hadoop platform to extract additional business value
  • Remove the latency, duplication, and synchronization for data that must be transferred/transformed from operational systems to Hadoop for analytics
  • Offload workloads onto a scalable architecture to reduce the load on mission critical operational systems Host operational, historical, and external Big Data, together with common metadata or reference data in order to provide better insights and implement closed loop real-time analytics on a single platform
  • Leverage the extensive Hadoop ecosystem to integrate data models for a flexible development environment using the best access methods for the varied workloads

This growth of use cases and operational workloads on Hadoop has driven the demand for enterprise class SQL and transactional support on HBase that provides a low latency storage engine, supports schema flexibility and semi-structured data, in combination with relational structured abstractions of data.


To support these enterprise-class transactional workloads, you need a transaction management implementation that supports full ACID (Atomic, Consistent, Isolated, Durable) transactions across multiple rows, tables, and statements for data distributed across many servers.  This distributed transaction manger has to be able to provide capabilities that customers expect from an enterprise class database:

  • Completely distributed transaction management that scales linearly as the cluster scales elastically
  • Full transactional backup and “transactionally” consistent recovery to a point-in-time
  • Disaster Recovery support across data centers with zero loss of transactions
  • Synchronous support across clusters (with conflict resolution) to meet performance and regulatory requirements

EsgynDB’ s extensions to the Apache Trafodion transaction manager meet all of these requirements. If you are building an application that demands enterprise class transaction management as part of the Hadoop ecosystem, check out Apache Trafodion and EsgynDB.


About the Author: