Both HBase and Cassandra are well-suited to batch-based processing of time-series data, which we covered in our overview of machine learning in text analysis, from sources such as weblogs, IoT devices, and many kinds of statistical data from sectors such as epidemiology, meteorology, and social sciences (population statistics, etc.). Both are also well-suited for non-time-critical traversal and analysis of medical and civic datasets.
Both systems are primarily written in Java. HBase's shell is built on JRuby, whereas Cassandra ships cqlsh, a Python-based shell for its query language, CQL. In terms of security, both architectures can provide granular access control where needed. However, each ships with minimal security enabled by default and relies on Kerberos or other security architectures at the file-system or cluster level. Both HBase and Cassandra initially presume a single-user scenario in which these extra security mechanisms would be redundant.
In terms of scalability and DevOps, the two architectures are generally reported to be on a par, though anecdotal evidence suggests that Cassandra's garbage collection routines and slow repair processes can create some DevOps headaches in the long term.
Your choice will obviously be affected by the demands and structure of a particular project. With that in mind, let's look at some specific areas where one system wins out over the other in this Cassandra vs HBase matchup:
Data Consistency: HBase
Cassandra's speed advantage over HBase is neither inherent nor guaranteed. It comes from Cassandra's tunable consistency, which lets it prioritize performance over data integrity. Whether this 'sixth gear' is worth the risk is left to the end user. For some users a higher margin of error may be insignificant, e.g. when calculating broad aggregate values rather than highly granular data points.
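The trade-off can be made concrete with the usual replica-overlap rule for tunable consistency. If a read consults R replicas and a write is acknowledged by W replicas out of a replication factor N, then every read overlaps the latest acknowledged write when R + W > N. The sketch below is simplified. It models only three of Cassandra's consistency levels (ONE, QUORUM, ALL), assumes a single data center, and ignores node failures. The function names and the replication factor of 3 are illustrative assumptions.

```python
# Simplified sketch of Cassandra-style tunable consistency.
# Assumptions: single data center, no failures, and only three of
# Cassandra's consistency levels are modelled (ONE, QUORUM, ALL).
# With replication factor n, reads are guaranteed to overlap the latest
# acknowledged write when read_replicas + write_replicas > n.

def replicas_for(level: str, n: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    required = {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}
    if level not in required:
        raise ValueError(f"level not modelled in this sketch: {level}")
    return required[level]


def reads_see_latest_write(read_level: str, write_level: str, n: int) -> bool:
    """True if every possible read replica set intersects every write replica set."""
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n


replication_factor = 3  # assumed value for illustration
for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    ok = reads_see_latest_write(read, write, replication_factor)
    print(f"read={read:<6} write={write:<6} overlapping={ok}")
```

Reading and writing at ONE is the fast 'sixth gear': only one replica has to answer, but a read may miss the newest write. QUORUM on both sides costs extra replica round trips and restores the overlap.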
OLTP (On-line Transaction Processing): Cassandra
Since Cassandra was designed to prioritize write performance, it is better suited to real-time or near-real-time analytics systems and to an On-line Transaction Processing (OLTP) pipeline. If strict data consistency is required, however, Cassandra pays a performance penalty for it (see 'Data Consistency' above).
OLAP (On-line Analytical Processing): HBase
HBase was conceived, along with Hadoop, to enable batch-based processing of historical data in an on-line analytical processing (OLAP) pipeline. HBase is accurate and usually quite fast, considering the vast volumes of data it was designed to handle. By the time Cassandra has been tuned to be as accurate as HBase for such operations, it no longer has any notable speed advantage.
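Batch analysis of historical data in HBase benefits from the fact that rows are stored sorted by row key. A job can therefore read one contiguous key range, with an inclusive start row and an exclusive stop row as in an HBase Scan, instead of reading the whole table. The toy model below is a sketch only. It uses an in-memory Python dict rather than the HBase client API. The `<sensor>#<date>` row-key layout and the `scan`/`average` helpers are hypothetical. Python compares strings by code point, which matches HBase's byte order for the ASCII keys used here.

```python
from bisect import bisect_left

# Toy model of HBase's sorted row-key layout (not the HBase client API).
# The "<sensor>#<ISO date>" row-key design is a hypothetical example.
rows = {
    "sensor-a#2023-01-01": 12.0,
    "sensor-a#2023-01-02": 14.5,
    "sensor-a#2023-01-03": 13.0,
    "sensor-b#2023-01-01": 20.0,
    "sensor-b#2023-01-02": 21.5,
}
sorted_keys = sorted(rows)  # HBase keeps rows in row-key order on disk


def scan(start_row: str, stop_row: str):
    """Yield (key, value) for start_row <= key < stop_row, like an HBase Scan."""
    i = bisect_left(sorted_keys, start_row)  # jump straight to the range start
    while i < len(sorted_keys) and sorted_keys[i] < stop_row:
        key = sorted_keys[i]
        yield key, rows[key]
        i += 1


def average(start_row: str, stop_row: str):
    """Aggregate a contiguous key range, as a batch (OLAP) job might."""
    values = [value for _, value in scan(start_row, stop_row)]
    return sum(values) / len(values) if values else None


# Two days of sensor-a readings; the stop row itself is excluded.
print(average("sensor-a#2023-01-01", "sensor-a#2023-01-03"))
```

Because the stop row is exclusive, this range covers 1 and 2 January only. A prefix scan for one sensor can use a stop row whose last character is incremented, e.g. `scan("sensor-b#", "sensor-b$")`.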
Scalability and DevOps: HBase
In theory, scaling up either an HBase or a Cassandra cluster is no more complicated than adding more nodes. However, HBase does not need to redistribute data partitions in order to grow the cluster. Its regions are reassigned to the new RegionServers while the underlying files stay in HDFS. Cassandra, by contrast, must stream the affected token ranges to each new node. HBase also has a more stable version history than Cassandra, which has reworked some of the most fundamental aspects of its data management several times.
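To see why growing a Cassandra cluster involves moving data, consider a simplified token ring with virtual nodes. A node owns the keys whose tokens fall between the previous token on the ring and its own token. When a node joins, it takes over some of those ranges, and the matching rows have to be streamed to it. The sketch below makes several assumptions. It uses MD5 tokens, as Cassandra's older RandomPartitioner did, although current releases default to Murmur3. It picks 8 virtual nodes per node and ignores replication. All node and key names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    # MD5 tokens for illustration; current Cassandra defaults to Murmur3Partitioner.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes, vnodes=8):
    """Place several virtual-node tokens per node on a sorted ring."""
    ring = sorted((token(f"{node}-vnode{i}"), node) for node in nodes for i in range(vnodes))
    return [t for t, _ in ring], [n for _, n in ring]


def owner(ring, key: str) -> str:
    """A node owns keys whose token falls in (previous token, its token]."""
    tokens, nodes = ring
    idx = bisect_left(tokens, token(key)) % len(tokens)  # wrap around past the last token
    return nodes[idx]


keys = [f"row-{i}" for i in range(10_000)]
before = build_ring(["n1", "n2", "n3"])
after = build_ring(["n1", "n2", "n3", "n4"])

# Keys whose owner changed must be streamed to the new node before it serves them.
moved = [k for k in keys if owner(before, k) != owner(after, k)]
print(f"{len(moved) / len(keys):.1%} of keys must be streamed to n4")
```

Every key that changes owner moves to the new node n4, because the only new tokens on the ring belong to it. The exact fraction depends on where the hashed tokens happen to land.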
User Code Insertion: HBase
HBase offers coprocessors, which let user code run inside HBase's own server processes. Observer coprocessors behave like triggers, and endpoint coprocessors behave like stored procedures. Features like these are normally associated with relational databases. Cassandra's nearest equivalents, user-defined functions and triggers, are considerably more limited.
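The observer idea can be illustrated with a toy model in which user code is registered with the storage layer and runs before and after every write. This is a conceptual sketch only. Real HBase coprocessors are Java classes loaded into the RegionServer (for example, observers implementing `RegionObserver`). `ToyTable`, its hook lists, and the two example hooks are hypothetical names.

```python
from typing import Callable, Dict, List

# Conceptual sketch of observer-style hooks (not the HBase coprocessor API).
Row = Dict[str, str]
Hook = Callable[[str, Row], None]


class ToyTable:
    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}
        self.pre_put: List[Hook] = []   # run before the write; may raise to reject it
        self.post_put: List[Hook] = []  # run after the write succeeds (trigger-like)

    def put(self, key: str, row: Row) -> None:
        for hook in self.pre_put:
            hook(key, row)
        self.rows[key] = row
        for hook in self.post_put:
            hook(key, row)


def require_email(key: str, row: Row) -> None:
    """Validation hook: reject rows without an email column."""
    if "email" not in row:
        raise ValueError(f"row {key} rejected: missing email")


write_count = {"total": 0}


def count_writes(key: str, row: Row) -> None:
    """Side-effect hook: maintain a running count of successful writes."""
    write_count["total"] += 1


table = ToyTable()
table.pre_put.append(require_email)
table.post_put.append(count_writes)

table.put("u1", {"email": "a@example.com"})
try:
    table.put("u2", {"name": "no email"})
except ValueError as err:
    print(err)
print(write_count["total"])
```

The rejected write never reaches the table or the post-write hook, so only one successful write is counted.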
Data Ingestion: Cassandra
Cassandra's consistent and deliberate focus on faster write speeds means that it can typically load an initial dataset more quickly than HBase.
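Much of that write speed comes from a log-structured write path. Each write is appended to a commit log and held in an in-memory memtable. Full memtables are flushed to disk as immutable, sorted SSTables, so no existing data file is updated in place. The toy store below is a sketch of that idea under stated assumptions. The class name and the tiny flush threshold are illustrative. Lookups are linear for brevity, whereas real SSTables rely on indexes and Bloom filters. Compaction and commit-log truncation are omitted.

```python
# Toy log-structured store, loosely modelled on Cassandra's write path.
# Assumptions: tiny flush threshold, linear lookups, no compaction.


class ToyLSMStore:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log = []   # append-only log for durability (sequential writes)
        self.memtable = {}     # most recent values, held in memory
        self.sstables = []     # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))  # cheap sequential append
        self.memtable[key] = value            # no in-place update on disk
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Persist the memtable as a new immutable, sorted SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:                 # linear scan for brevity
                if k == key:
                    return v
        return None


store = ToyLSMStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    store.write(key, value)
print(store.read("a"), store.read("b"), len(store.sstables))
```

The cost is paid on the read side instead. A read may have to consult the memtable and several SSTables, which is one reason read-heavy analytical workloads can favour HBase.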
Supported Programming Languages: Cassandra
HBase supports client access from C, C#, C++, Groovy, Java, PHP, Python, and Scala. Cassandra supports all of those and, in addition, Clojure, Erlang, Go, Haskell, JavaScript, Perl, and Ruby.
Documentation and Crowd-sourced Support: Cassandra
Though considered by Apache to be 'a work in progress', Cassandra's documentation is more comprehensive and searchable than the book-like reference guide supplied for HBase.
At the time of writing, Cassandra has more than twice as many questions on Stack Overflow as HBase. Whether or not that's a good sign is, of course, open to interpretation!
Internal Security Architecture: HBase
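Cassandra provides role-based access control. Permissions are granted to roles, which are then granted to users, much like group permissions in popular operating systems. HBase instead lets an administrator grant access rights directly to users (or groups) on specific objects, with scopes ranging from the whole cluster, namespaces, and tables down to column families, columns, and even individual cells.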
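The difference between the two styles can be sketched with two small permission checks. Cassandra-style grants attach permissions such as SELECT and MODIFY to roles. HBase-style grants attach permission letters (R, W, X, C, A) directly to a user at a scope, and a grant at a broader scope also covers every narrower scope inside it. This is a simplified model under assumptions. Namespaces, groups, and cell-level ACLs are omitted, and all user, role, table, and column-family names are hypothetical.

```python
# Simplified model of the two access-control styles; all names are hypothetical.

# Cassandra-style: permissions belong to roles, and users are granted roles.
role_perms = {
    "analyst": {("metrics", "SELECT")},
    "ingest": {("metrics", "MODIFY")},
}
user_roles = {"alice": {"analyst"}, "bob": {"analyst", "ingest"}}


def role_allows(user: str, table: str, perm: str) -> bool:
    """True if any role held by the user carries the permission on the table."""
    return any((table, perm) in role_perms[r] for r in user_roles.get(user, set()))


# HBase-style: grants attach to a user at a scope such as (table,) or
# (table, column_family); () stands for a global grant.
acl = {
    ("alice", ("metrics",)): {"R"},
    ("bob", ("metrics", "raw")): {"R", "W"},
}


def acl_allows(user: str, scope: tuple, perm: str) -> bool:
    """Check the scope itself and every enclosing scope, from global downward."""
    return any(perm in acl.get((user, scope[:i]), set()) for i in range(len(scope) + 1))


print(role_allows("bob", "metrics", "MODIFY"))            # granted via the ingest role
print(acl_allows("alice", ("metrics", "raw", "temp"), "R"))  # table-level grant covers it
print(acl_allows("bob", ("metrics", "agg"), "W"))          # bob's grant is only on "raw"
```

The role model keeps administration coarse and reusable. The scoped model lets an administrator target individual objects precisely, at the cost of managing more individual grants.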
Acknowledgement to Jesse Anderson for examples of columnar tables in relational and non-relational databases.