Apache HBase is a distributed, scalable big data store built on HDFS and cloud object storage.
This is a multi-module Maven project. Modules live in arbitrarily nested
folders; enumerate them by searching for pom.xml files (excluding target/
directories). The root pom.xml defines the full reactor and build order.
Note that some directories from removed or merged modules (e.g.,
hbase-hadoop2-compat/, hbase-protocol/, hbase-rsgroup/) may still exist
as empty shells with only target/ remnants. If a directory has no pom.xml,
it is not part of the active build.
The fundamental divide in this codebase is client-side vs. server-side, with several modules shared between them.
hbase-client-- The client library. Builds RPC requests, handles retries, manages connections. This is the public API that external consumers depend on.hbase-server-- RegionServer and Master implementations. Processes RPCs, manages regions, stores data. The largest module by far.- Shared modules like
hbase-common,hbase-protocol-shaded, andhbase-metrics-apiare dependencies of both sides.
When orienting on unfamiliar code, first determine which side of this divide you are on.
Core data path:
hbase-client -> hbase-server (via protobuf RPCs defined in
hbase-protocol-shaded)
Gateways (alternative client entry points):
hbase-rest (HTTP/JSON), hbase-thrift (Thrift RPC)
Coprocessors are HBase's server-side extension framework. They allow custom
code to run inside RegionServer and Master processes, with the same privileges
as the host process. The base Coprocessor interface lives in hbase-client;
observer and endpoint interfaces (RegionObserver, MasterObserver, etc.) live
in hbase-server. Endpoint implementations live in hbase-endpoint. The
built-in AccessController coprocessor enforces ACLs; VisibilityController
enforces cell-level visibility labels. Third-party coprocessors are loaded via
configuration or table schema.
Server subsystems (separated from hbase-server for modularity):
hbase-balancer, hbase-procedure, hbase-replication, hbase-asyncfs,
hbase-zookeeper, hbase-http
Shared libraries:
hbase-common, hbase-metrics + hbase-metrics-api, hbase-logging,
hbase-hadoop-compat
Extensions:
hbase-extensions (currently hbase-openssl for native TLS support)
Storage codecs:
hbase-compression/* (pluggable algorithms), hbase-external-blockcache
Packaging and shading:
hbase-shaded/*, hbase-assembly*, hbase-resource-bundle
Tooling:
hbase-shell (JRuby REPL), hbase-hbtop, hbase-mapreduce, hbase-backup,
hbase-diagnostics
Build infrastructure (ignore for code tasks):
hbase-build-configuration, hbase-checkstyle, hbase-annotations,
hbase-archetypes/*, hbase-dev-generate-classpath
Testing:
hbase-testing-util, hbase-it, hbase-examples
Classes are annotated with @InterfaceAudience to indicate their intended
consumer:
Public-- Stable client API. External consumers depend on these.LimitedPrivate-- Internal API shared across modules, scoped to a named audience (e.g.,COPROC,CONFIG,REPLICATION,AUTHENTICATION). The audience name tells you who is expected to call this code.Private-- Module-internal. Not API.
These annotations are the fastest way to determine whether a class is part of the external surface or internal plumbing.
When investigating a behavior, start from where it enters the system:
- Client RPCs:
RSRpcServices(RegionServer) andMasterRpcServices(Master) handle all client-initiated RPCs. Trace from the method matching the RPC name. - REST gateway: resource classes in
hbase-restmap HTTP verbs to operations. - Thrift gateway: handler classes in
hbase-thriftmap Thrift methods. - Coprocessor hooks: observer interfaces (
RegionObserver,MasterObserver, etc.) define extension points. Implementations are loaded via configuration or table schema. - Procedures:
hbase-proceduredefines the framework; concrete procedures (table create, region split, etc.) live inhbase-server. - Configuration: properties are defined in
hbase-default.xml(inhbase-common) and overridden by operators inhbase-site.xml. - Wire format:
.protofiles inhbase-protocol-shadeddefine every RPC request/response and all persisted data structures. (Older branches had a separatehbase-protocolmodule; it has been removed on master.)
The same Java package often appears in multiple modules (e.g., the
coprocessor package exists in hbase-client, hbase-server,
hbase-endpoint, and hbase-examples). Each module contributes different
classes to the package. When searching for a class, check which module it
lives in -- the module determines the execution context.
hbase-thirdparty is a companion
project that patches and shades key dependencies (protobuf, netty, gson, etc.)
so that HBase's internal use of these libraries does not conflict with
versions on the application classpath. The hbase-shaded-* artifacts from
that repo appear as dependencies throughout this project's pom.xml. Changes
to shaded dependency versions or patches happen in that repo, not here.
dev-support/ contains CI configuration, release automation, code analysis
scripts, and other maintainer tools. PR-level CI has migrated to GitHub
Actions (.github/workflows/), but nightly and branch-level CI still runs
via configurations in dev-support/. That directory also holds release
scripts, docker-based test environments, and various developer utilities.
See dev-support/README.md for a full index.
conf/ holds default configuration templates (hbase-site.xml,
hbase-env.sh, log4j2.properties). bin/ holds shell scripts for cluster
lifecycle and operations.
dev-support/design-docs/ collects design documents and proposals for major
features. These capture the rationale behind complex subsystems and are useful
for understanding why the code is structured the way it is.
- Tests mirror source paths:
src/test/javaparallelssrc/main/java - Generated code (protobuf, etc.) lives in
target/and is not checked in - Configuration properties use
hbase.prefix - The shell is JRuby wrapping the Java client API
The project website (https://hbase.apache.org) is maintained in this repo under
hbase-website/. User-facing and administrator-facing documentation covering
configuration, security, architecture, schema design, operations, APIs, and
more lives in hbase-website/app/pages/_docs/docs/_mdx/. The table of
contents and page ordering is defined in the meta.json files within that
tree.
The site also serves https://hbase.apache.org/llms-full.txt, which concatenates all documentation pages into a single text file suitable for LLM context ingestion.
The project's security model is the entry point for any security analysis of this codebase. The canonical chain for discovering it is:
AGENTS.md -> SECURITY.md -> https://hbase.apache.org/security-model/
SECURITY.md at the repo root is a short pointer to the published model
and the security@apache.org reporting flow. The full model (source at
hbase-website/app/pages/_landing/security-model/content.md, published
at https://hbase.apache.org/security-model/) details trust boundaries,
what constitutes a valid vulnerability, and what does not.
When performing security analysis of this codebase, use the navigation structure above to determine the role of the code under review, then apply the security model to interpret findings in context.