Skip to content

Latest commit

 

History

History
199 lines (152 loc) · 8.49 KB

File metadata and controls

199 lines (152 loc) · 8.49 KB

AGENTS.md

Apache HBase is a distributed, scalable big data store built on HDFS and cloud object storage.

Repo Structure

This is a multi-module Maven project. Modules live in arbitrarily nested folders; enumerate them by searching for pom.xml files (excluding target/ directories). The root pom.xml defines the full reactor and build order. Note that some directories from removed or merged modules (e.g., hbase-hadoop2-compat/, hbase-protocol/, hbase-rsgroup/) may still exist as empty shells with only target/ remnants. If a directory has no pom.xml, it is not part of the active build.

Client and Server

The fundamental divide in this codebase is client-side vs. server-side, with several modules shared between them.

  • hbase-client -- The client library. Builds RPC requests, handles retries, manages connections. This is the public API that external consumers depend on.
  • hbase-server -- RegionServer and Master implementations. Processes RPCs, manages regions, stores data. The largest module by far.
  • Shared modules like hbase-common, hbase-protocol-shaded, and hbase-metrics-api are dependencies of both sides.

When orienting on unfamiliar code, first determine which side of this divide you are on.

Module Roles

Core data path: hbase-client -> hbase-server (via protobuf RPCs defined in hbase-protocol-shaded)

Gateways (alternative client entry points): hbase-rest (HTTP/JSON), hbase-thrift (Thrift RPC)

Coprocessors are HBase's server-side extension framework. They allow custom code to run inside RegionServer and Master processes, with the same privileges as the host process. The base Coprocessor interface lives in hbase-client; observer and endpoint interfaces (RegionObserver, MasterObserver, etc.) live in hbase-server. Endpoint implementations live in hbase-endpoint. The built-in AccessController coprocessor enforces ACLs; VisibilityController enforces cell-level visibility labels. Third-party coprocessors are loaded via configuration or table schema.

Server subsystems (separated from hbase-server for modularity): hbase-balancer, hbase-procedure, hbase-replication, hbase-asyncfs, hbase-zookeeper, hbase-http

Shared libraries: hbase-common, hbase-metrics + hbase-metrics-api, hbase-logging, hbase-hadoop-compat

Extensions: hbase-extensions (currently hbase-openssl for native TLS support)

Storage codecs: hbase-compression/* (pluggable algorithms), hbase-external-blockcache

Packaging and shading: hbase-shaded/*, hbase-assembly*, hbase-resource-bundle

Tooling: hbase-shell (JRuby REPL), hbase-hbtop, hbase-mapreduce, hbase-backup, hbase-diagnostics

Build infrastructure (ignore for code tasks): hbase-build-configuration, hbase-checkstyle, hbase-annotations, hbase-archetypes/*, hbase-dev-generate-classpath

Testing: hbase-testing-util, hbase-it, hbase-examples

Navigating with @InterfaceAudience

Classes are annotated with @InterfaceAudience to indicate their intended consumer:

  • Public -- Stable client API. External consumers depend on these.
  • LimitedPrivate -- Internal API shared across modules, scoped to a named audience (e.g., COPROC, CONFIG, REPLICATION, AUTHENTICATION). The audience name tells you who is expected to call this code.
  • Private -- Module-internal. Not API.

These annotations are the fastest way to determine whether a class is part of the external surface or internal plumbing.

Key Entry Points

When investigating a behavior, start from where it enters the system:

  • Client RPCs: RSRpcServices (RegionServer) and MasterRpcServices (Master) handle all client-initiated RPCs. Trace from the method matching the RPC name.
  • REST gateway: resource classes in hbase-rest map HTTP verbs to operations.
  • Thrift gateway: handler classes in hbase-thrift map Thrift methods.
  • Coprocessor hooks: observer interfaces (RegionObserver, MasterObserver, etc.) define extension points. Implementations are loaded via configuration or table schema.
  • Procedures: hbase-procedure defines the framework; concrete procedures (table create, region split, etc.) live in hbase-server.
  • Configuration: properties are defined in hbase-default.xml (in hbase-common) and overridden by operators in hbase-site.xml.
  • Wire format: .proto files in hbase-protocol-shaded define every RPC request/response and all persisted data structures. (Older branches had a separate hbase-protocol module; it has been removed on master.)

Split Packages

The same Java package often appears in multiple modules (e.g., the coprocessor package exists in hbase-client, hbase-server, hbase-endpoint, and hbase-examples). Each module contributes different classes to the package. When searching for a class, check which module it lives in -- the module determines the execution context.

Related Repositories

hbase-thirdparty is a companion project that patches and shades key dependencies (protobuf, netty, gson, etc.) so that HBase's internal use of these libraries does not conflict with versions on the application classpath. The hbase-shaded-* artifacts from that repo appear as dependencies throughout this project's pom.xml. Changes to shaded dependency versions or patches happen in that repo, not here.

Developer Tooling

dev-support/ contains CI configuration, release automation, code analysis scripts, and other maintainer tools. PR-level CI has migrated to GitHub Actions (.github/workflows/), but nightly and branch-level CI still runs via configurations in dev-support/. That directory also holds release scripts, docker-based test environments, and various developer utilities. See dev-support/README.md for a full index.

conf/ holds default configuration templates (hbase-site.xml, hbase-env.sh, log4j2.properties). bin/ holds shell scripts for cluster lifecycle and operations.

dev-support/design-docs/ collects design documents and proposals for major features. These capture the rationale behind complex subsystems and are useful for understanding why the code is structured the way it is.

Conventions

  • Tests mirror source paths: src/test/java parallels src/main/java
  • Generated code (protobuf, etc.) lives in target/ and is not checked in
  • Configuration properties use hbase. prefix
  • The shell is JRuby wrapping the Java client API

Documentation

The project website (https://hbase.apache.org) is maintained in this repo under hbase-website/. User-facing and administrator-facing documentation covering configuration, security, architecture, schema design, operations, APIs, and more lives in hbase-website/app/pages/_docs/docs/_mdx/. The table of contents and page ordering is defined in the meta.json files within that tree.

The site also serves https://hbase.apache.org/llms-full.txt, which concatenates all documentation pages into a single text file suitable for LLM context ingestion.

Security Model

The project's security model is the entry point for any security analysis of this codebase. The canonical chain for discovering it is:

AGENTS.md -> SECURITY.md -> https://hbase.apache.org/security-model/

SECURITY.md at the repo root is a short pointer to the published model and the security@apache.org reporting flow. The full model (source at hbase-website/app/pages/_landing/security-model/content.md, published at https://hbase.apache.org/security-model/) details trust boundaries, what constitutes a valid vulnerability, and what does not.

When performing security analysis of this codebase, use the navigation structure above to determine the role of the code under review, then apply the security model to interpret findings in context.