Commit d17b7ce

Implement Kerberos authentication support for Hive Catalog
1 parent 94e8a98 commit d17b7ce

5 files changed: +93 -36 lines changed

mkdocs/docs/configuration.md

Lines changed: 10 additions & 10 deletions
@@ -228,19 +228,19 @@ catalog:
 catalog:
   default:
     uri: thrift://localhost:9083
-    s3.endpoint: http://localhost:9000
-    s3.access-key-id: admin
-    s3.secret-access-key: password
+    hive:
+      hive2-compatible: true
+      use-kerberos: true
 ```
 
-When using Hive 2.x, make sure to set the compatibility flag:
+<!-- markdown-link-check-disable -->
 
-```yaml
-catalog:
-  default:
-    ...
-    hive.hive2-compatible: true
-```
+| Key                   | Example | Description                       |
+| --------------------- | ------- | --------------------------------- |
+| hive.hive2-compatible | true    | Using Hive 2.x compatibility mode |
+| hive.use-kerberos     | true    | Using authentication via Kerberos |
+
+<!-- markdown-link-check-enable-->
 
 ## Glue Catalog
 

mkdocs/docs/index.md

Lines changed: 17 additions & 16 deletions
@@ -40,22 +40,23 @@ pip install "pyiceberg[s3fs,hive]"
 
 You can mix and match optional dependencies depending on your needs:
 
-| Key          | Description:                                                         |
-| ------------ | -------------------------------------------------------------------- |
-| hive         | Support for the Hive metastore                                       |
-| glue         | Support for AWS Glue                                                 |
-| dynamodb     | Support for AWS DynamoDB                                             |
-| sql-postgres | Support for SQL Catalog backed by Postgresql                         |
-| sql-sqlite   | Support for SQL Catalog backed by SQLite                             |
-| pyarrow      | PyArrow as a FileIO implementation to interact with the object store |
-| pandas       | Installs both PyArrow and Pandas                                     |
-| duckdb       | Installs both PyArrow and DuckDB                                     |
-| ray          | Installs PyArrow, Pandas, and Ray                                    |
-| daft         | Installs Daft                                                        |
-| s3fs         | S3FS as a FileIO implementation to interact with the object store    |
-| adlfs        | ADLFS as a FileIO implementation to interact with the object store   |
-| snappy       | Support for snappy Avro compression                                  |
-| gcsfs        | GCSFS as a FileIO implementation to interact with the object store   |
+| Key           | Description:                                                         |
+| ------------- | -------------------------------------------------------------------- |
+| hive          | Support for the Hive metastore                                       |
+| hive-kerberos | Support for Hive metastore in Kerberos environment                   |
+| glue          | Support for AWS Glue                                                 |
+| dynamodb      | Support for AWS DynamoDB                                             |
+| sql-postgres  | Support for SQL Catalog backed by Postgresql                         |
+| sql-sqlite    | Support for SQL Catalog backed by SQLite                             |
+| pyarrow       | PyArrow as a FileIO implementation to interact with the object store |
+| pandas        | Installs both PyArrow and Pandas                                     |
+| duckdb        | Installs both PyArrow and DuckDB                                     |
+| ray           | Installs PyArrow, Pandas, and Ray                                    |
+| daft          | Installs Daft                                                        |
+| s3fs          | S3FS as a FileIO implementation to interact with the object store    |
+| adlfs         | ADLFS as a FileIO implementation to interact with the object store   |
+| snappy        | Support for snappy Avro compression                                  |
+| gcsfs         | GCSFS as a FileIO implementation to interact with the object store   |
 
 You either need to install `s3fs`, `adlfs`, `gcsfs`, or `pyarrow` to be able to fetch files from an object store.
 
poetry.lock

Lines changed: 46 additions & 6 deletions
Generated file; diff not rendered.

pyiceberg/catalog/hive.py

Lines changed: 17 additions & 4 deletions
@@ -121,6 +121,9 @@
 HIVE2_COMPATIBLE = "hive.hive2-compatible"
 HIVE2_COMPATIBLE_DEFAULT = False
 
+HIVE_KERBEROS_AUTH = "hive.use-kerberos"
+HIVE_KERBEROS_AUTH_DEFAULT = False
+
 LOCK_CHECK_MIN_WAIT_TIME = "lock-check-min-wait-time"
 LOCK_CHECK_MAX_WAIT_TIME = "lock-check-max-wait-time"
 LOCK_CHECK_RETRIES = "lock-check-retries"
@@ -138,11 +141,17 @@ class _HiveClient:
     _client: Client
     _ugi: Optional[List[str]]
 
-    def __init__(self, uri: str, ugi: Optional[str] = None):
+    def __init__(self, uri: str, ugi: Optional[str] = None, use_kerberos: Optional[bool] = HIVE_KERBEROS_AUTH_DEFAULT):
         url_parts = urlparse(uri)
+
         transport = TSocket.TSocket(url_parts.hostname, url_parts.port)
-        self._transport = TTransport.TBufferedTransport(transport)
-        protocol = TBinaryProtocol.TBinaryProtocol(transport)
+
+        if not use_kerberos:
+            self._transport = TTransport.TBufferedTransport(transport)
+        else:
+            self._transport = TTransport.TSaslClientTransport(transport, host=url_parts.hostname, service="hive")
+
+        protocol = TBinaryProtocol.TBinaryProtocol(self._transport)
 
         self._client = Client(protocol)
         self._ugi = ugi.split(":") if ugi else None
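Note on the hunk above: the constructor now picks the transport from `use_kerberos` and reuses the hostname parsed from `uri` as the SASL host. The following is a minimal, stdlib-only sketch of that selection logic so it can be run without Thrift installed. `describe_transport` is a hypothetical helper, not part of the commit, and it returns the transport class name as a string instead of constructing a real transport.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def describe_transport(
    uri: str, use_kerberos: Optional[bool] = False
) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (transport class name, host, port) for a metastore URI.

    Hypothetical stand-in for the branch in _HiveClient.__init__;
    no Thrift objects are created here.
    """
    url_parts = urlparse(uri)
    # Same truthiness test as the patch: None and False both select
    # the plain buffered transport.
    if not use_kerberos:
        name = "TBufferedTransport"
    else:
        name = "TSaslClientTransport"
    return name, url_parts.hostname, url_parts.port


if __name__ == "__main__":
    print(describe_transport("thrift://localhost:9083"))
    print(describe_transport("thrift://localhost:9083", use_kerberos=True))
```

Because the check is `if not use_kerberos`, passing `None` behaves like `False`, which matches the `Optional[bool]` annotation in the new signature.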
@@ -257,7 +266,11 @@ class HiveCatalog(MetastoreCatalog):
 
     def __init__(self, name: str, **properties: str):
         super().__init__(name, **properties)
-        self._client = _HiveClient(properties["uri"], properties.get("ugi"))
+        self._client = _HiveClient(
+            properties["uri"],
+            properties.get("ugi"),
+            PropertyUtil.property_as_bool(properties, HIVE_KERBEROS_AUTH, HIVE_KERBEROS_AUTH_DEFAULT),
+        )
 
         self._lock_check_min_wait_time = PropertyUtil.property_as_float(
             properties, LOCK_CHECK_MIN_WAIT_TIME, DEFAULT_LOCK_CHECK_MIN_WAIT_TIME
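Note on the hunk above: `HiveCatalog.__init__` types its properties as strings, so `hive.use-kerberos` has to be converted to a boolean before it reaches `_HiveClient`. The sketch below illustrates that kind of lookup with a default. It is not the implementation of `PropertyUtil.property_as_bool`; the helper name `property_as_bool_sketch` and the set of accepted spellings are assumptions made for illustration only.

```python
from typing import Dict

# Assumed spellings; the real PropertyUtil may accept a different set.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def property_as_bool_sketch(properties: Dict[str, str], name: str, default: bool) -> bool:
    """Look up a string property and interpret it as a boolean.

    Falls back to `default` when the key is absent and raises
    ValueError on an unrecognized value.
    """
    raw = properties.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"Could not parse {name}={raw!r} as a boolean")


if __name__ == "__main__":
    props = {"uri": "thrift://localhost:9083", "hive.use-kerberos": "true"}
    print(property_as_bool_sketch(props, "hive.use-kerberos", False))
```

With the key missing, the sketch returns the default, which corresponds to `HIVE_KERBEROS_AUTH_DEFAULT = False` in the commit, so existing configurations keep the non-Kerberos transport.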

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -72,6 +72,8 @@ gcsfs = { version = ">=2023.1.0,<2024.1.0", optional = true }
 psycopg2-binary = { version = ">=2.9.6", optional = true }
 sqlalchemy = { version = "^2.0.18", optional = true }
 getdaft = { version = ">=0.2.12", optional = true }
+thrift-sasl = { version = ">=0.4.3", optional = true }
+kerberos = { version = "1.3.1", optional = true }
 
 [tool.poetry.group.dev.dependencies]
 pytest = "7.4.4"
@@ -580,6 +582,7 @@ ray = ["ray", "pyarrow", "pandas"]
 daft = ["getdaft"]
 snappy = ["python-snappy"]
 hive = ["thrift"]
+hive-kerberos = ["thrift", "thrift_sasl", "kerberos"]
 s3fs = ["s3fs"]
 glue = ["boto3", "mypy-boto3-glue"]
 adlfs = ["adlfs"]
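Note on the hunk above: the new `hive-kerberos` extra pulls in `thrift`, `thrift_sasl`, and `kerberos`, so it would presumably be installed in the same way as the existing extras, for example `pip install "pyiceberg[hive-kerberos]"`. A quick way to check whether those optional packages are present before enabling `hive.use-kerberos` might look like the sketch below. `missing_modules` is a hypothetical helper, and the import names are assumed to match the names listed in the extra.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names for the packages in the hive-kerberos extra.
KERBEROS_MODULES = ("thrift", "thrift_sasl", "kerberos")


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(KERBEROS_MODULES)
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All hive-kerberos dependencies are importable.")
```

`importlib.util.find_spec` only locates the modules and does not import them, so the check has no side effects from the Kerberos bindings themselves.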
