feat: add Hive2Namespace implementation for Python #199
Merged
Implements a Hive2 namespace adapter for lance-namespace Python client that integrates with Apache Hive Metastore.

Key changes:
- Add optional hive2 dependencies in pyproject.toml
- Implement Hive2Namespace class with full namespace and table operations
- Add shared utils module for PyArrow to JSON schema conversion
- Add comprehensive test suite with mocked Hive client
- Register hive2 implementation in namespace factory

The implementation:
- Connects to Hive Metastore via Thrift protocol
- Manages Lance tables as external tables in Hive
- Supports all namespace operations (list, create, drop, describe)
- Supports all table operations (register, create, drop, query)
- Converts between PyArrow and Hive schemas
- Includes comprehensive docstring with usage examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
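To make the schema-conversion bullet concrete, here is a minimal sketch of mapping Arrow type names to Hive column types. It is an illustration only: the table `ARROW_TO_HIVE`, the helper names, the plain-string type names, and the fallback to `string` are all assumptions, not the code in this PR.

```python
# Hypothetical lookup from Arrow type names (as plain strings) to Hive types.
# Coverage and fallback behavior are assumptions made for illustration.
ARROW_TO_HIVE = {
    "bool": "boolean",
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "string": "string",
    "binary": "binary",
}


def arrow_type_to_hive(arrow_type_name):
    """Return a Hive type for an Arrow type name, falling back to 'string'."""
    return ARROW_TO_HIVE.get(arrow_type_name, "string")


def fields_to_hive_columns(fields):
    """Convert (name, arrow_type_name) pairs into Hive-style column dicts."""
    return [
        {"name": name, "type": arrow_type_to_hive(type_name)}
        for name, type_name in fields
    ]


if __name__ == "__main__":
    print(fields_to_hive_columns([("id", "int64"), ("name", "string")]))
```

A real adapter would likely need to handle nested and parameterized types (lists, structs, decimals), which this sketch deliberately leaves out.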
Updates the Hive2Namespace implementation to be consistent with the documented specification in hive.md:

Configuration changes:
- Use 'root' instead of 'warehouse' for storage root location
- Add 'ugi' to configuration properties documentation
- Support 'client.pool-size' and 'storage.*' properties

Root namespace handling:
- list_namespaces: Only list from root namespace
- describe_namespace: Support describing root namespace
- create_namespace: Reject creating root (already exists)
- drop_namespace: Reject dropping root namespace
- namespace_exists: Root namespace always exists
- list_tables: Return empty list for root namespace

Table metadata:
- Use 'table_type' key (not 'lance.table_type') per spec
- Set 'managed_by' property (default: 'storage')
- Use case-insensitive matching for 'lance' table type
- Include 'version' key for table version tracking

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
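The root namespace rules and the case-insensitive table type check listed above can be sketched with an in-memory stand-in. This is not the Hive2Namespace class itself: `RootRulesSketch`, `is_lance_table`, the use of an empty list as the root identifier, and returning an empty list for non-root parents in `list_namespaces` are assumptions for illustration.

```python
class RootRulesSketch:
    """In-memory stand-in for the metastore; not the real Hive2Namespace."""

    def __init__(self):
        # database name -> list of table names
        self.databases = {}

    @staticmethod
    def is_root(namespace_id):
        # Assumption: the root namespace is the empty identifier.
        return len(namespace_id) == 0

    def namespace_exists(self, namespace_id):
        # Root always exists; otherwise look up the database.
        return self.is_root(namespace_id) or namespace_id[0] in self.databases

    def create_namespace(self, namespace_id):
        if self.is_root(namespace_id):
            raise ValueError("root namespace already exists")
        self.databases[namespace_id[0]] = []

    def drop_namespace(self, namespace_id):
        if self.is_root(namespace_id):
            raise ValueError("root namespace cannot be dropped")
        del self.databases[namespace_id[0]]

    def list_namespaces(self, parent_id):
        # Only the root has child namespaces in this sketch (assumption:
        # a non-root parent yields an empty list rather than an error).
        if not self.is_root(parent_id):
            return []
        return sorted(self.databases)

    def list_tables(self, namespace_id):
        # The root namespace holds no tables.
        if self.is_root(namespace_id):
            return []
        return list(self.databases[namespace_id[0]])


def is_lance_table(parameters):
    """Match the 'table_type' parameter against 'lance', ignoring case."""
    return str(parameters.get("table_type", "")).lower() == "lance"
```

For example, `RootRulesSketch().namespace_exists([])` is true before any database is created, while `create_namespace([])` and `drop_namespace([])` both raise.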
Modified describe_table method to only return Hive metadata without opening the Lance dataset. This makes the operation more lightweight and faster.

Changes:
- Remove dataset opening logic from describe_table
- Return schema as None (var_schema field)
- Parse version from Hive parameters instead of dataset
- Add test to verify the new behavior

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
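A rough sketch of what a metadata-only describe could look like, assuming the version is stored as a decimal string under the 'version' parameter. The function names and the shape of the returned dict are hypothetical; only the ideas (no dataset open, schema left as None, version read from Hive parameters) come from the commit message.

```python
def version_from_parameters(parameters):
    """Return the version recorded in Hive table parameters, or None."""
    raw = parameters.get("version")
    if raw is None:
        return None
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Assumption: an unparsable value is treated as "no version".
        return None


def describe_from_hive(location, parameters):
    """Build a describe result from Hive metadata only (hypothetical shape)."""
    return {
        "location": location,
        "version": version_from_parameters(parameters),
        "schema": None,  # the Lance dataset is not opened, so no schema
    }


if __name__ == "__main__":
    print(describe_from_hive("/data/db1/t1.lance", {"version": "3"}))
```

The trade-off is that callers who need the schema have to open the dataset themselves; describe stays a single metastore round trip.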
Standardized the input/output formats and SerDe configuration for Lance tables in Hive Metastore across both Python and Java implementations:
- Set inputFormat to com.lancedb.lance.mapred.LanceInputFormat
- Set outputFormat to com.lancedb.lance.mapred.LanceOutputFormat
- Set serializationLib to com.lancedb.lance.mapred.LanceSerDe

This ensures consistency when tables are registered from either implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
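For reference, the three class names can be kept as shared constants so every registration path uses the same values. The sketch below uses plain dicts whose keys mirror the Hive storage descriptor fields named in the commit; the helper `lance_storage_descriptor` and the dict layout are assumptions, not the Thrift objects the implementation builds.

```python
LANCE_INPUT_FORMAT = "com.lancedb.lance.mapred.LanceInputFormat"
LANCE_OUTPUT_FORMAT = "com.lancedb.lance.mapred.LanceOutputFormat"
LANCE_SERDE_LIB = "com.lancedb.lance.mapred.LanceSerDe"


def lance_storage_descriptor(location, columns):
    """Return a dict shaped like a Hive storage descriptor (illustrative)."""
    return {
        "location": location,
        "cols": list(columns),
        "inputFormat": LANCE_INPUT_FORMAT,
        "outputFormat": LANCE_OUTPUT_FORMAT,
        "serdeInfo": {"serializationLib": LANCE_SERDE_LIB},
    }


if __name__ == "__main__":
    sd = lance_storage_descriptor(
        "/data/db1/t1.lance", [{"name": "id", "type": "bigint"}]
    )
    print(sd["inputFormat"])
```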
…ged_by

Updated register_table to follow proper versioning semantics:
- Only set version parameter when managed_by is "impl"
- When managed_by is "storage" (default), version is not tracked in Hive
- Removed unnecessary "EXTERNAL": "TRUE" parameter

This aligns with the specification where version tracking is only needed when the table is implementation-managed.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
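The versioning rule can be summarized in a small helper that builds the Hive table parameters. This is a sketch under assumptions: the name `build_table_parameters`, the validation of `managed_by`, and storing the version as a string are illustrative choices, not taken from the PR.

```python
def build_table_parameters(managed_by="storage", version=None):
    """Build Hive table parameters for a Lance table (hypothetical helper)."""
    # Assumption: only the two values named in the commit are accepted.
    if managed_by not in ("storage", "impl"):
        raise ValueError("managed_by must be 'storage' or 'impl'")

    params = {"table_type": "lance", "managed_by": managed_by}

    # Version is tracked in Hive only for implementation-managed tables.
    if managed_by == "impl" and version is not None:
        params["version"] = str(version)

    # Note: no "EXTERNAL": "TRUE" entry is added.
    return params


if __name__ == "__main__":
    print(build_table_parameters())
    print(build_table_parameters(managed_by="impl", version=5))
```

With the default `managed_by="storage"`, a supplied version is ignored, matching the rule that storage-managed tables do not track version in Hive.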
looks good to me
Summary
Changes
Core Implementation
- python/lance_namespace/src/lance_namespace/hive.py: Complete Hive2Namespace implementation with:

Dependencies
- python/lance_namespace/pyproject.toml: Added optional hive2 extra with:
  - thrift>=0.13.0
  - hive-metastore-client>=1.0.9
- pip install 'lance-namespace[hive2]'

Tests
- python/lance_namespace/tests/test_hive.py: Comprehensive test suite covering:

Documentation
- docs/src/impls/hive.md: Added Python-specific documentation with usage examples
- python/lance_namespace/README.md: Updated with Hive2 backend instructions

Registration
- python/lance_namespace/src/lance_namespace/namespace.py: Registered hive2 implementation

Usage Example
Test Plan
🤖 Generated with Claude Code