This guide provides comprehensive instructions for developers who want to contribute to PowerMem, including how to set up the development environment, build the project, and add new integrations.
- Development Environment Setup
- Building the Project
- Contributing Code
- Adding a Vector Store
- Adding an LLM Provider
- Adding an Embedding Provider
- Adding a Reranker Provider
- Testing
- Code Style and Standards
- Publishing to PyPI
- Debugging
- Performance Optimization
- Common Issues and Solutions
- Advanced Topics
- Additional Resources
- Getting Help
## Development Environment Setup

Prerequisites:

- Python 3.10 or higher
- pip (Python package manager)
- Git
- Clone the repository:

```bash
git clone https://github.com/oceanbase/powermem.git
cd powermem
```

- Install development dependencies:

```bash
# Install the project in development mode with all dependencies
make install-dev

# Or install with test dependencies
make install-test
```

Alternatively, you can use pip directly:

```bash
pip install -e ".[dev,test]"
```

- Set up environment variables. Copy the example environment file and configure it:

```bash
cp configs/env.example configs/powermem.env
# Edit configs/powermem.env with your configuration
```

The project uses several development tools that are installed with the dev dependencies:
- pytest: Testing framework
- black: Code formatter
- isort: Import sorter
- flake8: Linter
- mypy: Type checker
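These tools are typically declared as optional dependency groups in `pyproject.toml`. A sketch of what those extras might look like — the exact package lists here are illustrative, not the project's actual configuration:

```toml
[project.optional-dependencies]
dev = ["black", "isort", "flake8", "mypy"]
test = ["pytest", "pytest-cov"]
```

Installing with `pip install -e ".[dev,test]"` pulls in both groups at once.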
## Building the Project

To build wheel and source distribution packages:

```bash
make build-package
```

This will:

- Clean previous build artifacts
- Build both wheel (`.whl`) and source distribution (`.tar.gz`) packages
- Output files to the `dist/` directory
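Since a wheel is just a zip archive, you can also sanity-check what was actually packaged. The helper below is illustrative — it is not one of the Makefile targets:

```python
import zipfile
from pathlib import Path
from typing import Dict, List

def wheel_contents(dist_dir: str = "dist") -> Dict[str, List[str]]:
    """Map each wheel under dist_dir to the list of files it contains."""
    contents: Dict[str, List[str]] = {}
    for wheel in sorted(Path(dist_dir).glob("*.whl")):
        with zipfile.ZipFile(wheel) as zf:
            contents[wheel.name] = zf.namelist()
    return contents
```

Running `wheel_contents()` after `make build-package` lets you confirm that the expected modules made it into the wheel.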
Before publishing, verify the package:

```bash
make build-check
```

This runs `twine check` to validate the package.

If you need to install build tools manually:

```bash
make install-build-tools
```

This installs the `build` and `twine` packages.

To install the built package locally:

```bash
make install-local
```

This installs the wheel package from the `dist/` directory.
## Contributing Code

- Fork the repository on GitHub
- Create a feature branch:

```bash
git checkout -b feature/your-feature-name
```

- Make your changes following the code style guidelines
- Write tests for your changes
- Run the tests to ensure everything passes:

```bash
make test
```

- Format and lint your code:

```bash
make format
make lint
```

- Commit your changes with clear commit messages
- Push to your fork and create a Pull Request
Commit message guidelines:

- Use clear, descriptive commit messages
- Start with a verb in imperative mood (e.g., "Add", "Fix", "Update")
- Reference issue numbers when applicable (e.g., "Fix #123: Add error handling")
Pull request guidelines:

- Provide a clear description of changes
- Include tests for new features
- Update documentation if needed
- Ensure all tests pass
- Follow the code style guidelines
## Adding a Vector Store

To add a new vector store provider, you need to:
- Create the vector store implementation
- Create the configuration class
- Register it in the factory
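Before writing a real backend, it can help to see the contract in miniature. The sketch below is a self-contained, in-memory stand-in — it does not use the real `VectorStoreBase` — showing the insert/search shape with brute-force cosine similarity:

```python
import math
from typing import Any, Dict, List, Optional, Tuple

class InMemoryVectorStore:
    """Illustrative stand-in: keeps vectors in a list, searches by cosine similarity."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, List[float], Dict[str, Any]]] = []
        self._next_id = 1

    def insert(
        self,
        vectors: List[List[float]],
        payloads: Optional[List[Dict[str, Any]]] = None,
    ) -> List[int]:
        payloads = payloads or [{} for _ in vectors]
        ids: List[int] = []
        for vec, payload in zip(vectors, payloads):
            self._rows.append((self._next_id, vec, payload))
            ids.append(self._next_id)
            self._next_id += 1
        return ids

    def search(self, query: List[float], limit: int = 5) -> List[Tuple[int, float]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        # Score every stored vector and return the best matches first
        scored = [(vid, cosine(query, vec)) for vid, vec, _ in self._rows]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]
```

A real provider delegates these operations to its client library and returns `OutputData` objects, but the ID handling and ranking logic follow the same pattern.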
Create a new file in `src/powermem/storage/your_provider/your_provider.py`:

```python
"""
YourProvider vector store implementation
"""
import logging
from typing import Any, Dict, List, Optional

from powermem.storage.base import VectorStoreBase, OutputData
from powermem.utils.utils import generate_snowflake_id

logger = logging.getLogger(__name__)


class YourProviderVectorStore(VectorStoreBase):
    """YourProvider vector store implementation"""

    def __init__(
        self,
        collection_name: str = "memories",
        # Add your provider-specific parameters
        host: Optional[str] = None,
        port: Optional[int] = None,
        **kwargs
    ):
        """
        Initialize the vector store.

        Args:
            collection_name: Name of the collection
            host: Server host
            port: Server port
        """
        self.collection_name = collection_name
        # Initialize your provider client here
        logger.info("YourProviderVectorStore initialized")

    def create_col(self, name=None, vector_size=None, distance="cosine"):
        """Create a new collection."""
        collection_name = name or self.collection_name
        # Implement collection creation logic
        pass

    def insert(self, vectors: List[List[float]], payloads=None, ids=None) -> List[int]:
        """
        Insert vectors into the collection.

        Args:
            vectors: List of vectors to insert
            payloads: List of payload dictionaries
            ids: Optional list of IDs (if None, Snowflake IDs are generated)

        Returns:
            List[int]: List of generated or provided IDs
        """
        if not vectors:
            return []
        if payloads is None:
            payloads = [{} for _ in vectors]
        # Generate IDs if not provided
        if ids is None:
            ids = [generate_snowflake_id() for _ in range(len(vectors))]
        # Implement insertion logic
        # ...
        return ids

    def search(
        self,
        query: List[float],
        vectors: Optional[List[List[float]]] = None,
        limit: int = 5,
        filters: Optional[Dict[str, Any]] = None
    ) -> List[OutputData]:
        """
        Search for similar vectors.

        Args:
            query: Query vector
            vectors: Optional list of vectors to search in
            limit: Maximum number of results
            filters: Optional metadata filters

        Returns:
            List[OutputData]: List of search results
        """
        # Implement search logic
        results = []
        # ...
        return results

    def delete(self, vector_id: int) -> bool:
        """Delete a vector by ID."""
        # Implement deletion logic
        return True

    def update(self, vector_id: int, vector=None, payload=None) -> bool:
        """Update a vector and its payload."""
        # Implement update logic
        return True

    def get(self, vector_id: int) -> Optional[OutputData]:
        """Retrieve a vector by ID."""
        # Implement retrieval logic
        return None

    def list_cols(self) -> List[str]:
        """List all collections."""
        # Implement list logic
        return []

    def delete_col(self, name=None) -> bool:
        """Delete a collection."""
        # Implement deletion logic
        return True

    def col_info(self) -> Dict[str, Any]:
        """Get information about a collection."""
        # Implement info retrieval
        return {}

    def list(self, filters=None, limit=None) -> List[OutputData]:
        """List all memories."""
        # Implement list logic
        return []

    def reset(self) -> bool:
        """Reset by deleting the collection and recreating it."""
        self.delete_col()
        self.create_col()
        return True
```

Create a configuration file in `src/powermem/storage/config/your_provider.py`:
"""
Configuration for YourProvider vector store
"""
from typing import Optional
from pydantic import Field
from powermem.storage.config.base import BaseVectorStoreConfig
class YourProviderConfig(BaseVectorStoreConfig):
"""Configuration for YourProvider vector store"""
host: str = Field(description="Server host")
port: int = Field(default=5432, description="Server port")
# Add other provider-specific fields
class Config:
extra = "allow"Update src/powermem/storage/configs.py to include your config:
```python
from powermem.storage.config.your_provider import YourProviderConfig

class VectorStoreConfig(BaseModel):
    # ...
    _provider_configs: Dict[str, str] = {
        "oceanbase": "OceanBaseConfig",
        "pgvector": "PGVectorConfig",
        "sqlite": "SQLiteConfig",
        "your_provider": "YourProviderConfig",  # Add this
    }
```

Update `src/powermem/storage/factory.py`:
```python
class VectorStoreFactory:
    provider_to_class = {
        "oceanbase": "powermem.storage.oceanbase.oceanbase.OceanBaseVectorStore",
        "sqlite": "powermem.storage.sqlite.sqlite_vector_store.SQLiteVectorStore",
        "pgvector": "powermem.storage.pgvector.pgvector.PGVectorStore",
        "postgres": "powermem.storage.pgvector.pgvector.PGVectorStore",
        "your_provider": "powermem.storage.your_provider.your_provider.YourProviderVectorStore",  # Add this
    }
```

Create a test file at `tests/integration/test_your_provider_vector_store.py`:
```python
import pytest

from powermem.storage.factory import VectorStoreFactory


def test_your_provider_create_col():
    """Test collection creation"""
    config = {
        "collection_name": "test_collection",
        "host": "localhost",
        "port": 5432,
    }
    store = VectorStoreFactory.create("your_provider", config)
    store.create_col("test_col", vector_size=128, distance="cosine")
    assert store.col_info() is not None


def test_your_provider_insert_and_search():
    """Test insertion and search"""
    config = {
        "collection_name": "test_collection",
        "host": "localhost",
        "port": 5432,
    }
    store = VectorStoreFactory.create("your_provider", config)
    store.create_col("test_col", vector_size=128, distance="cosine")

    vectors = [[0.1] * 128, [0.2] * 128]
    payloads = [{"text": "test1"}, {"text": "test2"}]
    ids = store.insert(vectors, payloads)
    assert len(ids) == 2

    results = store.search([0.1] * 128, limit=1)
    assert len(results) > 0
```

## Adding an LLM Provider

To add a new LLM provider:
- Create the LLM implementation
- Create the configuration class
- Register it in the factory
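In miniature, the provider contract looks like this. `EchoLLM` is a self-contained stand-in — it does not subclass the real `LLMBase` or call any SDK — that illustrates the `generate_response` message format:

```python
from typing import Dict, List

class EchoLLM:
    """Illustrative stand-in: returns the most recent user message verbatim."""

    def __init__(self, model: str = "echo-1") -> None:
        self.model = model

    def generate_response(self, messages: List[Dict[str, str]], **kwargs) -> str:
        # A real provider would forward `messages` to its chat API here
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        return user_turns[-1] if user_turns else ""
```

The real implementation below follows the same shape, but builds request parameters from its config and delegates to the provider SDK.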
Create `src/powermem/integrations/llm/your_provider.py`:

```python
"""
YourProvider LLM implementation
"""
from typing import Dict, List, Optional

from powermem.integrations.llm import LLMBase
from powermem.integrations.llm.config.base import BaseLLMConfig
from powermem.integrations.llm.config.your_provider import YourProviderConfig

# Import your provider's SDK
try:
    from your_provider_sdk import YourProviderClient
except ImportError:
    raise ImportError(
        "The 'your_provider_sdk' library is required. "
        "Please install it using 'pip install your_provider_sdk'."
    )


class YourProviderLLM(LLMBase):
    """YourProvider LLM implementation"""

    def __init__(self, config: Optional[YourProviderConfig] = None):
        if config is None:
            config = YourProviderConfig()
        elif isinstance(config, dict):
            config = YourProviderConfig(**config)
        super().__init__(config)

        # Initialize your provider client
        self.client = YourProviderClient(
            api_key=self.config.api_key,
            base_url=getattr(self.config, 'base_url', None),
        )

    def generate_response(
        self,
        messages: List[Dict[str, str]],
        response_format=None,
        tools: Optional[List[Dict]] = None,
        tool_choice: str = "auto",
        **kwargs,
    ) -> str:
        """
        Generate a response based on the given messages.

        Args:
            messages: List of message dicts with 'role' and 'content'
            response_format: Optional response format specification
            tools: Optional list of tools for function calling
            tool_choice: Tool choice strategy
            **kwargs: Additional parameters

        Returns:
            str: Generated response text
        """
        # Prepare parameters
        params = self._get_supported_params(
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            **kwargs
        )

        # Add provider-specific parameters
        params.update({
            "model": self.config.model,
            "temperature": self.config.temperature,
            "max_tokens": self.config.max_tokens,
        })

        # Call your provider's API
        response = self.client.chat.completions.create(**params)

        # Parse and return the response
        return self._parse_response(response, tools)

    def _parse_response(self, response, tools=None):
        """Parse the response from the provider"""
        if tools:
            # Handle tool calls if supported
            return {
                "content": response.choices[0].message.content,
                "tool_calls": getattr(response.choices[0].message, 'tool_calls', []),
            }
        else:
            return response.choices[0].message.content
```

Create `src/powermem/integrations/llm/config/your_provider.py`:
"""
Configuration for YourProvider LLM
"""
from typing import Optional
from pydantic import Field
from powermem.integrations.llm.config.base import BaseLLMConfig
class YourProviderConfig(BaseLLMConfig):
"""Configuration for YourProvider LLM"""
base_url: Optional[str] = Field(
default=None,
description="Base URL for the API"
)
# Add provider-specific fields
class Config:
extra = "allow"Update src/powermem/integrations/llm/factory.py:
```python
from powermem.integrations.llm.config.your_provider import YourProviderConfig

class LLMFactory:
    provider_to_class = {
        # ... existing providers ...
        "your_provider": ("powermem.integrations.llm.your_provider.YourProviderLLM", YourProviderConfig),
    }
```

If your provider requires a new dependency, add it to `pyproject.toml`:

```toml
[project]
dependencies = [
    # ... existing dependencies ...
    "your_provider_sdk>=1.0.0",
]
```

## Adding an Embedding Provider

To add a new embedding provider:
- Create the embedding implementation
- Register it in the factory
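The core contract is a single `embed` method mapping text to a fixed-size vector. The stand-in below is self-contained — the hash-based scheme is purely illustrative and has none of the semantics of a real embedding model:

```python
import hashlib
from typing import List

class HashEmbedding:
    """Illustrative stand-in: derives a deterministic vector from a SHA-256 digest."""

    def __init__(self, dims: int = 8) -> None:
        self.dims = dims

    def embed(self, text: str) -> List[float]:
        # Same text always yields the same vector, with components scaled into [0, 1]
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i % len(digest)] / 255.0 for i in range(self.dims)]
```

A real provider replaces the hashing with a call to its embedding API, as shown in the implementation below.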
Create `src/powermem/integrations/embeddings/your_provider.py`:

```python
"""
YourProvider embedding implementation
"""
from typing import List, Literal, Optional

from powermem.integrations.embeddings import EmbeddingBase
from powermem.integrations.embeddings.config.base import BaseEmbedderConfig

# Import your provider's SDK
try:
    from your_provider_sdk import YourProviderEmbeddingClient
except ImportError:
    raise ImportError(
        "The 'your_provider_sdk' library is required. "
        "Please install it using 'pip install your_provider_sdk'."
    )


class YourProviderEmbedding(EmbeddingBase):
    """YourProvider embedding implementation"""

    def __init__(self, config: Optional[BaseEmbedderConfig] = None):
        super().__init__(config)
        # Initialize your provider client
        self.client = YourProviderEmbeddingClient(
            api_key=getattr(self.config, 'api_key', None),
            model=self.config.model,
        )

    def embed(
        self,
        text: str,
        memory_action: Optional[Literal["add", "search", "update"]] = None
    ) -> List[float]:
        """
        Get the embedding for the given text.

        Args:
            text: The text to embed
            memory_action: The type of embedding to use (optional)

        Returns:
            List[float]: The embedding vector
        """
        # Some providers support different embedding models for different actions
        model = self.config.model
        if memory_action == "search" and hasattr(self.config, 'search_model'):
            model = self.config.search_model

        # Call your provider's API
        response = self.client.embed(text, model=model)
        return response.embedding
```

Update `src/powermem/integrations/embeddings/factory.py`:
```python
class EmbedderFactory:
    provider_to_class = {
        # ... existing providers ...
        "your_provider": "powermem.integrations.embeddings.your_provider.YourProviderEmbedding",
    }
```

## Adding a Reranker Provider

Rerankers improve search results by reordering documents according to their relevance to the query. To add a new reranker provider:

- Create the reranker implementation
- Register it in the factory
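The contract is `rerank(query, documents, top_n)` returning `(document_index, relevance_score)` tuples sorted by descending score. The stand-in below is self-contained and scores by word overlap rather than calling a real rerank model:

```python
from typing import List, Optional, Tuple

class OverlapRerank:
    """Illustrative stand-in: scores documents by word overlap with the query."""

    def rerank(
        self, query: str, documents: List[str], top_n: Optional[int] = None
    ) -> List[Tuple[int, float]]:
        query_words = set(query.lower().split())
        # Score each document by the fraction of query words it contains
        scored = [
            (i, len(query_words & set(doc.lower().split())) / max(len(query_words), 1))
            for i, doc in enumerate(documents)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[: top_n if top_n is not None else len(documents)]
```

The real implementation below returns results in the same `(index, score)` shape, but gets its scores from the provider's rerank API.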
Create `src/powermem/integrations/rerank/your_provider.py`:

```python
"""
YourProvider reranker implementation
"""
import os
from typing import List, Optional, Tuple

from powermem.integrations.rerank.base import RerankBase
from powermem.integrations.rerank.config.base import BaseRerankConfig

# Import your provider's SDK
try:
    from your_provider_sdk import YourProviderRerankClient
except ImportError:
    raise ImportError(
        "The 'your_provider_sdk' library is required. "
        "Please install it using 'pip install your_provider_sdk'."
    )


class YourProviderRerank(RerankBase):
    """YourProvider reranker implementation"""

    def __init__(self, config: Optional[BaseRerankConfig] = None):
        super().__init__(config)
        # Set a default model
        self.config.model = self.config.model or "your-rerank-model"

        # Initialize your provider client
        api_key = self.config.api_key or os.getenv("YOUR_PROVIDER_API_KEY")
        if not api_key:
            raise ValueError(
                "API key is required. Set the YOUR_PROVIDER_API_KEY environment "
                "variable or pass api_key in the config."
            )
        self.client = YourProviderRerankClient(api_key=api_key)

    def rerank(
        self,
        query: str,
        documents: List[str],
        top_n: Optional[int] = None
    ) -> List[Tuple[int, float]]:
        """
        Rerank documents based on relevance to the query.

        Args:
            query: The search query
            documents: List of document texts to rerank
            top_n: Number of top results to return

        Returns:
            List[Tuple[int, float]]: List of (document_index, relevance_score)
                tuples, sorted by relevance score in descending order
        """
        if not query or not query.strip():
            raise ValueError("Query cannot be empty")
        if not documents:
            raise ValueError("Documents list cannot be empty")

        # Use the provided top_n or return all results
        effective_top_n = top_n if top_n is not None else len(documents)

        try:
            # Call your provider's rerank API
            response = self.client.rerank(
                query=query,
                documents=documents,
                model=self.config.model,
                top_n=effective_top_n,
            )

            # Parse results into (index, score) tuples
            results = []
            for item in response.results:
                results.append((item.index, float(item.relevance_score)))

            # Sort by score descending (highest first)
            results.sort(key=lambda x: x[1], reverse=True)
            return results
        except Exception as e:
            raise RuntimeError(f"Failed to rerank documents: {e}") from e
```

Update `src/powermem/integrations/rerank/factory.py`:
```python
class RerankFactory:
    provider_to_class = {
        "qwen": "powermem.integrations.rerank.qwen.QwenRerank",
        "your_provider": "powermem.integrations.rerank.your_provider.YourProviderRerank",  # Add this
    }
```

Enable the reranker in your configuration:
```python
from powermem import Memory

config = {
    "reranker": {
        "enabled": True,
        "provider": "your_provider",
        "config": {
            "model": "your-rerank-model",
            "api_key": "your-api-key",
        },
    },
    # ... other config
}
memory = Memory(config=config)
```

Or via environment variables:

```bash
export RERANKER_ENABLED=true
export RERANKER_PROVIDER=your_provider
export RERANKER_MODEL=your-rerank-model
export RERANKER_API_KEY=your-api-key
```

## Testing

```bash
# Run all tests
make test

# Run unit tests only
make test-unit

# Run integration tests only
make test-integration

# Run end-to-end tests
make test-e2e

# Run tests with coverage
make test-coverage
```

- Unit tests: Test individual functions and classes in isolation
- Integration tests: Test interactions between components
- E2E tests: Test complete workflows
Place tests in:

- `tests/unit/` for unit tests
- `tests/integration/` for integration tests
- `tests/e2e/` for end-to-end tests

Example test:
```python
import pytest

from powermem import Memory


def test_feature_name():
    """Test description"""
    # Arrange
    memory = Memory()

    # Act
    result = memory.some_method()

    # Assert
    assert result is not None
```

## Code Style and Standards

The project uses `black` for code formatting and `isort` for import sorting:

```bash
# Format code
make format

# Check formatting without making changes
make format-check

# Run the linter
make lint

# Run the type checker
make type-check
```

- Follow the PEP 8 style guide
- Use type hints for function parameters and return values
- Write docstrings for all classes and public methods
- Keep functions focused - one responsibility per function
- Use meaningful variable names
- Handle errors appropriately - use try/except where needed
- Add logging for important operations
"""
Module docstring describing the purpose of this module.
"""
import logging
from typing import Dict, List, Optional
logger = logging.getLogger(__name__)
class ExampleClass:
"""
Class docstring describing what this class does.
Args:
param1: Description of param1
param2: Description of param2
"""
def __init__(self, param1: str, param2: Optional[int] = None):
"""Initialize the class."""
self.param1 = param1
self.param2 = param2
logger.info(f"Initialized ExampleClass with param1={param1}")
def example_method(self, input_data: Dict[str, str]) -> List[str]:
"""
Method docstring describing what this method does.
Args:
input_data: Dictionary containing input data
Returns:
List of processed strings
Raises:
ValueError: If input_data is empty
"""
if not input_data:
raise ValueError("input_data cannot be empty")
# Implementation here
return []Before publishing to PyPI, ensure you have:
- PyPI account: Create an account at pypi.org
- API tokens: Generate API tokens from your PyPI account settings
- Configured credentials: Set up `~/.pypirc` or use environment variables
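A typical `~/.pypirc` using API tokens looks roughly like this (the token values are placeholders):

```ini
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
password = pypi-<your-api-token>

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-<your-testpypi-token>
```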
- Update the version in `pyproject.toml`:

```toml
[project]
version = "0.1.1"  # Increment the version number
```

- Update the changelog (if you maintain one)
- Build and check the package:

```bash
make build-check
```

- Test locally:

```bash
make install-local
# Test the installed package
python -c "import powermem; print(powermem.__version__)"
```

- Publish to TestPyPI first (recommended):

```bash
make publish-testpypi
```

- Test the installation from TestPyPI:

```bash
pip install --index-url https://test.pypi.org/simple/ powermem
```

- Publish to PyPI:

```bash
make publish-pypi
```

Follow Semantic Versioning:
- MAJOR version (1.0.0): Incompatible API changes
- MINOR version (0.1.0): New functionality in a backward compatible manner
- PATCH version (0.0.1): Backward compatible bug fixes
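Expressed as code, the bump rules look like this (the helper is illustrative, not part of the codebase):

```python
from typing import Tuple

def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into comparable integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def bump(version: str, part: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' release."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Comparing the tuples from `parse_version` also orders versions correctly, e.g. `1.2.10` sorts after `1.2.9`.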
After publishing, create a git tag:

```bash
git tag -a v0.1.1 -m "Release version 0.1.1"
git push origin v0.1.1
```

## Debugging

Set the logging level to DEBUG:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Or set an environment variable:

```bash
export LOG_LEVEL=DEBUG
```

- Vector Store Connection Issues:
```python
from powermem.storage.factory import VectorStoreFactory

# Test the connection
config = {
    "collection_name": "test",
    "host": "localhost",
    "port": 2881,
    # ... other config
}
store = VectorStoreFactory.create("oceanbase", config)
print(store.col_info())  # Check if the connection works
```

- LLM API Issues:
```python
from powermem.integrations.llm.factory import LLMFactory

# Test the LLM connection
config = {
    "model": "gpt-4",
    "api_key": "your-key",
}
llm = LLMFactory.create("openai", config)
response = llm.generate_response([{"role": "user", "content": "test"}])
print(response)
```

- Embedding Issues:
```python
from powermem.integrations.embeddings.factory import EmbedderFactory

# Test embedding
config = {
    "model": "text-embedding-3-small",
    "api_key": "your-key",
}
embedder = EmbedderFactory.create("openai", config)
vector = embedder.embed("test text")
print(f"Vector dimension: {len(vector)}")
```

## Performance Optimization

- Index Configuration:
```python
# For OceanBase
config = {
    "index_type": "HNSW",  # Use HNSW for better performance
    "vidx_metric_type": "l2",  # Choose an appropriate metric
    # ... other config
}
```

- Batch Operations:
```python
# Insert multiple vectors at once
vectors = [[0.1] * 1536 for _ in range(100)]
payloads = [{"text": f"text_{i}"} for i in range(100)]
ids = store.insert(vectors, payloads)  # Batch insert
```

- Connection Pooling:
```python
# Use connection pooling for better performance
config = {
    "pool_size": 10,
    "max_overflow": 20,
    # ... other config
}
```

- Use Async Operations:
```python
from powermem import AsyncMemory

memory = AsyncMemory()
# Async operations are more efficient for I/O-bound tasks
await memory.add("text", user_id="user1")
```

- Batch Memory Operations:
```python
# Add multiple memories efficiently
memories = [
    "User likes Python",
    "User works at a tech company",
    "User prefers coffee over tea",
]
for mem in memories:
    memory.add(mem, user_id="user1")
```

## Common Issues and Solutions

Problem: `ModuleNotFoundError` when importing provider-specific modules
Solution: Install the required dependencies:

```bash
# For specific providers
pip install pyobvector  # For OceanBase
pip install pgvector    # For PostgreSQL
pip install dashscope   # For Qwen
```

Problem: Configuration from the `.env` file is not being loaded
Solution: Ensure the file path is correct:

```python
from powermem import create_memory

# Explicitly specify the config file
memory = create_memory(config_file="configs/powermem.env")
```

Problem: `ValueError: Vector dimension mismatch`
Solution: Ensure the embedding model's dimensions match the vector store configuration:

```python
# Check embedding dimensions
embedder = EmbedderFactory.create("openai", {"model": "text-embedding-3-small"})
vector = embedder.embed("test")
print(f"Dimension: {len(vector)}")  # Should be 1536 for text-embedding-3-small

# Configure the vector store with matching dimensions
store_config = {
    "embedding_model_dims": 1536,  # Must match the embedding dimension
    # ... other config
}
```

Problem: Timeout errors when connecting to the vector store
Solution: Increase the timeout and check the network:

```python
config = {
    "host": "localhost",
    "port": 2881,
    "connect_timeout": 30,  # Increase the timeout
    # ... other config
}
```

## Advanced Topics

You can customize how facts are extracted from messages:
```python
from powermem import Memory

custom_prompt = """
Extract key facts from the following conversation.
Focus on: user preferences, important events, relationships.
"""

memory = Memory(
    config={
        "custom_fact_extraction_prompt": custom_prompt,
        # ... other config
    }
)
```

Customize how memories are updated:
```python
custom_update_prompt = """
Update existing memories based on new information.
Merge similar memories and remove outdated ones.
"""

memory = Memory(
    config={
        "custom_update_memory_prompt": custom_update_prompt,
        # ... other config
    }
)
```

Configure sub-stores for better data organization:
```python
config = {
    "storage_type": "oceanbase",
    "sub_stores": [
        {
            "name": "user_preferences",
            "filters": {"type": "preference"},
        },
        {
            "name": "events",
            "filters": {"type": "event"},
        },
    ],
    # ... other config
}
memory = Memory(config=config)
```

Enable the graph store for relationship-based retrieval:
```python
config = {
    "graph_store": {
        "provider": "oceanbase",
        "config": {
            # Graph store configuration
        },
    },
    # ... other config
}
memory = Memory(config=config)
```

## Additional Resources

- API Documentation: See `docs/api/` for a detailed API reference
- Examples: Check the `examples/` directory for usage examples
- Architecture: See `docs/architecture/overview.md` for the system architecture
- Issues: Report bugs or request features on GitHub Issues
## Getting Help

- Discord: Join our Discord community
- GitHub Discussions: Ask questions in GitHub Discussions
- Documentation: Browse the documentation
Happy coding! 🚀