HMLR has ONLY been tested with gpt-4.1-mini.
HMLR's architecture was designed, optimized, and benchmarked exclusively with gpt-4.1-mini. The system makes specific assumptions about:
- Instruction following consistency
- JSON formatting reliability
- Prompt response patterns
- Token usage characteristics
- Reasoning capabilities at the mini-tier
These assumptions have been validated through extensive testing with gpt-4.1-mini ONLY.
All RAGAS benchmarks achieving 1.00 Faithfulness and 1.00 Context Recall were run using:
```python
model="gpt-4.1-mini"
```

| Test | Model | Faithfulness | Context Recall | Result |
|---|---|---|---|---|
| Multi-hop reasoning | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
| Temporal conflicts | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
| Cross-topic invariants | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
| Zero-keyword recall | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
| 50-turn long conversation | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
| Hydra (9-policy lethal RAG) | gpt-4.1-mini | 1.00 | 1.00 | ✅ PASS |
No other models have been tested.
If you try to use a different model, you'll see this warning:
```python
client = HMLRClient(
    api_key="...",
    model="gpt-4o"  # Different model!
)
# ⚠️ Output:
# UserWarning: Model 'gpt-4o' has NOT been tested!
# HMLR is only validated with 'gpt-4.1-mini'.
# You may experience:
# - Incorrect memory retrieval
# - Malformed internal JSON
# - Failed fact extraction
# - Bridge block corruption
# Proceed at your own risk.
```

HMLR components generate structured JSON for internal operations. Other models may:
- Use different formatting
- Include extra commentary
- Fail to follow JSON schema exactly
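These failure modes are why strict parsing matters. As a minimal sketch (not part of HMLR's API; `parse_strict_json` is a hypothetical helper), a defensive parser can reject exactly the behaviors listed above, such as extra commentary wrapped around the JSON object:

```python
import json

def parse_strict_json(raw: str) -> dict:
    """Parse LLM output as JSON, rejecting the failure modes above:
    extra commentary around the object, or non-object output."""
    raw = raw.strip()
    # A model that prepends commentary fails this check outright.
    if not raw.startswith("{"):
        raise ValueError(f"Expected a JSON object, got: {raw[:40]!r}")
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("Top-level JSON value must be an object")
    return obj

print(parse_strict_json('{"facts": ["user prefers dark mode"]}'))
```

A model that follows the schema passes cleanly; one that adds "Sure! Here's the JSON:" raises immediately instead of corrupting downstream state.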
The FactScrubber uses specific prompts optimized for GPT-4.1-mini. Other models may:
- Extract incorrect facts
- Miss important facts
- Hallucinate non-existent facts
The LatticeCrawler and Governor rely on specific reasoning patterns. Other models may:
- Retrieve irrelevant context
- Miss critical memories
- Fail to resolve temporal conflicts
The bridge block architecture depends on consistent LLM behavior. Other models may:
- Create malformed bridge blocks
- Fail to detect topic shifts
- Break conversation continuity
We may add support for other models in future versions, but this will require:
- Extensive re-testing of all RAGAS benchmarks
- Prompt engineering specific to each model
- Validation that perfect scores are maintained
- Documentation of any behavioral differences
Until then, stick to gpt-4.1-mini.
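If you want to enforce this in your own code, a guard of the following shape reproduces the warning shown earlier. This is a sketch, not HMLR's internal implementation; `check_model` is a hypothetical helper:

```python
import warnings

TESTED_MODEL = "gpt-4.1-mini"  # the only validated model

def check_model(model: str) -> None:
    """Warn if the configured model is not the validated one."""
    if model != TESTED_MODEL:
        warnings.warn(
            f"Model '{model}' has NOT been tested! "
            f"HMLR is only validated with '{TESTED_MODEL}'.",
            UserWarning,
        )

check_model("gpt-4.1-mini")  # silent
check_model("gpt-4o")        # emits UserWarning
```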
The default model is set to gpt-4.1-mini everywhere:
```python
# In HMLRClient
client = HMLRClient(
    api_key="...",
    # model="gpt-4.1-mini"  # This is the default
)

# In LangChain integration
from langchain_openai import ChatOpenAI
memory = HMLRMemory(api_key="...")
llm = ChatOpenAI(model="gpt-4.1-mini")  # Use matching model!
```

GPT-4.1-mini is one of OpenAI's most cost-effective models while maintaining strong reasoning capabilities:
- Input: ~$0.15 per 1M tokens
- Output: ~$0.60 per 1M tokens
HMLR's architecture is designed to work efficiently within these economics:
- Average queries use < 4k tokens
- Bridge blocks minimize redundant context
- Fact extraction happens once per message
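Putting the listed prices and the sub-4k-token budget together, the per-query cost can be estimated as follows. The token split below is an illustrative assumption, not a measured value:

```python
# Rough per-query cost under the pricing listed above.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single query."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A query staying under the ~4k-token budget (assumed 3500 in / 500 out):
print(f"~${query_cost(3500, 500):.6f} per query")  # ~$0.000825
```

At these rates, even thousands of queries cost well under a dollar, which is the economic envelope the architecture targets.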
To verify what model you're using:
```python
client = HMLRClient(
    api_key="...",
    db_path="memory.db",
    model="gpt-4.1-mini"
)

# The model is stored in the client
print(f"Using model: {client.conversation_engine.llama_client.model}")
```

| Aspect | Status |
|---|---|
| Supported Models | gpt-4.1-mini ONLY |
| Tested Models | gpt-4.1-mini ONLY |
| Benchmarked Models | gpt-4.1-mini ONLY |
| Other Models | ❌ NOT SUPPORTED |
| Future Support | Maybe (requires extensive testing) |
When in doubt, use gpt-4.1-mini.