This guide provides a comprehensive walkthrough of the Semantic Mapper AI, explaining how to set up your environment, manage data, and orchestrate the agentic mapping workflow.
Semantic Mapper AI is designed for long-term persistence.
- Project Storage: Creating a project generates a local directory: `projects/<project_name>/`.
- Local File System: Your uploaded files are organized into specific sub-folders for isolation:
  - `projects/<name>/files/vs/`: Source documents for the Vector Store.
  - `projects/<name>/files/fsdm/`: Target FSDM/ETL schema models.
  - `projects/<name>/files/mapping/`: Source-to-target mapping spreadsheets.
- State Management: All UI state, configurations, and inventory data are saved to `projects/<name>/metadata.json`. This is handled by `project_manager.py`, ensuring that when you reload the app or switch projects, everything remains exactly where you left it.
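The persistence pattern can be sketched as a simple JSON round-trip. This is an illustration of the idea, not the actual `project_manager.py` code; the function names `load_metadata` and `save_metadata` are assumptions:

```python
import json
from pathlib import Path

def load_metadata(project_name: str) -> dict:
    """Load persisted UI state for a project; return empty state for a new project."""
    path = Path("projects") / project_name / "metadata.json"
    if path.exists():
        return json.loads(path.read_text())
    return {}

def save_metadata(project_name: str, state: dict) -> None:
    """Persist the current state so it survives app reloads and project switches."""
    path = Path("projects") / project_name / "metadata.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
```

Because every state change is written back to disk immediately, switching projects is just a matter of reading a different `metadata.json`.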
Before starting, you must configure the LLM Endpoint via the sidebar.
- Base URL: Enter your LLM provider's endpoint. The system automatically queries the `/v1/models` endpoint to populate the model list.
- Model Selection: Fetch and select the specific model you wish to use for the Detective and Engineer agents.
- Embedding Model: Click Load Embedding Model to download the required transformer models from HuggingFace. These models run locally to power the semantic search in your Knowledge Base.
The Knowledge Base (KB) provides the context the agents need to understand your data.
Upload PDFs or Excel guides that describe your business logic. After uploading, click Sync with Vector Store to index them. This creates a searchable database in projects/<name>/vector_store/.
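Under the hood, semantic search boils down to comparing embedding vectors. The sketch below illustrates the cosine-similarity retrieval that a vector store performs; it is not the actual indexing code, and assumes document chunks have already been embedded (e.g. by the locally loaded HuggingFace model):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k document chunks most similar to the query,
    ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per chunk
    return list(np.argsort(-scores)[:k])
```

Clicking Sync with Vector Store effectively precomputes `doc_vecs` for all uploaded chunks so queries at mapping time only need the dot products.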
Upload your target data models (FSDM/ETL sheets).
- Merge Headers: If your Excel sheet has multi-row headers, check this option before syncing.
- Generate Metadata: Click this to have the AI analyze the table and generate a description.
- Tip: To customize the prompt used for this, refer to `agent/agents/fsdm_metadata.py`.
Everything up to this point is a one-time setup activity in which you define the scope of your mapping project.
- Upload Mapping Sheets: Upload your source mapping documents. (Note: Currently, these cannot be removed from the UI once uploaded).
- Multi-Sheet Support: You can select multiple sheets (e.g., Fact tables, Dimension tables) to be part of your "Universe."
- Column Mapping: For each sheet, define which columns contain the Subject Area, DB Name, Table, Column, and Datatype.
- Save & Preview: Save the configuration to see a preview of how the data will be interpreted.
- Sync to Master: Once all sheets are configured, click Sync Mappings to Master to create a consolidated view of all potential mappings. This is stored in the project's SQLite database as the `unified_mapping_view`.
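Conceptually, the consolidation is a UNION ALL view over the per-sheet tables. The sketch below is illustrative only: the real column names come from your Column Mapping configuration, and `sheet_fact`/`sheet_dim` plus the column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed per-sheet staging tables; real names/columns depend on your config.
for name in ("sheet_fact", "sheet_dim"):
    conn.execute(f"""CREATE TABLE {name} (subject_area TEXT, db_name TEXT,
                     table_name TEXT, column_name TEXT, datatype TEXT)""")
# The consolidated view stitches every configured sheet into one universe.
conn.execute("""CREATE VIEW unified_mapping_view AS
                SELECT 'fact' AS source_sheet, * FROM sheet_fact
                UNION ALL
                SELECT 'dim', * FROM sheet_dim""")
conn.execute("INSERT INTO sheet_fact VALUES "
             "('Finance', 'DW', 'F_TXN', 'AMT', 'NUMBER')")
rows = conn.execute("SELECT source_sheet, table_name "
                    "FROM unified_mapping_view").fetchall()
```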
Now that your universe is defined, you decide what to process.
- The Selection Tree: This hierarchy allows you to drill down from File -> Sheet -> Table -> Row.
- Flexible Scope: You can select a single row for a quick test, or select multiple entire tables for a batch run.
- Generate: Click Generate SQL Mappings to start the agents.
As the agents work, you can monitor the logs in real-time.
- View Results: Use the table selector in Section 3 to filter results by their target table.
- Analyze Reasoning: Each result includes the "Detective's" discovery report and the "Engineer's" SQL reasoning.
- Feedback & Verification: Provide feedback to regenerate logic or mark a row as "Verified" (Golden Example).
- Export: Once satisfied, click Export All Processed Tables to Excel to get a consolidated final report.
- Enhanced Output Views: We plan to split Section 3 into two distinct areas:
- Current Run: Showing only the outputs from your most recent execution.
- Project History: The current view showing all completed mappings across the project.
- UI Cleanup: Adding the ability to remove mapping sheets from the dashboard.
To extend or customize the system, here is how the core logic and AI layers are structured.
All heavy lifting is decoupled from the Streamlit UI and resides in the logic/ directory:
- `project_manager.py`: The heart of the system. It handles folder creation, SQLite database initialization, and state persistence via `metadata.json`.
- `fsdm/`, `mapping/`, `vector_store/`: Specialized sub-directories that handle the lifecycle of their respective data types, ensuring the UI remains "thin" and focused only on presentation.
The application uses two primary agents residing in agent/agents/:
- 🕵️ FSDM Detective (`fsdm_detective.py`):
  - Purpose: Investigates lineage using structured SQLite tables.
  - Prompt Modification: Change the `state['system_prompt']` inside the `detective_node` function.
  - Tools: Uses `lg_get_table_schema`, `lg_query_db`, and `lg_list_fsdm_tables_logic` to autonomously query the mapping universe.
- ⚙️ Mapping Engineer (`mapping_oneshot.py`):
  - Purpose: Consolidates the Detective's findings and vector context into final SQL logic.
  - Prompt Modification: Update the `system_prompt` and `user_content` strings in the `generate_transformation` function.
  - Tools: Primarily uses the Vector Store Retriever to pull unstructured documentation context.
- Metadata Generation: The "Generate Metadata" button in the UI uses the logic in `agent/agents/fsdm_metadata.py`. This is a specialized utility prompt that helps the agents understand target table schemas before the mapping begins.
- Discovery Tools: All tools used by the agents are defined in `agent/tools/tools.py`. These tools provide the "hands" for the agents to interact with the SQLite database and the Vector Store.
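A schema-inspection tool like `lg_get_table_schema` can be sketched with SQLite's `PRAGMA table_info`. This is an assumption about its shape, not the real implementation in `agent/tools/tools.py`:

```python
import sqlite3

def lg_get_table_schema(db_path: str, table: str) -> list[tuple[str, str]]:
    """Sketch of a schema tool: return (column, declared type) pairs.
    Assumes `table` is a trusted name from the project's own database."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        return [(row[1], row[2])
                for row in conn.execute(f"PRAGMA table_info({table})")]
    finally:
        conn.close()
```

Tools in this style are what let the Detective explore the mapping universe without the UI hard-coding any schema knowledge.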
The AgentExecutor class in agent/agents/executor.py orchestrates the multi-phase mapping process. It manages the hand-off between the Detective (Phase 1) and the Engineer (Phase 2).
- Mapping Agent Swap: The system is designed to be flexible. You can swap between two types of mapping engines in `executor.py`:
  - `mapping_oneshot.py` (Current Default): Optimized for speed and smaller models. It takes the "Discovery Intent" from the Detective and generates SQL in a single pass.
  - `mapping_engineer.py`: A full ReAct agent that can autonomously query the database and explore the schema. This is recommended for larger, more capable models.
- How to Swap: Both agents are compatible. To switch, simply update the import and the initialization variable in `AgentExecutor._initialize_agents()`.
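The swap pattern can be illustrated with stand-ins. Everything below is a sketch: the two functions are stand-ins for the real modules, and the attribute name `mapping_agent` is an assumption about `executor.py`. The point is that both engines expose the same callable interface, so the executor only stores a reference:

```python
def mapping_oneshot(discovery_intent: str) -> str:
    """Stand-in for mapping_oneshot.py: single-pass SQL generation."""
    return f"-- one-shot SQL for: {discovery_intent}"

def mapping_engineer(discovery_intent: str) -> str:
    """Stand-in for mapping_engineer.py: ReAct-style autonomous exploration."""
    return f"-- react SQL for: {discovery_intent}"

class AgentExecutor:
    def __init__(self, engine=mapping_oneshot):
        # Swapping engines means changing this one reference
        # (in the real code, inside _initialize_agents()).
        self.mapping_agent = engine

    def run_phase2(self, intent: str) -> str:
        return self.mapping_agent(intent)
```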
To ensure the multi-agent system functions correctly, your chosen LLM must meet these minimum requirements:
- Context Window: Minimum 10,000 tokens. The system passes detailed metadata, discovery reports, and vector context that can quickly consume smaller windows.
- Intelligence Level: The model must be "smart enough" to perform multi-step SQL querying against the FSDM documentation tables.
Smaller or less capable models (e.g., early 7B/13B-parameter models) often suffer from:
- Instruction Drift: As the agent progresses, it may lose sight of user patterns and mandatory instructions.
- Tool Execution Errors: Models may exit their loop without calling the mandatory "exit tools" (like `FSDMIntentOutput`).
- Symptom: In the UI, if the FSDM Discovery section remains empty or the agent logs show a completion without an output report, it is likely due to the model failing to follow the structured output constraints.
All AI-generated insights, including lineage reports, SQL logic, and agent reasoning, are persisted in the project's SQLite database:
- File: `projects/<project_name>/mapping.db`
- Table: `final_mappings`
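Because the results live in plain SQLite, they are easy to inspect outside the UI. A minimal sketch; the actual column names of `final_mappings` are not documented here, so the reader selects everything generically:

```python
import sqlite3

def load_results(db_path: str) -> list[dict]:
    """Read all persisted rows from final_mappings as dictionaries."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row   # access columns by name
    try:
        return [dict(r) for r in conn.execute("SELECT * FROM final_mappings")]
    finally:
        conn.close()
```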