# CodeCompass Context Improvement TODO List

This document outlines the tasks required to enhance CodeCompass's ability to provide comprehensive context to its AI agent, especially when dealing with large and complex git repositories.

## Prioritization Notes

The following prioritization aims to tackle foundational improvements first, building a solid base for more advanced features.

**Phase 1: Core Context Retrieval Enhancements (Highest Priority)**
1. **P1 - Task Group 1 (Formerly Task 1): Increase Qdrant Search Result Limit.** (Focus: Get more raw data from existing index)
2. **P2 - Task Group 2 (Formerly Task 3): Index Large Files (Chunking Strategy).** (Focus: Ensure all relevant code is indexed)
3. **P3 - Task Group 3 (Formerly Task 2): Improve "Recent Changes" (Diff) Context.** (Focus: Provide meaningful change history)

**Phase 2: Smarter Agent Processing & Control**
* Tasks related to how the agent uses and requests the improved context. (Formerly Section II)

**Phase 3: Configuration, Flexibility & Validation**
* Tasks related to making the system more configurable and flexible, and to testing and validating the improvements. (Formerly Sections III and IV)

---

## Phase 1: Core Context Retrieval Enhancements

### P1 - Task Group 1: Increase Qdrant Search Result Limit
*Goal: Allow retrieval of more potential context from the vector store.*

* [x] **Task 1.1:** Modify `src/lib/query-refinement.ts`:
    * [x] Make the `limit` parameter in `qdrantClient.search()` calls configurable (e.g., read from `configService`).
    * [x] **Consider (Advanced):** Explore logic for the agent or refinement process to dynamically request a higher search limit if initial results are insufficient. (Addressed by `request_additional_context` tool with `MORE_SEARCH_RESULTS`)
* [x] **Task 1.2:** Update `src/lib/config-service.ts` (and potentially `src/lib/config.ts` or `.env` examples):
    * [x] Add a new configuration variable for the default Qdrant search result limit (e.g., `QDRANT_SEARCH_LIMIT_DEFAULT`).

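Tasks 1.1 and 1.2 come down to resolving one number from configuration. The TypeScript sketch below assumes the service reads plain environment variables. `QDRANT_SEARCH_LIMIT_DEFAULT` comes from Task 1.2; `QDRANT_SEARCH_LIMIT_MAX`, the fallback values, and the helper names are hypothetical. The resolved value is what would be passed as `limit` to `qdrantClient.search()`.

```typescript
// Hypothetical fallbacks, used when the environment does not provide a value.
const FALLBACK_DEFAULT_LIMIT = 10;
const FALLBACK_MAX_LIMIT = 100;

type Env = Record<string, string | undefined>;

interface SearchLimitConfig {
  defaultLimit: number; // limit used for ordinary searches
  maxLimit: number;     // hard cap, even when a larger limit is requested
}

/** Parse a positive integer variable, falling back when it is missing or invalid. */
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

/** Load both limits; QDRANT_SEARCH_LIMIT_MAX is a hypothetical extra variable. */
function loadSearchLimitConfig(env: Env): SearchLimitConfig {
  const maxLimit = readPositiveInt(env, "QDRANT_SEARCH_LIMIT_MAX", FALLBACK_MAX_LIMIT);
  const defaultLimit = Math.min(
    readPositiveInt(env, "QDRANT_SEARCH_LIMIT_DEFAULT", FALLBACK_DEFAULT_LIMIT),
    maxLimit, // never let the default exceed the cap
  );
  return { defaultLimit, maxLimit };
}

/** Value to pass as `limit`; an explicit request may raise it, but only up to maxLimit. */
function resolveSearchLimit(cfg: SearchLimitConfig, requested?: number): number {
  if (requested === undefined || !Number.isInteger(requested) || requested <= 0) {
    return cfg.defaultLimit;
  }
  return Math.min(requested, cfg.maxLimit);
}
```

In the service, this might be called once as `loadSearchLimitConfig(process.env)`. Sending both ordinary searches and `MORE_SEARCH_RESULTS` requests through `resolveSearchLimit` keeps a model-requested limit within the configured cap.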
### P2 - Task Group 2: Index Large Files (Chunking Strategy)
*Goal: Ensure content from very large files is searchable.*

* [x] **Task 2.1 (Formerly Task 3.1):** Modify `src/lib/repository.ts` (`indexRepository` function):
    * [x] Instead of skipping files larger than `configService.MAX_SNIPPET_LENGTH * 10`, implement a file chunking mechanism.
    * [x] Define a chunk size (e.g., `configService.MAX_SNIPPET_LENGTH`) with some overlap between chunks.
    * [x] For each chunk, generate an embedding and upsert it to Qdrant.
    * [x] The payload for each chunk should include:
        * Original `filepath`.
        * Chunk content.
        * Chunk number / position within the original file.
        * `last_modified` timestamp of the original file.
* [x] **Task 2.2 (Formerly Task 3.2):** Modify `src/lib/agent.ts` and `src/lib/query-refinement.ts`:
    * [x] When processing search results, if results are from chunked files, ensure the agent is aware (e.g., "This snippet is part of a larger file: [filename], chunk X of Y").
    * [x] Consider if query refinement or result presentation needs adjustment for chunked results (e.g., retrieving adjacent chunks if one is highly relevant). (Addressed by agent awareness of chunks and `request_additional_context` tool with `ADJACENT_FILE_CHUNKS`)

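The sketch below shows roughly how the chunking in Task 2.1 and the chunk labeling in Task 2.2 could work. It assumes character-based windows, with `chunkSize` standing in for `configService.MAX_SNIPPET_LENGTH`. The payload field names `chunk_index` and `total_chunks` and all helper names are hypothetical. Embedding generation and the Qdrant upsert are left out.

```typescript
interface ChunkPayload {
  filepath: string;      // path of the original file
  content: string;       // text of this chunk
  chunk_index: number;   // 0-based position in the original file (hypothetical name)
  total_chunks: number;  // number of chunks the file was split into (hypothetical name)
  last_modified: string; // ISO 8601 timestamp of the original file
}

/** Split text into windows of chunkSize characters that overlap by `overlap` characters. */
function chunkFile(
  filepath: string,
  text: string,
  lastModified: Date,
  chunkSize: number,
  overlap: number,
): ChunkPayload[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }
  const step = chunkSize - overlap; // how far each window advances
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reaches the end
  }
  // An empty file yields no chunks.
  return pieces.map((content, i) => ({
    filepath,
    content,
    chunk_index: i,
    total_chunks: pieces.length,
    last_modified: lastModified.toISOString(),
  }));
}

/** Label shown to the agent so it knows the snippet is partial (Task 2.2). */
function describeChunk(p: ChunkPayload): string {
  return `This snippet is part of a larger file: ${p.filepath}, chunk ${p.chunk_index + 1} of ${p.total_chunks}`;
}

/** Indices of the neighboring chunks, e.g. for an ADJACENT_FILE_CHUNKS request. */
function adjacentChunkIndices(p: ChunkPayload): number[] {
  return [p.chunk_index - 1, p.chunk_index + 1].filter((i) => i >= 0 && i < p.total_chunks);
}
```

Each window advances by `chunkSize - overlap` characters, so consecutive chunks share `overlap` characters. Any span no longer than `overlap` therefore appears intact in at least one chunk.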
### P3 - Task Group 3: Improve "Recent Changes" (Diff) Context
*Goal: Provide meaningful, content-based diff information.*

* [x] **Task 3.1 (Formerly Task 2.1):** Modify `src/lib/repository.ts` (`getRepositoryDiff` function):
    * [x] Change the implementation to fetch actual `git diff` content between the last two commits (e.g., using `isomorphic-git`'s diff capabilities or by shelling out to a `git diff` command). Ensure it returns the textual diff.
* [x] **Task 3.2 (Formerly Task 2.2):** Modify `src/lib/agent.ts` (where `getRepositoryDiff` is called, likely within tool execution like `get_repository_context` or `generate_suggestion`):
    * [x] If the fetched diff content is large, implement LLM-based summarization to create a concise overview of key changes.
    * [x] Pass either the full diff (if manageable) or the summary to the agent's main prompt.
    * [x] Update prompt assembly logic to correctly incorporate this richer diff information.

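For Task 3.1, the textual diff between the last two commits is what `git diff HEAD~1 HEAD` prints. In a repository with a single commit, `HEAD~1` does not exist, so that case needs separate handling. The TypeScript sketch below covers the Task 3.2 decision: pass a small diff through unchanged and summarize a large one. Several parts are assumptions rather than the project's actual code:

* the `summarize` callback, which stands in for the LLM call in `src/lib/agent.ts`;
* the character threshold;
* the result shape and the message wording.

Injecting the summarizer keeps the size decision testable without an LLM. For very large diffs, summarizing per file (splitting on the `diff --git` headers) would keep each LLM call within the model's context window.

```typescript
// Stand-in for the LLM summarization call in agent.ts (hypothetical signature).
type Summarizer = (diffText: string) => Promise<string>;

interface DiffContext {
  kind: "none" | "full" | "summary";
  text: string; // what gets placed into the agent's prompt
}

/** Count the files touched by a unified diff via its `diff --git` header lines. */
function countChangedFiles(diffText: string): number {
  return diffText.split("\n").filter((line) => line.startsWith("diff --git ")).length;
}

/** Pass small diffs through unchanged; summarize diffs longer than maxDiffChars. */
async function prepareDiffForPrompt(
  diffText: string,
  maxDiffChars: number,
  summarize: Summarizer,
): Promise<DiffContext> {
  const trimmed = diffText.trim();
  if (trimmed === "") {
    return { kind: "none", text: "No changes between the last two commits." };
  }
  if (trimmed.length <= maxDiffChars) {
    return { kind: "full", text: trimmed };
  }
  const summary = await summarize(trimmed);
  const files = countChangedFiles(trimmed);
  return {
    kind: "summary",
    text: `Summary of a large diff (${trimmed.length} characters, ${files} files):\n${summary}`,
  };
}
```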
---

## Phase 2: Smarter Agent Processing & Control
*(Formerly Section II - Tasks renumbered for clarity within this phase)*

1. **Task P2.1 (Formerly Task 4.1): Dynamic Context Presentation in Prompts:**
    * [x] Modify `src/lib/agent.ts` (prompt generation logic for tools like `generate_suggestion` and the main agent loop):
        * [x] For file lists: If the list of relevant files is long, use an LLM to summarize the list or select the N most relevant based on the query, instead of simple truncation (`files.slice(0, 10)`).
        * [x] For code snippets: If a retrieved snippet is very long (even after Qdrant retrieval, before being passed to the agent's reasoning LLM), consider an LLM call to summarize its essence in relation to the query.
        * [x] **Consider:** Allow the agent to explicitly request "more detail" or "full content" for a summarized item if it deems it necessary. (Addressed by `request_additional_context` tool with `FULL_FILE_CONTENT`)

2. **Task P2.2 (Formerly Task 5.1): Context-Aware Agent System Prompt:**
    * [x] Modify `src/lib/agent.ts` (`generateAgentSystemPrompt` function):
        * [x] Add instructions for the agent to self-assess the sufficiency of retrieved context relative to the query's scope.
        * [x] Guide the agent on how to react to insufficient context (e.g., "If initial search results are sparse or low-relevance for a broad query, consider using `get_repository_context` with a broader query, or explicitly request a wider search using the `request_additional_context` tool with `MORE_SEARCH_RESULTS`.").

3. **Task P2.3 (Advanced - Formerly Task 6.1): LLM-Powered Query Refinement:**
    * [x] Modify `src/lib/query-refinement.ts`:
        * [x] Design a new prompt for an LLM to perform query refinement. Input: original query, initial (poor) search results, (optional) high-level repository summary. Output: a refined query string. *(Rule-based refinement is implemented. LLM-based refinement remains an advanced alternative that is not currently part of `searchWithRefinement`, although the agent can refine queries itself when calling tools.)*
        * [x] Integrate this LLM call into the `searchWithRefinement` loop as an alternative or supplement to the current rule-based refinement. *(As above)*
        * [x] Add necessary configuration for this LLM call (e.g., specific model, prompt template via `configService`). *(As above)*

4. **Task P2.4 (Advanced - Formerly Task 7.1-7.3): Explicit "Request More Context" Agent Tool:**
    * [x] Define a new tool in `src/lib/agent.ts` (in `toolRegistry` and `executeToolCall`):
        * Name: `request_additional_context`.
        * Parameters: `context_type: enum("MORE_SEARCH_RESULTS", "FULL_FILE_CONTENT", "DIRECTORY_LISTING", "ADJACENT_FILE_CHUNKS")`, `query_or_path: string`, `reasoning: string`.
    * [x] Implement the logic for `executeToolCall` for this new tool. This might involve:
        * [x] Re-running `searchWithRefinement` with an adjusted original query or increased search limit for "MORE_SEARCH_RESULTS".
        * [x] Using file system operations to list files in a directory for "DIRECTORY_LISTING".
        * [x] Reading full file content for "FULL_FILE_CONTENT", potentially with summarization for very large files.
        * [x] Retrieving adjacent chunks for "ADJACENT_FILE_CHUNKS".
    * [x] Update `generateAgentSystemPrompt` to inform the agent about this new tool and when to use it.

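The `request_additional_context` tool from Task P2.4 boils down to two steps: validate the model's arguments, then dispatch on `context_type`. The TypeScript sketch below shows that shape. The tool name, enum values, and parameter names come from the list above. The `Handlers` interface and the function names are hypothetical placeholders for the real search, file-system, and chunk-lookup code behind `executeToolCall`. Rejecting an unknown `context_type` with an error, rather than silently falling back to a default, gives the agent a clear tool error to react to on its next step.

```typescript
const CONTEXT_TYPES = [
  "MORE_SEARCH_RESULTS",
  "FULL_FILE_CONTENT",
  "DIRECTORY_LISTING",
  "ADJACENT_FILE_CHUNKS",
] as const;

type ContextType = (typeof CONTEXT_TYPES)[number];

interface RequestAdditionalContextArgs {
  context_type: ContextType;
  query_or_path: string;
  reasoning: string;
}

// Hypothetical hooks into the real search, file-system and chunk-lookup code.
interface Handlers {
  moreSearchResults(query: string): Promise<string>;
  fullFileContent(path: string): Promise<string>;
  directoryListing(path: string): Promise<string>;
  adjacentFileChunks(path: string): Promise<string>;
}

/** Validate the raw JSON arguments produced by the model. */
function parseArgs(raw: unknown): RequestAdditionalContextArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("arguments must be an object");
  }
  const { context_type, query_or_path, reasoning } = raw as Record<string, unknown>;
  if (typeof context_type !== "string" || !(CONTEXT_TYPES as readonly string[]).includes(context_type)) {
    throw new TypeError(`unknown context_type: ${String(context_type)}`);
  }
  if (typeof query_or_path !== "string" || query_or_path.trim() === "") {
    throw new TypeError("query_or_path must be a non-empty string");
  }
  if (typeof reasoning !== "string") {
    throw new TypeError("reasoning must be a string");
  }
  return { context_type: context_type as ContextType, query_or_path, reasoning };
}

/** Dispatch on context_type; `reasoning` could additionally be logged. */
async function executeRequestAdditionalContext(raw: unknown, h: Handlers): Promise<string> {
  const args = parseArgs(raw);
  switch (args.context_type) {
    case "MORE_SEARCH_RESULTS":
      return h.moreSearchResults(args.query_or_path);
    case "FULL_FILE_CONTENT":
      return h.fullFileContent(args.query_or_path);
    case "DIRECTORY_LISTING":
      return h.directoryListing(args.query_or_path);
    case "ADJACENT_FILE_CHUNKS":
      return h.adjacentFileChunks(args.query_or_path);
    default: {
      // Compile-time exhaustiveness check: adding a new ContextType breaks this line.
      const unreachable: never = args.context_type;
      throw new Error(`unhandled context_type: ${String(unreachable)}`);
    }
  }
}
```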
---

## Phase 3: Configuration, Flexibility & Validation
*(Formerly Sections III and IV - Tasks renumbered)*

1. **Task P3.1 (Formerly Task 8.1-8.3): Expose Key Parameters via `configService`:**
    * [x] Identify and list all new and existing parameters that should be user-configurable (e.g., Qdrant search limits, default/max agent steps, max refinement iterations, chunk sizes for large file indexing, LLM models for summarization/refinement).
    * [x] Add these to `src/lib/config.ts` (with defaults) and `src/lib/config-service.ts` to load them from environment variables or a config file.
    * [x] Update `README.md` and any example `.env` files with these new configuration options.

2. **Task P3.2 (Formerly Task 9.1): Flexible Agent Loop Steps:**
    * [x] Modify `src/lib/agent.ts` (`runAgentLoop` function):
        * [x] Implement a mechanism for the agent's LLM to output a special token or instruction if it determines it needs more processing steps beyond the current `maxSteps`. (Implemented via `request_more_processing_steps` tool)
        * [x] If this instruction is received, and a global maximum hasn't been hit, allow the loop to continue for a few more iterations. (Implemented in `runAgentLoop` logic)

3. **Task P3.3 (Formerly Task 10.1-10.3): Testing and Validation:**
    * [x] Develop test cases specifically for large repositories with diverse query types. *(Unit tests for `agent.ts`, `config-service.ts`, `repository.ts`, and `query-refinement.ts` implemented.)*
    * [~] Evaluate the impact of each implemented improvement on context quality and agent performance. *(Guidance below, to be executed by user/developer)*
    * [~] Profile performance, especially for indexing large files and LLM-heavy operations (summarization, LLM-based refinement). *(Guidance below, to be executed by user/developer)*
    * [x] **Implement comprehensive unit tests with positive and negative cases, aiming for high coverage and adhering to best practices.** *(Core unit tests implemented for key modules.)*

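Task P3.2's extensible step budget can be sketched as follows. The sketch assumes each loop iteration yields one of three outcomes: a final answer, a normal continuation, or a `request_more_processing_steps` call carrying the number of extra steps wanted. The `StepBudget` shape, the `step` callback, and `runLoop` itself are hypothetical simplifications of `runAgentLoop`. Capping every extension at `globalMax` means that a model which keeps calling the tool still terminates.

```typescript
interface StepBudget {
  used: number;               // steps taken so far
  limit: number;              // current maxSteps; may grow on request
  readonly globalMax: number; // hard ceiling that no extension may exceed
}

// Outcome of one loop iteration; the object form models a
// request_more_processing_steps tool call (hypothetical shape).
type StepResult = "done" | "continue" | { requestMoreSteps: number };

/** Grant up to `requested` extra steps without passing globalMax; returns steps granted. */
function requestMoreSteps(budget: StepBudget, requested: number): number {
  if (!Number.isFinite(requested) || requested <= 0) return 0;
  const headroom = budget.globalMax - budget.limit;
  const granted = Math.max(0, Math.min(Math.floor(requested), headroom));
  budget.limit += granted;
  return granted;
}

/** Simplified stand-in for runAgentLoop's control flow. */
function runLoop(
  budget: StepBudget,
  step: (index: number) => StepResult,
): "completed" | "step_limit_reached" {
  while (budget.used < budget.limit) {
    const result = step(budget.used);
    budget.used += 1;
    if (result === "done") return "completed";
    if (typeof result === "object") requestMoreSteps(budget, result.requestMoreSteps);
  }
  return "step_limit_reached";
}
```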
---

This list should provide a clear roadmap for these enhancements.