Commit bb3d169

feat(nvim): retrieval result summarisation in CodeCompanion.nvim query tool (#179)
* refactor(nvim): move `VectorCode.Result` processing into a function.
* feat(nvim): add file summarization to codecompanion tool
* feat(nvim): Add result summarisation to query tool
* feat(nvim): smash a list of results into one HUGE string and send only one request to the summariser.
* fix(nvim): Move result processing before conditional summarisation
* refactor(nvim): Remove summary field from VectorCode.Result type
* feat(nvim): allow dynamically switching on/off the summarisation
* feat(nvim): Augment result summary with user query context
* fix(nvim): merge conflicts
* feat(nvim): allow customising the system prompt as a function
* feat(nvim): add system prompt to merge chunks from the same file in result summarisation
1 parent b3a8fa2 commit bb3d169

3 files changed

Lines changed: 226 additions & 41 deletions

lua/vectorcode/integrations/codecompanion/common.lua

Lines changed: 66 additions & 1 deletion
@@ -11,6 +11,69 @@ local default_query_options = {
1111
default_num = { chunk = 50, document = 10 },
1212
no_duplicate = true,
1313
chunk_mode = false,
14+
summarise = {
15+
enabled = false,
16+
query_augmented = true,
17+
system_prompt = [[You are an expert and experienced code analyzer and summarizer. Your primary task is to analyze provided source code, which will be given as a list of XML objects, and generate a comprehensive, well-structured Markdown summary. This summary will serve as a concise source of information for others to quickly understand how the code works and how to interact with it, without needing to delve into the full source code.
18+
19+
Input Format:
20+
Each XML object represents either a full file or a chunk of a file, containing the following tags:
21+
- `<path>...</path>`: The absolute file path of the source code.
22+
- `<document>...</document>`: The full content of a source code file. This tag will not coexist with `<chunk>`.
23+
- `<chunk>...</chunk>`: A segment of source code from a file. This tag will not coexist with `<document>`.
24+
- `<start_line>...</start_line>` and `<end_line>...</end_line>`: These tags will be present only when a `<chunk>` tag is used, indicating the starting and ending line numbers of the chunk within its respective file.
25+
26+
Your goal is to process each of these XML objects. If multiple chunks belong to the same file, you must synthesize them to form a cohesive understanding of that file. Generate a single Markdown summary that combines insights from all provided objects.
27+
28+
Markdown Structure:
29+
30+
Top-Level Header (#): The absolute or relative file path of the source code.
31+
32+
Secondary Headers (##): For each top-level symbol (e.g., functions, classes, global variables) defined directly within the source code file that are importable or includable by other programs.
33+
34+
Tertiary Headers (###): For symbols nested one level deep within a secondary header's symbol (e.g., methods within a class, inner functions).
35+
36+
Quaternary Headers (####): For symbols nested two levels deep (e.g., a function defined within a method of a class).
37+
38+
Continue this pattern, incrementing the header level for each deeper level of nesting.
39+
40+
Content for Each Section:
41+
42+
Descriptive Summary: Each header section (from secondary headers downwards) must contain a concise and informative summary of the symbol defined by that header.
43+
44+
For Functions/Methods: Explain their purpose, parameters (including types), return values (including types), high-level implementation details, and any significant side effects or core logic. For example, if summarizing a sorting function, include the sorting algorithm used. If summarizing a function that makes an HTTP request, mention the network library employed.
45+
46+
For Classes: Describe the class's role, its main responsibilities, and key characteristics.
47+
48+
For Variables (global or within scope): State their purpose, type (if discernible), and initial value or common usage.
49+
50+
For Modules/Files (under the top-level header): Provide an overall description of the file's purpose, its main components, and its role within the larger project (if context is available).
51+
52+
General Guidelines:
53+
54+
Clarity and Conciseness: Summaries should be easy to understand, avoiding jargon where possible, and as brief as possible while retaining essential information. The full summary MUST NOT be longer than the original code input. When quoting a symbol in the code, include the line numbers where possible.
55+
56+
Accuracy: Ensure the summary accurately reflects the code's functionality.
57+
58+
Focus on Public Interface/Behavior: Prioritize describing what a function/class does and how it's used. Only include details about symbols (variables, functions, classes) that are importable/includable by other programs. DO NOT include local variables and functions that are not accessible by other functions outside their immediate scope.
59+
60+
No Code Snippets: Do not include any actual code snippets in the summary. Focus solely on descriptive text. If you need to refer to a specific element for context (e.g., in an error description), describe it and provide line numbers for reference from the source code.
61+
62+
Syntax/Semantic Errors: If the code contains syntax or semantic errors, describe them clearly within the summary, indicating the nature of the error.
63+
64+
Language Agnostic: Adapt the summary to the specific programming language of the provided source code (e.g., Python, JavaScript, Java, C++, etc.).
65+
66+
Handle Edge Cases/Dependencies: If a symbol relies heavily on external dependencies or handles specific edge cases, briefly mention these if they are significant to its overall function.
67+
68+
Information Source: There will be no extra information available to you. Provide the summary solely based on the provided XML objects.
69+
70+
Omit meaningless results: For an xml object that contains no meaningful information, you're free to omit it, but please leave a sentence in the summary saying that you did this.
71+
72+
No extra reply: Your reply should solely consist of the summary. Do not say anything else.
73+
74+
Merge chunks from the same file: When there are chunks that belong to the same file, merge their content so that they're grouped under the same top level header.
75+
]],
76+
},
1477
}
1578

1679
---@type VectorCode.CodeCompanion.LsToolOpts
@@ -23,6 +86,7 @@ local TOOL_RESULT_SOURCE = "VectorCodeToolResult"
2386

2487
return {
2588
tool_result_source = TOOL_RESULT_SOURCE,
89+
2690
---@param t table|string
2791
---@return string
2892
flatten_table_to_string = function(t)
@@ -81,7 +145,7 @@ return {
81145
)
82146
end
83147
if type(opts.max_num) == "table" then
84-
if opts._ then
148+
if opts.chunk_mode then
85149
opts.max_num = opts.max_num.chunk
86150
else
87151
opts.max_num = opts.max_num.document
@@ -103,6 +167,7 @@ return {
103167
---@param result VectorCode.QueryResult
104168
---@return string
105169
process_result = function(result)
170+
-- TODO: Unify the handling of summarised and non-summarised result
106171
local llm_message
107172
if result.chunk then
108173
-- chunk mode
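
The `opts._` to `opts.chunk_mode` fix above decides which entry of a table-valued `max_num` is used. A minimal plain-Lua sketch of that selection follows; `resolve_max_num` is a hypothetical helper name used only for illustration, not a function from this commit.

```lua
-- Hypothetical helper (not part of the commit): pick the per-mode limit
-- when `max_num` is given as a { chunk = ..., document = ... } table.
local function resolve_max_num(opts)
  if type(opts.max_num) == "table" then
    if opts.chunk_mode then
      return opts.max_num.chunk
    else
      return opts.max_num.document
    end
  end
  -- A plain number is used as-is.
  return opts.max_num
end

local limits = { chunk = 50, document = 10 }
local chunk_limit = resolve_max_num({ chunk_mode = true, max_num = limits })
local doc_limit = resolve_max_num({ chunk_mode = false, max_num = limits })
```

With the previous `opts._` check the condition was presumably always false, so the `document` limit would have been chosen even in chunk mode.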

lua/vectorcode/integrations/codecompanion/query_tool.lua

Lines changed: 131 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
11
---@module "codecompanion"
22

33
local cc_common = require("vectorcode.integrations.codecompanion.common")
4+
local cc_config = require("codecompanion.config").config
5+
local cc_schema = require("codecompanion.schema")
6+
local http_client = require("codecompanion.http")
47
local vc_config = require("vectorcode.config")
58
local check_cli_wrap = vc_config.check_cli_wrap
69
local logger = vc_config.logger
@@ -80,6 +83,91 @@ local filter_results = function(results, chat)
8083
return filtered_results
8184
end
8285

86+
---@alias ChatMessage {role: string, content:string}
87+
88+
---@param adapter CodeCompanion.Adapter
89+
---@param system_prompt string
90+
---@param user_messages string|string[]
91+
---@return {messages: ChatMessage[], tools:table?}
92+
local function make_oneshot_payload(adapter, system_prompt, user_messages)
93+
if type(user_messages) == "string" then
94+
user_messages = { user_messages }
95+
end
96+
local messages =
97+
{ { role = cc_config.constants.SYSTEM_ROLE, content = system_prompt } }
98+
for _, m in pairs(user_messages) do
99+
table.insert(messages, { role = cc_config.constants.USER_ROLE, content = m })
100+
end
101+
return { messages = adapter:map_roles(messages) }
102+
end
103+
104+
---@param result VectorCode.QueryResult[]
105+
---@param summarise_opts VectorCode.CodeCompanion.SummariseOpts
106+
---@param cmd QueryToolArgs
107+
---@param callback fun(summary:string)
108+
local function generate_summary(result, summarise_opts, cmd, callback)
109+
assert(vim.islist(result), "result should be a list of VectorCode.QueryResult")
110+
local result_xml = table.concat(vim
111+
.iter(result)
112+
:map(function(res)
113+
return cc_common.process_result(res)
114+
end)
115+
:totable())
116+
117+
if summarise_opts.enabled and type(callback) == "function" then
118+
---@type CodeCompanion.Adapter
119+
local adapter =
120+
vim.deepcopy(require("codecompanion.adapters").resolve(summarise_opts.adapter))
121+
122+
local system_prompt = summarise_opts.system_prompt
123+
if type(system_prompt) == "function" then
124+
system_prompt = system_prompt(
125+
cc_common.get_query_tool_opts().summarise.system_prompt --[[@as string]]
126+
)
127+
end
128+
129+
assert(
130+
type(system_prompt) == "string",
131+
"`system_prompt` should have been converted to a string."
132+
)
133+
if summarise_opts.query_augmented then
134+
system_prompt = string.format(
135+
[[%s
136+
137+
The code provided to you is the result of a search in a codebase from the following query: %s.
138+
When summarising the code, pay extra attention to information related to the queries.
139+
]],
140+
system_prompt,
141+
table.concat(cmd.query, ", ")
142+
)
143+
end
144+
local payload = make_oneshot_payload(adapter, system_prompt, result_xml)
145+
local settings =
146+
vim.deepcopy(adapter:map_schema_to_params(cc_schema.get_default(adapter)))
147+
settings.opts.stream = false
148+
149+
---@type CodeCompanion.Client
150+
local client = http_client.new({ adapter = settings })
151+
client:request(payload, {
152+
---@param _adapter CodeCompanion.Adapter
153+
callback = function(_, data, _adapter)
154+
if data then
155+
local res = _adapter.handlers.chat_output(_adapter, data)
156+
if res and res.status == "success" then
157+
local gen_summary = vim.trim(res.output.content or "")
158+
if gen_summary ~= "" then
159+
return callback(gen_summary)
160+
end
161+
end
162+
end
163+
return callback(result_xml)
164+
end,
165+
}, { silent = true })
166+
else
167+
callback(result_xml)
168+
end
169+
end
170+
83171
---@param opts VectorCode.CodeCompanion.QueryToolOpts?
84172
---@return CodeCompanion.Agent.Tool
85173
return check_cli_wrap(function(opts)
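
The control flow of `generate_summary` in the hunk above can be sketched without Neovim or CodeCompanion: use the summary only when summarisation is enabled and the model returns non-empty text, otherwise fall back to the raw result string. In this simplified sketch `summarise_or_fallback` and the `summariser` argument are hypothetical stand-ins for the HTTP client request, and the whitespace trim is a plain-Lua substitute for `vim.trim`.

```lua
-- Hypothetical stand-in for the HTTP round trip: `summariser` is any function
-- that receives the raw text and a `done(output)` callback.
local function summarise_or_fallback(raw_text, enabled, summariser, callback)
  if not enabled or type(summariser) ~= "function" then
    -- Summarisation is off: pass the raw results through unchanged.
    return callback(raw_text)
  end
  summariser(raw_text, function(output)
    -- Strip leading and trailing whitespace (plain-Lua equivalent of vim.trim).
    local trimmed = (output or ""):match("^%s*(.-)%s*$")
    if trimmed ~= "" then
      return callback(trimmed)
    end
    -- Empty or missing summary: fall back to the raw results.
    return callback(raw_text)
  end)
end

-- Usage with a fake summariser that returns a padded string.
local received
summarise_or_fallback("<path>a.lua</path>", true, function(_, done)
  done("  # a.lua  ")
end, function(s)
  received = s
end)
```

The fallback means the tool always produces some output for the chat, even if the summariser request fails or returns nothing useful.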
@@ -181,7 +269,27 @@ return check_cli_wrap(function(opts)
181269

182270
job_runner.run_async(args, function(result, error)
183271
if vim.islist(result) and #result > 0 and result[1].path ~= nil then ---@cast result VectorCode.QueryResult[]
184-
cb({ status = "success", data = result })
272+
if opts.no_duplicate then
273+
result = filter_results(result, agent.chat)
274+
end
275+
local max_result = #result
276+
if opts.max_num > 0 then
277+
max_result = math.min(tonumber(opts.max_num) or 1, max_result)
278+
end
279+
while #result > max_result do
280+
table.remove(result)
281+
end
282+
local summary_opts = vim.deepcopy(opts.summarise) or {}
283+
if type(summary_opts.enabled) == "function" then
284+
summary_opts.enabled = summary_opts.enabled(agent.chat, result)
285+
end
286+
generate_summary(result, summary_opts, action, function(s)
287+
cb({
288+
status = "success",
289+
---@type VectorCode.CodeCompanion.QueryToolResult
290+
data = { raw_results = result, count = #result, summary = s },
291+
})
292+
end)
185293
else
186294
if type(error) == "table" then
187295
error = cc_common.flatten_table_to_string(error)
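
The truncation step that this hunk moves out of the `success` handler can be read in isolation. The sketch below assumes plain Lua and a hypothetical helper name `trim_results`; that the trailing entries are the least relevant ones is an assumption about result ordering, not something the diff states.

```lua
-- Hypothetical helper (not part of the commit): cap a result list at `max_num`.
-- A non-positive `max_num` means "no limit", matching the `> 0` check above.
local function trim_results(results, max_num)
  local max_result = #results
  if max_num > 0 then
    max_result = math.min(tonumber(max_num) or 1, max_result)
  end
  while #results > max_result do
    table.remove(results) -- removes the last element in place
  end
  return results
end

local trimmed = trim_results({ "a", "b", "c", "d" }, 2)
local untouched = trim_results({ "a", "b", "c" }, 0)
```

Doing the truncation before summarisation means `count` in the tool result reflects what was actually sent to the summariser.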
@@ -280,50 +388,33 @@ If a query returned empty or repeated results, you should avoid using these quer
280388
end,
281389
---@param agent CodeCompanion.Agent
282390
---@param cmd QueryToolArgs
283-
---@param stdout VectorCode.QueryResult[][]
391+
---@param stdout VectorCode.CodeCompanion.QueryToolResult[]
284392
success = function(self, agent, cmd, stdout)
285393
stdout = stdout[1]
286394
logger.info(
287395
("CodeCompanion tool with command %s finished."):format(vim.inspect(cmd))
288396
)
289-
local user_message
290-
local max_result = #stdout
291-
if opts.max_num > 0 then
292-
max_result = math.min(opts.max_num or 1, max_result)
293-
end
294-
if opts.no_duplicate then
295-
stdout = filter_results(stdout, agent.chat)
296-
end
297-
for i, file in pairs(stdout) do
298-
if i <= max_result then
299-
if i == 1 then
300-
user_message = string.format(
301-
"**VectorCode Tool**: Retrieved %d %s(s)",
302-
max_result,
303-
mode
304-
)
305-
if cmd.project_root then
306-
user_message = user_message .. " from " .. cmd.project_root
307-
end
308-
user_message = user_message .. "\n"
309-
else
310-
user_message = ""
311-
end
312-
agent.chat:add_tool_output(
313-
self,
314-
cc_common.process_result(file),
315-
user_message
316-
)
317-
if not opts.chunk_mode then
318-
-- only add to reference if running in full document mode
319-
local ref = {
320-
source = cc_common.tool_result_source,
321-
id = file.path,
322-
path = file.path,
323-
opts = { visible = false },
324-
}
325-
agent.chat.references:add(ref)
326-
end
397+
agent.chat:add_tool_output(
398+
self,
399+
stdout.summary
400+
or table.concat(vim
401+
.iter(stdout.raw_results or {})
402+
:map(function(res)
403+
return cc_common.process_result(res)
404+
end)
405+
:totable()),
406+
string.format("**VectorCode Tool**: Retrieved %d %s(s)", stdout.count, mode)
407+
)
408+
for _, file in pairs(stdout.raw_results or {}) do
409+
if not opts.chunk_mode then
410+
-- only add references in document mode: in chunk mode there would be multiple chunks with the same path (id).
411+
-- TODO: figure out a way to deduplicate.
412+
agent.chat.references:add({
413+
source = cc_common.tool_result_source,
414+
id = file.path,
415+
path = file.path,
416+
opts = { visible = false },
417+
})
327418
end
328419
end
329420
end,

lua/vectorcode/types.lua

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
---@module "codecompanion"
2+
13
---Type definition of the retrieval result.
24
---@class VectorCode.QueryResult
35
---@field path string Path to the file
@@ -6,6 +8,7 @@
68
---@field start_line integer?
79
---@field end_line integer?
810
---@field chunk_id string?
11+
---@field summary string?
912

1013
---@class VectorCode.LsResult
1114
---@field project-root string
@@ -93,6 +96,7 @@
9396
--- Whether to send chunks instead of full files to the LLM. Default: `false`
9497
--- > Make sure you adjust `max_num` and `default_num` accordingly.
9598
---@field chunk_mode boolean?
99+
---@field summarise VectorCode.CodeCompanion.SummariseOpts?
96100

97101
---@class VectorCode.CodeCompanion.VectoriseToolOpts: VectorCode.CodeCompanion.ToolOpts
98102

@@ -103,3 +107,28 @@
103107
---@field collapse boolean
104108
--- Other tools that you'd like to include in `vectorcode_toolbox`
105109
---@field extras string[]
110+
111+
--- The result of the query tool should be structured in the following table
112+
---@class VectorCode.CodeCompanion.QueryToolResult
113+
---@field raw_results VectorCode.QueryResult[]
114+
---@field count integer
115+
---@field summary string|nil
116+
117+
---@class VectorCode.CodeCompanion.SummariseOpts
118+
---A boolean flag that controls whether summarisation should be enabled.
119+
---This can also be a function that returns a boolean.
120+
---In this case, you can use this option to dynamically control whether summarisation is enabled during a chat.
121+
---
122+
---This function receives 2 parameters:
123+
--- - `CodeCompanion.Chat`: the chat object;
124+
--- - `VectorCode.QueryResult[]`: a list of query results.
125+
---@field enabled boolean|(fun(chat: CodeCompanion.Chat, results: VectorCode.QueryResult[]):boolean)|nil
126+
---The adapter used for the summarisation task. When set to `nil`, the adapter from the current chat will be used.
127+
---@field adapter string|CodeCompanion.Adapter|nil
128+
---The system prompt sent to the summariser model.
129+
---When set to a function, it'll receive the default system prompt as the only parameter,
130+
---and should return the new (full) system prompt. This allows you to customise or rewrite the system prompt.
131+
---@field system_prompt string|(fun(original_prompt: string): string)
132+
---When set to true, include the query messages so that the LLM may make task-related summarisations.
133+
---This happens __after__ the `system_prompt` callback processing
134+
---@field query_augmented boolean
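
To illustrate how the `VectorCode.CodeCompanion.SummariseOpts` fields interact, here is a plain-Lua sketch of a `summarise` table with function-valued `enabled` and `system_prompt`, plus a resolver that follows the documented order (prompt callback first, query augmentation afterwards). The threshold of five results, the appended sentence, the `"Query: "` suffix, the fallback to the default prompt, and the name `resolve_summarise_opts` are all assumptions made for illustration; the commit itself uses a longer augmentation text.

```lua
-- Hypothetical user options, shaped like VectorCode.CodeCompanion.SummariseOpts.
local summarise = {
  -- Only summarise when the query returned many results (threshold is arbitrary).
  enabled = function(_chat, results)
    return #results > 5
  end,
  -- Extend the default prompt instead of replacing it.
  system_prompt = function(original_prompt)
    return original_prompt .. "\nKeep each section under five sentences."
  end,
  query_augmented = true,
}

-- Hypothetical resolver mirroring the documented order:
-- 1. evaluate `enabled`, 2. run the `system_prompt` callback,
-- 3. append the query context when `query_augmented` is set.
local function resolve_summarise_opts(opts, chat, results, default_prompt, queries)
  local enabled = opts.enabled
  if type(enabled) == "function" then
    enabled = enabled(chat, results)
  end
  -- Assumption: fall back to the default prompt when none is configured.
  local prompt = opts.system_prompt or default_prompt
  if type(prompt) == "function" then
    prompt = prompt(default_prompt)
  end
  if opts.query_augmented then
    prompt = prompt .. "\n\nQuery: " .. table.concat(queries, ", ")
  end
  return enabled == true, prompt
end

local on, final_prompt =
  resolve_summarise_opts(summarise, nil, { 1, 2, 3, 4, 5, 6 }, "BASE", { "foo", "bar" })
```

Because the augmentation runs after the callback, a custom `system_prompt` function never sees the query text and cannot accidentally remove it.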
