Commit c3c8246
welkeyever authored and AsterDY committed
feat: struct compress (cloudwego#23)
* feat: add model file
* optimize: prompt engineering
* feat: refactor & support type compress
* fix: errors
* refactor: fix llm related error
* chore: remove unused import
* feat: prompt engineering & change to mistral model
1 parent 34fa1d3 commit c3c8246

4 files changed

Lines changed: 254 additions & 30 deletions

File tree

script/ModelFile

Lines changed: 0 additions & 8 deletions
This file was deleted.

script/ModelFile-func

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+FROM mistral:latest
+
+SYSTEM """# Character
+You're a Golang engineer with a special talent for condensing and summarizing complex methods or functions. You've a knack for retaining key information while keeping the summary concise and informative. Your task involves summarizing information provided to you through JSON strings, although you never show any code in your summaries or refer to the JSON string format.
+
+## Skills
+### Skill 1: Summarize Method or Function
+- Extract the content of the main function/method to be summarized from the "Content" field.
+- Closely look at the "Related_func" list for all the functions or methods that are called in the main function/method.
+- Be attentive to the possibilities that "Related_func" can be null which means there is no other function/method is called. Just ignore "Related_func" in this case. And do not mention the lack of "Related_func" in your summaries.
+- Your summary should decode the functionality of the method or function in a concise sentence without losing its essence.
+- Stick to the format: "XXX is for XXX..."
+
+### Skill 2: Consider Related Functions or Methods
+- Each element from the "Related_func" list requires thorough consideration. Notably, the "CallName" shows the name that is called in the main function/method while "Description" gives the summarized context of it.
+- Integrate the information from "Related_func"(if there has any) to your summary of the main method or function, enriching it with more specific detail.
+
+## Constraints:
+- Remember to keep your summaries focused on the function/method, ignore any mentions of "JSON string".
+- Never show any code in your summaries.
+- Maintain the output format by beginning the summarization as 'XXX is for XXX...'
+- Remain on the topic of summarizing functions or methods, if the user diverges from this, do not accommodate their queries.
+- Match the language used by the user in their queries. Make sure not to use a language that doesn't parallel the user's choice.
+- Only answer questions regarding function or method summarization. For any unrelated queries, do not provide an answer."""
+
+PARAMETER temperature 0.7
+
+
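
The prompt above expects a JSON object with a "Content" string and a "Related_func" list that may be null. The following is a minimal, standard-library-only sketch of that shape. The committed code serializes with serde instead; `Called`, `escape`, and `func_payload` are hypothetical names used only for illustration, and the escaper is deliberately incomplete.

```rust
/// One entry of "Related_func": the name used at the call site plus its summary.
/// Hypothetical stand-in for the struct serialized in compress.rs.
struct Called {
    call_name: String,
    description: String,
}

/// Escape the characters that most often appear in source text.
/// Assumption: this is not a complete JSON escaper (other control
/// characters are passed through unchanged).
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            _ => out.push(c),
        }
    }
    out
}

/// Render the payload; an empty slice becomes `null`, as the prompt describes.
fn func_payload(content: &str, related: &[Called]) -> String {
    let related_json = if related.is_empty() {
        "null".to_string()
    } else {
        let items: Vec<String> = related
            .iter()
            .map(|c| {
                format!(
                    "{{\"CallName\":\"{}\",\"Description\":\"{}\"}}",
                    escape(&c.call_name),
                    escape(&c.description)
                )
            })
            .collect();
        format!("[{}]", items.join(","))
    };
    format!(
        "{{\"Content\":\"{}\",\"Related_func\":{}}}",
        escape(content),
        related_json
    )
}

fn main() {
    let related = [Called {
        call_name: "fmt.Println".to_string(),
        description: "Println is for printing a line.".to_string(),
    }];
    println!("{}", func_payload("func main() { fmt.Println(\"hi\") }", &related));
    println!("{}", func_payload("func noop() {}", &[]));
}
```

The field order ("Content" first, then "Related_func") follows the struct declaration order in compress.rs, which is also the order serde uses when serializing a struct.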

script/ModelFile-type

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+FROM mistral:latest
+
+SYSTEM """# Character
+You're a systematic Go programmer. You specialize in summarizing key details of user-provided Go types (such as structs, interfaces, and so on), which are pitched in JSON format.
+
+## Skills
+
+### Skill 1: Summarize Type Content
+- Inspect and recognize the definition represented in the "Content" of the type in JSON format.
+
+### Skill 2: Identify Related Methods
+- Evaluate the "Related_methods" list, each item of which is an object representing a related method for this type.
+- Every object contains "Name" as the method's name and "Description" as the condensed information of the method.
+- If no methods exist for this type, "Related_methods" will be null.
+
+### Skill 3: Identify Related Types
+- Comprehend the "Related_types" list, each element of which is an object illustrating a type that is utilized in the type definition.
+- Each object encompasses "Name" as the type's name and "Description" as the brief information of the type.
+- If there are no types for this type, "Related_types" will be null.
+
+## Constraints:
+- Remember to keep your summaries focused on the type, ignore any mentions of "JSON string".
+- Never show any code in your summaries.
+- It is strictly prohibited to show the origin JSON string or any part of it.
+- Using the information provided in JSON format, you should accurately summarize the type content, related methods, and related types.
+- "Related_methods" and "Related_types", should be null when the type doesn't have any methods or other types associated.
+- Start your responses with a summary of the type content directly."""
+
+
+PARAMETER temperature 0.7
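
Both "Related_methods" and "Related_types" are described above as null when nothing is associated with the type. One way to model that convention in Rust is `Option<Vec<_>>`, where an empty map turns into `None` (serde_json serializes `None` as `null`). The sketch below is an illustration only: `KeyValue` and `related` are hypothetical names, and a `BTreeMap` is used so the resulting order is deterministic.

```rust
use std::collections::BTreeMap;

/// A name/description pair, as in the "Related_methods" and "Related_types" lists.
/// Hypothetical stand-in for the struct used in compress.rs.
#[derive(Debug, PartialEq)]
struct KeyValue {
    name: String,
    description: String,
}

/// Convert a name -> summary map into an optional list.
/// An empty map yields `None`, so the field would be emitted as `null`
/// rather than as an empty array.
fn related(map: BTreeMap<String, String>) -> Option<Vec<KeyValue>> {
    if map.is_empty() {
        return None;
    }
    Some(
        map.into_iter()
            .map(|(name, description)| KeyValue { name, description })
            .collect(),
    )
}

fn main() {
    let mut methods = BTreeMap::new();
    methods.insert(
        "Close".to_string(),
        "Close is for releasing the connection.".to_string(),
    );
    println!("{:?}", related(methods));
    println!("{:?}", related(BTreeMap::new()));
}
```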

src/compress/compress.rs

Lines changed: 196 additions & 22 deletions
@@ -82,37 +82,66 @@ pub struct Identity {
 struct ToCompressFunc {
     #[serde(rename = "Content")]
     content: String,
-    #[serde(rename = "related_func")]
-    related_func: Option<Vec<CalledFunc>>,
+    #[serde(rename = "Related_func")]
+    related_func: Option<Vec<CalledType>>,
+}
+
+#[derive(Serialize, Deserialize, Debug)]
+struct ToCompressType {
+    #[serde(rename = "Content")]
+    content: String,
+    #[serde(rename = "Related_methods")]
+    related_methods: Option<Vec<KeyValueType>>,
+    #[serde(rename = "Related_types")]
+    related_types: Option<Vec<KeyValueType>>,
 }
 
 
 #[derive(Serialize, Deserialize, Debug)]
-struct CalledFunc {
+struct CalledType {
     #[serde(rename = "CallName")]
     pub call_name: String,
     #[serde(rename = "Description")]
     pub description: String,
 }
 
+#[derive(Serialize, Deserialize, Debug)]
+struct KeyValueType {
+    #[serde(rename = "Name")]
+    pub name: String,
+    #[serde(rename = "Description")]
+    pub description: String,
+}
+
 pub fn from_json(json: &str) -> Result<Repository, Box<dyn Error>> {
     let f: Repository = serde_json::from_str(json)?;
     Ok(f)
 }
 
 
 pub async fn compress_all(repo: &mut Repository) {
-    let mut to_compress = Vec::new();
+    let mut to_compress_func = Vec::new();
+    let mut to_compress_type = Vec::new();
+
     for (_, pkg) in &repo.packages {
         for (_, func) in &pkg.functions {
             let id = Identity { pkg_path: func.pkg_path.clone(), name: func.name.clone() };
-            to_compress.push(id)
+            to_compress_func.push(id)
+        }
+
+        for (_, _type) in &pkg.types {
+            let id = Identity { pkg_path: _type.pkg_path.clone(), name: _type.name.clone() };
+            to_compress_type.push(id)
         }
     }
 
-    for id in to_compress {
+    for id in to_compress_func {
         cascade_compress_function(&id, repo).await;
     }
+
+    for id in to_compress_type {
+        cascade_compress_struct(&id, repo).await;
+    }
 }
 
 #[async_recursion]
@@ -170,7 +199,7 @@ pub async fn cascade_compress_function(id: &Identity, repo: &mut Repository) {
             println!("content is empty skip it");
             Some("".to_string())
         } else {
-            llm_compress(func_opt.content.as_str(), map).await
+            llm_compress_func(func_opt.content.as_str(), map).await
         }
     };
 
@@ -183,30 +212,175 @@ pub async fn cascade_compress_function(id: &Identity, repo: &mut Repository) {
     func_opt.compress_data = content;
 }
 
-async fn llm_compress(func: &str, extra: HashMap<String, String>) -> Option<String> {
-    let compress_data = _ollama_compress(func.to_string(), extra).await;
-    Option::from(compress_data)
+#[async_recursion]
+pub async fn cascade_compress_struct(id: &Identity, repo: &mut Repository) {
+    let mut to_compress = Vec::new();
+
+    {
+        let struct_opt = repo.packages.get(id.pkg_path.as_str()).unwrap().types.get(id.name.as_str());
+        if struct_opt.is_none() {
+            println!("not found struct, id {:?}", id);
+        }
+        let stru = struct_opt.unwrap();
+        if stru.compress_data.is_some() {
+            println!("{} is already compressed, skip it.", stru.name);
+            return;
+        }
+        if let Some(sub) = &stru.sub_struct {
+            for (_, f) in sub {
+                if !f.pkg_path.starts_with(&repo.mod_name) {
+                    continue;
+                }
+
+                let id = Identity { pkg_path: f.pkg_path.clone(), name: f.name.clone() };
+                to_compress.push(id);
+            }
+        }
+
+        if let Some(inline) = &stru.inline_struct {
+            for (_, f) in inline {
+                if !f.pkg_path.starts_with(&repo.mod_name) {
+                    continue;
+                }
+
+                let id = Identity { pkg_path: f.pkg_path.clone(), name: f.name.clone() };
+                to_compress.push(id);
+            }
+        }
+    }
+
+    for f_id in to_compress {
+        cascade_compress_struct(&f_id, repo).await;
+    }
+
+    let mut type_map = HashMap::new();
+    let mut method_map = HashMap::new();
+    let content = {
+        let _type = repo.packages.get(id.pkg_path.as_str()).unwrap().types.get(id.name.as_str()).unwrap();
+        if let Some(subs) = &_type.sub_struct {
+            for (k, f) in subs {
+                let pkg = repo.packages.get(f.pkg_path.as_str());
+                if pkg.is_none() {
+                    // TODO
+                    eprintln!("do not get the type, must be a third party one: {:?}", f);
+                    continue;
+                }
+                let sub = pkg.unwrap().types.get(f.name.as_str());
+                type_map.insert(k.clone(), sub.unwrap().compress_data.clone().unwrap());
+            }
+        }
+
+        if let Some(inlines) = &_type.inline_struct {
+            for (k, f) in inlines {
+                let pkg = repo.packages.get(f.pkg_path.as_str());
+                if pkg.is_none() {
+                    // TODO
+                    eprintln!("do not get the type, must be a third party one: {:?}", f);
+                    continue;
+                }
+                let inline = repo.packages.get(f.pkg_path.as_str()).unwrap().types.get(f.name.as_str());
+
+                type_map.insert(k.clone(), inline.unwrap().compress_data.clone().unwrap());
+            }
+        }
+
+        if let Some(methods) = &_type.methods {
+            for (k, f) in methods {
+                let func = repo.packages.get(f.pkg_path.as_str()).unwrap().functions.get(f.name.as_str());
+                if func.is_none() {
+                    // TODO
+                    eprintln!("[BUG] do not get the method of the type, id: {:?}", f);
+                } else {
+                    method_map.insert(k.clone(), func.unwrap().compress_data.clone().unwrap());
+                }
+            }
+        }
+
+        println!("start to compress type: {}", _type.name);
+        if _type.content.is_empty() {
+            println!("content is empty skip it");
+            Some("".to_string())
+        } else {
+            llm_compress_type(_type.content.as_str(), type_map, method_map).await
+        }
+    };
+
+    let mut type_opt = repo.packages.get_mut(id.pkg_path.as_str()).unwrap().types.get_mut(id.name.as_str()).unwrap();
+    if content.is_some() {
+        let content = content.unwrap().trim().to_string();
+        type_opt.compress_data = Some(content);
+        return;
+    }
+    type_opt.compress_data = content;
 }
 
 
-pub async fn _ollama_compress(func: String, ctx: HashMap<String, String>) -> String {
-    let request_url = format!("http://localhost:11434/api/generate");
+pub enum ToCompress {
+    ToCompressFunc(String),
+    ToCompressType(String),
+}
 
-    let mut compress_func = ToCompressFunc { content: func, related_func: None };
-    if !ctx.is_empty() {
+async fn llm_compress_func(func: &str, extra: HashMap<String, String>) -> Option<String> {
+    let mut compress_func = ToCompressFunc { content: func.to_string(), related_func: None };
+    if !extra.is_empty() {
         let mut related_func = Vec::new();
-        for (name, compressed_data) in ctx {
-            let re = CalledFunc { call_name: name, description: compressed_data };
+        for (name, compressed_data) in extra {
+            let re = CalledType { call_name: name, description: compressed_data };
             related_func.push(re);
         }
         compress_func.related_func = Some(related_func);
     }
+    let to_compress_str = serde_json::to_string(&compress_func).unwrap();
+    let compress_func_enum = ToCompress::ToCompressFunc(to_compress_str);
+    let compress_data = _ollama_compress(compress_func_enum).await;
+    Option::from(compress_data)
+}
+
+// depends on the compressed info of methods, so call llm_compress_func first.
+async fn llm_compress_type(func: &str, extra_type: HashMap<String, String>, related_methods: HashMap<String, String>) -> Option<String> {
+    let mut compress_type = ToCompressType { content: func.to_string(), related_methods: None, related_types: None };
+    if !extra_type.is_empty() {
+        let mut r_type = Vec::new();
+        for (name, compressed_data) in extra_type {
+            let re = KeyValueType { name, description: compressed_data };
+            r_type.push(re);
+        }
+        compress_type.related_types = Some(r_type);
+    }
+
+    if !related_methods.is_empty() {
+        let mut r_methods = Vec::new();
+        for (name, compressed_data) in related_methods {
+            let re = KeyValueType { name, description: compressed_data };
+            r_methods.push(re);
+        }
+        compress_type.related_methods = Some(r_methods);
+    }
+
+
+    let to_compress_str = serde_json::to_string(&compress_type).unwrap();
+    let compress_type_enum = ToCompress::ToCompressType(to_compress_str);
+    let compress_data = _ollama_compress(compress_type_enum).await;
+    Option::from(compress_data)
+}
 
-    let to_compress_func = serde_json::to_string(&compress_func).unwrap();
 
+pub async fn _ollama_compress(to_compress: ToCompress) -> String {
+    let request_url = format!("http://localhost:11434/api/generate");
+    let mut model_name = "codellama-private";
+    let mut to_compress_str = String::new();
+    match to_compress {
+        ToCompress::ToCompressType(t) => {
+            model_name = "codellama-private-type";
+            to_compress_str = t;
+        }
+        ToCompress::ToCompressFunc(f) => {
+            to_compress_str = f;
+        }
+    }
 
-    println!("use prompt:\n{}", to_compress_func);
-    let req_body: ollama_req = ollama_req { model: "codellama-private".to_string(), prompt: to_compress_func };
+    println!("use prompt:\n{}", to_compress_str);
+    let req_body: OllamaReq = OllamaReq { model: model_name.to_string(), prompt: to_compress_str };
     let client = reqwest::Client::new();
     let mut response = client
         .post(&request_url)
@@ -222,7 +396,7 @@ pub async fn _ollama_compress(func: String, ctx: HashMap<String, String>) -> Str
             break;
         }
 
-        let value: ollama_resp = result.unwrap();
+        let value: OllamaResp = result.unwrap();
 
         if !value.response.is_empty() {
             output.push_str(value.response.as_str());
@@ -237,13 +411,13 @@ pub async fn _ollama_compress(func: String, ctx: HashMap<String, String>) -> Str
 }
 
 #[derive(Serialize, Deserialize, Debug)]
-struct ollama_req {
+struct OllamaReq {
     model: String,
     prompt: String,
 }
 
 #[derive(Serialize, Deserialize, Debug)]
-struct ollama_resp {
+struct OllamaResp {
     model: String,
     created_at: String,
     response: String,
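
The traversal in `cascade_compress_struct` summarizes a type's in-module dependencies before the type itself and skips anything that already has a summary. The sketch below shows that post-order idea in a synchronous, standard-library-only form. It is a simplification under stated assumptions: `TypeNode`, `summarize`, and `cascade` are hypothetical names, types are keyed by name alone rather than by package and name, the LLM call is replaced by a stub, and the dependency graph is assumed to be acyclic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a type entry in the repository.
struct TypeNode {
    pkg_path: String,
    deps: Vec<String>,       // names of types used in this definition
    summary: Option<String>, // filled in once the type has been summarized
}

/// Stub that stands in for the LLM call; it only records how many
/// dependency summaries it was given.
fn summarize(name: &str, dep_summaries: &[String]) -> String {
    format!("{} is for ... (uses {} summarized deps)", name, dep_summaries.len())
}

/// Summarize dependencies first, then the type itself (post-order).
/// Assumption: no dependency cycles; a cycle would recurse without end.
fn cascade(
    name: &str,
    module: &str,
    repo: &mut HashMap<String, TypeNode>,
    order: &mut Vec<String>,
) {
    let deps = match repo.get(name) {
        None => return,                                 // unknown type
        Some(node) if node.summary.is_some() => return, // already summarized
        Some(node) => node.deps.clone(),
    };
    for dep in &deps {
        // Only descend into types that belong to this module.
        if repo.get(dep).map_or(false, |d| d.pkg_path.starts_with(module)) {
            cascade(dep, module, repo, order);
        }
    }
    // Collect whatever summaries exist; third-party types contribute none.
    let dep_summaries: Vec<String> = deps
        .iter()
        .filter_map(|d| repo.get(d).and_then(|n| n.summary.clone()))
        .collect();
    let text = summarize(name, &dep_summaries);
    if let Some(node) = repo.get_mut(name) {
        node.summary = Some(text);
    }
    order.push(name.to_string());
}

fn main() {
    let mut repo = HashMap::new();
    repo.insert("Server".to_string(), TypeNode {
        pkg_path: "example.com/app/server".into(),
        deps: vec!["Config".into(), "Logger".into()],
        summary: None,
    });
    repo.insert("Config".to_string(), TypeNode {
        pkg_path: "example.com/app/config".into(),
        deps: vec![],
        summary: None,
    });
    repo.insert("Logger".to_string(), TypeNode {
        pkg_path: "third.party/log".into(),
        deps: vec![],
        summary: None,
    });
    let mut order = Vec::new();
    cascade("Server", "example.com/app", &mut repo, &mut order);
    println!("{:?}", order);
}
```

Cloning the dependency names before recursing releases the shared borrow of the map, which is the same reason the committed function collects `Identity` values into a `Vec` inside an inner block before calling itself with `repo` borrowed mutably.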
