agentic/SKILL.md
Lines changed: 33 additions & 10 deletions
@@ -7,20 +7,43 @@ Background: we are working on a learned cache project, basic idea is:
for a trace, we use the first 20% of its requests for feature extraction, then use these features to predict the best parameters for the remaining 80% of the requests. However, to avoid ad-hoc parameter tuning, we do the
labeling by searching for the best parameters over the whole trace.
This introduces a problem: for the same trace, we have three settings:
-1. best parameters for the whole trace
+1. default parameters for the whole trace
2. default parameters for the first 20% of the trace + best parameters for the remaining 80% of the trace
-3. default parameters for the whole trace
+3. best parameters for the whole trace

-We have a hypothesis is that case 1 < case 2 < case 3 in terms of miss ratio, but we find that sometimes case 2 would be worse than case 3, which is counterintuitive.
+Our hypothesis is that case 3 < case 2 < case 1 in terms of miss ratio, but we find that case 2 is sometimes worse than case 1 (a higher miss ratio), which is counterintuitive.

-Then we conducted a typical analysis over `/mnt/cfs/oracleReuse/tencentBlock/tencentBlock.ns4712.oracleGeneral.zst`.
+We then analyzed a typical example, `/mnt/cfs/oracleReuse/tencentBlock/tencentBlock.ns4712.oracleGeneral.zst`, with [compare.sh](../grid_search/analysis_output/compare.sh) and [analyze_compare_results.py](../grid_search/analysis_output/analyze_compare_results.py); the result is shown in the figure below:
> Note: the value for `after-n-reqs` is 20% of the total number of requests in the trace, which is 20% * 1,000,000 = 200,000 in this case. The total request count can be found in [cluster_stats.csv](../grid_search/analysis_output/cluster_stats.csv).
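The arithmetic in the note can be sketched as a small helper. This is only an illustration: the function name `after_n_reqs` and the `warmup_fraction` parameter are hypothetical, and the total request count is assumed to have already been looked up in cluster_stats.csv.

```python
def after_n_reqs(total_requests: int, warmup_fraction: float = 0.2) -> int:
    """Return how many requests run with default parameters before the switch.

    `warmup_fraction` is a hypothetical name for the 20% prefix described in
    the note; it is not an option of any existing tool.
    """
    if total_requests < 0:
        raise ValueError("total_requests must be non-negative")
    if not 0.0 <= warmup_fraction <= 1.0:
        raise ValueError("warmup_fraction must be between 0 and 1")
    # round() absorbs floating-point error before converting to an integer.
    return round(total_requests * warmup_fraction)


# The example from the note: 20% of 1,000,000 requests.
print(after_n_reqs(1_000_000))  # 200000
```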

-1. **Start with an analogy**: Compare the code to something from everyday life
-2. **Draw a diagram**: Use ASCII art to show the flow, structure, or relationships
-3. **Walk through the code**: Explain step-by-step what happens
-4. **Highlight a gotcha**: What's a common mistake or misconception?
+In this case, our analysis shows the following:
+a. Diagnosing the tail cases: the regression happens when the whole-trace optimal parameters do not match the optimal parameters for switching after 20% of the requests.
+The mismatch can be attributed to a parameter combination that is too specific to one trace.
+A typical example is tencentBlock.ns4712.oracleGeneral.zst: at cache size ratio = 0.1, the best parameters are small 0.2; ghost 3; s -> m thres 1; g -> m thres 1; skip ratio -> 25%.
+Result 1 (case 1): whole trace default - miss ratio 0.3122
+Result 2 (case 3): whole trace optimal - miss ratio 0.2744
+Result 3 (case 2): default for the first 20% + switching to whole trace optimal - miss ratio 0.3879
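To make the size of the gap concrete, the three miss ratios above can be expressed relative to the whole-trace default. This is a minimal sketch; the helper name `relative_change` and the variable names are hypothetical, and only the three miss ratios come from the results above.

```python
def relative_change(miss_ratio: float, baseline: float) -> float:
    """Relative change of a miss ratio against a baseline (negative is better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (miss_ratio - baseline) / baseline


default_mr = 0.3122   # whole trace default
optimal_mr = 0.2744   # whole trace optimal
switched_mr = 0.3879  # default for 20%, then whole trace optimal

print(f"optimal vs default:  {relative_change(optimal_mr, default_mr):+.1%}")   # -12.1%
print(f"switched vs default: {relative_change(switched_mr, default_mr):+.1%}")  # +24.2%
```

Under these numbers, switching after 20% does not recover any of the whole-trace gain; it ends up noticeably worse than never switching at all.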
+If we plot the misses and hits after the first 20% of requests, the common factor is that the cache is too small to hold the frequently requested scan pattern. The whole-trace optimal parameters use the limited cache space efficiently, but they are not general enough and depend heavily on the cache state, which makes them vulnerable (shown in the figure).

-Keep explanations conversational. For complex concepts, use multiple analogies.
+Further evidence: when we increase the cache size ratio to 0.2, Result 2 and Result 3 have the same miss ratio:
+Result 1 - miss ratio 0.0810
+Result 2 - miss ratio 0.0782
+Result 3 - miss ratio 0.0782
+
+b. Inspired by this, we are cleaning the labels: we remove labels that are too specialized for certain traces by checking the behavior of similar hyperparameters.
+
+You are now an expert analyst. Please analyze other traces with similar behavior (that is, case 2 is worse than case 1) and explain each case.
+
+Before you start, here is how to find those cases.
+
+- [optimal.csv](../grid_search/analysis_output/optimal.csv) contains the miss ratio for case 3 for all traces; case 3 corresponds to the miss_ratio column
+- [case2.csv](../grid_search/analysis_output/case2.csv) contains the miss ratio for case 2 for all traces; case 2 corresponds to the miss_ratio column
+- [baseline01.csv](../grid_search/analysis_output/baseline01.csv) contains the miss ratio for case 1 for all traces with default parameters and cache size ratio = 0.1 (use the records with the s3fifo algo)
+- [baseline001.csv](../grid_search/analysis_output/baseline001.csv) contains the miss ratio for case 1 for all traces with default parameters and cache size ratio = 0.01 (use the records with the s3fifo algo)
+- [baseline0001.csv](../grid_search/analysis_output/baseline0001.csv) contains the miss ratio for case 1 for all traces with default parameters and cache size ratio = 0.001 (use the records with the s3fifo algo)
+
+
+You can then find the traces with similar behavior by comparing the miss ratios for case 1 and case 2, and analyze the possible reasons for the observed behavior.
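The comparison described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual tooling: the column names `trace` and `algo` are guesses (only `miss_ratio` and the `s3fifo` value are named in this document), the helper names `load_rows` and `find_regressions` are hypothetical, and the baseline and case 2 rows are assumed to already refer to the same cache size ratio.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of dict rows (one dict per record)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def find_regressions(
    baseline_rows: Iterable[Dict[str, str]],
    case2_rows: Iterable[Dict[str, str]],
    trace_col: str = "trace",  # assumed column name
    algo_col: str = "algo",    # assumed column name
    min_gap: float = 0.0,
) -> List[Tuple[str, float, float, float]]:
    """Return (trace, case1, case2, gap) where case 2 is worse than case 1."""
    # Case 1: default parameters, keeping only the s3fifo records.
    case1 = {
        row[trace_col]: float(row["miss_ratio"])
        for row in baseline_rows
        if row.get(algo_col) == "s3fifo"
    }
    regressions = []
    for row in case2_rows:
        trace = row[trace_col]
        if trace not in case1:
            continue  # no baseline record to compare against
        case2 = float(row["miss_ratio"])
        gap = case2 - case1[trace]
        if gap > min_gap:
            regressions.append((trace, case1[trace], case2, gap))
    # Largest regression first, so the worst tail cases are inspected first.
    regressions.sort(key=lambda r: r[3], reverse=True)
    return regressions


# In-memory demo using the example trace and miss ratios from this document.
baseline = [{"trace": "tencentBlock.ns4712.oracleGeneral.zst",
             "algo": "s3fifo", "miss_ratio": "0.3122"}]
case2 = [{"trace": "tencentBlock.ns4712.oracleGeneral.zst",
          "miss_ratio": "0.3879"}]
for trace, c1, c2, gap in find_regressions(baseline, case2):
    print(f"{trace}: case 1 = {c1:.4f}, case 2 = {c2:.4f}, gap = {gap:+.4f}")
```

With real files, the rows would come from something like `load_rows("baseline01.csv")` and `load_rows("case2.csv")`, after adjusting the column names to match the actual headers.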