Commit 8a06d78
Parallelize post-passes, fix mode filtering + semantic edge quality
- Parallelize pass_similarity and pass_semantic_edges via worker pool with
thread-local edge buffers; sequential final merge since gbuf is not
thread-safe. Adds cbm_lsh_query_into() as a thread-safe variant with
caller-provided candidate buffer.
- Add activatable profiling subsystem (CBM_PROFILE=1 env or --profile flag)
for step-level timing of extract, resolve, corpus build, vector phases,
and sqlite dump. Zero overhead when disabled.
- Fix cbm_index_mode_t enum mismatch between pipeline.h (FULL=0, MODERATE=1,
FAST=2) and discover.h (FULL=0, FAST=1). With mode=fast, fast-discovery
filtering was silently a no-op because discover.c compared against the
wrong value. Linux kernel fast mode regressed from 1:40 to 3:11 as a
result; it is now back to 1:40. Broaden the filter guard to
mode != CBM_MODE_FULL so MODERATE and FAST both get aggressive discovery.
- Clamp cbm_sem_combined_score output to [0, 1]. The proximity multiplier
returns up to 1.10 as a same-file boost which could push the final
cosine score above 1.0.
- Short-circuit semantic scoring when MinHash jaccard >= 0.95. Near-exact
clones are already emitted as SIMILAR_TO edges; returning 0 here
avoids flooding SEMANTICALLY_RELATED with cross-service copy-paste
boilerplate and frees the edge budget for genuine vocabulary-bridged
relations.
- Validate search_graph semantic_query as an array of strings and return
a clear error for a single-string input. Update the tool description
to spell out the requirement explicitly with an example.
- JSON-escape user-controlled strings (callee names, call arguments,
URL paths, import local_name) in call/argument properties. Introduces
cbm_json_escape() in foundation/str_util.
- Skip SQLite pending_byte_page (file offset 0x40000000) during raw page
writes in sqlite_writer to avoid corrupting databases that cross the
1 GiB boundary.
- Migrate pretrained vector blob from UniXcoder (51K tokens) to
nomic-embed-code (40856 tokens x 768d int8). Includes the extraction
script under scripts/extract_nomic_vectors.py.

1 parent bf70078 commit 8a06d78
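The enum mismatch fixed above can be illustrated with a minimal sketch. The constant values are the ones quoted in the commit message; the enum and function names are hypothetical stand-ins, and the exact comparison that discover.c used before the fix is an assumption.

```c
/* Hypothetical stand-ins for the two headers. Only the numeric values
 * come from the commit message; the names are illustrative. */
enum { PIPELINE_FULL = 0, PIPELINE_MODERATE = 1, PIPELINE_FAST = 2 };
enum { DISCOVER_FULL = 0, DISCOVER_FAST = 1 };

/* Assumed shape of the old guard: compares a pipeline value against
 * the stale discover.h constant, so FAST (2) never matches. */
int old_filter_guard(int mode) {
    return mode == DISCOVER_FAST;
}

/* Guard as described in the commit: anything other than FULL enables
 * aggressive discovery filtering. */
int new_filter_guard(int mode) {
    return mode != PIPELINE_FULL;
}
```

Comparing against FULL rather than FAST keeps the guard correct even if further non-FULL modes are added later.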
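The two scoring changes (the [0, 1] clamp and the near-clone short-circuit) might combine roughly as follows. This is a sketch under stated assumptions: the function name, parameters, and the way the proximity multiplier is applied are hypothetical; only the 0.95 threshold, the 1.10 same-file boost, and the clamp range come from the commit message.

```c
/* Values taken from the commit message. */
#define NEAR_CLONE_JACCARD 0.95f
#define SAME_FILE_BOOST    1.10f

/* Clamp a score into [0, 1]. */
static float clamp01(float x) {
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* Hypothetical combined score: not the real cbm_sem_combined_score(). */
float combined_score_sketch(float cosine, float jaccard, int same_file) {
    /* Near-clones are already covered by SIMILAR_TO edges, so they
     * contribute nothing to the semantic edge budget. */
    if (jaccard >= NEAR_CLONE_JACCARD) {
        return 0.0f;
    }
    /* Assumed: the proximity multiplier scales the cosine score. */
    float proximity = same_file ? SAME_FILE_BOOST : 1.0f;
    /* Without the clamp, 0.97 * 1.10 would exceed 1.0. */
    return clamp01(cosine * proximity);
}
```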
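For the JSON-escaping change, a bounded escaper along these lines would cover the characters that JSON requires to be escaped in strings. The signature and return convention here are assumptions for illustration and are not the actual cbm_json_escape() interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical escaper: writes the JSON-escaped form of src into dst.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if dst is too small. Bytes >= 0x80 pass through unchanged so
 * UTF-8 input stays intact. */
int json_escape_sketch(const char *src, char *dst, size_t dst_size) {
    size_t out = 0;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        char tmp[7];
        size_t n;
        switch (*p) {
        case '"':  tmp[0] = '\\'; tmp[1] = '"';  n = 2; break;
        case '\\': tmp[0] = '\\'; tmp[1] = '\\'; n = 2; break;
        case '\n': tmp[0] = '\\'; tmp[1] = 'n';  n = 2; break;
        case '\r': tmp[0] = '\\'; tmp[1] = 'r';  n = 2; break;
        case '\t': tmp[0] = '\\'; tmp[1] = 't';  n = 2; break;
        default:
            if (*p < 0x20) {
                /* Remaining control characters become \u00XX. */
                n = (size_t)snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)*p);
            } else {
                tmp[0] = (char)*p;
                n = 1;
            }
            break;
        }
        /* Keep one byte in reserve for the terminator. */
        if (out + n + 1 > dst_size) return -1;
        memcpy(dst + out, tmp, n);
        out += n;
    }
    if (out + 1 > dst_size) return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Returning an error on truncation, rather than emitting a partial string, avoids producing a property value that ends in the middle of an escape sequence.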
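The pending-byte fix relies on SQLite reserving the page that contains file offset 0x40000000. A raw page writer could step over it as sketched below; the helper names are hypothetical, and how sqlite_writer actually renumbers pages is not shown in the commit message.

```c
#include <stdint.h>

/* File offset of SQLite's pending byte (1 GiB). */
#define PENDING_BYTE_OFFSET 0x40000000u

/* 1-based number of the page that contains the pending byte. */
uint32_t pending_byte_page(uint32_t page_size) {
    return PENDING_BYTE_OFFSET / page_size + 1u;
}

/* Map the n-th usable page (1-based) to its physical page number,
 * leaving the reserved page unused. */
uint32_t physical_page_number(uint32_t logical, uint32_t page_size) {
    uint32_t reserved = pending_byte_page(page_size);
    return logical >= reserved ? logical + 1u : logical;
}

/* Byte offset in the database file where a physical page starts. */
uint64_t page_file_offset(uint32_t pgno, uint32_t page_size) {
    return (uint64_t)(pgno - 1u) * page_size;
}
```

With 4096-byte pages the reserved page is number 262145, which starts exactly at the 1 GiB offset, so databases below that size never reach it.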
File tree
30 files changed (+84094, -103255 lines)
- internal/cbm
- scripts
- src
- discover
- foundation
- graph_buffer
- mcp
- pipeline
- semantic
- simhash
- store
- vendored
- nomic
- unixcoder