Commit c52469e

Author: Your Name
fix(extraction): add C# base_list handling for class inheritance
The tree-sitter C# grammar represents class inheritance via 'base_list' child nodes (e.g. 'class Foo : Bar, IBaz'). The extract_base_classes function didn't handle this node type, causing most C# inheritance to be missed.

Add explicit traversal of base_list children, extracting type identifiers from both direct identifier nodes and wrapper nodes (simple_base_type, primary_constructor_base_type). Generic type arguments are stripped for resolution (List<int> → List).

Tested: cube INHERITS edges went from 210 to 1,588 (7.5x improvement). Verified results include real C# domain classes: RtmpStream→DisposableBase, QuestionLibraryCourseDto→AbstractQuestionLibraryNode, etc.
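The generic-argument stripping described in the message can be tried in isolation. The following is a minimal sketch, not code from the commit: the helper name strip_generic_args is hypothetical, and it only mirrors the strchr-based truncation that appears in the diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the commit): truncate a type name in
 * place at the first '<', mirroring the strchr-based stripping in the diff.
 * The buffer must be writable; the same pointer is returned. */
static char *strip_generic_args(char *type_name) {
    assert(type_name != NULL);
    char *angle = strchr(type_name, '<');
    if (angle) {
        *angle = '\0'; /* "List<int>" becomes "List" */
    }
    return type_name;
}
```

Because the cut happens at the first '<', nested arguments such as "Dictionary<string, List<int>>" reduce to "Dictionary". A qualified name with a generic outer type, such as "Outer<T>.Inner", would reduce to "Outer" under this approach, which may or may not be what the resolver expects.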
1 parent 83af839 commit c52469e

1 file changed: +49 -1 lines changed

internal/cbm/extract_defs.c

Lines changed: 49 additions & 1 deletion
@@ -565,10 +565,58 @@ static const char **extract_base_classes(CBMArena *a, TSNode node, const char *s
             }
         }
     }
-    // C/C++ specific: handle base_class_clause (contains access specifiers + type names)
+    // C# specific: handle base_list node (contains base types separated by commas)
     {
         uint32_t count = ts_node_child_count(node);
         for (uint32_t i = 0; i < count; i++) {
+            TSNode child = ts_node_child(node, i);
+            if (strcmp(ts_node_type(child), "base_list") == 0) {
+                const char *bases[16];
+                int base_count = 0;
+                uint32_t bnc = ts_node_named_child_count(child);
+                for (uint32_t bi = 0; bi < bnc && base_count < MAX_BASES_MINUS_1; bi++) {
+                    TSNode bc = ts_node_named_child(child, bi);
+                    const char *bk = ts_node_type(bc);
+                    // C# base types can be: identifier, generic_name, qualified_name,
+                    // or wrapped in a simple_base_type / primary_constructor_base_type
+                    char *text = NULL;
+                    if (strcmp(bk, "identifier") == 0 || strcmp(bk, "generic_name") == 0 ||
+                        strcmp(bk, "qualified_name") == 0) {
+                        text = cbm_node_text(a, bc, source);
+                    } else {
+                        // For wrapper nodes (simple_base_type etc.), extract the first
+                        // named child which should be the type identifier
+                        TSNode inner = ts_node_named_child(bc, 0);
+                        if (!ts_node_is_null(inner)) {
+                            text = cbm_node_text(a, inner, source);
+                        }
+                    }
+                    if (text && text[0]) {
+                        // Strip generic args for resolution: "List<int>" → "List"
+                        char *angle = strchr(text, '<');
+                        if (angle) *angle = '\0';
+                        bases[base_count++] = text;
+                    }
+                }
+                if (base_count > 0) {
+                    const char **result =
+                        (const char **)cbm_arena_alloc(a, (base_count + 1) * sizeof(const char *));
+                    if (result) {
+                        for (int j = 0; j < base_count; j++) {
+                            result[j] = bases[j];
+                        }
+                        result[base_count] = NULL;
+                        return result;
+                    }
+                }
+            }
+        }
+    }
+
+    // C/C++ specific: handle base_class_clause (contains access specifiers + type names)
+    {
+        uint32_t count2 = ts_node_child_count(node);
+        for (uint32_t i = 0; i < count2; i++) {
             TSNode child = ts_node_child(node, i);
             if (strcmp(ts_node_type(child), "base_class_clause") == 0) {
                 // Extract type identifiers from base_class_clause, skipping access specifiers
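The new C# block collects base names in a fixed-size stack buffer and then copies them into a NULL-terminated array for the caller. A minimal sketch of that copy step follows, assuming plain malloc in place of cbm_arena_alloc (whose behavior is not shown in this diff). The names make_null_terminated and count_bases are hypothetical and do not appear in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy `count` string pointers into a heap array that
 * ends with a NULL sentinel. Returns NULL when count is not positive or the
 * allocation fails. The caller frees the array (not the strings). */
static const char **make_null_terminated(const char **items, int count) {
    if (count <= 0) {
        return NULL;
    }
    assert(items != NULL);
    /* One extra slot for the terminating NULL, as in the diff. */
    const char **result = malloc(((size_t)count + 1) * sizeof(const char *));
    if (!result) {
        return NULL;
    }
    for (int i = 0; i < count; i++) {
        result[i] = items[i];
    }
    result[count] = NULL;
    return result;
}

/* Hypothetical helper: count entries up to the NULL sentinel. */
static int count_bases(const char **bases) {
    int n = 0;
    if (bases) {
        while (bases[n]) {
            n++;
        }
    }
    return n;
}
```

The sentinel lets callers iterate without a separate length. This sketch returns NULL when nothing was collected, whereas the code in the diff continues on to the C/C++ handling in that case.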
