Fix DuckDB source to handle single-valued category fields #529

Closed
kevinschaper wants to merge 1 commit into master from single-valued-category-support

@kevinschaper
Collaborator

Summary

  • Fix DuckDbSource.process_node() to ensure category is always returned as a list
  • Handles single string values, pipe-delimited strings, and other non-list values

Problem

When reading from DuckDB databases where category is stored as a single string value (e.g., "biolink:Gene" rather than a list), the graph-summary command produced empty node category statistics.

The issue was that GraphSummary.analyse_node() iterates over categories, and when passed a string instead of a list, it iterates character-by-character, resulting in invalid single-character "categories" that fail validation.
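The failure mode can be reproduced in a few lines of plain Python. This is only an illustration of how iterating over a string differs from iterating over a list; it does not call KGX, and the variable names are made up for the example.

```python
# Illustration only: no KGX code is involved here.
category = "biolink:Gene"

# Iterating over a bare string yields one character at a time,
# so each "category" seen by the loop is a single character.
seen_from_string = [c for c in category]

# Wrapping the string in a list yields the whole value once.
seen_from_list = [c for c in [category]]

print(seen_from_string[:3])  # first three single-character items
print(seen_from_list)        # the full category as one item
```

None of the single-character items is a valid category, which is consistent with the empty statistics described above.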

Solution

Convert category to a list in process_node():

  • Single string → ["biolink:Gene"]
  • Pipe-delimited → ["biolink:Gene", "biolink:NamedThing"]
  • Already a list → unchanged
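A minimal sketch of the conversion described above, written as a standalone helper. The name `normalize_category` is hypothetical and this is not the actual change made to `process_node()`; the handling of `None` in particular is an assumption, since the PR does not say what happens to missing values.

```python
from typing import Any, List


def normalize_category(value: Any) -> List[str]:
    """Return category as a list (hypothetical helper, not the KGX code)."""
    if value is None:
        # Assumption: a missing category becomes an empty list.
        return []
    if isinstance(value, list):
        # Already a list: returned unchanged.
        return value
    if isinstance(value, str):
        # Split on the pipe delimiter; a string without a pipe
        # yields a one-element list. Empty segments are dropped.
        return [part for part in value.split("|") if part]
    # Any other type: convert to string and wrap in a list.
    return [str(value)]


print(normalize_category("biolink:Gene"))
print(normalize_category("biolink:Gene|biolink:NamedThing"))
print(normalize_category(["biolink:Gene"]))
```

Doing the conversion once at the source keeps downstream consumers such as `GraphSummary.analyse_node()` free of type checks.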

Test plan

  • Tested with monarch-kg DuckDB database
  • Verified kgx graph-summary now produces proper category breakdowns
  • Output increased from 326 lines to 1,774 lines with proper statistics

🤖 Generated with Claude Code

When reading from DuckDB, the category field may be stored as a single
string value rather than a list. This caused graph-summary to iterate
character-by-character over the string, resulting in empty category
statistics.

This fix ensures category is always converted to a list:
- Single string values are wrapped in a list
- Pipe-delimited strings are split into multiple categories
- Non-list/non-string values are converted to string and wrapped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@kevinschaper
Collaborator Author

I had forgotten about this. I don't think it turned out to be necessary.
