Improve code context given to the LLM #62

@scastlara

Is your feature request related to a problem? Please describe.
At the moment, we pass the LLM the full contents of every file in the PR.
Can we do better?

Describe the solution you'd like
There are many possible solutions 👇

1: Enriched context based on import dependency tree

  • Given an import-dependency tree of the codebase, can we pass the contents of all files at a particular distance from the files in the PR?
  • Instead of passing the full contents of all these files (as in the approach above), can we intelligently pass something more relevant? For instance, a context window around elements that appear in the PR (functions, classes, etc.) and that also occur in files at the selected distance in the dependency tree.
  • BONUS: Can we use tree-sitter or similar tools to treat code not just as raw text but as "code", so that we pass relevant parts of it (whole functions, classes, etc.)?
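A minimal sketch of the first bullet, to make the idea concrete. It assumes a Python-only codebase, uses the standard-library `ast` module as a stand-in for tree-sitter, and treats the import graph as undirected (a file is "near" both the modules it imports and the modules that import it). The names `imported_modules` and `modules_within_distance`, and the choice to pass sources as an in-memory dict, are hypothetical and not part of the project.

```python
import ast
from collections import deque


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are ignored in this sketch.
            names.add(node.module)
    return names


def modules_within_distance(
    sources: dict[str, str], changed: set[str], max_distance: int
) -> dict[str, int]:
    """Map each module reachable from the changed ones to its import distance."""
    # Assumption: edges are undirected, so importers and importees both count.
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, source in sources.items():
        for dependency in imported_modules(source):
            if dependency in sources:  # skip stdlib and third-party modules
                graph[name].add(dependency)
                graph[dependency].add(name)

    # Breadth-first search starting from every changed module at distance 0.
    distances = {name: 0 for name in changed if name in graph}
    queue = deque(distances)
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances
```

Because the function returns the distance of each module, a caller could keep only the files at exactly one distance or everything up to it, and then decide separately how much of each file to pass to the LLM.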

2: Enriched context based on grepping code elements

  • Parse the AST, or extract code objects with something like tree-sitter, from the PR diff or the files in the PR.
  • Select some of those elements based on a heuristic (number of appearances? some measure of centrality/importance?).
  • Grep for them in the whole codebase.
  • Retrieve content using a context window around each match.
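The four steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: it handles Python only, uses the standard-library `ast` module in place of tree-sitter, ranks elements by raw frequency (the simplest of the heuristics mentioned), and searches an in-memory dict of file contents instead of shelling out to `grep`. The names `top_identifiers` and `grep_with_context` are hypothetical.

```python
import ast
import re
from collections import Counter


def top_identifiers(source: str, limit: int = 3) -> list[str]:
    """Return the most frequent defined or called names in the changed code."""
    counts: Counter[str] = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            counts[node.name] += 1
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):  # plain call: foo()
                counts[node.func.id] += 1
            elif isinstance(node.func, ast.Attribute):  # method call: obj.foo()
                counts[node.func.attr] += 1
    return [name for name, _ in counts.most_common(limit)]


def grep_with_context(
    files: dict[str, str], names: list[str], window: int = 2
) -> list[tuple[str, int, str]]:
    """Return (path, 1-based line number, snippet) for each line matching a name."""
    if not names:
        return []
    # Whole-word match on any of the selected names.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    snippets = []
    for path, text in files.items():
        lines = text.splitlines()
        for index, line in enumerate(lines):
            if pattern.search(line):
                start = max(0, index - window)
                end = min(len(lines), index + window + 1)
                snippets.append((path, index + 1, "\n".join(lines[start:end])))
    return snippets
```

Overlapping windows are not merged here, so nearby matches produce duplicated lines; a real version would probably want to merge them before building the prompt.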

Any more ideas?


Labels: enhancement (New feature or request), idea (New ideas that are not well-defined)
