[DEV-14703] Create NL Search Django Model #4645
Conversation
… and ensures spark is installed in the environment
…d award type field to lookup mapping
…hub.com/fedspendingtransparency/usaspending-api into ftr/dev-14642-fix-sorting-on-award-type
zachflanders-frb left a comment
Looking good! I am requesting changes to remove the LLMSearchQuery model and to add the migrations file to the commit.
Can we add Nova Pro? (amazon.nova-pro-v1:0)
```python
        db_table = "ai_model"
        ordering = ["-id"]


class Prompts(models.Model):
```
I wonder about adding a name field to this model in order to have an easier way to get prompts than using the id or the full description?
```python
    system_prompt = models.ForeignKey(Prompts, on_delete=models.SET_NULL, null=True, related_name="sessions")
    started_at = models.DateTimeField(auto_now_add=True)
    ended_at = models.DateTimeField(null=True, blank=True)
    feedback = models.BooleanField(default=None, null=True, blank=True, help_text="positive=True, negative=False")
```
Based on the UX mocks, it looks like we are going to have a short survey when someone gives feedback. This makes me think feedback could be its own model with an is_positive field, plus either a survey JSON field to hold the questions and answers, or SurveyQuestion and SurveyResponse models to fully model out the survey. On the other hand, the survey might not use the API at all and instead collect feedback through some other tool.
```python
    created_at = models.DateTimeField(auto_now_add=True)
    input_tokens = models.IntegerField(default=0)
    output_tokens = models.IntegerField(default=0)
    total_tokens = models.IntegerField(default=0)
```
In my initial testing, total_tokens might not be that important to keep track of, since input tokens and output tokens are priced differently and this is primarily to keep track of usage and cost.
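To illustrate why the split matters for cost tracking, a minimal sketch (the per-1K-token prices below are placeholders, not actual Bedrock rates):

```python
# Placeholder per-1,000-token prices; real Bedrock pricing varies by model
# and region, so look the current rates up rather than trusting these.
PRICES_PER_1K = {
    "amazon.nova-pro-v1:0": {"input": 0.0008, "output": 0.0032},
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one call.

    Input and output tokens are billed at different rates, so a single
    total_tokens count could not support this calculation.
    """
    price = PRICES_PER_1K[model_id]
    cost = (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]
    return round(cost, 6)
```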
```python
class LLMSearchQuery(models.Model):
    user_query = models.TextField()
    session = models.ForeignKey(Session, on_delete=models.CASCADE, related_name="search_queries")
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        preview = self.user_query[:75] + "..." if len(self.user_query) > 75 else self.user_query
        return f"Query {self.id}: {preview}"

    class Meta:
        db_table = "llm_search_query"
        indexes = [
            models.Index(fields=["-created_at"]),
        ]
```
I found that this model is redundant because the first message in the session will be a user message that includes the user query, so I would recommend that we do not keep this model.
```python
    selectedRecipientLocations: dict[str, Any] = Field(default_factory=dict)
    awardType: list[str] = Field(default_factory=list)
    selectedAwardIDs: dict[str, Any] = Field(default_factory=dict)
    awardAmounts: dict[str, list[int]] = Field(default_factory=dict)
```
I added this to the PoC branch to give the LLM more context and understanding of the awardAmounts field.
Suggested change:

```python
    awardAmounts: dict[str, list[int | None]] = Field(
        default_factory=dict,
        description=(
            "Dictionary of award amount ranges for filtering. "
            "Each value is a two-element list: [min_amount, max_amount]. "
            "Use `None` for unbounded ranges.\n\n"
            "TWO MUTUALLY EXCLUSIVE MODES:\n\n"
            "MODE 1 - STANDARD RANGES (can select multiple):\n"
            "- 'range-0': [None, 1000000] - Awards up to $1M\n"
            "- 'range-1': [1000000, 25000000] - Awards $1M to $25M\n"
            "- 'range-2': [25000000, 100000000] - Awards $25M to $100M\n"
            "- 'range-3': [100000000, 500000000] - Awards $100M to $500M\n"
            "- 'range-4': [500000000, None] - Awards over $500M\n\n"
            "MODE 2 - SPECIFIC RANGE (must be alone):\n"
            "- 'specific': [min, max] - Specify exact dollar amounts\n\n"
            "CRITICAL RULES:\n"
            "1. You can use multiple standard ranges together (range-0 through range-4)\n"
            "2. You can use ONE specific range with specific min/max values\n"
            "3. NEVER mix standard ranges with specific range\n"
            "4. When using 'specific', it must be the ONLY key in the dictionary"
        ),
        json_schema_extra={
            "examples": [
                # Example 1: Multiple standard ranges
                {"range-0": [None, 1000000], "range-2": [25000000, 100000000]},
                # Example 2: Single standard range
                {"range-3": [100000000, 500000000]},
                # Example 3: Custom range with both bounds
                {"specific": [5000000, 50000000]},
                # Example 4: Custom range unbounded above
                {"specific": [10000000, None]},
                # Example 5: Custom range unbounded below
                {"specific": [None, 75000000]},
            ]
        },
    )
```
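Since the description only asks the model to follow those rules, a small validator could enforce them server-side. This is a sketch under the rules stated in the description (it could be wired in as a Pydantic validator, but is shown here as a plain function):

```python
# The five standard range keys named in the field description.
STANDARD_RANGES = {f"range-{i}" for i in range(5)}


def validate_award_amounts(award_amounts: dict) -> dict:
    """Enforce the two mutually exclusive awardAmounts modes."""
    # Rule 4: 'specific' must be the only key when present.
    if "specific" in award_amounts and len(award_amounts) > 1:
        raise ValueError("'specific' must be the only key in awardAmounts")
    # Reject keys outside the documented vocabulary.
    unknown = set(award_amounts) - STANDARD_RANGES - {"specific"}
    if unknown:
        raise ValueError(f"unknown awardAmounts keys: {sorted(unknown)}")
    # Every value must be a two-element [min_amount, max_amount] list.
    for key, bounds in award_amounts.items():
        if len(bounds) != 2:
            raise ValueError(f"{key}: expected [min_amount, max_amount]")
    return award_amounts
```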
Can we add the migrations file to this PR?
Description:
Create the Django models and the corresponding migrations, ensuring the correct tables and relationships are created in Postgres, related to NL search assistant design and work.
Technical Details:
```shell
python manage.py load_llm_fixtures
```

Requirements for PR Merge:
Explain N/A in above checklist: