feat: Bedrock - support for FileContent + citations#2883
Conversation
bogdankostic
left a comment
There was a problem hiding this comment.
Looking already good, I just have two minor comments.
| raw_name = os.path.splitext(file_content.filename)[0] if file_content.filename else "filename" | ||
| # Bedrock requires name to be present but is very strict about the format. | ||
| # See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html | ||
| sanitized_name = re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z0-9\s\-\[\]()]", "", raw_name)).strip() |
There was a problem hiding this comment.
sanitized_name can become empty if the filename has only stripped characters, but bedrock requires a minimum length of 1. Let's handle this edge case.
| replies.append( | ||
| ChatMessage.from_assistant( | ||
| " ".join(text_content), | ||
| "".join(text_content), |
There was a problem hiding this comment.
In l.485 we strip all white space from text content (text := entry.get("text", "").strip()), joining them together like this might lead to missing white space I guess.
There was a problem hiding this comment.
I originally changed this part of the code because while experimenting with citations, I realized that " ".join() was introducing duplicate spaces (since the API already includes spacing).
Based on your comment, I propose to use "".join() and preserve the original text/citation as-is.
See ee79dc2
Related Issues
Proposed Changes:
FileContent: pass to Bedrock Documents or Videos, depending on the MIME type.How did you test it?
CI, new unit tests and integration tests
Checklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:.