Merged
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -6,8 +6,14 @@ and this project adheres to

## [Unreleased]

### Added

- ✨(backend) create a dedicated endpoint to update document content
- ⚡️(backend) stream s3 file content with a dedicated endpoint

### Changed

- ♻️(backend) rename documents content endpoint in `formatted-content` (BC)
- 🚸(frontend) show Crisp from the help menu #2222
- ♿️(frontend) structure correctly 5xx error alerts #2128
- ♿️(frontend) make doc search result labels uniquely identifiable #2212
@@ -26,6 +32,7 @@ and this project adheres to
### Removed

- 🔥(backend) remove deprecated descendants endpoint #2243
- 🔥(backend) remove content in document responses
Comment thread
lunika marked this conversation as resolved.

## [v4.8.6] - 2026-04-08

14 changes: 14 additions & 0 deletions UPGRADE.md
@@ -16,6 +16,20 @@ the following command inside your docker container:

## [Unreleased]

We made several changes around document content management leading to several breaking changes in the API.

- The endpoint `/api/v1.0/documents/{document_id}/content/` has been renamed to `/api/v1.0/documents/{document_id}/formatted-content/`
- There is no more `content` attribute in the response of `/api/v1.0/documents/{document_id}/`, two new endpoints have been added to retrieve or update the document content.
- A new `GET /api/v1.0/documents/{document_id}/content/` endpoint has been implemented to fetch the document content ; this endpoint streams the whole content with a `text/plain` content-type response.
- A new `PATCH /api/v1.0/documents/{document_id}/content/` endpoint has been added to update the document content ; expected payload is:
```json
{
"content": "document content in base64",
}
```
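A minimal client-side sketch of the payload described above, using only the standard library. The helper names `build_content_patch_payload` and `content_url`, and the base URL in the usage note, are hypothetical and not part of the project; only the endpoint path and the `content` key come from the upgrade note.

```python
import base64
import json


def build_content_patch_payload(raw_content: bytes) -> str:
    """Return a JSON body for PATCH /api/v1.0/documents/{document_id}/content/.

    Hypothetical helper: the upgrade note only states that `content`
    must hold the document content encoded in base64.
    """
    encoded = base64.b64encode(raw_content).decode("ascii")
    return json.dumps({"content": encoded})


def content_url(base_url: str, document_id: str) -> str:
    """Build the content endpoint URL (used for both GET and PATCH)."""
    return f"{base_url.rstrip('/')}/api/v1.0/documents/{document_id}/content/"


body = build_content_patch_payload(b"hello")
```

For instance, `content_url("https://docs.example.org/", "abc")` points at the endpoint that accepts `body` with PATCH and streams the content back as `text/plain` with GET.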
Comment on lines +24 to +29
⚠️ Potential issue | 🟡 Minor

Fix invalid JSON in the payload example and add the missing blank line before the fence.

The example payload contains a trailing comma after the last property, which is not valid JSON and will mislead integrators copy-pasting from the upgrade guide. Also, per markdownlint MD031 (flagged by static analysis), fenced code blocks should be surrounded by blank lines.

📝 Proposed fix
 - A new `PATCH /api/v1.0/documents/{document_id}/content/` endpoint has been added to update the document content ; expected payload is:
+
 ```json
 {
-  "content": "document content in base64",
+  "content": "document content in base64"
 }
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@UPGRADE.md` around lines 24 - 29, The JSON example for the new PATCH
/api/v1.0/documents/{document_id}/content/ endpoint is invalid and the fenced
code block lacks a surrounding blank line; remove the trailing comma after
"content" so the JSON is valid and ensure there is an empty line before the
opening ```json fence so the code block is properly separated in the markdown
for UPGRADE.md.


Other changes:

- The deprecated endpoint `/api/v1.0/documents/<document_id>/descendants` is removed. The search endpoint should be used instead.
- Upgrade docspec dependency to version >= 3.0.0
The docspec service has changed since version 3.0.0; we are now compatible with this version and no longer with version 2.x.x.
1 change: 1 addition & 0 deletions src/backend/core/api/permissions.py
@@ -12,6 +12,7 @@
ACTION_FOR_METHOD_TO_PERMISSION = {
    "versions_detail": {"DELETE": "versions_destroy", "GET": "versions_retrieve"},
    "children": {"GET": "children_list", "POST": "children_create"},
    "content": {"PATCH": "content_patch", "GET": "content_retrieve"},
}
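A rough sketch of how such a mapping might be consulted to turn a viewset action and HTTP method into a permission name. The `resolve_permission` helper and its fallback to the action name are assumptions for illustration; the actual lookup code is not shown in this diff.

```python
ACTION_FOR_METHOD_TO_PERMISSION = {
    "versions_detail": {"DELETE": "versions_destroy", "GET": "versions_retrieve"},
    "children": {"GET": "children_list", "POST": "children_create"},
    "content": {"PATCH": "content_patch", "GET": "content_retrieve"},
}


def resolve_permission(action: str, method: str) -> str:
    """Hypothetical helper: map (action, HTTP method) to a permission name.

    Falls back to the action name itself when no per-method entry exists
    (an assumption, not confirmed by the diff).
    """
    return ACTION_FOR_METHOD_TO_PERMISSION.get(action, {}).get(method, action)
```

With the new `content` entry, a PATCH and a GET on the same action resolve to two distinct permission names.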


84 changes: 22 additions & 62 deletions src/backend/core/api/serializers.py
@@ -16,7 +16,7 @@
import magic
from rest_framework import serializers

from core import choices, enums, models, utils, validators
from core import choices, enums, models, validators
from core.services import mime_types
from core.services.ai_services import AI_ACTIONS
from core.services.converter_services import (
@@ -178,7 +178,6 @@ class Meta:
class DocumentSerializer(ListDocumentSerializer):
"""Serialize documents with all fields for display in detail views."""

content = serializers.CharField(required=False)
websocket = serializers.BooleanField(required=False, write_only=True)
file = serializers.FileField(
required=False, write_only=True, allow_null=True, max_length=255
@@ -193,7 +192,6 @@ class Meta:
"ancestors_link_role",
"computed_link_reach",
"computed_link_role",
"content",
"created_at",
"creator",
"deleted_at",
@@ -242,13 +240,6 @@ def get_fields(self):
if request:
if request.method == "POST":
fields["id"].read_only = False
if (
serializers.BooleanField().to_internal_value(
request.query_params.get("without_content", False)
)
is True
):
del fields["content"]

return fields

@@ -265,18 +256,6 @@ def validate_id(self, value):

return value

def validate_content(self, value):
"""Validate the content field."""
if not value:
return None

try:
b64decode(value, validate=True)
except binascii.Error as err:
raise serializers.ValidationError("Invalid base64 content.") from err

return value

def validate_file(self, file):
"""Add file size and type constraints as defined in settings."""
if not file:
@@ -310,52 +289,33 @@ def update(self, instance, validated_data):
return instance # No data provided, skip the update
return super().update(instance, validated_data)

def save(self, **kwargs):
Comment thread
lunika marked this conversation as resolved.
"""
Process the content field to extract attachment keys and update the document's
"attachments" field for access control.
"""
content = self.validated_data.get("content", "")
extracted_attachments = set(utils.extract_attachments(content))

existing_attachments = (
set(self.instance.attachments or []) if self.instance else set()
)
new_attachments = extracted_attachments - existing_attachments
class DocumentContentSerializer(serializers.Serializer):
Comment thread
lunika marked this conversation as resolved.
"""Serializer for updating only the raw content of a document stored in S3."""

if new_attachments:
attachments_documents = (
models.Document.objects.filter(
attachments__overlap=list(new_attachments)
)
.only("path", "attachments")
.order_by("path")
)
content = serializers.CharField(required=True)
websocket = serializers.BooleanField(required=False)

user = self.context["request"].user
readable_per_se_paths = (
models.Document.objects.readable_per_se(user)
.order_by("path")
.values_list("path", flat=True)
)
readable_attachments_paths = utils.filter_descendants(
[doc.path for doc in attachments_documents],
readable_per_se_paths,
skip_sorting=True,
)
def validate_content(self, value):
"""Validate the content field."""
try:
b64decode(value, validate=True)
except binascii.Error as err:
raise serializers.ValidationError("Invalid base64 content.") from err
Comment on lines +301 to +304
⚠️ Potential issue | 🟠 Major

Throttle the content PATCH endpoint independently.

This serializer powers the new PATCH /documents/{id}/content/ write path that performs an S3 head_object + put_object on each successful call (and decodes the full base64 payload twice — once here, once in the view/model). Without a dedicated rate limit, an attacker (or buggy SW retry) holding any "edit" role could rapidly churn S3 writes/versions and inflate storage cost. Recommend adding a separate throttle scope (e.g., document_content_patch) in REST_FRAMEWORK.DEFAULT_THROTTLE_RATES and applying it on the action via throttle_classes/throttle_scope, rather than reusing the shared document 80/min bucket which would let a single misbehaving client starve all other document operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/api/serializers.py` around lines 304 - 307, The PATCH
content endpoint allows expensive S3 writes and needs its own rate limit: add a
new throttle scope key (e.g., "document_content_patch") to
REST_FRAMEWORK.DEFAULT_THROTTLE_RATES in settings and create/apply an
appropriate throttle class or use SimpleRateThrottle with that scope; then apply
it to the view action that handles PATCH /documents/{id}/content/ (the view
method that uses this serializer and the b64decode validation) by setting
throttle_classes or throttle_scope to "document_content_patch" so content
PATCHes are limited separately from the general "document" 80/min bucket.
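A settings sketch of the separate throttle scope suggested in this comment. The scope name `document_content_patch` comes from the comment and `80/minute` reflects the shared bucket it mentions; the `30/minute` rate is an illustrative assumption, not a value from the project.

```python
# Settings fragment (sketch). Django REST framework reads throttle rates
# from REST_FRAMEWORK["DEFAULT_THROTTLE_RATES"], keyed by throttle scope.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_RATES": {
        # Shared bucket referred to in the review comment.
        "document": "80/minute",
        # Dedicated bucket for PATCH .../content/; the rate is hypothetical.
        "document_content_patch": "30/minute",
    },
}
```

A throttle class whose `scope` is `document_content_patch` would then draw from its own bucket, so heavy content writes would not consume the shared `document` allowance.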


readable_attachments = set()
for document in attachments_documents:
if document.path not in readable_attachments_paths:
continue
readable_attachments.update(set(document.attachments) & new_attachments)
return value
Comment thread
lunika marked this conversation as resolved.

# Update attachments with readable keys
self.validated_data["attachments"] = list(
existing_attachments | readable_attachments
)
def update(self, instance, validated_data):
"""
This serializer does not support updates.
"""
raise NotImplementedError("Update is not supported for this serializer.")

return super().save(**kwargs)
def create(self, validated_data):
"""
This serializer does not support create.
"""
raise NotImplementedError("Create is not supported for this serializer.")
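The `validate_content` method of `DocumentContentSerializer` relies on strict base64 decoding. A minimal standalone sketch of that check, using only the standard library (the function name `is_valid_base64` is hypothetical and does not appear in the diff):

```python
import binascii
from base64 import b64decode


def is_valid_base64(value: str) -> bool:
    """Return True when `value` decodes under strict base64 validation."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        b64decode(value, validate=True)
    except binascii.Error:
        return False
    return True
```

Here `is_valid_base64("aGVsbG8=")` holds, while a string containing spaces or punctuation such as `"not base64!"` is rejected.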


class DocumentAccessSerializer(serializers.ModelSerializer):
5 changes: 5 additions & 0 deletions src/backend/core/api/utils.py
@@ -194,3 +194,8 @@ def get_ident(self, request):
if x_forwarded_for
else request.META.get("REMOTE_ADDR")
)


def get_content_metadata_cache_key(document_id):
    """Return the cache key used to store content metadata."""
    return f"docs:content-metadata:{document_id!s}"
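A short usage sketch of the helper above, reproduced here so the example is self-contained. The UUID value is an arbitrary placeholder; that document ids are UUIDs is an assumption.

```python
import uuid


def get_content_metadata_cache_key(document_id):
    """Return the cache key used to store content metadata."""
    return f"docs:content-metadata:{document_id!s}"


# Placeholder id for illustration only.
doc_id = uuid.UUID("12345678123456781234567812345678")
key = get_content_metadata_cache_key(doc_id)
```

The `!s` conversion means a `UUID` object and its string form produce the same key.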