[SPARK-56367][SS][PYTHON][DOCS] Fix latestOffset docstring and tutorial to use correct field name and signature #55227

Draft
jiteshsoni wants to merge 1 commit into apache:master from jiteshsoni:fix-latestOffset-docstring-and-tutorial

@jiteshsoni
Contributor

What changes were proposed in this pull request?

This PR fixes two documentation bugs in PySpark's streaming data source API:

  1. Docstring attribute error (datasource.py): The DataSourceStreamReader.latestOffset() docstring example incorrectly referenced limit.maxRows when it should be limit.max_rows. The ReadMaxRows dataclass uses Python snake_case convention.

  2. Outdated method signature (python_data_source.rst): The tutorial's FakeStreamReader example showed the deprecated parameterless signature def latestOffset(self) instead of the recommended signature with admission control support: def latestOffset(self, start: dict, limit).
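To illustrate the two-argument form described above, here is a minimal plain-Python sketch. It does not import PySpark: `ReadMaxRows` below is a hypothetical stand-in that carries only the `max_rows` field named in this PR, the `available` counter is invented for illustration, and a real reader would subclass `DataSourceStreamReader` rather than a bare class.

```python
from dataclasses import dataclass


@dataclass
class ReadMaxRows:
    """Hypothetical stand-in; only the max_rows field name comes from this PR."""
    max_rows: int


class FakeStreamReader:
    """Plain-Python sketch of a reader using the two-argument latestOffset."""

    def __init__(self, available: int = 10) -> None:
        # Assumption for illustration: rows currently available at the source.
        self.available = available

    def initialOffset(self) -> dict:
        return {"offset": 0}

    def latestOffset(self, start: dict, limit) -> dict:
        end = self.available
        # Cap the batch size when the limit carries a row budget.
        if isinstance(limit, ReadMaxRows):
            end = min(end, start["offset"] + limit.max_rows)
        return {"offset": end}


reader = FakeStreamReader(available=10)
first = reader.latestOffset(reader.initialOffset(), ReadMaxRows(max_rows=4))
second = reader.latestOffset(first, ReadMaxRows(max_rows=4))
```

With a budget of 4 rows per batch, each call advances the end offset by at most 4 from `start` and never past the rows that are available.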

Why are the changes needed?

  • Users copying the docstring example would encounter an AttributeError at runtime due to the incorrect attribute name.
  • Tutorial users wouldn't learn about the start offset parameter or admission control capabilities introduced in SPARK-55304.
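The first point can be reproduced without Spark. The sketch below uses a hypothetical stand-in dataclass with the same field name as `ReadMaxRows` (it is not the PySpark class itself) to show why the camelCase spelling fails while the snake_case one works.

```python
from dataclasses import dataclass


@dataclass
class ReadMaxRows:
    """Hypothetical stand-in mirroring the max_rows field named in this PR."""
    max_rows: int


limit = ReadMaxRows(max_rows=100)

try:
    # The spelling used in the old docstring example: no such attribute exists.
    limit.maxRows
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# The corrected spelling reads the dataclass field.
rows = limit.max_rows
```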

Does this PR introduce any user-facing change?

No. This is a documentation-only fix.

How was this patch tested?

Documentation changes only - no tests required.

Was this patch authored or co-authored using generative AI tooling?

Yes, GitHub Copilot and Claude Code were used to assist with this patch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
  return {"offset": 0}

- def latestOffset(self) -> dict:
+ def latestOffset(self, start: dict, limit) -> dict:
Contributor

Probably should add the type of limit too here.

Contributor Author

My bad, this was not ready for review yet.

Give me a couple of days, and I will test this and bring this PR out of draft status.

@jiteshsoni marked this pull request as draft on April 7, 2026 06:23