Skip to content

feat: enable mypy session for documentai-toolbox#16690

Open
chalmerlowe wants to merge 17 commits intomainfrom
feat-enable-mypy-documentai-toolbox
Open

feat: enable mypy session for documentai-toolbox#16690
chalmerlowe wants to merge 17 commits intomainfrom
feat-enable-mypy-documentai-toolbox

Conversation

@chalmerlowe
Copy link
Copy Markdown
Contributor

This PR enables the mypy session in noxfile.py for documentai-toolbox and aligns it with the GAPIC generator template. Currently, it fails with 46 errors. This is pushed to allow farming out the work to resolve these errors or for separate review.

@chalmerlowe chalmerlowe requested a review from a team as a code owner April 16, 2026 14:42
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the mypy type checking session in the noxfile.py, replacing a previous skip command. Feedback indicates that enabling this session immediately may cause CI failures due to existing type errors and recommends using a configuration file or adding error codes for better visibility. Additionally, the specified mypy version constraint was noted as referencing an unreleased version.

Comment thread packages/google-cloud-documentai-toolbox/noxfile.py
Comment thread packages/google-cloud-documentai-toolbox/noxfile.py
@chalmerlowe chalmerlowe self-assigned this Apr 16, 2026
@parthea
Copy link
Copy Markdown
Contributor

parthea commented Apr 17, 2026

/gemini review

if gcs_uri:
gcs_bucket_name, gcs_prefix = split_gcs_uri(gcs_uri)

assert gcs_prefix is not None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid using assert outside of tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parthea fixed.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables mypy type checking in the noxfile and addresses various type-related issues across the package. Key changes include adding null checks for bounding box coordinates, updating type hints to use more flexible types like Iterable and TypeVar, and ensuring proper instantiation of vision.Vertex objects. While these changes improve overall type consistency, the feedback highlights that several modifications in block.py replace specific types with Any, which significantly reduces type safety. Additionally, a redundant type cast was identified in the load_blocks_from_schema function.



def _get_target_object(json_data: any, target_object: str) -> Optional[SimpleNamespace]:
def _get_target_object(json_data: Any, target_object: str) -> Any:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The return type of _get_target_object has been changed to Any, which significantly reduces type safety. Given that the docstring explicitly states it returns a SimpleNamespace, it would be better to use Optional[SimpleNamespace] or a more specific type if possible. If this is a temporary measure to resolve mypy errors, please consider adding a TODO to refine this type later.

Copy link
Copy Markdown
Contributor Author

@chalmerlowe chalmerlowe Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally Gemini and I have gone round and round today trying to find the ideal collection of types so that mypy passes and so that unittests pass.

What we have now is not necessarily ideal, but it is likely better than what we had and further churn may not be cost effective.

Comment on lines +75 to +89
type_: Any = dataclasses.field(init=True, repr=False)
text: Any = dataclasses.field(init=True, repr=False)
bounding_box: Any = dataclasses.field(init=True, repr=False, default=None)
block_references: Any = dataclasses.field(init=True, repr=False, default=None)
block_id: Any = dataclasses.field(init=False, repr=False, default=None)
confidence: Any = dataclasses.field(init=False, repr=False, default=None)
page_number: Any = dataclasses.field(init=False, repr=False, default=None)
page_width: Any = dataclasses.field(init=False, repr=False, default=None)
page_height: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_width: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_height: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_type: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_unit: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_x: Any = dataclasses.field(init=False, repr=False, default=None)
bounding_y: Any = dataclasses.field(init=False, repr=False, default=None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Many fields in the Block class have been changed from SimpleNamespace or Optional[SimpleNamespace] to Any. While this might resolve immediate mypy errors, it bypasses the benefits of type checking for these attributes. It is recommended to use more specific types or maintain the Optional[SimpleNamespace] annotation where appropriate to ensure long-term maintainability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is OBE. Much like the above comment:

Locally Gemini and I have gone round and round today trying to find the ideal collection of types so that mypy passes and so that unittests pass.

What we have now is not necessarily ideal, but it is likely better than what we had and further churn may not be cost effective.

blocks: List[Block] = []
ens = _get_target_object(objects, entities)
for i in ens:
for i in cast(Any, ens):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cast(Any, ens) is redundant here because ens is already typed as Any (the return type of _get_target_object). You can iterate over ens directly without the cast.

Suggested change
for i in cast(Any, ens):
for i in ens:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants