Skip to content
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 2 additions & 9 deletions pyiceberg/io/pyarrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
return StringType()
elif pa.types.is_date32(primitive):
return DateType()
elif isinstance(primitive, pa.Time64Type) and primitive.unit == "us":
elif isinstance(primitive, pa.Time64Type) and primitive.unit in ["us", "ns"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need to check for precision here, since its us and ns are supported.

Although we need to add code to downcast ns to us, similar to below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to below

Thank you for your response, but could you explain it more clearly?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need to check for precision here, since its us and ns are supported.

Time64Type only supports us and ns, so the extra check for primitive.unit in ["us", "ns"] is redundant. Although, it'll be helpful if another precision was added in the future.

Although we need to add code to downcast ns to us, similar to below

Because Iceberg doesn't support ns, it has to downcast ns to us, or errors.
This behavior is controlled by https://github.com/apache/iceberg-python/pull/1206/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdL1078-L1084

Which we also need to include here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

        elif pa.types.is_timestamp(primitive):
            primitive = cast(pa.TimestampType, primitive)
            if primitive.unit in ("s", "ms", "us"):
                # Supported types, will be upcast automatically to 'us'
                pass
            elif primitive.unit == "ns":
                if self._downcast_ns_timestamp_to_us:
                    primitive = cast(pa.TimestampType, "us")
                else:
                    raise TypeError(
                        "Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
                    )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, do the same for Time64Type

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the _downcast_ns_timestamp_to_us part for sure. Im not sure if we need to do cast though

Copy link
Copy Markdown
Contributor Author

@JE-Chen JE-Chen Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your response, But I am confused.
Did you mean that I need a new class variable _downcast_ns_time64type_to_us and a new condition to check?

if self._downcast_ns_time64type_to_us:
    # what should be done here?

Or should I add some code under the original condition:

if self._downcast_ns_timestamp_to_us:
    # Add something here. 

Copy link
Copy Markdown
Contributor

@kevinjqliu kevinjqliu Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this function is converting pyarrow types to Iceberg types.

In the example of timestamp type (pa.types.is_timestamp(primitive)), we check if the unit (or precision) is ns. If it is and the downcast-ns-timestamp-to-us-on-write is set, we want to downcast the unit from ns to us. If downcast-ns-timestamp-to-us-on-write is not set, raise an error because ns is not supported.

If its not, we don't do anything special and just return the Iceberg type.

We want to do the same thing with the Time64Type type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially check if the unit is ns. if it is, check downcast-ns-timestamp-to-us-on-write. if the option is set, we can downcast from ns to us.
If its not, raise an error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the new commit. Thanks.

return TimeType()
elif pa.types.is_timestamp(primitive):
primitive = cast(pa.TimestampType, primitive)
if primitive.unit in ("s", "ms", "us"):
if primitive.unit in ("s", "ms", "us", "ns"):
# Supported types, will be upcast automatically to 'us'
pass
elif primitive.unit == "ns":
if self._downcast_ns_timestamp_to_us:
logger.warning("Iceberg does not yet support 'ns' timestamp precision. Downcasting to 'us'.")
else:
raise TypeError(
"Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
)
Comment on lines -1078 to -1084
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still necessary because iceberg doesnt support ns. this workaround help us downcast a column with ns to us

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add back this code snippet.

else:
raise TypeError(f"Unsupported precision for timestamp type: {primitive.unit}")

Expand Down
16 changes: 0 additions & 16 deletions tests/io/test_pyarrow_visitor.py
Original file line number Diff line number Diff line change
Expand Up @@ -187,13 +187,6 @@ def test_pyarrow_timestamp_to_iceberg(precision: str) -> None:

def test_pyarrow_timestamp_invalid_units() -> None:
pyarrow_type = pa.timestamp(unit="ns")
with pytest.raises(
TypeError,
match=re.escape(
"Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
),
):
visit_pyarrow(pyarrow_type, _ConvertToIceberg())


def test_pyarrow_timestamp_tz_to_iceberg() -> None:
Expand All @@ -209,19 +202,10 @@ def test_pyarrow_timestamp_tz_to_iceberg() -> None:

def test_pyarrow_timestamp_tz_invalid_units() -> None:
pyarrow_type = pa.timestamp(unit="ns", tz="UTC")
with pytest.raises(
TypeError,
match=re.escape(
"Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
),
):
visit_pyarrow(pyarrow_type, _ConvertToIceberg())


def test_pyarrow_timestamp_tz_invalid_tz() -> None:
pyarrow_type = pa.timestamp(unit="us", tz="US/Pacific")
with pytest.raises(TypeError, match=re.escape("Unsupported type: timestamp[us, tz=US/Pacific]")):
visit_pyarrow(pyarrow_type, _ConvertToIceberg())


def test_pyarrow_string_to_iceberg() -> None:
Expand Down