perf: optimize string tensor deserialization with high performance c++ implementation #1

Closed
wweic wants to merge 1 commit into r25.05-release from r25.05-optimize

wweic (Owner) commented Aug 1, 2025

No description provided.

Comment thread src/pb_tensor.cc
return numpy.attr("empty")(0, py::dtype("object"));
}

// First pass: count the number of strings and calculate total size

Is this change mainly to optimize string inputs?
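For readers following the thread, the two-pass idea named in the diff comment ("count the number of strings and calculate total size") can be sketched roughly as below. This is not the code from the PR. It assumes each serialized string is stored as a 4-byte little-endian length followed by that many bytes, and it returns a `std::vector<std::string>` instead of a NumPy object array. `ReadLengthLE` and `DeserializeStrings` are hypothetical names introduced only for illustration, and this sketch uses only the element count from the first pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: decode a 4-byte little-endian length prefix at `p`.
static std::uint32_t ReadLengthLE(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical sketch of a two-pass decoder (assumed wire format: repeated
// [uint32 length][payload bytes]). Returns std::nullopt on a truncated buffer.
std::optional<std::vector<std::string>> DeserializeStrings(
    const unsigned char* data, std::size_t size) {
  // First pass: validate the buffer and count the strings, so the output
  // container can be sized once instead of growing element by element.
  std::size_t count = 0;
  std::size_t offset = 0;
  while (offset < size) {
    if (size - offset < 4) return std::nullopt;  // truncated length prefix
    const std::uint32_t len = ReadLengthLE(data + offset);
    offset += 4;
    if (size - offset < len) return std::nullopt;  // truncated payload
    offset += len;
    ++count;
  }

  // Second pass: the buffer is known to be well formed, so copy each payload
  // without further bounds checks.
  std::vector<std::string> out;
  out.reserve(count);
  offset = 0;
  while (offset < size) {
    const std::uint32_t len = ReadLengthLE(data + offset);
    offset += 4;
    out.emplace_back(reinterpret_cast<const char*>(data + offset), len);
    offset += len;
  }
  return out;
}
```

Calling `DeserializeStrings(buf, n)` on a buffer holding the prefixed strings "hi", "" and "abc" would yield a three-element vector. The point of the first pass is a single up-front allocation; whether that is the dominant saving in the actual PR is not stated in this thread.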

@Swipe4057

Will this great feature be merged?

wweic (Owner, Author) commented Aug 26, 2025

@Swipe4057 We are working with the NVIDIA team on triton-inference-server/server#8348 and have engaged with their engineering team.

yinggeh commented Sep 17, 2025

Hi @wweic. You should merge to main instead of r25.05.

wweic (Owner, Author) commented Sep 17, 2025

@yinggeh Yeah, this PR is for personal review with my team. I was waiting for the Triton team's green light to upstream so I could prepare a proper PR to main. I just got a response here: triton-inference-server/server#8348. So I guess it's OK to send a PR to the python_backend repo now?

wweic (Owner, Author) commented Sep 17, 2025

Proper PR to the Triton repo: triton-inference-server#416

wweic closed this Sep 17, 2025
4 participants