[RFC]: DLPack C Function for Speed Exchange #973
This is a cross-reference RFC on DLPack-based exchange. As of now, DLPack exchange relies on Python functions such as tensor.__dlpack__(). While these work well for common cases, the overhead of such an exchange is around 0.2-0.3 us for a very well optimized implementation, and can go up to 0.4-1 us for a less optimized one. For a function that takes three arguments, f(a, b, c), running DLPack exchange on each argument brings the total conversion overhead to around 0.7-3 us.
While such overhead is acceptable in many settings, in GPU applications an extra 1-3 us can still be significant. For a kernel that takes 2 us to finish, even 0.7 us means roughly 35% additional execution overhead.
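To make the arithmetic behind these figures explicit, here is a small sketch that multiplies a per-argument exchange cost by the number of arguments and compares the result to the kernel runtime. The helper names are hypothetical and the inputs are just the per-argument figures quoted above, not new measurements.

```python
def total_overhead_us(per_arg_us: float, n_args: int) -> float:
    """Total exchange cost when every argument goes through one DLPack exchange."""
    return per_arg_us * n_args


def relative_overhead(overhead_us: float, kernel_us: float) -> float:
    """Exchange overhead expressed as a fraction of the kernel runtime."""
    return overhead_us / kernel_us


if __name__ == "__main__":
    kernel_us = 2.0  # the 2 us kernel from the example above
    n_args = 3       # f(a, b, c)
    # Per-argument costs (us) taken from the ranges quoted in the text.
    for per_arg_us in (0.2, 0.3, 0.4, 1.0):
        total = total_overhead_us(per_arg_us, n_args)
        frac = relative_overhead(total, kernel_us)
        print(f"{per_arg_us:.1f} us/arg -> {total:.1f} us total, "
              f"{frac:.0%} of a {kernel_us:.0f} us kernel")
```

Even at the optimistic end of the range, the conversion cost is a sizeable fraction of a kernel this short.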
We recently proposed developing a set of specific C functions to speed up DLPack-based exchange for array libraries that work through C extensions. For more context, see dmlc/dlpack#175.
In the context of array-api, it would be useful to standardize the specific field for such a speed exchange:
mypackage.Tensor.__dlpack_c_exchange_api__
Note that the proposed speed exchange function can be used in conjunction with the current DLPack exchange, so that fallback cases are handled gracefully.
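The fallback behaviour could look roughly like the sketch below, written from the consumer's side. Only the attribute name __dlpack_c_exchange_api__ and the method __dlpack__ come from this RFC. The classes FastTensor and LegacyTensor, the helper pick_exchange_path, and the placeholder values are hypothetical. What the attribute actually holds (for example an opaque handle to a table of C functions) is left to dmlc/dlpack#175, so it is treated here as an opaque object. A real consumer would do this lookup inside its C extension rather than in Python.

```python
class FastTensor:
    """Hypothetical tensor type that advertises the proposed C exchange field."""
    # Placeholder for whatever opaque handle the final spec defines (assumption).
    __dlpack_c_exchange_api__ = object()

    def __dlpack__(self, **kwargs):
        return "capsule-from-python-path"  # stand-in for a real DLPack capsule


class LegacyTensor:
    """Hypothetical tensor type that only supports the current Python protocol."""

    def __dlpack__(self, **kwargs):
        return "capsule-from-python-path"  # stand-in for a real DLPack capsule


def pick_exchange_path(obj):
    """Prefer the C exchange field on the type; otherwise fall back to __dlpack__()."""
    api = getattr(type(obj), "__dlpack_c_exchange_api__", None)
    if api is not None:
        return "c_api", api
    if hasattr(obj, "__dlpack__"):
        return "python_protocol", obj.__dlpack__
    raise TypeError(f"{type(obj).__name__} does not support DLPack exchange")


if __name__ == "__main__":
    for tensor in (FastTensor(), LegacyTensor()):
        path, _ = pick_exchange_path(tensor)
        print(type(tensor).__name__, "->", path)
```

Looking the field up on the type rather than on the instance means a consumer could cache the result per tensor type and skip the lookup on later calls.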