Skip to content

[SYCL] Improve get_kernel_info performance#21866

Open
cperkinsintel wants to merge 12 commits intointel:syclfrom
cperkinsintel:cperkins-improve-get-k-info
Open

[SYCL] Improve get_kernel_info performance#21866
cperkinsintel wants to merge 12 commits intointel:syclfrom
cperkinsintel:cperkins-improve-get-k-info

Conversation

@cperkinsintel
Copy link
Copy Markdown
Contributor

@cperkinsintel cperkinsintel commented Apr 24, 2026

O(1) optimization fix for get_kernel_info, working for both lambda-functor and free function kernels.

…r type kernels. Not yet working with free-function kernels.

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
@sergey-semenov sergey-semenov self-requested a review April 24, 2026 10:42
@cperkinsintel cperkinsintel marked this pull request as ready for review April 27, 2026 17:29
@cperkinsintel cperkinsintel requested a review from a team as a code owner April 27, 2026 17:29
@cperkinsintel cperkinsintel changed the title [SYCL] [WIP] improve get_kernel_info performance [SYCL] improve get_kernel_info performance Apr 27, 2026
Comment thread sycl/include/sycl/detail/get_device_kernel_info.hpp Outdated
Comment thread sycl/source/ext/oneapi/get_kernel_info.cpp Outdated
@@ -0,0 +1,66 @@
//==-- kernel_cache.hpp - Fast kernel cache access for get_kernel_info -----==//
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header file seems misleading, this file just accesses the cache to retrieve the kernel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

sycl::detail::getDeviceKernelInfo<KernelName>();

// Dispatch to appropriate fast cache function based on query type
if constexpr (std::is_same_v<
Copy link
Copy Markdown
Contributor

@sergey-semenov sergey-semenov Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should reuse the code from source/detail/kernel_info.hpp instead. AFAIU, all we actually need is an exported function that retrieves the kernel from the program manager with getOrCreateKernel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

I also didn't want these if constexpr to be duplicated below, but the fallback behavior is different. I could make a helper and template with a callable but that seems really overengineered and hard to read for very little gain.

Copy link
Copy Markdown
Contributor

@sergey-semenov sergey-semenov Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need anything except queryCachedKernelInfo from kernel_info_queries? Can we just fetch the kernel (with a single new exported function) then call get_kernel_device_specific_info<...> directly instead for all of these cases?

Also, what are the queries that we use the fallback path for and why?

return Info;
}

// Overload for free function kernels
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make use of this new overload for free function kernel submission as well (as a follow-up change)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into it. The free functions submissions ARE O(1) (so long as they are explicitly submitted), but I'll see if they can use this path to make the code more DRY.

@sergey-semenov sergey-semenov changed the title [SYCL] improve get_kernel_info performance [SYCL] Improve get_kernel_info performance Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes ext::oneapi::get_kernel_info device-specific queries by reusing the runtime’s fast kernel cache (via ProgramManager::getOrCreateKernel) instead of constructing a kernel_bundle for common query types, and extends device-kernel-info lookup to support free-function kernels by name.

Changes:

  • Add exported runtime helpers (getKernel*) that query kernel device-specific info using cached kernels.
  • Add an exported getDeviceKernelInfo(std::string_view) overload to support free-function kernels.
  • Update ext::oneapi::get_kernel_info to use the fast-path for common device-specific queries, with fallback to the previous bundle-based implementation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
sycl/source/detail/kernel_info_queries.cpp Implements cached-kernel device-specific query helpers.
sycl/include/sycl/detail/kernel_info_queries.hpp Declares exported cached-kernel query helper APIs.
sycl/include/sycl/ext/oneapi/get_kernel_info.hpp Switches device-specific queries to fast cached-kernel path for selected params; keeps fallback.
sycl/include/sycl/detail/get_device_kernel_info.hpp Adds getDeviceKernelInfo(std::string_view) and free-function kernel template helper.
sycl/source/detail/get_device_kernel_info.cpp Implements the new getDeviceKernelInfo(std::string_view) overload.
sycl/source/CMakeLists.txt Adds the new kernel_info_queries.cpp compilation unit.
sycl/test/abi/sycl_symbols_linux.dump Updates ABI symbol expectations for new exported helpers/overloads.
sycl/test/abi/sycl_symbols_windows.dump Updates ABI symbol expectations for new exported helpers/overloads.

Comment on lines 9 to +26
#include <sycl/detail/kernel_desc.hpp>

#include <detail/global_handler.hpp>
#include <detail/program_manager/program_manager.hpp>

#include <string_view>

namespace sycl {
inline namespace _V1 {
namespace detail {

DeviceKernelInfo &getDeviceKernelInfo(const CompileTimeKernelInfoTy &Info) {
return ProgramManager::getInstance().getDeviceKernelInfo(Info);
}

DeviceKernelInfo &getDeviceKernelInfo(std::string_view KernelName) {
return ProgramManager::getInstance().getDeviceKernelInfo(KernelName);
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDeviceKernelInfo(...) definitions in this TU are not preceded by the __SYCL_EXPORT declarations from sycl/detail/get_device_kernel_info.hpp. When building the SYCL DLL on Windows (__SYCL_BUILD_SYCL_DLL), this can result in the new getDeviceKernelInfo(std::string_view) (and potentially the existing overload) not being exported, causing link failures for users including the header. Include sycl/detail/get_device_kernel_info.hpp here (or annotate the definitions with __SYCL_EXPORT) so the dllexport attribute is applied in this translation unit.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is (mostly) mistaken here. __SYCL_EXPORT IS used on those declarations in the .hpp. We don't use them in the .cpp files. But get_device_kernel_info.cpp is not including its own matching .hpp directly, it's getting it indirectly, which is maybe why Copilot is confused. I'm adding a direct include.

Comment on lines +54 to +91
// Dispatch to appropriate fast cache function based on query type
if constexpr (std::is_same_v<
Param,
sycl::info::kernel_device_specific::work_group_size>) {
return sycl::detail::getKernelWorkGroupSize(CtxImpl, DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
compile_work_group_size>) {
return sycl::detail::getKernelCompileWorkGroupSize(CtxImpl, DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
preferred_work_group_size_multiple>) {
return sycl::detail::getKernelPreferredWorkGroupSizeMultiple(CtxImpl,
DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
private_mem_size>) {
return sycl::detail::getKernelPrivateMemSize(CtxImpl, DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
max_sub_group_size>) {
return sycl::detail::getKernelMaxSubGroupSize(CtxImpl, DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
compile_num_sub_groups>) {
return sycl::detail::getKernelCompileNumSubGroups(CtxImpl, DevImpl, DKI);
} else if constexpr (std::is_same_v<Param,
sycl::info::kernel_device_specific::
compile_sub_group_size>) {
return sycl::detail::getKernelCompileSubGroupSize(CtxImpl, DevImpl, DKI);
} else {
// Fallback for any other queries
auto Bundle =
sycl::get_kernel_bundle<KernelName, sycl::bundle_state::executable>(
Ctx, {Dev});
return Bundle.template get_kernel<KernelName>().template get_info<Param>(
Dev);
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The device-specific fast-path dispatch is duplicated for named kernels and free-function kernels. This increases the chance the two overloads diverge when new kernel_device_specific queries are added or adjusted. Consider factoring the if constexpr chain into a single internal helper (e.g., a templated dispatch_device_specific_query<Param>(CtxImpl, DevImpl, DKI)), and reuse it from both overloads.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants