Skip to content

feat: add ml/strided/dkmeans-init-plus-plus#12312

Draft
nakul-krishnakumar wants to merge 6 commits into
stdlib-js:developfrom
nakul-krishnakumar:ml-strided-dkmeanspp
Draft

feat: add ml/strided/dkmeans-init-plus-plus#12312
nakul-krishnakumar wants to merge 6 commits into
stdlib-js:developfrom
nakul-krishnakumar:ml-strided-dkmeanspp

Conversation

@nakul-krishnakumar
Copy link
Copy Markdown
Member


type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes. report:

  • task: lint_filenames status: passed
  • task: lint_editorconfig status: passed
  • task: lint_markdown_pkg_readmes status: na
  • task: lint_markdown_docs status: na
  • task: lint_markdown status: na
  • task: lint_package_json status: na
  • task: lint_repl_help status: na
  • task: lint_javascript_src status: passed
  • task: lint_javascript_cli status: na
  • task: lint_javascript_examples status: na
  • task: lint_javascript_tests status: na
  • task: lint_javascript_benchmarks status: na
  • task: lint_python status: na
  • task: lint_r status: na
  • task: lint_c_src status: na
  • task: lint_c_examples status: na
  • task: lint_c_benchmarks status: na
  • task: lint_c_tests_fixtures status: na
  • task: lint_shell status: na
  • task: lint_typescript_declarations status: passed
  • task: lint_typescript_tests status: na
  • task: lint_license_headers status: passed ---

Resolves None.

Description

What is the purpose of this pull request?

This pull request:

  • Adds ml/strided/dkmeans-init-plus-plus.

Related Issues

Does this pull request have any related issues?

This pull request has the following related issues:

  • None.

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

AI Assistance

When authoring the changes proposed in this PR, did you use any kind of AI assistance?

  • Yes
  • No

If you answered "yes" above, how did you use AI assistance?

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Disclosure

If you answered "yes" to using AI assistance, please provide a short disclosure indicating how you used AI assistance. This helps reviewers determine how much scrutiny to apply when reviewing your contribution. Example disclosures: "This PR was written primarily by Claude Code." or "I consulted ChatGPT to understand the codebase, but the proposed changes were fully authored manually by myself.".

I used Claude Code to compare the implementation with ml/incr/kmeans/lib/init_kmeansplusplus and existing strided blas implementations, but the proposed changes were fully authored manually by myself.


@stdlib-js/reviewers

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: na
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
@nakul-krishnakumar nakul-krishnakumar requested a review from a team May 27, 2026 07:35
@nakul-krishnakumar nakul-krishnakumar marked this pull request as draft May 27, 2026 07:35
@stdlib-bot stdlib-bot added Needs Review A pull request which needs code review. and removed Needs Review A pull request which needs code review. labels May 27, 2026
Comment on lines +34 to +71
* Initializes centroids by performing the k-means++ initialization procedure.
*
* ## Method
*
* The k-means++ algorithm for choosing initial centroids is as follows:
*
* 1. Select a data point uniformly at random from a data set \\( X \\). This data point is first centroid and denoted \\( c_0 \\).
*
* 2. Compute the distance from each data point to \\( c_0 \\). Denote the distance between \\( c_j \\) and data point \\( m \\) as \\( d(x_m, c_j) \\).
*
* 3. Select the next centroid, \\( c_1 \\), at random from \\( X \\) with probability
*
* ```tex
* \frac{d^2(x_m, c_0)}{\sum_{j=0}^{n-1} d^2(x_j, c_0)}
* ```
*
* where \\( n \\) is the number of data points.
*
* 4. To choose centroid \\( j \\),
*
* a. Compute the distances from each data point to each centroid and assign each data point to its closest centroid.
*
* b. For \\( i = 0,\ldots,n-1 \\) and \\( p = 0,\ldots,j-2 \\), select centroid \\( j \\) at random from \\( X \\) with probability
*
* ```tex
* \frac{d^2(x_i, c_p)}{\sum_{\{h; x_h \exits C_p\}} d^2(x_h, c_p)}
* ```
*
* where \\( C_p \\) is the set of all data points closest to centroid \\( c_p \\) and \\( x_i \\) belongs to \\( c_p \\).
*
* Stated more plainly, select each subsequent centroid with a probability proportional to the distance from the centroid to the closest centroid already chosen.
*
* 5. Repeat step `4` until \\( k \\) centroids have been chosen.
*
* ## References
*
* - Arthur, David, and Sergei Vassilvitskii. 2007. "K-means++: The Advantages of Careful Seeding." In _Proceedings of the Eighteenth Annual Acm-Siam Symposium on Discrete Algorithms_, 1027–35. SODA '07. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics. <http://dl.acm.org/citation.cfm?id=1283383.1283494>.
*
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure on which all files I should write this description, please guide me on this.

* ]);
*
* var v = dkmeansInitPlusPlus( 'row-major', k, M, N, out, 2, xbuf, 2, 'sqeuclidean', 3, 44 );
* // returns <Float64Array>[0,0,1,-1,1,1]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this return value makes sense. As the user already has the strides it should be easy for them to access it, also the user is expected to pass in an empty initialized out array which will be filled with the flat centroids array.

// Create a scratch array for storing cumulative probabilities:
probs = new Float64Array( M );

// 2-5. For each data point, compute the distances to each centroid, find the closest centroid, and, based on the distance to the closest centroid, assign a probability to the data point to be chosen as centroid `c_j`...
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that all these comments in this implementation is directly referenced from ml/incr/kmeans/lib/init_kmeansplusplus

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: na
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: na
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: na
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: passed
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: na
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: na
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: passed
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
@nakul-krishnakumar
Copy link
Copy Markdown
Member Author

/stdlib merge

@stdlib-bot stdlib-bot added the bot: In Progress Pull request is currently awaiting automation. label May 27, 2026
@stdlib-bot stdlib-bot removed the bot: In Progress Pull request is currently awaiting automation. label May 27, 2026
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown_pkg_readmes
    status: na
  - task: lint_markdown_docs
    status: na
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
Comment on lines +135 to +137
if ( trials < 1 ) {
throw new TypeError( format( 'invalid argument. Thirteenth argument must be a valid trials (>=1). Value: `%s`.', trials ) );
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to just return NaN?

@stdlib-bot
Copy link
Copy Markdown
Contributor

Coverage Report

Package Statements Branches Functions Lines
ml/strided/dkmeans-init-plus-plus $\color{red}519/537$
$\color{green}+96.65%$
$\color{red}38/44$
$\color{green}+86.36%$
$\color{green}2/2$
$\color{green}+100.00%$
$\color{red}519/537$
$\color{green}+96.65%$

The above coverage report was generated for the changes in this PR.

Comment on lines +138 to +140
if ( k < 1 || M < 1 || N < 1) {
return NaN;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

  • As it is a low-level kernel, we can expect user to pass in valid parameters.
  • But if this check is not kept, for invalid params the API return <Float64Array>[ NaN, NaN, NaN, ..., NaN ].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants