Commit 8b46947

Add native pHash deduplication acceleration
1 parent 2c34b24 commit 8b46947

8 files changed

Lines changed: 738 additions & 9 deletions

.circleci/config.yml

Lines changed: 17 additions & 3 deletions
@@ -80,6 +80,12 @@ jobs:
       - image: cimg/python:3.10
     steps:
       - checkout # checkout source code to working directory
+      - setup_remote_docker
+      - run:
+          name: Install Build Tools
+          command: |
+            pip install --upgrade pip
+            pip install poetry cibuildwheel
       - run:
           name: Validate Tag Version # Check if the tag name matches the package version
           command: |
@@ -99,9 +105,18 @@ jobs:
               exit 1;
             fi
       - run:
-          name: Build
+          name: Build sdist and Linux wheels
           command: | # install env dependencies
-            poetry build
+            rm -rf dist
+            poetry build --format sdist
+
+            export CIBW_ARCHS_LINUX="x86_64"
+            export CIBW_BUILD="cp310-manylinux_x86_64 cp311-manylinux_x86_64 cp312-manylinux_x86_64 cp313-manylinux_x86_64 cp314-manylinux_x86_64"
+            export CIBW_SKIP="pp* *-musllinux_*"
+            export CIBW_TEST_COMMAND='python -c "import nucleus._native_dedup as native; assert native.deduplicate_phashes([0, (1 << 10) - 1, (1 << 11) - 1], 10) == [0, 2]"'
+            python -m cibuildwheel --platform linux --output-dir dist
+
+            ls -lh dist
       - run:
           name: Publish to PyPI
           command: |
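The smoke test pins down the expected semantics: for the hashes `0`, `(1 << 10) - 1` and `(1 << 11) - 1` at threshold `10`, indices `[0, 2]` survive. A pure-Python reference consistent with that result is sketched below. It assumes greedy keep-first order, distances measured only against already-kept hashes, and "duplicate" meaning Hamming distance `<= threshold`; all three are inferred from the test rather than stated in the source, and `deduplicate_phashes_reference` is a hypothetical name.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def deduplicate_phashes_reference(hashes: Sequence[int], threshold: int) -> List[int]:
    """Greedy keep-first dedup that returns the indices of surviving hashes.

    Assumption (inferred from the CI smoke test): a hash is dropped when its
    Hamming distance to any already-kept hash is <= threshold.
    """
    kept: List[int] = []
    for i, h in enumerate(hashes):
        # Compare only against survivors, never against hashes already dropped.
        if all(hamming(h, hashes[j]) > threshold for j in kept):
            kept.append(i)
    return kept


# Same inputs as the CI smoke test.
print(deduplicate_phashes_reference([0, (1 << 10) - 1, (1 << 11) - 1], 10))  # [0, 2]
```

Under the same assumptions, a threshold of `64` drops every hash after the first, because no two 64-bit values differ in more than 64 bits; this lines up with the keep-first fast path the changelog describes for that threshold.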
@@ -173,4 +188,3 @@ workflows:
               ignore: /.*/ # Runs for none of the branches
             tags:
               only: /^v\d+\.\d+\.\d+$/ # Runs only for tags with the format [v1.2.3]
-
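The workflow filter restricts this job to tags shaped like `v<major>.<minor>.<patch>`. Because the pattern is anchored with `^` and `$`, it only accepts whole tag names, which Python's `re.fullmatch` mirrors; `re.ASCII` keeps `\d` to the digits 0-9. This is a quick local check of the pattern, not part of the commit, and `triggers_publish` is a hypothetical helper.

```python
import re

# Tag filter copied from the workflow; re.ASCII limits \d to the digits 0-9.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$", re.ASCII)


def triggers_publish(tag: str) -> bool:
    """Return True when `tag` has the v<major>.<minor>.<patch> shape the filter accepts."""
    return RELEASE_TAG.fullmatch(tag) is not None


for tag in ["v0.18.5", "0.18.5", "v0.18.5rc1", "v1.2"]:
    print(f"{tag}: {triggers_publish(tag)}")
```

Only `v0.18.5` prints `True`; a missing `v` prefix, a pre-release suffix such as `rc1`, or a missing version component all fail the pattern, so such tags would not trigger the filtered job.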

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -5,6 +5,14 @@ All notable changes to the [Nucleus Python Client](https://github.com/scaleapi/n
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.18.5](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.5) - 2026-06-08
+
+### Added
+- Native C acceleration for `deduplicate_by_phash`. When the compiled extension is available, all threshold values are handled in native code: thresholds `0` through `11` use the chunked Hamming index, thresholds `12` through `63` use a native linear scan, and threshold `64` uses the keep-first fast path. The public Python API is unchanged and falls back to the pure-Python implementation when the native extension is unavailable.
+
+### Tooling / CI
+- Publish Linux `x86_64` wheels for Python 3.10 through 3.14 using `cibuildwheel`, alongside the source distribution.
+
 ## [0.18.4](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.4) - 2026-06-08
 
 ### Added
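The changelog's cutoff at threshold `11` is consistent with a pigeonhole argument: if a 64-bit hash is cut into 12 disjoint chunks and two hashes differ in at most 11 bits, at least one chunk must be identical, so bucketing kept hashes by exact chunk value finds every near-duplicate candidate. The native chunk layout is not shown in this diff. The sketch below assumes 12 near-equal chunks, the same keep-first, distance `<= threshold` semantics as the CI smoke test, and hypothetical names (`dedup_chunked`, `NUM_CHUNKS`); it illustrates the general multi-index technique, not the extension's actual code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumption: 12 chunks, so any pair within 11 bits shares at least one chunk.
NUM_CHUNKS = 12
HASH_BITS = 64


def chunk_bounds(num_chunks: int = NUM_CHUNKS, bits: int = HASH_BITS) -> List[Tuple[int, int]]:
    """Split `bits` into contiguous (start, width) ranges whose widths differ by at most 1."""
    base, extra = divmod(bits, num_chunks)
    bounds: List[Tuple[int, int]] = []
    start = 0
    for k in range(num_chunks):
        width = base + (1 if k < extra else 0)
        bounds.append((start, width))
        start += width
    return bounds


BOUNDS = chunk_bounds()


def chunk_keys(h: int) -> List[Tuple[int, int]]:
    """Return (chunk index, chunk value) pairs used as exact-match bucket keys."""
    return [(k, (h >> start) & ((1 << width) - 1)) for k, (start, width) in enumerate(BOUNDS)]


def dedup_chunked(hashes: Sequence[int], threshold: int) -> List[int]:
    """Keep-first dedup that only compares hashes sharing an identical chunk."""
    if not 0 <= threshold < NUM_CHUNKS:
        raise ValueError("the pigeonhole guarantee requires 0 <= threshold < NUM_CHUNKS")
    buckets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    kept: List[int] = []
    for i, h in enumerate(hashes):
        keys = chunk_keys(h)
        # Candidates are kept hashes that match h exactly in at least one chunk.
        candidates = {j for key in keys for j in buckets.get(key, ())}
        if any(bin(h ^ hashes[j]).count("1") <= threshold for j in candidates):
            continue  # near-duplicate of an earlier survivor
        kept.append(i)
        for key in keys:
            buckets[key].append(i)
    return kept


print(dedup_chunked([0, (1 << 10) - 1, (1 << 11) - 1], 10))  # [0, 2]
```

At threshold `12` or above, 12 chunks no longer guarantee an exact chunk match, so a scheme like this could miss near-duplicates; that is one plausible reason the changelog routes thresholds `12` through `63` to a linear scan instead.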

build.py

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+from __future__ import annotations
+
+import sys
+
+from setuptools import Extension
+
+
+def build(setup_kwargs):
+    extra_compile_args = []
+    if sys.platform != "win32":
+        extra_compile_args.extend(["-std=c11", "-O3"])
+
+    setup_kwargs.update(
+        {
+            "ext_modules": [
+                Extension(
+                    "nucleus._native_dedup",
+                    ["nucleus/_native_dedup.c"],
+                    extra_compile_args=extra_compile_args,
+                    optional=True,
+                )
+            ],
+        }
+    )
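Because the extension is declared with `optional=True`, a failed C compile does not abort the build; the package simply installs without `nucleus._native_dedup`. Callers therefore have to detect the module at import time and fall back to Python. The commit's actual fallback wiring lives in files not shown in this excerpt; a minimal sketch of the pattern, using a hypothetical `load_optional_module` helper, is:

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_module(name: str) -> Optional[ModuleType]:
    """Import `name` if it is available, otherwise return None.

    ModuleNotFoundError is a subclass of ImportError, so a missing package,
    a skipped optional extension, or a binary that fails to load all land
    in the except branch.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


native = load_optional_module("nucleus._native_dedup")
print("native extension" if native is not None else "pure-Python fallback")
```

Catching `ImportError` rather than only `ModuleNotFoundError` also covers a compiled module that exists on disk but cannot be loaded, for example because it was built for a different interpreter.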
