Commit 48aa469

voorhs and Samoed authored
Feat/hashing vectorizer (#268)
* add embedder
* add test embedder config
* update tests
* update assertions in some tests
* Feat/extras matrix for tests (#269)
* add dependencies matrix
* add pipeline tests for each scoring module
* bug fix
* upd extras installation in gh actions
* upd extras in mypy ci
* fix issues with dependencies for bert scorer test
* fix gcn scorer tests
* fix typing errors
* upd assertions about catboost predictions
* upd ci with presets tests
* try to fix unit tests
* Feat/extras matrix for tests (#270)
* add dependencies matrix
* add pipeline tests for each scoring module
* bug fix
* upd extras installation in gh actions
* upd extras in mypy ci
* fix issues with dependencies for bert scorer test
* fix gcn scorer tests
* fix typing errors
* upd assertions about catboost predictions
* upd ci with presets tests
* try to fix unit tests
* try to fix mypy
* fix catboost test
* skip incremental evolver tests for now
* upd callback test
* run ruff
* add missing transformers dependency to embedder ci
* update test for tunable decision module
* move callback testing to the env with sentence-transformers installed
* fix embedder tests
* remove catboost from classic-medium
* lint
* small fixes
* lint

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
1 parent 083ee5c commit 48aa469

45 files changed

Lines changed: 826 additions & 75 deletions

.github/workflows/reusable-test.yaml

Lines changed: 7 additions & 2 deletions
@@ -7,6 +7,11 @@ on:
         required: true
         type: string
         description: 'Command to run tests'
+      extras:
+        required: false
+        type: string
+        default: ''
+        description: 'Space-separated --extra flags (e.g., "--extra transformers --extra peft")'
 
 jobs:
   test:
@@ -15,7 +20,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ ubuntu-latest ]
-        python-version: [ "3.10", "3.11", "3.12", "3.13", ]
+        python-version: [ "3.10", "3.11", "3.12", "3.13", "3.14"]
         include:
           - os: windows-latest
             python-version: "3.10"
@@ -40,7 +45,7 @@ jobs:
       - name: Install dependencies for Python ${{ matrix.python-version }}
         run: |
           uv python pin ${{ matrix.python-version }}
-          uv sync --group test --extra catboost --extra peft --extra transformers --extra sentence-transformers --extra openai
+          uv sync --group test ${{ inputs.extras }}
 
       - name: Run tests
         run: |

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+name: test embedder
+
+on:
+  push:
+    branches:
+      - dev
+  pull_request:
+
+jobs:
+  test:
+    uses: ./.github/workflows/reusable-test.yaml
+    with:
+      test_command: pytest -n auto tests/embedder/ tests/callback/
+      extras: --extra sentence-transformers --extra transformers
+

.github/workflows/test-inference.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ jobs:
     uses: ./.github/workflows/reusable-test.yaml
     with:
       test_command: pytest -n auto tests/pipeline/test_inference.py
+      extras: --extra catboost --extra peft --extra transformers --extra sentence-transformers

.github/workflows/test-optimization.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ jobs:
     uses: ./.github/workflows/reusable-test.yaml
     with:
       test_command: pytest -n auto tests/pipeline/test_optimization.py
+      extras: --extra catboost --extra peft --extra transformers --extra sentence-transformers

.github/workflows/test-presets.yaml

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ jobs:
     uses: ./.github/workflows/reusable-test.yaml
     with:
       test_command: pytest -n auto tests/pipeline/test_presets.py
+      extras: --extra catboost --extra peft --extra transformers --extra sentence-transformers

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+name: test scorers
+
+on:
+  push:
+    branches:
+      - dev
+  pull_request:
+
+jobs:
+  test:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ ubuntu-latest ]
+        python-version: [ "3.10", "3.11", "3.12" ]
+        dependency-group: [ "base", "transformers", "peft", "catboost" ]
+        include:
+          - os: windows-latest
+            python-version: "3.10"
+            dependency-group: "base"
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Cache Hugging Face
+        id: cache-hf
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: ${{ runner.os }}-hf
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          version: "0.8.8"
+
+      - name: Install dependencies for Python ${{ matrix.python-version }}
+        run: |
+          uv python pin ${{ matrix.python-version }}
+          uv sync --group test ${{ matrix.dependency-group != 'base' && format('--extra {0}', matrix.dependency-group) || '' }}
+
+      - name: Run scorer tests
+        run: |
+          uv run pytest -n auto tests/modules/scoring/
+
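
The `uv sync` step in the scorer workflow picks its `--extra` flag from the matrix entry: nothing for `base`, `--extra <group>` for every other value. A small Python sketch of how that expression resolves for each `dependency-group`; the helper names `extras_flags` and `sync_command` are made up for illustration and are not part of the repository.

```python
def extras_flags(dependency_group: str) -> str:
    """Mirror the workflow expression: no flag for 'base', '--extra <group>' otherwise."""
    return "" if dependency_group == "base" else f"--extra {dependency_group}"


def sync_command(dependency_group: str) -> str:
    """Build the install command the workflow step would end up running."""
    raw = f"uv sync --group test {extras_flags(dependency_group)}"
    # split/join removes the trailing space left behind when the flag string is empty
    return " ".join(raw.split())


if __name__ == "__main__":
    # Same values as the `dependency-group` matrix axis in the workflow
    for group in ["base", "transformers", "peft", "catboost"]:
        print(f"{group:>12}: {sync_command(group)}")
```

The reusable workflow takes the same kind of string through its `extras` input, so callers such as the inference and presets jobs pass several `--extra` flags at once instead of one per matrix entry.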

.github/workflows/unit-tests.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ jobs:
   test:
     uses: ./.github/workflows/reusable-test.yaml
     with:
-      test_command: pytest -n auto --ignore=tests/nodes --ignore=tests/pipeline
+      test_command: pytest -n auto --ignore=tests/modules/scoring/ --ignore=tests/pipeline --ignore=tests/embedder --ignore=tests/callback

src/autointent/_dump_tools/unit_dumpers.py

Lines changed: 1 addition & 1 deletion
@@ -292,7 +292,7 @@ def load(path: Path, **kwargs: Any) -> PreTrainedTokenizer | PreTrainedTokenizer
         require("transformers", extra="transformers")
         import transformers
 
-        return transformers.AutoTokenizer.from_pretrained(path)
+        return transformers.AutoTokenizer.from_pretrained(path)  # type: ignore[no-any-return,no-untyped-call]
 
     @classmethod
     def check_isinstance(cls, obj: Any) -> bool:  # noqa: ANN401

src/autointent/_presets/classic-medium.yaml

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@ search_space:
         k:
           low: 1
           high: 20
-      - module_name: catboost
       - module_name: sklearn
         clf_name: [RandomForestClassifier]
         n_estimators: [150]

src/autointent/_wrappers/embedder/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -2,12 +2,14 @@
 
 from .base import BaseEmbeddingBackend
 from .embedder import Embedder
+from .hashing_vectorizer import HashingVectorizerEmbeddingBackend
 from .openai import OpenaiEmbeddingBackend
 from .sentence_transformers import SentenceTransformerEmbeddingBackend
 
 __all__ = [
     "BaseEmbeddingBackend",
     "Embedder",
+    "HashingVectorizerEmbeddingBackend",
     "OpenaiEmbeddingBackend",
     "SentenceTransformerEmbeddingBackend",
 ]
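
The body of `HashingVectorizerEmbeddingBackend` is not part of the visible diff, so its actual implementation is unknown here. As a rough illustration of the hashing trick the name suggests, below is a stdlib-only sketch that maps text to a fixed-size vector without fitting a vocabulary. The function `hash_embed` and its `n_features` parameter are hypothetical and are not taken from autointent.

```python
import hashlib
import math


def hash_embed(text: str, n_features: int = 16) -> list[float]:
    """Hypothetical sketch: embed text with the hashing trick, then L2-normalise."""
    vec = [0.0] * n_features
    for token in text.lower().split():
        # A cryptographic digest gives a hash that is stable across processes,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        # A pseudo-random sign makes colliding tokens partly cancel instead of piling up
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An all-zero vector (empty text or full cancellation) is returned unchanged
    return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    print(hash_embed("book a table for two"))
```

Because no model weights or vocabulary are needed, an embedder of this kind can run in test environments that install neither `sentence-transformers` nor `transformers`.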
