Candidates Engine #787
Draft
AdrianSosic wants to merge 1 commit into
@AdrianSosic, can you please use Issues and sub-issues for this? It's kind of what they were made for, and it allows a more fine-grained, optional discussion than a PR that will not be reviewed anyway.
`SubspaceDiscrete` Refactor — Dev Branch
This PR tracks an incremental refactor of `SubspaceDiscrete` toward a more general, lazy, and composable design. Individual feature PRs target this branch and merge here in sequence; the branch merges to `main` only when the refactor is complete.
Goals
- `get_candidates(...)` with exclusion support and lazy `CandidateResult`.
- `CandidatePolicy` as a two-level class hierarchy supporting both top-down filtering and bottom-up generative strategies, owned by the recommender, injected into `get_candidates`.
Problems with the current design
- `parameters`, `constraints`, `exp_rep`, `comp_rep` are four independent attributes that must remain mutually consistent (`exp_rep` = product filtered by constraints; `comp_rep` = encoding of `exp_rep`). Consistency is maintained by convention only: nothing in the type prevents drift via `evolve`, manual construction, or future refactoring.
- The `transform` method uses `comp_rep[self.comp_rep.columns]` to anchor the column set. Investigation confirms this is purely defensive (no subspace-level column modification occurs), but it's fragile and unnecessary once encoding is purely parameter-level.
Note: the existing `allow_recommending_*` flags and `toggle_discrete_candidates` on `Campaign` are not problems. They provide legitimate trajectory/history-based control and stay exactly as they are. Their implementation simply moves from `FilteredSubspaceDiscrete` mask manipulation to building an `exclude` frame passed to `get_candidates()`.
Target design
- `Candidates` is a protocol with `to_lazy(parameters) -> nw.LazyFrame` and an `is_enumerable` property. Concrete types: `ProductCandidates(constraints)` (workhorse), `TableCandidates(frame)` (user-supplied tables), `GeneratedCandidates(generator)` (escape hatch for non-enumerable spaces). Non-enumerable candidates raise `InfiniteSpaceError` from `to_lazy()`.
- `ProductCandidates`. Each `DiscreteConstraint` subclass implements `to_narwhals_expr(parameters) -> nw.Expr`. No predicate AST, just a conjunctive list. `subspace.constraints` is a simple accessor via `getattr(self.candidates, "constraints", ())`.
- `Parameter` via a new `to_computational_expr(col)` method that returns Narwhals expression(s). A free `encode(frame, parameters)` composes them. No separate `Encoder` class.
- `get_candidates(n_candidates, *, exclude, policy, rng)` is the single entry point. Two orthogonal axes:
  - `policy`: selection/construction strategy for candidate sets (owned by the recommender)
  - `exclude`: trajectory-based row removal (owned by Campaign), applied after the policy
- `CandidateResult`: lazy, cached, backend-aware.
Key design choices
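As a reference point for the `Candidates` protocol from the target design, here is a minimal standard-library sketch. It is not the planned implementation: a plain Python iterator stands in for `nw.LazyFrame`, row predicates stand in for `to_narwhals_expr(parameters)`, and `parameters` is assumed to be a simple mapping from name to allowed values.

```python
from dataclasses import dataclass
from itertools import islice, product
from typing import Callable, Dict, Iterator, Mapping, Protocol, Sequence, Tuple

Row = Dict[str, object]
# Assumption: a constraint is a row predicate (stand-in for a Narwhals expression).
Constraint = Callable[[Row], bool]


class InfiniteSpaceError(Exception):
    """Raised when a non-enumerable candidate set is asked to enumerate."""


class Candidates(Protocol):
    @property
    def is_enumerable(self) -> bool: ...

    def to_lazy(self, parameters: Mapping[str, Sequence[object]]) -> Iterator[Row]: ...


@dataclass(frozen=True)
class ProductCandidates:
    """Cartesian product of parameter values, filtered by a conjunctive list."""

    constraints: Tuple[Constraint, ...] = ()

    @property
    def is_enumerable(self) -> bool:
        return True

    def to_lazy(self, parameters):
        names = list(parameters)
        # Rows are produced one at a time; nothing is materialized up front.
        for values in product(*parameters.values()):
            row = dict(zip(names, values))
            # Conjunctive semantics: every constraint must hold.
            if all(constraint(row) for constraint in self.constraints):
                yield row


@dataclass(frozen=True)
class GeneratedCandidates:
    """Escape hatch for non-enumerable spaces (generator signature is hypothetical)."""

    generator: Callable[[], Row]

    @property
    def is_enumerable(self) -> bool:
        return False

    def to_lazy(self, parameters):
        raise InfiniteSpaceError("Non-enumerable candidates cannot be enumerated.")


parameters = {"temperature": [10, 20, 30], "solvent": ["water", "ethanol"]}
candidates = ProductCandidates(
    constraints=(
        lambda row: row["temperature"] < 30,
        lambda row: (row["temperature"], row["solvent"]) != (10, "ethanol"),
    )
)
first_two = list(islice(candidates.to_lazy(parameters), 2))
```

Only as many product rows as needed to fill `first_two` are generated, which is the property the lazy design is after.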
- `parameters` is schema, `candidates` is extent. `comp_rep` and `constraints` become derived views; there's no cached representation that can drift.
- `ProductCandidates` with `to_narwhals_expr()` as the extension point. A `Predicate` protocol with `Or`/`Not` can be added non-breakingly in release 2 by widening the constraint tuple type.
- `exclude: DataFrame` instead of predicate objects. Simplest possible interface for "remove these rows." Campaign builds the combined exclusion frame from its own state. No `exclude_measured`/`exclude_pending`/`exclude_recommended` kwargs.
- `CandidateResult` instead of `tuple[DataFrame, DataFrame]`. Lazy (nothing computed until asked), cached (no recomputation), backend-aware (returns same type user passed in; falls back to `baybe.Settings.default_backend`).
- `CandidatePolicy` lives on the recommender, not on `SubspaceDiscrete` or `Campaign`. Three responsibilities, three owners: subspace defines the space (accepts policy at call time), Campaign owns exclusion (trajectory control), the recommender owns selection strategy. Different recommenders need different policies for the same space.
- `(Candidates, parameters)`, not a `LazyFrame`. This gives policies full access to the space definition, enabling both top-down and bottom-up strategies. A `FilteringCandidatePolicy` base class uses the template method pattern to shield concrete implementations from `Candidates`; they only implement `_filter(lazy_frame, ...)`. A `GenerativeCandidatePolicy` base class provides full access for bottom-up candidate construction from infinite/non-enumerable spaces.
- `FilteringCandidatePolicy`, `GenerativeCandidatePolicy`). These represent genuinely different operations: filtering is a frame transformation (LazyFrame → LazyFrame), generation is candidate construction (SpaceDefinition → LazyFrame). The template method on `FilteringCandidatePolicy` handles `Candidates` interaction (enumeration, enumerability validation); the concrete subclass only sees a `LazyFrame`. This eliminates coupling. `GenerativeCandidatePolicy` implements `select()` directly with full space access. The hierarchy makes the infinite-space contract statically checkable via `isinstance`.
- `FilteringCandidatePolicy.select()` raises `InfiniteSpaceError` on non-enumerable candidates. `get_candidates()` only checks the degenerate "no policy + non-enumerable" case. No redundant validation needed.
- `ChainedPolicy` with construction-time ordering validation. Only the first policy may be generative; all subsequent must be `FilteringCandidatePolicy`. The chain calls `_filter` directly on positions 2+. Validated at construction time: a `TypeError` when building the chain beats a runtime crash during recommendation.
- `RandomSamplingPolicy` on a large-but-finite 10⁹-row space is legitimate. The restriction is one-directional: filtering requires enumerable (enforced), generation works on anything.
- `Parameter.sample(n, rng)` as the extension point for generative policies. Each parameter knows its domain, so `sample` is a natural extension. Finite parameters draw from enumerated values; infinite parameters (e.g., `SubstanceParameter` with alphabet + length) generate random valid values.
- `ProductCandidates` carries constraints. Ad-hoc filtering across all implementations is handled by `get_candidates(exclude=...)`.
- `Parameter`. Each subclass already owns its encoding rule. Statefulness (decorrelation, constant-column-dropping) is encapsulated in the parameter's `comp_df` cached property. The subspace never modifies encoded columns.
- `allow_*` flags stay on Campaign. They are trajectory-based controls, not a design smell. Implementation moves from mask-based `FilteredSubspaceDiscrete` to building an `exclude` frame.
PRs
PR 0 — Groundwork
Audit, dependency setup, and performance validation. No behavior change.
- `narwhals` and `polars` as hard dependencies.
- `subspace.exp_rep`/`comp_rep` → `MIGRATION_AUDIT.md` (drives PR 4).
- `Parameter` subclass's current encoding method → `PARAMETER_ENCODING_AUDIT.md` (drives PR 6).
- `benchmarks/searchspace/` directory.
PR 1 — Inert new abstractions (including `CandidatePolicy` hierarchy)
Add the new types without wiring them in.
- `baybe/searchspace/candidates.py`: `Candidates` protocol with `to_lazy(parameters)` and `is_enumerable` property, `ProductCandidates(constraints)`, `TableCandidates(frame)`, `GeneratedCandidates(generator)`.
- `to_narwhals_expr(parameters) -> nw.Expr` to every existing `DiscreteConstraint` subclass.
- `baybe/searchspace/result.py`: `CandidateResult` with `.exp_rep(backend=)` and `.comp_rep(backend=)`.
- `baybe/searchspace/policy.py`: `CandidatePolicy` protocol with `select(candidates: Candidates, parameters, n_candidates, rng) -> nw.LazyFrame`.
- `FilteringCandidatePolicy` ABC, template method: `select()` validates enumerability + calls `to_lazy()`, delegates to abstract `_filter(lazy_frame, ...)`. Concrete implementations decoupled from `Candidates`.
- `GenerativeCandidatePolicy` ABC: `select()` receives full space definition, constructs candidates bottom-up.
- `TakeFirstPolicy`, `RandomSubsamplePolicy`, `HashSubsamplePolicy`, `FarthestPointPolicy`.
- `RandomSamplingPolicy`.
- `ChainedPolicy(policies)`: sequential composition with construction-time validation (all but first must be `FilteringCandidatePolicy`).
- `sample(n, rng)` method to each `Parameter` subclass.
- No `SubspaceDiscrete` changes, no recommender changes.
PR 2 — Dual-storage in `SubspaceDiscrete`
Add `candidates` attribute alongside existing fields; both representations populated.
- `candidates: Candidates | None` attribute.
- `ProductCandidates`).
- `__attrs_post_init__` reverse-engineers `ProductCandidates` for raw constructor calls (temporary; removed in PR 8).
- `_eager_materialize` wraps existing construction logic.
- `exp_rep`/`comp_rep` to pre-PR-2.
PR 3 — `get_candidates()` with policy support
Establish the single public entry point with the full two-axis signature. Implementation still eager.
- `get_candidates(n_candidates, *, exclude, policy, rng)` returning `CandidateResult`.
- `(self.candidates, self.parameters)`: full space definition.
- `fuzzy_row_match`), applied after policy.
- `InfiniteSpaceError`.
- `FilteringCandidatePolicy` on non-enumerable space → `InfiniteSpaceError` (self-enforced by the policy).
- `GenerativeCandidatePolicy` on non-enumerable space → constructs valid candidates.
PR 4 — Wire policy into recommenders; migrate callers
Add `candidate_policy` to recommenders. Migrate internal callers from `exp_rep`/`comp_rep` to `get_candidates()`. Split into sub-PRs.
- `candidate_policy: CandidatePolicy | None = None` to recommender base class. Pass through to `get_candidates(policy=self.candidate_policy)` in `_recommend_with_discrete_parts`. Default `None` = no behavioral change.
- `RandomRecommender`, `FPSRecommender`).
- `Campaign`: consolidates `allow_recommending_*` flags + `toggle_discrete_candidates` into a single `exclude` frame per `recommend()` call. Passes recommender's `candidate_policy` through.
- `MIGRATION_AUDIT.md`.
By end of PR 4: zero internal references to `exp_rep`/`comp_rep`; `candidate_policy` configurable on recommenders; `FilteredSubspaceDiscrete` no longer used internally; no public API break.
PR 5 — Deprecate old attributes
Mark `exp_rep`/`comp_rep` deprecated; update everything user-facing.
- `exp_rep`/`comp_rep` become properties emitting `DeprecationWarning`.
- `constraints` becomes a read-only property delegating to `self.candidates.constraints` (not deprecated).
- `examples/` and `docs/userguide/` updated to new patterns (including `candidate_policy` in large-space and infinite-space examples).
- `pytest -W error::DeprecationWarning`.
PR 6 — Lazy backend (the big one)
Switch internals to Narwhals + Polars lazy evaluation. Can split into 6a (lazy candidates) and 6b (per-parameter encoding migration).
- `ProductCandidates.to_lazy` (with `is_enumerable` check) / `TableCandidates.to_lazy` / `GeneratedCandidates.to_lazy`; port exclusion logic to Narwhals (anti-join); make every `DiscreteConstraint.to_narwhals_expr` Narwhals-native; drop `BAYBE_DEACTIVATE_POLARS`.
- `to_computational_expr(col)` to every `Parameter` subclass; implement free `encode(frame, parameters)`; keep old pandas methods as deprecated shims.
- `_exp_rep`/`_comp_rep` become `cached_property`; `get_candidates` operates on the lazy frame end-to-end, returns lazy `CandidateResult`. Filtering policies now operate on truly lazy frames. Generative policies construct candidates without attempting enumeration.
- `RandomSubsamplePolicy` on 10⁹-row product space completes in bounded memory.
- `RandomSamplingPolicy` on non-enumerable space completes in O(n) time/memory.
PR 7 — Deprecate old attributes
If deprecation was already completed in PR 5, this is a no-op. Allows flexibility in release cadence — deprecation can happen before or after the lazy backend switch.
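The anti-join exclusion that PR 6 ports to Narwhals can be illustrated with a small standard-library sketch. This is an assumption-laden stand-in, not the planned code: rows are plain dictionaries streamed from an iterator, matching is exact (unlike the current `fuzzy_row_match` path), and the helper name `anti_join` and its `on` argument are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Sequence

Row = Dict[str, object]


def anti_join(rows: Iterable[Row], exclude: Iterable[Row], on: Sequence[str]) -> Iterator[Row]:
    """Yield the rows whose key tuple does not appear in `exclude`.

    Only the exclusion keys are held in memory; candidate rows are streamed.
    Key values must be hashable.
    """
    excluded_keys = {tuple(row[key] for key in on) for row in exclude}
    for row in rows:
        if tuple(row[key] for key in on) not in excluded_keys:
            yield row


# A lazily generated 3 x 2 candidate space.
space = ({"x": x, "y": y} for x in range(3) for y in "ab")
# Rows a campaign might exclude, e.g. already measured points.
measured = [{"x": 0, "y": "a"}, {"x": 2, "y": "b"}]
remaining = list(anti_join(space, measured, on=("x", "y")))
```

Memory use depends on the size of the exclusion frame rather than on the size of the candidate space, which is what makes this shape attractive for very large products.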
PR 8 — Major version cleanup (baybe 2.0)
Remove deprecations; arrive at the final two-attribute design.
- `exp_rep`/`comp_rep` properties; remove `constraints` from constructor.
- `__attrs_post_init__` reverse-engineering hack.
- `Parameter` subclasses.
- `transform` method's `comp_rep[self.comp_rep.columns]` hack.
- `FilteredSubspaceDiscrete` (subsumed by `get_candidates(exclude=...)`).
- `SubspaceDiscrete(parameters, candidates)`.
Cross-cutting
- `candidate_policy=None` (the default) preserves existing behavior exactly: no policy means `head(n)` fallback.
- `searchspace/`. Includes non-enumerable space benchmarks for generative policies.
- `SerialMixin` pattern. `CandidatePolicy` subclasses are `attrs`-defined; `ChainedPolicy` serializes as a list. Parameter `comp_df` column sets serialized to prevent decorrelation drift.
- `FilteringCandidatePolicy` raises on non-enumerable, `ChainedPolicy` rejects non-filtering policies in non-first positions at construction time.
To keep in mind
- `FilteredSubspaceDiscrete` becomes obsolete (removed in PR 8).
- `comp_rep[self.comp_rep.columns]`) disappears: encoding is purely parameter-level; no subspace-level column anchoring needed.
- `allow_*` flags remain on Campaign: they are trajectory controls, not a design smell.
- `GenerativeCandidatePolicy`. This is enforced by `get_candidates()` (raises without policy) and `FilteringCandidatePolicy.select()` (raises on non-enumerable candidates). Error messages guide users to the correct fix.
- `Parameter.sample(n, rng)` is the extension point for generative policies: each parameter must know how to produce random valid values from its domain.
- `Candidates` implementations (e.g., `SimplexCandidates`) should be added when benchmarks show the generic `ProductCandidates` cross-join path is insufficient for heavily-constrained spaces.
Future improvements
- `Or`/`Not`: Widen `ProductCandidates.constraints` to accept a `Predicate` protocol; add `Or`, `Not`, `RawExprPredicate`. Non-breaking extension.
- `AcquisitionPrescreenPolicy`: A materializing filtering policy using surrogate inference to pre-filter by expected acquisition value. Requires surrogate access; separate design discussion.
- `SimplexCandidates`: Preserve smart incremental enumeration from `from_simplex`.
- `UnionCandidates`: Disjoint union for transfer learning / task parameter spaces.
- `LatinHypercubePolicy`, `EvolutionaryPolicy`, `GrammarGuidedPolicy`: more sophisticated bottom-up construction strategies. The `GenerativeCandidatePolicy` base class and `Parameter.sample()` provide the extension points.
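For orientation, the two-level policy hierarchy and the construction-time check in `ChainedPolicy` could be sketched as follows. This is an illustrative standard-library sketch under stated assumptions, not the planned implementation: iterators of dictionaries stand in for `nw.LazyFrame`, `parameters` is assumed to be a mapping from name to a `sample(n, rng)` callable (standing in for `Parameter.sample`), and `NonEnumerableSpace` is a hypothetical helper.

```python
import random
from abc import ABC, abstractmethod
from itertools import islice


class InfiniteSpaceError(Exception):
    """Raised when enumeration is requested on a non-enumerable space."""


class CandidatePolicy(ABC):
    @abstractmethod
    def select(self, candidates, parameters, n_candidates, rng):
        """Return an iterator of candidate rows."""


class FilteringCandidatePolicy(CandidatePolicy):
    def select(self, candidates, parameters, n_candidates, rng):
        # Template method: validate enumerability, enumerate, then delegate.
        if not candidates.is_enumerable:
            raise InfiniteSpaceError("Filtering requires an enumerable space.")
        return self._filter(candidates.to_lazy(parameters), n_candidates, rng)

    @abstractmethod
    def _filter(self, rows, n_candidates, rng):
        """Transform a row stream; never sees the candidates object."""


class GenerativeCandidatePolicy(CandidatePolicy):
    """Base class for policies that implement select() with full space access."""


class TakeFirstPolicy(FilteringCandidatePolicy):
    def _filter(self, rows, n_candidates, rng):
        return islice(rows, n_candidates)


class RandomSamplingPolicy(GenerativeCandidatePolicy):
    def select(self, candidates, parameters, n_candidates, rng):
        # Build rows bottom-up from per-parameter samples; no enumeration.
        columns = {name: sample(n_candidates, rng) for name, sample in parameters.items()}
        return (dict(zip(columns, values)) for values in zip(*columns.values()))


class ChainedPolicy(CandidatePolicy):
    def __init__(self, policies):
        policies = tuple(policies)
        if not policies:
            raise ValueError("At least one policy is required.")
        # Only the first policy may be generative.
        for position, policy in enumerate(policies[1:], start=2):
            if not isinstance(policy, FilteringCandidatePolicy):
                raise TypeError(f"Policy at position {position} must be filtering.")
        self.policies = policies

    def select(self, candidates, parameters, n_candidates, rng):
        first, *rest = self.policies
        rows = first.select(candidates, parameters, n_candidates, rng)
        for policy in rest:
            # Later policies filter the stream directly.
            rows = policy._filter(rows, n_candidates, rng)
        return rows


class NonEnumerableSpace:
    """Hypothetical stand-in for non-enumerable candidates."""

    is_enumerable = False

    def to_lazy(self, parameters):
        raise InfiniteSpaceError("Cannot enumerate a non-enumerable space.")


parameters = {"x": lambda n, rng: [rng.random() for _ in range(n)]}
chain = ChainedPolicy([RandomSamplingPolicy(), TakeFirstPolicy()])
rows = list(chain.select(NonEnumerableSpace(), parameters, 3, random.Random(0)))
```

Reversing the order, `ChainedPolicy([TakeFirstPolicy(), RandomSamplingPolicy()])`, fails with a `TypeError` when the chain is built, before any recommendation runs.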