Skip to content

Add DiscreteBatchConstraint#765

Open
Scienfitz wants to merge 43 commits into
mainfrom
feature/discrete_batch_constraint
Open

Add DiscreteBatchConstraint#765
Scienfitz wants to merge 43 commits into
mainfrom
feature/discrete_batch_constraint

Conversation

@Scienfitz
Copy link
Copy Markdown
Collaborator

@Scienfitz Scienfitz commented Mar 16, 2026

Closes #583
Fixes #567

Functionality:
Adds a DiscreteBatchConstraint which works via subspace generation just like the continuous cardinality constraint, just in the discrete searchspace part. While it will quickly lead to many partitions, it is also possible to use multiple DiscreteBatchConstraints and also combine them with the cardinality constraints because there is now a more unified interface for constraints that work via subspace generation.

Notes:

  • Works identical to cardinality, except it operates on discrete space and generates discrete partitions
  • The naming (originally taken form the cardinality work) using subspace is not optimal, because there is also the concept of DiscreteSubspace / ContinuousSubspace. I changed the naming that references to creating subspaces for taking care of the constraint into partition. So we would then have things like partition_masks, n_theoretical_partitions etc
  • Discrete partitions are identified via masks, while continuous partitions remain identified by parameter names
  • The hyrbid case was now implemented because it is now possible to have discrete and continuous partition-generating constraints at the same time. There is nothing conceptual that speaks against this although the number of subspaces possible explodes due to Cartesian product of individual subspaces. The limit for this is n_max_partitions which controls the max number of subspaces considered in any case
  • The methods that create the objects that identify the partitions are coded to do this lazily and return iterators. They include arguments to do that indefinitely (replace=True) or shuffle the order (shuffle=True). The latter is used when subsamapling the spaces because with shuffle=True you can just instantiate the fist n objects in the iterator to achieve random subsampling. This has some complications, especially for the overall searchspace class which needs to have a parameter to stop the process when replicated samples are drawn too often (likely no other feasible subspace combinations).
  • botorch.py in recommenders has become very large as a result of this development. So I split it into submodules
  • _FixedNumericalContinuousParameter had a bugged property which was named is_numeric but it should be is_numerical
  • Since hybrid is now implemented, the ContinuousCardinalityConstraint can now also be applied in hybrid spaces, which fixes ContinuousCardinalityConstraint not considered in hybrid spaces #567

Discuss:

  1. n_theoretical_subspaces just computes the upper limit of subspaces in discrete and hybrid cases. Because subspaces that do not have enough candidates are internally skipped the actual number of n_feasible_subspaces might be smaller. However that cannot be easily computed and would require instantiating all masks corresponding to the subspaces. The dispatcher logic is currently based on n_theoretical_subspaces but to be 100% correct it should be n_feasible_subspaces - this is not implemented due to the mentioned difficulty. I think using n_theoretical_subspaces works as a proxy and will not be very different from n_feasible_subspaces for nearly all practical problems

@Scienfitz Scienfitz self-assigned this Mar 16, 2026
@Scienfitz Scienfitz added the new feature New functionality label Mar 16, 2026
@Scienfitz Scienfitz changed the base branch from main to refactor/subspace_constraints March 16, 2026 14:37
@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch from 4a381ec to 1c3bbef Compare March 18, 2026 16:12
@Scienfitz Scienfitz marked this pull request as ready for review March 18, 2026 16:24
Copilot AI review requested due to automatic review settings March 18, 2026 16:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new batch-level discrete constraint (DiscreteBatchConstraint) that enforces a fixed discrete parameter value across all points in a recommended batch by optimizing over discrete (and hybrid) subspaces, aligning with the existing subspace-based infrastructure used for continuous cardinality constraints.

Changes:

  • Introduces DiscreteBatchConstraint (non-filtering, batch-level) plus validation preventing duplicate batch constraints per parameter.
  • Extends discrete/hybrid subspace infrastructure (n_theoretical_subspaces, mask/config iterators, sampling helpers) and wires support into BotorchRecommender and RandomRecommender with compatibility gating for other recommenders.
  • Adds documentation and test coverage for discrete + hybrid subspace-generating constraints, replacing prior hybrid-only cardinality tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
baybe/constraints/discrete.py Adds DiscreteBatchConstraint and its subspace mask generation.
baybe/constraints/validation.py Adds validation to prevent multiple batch constraints on the same parameter.
baybe/constraints/__init__.py Exposes DiscreteBatchConstraint publicly.
baybe/searchspace/discrete.py Adds discrete subspace-generating constraint plumbing and mask enumeration/sampling helpers.
baybe/searchspace/continuous.py Renames/aligns to n_theoretical_subspaces and adds shuffled / with-replacement subspace configuration iteration.
baybe/searchspace/core.py Adds combined discrete+continuous theoretical subspace counting and combined sampling utilities for hybrid spaces.
baybe/recommenders/pure/base.py Adds supports_discrete_subspace_constraints gate and raises IncompatibilityError when unsupported.
baybe/recommenders/pure/bayesian/botorch.py Implements discrete + hybrid optimization over subspaces when batch constraints are present.
baybe/recommenders/pure/nonpredictive/sampling.py Updates RandomRecommender to respect discrete subspace-generating constraints.
docs/userguide/constraints.md Documents DiscreteBatchConstraint and recommender compatibility.
tests/constraints/test_batch_constraint.py Adds focused tests for the new constraint (behavior, validation, incompatibility, subspace counting/masks).
tests/constraints/test_subspace_constraints_hybrid.py Adds parametrized hybrid tests covering discrete batch + (discrete/continuous) cardinality constraints.
tests/constraints/test_cardinality_constraint_hybrid.py Removes older hybrid cardinality-only test file (superseded).
CHANGELOG.md Records new user-facing feature entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread baybe/recommenders/pure/bayesian/botorch.py Outdated
Comment thread docs/userguide/constraints.md Outdated
Comment thread baybe/searchspace/discrete.py
Comment thread baybe/searchspace/discrete.py
Comment thread baybe/searchspace/continuous.py Outdated
Comment thread baybe/searchspace/core.py Outdated
Comment thread baybe/searchspace/core.py
Comment thread baybe/recommenders/pure/bayesian/botorch.py Outdated
@Scienfitz Scienfitz changed the base branch from refactor/subspace_constraints to main March 30, 2026 12:57
@Scienfitz Scienfitz marked this pull request as draft March 30, 2026 13:02
@Scienfitz

This comment was marked as resolved.

@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch 4 times, most recently from dd5bbf6 to 23b3348 Compare April 1, 2026 23:37
@Scienfitz Scienfitz marked this pull request as ready for review April 2, 2026 00:12
Copy link
Copy Markdown
Collaborator

@AVHopp AVHopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have any major comments. I personally find it a bit weird that the concept of a "partition" is somehow identical in both discrete and continuous subspaces, but caused by completely different things. However, I do not see any more elegant solution to this, and this might also be personal preference.

Comment thread baybe/constraints/discrete.py Outdated
Comment thread baybe/constraints/discrete.py
Comment thread baybe/constraints/discrete.py
Comment thread baybe/constraints/discrete.py
Comment thread baybe/searchspace/discrete.py
Comment thread baybe/searchspace/discrete.py
Comment thread baybe/searchspace/discrete.py Outdated
Comment thread baybe/searchspace/discrete.py Outdated
Comment thread baybe/searchspace/discrete.py Outdated
Comment thread baybe/recommenders/pure/bayesian/botorch/discrete.py Outdated
@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch from 843793f to a188d66 Compare April 9, 2026 16:06
@Scienfitz Scienfitz added this to the 0.15.0 milestone Apr 23, 2026
@Scienfitz
Copy link
Copy Markdown
Collaborator Author

@AdrianSosic appreciate your review

@AdrianSosic AdrianSosic mentioned this pull request May 4, 2026
Copy link
Copy Markdown
Collaborator

@AdrianSosic AdrianSosic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Scienfitz, thanks for the new feature, very good to have indeed. I'm already submitting the review so that you have something to work with, but I still need to read some parts of the partitioning logic

Comment thread baybe/constraints/discrete.py Outdated
Comment thread baybe/constraints/discrete.py Outdated
Comment thread baybe/constraints/discrete.py
Comment thread baybe/constraints/discrete.py Outdated
Comment thread baybe/constraints/discrete.py Outdated
Comment thread baybe/recommenders/pure/nonpredictive/sampling.py
Comment thread baybe/searchspace/continuous.py Outdated
@property
def n_inactive_parameter_combinations(self) -> int:
"""The number of possible inactive parameter combinations."""
def n_theoretical_partitions(self) -> int:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the theoretical debate open until we've agreed on the resampling behavior for the discrete space

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please start the debate

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so now that we have arrived at the subset concept and have a clearer picture: would you still consider the theoretical necessary? Isn't it just fine to say we have n subsets that are spanned by the constraints? Or maybe give a concrete example (ideally with concrete numbers spanning the space) where you think the terminology is inappropriate.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me that was always sort of unconnected to the naming debate, but here how this property came to life:

the cardinality implementation already had such a property as its easy to calcualte the number of possuble subsets ad hoc + there are no other considerations for conti space.

When implementing for discrete spaces and batch constraint you realize that the number of subspaces is not sufficient to make a judgement (eg in dispatching logic). You actually need the number of subsets with at least M candidates, lets call it n_feasible_subsets

At the same time I did not want to implement n_feasible_subsets because it requires evaluating the size of each subset which would destroy generating them lazily.

So then I chose to make the name of the n_subsets more precise and give it a name that indicates its just an idealized number. I picked theoretical but am open to suggestions

On the other hand, and maybe thats what you're saying as well, we could just ignore this and say n_subsets is already sufficiently describing the property and doesn't need further precision naming at the small danger of a reader then not being aware of the mentioned subleties.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can follow the logic and would like to share two thoughts:

  1. I think you last paragraph summarizes it: in the end, it's just convention, right? We could say subsets are really just the sets that we are optimizing over, full stop. We've never explicitly said that there aren't potentially other sets you could build, which are however irrelevant to optimization problem. If we take this interpretation --> all good, right?
  2. In my opinion, the actual core of the debate is still this thread that you closed. To me, the collection of subsets that you discard (and that makes the difference between actual and theoretical) should not be discarded in the first place – at least not without explicitly opting out by setting some flag like allow_recommending_duplicates_within_batch. Because as long as that (non-existing) flag is set to True, you can always create an M-sized batch even if there is only 1 actual candidate left in the feasible region. So assuming that flag existed (and which we should probably add at some point), the whole debate sort of become obsolete in my opinion

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR does not implement or aim to implement the conditional duplicated sampling from batches which arises in the general recommendation context and not just for subspace-generating constraints, hence I closed that thread as I would much prefer a clean implementation of that potentially delicate topic

There is another obvious visual argument that should make you stop requesting to include more loosely-related changes into this PR:
image

Given that, this PR operates in the world where conditional duplicated sampling from batches is not present and aims to set up properties such that they work in the world this PR exists in

So in that world: is there still anything to discuss and should we rename the property? I already shared the minor disadvantage I see

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't get me wrong, I don't want you do implement more stuff in here. I just mentioned it to underpin my argument: if 1) the duplication stuff can be clearly resolved later and 2) there is a valid interpretation of the term subset that works for us --> I think it's fine if we drop the hypothetical part here and then take care of the proper handling of duplicates in a follow-up?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • do you mean theoretical?
  • if you vote to drop it we need a tie-break

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, yes, I mean the theoretical. Summon your tie breakers :D

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AVHopp @kalama-ai need a tie breaking here between the choices

  • rename n_theoretical_subsets to just n_subsets
  • keep n_theoretical_subsets

explanation why theoretical was added in the first place here

Comment thread tests/constraints/test_batch_constraint.py Outdated
Comment thread tests/constraints/test_batch_constraint.py Outdated
Comment thread tests/constraints/test_partition_constraints_hybrid.py Outdated
Comment thread docs/userguide/constraints.md Outdated
Comment thread baybe/recommenders/pure/bayesian/botorch/discrete.py
Comment thread baybe/searchspace/continuous.py Outdated
Comment thread baybe/searchspace/continuous.py Outdated
def inactive_parameter_combinations( # noqa: DOC404
self,
*,
shuffle: bool = False,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion for the shuffle argument here and in the corresponding discrete case. What I dislike about the method signature is that we have two seemingly independent arguments which are however coupled in reality (see your docstring for shuffle). This always opens the door for silent "errors", i.e. when someone provides both flags as True. We've had this countless times in other places and I urgently want to avoid these designs in the future (a la avoid invalid/meaningless configs by design).

For this situation here, we might have a very simple answer, which even simplifies the entire thing: as far as I can tell, there is no use case where we'd actually require a deterministic order, right? Also, I don't think there is a canonical default order – after all, this depends on the (arbitrary) order of constraints etc. So why not simply drop the argument altogether and just make shuffle True by default?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed: awaiting commit with suggestion

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is my suggestion: eab2277

Note that it does a few things, in fact:

  • Drops the shuffle argument and thereby resolves the problematic combination of arguments
  • Simplifies the internal logic accordingly
  • Make the entire thing more readable by separating into three logical steps: 1) apply per constraint 2) construct iterator according to input flag 3) assemble output from iterator

One important thing to note: Technically, this logic is not 100% equivalent to the previous one, since this no longer results in a flat uniform sampling but rather aggregated over groups. However, there is currently no situation where we would need a truly uniform sample, and the proposed implementation comes with significantly reduced code. If you say that uniformity is important, I think it's still possible to meet halfway, but implementing only parts of the changes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. wonder why you're glancing over this uniformity so fast, isn't it important?

  1. If its not uniform, its in some sense biased, even if we don't explore for now in detail how its baised. Being biased seems bad to me
  2. sample_subset_masks from discrete.py uses islice to select the first entries from the iterator. If its baised it will not guarantee a proper draw, I think it will even be a very badly structured draw, E.g. if group 1 shuffles to [B, A] and group 2 to [y, x], you always get (B,y), (B,x), (A,y), (A,x) — never e.g. (B,y), (A,x), (B,x), (A,y) - which doesnt make islice reasonable anymore. I remember that I specifically wanted to avoid this situation during development.

There is also a minor de-provement because infeasible samples are not removed from the pool anymore as before but appear now again

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, how about now? Added two more commits:

  1. Restored the original logic but kept the shuffle change and one logic simplification
  2. Used existing numpy helpers for the flat index computation

Copy link
Copy Markdown
Collaborator Author

@Scienfitz Scienfitz May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I am happy to keep the simplification in select_via_flat_index regardless (although I dont get why you removed the usually rated as helpful Examples/Notes)

But please take a look what your combined changes to the logic now achieved... there is almost no difference anymore. In particular nothing has become substantially shorter or simpler as claimed for your original reason to propose this change... so why do we not simply keep that tiny kwarg shuffle (even if only one of its possible values is used in the current code base)?

imo "improvements" like these are not even worth debating about...
image
image
image


also ngl I prefer the old version (left) here
image
as its more readable and uses the total variable which is now not removed

Comment thread baybe/searchspace/continuous.py Outdated
Comment thread baybe/constraints/discrete.py
Scienfitz and others added 18 commits May 7, 2026 19:00
Co-authored-by: Alexander V. Hopp <alexander.hopp@merckgroup.com>
Consistent with the existing check-return-types=False setting.
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
- Rename DiscreteBatchConstraint.get_invalid to _get_invalid to align
  with the abstract method rename introduced on main
- Add DiscreteBatchConstraint to DISCRETE_CONSTRAINTS_FILTERING_ORDER
  so build_constrained_product (introduced on main) can sort it
- Ignore BadInitialCandidatesWarning in pytest.ini; the warning fires
  non-deterministically in heavily constrained spaces regardless of
  data volume
@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch from 6224f62 to 615b581 Compare May 7, 2026 17:47
@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch from 2ff2f31 to 5b6e220 Compare May 8, 2026 15:39
@Scienfitz Scienfitz force-pushed the feature/discrete_batch_constraint branch from 5b6e220 to 86aaaf9 Compare May 8, 2026 18:37
Scienfitz and others added 5 commits May 8, 2026 21:11
Replace constraint-presence checks with n_theoretical_subsets > 0 for
the with/without-subsets dispatch. This is more general: any future
subset-generating constraint only needs to contribute to
n_theoretical_subsets, without requiring changes to the dispatch logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Constraint for batch proposal ContinuousCardinalityConstraint not considered in hybrid spaces

4 participants