describe_dataset_in_correlated_attribute_mode doesn't work in Python 3.11 #40

@artemgur

  • DataSynthesizer version: 0.1.11 (latest)
  • Python version: 3.11
  • Operating System: Windows
  • Pandas version: 1.5.3

Description

In Python 3.11, describe_dataset_in_correlated_attribute_mode raises a ValueError. In Python 3.10, the same code with the same dependency versions works correctly.

At the same time, describe_dataset_in_independent_attribute_mode and describe_dataset_in_random_mode work correctly in Python 3.11.

I am using Pandas 1.5.3 rather than the latest 2.0.3 because describe_dataset_in_correlated_attribute_mode also fails with Pandas 2.0.3 (I will open a separate issue for that later).

What I Did

from DataSynthesizer.DataDescriber import DataDescriber

describer = DataDescriber()
describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data, k=2, epsilon=0)
describer.save_dataset_description_to_file(description_file)

When the code is run, the following happens:

  1. "================ Constructing Bayesian Network (BN) ================" is printed (at least in Jupyter Notebook).
  2. The following exception is raised: "ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

Traceback:

ValueError                                Traceback (most recent call last)
Cell In[22], line 8
      6 describer = DataDescriber()
      7 #TODO k parameter
----> 8 describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data,
      9                                                         k=2,
     10                                                         epsilon=0)
     11                                                         #seed=random_state,
     12                                                         #attribute_to_is_categorical=categorical_attributes)
     13 describer.save_dataset_description_to_file(description_file)

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\DataDescriber.py:177, in DataDescriber.describe_dataset_in_correlated_attribute_mode(self, dataset_file, k, epsilon, attribute_to_datatype, attribute_to_is_categorical, attribute_to_is_candidate_key, categorical_attribute_domain_file, numerical_attribute_ranges, seed)
    174 if self.df_encoded.shape[1] < 2:
    175     raise Exception("Correlated Attribute Mode requires at least 2 attributes(i.e., columns) in dataset.")
--> 177 self.bayesian_network = greedy_bayes(self.df_encoded, k, epsilon / 2, seed=seed)
    178 self.data_description['bayesian_network'] = self.bayesian_network
    179 self.data_description['conditional_probabilities'] = construct_noisy_conditional_distributions(
    180     self.bayesian_network, self.df_encoded, epsilon / 2)

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\lib\PrivBayes.py:145, in greedy_bayes(dataset, k, epsilon, seed)
    142 attr_to_is_binary = {attr: dataset[attr].unique().size <= 2 for attr in dataset}
    144 print('================ Constructing Bayesian Network (BN) ================')
--> 145 root_attribute = random.choice(dataset.columns)
    146 V = [root_attribute]
    147 rest_attributes = list(dataset.columns)

File C:\Python311\Lib\random.py:369, in Random.choice(self, seq)
    367 def choice(self, seq):
    368     """Choose a random element from a non-empty sequence."""
--> 369     if not seq:
    370         raise IndexError('Cannot choose from an empty sequence')
    371     return seq[self._randbelow(len(seq))]

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\pandas\core\indexes\base.py:3188, in Index.__nonzero__(self)
   3186 @final
   3187 def __nonzero__(self) -> NoReturn:
-> 3188     raise ValueError(
   3189         f"The truth value of a {type(self).__name__} is ambiguous. "
   3190         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   3191     )

ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
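
Judging from the traceback, the failure comes from the `if not seq:` check in Python 3.11's random.choice, which evaluates the pandas Index in dataset.columns as a boolean. A possible workaround, which I have not verified against the full library, would be to convert the columns to a plain list before choosing. Below is a minimal sketch of that idea. It uses no pandas: `AmbiguousIndex` is a hypothetical stand-in that only mimics an Index refusing truth-value tests, and `pick_root_attribute` is an illustrative helper name, not part of DataSynthesizer.

```python
import random


class AmbiguousIndex:
    """Hypothetical stand-in for pandas.Index: a sequence whose truth value raises."""

    def __init__(self, labels):
        self._labels = list(labels)

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, position):
        return self._labels[position]

    def __bool__(self):
        # Mirrors the behavior seen in the traceback (Index.__nonzero__).
        raise ValueError("The truth value of a AmbiguousIndex is ambiguous.")


def pick_root_attribute(columns):
    """Choose a random column name without evaluating `columns` as a boolean."""
    # list() iterates over the labels, so any truthiness check inside
    # random.choice sees a plain list instead of the index-like object.
    return random.choice(list(columns))


columns = AmbiguousIndex(["age", "education", "income"])
root_attribute = pick_root_attribute(columns)
print(root_attribute)
```

If this reasoning is right, the equivalent change in greedy_bayes would be `random.choice(list(dataset.columns))`, but that is an assumption on my part and not a tested patch.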
