graph LR
base_data_labeler["base_data_labeler"]
base_model["base_model"]
character_level_cnn_model["character_level_cnn_model"]
regex_model["regex_model"]
column_name_model["column_name_model"]
data_processing["data_processing"]
data_labelers["data_labelers"]
base_data_labeler -- "delegates to" --> data_processing
base_data_labeler -- "interacts with" --> character_level_cnn_model
base_data_labeler -- "interacts with" --> regex_model
base_data_labeler -- "interacts with" --> column_name_model
base_model -- "is inherited by" --> character_level_cnn_model
base_model -- "is inherited by" --> regex_model
base_model -- "is inherited by" --> column_name_model
data_labelers -- "utilizes" --> base_data_labeler
The Data Labeling Module subsystem handles the end-to-end process of identifying and classifying sensitive or otherwise notable data elements. It orchestrates three stages: data preparation, model execution (deep learning, regex, or column name), and result processing.
base_data_labeler is the primary entry point and orchestrator for the data labeling pipeline. It manages the lifecycle of data labelers (loading, saving, and parameter validation) and coordinates the pre-processing, model execution, and post-processing steps.
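The orchestration role of base_data_labeler described above (parameter validation, then pre-processing, model execution, and post-processing in sequence) can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the class name `SketchLabeler`, the `batch_size` parameter, and the three helper functions are all hypothetical, and loading and saving are left out.

```python
from typing import Any, Callable, Dict, List, Optional


class SketchLabeler:
    """Hypothetical orchestrator: pre-process, run the model, post-process."""

    # Assumed parameter set, chosen only for illustration.
    ALLOWED_PARAMETERS = {"batch_size"}

    def __init__(
        self,
        preprocess: Callable[[List[str]], List[str]],
        model: Callable[[List[str]], List[str]],
        postprocess: Callable[[List[str], List[str]], Any],
        parameters: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._preprocess = preprocess
        self._model = model
        self._postprocess = postprocess
        self._parameters: Dict[str, Any] = {}
        self.set_params(parameters or {})

    def set_params(self, parameters: Dict[str, Any]) -> None:
        # Reject unknown keys and invalid values before storing anything.
        unknown = set(parameters) - self.ALLOWED_PARAMETERS
        if unknown:
            raise ValueError(f"Unknown parameters: {sorted(unknown)}")
        batch_size = parameters.get("batch_size", 1)
        if not isinstance(batch_size, int) or batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        self._parameters.update(parameters)

    def predict(self, data: List[str]) -> Any:
        prepared = self._preprocess(data)            # step 1: pre-processing
        raw_predictions = self._model(prepared)      # step 2: model execution
        return self._postprocess(data, raw_predictions)  # step 3: post-processing


def strip_whitespace(data: List[str]) -> List[str]:
    return [text.strip() for text in data]


def digits_or_unknown(prepared: List[str]) -> List[str]:
    return ["DIGITS" if text.isdigit() else "UNKNOWN" for text in prepared]


def pair_with_input(data: List[str], labels: List[str]) -> List[Dict[str, str]]:
    return [{"text": text, "label": label} for text, label in zip(data, labels)]


labeler = SketchLabeler(
    strip_whitespace, digits_or_unknown, pair_with_input, {"batch_size": 2}
)
result = labeler.predict([" 42 ", "abc"])
```

Passing the three stages in as callables keeps the orchestrator unaware of which model type it is driving, which matches the "interacts with" edges in the graph.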
Related Classes/Methods:
base_model is the abstract base class for all data labeling models. It manages label mappings, validates parameters, and registers subclasses, so that every model implementation exposes a consistent interface.
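One way an abstract base class like base_model could combine label mappings, parameter validation, and subclass registration is sketched below. The names `SketchBaseModel`, `EchoModel`, `get_class`, and `default_label` are hypothetical, and using `__init_subclass__` for registration is an assumption; the real module may register subclasses differently.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Type


class SketchBaseModel(ABC):
    """Hypothetical base class: label mapping, validation, subclass registry."""

    _registry: Dict[str, Type["SketchBaseModel"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Every subclass is recorded by name as soon as it is defined.
        super().__init_subclass__(**kwargs)
        SketchBaseModel._registry[cls.__name__] = cls

    def __init__(
        self, label_mapping: Dict[str, int], parameters: Optional[Dict[str, Any]] = None
    ) -> None:
        self.label_mapping = dict(label_mapping)
        self._parameters = dict(parameters or {})
        self._validate_parameters(self._parameters)

    @property
    def reverse_label_mapping(self) -> Dict[int, str]:
        return {index: label for label, index in self.label_mapping.items()}

    @classmethod
    def get_class(cls, name: str) -> Optional[Type["SketchBaseModel"]]:
        return cls._registry.get(name)

    @abstractmethod
    def _validate_parameters(self, parameters: Dict[str, Any]) -> None:
        """Raise ValueError if the parameters are not acceptable."""

    @abstractmethod
    def predict(self, data: List[str]) -> List[int]:
        """Return one label index per input item."""


class EchoModel(SketchBaseModel):
    """Toy subclass that always predicts its configured default label."""

    def _validate_parameters(self, parameters: Dict[str, Any]) -> None:
        if parameters.get("default_label") not in self.label_mapping:
            raise ValueError("default_label must be a key of label_mapping")

    def predict(self, data: List[str]) -> List[int]:
        index = self.label_mapping[self._parameters["default_label"]]
        return [index for _ in data]


model = EchoModel({"UNKNOWN": 0, "EMAIL": 1}, {"default_label": "EMAIL"})
predictions = model.predict(["a", "b"])
```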
Related Classes/Methods:
character_level_cnn_model implements a deep learning model, a character-level convolutional neural network (CNN), for data labeling. It handles model construction, training, and prediction over character embeddings, and is the component aimed at complex pattern recognition.
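A character-level model generally needs its input text turned into fixed-length sequences of integer indices before an embedding layer can look them up. The sketch below shows only that encoding step, in plain Python. The reserved indices, the vocabulary construction, and the `max_length` parameter are illustrative assumptions and are not taken from character_level_cnn_model itself.

```python
from typing import Dict, List

PAD_INDEX = 0      # assumed index reserved for padding
UNKNOWN_INDEX = 1  # assumed index for characters outside the vocabulary


def build_char_vocab(samples: List[str]) -> Dict[str, int]:
    """Assign each distinct character an index, starting after the reserved ones."""
    chars = sorted({ch for text in samples for ch in text})
    return {ch: position + 2 for position, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_length: int) -> List[int]:
    """Truncate or pad the text to max_length and map characters to indices."""
    indices = [vocab.get(ch, UNKNOWN_INDEX) for ch in text[:max_length]]
    return indices + [PAD_INDEX] * (max_length - len(indices))


vocab = build_char_vocab(["ab", "ba"])
padded = encode("abz", vocab, 5)
truncated = encode("abab", vocab, 3)
```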
Related Classes/Methods:
regex_model provides rule-based data labeling using regular expressions. It validates its configuration parameters and applies regex patterns to classify data.
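The two responsibilities named for regex_model, validating configuration and applying patterns, might look roughly like this. The class name `SketchRegexModel`, the shape of `regex_patterns` (label mapped to a list of pattern strings), the first-match-wins rule, and the two example patterns are all assumptions made for illustration.

```python
import re
from typing import Dict, List


class SketchRegexModel:
    """Hypothetical rule-based labeler: first label whose pattern matches wins."""

    def __init__(
        self, regex_patterns: Dict[str, List[str]], default_label: str = "UNKNOWN"
    ) -> None:
        self._validate(regex_patterns)
        self._compiled = {
            label: [re.compile(pattern) for pattern in patterns]
            for label, patterns in regex_patterns.items()
        }
        self._default_label = default_label

    @staticmethod
    def _validate(regex_patterns: Dict[str, List[str]]) -> None:
        if not isinstance(regex_patterns, dict) or not regex_patterns:
            raise ValueError("regex_patterns must be a non-empty dict")
        for label, patterns in regex_patterns.items():
            if not isinstance(patterns, list) or not patterns:
                raise ValueError(f"{label!r} must map to a non-empty list")
            for pattern in patterns:
                try:
                    re.compile(pattern)
                except re.error as exc:
                    raise ValueError(f"Invalid pattern for {label!r}") from exc

    def predict(self, values: List[str]) -> List[str]:
        results = []
        for value in values:
            label = self._default_label
            for candidate, patterns in self._compiled.items():
                # fullmatch requires the whole value to match the pattern.
                if any(pattern.fullmatch(value) for pattern in patterns):
                    label = candidate
                    break
            results.append(label)
        return results


regex_model = SketchRegexModel(
    {
        "EMAIL": [r"[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}"],
        "SSN": [r"\d{3}-\d{2}-\d{4}"],
    }
)
labels = regex_model.predict(["jane@example.com", "123-45-6789", "hello"])
```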
Related Classes/Methods:
column_name_model labels data based on column names. It compares column names against column name patterns to produce predictions, which makes it useful for structured data.
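A column-name comparison of the kind attributed to column_name_model can be approximated with fuzzy string matching. The sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the function names, the known-name table, and the 0.8 threshold are hypothetical, and the actual model may use a different matching method.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase and treat underscores and hyphens as spaces."""
    return name.strip().lower().replace("_", " ").replace("-", " ")


def label_column(
    column_name: str,
    known_names: Dict[str, List[str]],
    threshold: float = 0.8,
    default_label: str = "UNKNOWN",
) -> str:
    """Return the label whose known name is most similar, if similar enough."""
    best_label, best_score = default_label, 0.0
    target = normalize(column_name)
    for label, names in known_names.items():
        for known in names:
            score = SequenceMatcher(None, target, normalize(known)).ratio()
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else default_label


known = {
    "EMAIL": ["email", "email address"],
    "PHONE": ["phone", "phone number"],
}
email_label = label_column("Email_Address", known)
phone_label = label_column("phone_num", known)
other_label = label_column("zip", known)
```

Because only the header is inspected, this approach costs almost nothing per column, but it cannot help with unstructured text that has no column names.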
Related Classes/Methods:
data_processing supplies the preprocessors and postprocessors for the data labeling pipeline. It handles data transformations, format conversions (e.g., to NER format, structured/unstructured), and the processing of prediction results.
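As an illustration of the kind of format conversion data_processing performs, the sketch below collapses per-character labels into NER-style `(start, end, label)` spans. The function name, the half-open span convention, and the `"UNKNOWN"` background label are assumptions for this example rather than the module's documented output format.

```python
from typing import List, Sequence, Tuple


def char_labels_to_spans(
    char_labels: Sequence[str], background: str = "UNKNOWN"
) -> List[Tuple[int, int, str]]:
    """Merge runs of identical non-background labels into (start, end, label)."""
    spans: List[Tuple[int, int, str]] = []
    start = 0
    current = background
    # A trailing background sentinel closes any entity that runs to the end.
    for index, label in enumerate(list(char_labels) + [background]):
        if label != current:
            if current != background:
                spans.append((start, index, current))
            start, current = index, label
    return spans


text = "call 555-1234 now"
char_labels = ["UNKNOWN"] * 5 + ["PHONE"] * 8 + ["UNKNOWN"] * 4
spans = char_labels_to_spans(char_labels)
entities = [(text[start:end], label) for start, end, label in spans]
```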
Related Classes/Methods:
data_labelers provides a higher-level facade for initiating labeling processes, so that callers do not need to instantiate and use base_data_labeler directly.
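A facade such as data_labelers typically hides the choice of concrete class behind a single entry point. The following is a minimal sketch of that idea only. The two labeler classes, the `make_labeler` function, and the `"structured"` / `"unstructured"` type names are hypothetical placeholders, not the module's real API.

```python
from typing import Dict, Type


class StructuredSketchLabeler:
    """Placeholder for a labeler configured for tabular data."""

    kind = "structured"


class UnstructuredSketchLabeler:
    """Placeholder for a labeler configured for free text."""

    kind = "unstructured"


_LABELER_TYPES: Dict[str, Type] = {
    "structured": StructuredSketchLabeler,
    "unstructured": UnstructuredSketchLabeler,
}


def make_labeler(labeler_type: str):
    """Return a ready-to-use labeler without exposing the concrete class."""
    try:
        labeler_class = _LABELER_TYPES[labeler_type]
    except KeyError:
        raise ValueError(f"Unknown labeler type: {labeler_type!r}") from None
    return labeler_class()


structured_labeler = make_labeler("structured")
```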
Related Classes/Methods: