graph LR
BaseData["BaseData"]
CSVData["CSVData"]
JSONData["JSONData"]
ParquetData["ParquetData"]
TextData["TextData"]
GraphData["GraphData"]
StructuredDataMixin["StructuredDataMixin"]
filepath_or_buffer["filepath_or_buffer"]
CSVData -- "implements" --> BaseData
CSVData -- "uses" --> filepath_or_buffer
JSONData -- "implements" --> BaseData
JSONData -- "inherits from" --> StructuredDataMixin
JSONData -- "uses" --> filepath_or_buffer
ParquetData -- "implements" --> BaseData
ParquetData -- "inherits from" --> StructuredDataMixin
ParquetData -- "uses" --> filepath_or_buffer
TextData -- "implements" --> BaseData
TextData -- "uses" --> filepath_or_buffer
GraphData -- "implements" --> BaseData
GraphData -- "uses" --> filepath_or_buffer
The Data Ingestion & Preprocessing subsystem of the DataProfiler project standardizes diverse raw data into a structured format for subsequent analysis.
BaseData serves as the abstract base class, establishing a standardized interface (data(), get_batch_generator(), reload()) for all concrete data readers. It ensures a consistent output contract for the initial stage of the data processing pipeline, and it embodies the 'Pipeline/Workflow' pattern by defining the entry point and common interface for data ingestion.
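The interface described above can be sketched as an abstract base class. This is a minimal illustration only: the class names `BaseDataSketch` and `ListData`, the `_load` hook, the lazy loading, and every method signature are assumptions made for the example, and the real BaseData may differ (for instance in how `data` is exposed or which arguments `reload` accepts).

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, List


class BaseDataSketch(ABC):
    """Hypothetical stand-in for BaseData; not the library's actual class."""

    def __init__(self, source: Any) -> None:
        self._source = source
        self._data: Any = None  # populated lazily on first access

    @abstractmethod
    def _load(self) -> List[Any]:
        """Format-specific loading, supplied by each concrete reader (assumed hook)."""

    def data(self) -> List[Any]:
        # Load once, then reuse the cached records.
        if self._data is None:
            self._data = self._load()
        return self._data

    def get_batch_generator(self, batch_size: int) -> Iterator[List[Any]]:
        # Yield consecutive slices of at most batch_size records.
        records = self.data()
        for start in range(0, len(records), batch_size):
            yield records[start:start + batch_size]

    def reload(self, source: Any) -> None:
        # Point the reader at a new source and drop the cached records.
        self._source = source
        self._data = None


class ListData(BaseDataSketch):
    """Toy reader whose source is already a list of records."""

    def _load(self) -> List[Any]:
        return list(self._source)


reader = ListData([1, 2, 3, 4, 5])
batches = list(reader.get_batch_generator(batch_size=2))
reader.reload([10, 20])
reloaded = reader.data()
```

Because every reader exposes the same three methods, downstream code can batch or reload data without knowing which format produced it.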
Related Classes/Methods:
CSVData is a concrete implementation of BaseData that reads, parses, and performs initial preprocessing of CSV data. It handles format-specific complexities such as delimiter detection, and it aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
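To make delimiter detection concrete, the sketch below uses `csv.Sniffer` from the Python standard library. This is a swapped-in technique for illustration; the text above does not say how CSVData detects delimiters, and the sample data and candidate delimiter set are assumptions.

```python
import csv
import io

# Hypothetical sample: a semicolon-delimited file read into memory.
sample = "name;age;city\nAda;36;London\nLinus;54;Portland\n"

# Ask the sniffer to choose among a fixed set of candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

# Parse the sample with the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect=dialect))
```

Restricting the candidate set keeps the sniffer from picking a letter or digit that happens to occur equally often on every line.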
Related Classes/Methods:
JSONData is a concrete implementation of BaseData that reads, parses, and performs initial preprocessing of JSON data. It handles format-specific complexities such as flattening nested structures, and it aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
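Flattening a nested structure can be sketched as a small recursive function. The `flatten` helper, the dotted key naming, and the use of list indices as key segments are all assumptions chosen for the example; JSONData's actual flattening rules are not described above.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: collapse nested dicts and lists into one flat dict."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, path, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, path, sep))
    else:
        # Leaf value: store it under the accumulated path.
        flat[prefix] = obj
    return flat


record = json.loads('{"id": 1, "user": {"name": "Ada", "tags": ["a", "b"]}}')
flat_record = flatten(record)
```

Each leaf ends up under a single string key, so a list of such records maps directly onto table columns. Note that empty dicts and lists contribute no keys in this sketch.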
Related Classes/Methods:
ParquetData is a concrete implementation of BaseData that reads, parses, and performs initial preprocessing of Parquet data. It aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
Related Classes/Methods:
TextData is a concrete implementation of BaseData that reads, parses, and performs initial preprocessing of plain text data. It aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
Related Classes/Methods:
GraphData is a concrete implementation of BaseData that reads, parses, and performs initial preprocessing of graph data. It aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
Related Classes/Methods:
StructuredDataMixin provides reusable logic and common functionality for structured data readers (e.g., CSV, JSON, Parquet). It promotes code reuse and consistency across similar data types, and it reinforces the 'Modular Architecture' pattern by abstracting shared behavior into a reusable mixin.
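The mixin idea can be illustrated with two toy readers that share helper methods. Everything here is hypothetical: `StructuredMixinSketch`, its `column_names` and `head` methods, the `_rows` attribute, and both reader classes are invented for the example and do not reflect what StructuredDataMixin actually provides.

```python
import json
from typing import Any, Dict, List


class StructuredMixinSketch:
    """Hypothetical shared helpers for readers that hold a list of row dicts."""

    _rows: List[Dict[str, Any]]

    def column_names(self) -> List[str]:
        # Collect keys in first-seen order across all rows.
        seen: Dict[str, None] = {}
        for row in self._rows:
            for key in row:
                seen.setdefault(key, None)
        return list(seen)

    def head(self, n: int = 5) -> List[Dict[str, Any]]:
        # Return the first n rows.
        return self._rows[:n]


class CsvLikeReader(StructuredMixinSketch):
    """Toy reader: naive split on a delimiter, no quoting support."""

    def __init__(self, text: str, delimiter: str = ",") -> None:
        header, *lines = text.strip().splitlines()
        keys = header.split(delimiter)
        self._rows = [dict(zip(keys, line.split(delimiter))) for line in lines]


class JsonLikeReader(StructuredMixinSketch):
    """Toy reader: expects a JSON array of objects."""

    def __init__(self, text: str) -> None:
        self._rows = json.loads(text)


csv_reader = CsvLikeReader("a,b\n1,2\n3,4\n")
json_reader = JsonLikeReader('[{"a": 1}, {"a": 2, "c": 3}]')
```

Both readers gain `column_names()` and `head()` without duplicating code, which is the reuse benefit the description attributes to the mixin.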
Related Classes/Methods:
filepath_or_buffer is a context manager that abstracts and standardizes the handling of diverse data sources, whether they are file paths or in-memory buffers. It ensures uniform input handling for all data readers, and it contributes to the 'Modular Architecture' and 'Pipeline/Workflow' patterns by providing a consistent mechanism for input data access.
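A context manager that accepts either a path or a buffer might look like the sketch below. The function name `open_path_or_buffer`, its parameters, and the choice to restore the buffer's position on exit are assumptions for illustration; they are not taken from the component described above.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from typing import IO, Iterator, Union


@contextmanager
def open_path_or_buffer(source: Union[str, IO[str]], mode: str = "r") -> Iterator[IO[str]]:
    """Hypothetical helper: yield a readable handle for a path or a buffer."""
    if isinstance(source, str):
        # A path: open the file and close it when the block exits.
        with open(source, mode, encoding="utf-8") as handle:
            yield handle
    else:
        # A buffer: the caller owns it, so only restore its position afterwards.
        position = source.tell()
        try:
            yield source
        finally:
            source.seek(position)


# Usage with a file path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("from file")
    with open_path_or_buffer(path) as handle:
        from_path = handle.read()

# Usage with an in-memory buffer.
buffer = io.StringIO("from buffer")
with open_path_or_buffer(buffer) as handle:
    from_buffer = handle.read()
position_after = buffer.tell()
```

Readers written against such a helper need no branching on the input type, and a caller-supplied buffer is left open and rewound to where it started.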
Related Classes/Methods: