graph LR
    BaseData["BaseData"]
    CSVData["CSVData"]
    JSONData["JSONData"]
    ParquetData["ParquetData"]
    TextData["TextData"]
    GraphData["GraphData"]
    StructuredDataMixin["StructuredDataMixin"]
    filepath_or_buffer["filepath_or_buffer"]
    CSVData -- "implements" --> BaseData
    CSVData -- "uses" --> filepath_or_buffer
    JSONData -- "implements" --> BaseData
    JSONData -- "inherits from" --> StructuredDataMixin
    JSONData -- "uses" --> filepath_or_buffer
    ParquetData -- "implements" --> BaseData
    ParquetData -- "inherits from" --> StructuredDataMixin
    ParquetData -- "uses" --> filepath_or_buffer
    TextData -- "implements" --> BaseData
    TextData -- "uses" --> filepath_or_buffer
    GraphData -- "implements" --> BaseData
    GraphData -- "uses" --> filepath_or_buffer

Details

The Data Ingestion & Preprocessing subsystem of the DataProfiler project reads raw data in diverse formats (CSV, JSON, Parquet, plain text, and graph data) and standardizes it into a structured form for subsequent analysis.

BaseData

Serves as the abstract base class, establishing a standardized interface (data(), get_batch_generator(), reload()) for all concrete data readers. It ensures a consistent output contract for the initial stage of the data processing pipeline. Embodies the 'Pipeline/Workflow' pattern by defining the entry point and common interface for data ingestion.
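
The interface described above can be illustrated with a minimal sketch. This is not DataProfiler's actual code: `BaseDataSketch`, `InMemoryLines`, and `_load_data` are hypothetical names, and modelling `data` as a lazily cached property is an assumption of this example.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, List, Optional


class BaseDataSketch(ABC):
    """Hypothetical, simplified stand-in for a reader base class."""

    def __init__(self, input_file_path: Optional[str] = None) -> None:
        self.input_file_path = input_file_path
        self._data: Optional[List[Any]] = None  # loaded lazily on first access

    @abstractmethod
    def _load_data(self) -> List[Any]:
        """Format-specific loading, supplied by each concrete reader."""

    @property
    def data(self) -> List[Any]:
        # Load on first access, then serve the cached result.
        if self._data is None:
            self._data = self._load_data()
        return self._data

    def get_batch_generator(self, batch_size: int) -> Iterator[List[Any]]:
        # Yield consecutive slices of at most batch_size records.
        for start in range(0, len(self.data), batch_size):
            yield self.data[start:start + batch_size]

    def reload(self, input_file_path: Optional[str] = None) -> None:
        # Point at a new source (if given) and drop the cached data.
        if input_file_path is not None:
            self.input_file_path = input_file_path
        self._data = None


class InMemoryLines(BaseDataSketch):
    """Toy reader (hypothetical) used only to exercise the interface."""

    def __init__(self, lines: List[str]) -> None:
        super().__init__()
        self._lines = lines

    def _load_data(self) -> List[str]:
        return list(self._lines)
```

Because every reader exposes the same three members, downstream code can batch over any source without knowing its format, for example `for batch in InMemoryLines(["a", "b", "c"]).get_batch_generator(2): ...`.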

CSVData

Concrete implementation of BaseData that reads, parses, and performs initial preprocessing of CSV data. Handles format-specific complexities such as delimiter detection. Aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
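
To make "delimiter detection" concrete, here is a small sketch that uses the standard library's `csv.Sniffer`. This is a swapped-in technique for illustration only; how CSVData itself detects delimiters is not specified here, and the function names `detect_delimiter` and `read_rows` are hypothetical.

```python
import csv
import io
from typing import List


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV sample from a set of candidates."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def read_rows(text: str) -> List[List[str]]:
    """Parse CSV text using whichever delimiter was detected."""
    delimiter = detect_delimiter(text)
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


sample_text = "name;age\nalice;30\nbob;25\n"
rows = read_rows(sample_text)
```

Restricting the sniffer to a short candidate list keeps it from picking an ordinary letter that happens to occur a consistent number of times per line.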

JSONData

Concrete implementation of BaseData that reads, parses, and performs initial preprocessing of JSON data. Handles format-specific complexities such as flattening nested structures. Aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.
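
Flattening nested structures can be sketched as follows. This is a generic illustration, not JSONData's actual logic: the `flatten` helper, the dotted-key naming scheme, and the sample record are all assumptions of this example.

```python
import json
from typing import Any, Dict


def flatten(record: Any, parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one level with dotted keys."""
    if isinstance(record, dict):
        items = list(record.items())
    elif isinstance(record, list):
        # List positions become key segments: "tags.0", "tags.1", ...
        items = [(str(index), value) for index, value in enumerate(record)]
    else:
        # A bare scalar is kept under whatever key led to it.
        return {parent_key: record}

    flat: Dict[str, Any] = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and empty containers are stored as-is.
            flat[full_key] = value
    return flat


raw = '{"user": {"name": "alice", "tags": ["a", "b"]}, "active": true}'
flat_record = flatten(json.loads(raw))
```

Each flattened record maps cleanly onto one row of a table, which is what makes nested JSON usable by column-oriented analysis downstream.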

ParquetData

Concrete implementation of BaseData that reads, parses, and performs initial preprocessing of Parquet data. Aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.

TextData

Concrete implementation of BaseData that reads, parses, and performs initial preprocessing of plain text data. Aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.

GraphData

Concrete implementation of BaseData that reads, parses, and performs initial preprocessing of graph data. Aligns with the 'Extensible Architecture' and 'Modular Architecture' patterns.

StructuredDataMixin

Provides reusable logic shared by structured data readers (e.g., CSV, JSON, Parquet), promoting code reuse and consistency across similar data types. Reinforces the 'Modular Architecture' pattern by abstracting that shared behavior into a mixin.
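
The mixin idea can be sketched as below. Every name here (`StructuredMixinSketch`, `ReaderBase`, `JsonLikeReader`, `ParquetLikeReader`) and both helper methods are hypothetical; the sketch only shows how shared behavior might be written once and inherited by several readers, not what the real mixin contains.

```python
from typing import Any, Dict, List


class StructuredMixinSketch:
    """Hypothetical mixin: helpers for readers that hold rows as dicts."""

    _rows: List[Dict[str, Any]]

    def column_names(self) -> List[str]:
        # Union of keys across all rows, in first-seen order.
        seen: Dict[str, None] = {}
        for row in self._rows:
            for key in row:
                seen.setdefault(key, None)
        return list(seen)

    def select_columns(self, names: List[str]) -> List[Dict[str, Any]]:
        # Keep only the requested columns; absent values become None.
        return [{name: row.get(name) for name in names} for row in self._rows]


class ReaderBase:
    """Hypothetical minimal base that just stores the loaded rows."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows


class JsonLikeReader(StructuredMixinSketch, ReaderBase):
    """Gains the shared helpers without re-implementing them."""


class ParquetLikeReader(StructuredMixinSketch, ReaderBase):
    """A second reader reusing the same mixin."""
```

Listing the mixin before the base class puts its methods first in the method resolution order, so both readers pick up `column_names` and `select_columns` unchanged.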

filepath_or_buffer

A context manager that abstracts and standardizes the handling of diverse data sources, whether they are file paths or in-memory buffers. It ensures uniform input handling for all data readers. Contributes to the 'Modular Architecture' and 'Pipeline/Workflow' by providing a consistent mechanism for input data access.
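
A context manager of this kind might look like the following sketch. The function name `open_path_or_buffer` and its ownership rules (close files it opened, leave caller-supplied buffers open) are assumptions of this example rather than a description of the actual implementation.

```python
import io
import os
from contextlib import contextmanager
from typing import IO, Iterator, Union


@contextmanager
def open_path_or_buffer(
    source: Union[str, os.PathLike, IO], mode: str = "r"
) -> Iterator[IO]:
    """Yield a readable stream for either a file path or an open buffer."""
    if isinstance(source, (str, os.PathLike)):
        # We opened the file, so we are responsible for closing it.
        with open(source, mode) as handle:
            yield handle
    else:
        # The caller owns the buffer: rewind it and leave it open afterwards.
        source.seek(0)
        yield source


buffer = io.StringIO("a,b\n1,2\n")
with open_path_or_buffer(buffer) as stream:
    first_line = stream.readline()
```

With this in place a reader can call `with open_path_or_buffer(source) as stream:` and stay agnostic about whether `source` was a path on disk or an in-memory buffer.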
