Concepts/Data Ingestion/Data Ingestion.md
39 additions & 18 deletions
@@ -28,6 +28,45 @@ Common data sources include:
### 2. Ingestion Patterns
#### Extract, Transform, Load (ETL)
ETL is a traditional ingestion pattern in which data is extracted from a source, transformed during the ingestion process, and then loaded into the destination.

ELT (Extract, Load, Transform) is the more modern ingestion pattern: raw data is extracted and loaded directly into the destination, then transformed inside the destination system. ELT has become the more popular pattern because storage is cheap and keeping the raw data leaves more flexibility for future use cases.
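
The difference between the two patterns can be sketched in a few lines. The example below is a minimal illustration, assuming an in-memory SQLite database as the destination; the records, table names (`orders_etl`, `orders_raw`, `orders_elt`), and cleaning rules are hypothetical and not part of the wiki.

```python
import sqlite3

# Hypothetical source records; field names and values are illustrative only.
SOURCE_ROWS = [
    {"id": 1, "email": " Alice@Example.com ", "amount": "10.50"},
    {"id": 2, "email": "BOB@example.com", "amount": "3.25"},
]


def run_etl(conn, rows):
    """ETL: clean each record in the pipeline, then load only the cleaned result."""
    conn.execute("CREATE TABLE orders_etl (id INTEGER, email TEXT, amount REAL)")
    cleaned = [(r["id"], r["email"].strip().lower(), float(r["amount"])) for r in rows]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the raw values untouched, then transform inside the destination."""
    conn.execute("CREATE TABLE orders_raw (id INTEGER, email TEXT, amount TEXT)")
    raw = [(r["id"], r["email"], r["amount"]) for r in rows]
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw)
    # The transformation is plain SQL executed by the destination system.
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT id, lower(trim(email)) AS email, CAST(amount AS REAL) AS amount "
        "FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)

etl_rows = conn.execute("SELECT id, email, amount FROM orders_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT id, email, amount FROM orders_elt ORDER BY id").fetchall()
raw_rows = conn.execute("SELECT email FROM orders_raw ORDER BY id").fetchall()
```

Both paths end with the same cleaned table, but only the ELT path still has `orders_raw` in the destination, so a later use case can re-derive a different transformation without going back to the source.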
Data Management is the practice of collecting, organizing, protecting, and storing data in a way that enables efficient access, analysis, and decision-making throughout its entire lifecycle. It encompasses the policies, procedures, and technologies used to ensure data is accurate, available, secure, and compliant with regulations while meeting business requirements.
## Data Management Components
### 1. [[Data Governance]]
Data Governance establishes the policies, procedures, and standards for managing data across an organization.
### 2. [[Data Quality Management]]
Data quality management ensures that data is accurate, complete, consistent, and fit for its intended use.

#placeholder
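
As a rough illustration of what such checks can look like in practice, here is a minimal sketch in plain Python. The function name `check_quality`, the sample rows, and the "amounts must be non-negative" rule are all assumptions made for the example, not an established standard or library API.

```python
def check_quality(rows, required, key):
    """Return a list of human-readable issues found in `rows`."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
        # Consistency: the key must not repeat across rows.
        k = row.get(key)
        if k in seen:
            issues.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
        # Accuracy: a hypothetical business rule, amounts must be non-negative.
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            issues.append(f"row {i}: negative amount {amount}")
    return issues


rows = [
    {"id": 1, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "email": "", "amount": 5.0},
    {"id": 2, "email": "c@example.com", "amount": -1.0},
]
issues = check_quality(rows, required=["id", "email"], key="id")
```

In a real pipeline these rules would more likely live in a dedicated data quality tool, but the shape is the same: declarative expectations evaluated against each batch, with failures reported rather than silently loaded.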
### 3. [[Data Catalog]]
Data cataloging creates a centralized inventory of data assets with metadata to improve discoverability and understanding.
### 4. [[Data Security]]
Data security protects data from unauthorized access, corruption, and theft throughout its lifecycle.

#placeholder
%% wiki footer: Please don't edit anything below this line %%

## This note in GitHub

<span class="git-footer">[Edit In GitHub](https://github.dev/data-engineering-community/data-engineering-wiki/blob/main/Concepts/Data%20Management/Data%20Management.md "git-hub-edit-note") | [Copy this note](https://raw.githubusercontent.com/data-engineering-community/data-engineering-wiki/main/Concepts/Data%20Management/Data%20Management.md "git-hub-copy-note")</span>

<span class="git-footer">Was this page helpful?
[👍](https://tally.so/r/mOaxjk?rating=Yes&url=https://dataengineering.wiki/Concepts/Data%20Management/Data%20Management) or [👎](https://tally.so/r/mOaxjk?rating=No&url=https://dataengineering.wiki/Concepts/Data%20Management/Data%20Management)</span>
Data Processing is the act of transforming raw data into meaningful, actionable information. It involves collecting, manipulating, filtering, sorting, and analyzing data to extract insights, support decision-making, and enable business operations. Data processing focuses on what happens to data after it has been ingested into your systems.

Scheduling and workflow orchestration coordinate processing jobs: they determine when each job runs and in what order, based on the dependencies between jobs.
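
The core idea behind dependency-aware orchestration can be sketched with the standard library alone. This is a toy illustration, not how any particular orchestrator is implemented: the job names and the `run_pipeline` helper are hypothetical, and real tools add scheduling, retries, and parallelism on top.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs, each mapped to the set of jobs it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "daily_report": {"join_orders_customers"},
}


def run_pipeline(dependencies, run_job):
    """Run every job exactly once, after all of its dependencies; return the order."""
    executed = []
    # static_order() yields jobs so that dependencies always come first.
    for job in TopologicalSorter(dependencies).static_order():
        run_job(job)
        executed.append(job)
    return executed


order = run_pipeline(DEPENDENCIES, run_job=lambda name: None)
```

The two extract jobs have no dependency on each other, so their relative order is not guaranteed; only the constraint that each job follows its dependencies is.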
### 4. Processing Architectures
![[Data Architecture#Popular Data Architecture Patterns]]
This page contains an overview of the technologies and systems used to store and retrieve data in various formats and structures. Modern data storage can be fundamentally divided into two categories: **Databases** (managed storage with built-in compute) and **Object Storage** (raw storage that requires external compute).
## 1. Databases (Storage + Compute)
[[Database|Databases]] provide both storage and built-in compute capabilities with structured query interfaces.
### [[Relational Database]]
A relational database is a traditional structured store that organizes data into tables of rows and columns and typically provides ACID (atomicity, consistency, isolation, durability) transaction guarantees.
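
To make the atomicity part of ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to dst is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits `bob` before the debit fails, yet the final balances show no trace of it: either both updates are applied or neither is.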
### Non-Relational (NoSQL) Databases
NoSQL databases store data in flexible formats such as documents, key-value pairs, graphs, or columns, enabling scalability and schema-less design for diverse data types.

![[Non-relational Database#Types of Non-relational Databases]]
## 2. [[Object/Blob Storage]]
Object storage provides raw data persistence without built-in compute, so an external processing engine is required to query or transform the data.

See the **data stores** category for examples and popular tools.
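
The separation of storage from compute can be illustrated with a toy model. The `InMemoryObjectStore` class below is a hypothetical stand-in, not the API of any real object storage service: it only maps keys to opaque bytes, and all parsing and aggregation happens in separate "compute" code.

```python
import csv
import io


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: keys map to opaque bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data: bytes):
        self._objects[key] = data

    def get(self, key) -> bytes:
        return self._objects[key]

    def list(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


def total_amount(store, prefix):
    """External compute: fetch each object's bytes, parse them as CSV, and aggregate."""
    total = 0.0
    for key in store.list(prefix):
        text = store.get(key).decode("utf-8")
        for row in csv.DictReader(io.StringIO(text)):
            total += float(row["amount"])
    return total


store = InMemoryObjectStore()
store.put("sales/2024-01.csv", b"id,amount\n1,10.5\n2,4.5\n")
store.put("sales/2024-02.csv", b"id,amount\n3,5.0\n")
store.put("logs/app.txt", b"not a table")

total = total_amount(store, "sales/")
```

The store never interprets the bytes it holds; the same objects could be read by a different engine with different parsing logic, which is the trade-off a database with built-in compute does not offer.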