Commit 07fe51b

entites docs cleared from rst fragments
1 parent 14135c5 commit 07fe51b

1 file changed

Lines changed: 67 additions & 39 deletions

mddocs/docs/en/entities/index.md

@@ -1,9 +1,6 @@
-(entities)=
+# Entities { #entities }
 
-# Entities
-
-```{eval-rst}
-.. plantuml::
+```plantuml
 
 @startuml
 title Entities diagram
@@ -41,6 +38,52 @@
 @enduml
 ```
 
+```mermaid
+---
+title: Entities diagram
+---
+
+flowchart LR
+subgraph locations1 [locations 1]
+addresses1@{shape: docs, label: "addresses"}
+end
+subgraph locations2 [locations 2]
+addresses2@{shape: docs, label: "addresses"}
+end
+subgraph locations3 [locations 3]
+addresses3@{shape: docs, label: "addresses"}
+end
+dataset1[(dataset 1)]
+dataset2[(dataset 2)]
+operations@{shape: procs}
+runs@{shape: procs, fill: yellow}
+
+style runs fill:lightyellow
+job
+style job fill:lightblue
+user@{shape: stadium}
+style user fill:lightblue
+
+dataset1 -- SYMLINK ---> dataset2
+dataset2 -- SYMLINK --> dataset1
+
+dataset2 -- located in --> locations2
+
+dataset1 -. INPUT .-> operations
+operations -. OUTPUT .-> dataset1
+dataset1 -- located in --> locations1
+
+operations -- PARENT --> runs
+
+runs -- PARENT ----> job
+runs -- started by ----> user
+
+job -- located in ---> locations3
+
+runs -- PARENT --> runs
+
+```
+
 ## Nodes
 
 Nodes are independent entities which describe information about some real entity, like table, ETL job, ETL job run and so on.
@@ -74,8 +117,7 @@ It contains following fields:
 
 - `url: str` - alternative address, in URL form.
 
-```{image} location_list.png
-```
+![location list](location_list.png)
 
 #### Location addresses
 
@@ -115,8 +157,7 @@ That's why the information about datasets is very limited:
 - `name: str` - qualified name of Dataset, like `mydb.myschema.mytable` or `/app/warehouse/hive/managed/myschema.df/mytable`
 - `schema: Schema | None` - schema of dataset.
 
-```{image} dataset_list.png
-```
+![dataset list](dataset_list.png)
 
 #### Dataset schema
 
@@ -146,8 +187,7 @@ It contains following fields:
 - `EXACT_MATCH` - returned if all interactions with this dataset used only one schema.
 - `LATEST_KNOWN` - if there are multiple interactions with this dataset, but with different schemas. In this case a schema of the most recent interaction is returned.
 
-```{image} dataset_schema.png
-```
+![dataset schema](dataset_schema.png)
 
 ### Job
 
@@ -180,8 +220,7 @@ It contains following fields:
 - `DBT_JOB`
 - `UNKNOWN`
 
-```{image} job_list.png
-```
+![job list](job_list.png)
 
 ### User
 
@@ -241,17 +280,13 @@ It contains following fields:
 
 - `persistent_log_url: str | None` - external URL there specific Run logs could be found (e.g. Spark History server, Airflow Web UI).
 
-```{image} run_list.png
-```
+![run list](run_list.png)
 
-```{image} ../integrations/spark/run_details.png
-```
+![run details](../integrations/spark/run_details.png)
 
-```{image} ../integrations/airflow/dag_run_details.png
-```
+![dag run details](../integrations/airflow/dag_run_details.png)
 
-```{image} ../integrations/airflow/task_run_details.png
-```
+![task run details](../integrations/airflow/task_run_details.png)
 
 ### Operation
 
@@ -287,8 +322,7 @@ It contains following fields:
 
 - `sql_query: str | None` - SQL query executed by this operation, if any.
 
-```{image} ../integrations/dbt/operation_details.png
-```
+![../integrations/dbt/operation_details.png](../integrations/dbt/operation_details.png)
 
 ## Relations
 
@@ -309,13 +343,12 @@ It contains following fields:
 - `METASTORE` - from HDFS location to Hive table in metastore.
 - `WAREHOUSE` - from Hive table to HDFS/S3 location.
 
-:::{note}
-Currently, OpenLineage sends only symlinks `HDFS location → Hive table` which [do not exist in the real world](https://github.com/OpenLineage/OpenLineage/issues/2718#issuecomment-2134746258).
-Message consumer automatically adds a reverse symlink `Hive table → HDFS location` to simplify building lineage graph, but this is temporary solution.
-:::
+!!! note
 
-```{image} dataset_symlinks.png
-```
+Currently, OpenLineage sends only symlinks `HDFS location → Hive table` which [do not exist in the real world](https://github.com/OpenLineage/OpenLineage/issues/2718#issuecomment-2134746258).
+Message consumer automatically adds a reverse symlink `Hive table → HDFS location` to simplify building lineage graph, but this is temporary solution.
+
+![dataset_symlinks.png](dataset_symlinks.png)
 
 ### Parent Relation
 
@@ -331,8 +364,7 @@ It contains following fields:
 - `from: Job | Run` - parent entity.
 - `to: Run | Operation` - child entity.
 
-```{image} parent.png
-```
+![parent.png](parent.png)
 
 ### Input relation
 
@@ -348,8 +380,7 @@ It contains following fields:
 - `num_bytes: int | None` - number of bytes read from dataset. For `granularity=JOB|RUN` it is a sum of all read bytes from this dataset. For `granularity=DATASET` always `None`.
 - `num_files: int | None` - number of files read from dataset. For `granularity=JOB|RUN` it is a sum of all read files from this dataset. For `granularity=DATASET` always `None`.
 
-```{image} input.png
-```
+![input.png](input.png)
 
 ### Output relation
 
@@ -381,8 +412,7 @@ It contains following fields:
 
 - `num_files: int | None` - number of files written from dataset. For `granularity=JOB|RUN` it is a sum of all written files to this dataset.
 
-```{image} output.png
-```
+![output.png](output.png)
 
 ### Direct Column Lineage relation
 
@@ -405,8 +435,7 @@ Relation Dataset columns → Dataset columns, describing how each target dataset
 - `AGGREGATION_MASKING` - some masking aggregation function is applied to column value, e.g. `SELECT count(DISTINCT source_column) AS target_column`
 - `UNKNOWN` - some unknown transformation type.
 
-```{image} direct_column_lineage.png
-```
+![direct column lineage](direct_column_lineage.png)
 
 ### Indirect Column Lineage relation
 
@@ -430,5 +459,4 @@ Relation Dataset columns → Dataset, describing how the entire target dataset i
 - `CONDITIONAL` - column is used in `CASE` or `IF` clause, e.g. `SELECT CASE source_column THEN 1 WHEN 'abc' ELSE 'cde' END AS target_column`
 - `UNKNOWN` - some unknown transformation type.
 
-```{image} indirect_column_lineage.png
-```
+![indirect column lineage](indirect_column_lineage.png)
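
The commit does not say how the directives were converted. Purely as an illustration, the change that recurs in most hunks of this diff (an option-less MyST `{image}` directive replaced by a Markdown image) could be scripted roughly as below. The function name, the regular expression, and the rule that derives alt text from the file stem are all assumptions made for this sketch, not something taken from the commit; note that several hunks use the file name itself as alt text instead.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: a fence line of three backticks followed by "{image} <path>",
# then a closing fence on the very next line (no directive options in between).
IMAGE_DIRECTIVE = re.compile(
    r"^`{3}\{image\}[ \t]+(?P<path>\S+)[ \t]*\n`{3}[ \t]*$",
    re.MULTILINE,
)


def image_directive_to_markdown(text: str) -> str:
    """Replace option-less MyST image directives with Markdown image syntax."""

    def replace(match: re.Match) -> str:
        path = match.group("path")
        # Hypothetical alt-text rule: file stem with underscores turned into spaces.
        alt = PurePosixPath(path).stem.replace("_", " ")
        return f"![{alt}]({path})"

    return IMAGE_DIRECTIVE.sub(replace, text)


if __name__ == "__main__":
    fence = "`" * 3
    sample = f"{fence}{{image}} location_list.png\n{fence}\n"
    print(image_directive_to_markdown(sample))
```

Directives that carry options (for example a width or alt line between the fences) are deliberately left untouched by this pattern, so they would still need manual review.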
