Commit 55c2c7b

Initial landscape2 scaffold
53 standards across 5 categories (Definition, Storage, Movement, Discovery, Operations) and 20 subcategories, with brand logos and a contributor guide.

- data.yml — categories, subcategories, items
- settings.yml — branding, groups, footer
- guide.yml — narrative for each category/subcategory
- logos/ — 50 standard/foundation/vendor logos
- .github/workflows/build.yml — build with landscape2 v1.1.0 and deploy to GitHub Pages on push to main
1 parent 7e88918 commit 55c2c7b

56 files changed

Lines changed: 908 additions & 0 deletions

.github/workflows/build.yml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+name: Build and deploy landscape
+
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install landscape2
+        run: |
+          curl -sSL https://github.com/cncf/landscape2/releases/download/v1.1.0/landscape2-x86_64-unknown-linux-gnu.tar.xz \
+            | tar -xJ -C /tmp
+          sudo mv /tmp/landscape2-x86_64-unknown-linux-gnu/landscape2 /usr/local/bin/
+          landscape2 --version
+
+      - name: Build landscape
+        run: |
+          landscape2 build \
+            --data-file data.yml \
+            --settings-file settings.yml \
+            --guide-file guide.yml \
+            --logos-path logos \
+            --output-dir build
+
+      - name: Upload Pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: build
+
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
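
Review note: the install step pipes the release archive straight from curl into tar, so nothing checks the integrity of the download. A minimal sketch of a digest check that could run between download and extraction is shown here. The function name `verify_sha256` is hypothetical, the sketch assumes GNU `sha256sum` is on the PATH, and the expected digest has to come from a source you trust; this commit does not provide one.

```sh
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE hashes to EXPECTED_HEX.
# Assumes GNU coreutils `sha256sum` is available.
verify_sha256() {
  file=$1
  expected=$2
  # sha256sum prints "<hex digest>  <file name>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file" >&2
    return 1
  fi
}
```

In a workflow step this would mean saving the archive to a file first (for example with `curl -o`), calling `verify_sha256` on that file, and extracting only if the call succeeds.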

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+build/
+.DS_Store
+node_modules/
+*.log

README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Data Architecture Landscape
+
+An interactive landscape of the open standards that power a modern data architecture, built with [CNCF landscape2](https://github.com/cncf/landscape2).
+
+Live site: https://datacontract.github.io/data-architecture-landscape/
+
+Curated by [Entropy Data](https://entropy-data.com), with contributions by Denis Arnaud, Stefan Negele, and Erik Wilde.
+
+## Structure
+
+- **Definition** — how data is described (API interfaces, data products, schema, semantics)
+- **Storage** — where data lives (file formats, table formats, storage systems)
+- **Movement** — how data flows between systems (connectivity, messaging, transfer, in-memory)
+- **Discovery** — how data is found and traced (catalog APIs, lineage)
+- **Operations** — how data is queried, observed, and governed (query, quality, observability, policies)
+
+## Contributing
+
+PRs welcome — especially:
+
+- new standards we missed
+- corrections to descriptions, governance, or status
+- better logos (prefer SVG)
+
+To add a standard, edit `data.yml`. Find the right category and subcategory, then add an entry like:
+
+```yaml
+- name: 'My Standard'
+  description: 'One-paragraph summary.'
+  homepage_url: 'https://example.org'
+  logo: 'mystandard.svg'
+  repo_url: 'https://github.com/example/my-standard'
+```
+
+Drop the logo file in `logos/`. SVG preferred; PNG/JPG accepted.
+
+## Building locally
+
+Install landscape2 (`cargo install landscape2`, or grab a binary from the [releases page](https://github.com/cncf/landscape2/releases)) and run:
+
+```sh
+landscape2 build \
+  --data-file data.yml \
+  --settings-file settings.yml \
+  --guide-file guide.yml \
+  --logos-path logos \
+  --output-dir build
+
+cd build && python3 -m http.server 8000
+```
+
+Then open http://localhost:8000.
+
+## License
+
+MIT — see [LICENSE](LICENSE).
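
Review note: the Contributing section asks for a `logo:` value in `data.yml` and a matching file in `logos/`. A small check along these lines could catch a missing file before a build. It is only a sketch: `check_logos` is a hypothetical helper, and it assumes each logo is declared on its own line in the form `logo: 'file.svg'`, as in the README example. `data.yml` itself is not rendered in this diff, so the real layout may differ.

```sh
# check_logos DATA_FILE LOGOS_DIR
# Hypothetical helper: reports every `logo:` value in DATA_FILE that has
# no matching file in LOGOS_DIR. Returns 1 if anything is missing.
check_logos() {
  data_file=$1
  logos_dir=$2
  missing=$(
    # Print the value of each `logo:` key, then strip surrounding quotes.
    sed -n 's/^[[:space:]]*logo:[[:space:]]*//p' "$data_file" |
      tr -d "'\"" |
      while IFS= read -r logo; do
        [ -f "$logos_dir/$logo" ] || printf 'missing logo: %s\n' "$logo"
      done
  )
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

From the repository root this would be invoked as `check_logos data.yml logos`. Collecting the messages in a variable avoids losing state in the pipeline's subshell.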

data.yml

Lines changed: 275 additions & 0 deletions
Large diffs are not rendered by default.

guide.yml

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+# Landscape2 guide
+# Reference: https://github.com/cncf/landscape2/blob/main/docs/config/guide.yml
+
+categories:
+  - category: "Definition"
+    content: |
+      How data is described. Anything that gives meaning to bytes — contracts, products, schemas,
+      service interfaces, and semantic models.
+
+    subcategories:
+      - subcategory: "API Interfaces"
+        content: |
+          Specifications that describe the surface of an API: REST (OpenAPI), event-driven
+          (AsyncAPI), graph (GraphQL), RPC (gRPC). ODCS sits here too, as the contract layer
+          on top of data products.
+
+      - subcategory: "Data Products"
+        content: |
+          Standards for describing a data product as a deployable, ownable unit: ports,
+          terms, owners, and dependencies. Multiple competing specs (ODPS, ODPSpec, DPDS,
+          DPROD) reflect different communities — pick the one that fits your stack.
+
+      - subcategory: "Schema"
+        content: |
+          The schema languages that describe shape, types, and constraints. Range from
+          XML's XSD through JSON Schema to Avro and Protobuf for binary encodings.
+
+      - subcategory: "Semantics"
+        content: |
+          Vocabularies for meaning, not structure. Mostly RDF/OWL-based today (DCAT, SKOS,
+          SHACL), with the new OSI specification pushing for a YAML-friendly alternative.
+
+  - category: "Storage"
+    content: |
+      Where data lives. File formats, table formats, and storage systems.
+
+    subcategories:
+      - subcategory: "File Formats"
+        content: |
+          The on-disk representations: CSV, JSON, XML for text; Parquet, Avro, ORC for
+          column-store and binary.
+
+      - subcategory: "Open Table Formats"
+        content: |
+          Layered above object storage: Iceberg, Delta, and Hudi each add ACID semantics,
+          time travel, and schema evolution. Iceberg has become the de-facto winner for
+          most new architectures.
+
+      - subcategory: "Storage Systems"
+        content: |
+          The underlying storage substrate. S3 has effectively become the standard
+          object-storage interface; HDFS remains relevant in legacy installations.
+
+  - category: "Movement"
+    content: |
+      How data flows between systems — connectivity, messaging, transfer, and in-memory
+      formats.
+
+    subcategories:
+      - subcategory: "Database Connectivity"
+        content: |
+          JDBC and ODBC are the row-oriented incumbents; ADBC is the Arrow-native
+          columnar successor.
+
+      - subcategory: "Interconnection"
+        content: |
+          The protocols beneath everything else: HTTP for the synchronous web,
+          ZeroMQ for low-latency in-process and cross-process messaging.
+
+      - subcategory: "Messaging"
+        content: |
+          Asynchronous, broker-mediated message exchange. Kafka has won the
+          high-throughput log market; AMQP remains the open queue protocol of record.
+
+      - subcategory: "File Transfer"
+        content: |
+          Unsexy but unavoidable: FTP and SFTP still drive many regulated B2B
+          data exchanges.
+
+      - subcategory: "In-Memory Format"
+        content: |
+          Apache Arrow is the connective tissue of modern analytics: a language-agnostic
+          columnar memory layout enabling zero-copy interchange across engines.
+
+      - subcategory: "DataFrame API"
+        content: |
+          The DataFrame as a portable interface: Spark, pandas, and Ibis all expose
+          one, increasingly compatible across engines.
+
+      - subcategory: "Data Interchange"
+        content: |
+          Wire-format encodings for moving structured data between processes.
+
+  - category: "Discovery"
+    content: |
+      How data is found and traced — catalog APIs and lineage standards.
+
+    subcategories:
+      - subcategory: "Catalog APIs"
+        content: |
+          Iceberg REST Catalog, Unity Catalog, Hive Metastore, and DuckLake compete for
+          the role of "the catalog" in a lakehouse architecture. Expect convergence over
+          the next few years.
+
+      - subcategory: "Lineage"
+        content: |
+          OpenLineage is the open standard for emitting and consuming lineage events,
+          including column-level lineage. The runtime counterpart to design-time data
+          contracts.
+
+  - category: "Operations"
+    content: |
+      How data is queried, observed, and governed.
+
+    subcategories:
+      - subcategory: "Query"
+        content: |
+          SQL — still the universal query language, defined by ISO/IEC 9075.
+
+      - subcategory: "Data Quality"
+        content: |
+          Vendor-driven open-source: dbt, Great Expectations, and SodaCL each ship
+          a YAML-flavoured DSL for declaring quality checks.
+
+      - subcategory: "Observability"
+        content: |
+          OpenTelemetry for runtime traces, metrics, and logs; OORS (emerging) for
+          publishing data-quality and contract-check results in a portable format.
+
+      - subcategory: "Policies"
+        content: |
+          OPA (Open Policy Agent) decouples policy decisions from services. In a data
+          mesh, it shows up at the federated-governance enforcement boundary.
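
Review note: the commit message states 5 categories and 20 subcategories, and guide.yml is meant to carry one narrative entry for each. A rough way to cross-check those counts against the guide file is sketched here. `count_guide_entries` is a hypothetical helper; it assumes every entry starts on its own line as `- category:` or `- subcategory:`, as in the diff above, and it does not parse YAML properly.

```sh
# count_guide_entries FILE
# Hypothetical helper: prints "<categories> <subcategories>" for a
# guide.yml-style file by counting list entries line by line.
count_guide_entries() {
  guide_file=$1
  # grep -c exits non-zero when the count is 0, so ignore its status.
  categories=$(grep -c '^[[:space:]]*-[[:space:]]*category:' "$guide_file" || true)
  subcategories=$(grep -c '^[[:space:]]*-[[:space:]]*subcategory:' "$guide_file" || true)
  echo "$categories $subcategories"
}
```

Run as `count_guide_entries guide.yml`; for the file in this commit the two numbers should agree with the counts given in the commit message.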

logos/apache.png

11 KB

logos/arrow.png

12.5 KB

logos/asyncapi.svg

Lines changed: 19 additions & 0 deletions

logos/avro.svg

Lines changed: 27 additions & 0 deletions
