docs/docs/explanations/nkp-architecture.mdx
22 additions & 57 deletions
@@ -8,7 +8,7 @@ description: A conceptual tour of how Nebari Kubernetes Platform is layered, wha
 Nebari Kubernetes Platform (NKP) lets developers package AI capabilities without handling auth, routing, or TLS themselves. Users then access those capabilities through one secure platform.

-## The layers, top to bottom
+## How the layers fit together

 NKP is a stack of layers, each doing one job. Because the layers connect through stable, well-defined interfaces, you can change one layer without rewriting the others.
14
@@ -17,12 +17,15 @@ NKP is a stack of layers, each doing one job. Because the layers connect through

-**What users and developers see and control**
+**Used by end users**

-- **Dynamic landing page**: the home page where users access the Capabilities installed on the platform.
-- **Software Pack**: an installable Capability (chat assistant, document analyzer, code review tool, and so on) packaged by a developer.
+- **Landing page**: the home page where users access the Capabilities installed on the platform.

-**Platform-managed, hidden from users**
+**Built by developers**
+
+- **Software Pack**: an installable Capability (chat assistant, document analyzer, code review tool, and so on).
+
+**Managed by platform engineers**

 - **Nebari Operator**: the automation that deploys each Software Pack and connects it to the Foundational software.
 - **Foundational software**: shared services (secure connections, login, traffic routing, monitoring, continuous delivery from Git) that power the operator and every running pack.
@@ -38,18 +41,17 @@ NKP is a stack of layers, each doing one job. Because the layers connect through
 - **cert-manager**: keeps secure connections working. Automatically requests, renews, and rotates HTTPS certificates for every domain in the cluster.
 - **Envoy Gateway**: handles traffic routing. Inspects incoming requests and forwards each one to the right service, with built-in rate limiting and authentication checks.
 - **Keycloak**: handles login. One sign-on covers every app on the platform (ArgoCD, Grafana, Software Packs), with support for multi-factor authentication and connection to an existing identity provider like Active Directory.
-- **ArgoCD**: keeps the cluster in sync with a Git repository. Whenever the platform team commits a change to Git (a new pack, an updated config), ArgoCD applies it to the cluster automatically.
-- **LGTM telemetry stack**: collects logs, metrics, and traces from every running pack and presents them in shared dashboards.
+- **OpenTelemetry Collector**: collects logs, metrics, and traces from every running pack so they can be forwarded to an observability backend.

 </details>

 ## The `nic` CLI

-The `nic` CLI (short for **Nebari Infrastructure Core**) is the command-line tool for managing NKP: installing, updating, tearing down, plus inspecting the cloud resources underneath.
+The `nic` CLI (short for **Nebari Infrastructure Core**) is the command-line tool for installing, updating, and tearing down NKP's cloud infrastructure.

 ### What `nic` does

-- **Creates the cloud infrastructure.** This includes the network, the managed Kubernetes cluster and its worker machines, the identity and access controls, and the persistent storage. Under the hood, `nic` uses Terraform to do this.
+- **Creates the cloud infrastructure.** This includes the network, the managed Kubernetes cluster and its worker machines, the identity and access controls, and the persistent storage.
 - **Prepares the cluster.** Sets up the basic Kubernetes structures the platform relies on: organizational groupings (namespaces), permission rules (RBAC), storage templates (storage classes), and network rules (network policies).
 - **Installs ArgoCD.** ArgoCD is the GitOps engine that delivers everything above the cluster (the foundational software, the operator, and the packs). `nic` installs it directly so the rest of the platform can flow in from a Git repository.
@@ -66,31 +68,21 @@ This split keeps cluster work (the bottom of the stack) and pack work (the top)

 ## GitOps via ArgoCD

-ArgoCD is an open-source deployment tool for Kubernetes. It turns every cluster change into a Git commit, so updates land safely, rollbacks take seconds, and the cluster stays in sync without manual intervention.
-
-ArgoCD uses the **app-of-apps** pattern: a top-level Application resource points to a folder of child Applications, each one a piece of foundational software with explicit dependencies.
+[ArgoCD](https://github.com/argoproj/argo-cd) is an open-source deployment tool for Kubernetes. It installs and updates every foundational component on the cluster from two kinds of sources:

-Three properties follow from this:
+- **OpenTeams Git repositories:** for OpenTeams-built pieces like the [landing page](https://github.com/nebari-dev/nebari-landing) and the [Nebari Operator](https://github.com/nebari-dev/nebari-operator).
+- **Upstream Helm charts:** for external open-source pieces like [Keycloak](https://github.com/keycloak/keycloak), [cert-manager](https://github.com/cert-manager/cert-manager), [Envoy Gateway](https://github.com/envoyproxy/gateway), and the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector).

-- **Single source of truth.** What is in Git is what is in the cluster. There is no "live but undocumented" configuration drift on the application side.
-- **Self-healing.** If an operator deletes a resource by hand, ArgoCD puts it back on the next reconciliation tick.
-- **Auditable change.** Every platform change is a Git commit; rollbacks are a Git revert.
+A few properties follow from this:

-Software Packs plug into this model from above. A pack ships its workload manifests plus a `NebariApplication` custom resource that declares what the pack needs (domain, paths, auth scope, dashboards). Once that custom resource lands in the cluster, the Nebari Operator picks it up and runs the four reconciliation steps described in the layers section.
+- **No hidden changes:** Every piece of software in the cluster is there because ArgoCD pulled it from a known source. Nothing runs that you can't trace back to a repository or a Helm chart.
+- **Self-healing:** If a user manually deletes or changes something in the cluster, ArgoCD detects the difference and restores it from the source.
+- **Audit trail and easy rollback:** Every platform change is captured in Git, so you can see who changed what and when, and undoing a change is a single `git revert`.
+- **Independent updates:** Each Software Pack moves on its own cadence. Updating one pack doesn't require coordinating with the others, and removing one leaves the rest untouched.
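
The self-healing property in the list above can be illustrated with a minimal Python sketch. It is not ArgoCD's implementation or API: the function `reconcile` and the dictionaries `desired` and `live` are hypothetical names, and the cluster is modeled as a plain dictionary purely to show the idea of restoring live state from a declared source.

```python
from copy import deepcopy


def reconcile(desired: dict, live: dict) -> list[str]:
    """Make `live` match `desired` and return a log of the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            # The resource was deleted by hand: put it back from the source.
            live[name] = deepcopy(spec)
            actions.append(f"recreated {name}")
        elif live[name] != spec:
            # The resource was edited by hand: overwrite it with the source.
            live[name] = deepcopy(spec)
            actions.append(f"restored {name}")
    return actions


# Hypothetical declared state (what the source says) and live state (the cluster).
desired = {
    "keycloak": {"replicas": 2},
    "cert-manager": {"replicas": 1},
}
live = deepcopy(desired)
del live["cert-manager"]          # simulate a manual deletion
live["keycloak"]["replicas"] = 5  # simulate a manual edit

actions = reconcile(desired, live)
print(actions)
```

A second call to `reconcile` with the same inputs returns an empty list, which is the point of the pattern: once the live state matches the source, there is nothing left to do. The sketch deliberately ignores resources that exist in the cluster but not in the source.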

-## Terraform state and drift detection
+## State and locking

-`nic` keeps a record of everything it has built in a file called the **Terraform state file**. The file lives in shared cloud storage (S3, Cloud Storage, or Blob Storage) that matches your provider:
+`nic` keeps a record of everything it has built. For providers that use Terraform under the hood (AWS, GCP, Azure, local), the state file lives in cloud storage that matches the provider. For other providers, the underlying tool handles state in its own way.

 | Provider | Backend |
 | --- | --- |
@@ -116,34 +108,7 @@ This shared record gives `nic` two abilities:
 E2->>SF: acquires lock, applies changes
 ```

-- **Drift detection:** When someone makes a manual change in the cloud (resizing a node group in the console, editing an IAM policy by hand), the next `nic deploy` spots the difference and proposes a correction to bring things back in line.
-
-```mermaid
-flowchart LR
-SF[State file<br/>what nic thinks exists] --> Compare{Compare}
-| **Pack developer** | A Software Pack (Helm chart, manifests, `NebariApplication`) | Pack template, Git, the central registry |
-| **End user** | Capabilities exposed in the landing page | Browser, Keycloak login |
-
-The boundary is enforced by the architecture, not by convention:
-
-- A pack developer never needs cluster admin or Terraform credentials. They publish a pack to the registry; the operator and ArgoCD do the rest.
-- A platform team upgrade (rotating cert-manager, resizing nodes, swapping the ingress controller) does not require coordination with pack developers.
-- An end user never sees Kubernetes. They see a Capability on the landing page and click into it.
-
-This is the "why" behind the layering: each role can move at its own cadence without breaking the others.
-
+- **Catches manual changes:** When someone makes a manual change in the cloud (resizing a node group in the console, editing an IAM policy by hand), running `nic deploy --dry-run` shows the difference, and `nic deploy` applies the correction.
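
The dry-run versus apply behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how `nic` or Terraform is implemented: `detect_drift`, `deploy`, and the resource keys are hypothetical, and both the state record and the cloud are modeled as dictionaries.

```python
from copy import deepcopy


def detect_drift(recorded: dict, actual: dict) -> dict:
    """Return the resources whose real settings differ from the recorded ones."""
    drift = {}
    for name, spec in recorded.items():
        found = actual.get(name)  # None means the resource is missing
        if found != spec:
            drift[name] = {"recorded": spec, "actual": found}
    return drift


def deploy(recorded: dict, actual: dict, dry_run: bool = False) -> dict:
    """Report drift; unless dry_run is set, also correct it in `actual`."""
    drift = detect_drift(recorded, actual)
    if not dry_run:
        for name in drift:
            actual[name] = deepcopy(recorded[name])
    return drift


# Hypothetical state record and cloud view; the keys are made up for illustration.
recorded = {"node_group": {"size": 3}, "iam_policy": {"admin": False}}
cloud = deepcopy(recorded)
cloud["node_group"]["size"] = 5  # someone resized the node group by hand

preview = deploy(recorded, cloud, dry_run=True)  # reports, changes nothing
print(preview)
applied = deploy(recorded, cloud)                # reports and corrects
print(cloud == recorded)
```

The dry run and the real run compute the same difference; only the real run writes the recorded settings back. Resources that exist in the cloud but were never recorded are left alone in this sketch.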
 ## Where to go next

 - [Get started with NKP](/docs/nkp/get-started): install the platform and run your first cluster.