Commit 45593e3

docs(explanations): tighten NKP architecture page
1 parent 4495fd2 commit 45593e3

3 files changed

Lines changed: 53 additions & 114 deletions

docs/docs/explanations/nkp-architecture.mdx

Lines changed: 22 additions & 57 deletions
Original file line number | Diff line number | Diff line change
@@ -8,7 +8,7 @@ description: A conceptual tour of how Nebari Kubernetes Platform is layered, wha
88

99
Nebari Kubernetes Platform (NKP) lets developers package AI capabilities without handling auth, routing, or TLS themselves. Users then access those capabilities through one secure platform.
1010

11-
## The layers, top to bottom
11+
## How the layers fit together
1212

1313
NKP is a stack of layers, each doing one job. Because the layers connect through stable, well-defined interfaces, you can change one layer without rewriting the others.
1414

@@ -17,12 +17,15 @@ NKP is a stack of layers, each doing one job. Because the layers connect through
1717

1818
![NKP architecture: layered diagram with Cloud and Kubernetes containers wrapping the Landing page, Software Pack, Nebari Operator, and Foundational software components. A user browses the Landing page; a developer builds the Software Pack. Upward arrows show that the Software Pack is shown on the Landing page, the Nebari Operator deploys the Software Pack, and the Foundational software powers the Nebari Operator.](/img/explanations/nkp-architecture.png)
1919

20-
**What users and developers see and control**
20+
**Used by end users**
2121

22-
- **Dynamic landing page**: the home page where users access the Capabilities installed on the platform.
23-
- **Software Pack**: an installable Capability (chat assistant, document analyzer, code review tool, and so on) packaged by a developer.
22+
- **Landing page**: the home page where users access the Capabilities installed on the platform.
2423

25-
**Platform-managed, hidden from users**
24+
**Built by developers**
25+
26+
- **Software Pack**: an installable Capability (chat assistant, document analyzer, code review tool, and so on).
27+
28+
**Managed by platform engineers**
2629

2730
- **Nebari Operator**: the automation that deploys each Software Pack and connects it to the Foundational software.
2831
- **Foundational software**: shared services (secure connections, login, traffic routing, monitoring, continuous delivery from Git) that power the operator and every running pack.
@@ -38,18 +41,17 @@ NKP is a stack of layers, each doing one job. Because the layers connect through
3841
- **cert-manager**: keeps secure connections working. Automatically requests, renews, and rotates HTTPS certificates for every domain in the cluster.
3942
- **Envoy Gateway**: handles traffic routing. Inspects incoming requests and forwards each one to the right service, with built-in rate limiting and authentication checks.
4043
- **Keycloak**: handles login. One sign-on covers every app on the platform (ArgoCD, Grafana, Software Packs), with support for multi-factor authentication and connection to an existing identity provider like Active Directory.
41-
- **ArgoCD**: keeps the cluster in sync with a Git repository. Whenever the platform team commits a change to Git (a new pack, an updated config), ArgoCD applies it to the cluster automatically.
42-
- **LGTM telemetry stack**: collects logs, metrics, and traces from every running pack and presents them in shared dashboards.
44+
- **OpenTelemetry Collector**: collects logs, metrics, and traces from every running pack so they can be forwarded to an observability backend.
4345

4446
</details>
4547

4648
## The `nic` CLI
4749

48-
The `nic` CLI (short for **Nebari Infrastructure Core**) is the command-line tool for managing NKP: installing, updating, tearing down, plus inspecting the cloud resources underneath.
50+
The `nic` CLI (short for **Nebari Infrastructure Core**) is the command-line tool for installing, updating, and tearing down NKP's cloud infrastructure.
4951

5052
### What `nic` does
5153

52-
- **Creates the cloud infrastructure.** This includes the network, the managed Kubernetes cluster and its worker machines, the identity and access controls, and the persistent storage. Under the hood, `nic` uses Terraform to do this.
54+
- **Creates the cloud infrastructure.** This includes the network, the managed Kubernetes cluster and its worker machines, the identity and access controls, and the persistent storage.
5355
- **Prepares the cluster.** Sets up the basic Kubernetes structures the platform relies on: organizational groupings (namespaces), permission rules (RBAC), storage templates (storage classes), and network rules (network policies).
5456
- **Installs ArgoCD.** ArgoCD is the GitOps engine that delivers everything above the cluster (the foundational software, the operator, and the packs). `nic` installs it directly so the rest of the platform can flow in from a Git repository.
5557

@@ -66,31 +68,21 @@ This split keeps cluster work (the bottom of the stack) and pack work (the top)
6668

6769
## GitOps via ArgoCD
6870

69-
ArgoCD is an open-source deployment tool for Kubernetes. It turns every cluster change into a Git commit, so updates land safely, rollbacks take seconds, and the cluster stays in sync without manual intervention.
70-
71-
ArgoCD uses the **app-of-apps** pattern: a top-level Application resource points to a folder of child Applications, each one a piece of foundational software with explicit dependencies.
72-
73-
```mermaid
74-
flowchart LR
75-
git[(Git repo<br/>foundational-software)] -->|pulled by| argo[ArgoCD]
76-
argo -->|reconciles| cm[cert-manager]
77-
argo -->|reconciles| eg[Envoy Gateway]
78-
argo -->|reconciles| kc[Keycloak]
79-
argo -->|reconciles| obs[Observability stack]
80-
argo -->|reconciles| op[Nebari Operator]
81-
```
71+
[ArgoCD](https://github.com/argoproj/argo-cd) is an open-source deployment tool for Kubernetes. It installs and updates every foundational component on the cluster from two kinds of sources:
8272

83-
Three properties follow from this:
73+
- **OpenTeams Git repositories:** for OpenTeams-built pieces like the [landing page](https://github.com/nebari-dev/nebari-landing) and the [Nebari Operator](https://github.com/nebari-dev/nebari-operator).
74+
- **Upstream Helm charts:** for external open-source pieces like [Keycloak](https://github.com/keycloak/keycloak), [cert-manager](https://github.com/cert-manager/cert-manager), [Envoy Gateway](https://github.com/envoyproxy/gateway), and the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector).
8475

85-
- **Single source of truth.** What is in Git is what is in the cluster. There is no "live but undocumented" configuration drift on the application side.
86-
- **Self-healing.** If an operator deletes a resource by hand, ArgoCD puts it back on the next reconciliation tick.
87-
- **Auditable change.** Every platform change is a Git commit; rollbacks are a Git revert.
76+
A few properties follow from this:
8877

89-
Software Packs plug into this model from above. A pack ships its workload manifests plus a `NebariApplication` custom resource that declares what the pack needs (domain, paths, auth scope, dashboards). Once that custom resource lands in the cluster, the Nebari Operator picks it up and runs the four reconciliation steps described in the layers section.
78+
- **No hidden changes:** Every piece of software in the cluster is there because ArgoCD pulled it from a known source. Nothing runs that you can't trace back to a repository or a Helm chart.
79+
- **Self-healing:** If a user manually deletes or changes something in the cluster, ArgoCD detects the difference and restores it from the source.
80+
- **Audit trail and easy rollback:** Every platform change is captured in Git, so you can see who changed what and when, and undoing a change is a single `git revert`.
81+
- **Independent updates:** Each Software Pack moves on its own cadence. Updating one pack doesn't require coordinating with the others, and removing one leaves the rest untouched.
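
The self-healing property above can be pictured as a compare-and-restore loop. The following is a minimal conceptual sketch in Python, not ArgoCD's implementation: the function names (`plan_sync`, `apply_sync`) and the dictionary model of the cluster are hypothetical, and whether a real ArgoCD installation reverts or prunes a given change depends on how its sync policy is configured.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> rendered spec (a plain string here).
State = Dict[str, str]


def plan_sync(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the actions needed to make `live` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))   # deleted by hand, or never installed
        elif live[name] != spec:
            actions.append(("update", name))   # edited by hand, or source changed
    for name in sorted(live):
        if name not in desired:
            actions.append(("prune", name))    # not traceable to any source
    return actions


def apply_sync(desired: State, live: State) -> State:
    """Apply the planned actions and return the resulting cluster state."""
    result = dict(live)
    for action, name in plan_sync(desired, live):
        if action == "prune":
            del result[name]
        else:
            result[name] = desired[name]
    return result


if __name__ == "__main__":
    desired = {"cert-manager": "v1", "keycloak": "v2"}
    live = {"keycloak": "v1-edited", "stray": "x"}
    print(plan_sync(desired, live))
    print(apply_sync(desired, live) == desired)
```

In this model a rollback is just a change to `desired` (for example after a `git revert`), and the same loop carries the cluster back to the earlier state.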
9082

91-
## Terraform state and drift detection
83+
## State and locking
9284

93-
`nic` keeps a record of everything it has built in a file called the **Terraform state file**. The file lives in shared cloud storage (S3, Cloud Storage, or Blob Storage) that matches your provider:
85+
`nic` keeps a record of everything it has built. For providers that use Terraform under the hood (AWS, GCP, Azure, local), the state file lives in cloud storage that matches the provider. For other providers, the underlying tool handles state in its own way.
9486

9587
| Provider | Backend |
9688
| --- | --- |
@@ -116,34 +108,7 @@ This shared record gives `nic` two abilities:
116108
E2->>SF: acquires lock, applies changes
117109
```
118110

119-
- **Drift detection:** When someone makes a manual change in the cloud (resizing a node group in the console, editing an IAM policy by hand), the next `nic deploy` spots the difference and proposes a correction to bring things back in line.
120-
121-
```mermaid
122-
flowchart LR
123-
SF[State file<br/>what nic thinks exists] --> Compare{Compare}
124-
Cloud[Live cloud<br/>what actually exists] --> Compare
125-
Compare -- match --> OK[Nothing to do]
126-
Compare -- differ --> Drift[Drift detected<br/>nic proposes corrections]
127-
```
128-
129-
## Separation of concerns
130-
131-
NKP is built around three roles with non-overlapping lifecycles:
132-
133-
| Role | What they own | Tools they use |
134-
| --- | --- | --- |
135-
| **Platform team** | Cluster, foundational software, identity, telemetry | `nic` CLI, Terraform state, ArgoCD admin |
136-
| **Pack developer** | A Software Pack (Helm chart, manifests, `NebariApplication`) | Pack template, Git, the central registry |
137-
| **End user** | Capabilities exposed in the landing page | Browser, Keycloak login |
138-
139-
The boundary is enforced by the architecture, not by convention:
140-
141-
- A pack developer never needs cluster admin or Terraform credentials. They publish a pack to the registry; the operator and ArgoCD do the rest.
142-
- A platform team upgrade (rotating cert-manager, resizing nodes, swapping the ingress controller) does not require coordination with pack developers.
143-
- An end user never sees Kubernetes. They see a Capability on the landing page and click into it.
144-
145-
This is the "why" behind the layering: each role can move at its own cadence without breaking the others.
146-
111+
- **Catches manual changes:** When someone makes a manual change in the cloud (resizing a node group in the console, editing an IAM policy by hand), running `nic deploy --dry-run` shows the difference, and `nic deploy` applies the correction.
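
As a rough illustration of how a manual change can be caught, the sketch below compares a recorded state against the live cloud, attribute by attribute. It is a hypothetical Python model, not `nic`'s or Terraform's actual code: `detect_drift` and the resource dictionaries are assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical model: resource id -> {attribute name: value}.
Resources = Dict[str, Dict[str, Any]]


def detect_drift(recorded: Resources, live: Resources) -> List[str]:
    """List the differences between the recorded state and the live cloud."""
    findings: List[str] = []
    for resource_id in sorted(set(recorded) | set(live)):
        if resource_id not in live:
            findings.append(f"{resource_id}: missing from cloud")
        elif resource_id not in recorded:
            findings.append(f"{resource_id}: not tracked in state")
        else:
            old_attrs, new_attrs = recorded[resource_id], live[resource_id]
            for key in sorted(set(old_attrs) | set(new_attrs)):
                old, new = old_attrs.get(key), new_attrs.get(key)
                if old != new:
                    findings.append(
                        f"{resource_id}.{key}: recorded {old!r}, live {new!r}"
                    )
    return findings


if __name__ == "__main__":
    recorded = {"node_group": {"size": 3}}
    live = {"node_group": {"size": 5}}  # someone resized it in the console
    for line in detect_drift(recorded, live):
        print(line)
```

An empty result corresponds to "nothing to do"; a non-empty one is the kind of difference a dry run would surface before any correction is applied.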
147112
## Where to go next
148113

149114
- [Get started with NKP](/docs/nkp/get-started): install the platform and run your first cluster.
