openobserve
diff --git a/docs/administration/maintenance/storage-management/storage.md b/docs/administration/maintenance/storage-management/storage.md
Lines changed: 34 additions & 24 deletions
diff --git a/docs/enterprise-setup/performance.md b/docs/enterprise-setup/performance.md
Lines changed: 30 additions & 8 deletions
@@ -1,7 +1,10 @@
 ---
+title: Storage Management | OpenObserve
 description: >-
-  Learn how OpenObserve stores ingested stream data and the metadata for ingested date using disk, SQLite, Postgres, or S3-compatible object storage.
+  Learn how OpenObserve stores ingested stream data and the metadata for ingested data using disk, SQLite, Postgres, or S3-compatible object storage.
 ---
+# Storage
+
 This guide explains how to configure data and metadata storage in OpenObserve. The information applies to both the open-source and enterprise versions.
 
 ## Overview
@@ -13,22 +16,21 @@ There are 2 primary items that need to be stored in OpenObserve.
 By default: 
 
 - Metadata is stored on disk using **SQLite** in **Local mode**.
-- Metadata is always stored on disk using **postgres** in **Cluster mode**.
-- Stream data can be stored on disk or object storage such as Amazon S3, minIO, Google GCS, Alibaba OSS, or Tencent COS.
+- Metadata is stored in **PostgreSQL** in **Cluster mode**.
+- Stream data can be stored on disk or object storage such as Amazon S3, MinIO, Google GCS, Alibaba OSS, or Tencent COS.
 
 ## Storage Modes
 
 - OpenObserve runs in **Local mode** by default.
-- To enable **Cluster mode**, set the environment variable `LOCAL_MODE=false`.
+- To enable **Cluster mode**, set the environment variable `ZO_LOCAL_MODE=false`.
 - In **Local mode**, stream data can be stored in S3 by setting `ZO_LOCAL_MODE_STORAGE=s3`.
-- GCS and OSS support the S3 SDK and can be treated as S3-compatible storages. Azure Blob storage is also supported.
-
-## Data Storage Format
+- GCS and OSS support the S3 API and can be used as S3-compatible storage.
+- Azure Blob storage is supported via `ZO_S3_PROVIDER=azure`.
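The rules above can be summarized as a small decision function. This is an illustrative sketch, not OpenObserve's configuration loader: `resolve_storage` is a hypothetical helper, and it assumes that an unset `ZO_LOCAL_MODE` means Local mode, that Cluster mode always keeps stream data in object storage, that `ZO_META_STORE` overrides the per-mode metadata default, and that an unset `ZO_S3_PROVIDER` means plain S3.

```python
from typing import Dict, Mapping


def resolve_storage(env: Mapping[str, str]) -> Dict[str, str]:
    """Summarize where data would be stored, following the rules listed above.

    Hypothetical helper for illustration. Assumptions (not taken from
    OpenObserve's source): an unset ZO_LOCAL_MODE means Local mode, Cluster
    mode always uses object storage for stream data, ZO_META_STORE overrides
    the per-mode metadata default, and an unset ZO_S3_PROVIDER means "s3".
    """
    local_mode = env.get("ZO_LOCAL_MODE", "true").strip().lower() != "false"

    if local_mode:
        # Local mode keeps stream data on disk unless ZO_LOCAL_MODE_STORAGE=s3.
        use_object_store = env.get("ZO_LOCAL_MODE_STORAGE", "disk").strip().lower() == "s3"
        default_meta = "sqlite"
    else:
        use_object_store = True
        default_meta = "postgres"

    if use_object_store:
        # ZO_S3_PROVIDER picks the backend, e.g. "azure" for Azure Blob.
        stream_data = "object storage (" + env.get("ZO_S3_PROVIDER", "s3") + ")"
    else:
        stream_data = "local disk"

    return {
        "mode": "local" if local_mode else "cluster",
        "metadata": env.get("ZO_META_STORE", default_meta),
        "stream_data": stream_data,
    }


print(resolve_storage({}))
# {'mode': 'local', 'metadata': 'sqlite', 'stream_data': 'local disk'}
print(resolve_storage({"ZO_LOCAL_MODE": "false", "ZO_S3_PROVIDER": "azure"}))
# {'mode': 'cluster', 'metadata': 'postgres', 'stream_data': 'object storage (azure)'}
```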
 
-Stream data is stored in Parquet format. Parquet is columnar storage format optimized for storage efficiency and query performance. 
+### Data Storage Format
 
+Stream data is stored in **Parquet** format, a columnar storage format optimized for storage efficiency and query performance.
+
 ## Stream Data Storage Options
-
 ### Disk
 
 Disk is the default storage location for stream data. **Ensure that sufficient disk space is available for storing stream data.**
@@ -61,7 +63,7 @@ Then set the following environment variables:
 | ZO_S3_PROVIDER       | minio | ...                                             |
 
 
-### Openstack Swift
+### OpenStack Swift
 To use OpenStack Swift for storing stream data, first create the bucket in Swift.
 Then set the following environment variables:
 
@@ -72,15 +74,15 @@ Then set the following environment variables:
 | ZO_S3_ACCESS_KEY          | -     | Access key                                      |
 | ZO_S3_SECRET_KEY          | -     | Secret key                                      |
 | ZO_S3_BUCKET_NAME         | -     | Bucket name                                     |
-| ZO_S3_FEATURE_HTTP1_ONLY  | true  | 	Enables compatibility with Swift                                              |
+| ZO_S3_FEATURE_HTTP1_ONLY  | true  | Enables compatibility with Swift                |
 | ZO_S3_PROVIDER            | s3    | Enables S3-compatible API                           |
 | AWS_EC2_METADATA_DISABLED | true  | Disables EC2 metadata access, which is not supported by Swift |
 
 
 ### Google GCS
 To use GCS for storing stream data, first create the bucket in GCS.
 
-**Using the S3-compatible API:**
+#### Using the S3-compatible API
 
 | Environment Variable     | Value  | Description                                                     |
 | ------------------------ | -------| --------------------------------------------------------------- |
@@ -94,7 +96,7 @@ To use GCS for storing stream data, first create the bucket in GCS.
 
 Refer to [GCS AWS migration documentation](https://cloud.google.com/storage/docs/aws-simple-migration) for more information.
 
-**Using GCS directly:**
+#### Using GCS directly
 
 | Environment Variable     | Value  | Description                                                             |
 | ------------------------ | -------| ----------------------------------------------------------------------- |
@@ -106,7 +108,7 @@ Refer to [GCS AWS migration documentation](https://cloud.google.com/storage/docs
 
 OpenObserve uses the [object_store crate](https://docs.rs/object_store/0.10.1/object_store/gcp/struct.GoogleCloudStorageBuilder.html) to initialize the storage configuration. By default, it calls the `from_env()` function, which reads the GCS configuration from environment variables. If the `ZO_S3_ACCESS_KEY` variable is set, OpenObserve additionally calls the `with_service_account_path()` function to load the GCP service account key.
 
-### Alibaba OSS (aliyun)
+### Alibaba OSS (Aliyun)
 To use Alibaba OSS for storing stream data, first create the bucket in Alibaba Cloud.
 Then set the following environment variables:
 
@@ -164,15 +166,15 @@ Refer to [Baidu BOS documentation](https://cloud.baidu.com/doc/BOS/s/xjwvyq9l4).
 
 ### Azure Blob
 
-OpenObserve can use azure blob for storing stream data. Following environment variables needs to be setup:
+OpenObserve can use Azure Blob for storing stream data. The following environment variables need to be set:
 
 | Environment Variable       | Value                | Description                                  |
 | -------------------------- | -------------------- | -------------------------------------------- |
-| ZO_S3_PROVIDER             | azure                | Enables Azure Blob storage support                   |
+| ZO_S3_PROVIDER             | azure                | Enables Azure Blob storage support           |
 | ZO_LOCAL_MODE_STORAGE      | s3                   | Required only in Local mode (single node)    |
-| AZURE_STORAGE_ACCOUNT_NAME | Storage account name | Need to provide mandatorily                  |
-| AZURE_STORAGE_ACCOUNT_KEY  | Access key           | Need to provide mandatorily                  |
-| ZO_S3_BUCKET_NAME          | Blob Container name  | Need to provide mandatorily                  |
+| AZURE_STORAGE_ACCOUNT_NAME | Storage account name | Required                                     |
+| AZURE_STORAGE_ACCOUNT_KEY  | Access key           | Required                                     |
+| ZO_S3_BUCKET_NAME          | Blob Container name  | Required                                     |
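Before starting OpenObserve, it can help to verify that the variables in the table are set. The check below is a hypothetical pre-flight helper (`check_azure_env` is not part of OpenObserve); it encodes only the requirements listed in the table, and it assumes Local mode unless `ZO_LOCAL_MODE=false`.

```python
import os
from typing import List, Mapping

# Variables the Azure Blob table marks as required (must be present and non-empty).
REQUIRED = ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY", "ZO_S3_BUCKET_NAME")


def check_azure_env(env: Mapping[str, str], local_mode: bool) -> List[str]:
    """Return a list of configuration problems; an empty list means the table's requirements are met."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("ZO_S3_PROVIDER") != "azure":
        problems.append("ZO_S3_PROVIDER must be 'azure'")
    # ZO_LOCAL_MODE_STORAGE=s3 is only required when running in Local mode.
    if local_mode and env.get("ZO_LOCAL_MODE_STORAGE") != "s3":
        problems.append("ZO_LOCAL_MODE_STORAGE must be 's3' in Local mode")
    return problems


if __name__ == "__main__":
    # Assumption: Local mode unless ZO_LOCAL_MODE=false.
    local = os.environ.get("ZO_LOCAL_MODE", "true").lower() != "false"
    for problem in check_azure_env(os.environ, local_mode=local):
        print("config error:", problem)
```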
 
 ### Hetzner Cloud Object Storage
 
@@ -215,17 +217,25 @@ OpenObserve supports multiple metadata store backends, configurable using the `Z
 ### PostgreSQL
 - Set `ZO_META_STORE=postgres`.
 - Recommended for production deployments due to reliability and scalability. 
-- The default Helm chart (after February 23, 2024) uses [cloudnative-pg](https://cloudnative-pg.io/) to create a postgres cluster (primary + replica) which is used as the meta store. These instances provide high availability and backup support.
+- Helm charts released after February 23, 2024 use [CloudNativePG](https://cloudnative-pg.io/) to create a PostgreSQL cluster (primary + replica) that serves as the metadata store. This cluster provides high availability and backup support.
 
 ### etcd (Removed)
 
 !!! warning "Removal notice"
-    Etcd support has been removed. Use NATS instead.
-    
-- Set `ZO_META_STORE=etcd`.
-- While etcd is used as the cluster coordinator, it was also the default metadata store in Helm charts released before 23 February 2024. This configuration is now deprecated. Helm charts released after 23 February 2024 use PostgreSQL as the default metadata store.
+    Support for etcd has been removed. Use NATS as the cluster coordinator and PostgreSQL as the metadata store. Helm charts released after February 23, 2024 already use PostgreSQL by default.
 
 ### MySQL (Deprecated)
 - Set `ZO_META_STORE=mysql`.
 - Deprecated. 
 - Use PostgreSQL instead.
+
+## Next steps
+
+- [HA deployment](../../deployment/ha-deployment.md): configure object storage and metadata store in a production cluster.
+- [Environment variables](../../configuration/environment-variables.md): full reference for `ZO_S3_*` and `ZO_META_*` settings.
+- [Capacity planning](../../../enterprise-setup/capacity-planning.md): sizing storage, compute, and memory for each component.
+
+**Need some help?**
+
+- Join our [Community Slack](https://short.openobserve.ai/community) 
+- Or [Contact support](https://openobserve.ai/contactus/)
@@ -45,13 +45,16 @@ If you have very high ingestion speed requirements (e.g. 100s of thousands of ev
 
 By default, OpenObserve does not build full-text indexes the way Elasticsearch does. This results in a very high compression ratio for ingested data and, combined with object storage, can result in ~140x lower storage costs. The trade-off is that full-text queries may be slower in the absence of full-text indexes. However, log data has some unique properties that can be leveraged to improve search performance significantly. OpenObserve uses the following techniques:
 
-### Column pruning 
+### Column pruning
+
 OpenObserve uses a columnar storage format (Parquet), which allows it to read only the columns a query requires. This technique, called column pruning, reduces the amount of data read from disk and improves search performance. To benefit from it, switch to SQL query mode and select only the columns you want returned.
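The toy example below illustrates why selecting fewer columns reads less data. It models a table as independent columns, much as Parquet stores separate column chunks; the names `read_columns` and `bytes_read` are hypothetical, and counting value bytes is only a rough stand-in for real disk I/O.

```python
from typing import Dict, List, Sequence

# A toy columnar table: column name -> list of values. Each column can be read
# on its own, much as a Parquet reader can fetch individual column chunks.
Table = Dict[str, List[str]]


def read_columns(table: Table, columns: Sequence[str]) -> Table:
    """Return only the requested columns; other columns are never touched."""
    return {name: table[name] for name in columns}


def bytes_read(table: Table) -> int:
    """Approximate I/O as the total UTF-8 size of the values that were read."""
    return sum(len(value.encode("utf-8")) for column in table.values() for value in column)


logs: Table = {
    "_timestamp": ["1700000000", "1700000001"],
    "host": ["web-1", "web-2"],
    "log": [
        "GET /index.html 200 user_agent=Mozilla/5.0 referer=https://example.com/",
        "POST /api/login 500 error=upstream timeout after 30s retries=3",
    ],
}

full = bytes_read(logs)                                          # like SELECT *
pruned = bytes_read(read_columns(logs, ["_timestamp", "host"]))  # like SELECT _timestamp, host
print(full, pruned)  # pruned is 30: the long log lines are skipped entirely
```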
 
-### Predicate pushdown: 
+### Predicate pushdown
 
-#### Standard Partitioning (KeyValue partitions) 
->Note: Use For low cardinality fields
+#### Standard Partitioning (KeyValue partitions)
+
+!!! note
+    Use for low-cardinality fields.
 
 OpenObserve uses a technique called predicate pushdown to further reduce the amount of data read from disk: filters are pushed down to the storage layer. By default, OpenObserve partitions data by `org/stream/year/month/day/hour`, so if you specify the time range you are searching, OpenObserve skips data outside that range and searches much less data. You can also enable additional partitioning on any field of a stream in the stream settings; good candidates for partition keys are host and Kubernetes namespace. A stream can have multiple partition keys, and filtering on them in your query, e.g. `host='xyz' and kubernetes_namespace='abc'`, also takes advantage of predicate pushdown. **Do not enable partitioning on all or many fields:** it can produce many small underlying Parquet files, which results in low compression, extremely poor search performance, and high S3 storage costs. As a rule of thumb, each stored Parquet file should be larger than 5 MB. The order of partition keys does not matter: you can partition by `namespace, pod` or by `pod, namespace`.
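The sketch below illustrates how key-value partitions let a query skip data, and how to estimate whether partitioning keeps files above the 5 MB rule of thumb. It is a simplified model, not OpenObserve's storage layout: the `key=value` path segments, the helper names, and the assumption that each hour of data is split evenly into one file per partition-value combination are all illustrative.

```python
from typing import Dict, List


def partition_path(org: str, stream: str, year: int, month: int, day: int, hour: int,
                   keys: Dict[str, str]) -> str:
    """Build an illustrative path: org/stream/year/month/day/hour[/key=value...]."""
    parts = [org, stream, f"{year:04d}", f"{month:02d}", f"{day:02d}", f"{hour:02d}"]
    parts += [f"{name}={value}" for name, value in keys.items()]
    return "/".join(parts)


def prune(paths: List[str], filters: Dict[str, str]) -> List[str]:
    """Keep only partitions whose key=value segments satisfy every equality filter.

    Filters here are on partition keys only. Matching uses set membership,
    so the order of partition keys in the path does not matter.
    """
    def matches(path: str) -> bool:
        segments = set(path.split("/"))
        return all(f"{name}={value}" in segments for name, value in filters.items())

    return [path for path in paths if matches(path)]


def estimated_file_size_mb(hourly_ingest_mb: float, combinations: int) -> float:
    """Rough per-file size if each hour is split evenly, one file per key combination."""
    return hourly_ingest_mb / combinations


paths = [
    partition_path("default", "k8s_logs", 2024, 5, 1, 10, {"host": host, "namespace": ns})
    for host in ("node-1", "node-2")
    for ns in ("prod", "dev")
]
print(prune(paths, {"host": "node-1", "namespace": "prod"}))
# ['default/k8s_logs/2024/05/01/10/host=node-1/namespace=prod']
print(estimated_file_size_mb(600, 200))  # 3.0 MB per file: below the 5 MB guideline
```

In this model, 600 MB per hour spread across 200 host/namespace combinations leaves about 3 MB per file, which suggests partitioning on fewer fields.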
 
@@ -116,12 +119,16 @@ For the above scenario, you can enable hash partitioning on namespace field with
 When you set up hash partitioning for a field in the stream settings, you can choose the number of buckets: 8, 16, 32, 64, or 128.
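Hash partitioning caps the number of partitions regardless of how many distinct values a field has. The sketch below maps values to a fixed number of buckets with CRC32 modulo the bucket count. CRC32 is a stand-in chosen for illustration, since the hash function OpenObserve uses is not specified here, and the helper names are hypothetical.

```python
import zlib
from typing import Dict, List

ALLOWED_BUCKETS = (8, 16, 32, 64, 128)


def bucket_for(value: str, num_buckets: int) -> int:
    """Map a field value to a bucket with a stable hash (CRC32 as a stand-in)."""
    if num_buckets not in ALLOWED_BUCKETS:
        raise ValueError(f"num_buckets must be one of {ALLOWED_BUCKETS}")
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def group_by_bucket(values: List[str], num_buckets: int) -> Dict[int, List[str]]:
    """Group values by bucket: each bucket corresponds to one partition."""
    groups: Dict[int, List[str]] = {}
    for value in values:
        groups.setdefault(bucket_for(value, num_buckets), []).append(value)
    return groups


namespaces = [f"ns-{i}" for i in range(1000)]
groups = group_by_bucket(namespaces, 16)
print(len(groups))  # at most 16 partitions, however many namespaces exist
# A query with namespace='ns-42' only needs the partition for this bucket:
print(bucket_for("ns-42", 16))
```

Because the hash is stable, the same value always lands in the same bucket, so an equality filter on the field maps to exactly one partition.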
 
 #### Time range partition
->Note: Enabled by default and cannot be disabled
+
+!!! note
+    Enabled by default and cannot be disabled.
 
 In addition to any partitions you define, OpenObserve always partitions data by time range. Always search the shortest time range you can: for example, if you are looking for data from the last 15 minutes, select that range from the top-right corner instead of a wider one. This improves search performance by taking advantage of predicate pushdown.
 
-### Bloom filter (available starting v0.8.0) (For high cardinality fields)
->Note: Use For high cardinality fields
+### Bloom filter
+
+!!! note
+    Use for high-cardinality fields. Available starting with v0.8.0.
 
 A bloom filter is a space-efficient probabilistic data structure that lets you check whether a value may exist in a set; it can return false positives but never false negatives. It addresses the proverbial needle-in-a-haystack problem. OpenObserve uses bloom filters to check whether a value exists in a column, which lets it skip reading data from disk when the value is definitely absent. This reduces the search space and improves search performance. You must enable a bloom filter on each field you want to search this way. Fields well suited for bloom filters have very high cardinality, e.g. UUID, request_id, trace_id, device_id. You can enable a bloom filter for a field in the stream settings, and you can enable it on multiple fields, e.g. `request_id` and `trace_id`. Queries that filter on those fields then use the bloom filter, e.g. `request_id='abc' and trace_id='xyz'`. Enabling a bloom filter on a low-cardinality field does not improve performance.
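The sketch below shows the core idea of a bloom filter: each added value sets k bits, and a lookup that finds any of its bits unset proves the value is absent. It is a minimal illustration, not OpenObserve's bloom filter format; the bit-array size, the number of hashes, the salted SHA-256 hashing, and the one-filter-per-file usage are all assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A minimal bloom filter: each value sets num_hashes positions in a num_bits array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent": a reader can skip the file.
        # True means "possibly present": the file must still be scanned.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


# Hypothetical usage: one filter per data file, built over its request_id values.
file_filter = BloomFilter()
for request_id in ("req-001", "req-002", "req-003"):
    file_filter.add(request_id)

print(file_filter.might_contain("req-002"))  # True: added values are never missed
print(file_filter.might_contain("req-999"))  # False unless a rare false positive occurs
```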
 
@@ -133,7 +140,10 @@ Log search involves full text search. When you try to do a full text search it e
 1. Do not use `match_all` directly on the full data set. Always combine it with one or more filters that can themselves be optimized by partitions or bloom filters, e.g. `host='host1' and match_all('error')`, `k8s_namespace_name='ns1' and match_all('error')`, or `bank_account_number='653456-54654-65' and match_all('error')`. In each of these examples, the filter reduces the search space for `match_all`. If the `host` and `k8s_namespace_name` fields are partitioned, the search space shrinks further and full-text search becomes faster still. `bank_account_number`, `request_id`, `trace_id`, and `device_id` are good candidates for bloom filters and should be combined with `match_all` to improve full-text search performance.
 1. Enable full-text search only on the fields that need it, e.g. `body`, `log`, or `message`. Fields like hostname or IP address are not good candidates for full-text search, so do not enable it on them. You can enable full-text search on one or more fields in the stream settings, e.g. `body` and `message`. Queries that use `match_all` then search those fields, e.g. `host='host1' and match_all('error')`.
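The effect of pairing `match_all` with a filter can be sketched as: apply the cheap equality filter first, then run the expensive text scan only on the rows that remain. This is a simplified model (the `search` helper, the single `log` field, and plain substring matching are assumptions, not how `match_all` is implemented); in OpenObserve, partitions or bloom filters can apply the equality filter before any row is read.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def search(rows: List[Row], filters: Dict[str, str], term: str) -> Tuple[List[Row], int]:
    """Equality filters first, then a substring scan of the `log` field on the survivors.

    Returns (matching rows, number of rows that needed the text scan).
    """
    candidates = [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
    hits = [row for row in candidates if term in row.get("log", "")]
    return hits, len(candidates)


rows = [
    {"host": f"host{i % 10}", "log": "error: upstream timeout" if i % 7 == 0 else "request ok"}
    for i in range(1000)
]

hits, scanned = search(rows, {"host": "host1"}, "error")
print(len(hits), scanned)          # 14 100: only host1's rows are text-scanned
hits_all, scanned_all = search(rows, {}, "error")
print(len(hits_all), scanned_all)  # 143 1000: every row is text-scanned
```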
 
-### Inverted Index (available starting v0.10.0)
+### Inverted Index
+
+!!! note
+    Available starting with v0.10.0.
 
 The partitioning schemes and bloom filters described above are well suited to equality-based searches, e.g. `request_id='abc'`. For full-text search in fields that contain longer log lines, earlier releases of OpenObserve relied on brute-force search (the way [grep](https://www.gnu.org/software/grep/manual/grep.html) works), which works well for most scenarios but can be slow on very large data sets. Enable the inverted index on such fields to improve full-text search performance. Do not enable it on fields that are used only for equality-based searches; bloom filters and hash partitions are better suited to those.
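The difference between brute-force search and an inverted index can be sketched in a few lines: `grep` tokenizes every line on every query, while the index maps each token to the lines that contain it ahead of time, so a lookup goes straight to the matching lines. This is a toy model with a simplistic tokenizer, not OpenObserve's inverted index implementation; all names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter, digit, or underscore."""
    return [token for token in re.split(r"[^a-z0-9_]+", text.lower()) if token]


def build_inverted_index(lines: List[str]) -> Dict[str, Set[int]]:
    """Map each token to the set of line numbers that contain it (built once, ahead of queries)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in tokenize(line):
            index[token].add(line_no)
    return index


def grep(lines: List[str], word: str) -> List[int]:
    """Brute force: re-tokenize every line on every query."""
    word = word.lower()
    return [line_no for line_no, line in enumerate(lines) if word in tokenize(line)]


def lookup(index: Dict[str, Set[int]], word: str) -> List[int]:
    """Indexed: a single dictionary lookup returns the matching line numbers."""
    return sorted(index.get(word.lower(), set()))


lines = [
    "GET /index.html 200",
    "POST /api/login 500 Internal Server Error",
    "GET /health 200",
    "POST /api/order 500 upstream error",
]
index = build_inverted_index(lines)
print(lookup(index, "error"))  # [1, 3]
print(grep(lines, "error"))    # [1, 3], but only after scanning every line
```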
 
@@ -325,3 +335,15 @@ By enabling User-Defined Schemas (via `ZO_ALLOW_USER_DEFINED_SCHEMAS=true`), you
 
 If you later need one of the fields from the `_raw` data to be searchable, simply add it to the UDS in the stream’s settings. After doing so, this field will become searchable going forward.
 
+## Next steps
+
+- [Capacity planning](capacity-planning.md): sizing CPU, memory, and storage for each component.
+- [HA deployment](../administration/deployment/ha-deployment.md): production-grade cluster setup.
+- [Architecture](../architecture.md): understand how the components interact.
+
+**Need some help?**
+
+- Join our [Community Slack](https://short.openobserve.ai/community) 
+- Or [Contact support](https://openobserve.ai/contactus/)
+
+