Commit ca3b09c

Merge branch 'master' into rob-1126_headers_for_centralized_prometheus
2 parents be63b64 + 996957a commit ca3b09c

38 files changed

Lines changed: 875 additions & 283 deletions

docs/configuration/holmesgpt/toolsets/aws.rst

Lines changed: 1 addition & 15 deletions
@@ -74,21 +74,7 @@ Configuration
 
 .. md-tab-item:: Robusta Helm Chart
 
-.. code-block:: yaml
-
-holmes:
-additionalEnvVars:
-- name: AWS_ACCESS_KEY_ID
-value: AKIXDDDSDSdSA
-- name: AWS_SECRET_ACCESS_KEY
-value: =wJalrXUtnFEMI/KNG/bPxRfiCYEXAMPLEKEY
-- name: AWS_DEFAULT_REGION
-value: us-west-2
-toolsets:
-aws/rds:
-enabled: true
-
-.. include:: ./_toolset_configuration.inc.rst
+This builtin toolset is currently only available in HolmesGPT CLI.
 
 .. md-tab-item:: Holmes CLI
 

docs/configuration/holmesgpt/toolsets/coralogix_logs.rst

Lines changed: 6 additions & 3 deletions
@@ -68,7 +68,8 @@ Configuration
 Advanced Configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
-**Frequent logs and archive**
+Frequent logs and archive
+****************************
 
 By default, holmes fetched the logs from the `Frequent search <https://coralogix.com/docs/user-guides/account-management/tco-optimizer/logs/#frequent-search-data-high-priority>`_
 tier and only fetch logs from the `Archive` tier if the frequent search returned no result.
@@ -98,7 +99,8 @@ Here is a description of each possible log retrieval methodology:
 - **FREQUENT_SEARCH_FALLBACK** Search logs in the archive first. If there are no results, fallback to searching the frequent logs.
 - **BOTH_FREQUENT_SEARCH_AND_ARCHIVE** Always use both the frequent search and the archive to fetch logs. The result contains merged data which is deduplicated and sorted by timestamp.
 
-**Search labels**
+Search labels
+***************
 
 You can tweak the labels used by the toolset to identify kubernetes resources. This is **optional** and only needed if your
 logs settings for ``pod``, ``namespace``, ``application`` and ``subsystem`` differ from the defaults in the example below.
@@ -124,7 +126,8 @@ You can verify what labels to use by attempting to run a query in the coralogix
 :align: center
 
 
-**Disabling the default toolset**
+Disabling the default toolset
+*********************************
 
 If Coralogix is your primary datasource for logs, it is **advised** to disable the default HolmesGPT logging
 tool by disabling the ``kubernetes/logs`` toolset. Without this. HolmesGPT may still use kubectl to
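All three Coralogix hunks make the same mechanical change. A standalone bold line is promoted to a real section title with an underline of `*` characters, so Sphinx treats it as a section with its own anchor. The following is a minimal Python sketch of that transformation. It assumes each pseudo-heading sits alone on its own line. The helper name `promote_bold_headings` is hypothetical and is not part of the Robusta repository.

```python
import re

# A line made up only of **bold text** (surrounding whitespace allowed).
# Lines such as "- **NAME** description" do not match, so list items are left alone.
BOLD_LINE = re.compile(r"^\s*\*\*(?P<title>[^*]+)\*\*\s*$")


def promote_bold_headings(text: str, underline_char: str = "*") -> str:
    """Turn standalone **bold** lines into reST section titles (hypothetical helper).

    The underline is made exactly as long as the title, which satisfies the
    reST rule that an underline be at least as long as its title text.
    """
    out = []
    for line in text.splitlines():
        match = BOLD_LINE.match(line)
        if match:
            title = match.group("title").strip()
            out.append(title)
            out.append(underline_char * len(title))
        else:
            out.append(line)
    return "\n".join(out)
```

The commit itself uses underlines somewhat longer than the titles. Both forms are valid reST, because only a too-short underline is a problem.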

docs/configuration/holmesgpt/toolsets/grafanaloki.rst

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ You can find the Grafana URL required for Loki in your Grafana cloud account set
 
 
 Obtaining the datasource UID
------------------------
+-----------------------------------
 
 You may have multiple Loki data sources setup in Grafana. HolmesGPT uses a single Loki datasource to
 fetch the logs and it needs to know the UID of this datasource.

docs/configuration/holmesgpt/toolsets/grafanatempo.rst

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ You can find the Grafana URL required for Tempo in your Grafana cloud account se
 :align: center
 
 Obtaining the datasource UID
------------------------
+--------------------------------
 
 You may have multiple Tempo data sources setup in Grafana. HolmesGPT uses a single Tempo datasource to
 fetch the traces and it needs to know the UID of this datasource.
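The Loki and Tempo hunks only lengthen a section underline that was shorter than its title. Docutils reports that situation with a "Title underline too short." warning. The Python sketch below flags such titles in a file's text. It is a simplified check: it handles only underline-only titles, counts plain characters, and ignores indented lines. `short_underlines` is a hypothetical helper name.

```python
import re

# A reST adornment line: one punctuation character repeated (e.g. ----- or =====).
ADORNMENT = re.compile(r"^([=\-`:~^_*+#.])\1*\s*$")


def short_underlines(text: str) -> list[tuple[int, str]]:
    """Return (1-based title line number, title) for underlines shorter than their title.

    Simplified sketch: no overline handling, no wide-character width handling.
    """
    lines = text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        # Skip blank lines, indented lines (e.g. code-block bodies) and adornments.
        if not title.strip() or title[0].isspace() or ADORNMENT.match(title):
            continue
        if ADORNMENT.match(underline) and len(underline.rstrip()) < len(title.rstrip()):
            problems.append((i, title.strip()))
    return problems
```

A Sphinx build run with `-W` (warnings treated as errors) catches the same problem. The sketch only illustrates the rule being fixed.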

docs/configuration/holmesgpt/toolsets/notion.rst

Lines changed: 4 additions & 4 deletions
@@ -2,12 +2,12 @@ Notion
 =======================
 
 Notion Integration for HolmesGPT
--------------
+---------------------------------------
 
 Enabling this toolset allows HolmesGPT to fetch pages from Notion, making it useful when providing Notion-based runbooks.
 
 Setup Instructions
--------------
+-----------------------
 
 1. **Create a Webhook Integration**
 - Go to the Notion Developer Portal.
@@ -25,7 +25,7 @@ Setup Instructions
 
 
 Configuration
--------------
+------------------
 
 .. code-block:: yaml
 
@@ -46,7 +46,7 @@ Configuration
 .. include:: ./_toolset_configuration.inc.rst
 
 Capabilities
-------------
+----------------
 .. include:: ./_toolset_capabilities.inc.rst
 
 .. list-table::

docs/configuration/holmesgpt/toolsets/prometheus.rst

Lines changed: 7 additions & 5 deletions
@@ -1,13 +1,13 @@
 .. _toolset_prometheus:
 
 Prometheus
-==========
+=============
 
 By enabling this toolset, HolmesGPT will be able to generate graphs from prometheus metrics as well as help you write and
 validate prometheus queries. HolmesGPT can also detect memory leak patterns, CPU throttling, lagging queues, and high
 latency issues.
 
-Prior to generating a PromQL query, HolmesQPT tends to list the available metrics. This is done to ensure the metrics used
+Prior to generating a PromQL query, HolmesGPT tends to list the available metrics. This is done to ensure the metrics used
 in PromQL are actually available.
 
 Configuration
@@ -47,7 +47,8 @@ Configuration
 
 It is also possible to set the ``PROMETHEUS_URL`` environment variable instead of the above ``prometheus_url`` config key.
 
-**Advanced configuration**
+Advanced configuration
+******************************************
 
 Below is the full list of options for this toolset:
 
@@ -74,7 +75,8 @@ Below is the full list of options for this toolset:
 - **fetch_metadata_with_series_api** Uses the `series API <https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers>`_ instead of the `metadata API <https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata>`_. You should only set this value to `true` if the metadata API is disabled or not working. HolmesGPT's ability to select the right metric will be negatively impacted because the series API does not return key metadata like the metrics/series description or their type (gauge, histogram, etc.).
 - **tool_calls_return_data** Defaults to ``true``. If ``false``, no prometheus data will be returned to HolmesGPT. Set it to ``false`` if you frequently reach the token limit when using this toolset. Setting this setting to ``false`` will also disable HolmesGPT's ability to analyze prometheus data.
 
-**Finding the prometheus URL**
+Finding the prometheus URL
+******************************************
 
 The best way to find the prometheus URL is to use "ask holmes". This only works if your cluster is live and already connected to Robusta.
 
@@ -85,7 +87,7 @@ If not, you can often find the prometheus URL by running the following command (
 kubectl get svc --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"."}{.metadata.namespace}{".svc.cluster.local:"}{.spec.ports[0].port}{"\n"}{end}' | grep prometheus | grep -Ev 'operat|alertmanager|node|coredns|kubelet|kube-scheduler|etcd|controller' | awk '{print "http://"$1}'
 
 Capabilities
-------------
+-----------------
 .. include:: ./_toolset_capabilities.inc.rst
 
 .. list-table::
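The `kubectl ... | grep ... | awk` one-liner shown as context in the Prometheus hunk builds `http://<name>.<namespace>.svc.cluster.local:<first port>` candidates. It keeps lines containing `prometheus` and drops anything that matches the exclusion pattern. The Python sketch below applies the same filters to the JSON output of `kubectl get svc --all-namespaces -o json`, which may be easier to adapt. The function names are hypothetical.

```python
import json
import re
import subprocess

# Same exclusion pattern as the `grep -Ev` stage of the documented command.
EXCLUDE = re.compile(r"operat|alertmanager|node|coredns|kubelet|kube-scheduler|etcd|controller")


def candidate_prometheus_urls(svc_list: dict) -> list[str]:
    """Build in-cluster URLs for Services that look like a Prometheus server."""
    urls = []
    for item in svc_list.get("items", []):
        name = item["metadata"]["name"]
        namespace = item["metadata"]["namespace"]
        ports = item.get("spec", {}).get("ports") or []
        if not ports:
            # The jsonpath version would print a host with an empty port here; skip it.
            continue
        host = f"{name}.{namespace}.svc.cluster.local:{ports[0]['port']}"
        # `grep prometheus` then `grep -Ev ...` both run against this whole string.
        if "prometheus" in host and not EXCLUDE.search(host):
            urls.append(f"http://{host}")
    return urls


def fetch_services() -> dict:
    """Run kubectl and return the parsed Service list (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "svc", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Against a live cluster, `print(candidate_prometheus_urls(fetch_services()))` should list roughly the same URLs as the shell pipeline.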

docs/how-it-works/architecture.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Bundled Prometheus (Optional)
 
 Install Robusta with :ref:`Prometheus included <embedded Prometheus stack>`. This is powered by ``kube-prometheus-stack``.
 
-Alternatively, you can :ref:`integrate an existing Prometheus with Robusta <Integrating AlertManager and Prometheus>`.
+Alternatively, you can :ref:`integrate an existing Prometheus with Robusta <Integrating with Prometheus>`.
 
 Web UI (Optional)
 ^^^^^^^^^^^^^^^^^^^^^^

docs/how-it-works/coverage.rst

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ Prometheus Alerts
 
 .. warning::
 
-You must :ref:`send your Prometheus alerts to Robusta by webhook <Integrating AlertManager and Prometheus>` for these to work.
+You must :ref:`send your Prometheus alerts to Robusta by webhook <Integrating with Prometheus>` for these to work.
 
 Other errors
 ----------------
@@ -39,7 +39,7 @@ Change Tracking
 By default all changes to Deployments, DaemonSets, and StatefulSets are sent to the Robusta UI for correlation
 with Prometheus alerts and other errors.
 
-These changes are not sent to other sinks (e.g. Slack) by default because they are spammy. :ref:`Automation Basics`
+These changes are not sent to other sinks (e.g. Slack) by default because they are spammy. :ref:`Routing Cookbook`
 explains how to selectively track changes you care about in Slack as well.
 
 We also wrote a blog post `Why everyone should track Kubernetes changes and top four ways to do so <https://home.robusta.dev/blog/why-everyone-should-track-and-audit-kubernetes-changes-and-top-ways/>`_
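The architecture and coverage hunks retarget `:ref:` roles from labels such as `Integrating AlertManager and Prometheus` and `Automation Basics` to other labels. The rough Python sketch below can spot references to labels that do not exist before a full build. It extracts `:ref:` targets in both the bare form and the `title <target>` form. It then compares them against a set of known labels you supply, after lower-casing and collapsing whitespace, which roughly mirrors how Sphinx matches `:ref:` targets. How that label set is collected is an assumption about the project setup. For example, it could be section titles, if `sphinx.ext.autosectionlabel` is in use. Both function names are hypothetical.

```python
import re

# :ref:`target`  or  :ref:`Explicit title <target>`
REF = re.compile(r":ref:`(?:[^`<]*<(?P<explicit>[^>`]+)>|(?P<bare>[^`<]+))`")


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace, approximating Sphinx's label matching."""
    return " ".join(label.split()).lower()


def ref_targets(text: str) -> list[str]:
    """Return the target of every :ref: role in the text, in order."""
    return [(m.group("explicit") or m.group("bare")).strip() for m in REF.finditer(text)]


def unknown_refs(text: str, known_labels: set[str]) -> list[str]:
    """Return :ref: targets that do not match any known label."""
    known = {normalize(label) for label in known_labels}
    return [t for t in ref_targets(text) if normalize(t) not in known]
```

Sphinx itself warns about unresolved `:ref:` targets during a build. A check like this is only useful as a quick lint on changed files.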

docs/notification-routing/notification-routing-examples.rst

Lines changed: 62 additions & 0 deletions
@@ -1,3 +1,5 @@
+:hide-toc:
+
 Routing Cookbook
 ===================================
 
@@ -15,3 +17,63 @@ Routing Cookbook
 disable-oomkill-notifications
 
 In this section you'll find example configurations for common routing patterns.
+
+
+.. grid:: 1 1 2 3
+:gutter: 3
+
+.. grid-item-card:: :octicon:`book;1em;` Routing by Namespace
+:class-card: sd-bg-light sd-bg-text-light
+:link: routing-by-namespace
+:link-type: doc
+
+Route notifications based on Kubernetes namespaces.
+
+.. grid-item-card:: :octicon:`book;1em;` Routing by Alert Name
+:class-card: sd-bg-light sd-bg-text-light
+:link: routing-by-type
+:link-type: doc
+
+Route notifications based on alert types.
+
+.. grid-item-card:: :octicon:`book;1em;` Route by Time of Day
+:class-card: sd-bg-light sd-bg-text-light
+:link: implementing-monitoring-shifts
+:link-type: doc
+
+Implement monitoring shifts for better alert management.
+
+.. grid-item-card:: :octicon:`book;1em;` Routing to Multiple Slack Channels
+:class-card: sd-bg-light sd-bg-text-light
+:link: routing-to-multiple-slack-channels
+:link-type: doc
+
+Send notifications to multiple Slack channels.
+
+.. grid-item-card:: :octicon:`book;1em;` Routing Exclusion
+:class-card: sd-bg-light sd-bg-text-light
+:link: routing-exclusion
+:link-type: doc
+
+Exclude specific alerts from being routed.
+
+.. grid-item-card:: :octicon:`book;1em;` Dropping Specific Alerts
+:class-card: sd-bg-light sd-bg-text-light
+:link: routing-by-severity
+:link-type: doc
+
+Route notifications based on alert severity.
+
+.. grid-item-card:: :octicon:`book;1em;` Excluding "Resolved" Notifications
+:class-card: sd-bg-light sd-bg-text-light
+:link: excluding-resolved
+:link-type: doc
+
+Exclude resolved alerts from notifications.
+
+.. grid-item-card:: :octicon:`book;1em;` Disable "OOMKill" Notifications
+:class-card: sd-bg-light sd-bg-text-light
+:link: disable-oomkill-notifications
+:link-type: doc
+
+Disable notifications for OOMKill events.
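The new Routing Cookbook index repeats an identical card block for each page in its toctree. A hypothetical Python helper could render such a sphinx-design grid from `(title, document, description)` tuples to cut down on copy-paste. The three-space indentation is an assumption, because the scraped diff lost the original indentation. What matters is that each card's options and body are indented under its `grid-item-card` directive, and that the cards are nested under `grid`, as in the diff.

```python
def render_cards(
    cards: list[tuple[str, str, str]],
    columns: str = "1 1 2 3",
    gutter: int = 3,
    indent: str = "   ",
) -> str:
    """Render a sphinx-design grid of doc-link cards as reST text (hypothetical helper)."""
    lines = [f".. grid:: {columns}", f"{indent}:gutter: {gutter}", ""]
    for title, doc, description in cards:
        lines += [
            f"{indent}.. grid-item-card:: :octicon:`book;1em;` {title}",
            f"{indent * 2}:class-card: sd-bg-light sd-bg-text-light",
            f"{indent * 2}:link: {doc}",
            f"{indent * 2}:link-type: doc",
            "",
            f"{indent * 2}{description}",
            "",
        ]
    return "\n".join(lines)
```

Feeding it the same list of pages that the toctree uses keeps the card links and the toctree entries from drifting apart.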

docs/playbook-reference/automatic-remediation-examples/job-to-remediate-alert.rst

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ Trigger the newly added playbook by simulating a Prometheus alert.
 
 
 Reference Alert Metadata in Remediation Jobs
--------------------------------------------
+--------------------------------------------------
 
 When remediating based on alerts, you can access all the alert metadata like name, namespace, cluster name, pod, node and more as environment variables.
 
