Updates to log loss article. #3267
Conversation
Walkthrough

Refines the high-volume log-loss documentation: clarifies unread-container-log loss (including short-lived pods), switches operational metrics from events/sec to bytes/sec, expands rotation/other-log controls and limitations, restructures metrics guidance, warns about rotation backlog I/O, and replaces “alternatives” with a “Bad alternatives” section plus an updated checklist.

Changes: High-volume log-loss guide
Review Summary by Qodo

Comprehensive updates to high-volume log loss documentation.

Description:

- Clarifies log loss concepts and adds missing technical details
- Improves metrics documentation with better explanations and new queries
- Expands recommendations section with practical guidance on CPU/memory tuning
- Reorganizes and corrects information about other log types (journald, audit)
- Adds warnings about disk I/O impact and per-node variation in capacity planning
- Restructures "bad alternatives" section with clearer explanations of buffer limitations

Diagram:

```mermaid
flowchart LR
    A["Log Loss Article"] -->|Clarifies concepts| B["Overview & Rotation"]
    A -->|Expands metrics| C["Metrics Documentation"]
    A -->|Adds guidance| D["Recommendations"]
    A -->|Improves explanations| E["Other Log Types"]
    A -->|Restructures| F["Bad Alternatives"]
    D -->|New section| G["Check Forwarder CPU/Memory"]
    F -->|Better clarity| H["Buffer Limitations"]
```

File Changes: docs/administration/high-volume-log-loss.adoc
Code Review by Qodo

1. Conflicting disk sizing guidance
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alanconway
This update is to address the unresolved comments from Pull Request openshift#3166.
@alanconway: all tests passed!
jcantrill left a comment:

Couple questions, but overall lgtm.
```asciidoc
[NOTE]
====
CRI-O may compress rotated log files (`.gz`).
The collector cannot read compressed files — they are excluded from collection.
```
Do we know this is a true statement, other than that we exclude them now? I'm not certain we have ever tested it.
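Whether or not the collector has been tested against compressed files, the exclusion itself is mechanical: gzip files begin with the magic bytes `0x1f 0x8b`, so a collector that tails files as plain text cannot parse them. A minimal sketch for checking this on a node (the directory layout and helper names are illustrative, not taken from the collector's code):

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_compressed(path: Path) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def partition_rotated_logs(log_dir: Path):
    """Split rotated log files into (readable, excluded-as-compressed)."""
    readable, excluded = [], []
    for path in sorted(log_dir.glob("*.log*")):
        (excluded if is_gzip_compressed(path) else readable).append(path)
    return readable, excluded
```

Pointing `partition_rotated_logs` at a pod's log directory would show which rotated files a plain-text tailer could still read.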
```diff
-Linux audit node logs:: The write-rate is the total of all auditable actions on the node.
-Rotation is controlled by `auditd`, which is configured by `/etc/auditd/auditd.conf`.
+Linux audit node logs:: Rotation is controlled by `auditd`, configured in `/etc/audit/auditd.conf`.
```
Is there a cluster config that controls the settings in this file? Can it be changed without rebuilding the image? If so, can we reference that here?
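For context on what "rotation is controlled by `auditd`" means concretely, these are the rotation-related keys in `auditd.conf` (values here are illustrative; whether a cluster-level API manages this file is exactly the open question above):

```ini
# /etc/audit/auditd.conf -- rotation-related settings (illustrative values)
max_log_file = 8              # rotate when the active log reaches 8 MB
num_logs = 5                  # keep at most this many log files
max_log_file_action = ROTATE  # rotate rather than suspend or ignore
```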
```diff
-journald node logs:: The write-rate is the total volume of logs from _local_ processes on the node.
-Rotation is controlled by local `journald.conf` configuration files.
+journald node logs:: Rotation is controlled by `journald.conf` configuration files.
```
Same question here as below
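Similarly for journald, the size and retention knobs live in `journald.conf` (illustrative values; the same question applies about whether a cluster config manages this file):

```ini
# /etc/systemd/journald.conf -- size/retention settings (illustrative values)
[Journal]
SystemMaxUse=4G        # cap total persistent journal size
SystemMaxFileSize=128M # rotate individual journal files at this size
MaxRetentionSec=1month # drop entries older than this
```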
```asciidoc
[CAUTION]
====
Large rotation settings mean more data accumulates on disk during outages.
When the collector catches up after an outage, reading a large backlog causes heavy disk I/O
```
Do we need to mention anything here about receivers with ingest rate limits (i.e. Loki), where the burst may result in dropped logs? If so, readers need to consider the config of the receiver as well.
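To illustrate the concern: after an outage, the effective drain rate is bounded by both disk read throughput and the receiver's ingest limit, so a rate-limited receiver stretches the catch-up window (or drops the burst outright). A back-of-the-envelope sketch with made-up numbers:

```python
def drain_seconds(backlog_bytes: float,
                  disk_read_bps: float,
                  receiver_ingest_bps: float) -> float:
    """Time to drain a rotation backlog; the slower of disk and receiver wins."""
    effective_bps = min(disk_read_bps, receiver_ingest_bps)
    return backlog_bytes / effective_bps


GIB, MIB = 2**30, 2**20

# Example: a 10 GiB backlog, disk can read 200 MiB/s,
# but the receiver only accepts 10 MiB/s -> 1024.0 seconds (~17 minutes).
```

The point of the sketch: once the receiver's ingest cap is below disk throughput, raising rotation limits only lengthens the catch-up window, it does not make the burst deliverable any faster.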
Description
Update the log loss article to address unresolved comments on the original PR:
#3166
/assign jcantrill
/cc Clee2691
/cc r2d2rnd
Links
fix: update log loss article to address comments.
This update is to address the unresolved comments from Pull Request #3166.