Commit 3ef7d82

CASSANDRA-21291: Fix duplicate section and minor formatting typos in compaction overview
1 parent bf755d0 commit 3ef7d82

2 files changed

Lines changed: 46 additions & 60 deletions

doc/modules/cassandra/pages/managing/operating/compaction/overview.adoc

Lines changed: 29 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -2,18 +2,18 @@
22

33
== What is compaction?
44

5-
Data in {cassandra} is created in xref:cassandra:architecture/storage-engine.adoc#memtables[memtables].
5+
Data in {cassandra} is created in xref:cassandra:architecture/storage-engine.adoc#memtables[memtables]. 
66
Once a memory threshold is reached, to free up memory again, the data is written to an xref:cassandra:architecture/storage-engine.adoc#SSTables[SSTable], an https://cassandra.apache.org/_/glossary.html#immutable[immutable] file residing on disk.
77

8-
Because SSTables are immutable, when data is updated or deleted, the old data is not overwritten with inserts or updates, or removed from the SSTable.
9-
Instead, a new SSTable is created with the updated data with a new timestamp, and the old SSTable is marked for deletion.
8+
Because SSTables are immutable, when data is updated or deleted, the old data is not overwritten with inserts or updates, or removed from the SSTable. 
9+
Instead, a new SSTable is created with the updated data with a new timestamp, and the old SSTable is marked for deletion. 
1010
The piece of deleted data is known as a https://cassandra.apache.org/_/glossary.html#tombstone[tombstone].
1111

12-
Over time, Cassandra may write many versions of a row in different SSTables.
13-
Each version may have a unique set of columns stored with a different timestamp.
12+
Over time, Cassandra may write many versions of a row in different SSTables. 
13+
Each version may have a unique set of columns stored with a different timestamp. 
1414
As SSTables accumulate, the distribution of data can require accessing more and more SSTables to retrieve a complete row.
1515

16-
To keep the database healthy, Cassandra periodically merges SSTables and discards old data.
16+
To keep the database healthy, Cassandra periodically merges SSTables and discards old data. 
1717
This process is called https://cassandra.apache.org/_/glossary.html#compaction[compaction].
1818

1919
== Why must compaction be run?
@@ -26,22 +26,22 @@ Deleting, updating, or expiring data are all valid triggers for compaction.
2626
== What does compaction accomplish?
2727

2828
Two important factors accomplished by compaction are performance improvement and disk space reclamation.
29-
If SSTables have duplicate data that must be read, read operations are slower.
29+
If SSTables have duplicate data that must be read, read operations are slower. 
3030
Once tombstones and duplicates are removed, read operations are faster.
3131
SSTables use disk space, and reducing the size of SSTables through compaction frees up disk space.
3232

3333
== How does compaction work?
3434

35-
Compaction works on a collection of SSTables.
36-
From these SSTables, compaction collects all versions of each unique row and assembles one complete row, using the most up-to-date version (by timestamp) of each of the row's columns.
37-
The merge process is performant, because rows are sorted by partition key within each SSTable, and the merge process does not use random I/O.
38-
The new versions of each row is written to a new SSTable.
35+
Compaction works on a collection of SSTables. 
36+
From these SSTables, compaction collects all versions of each unique row and assembles one complete row, using the most up-to-date version (by timestamp) of each of the row's columns. 
37+
The merge process is performant, because rows are sorted by partition key within each SSTable, and the merge process does not use random I/O. 
38+
The new versions of each row is written to a new SSTable. 
3939
The old versions, along with any rows that are ready for deletion, are left in the old SSTables, and are deleted as soon as pending reads are completed.
4040

4141
== Types of compaction
4242

4343
The concept of compaction is used for different kinds of operations in
44-
{cassandra}, the common thing about these operations is that it takes one
44+
{cassandra}, the common thing about these operations is that they take one
4545
or more SSTables, merges, and outputs new SSTables. The types of compactions are:
4646

4747
Minor compaction::
@@ -56,11 +56,11 @@ A major compaction is triggered when a user executes a compaction over all SSTab
5656
User defined compaction::
5757
Similar to a major compaction, a user-defined compaction executes when a user triggers a compaction on a given set of SSTables.
5858
Scrub::
59-
A scrub triggers a compaction to try to fix any broken SSTables.
59+
A scrub triggers a compaction to try to fix any broken SSTables. 
6060
This can actually remove valid data if that data is corrupted.
6161
If that happens you will need to run a full repair on the node.
6262
UpgradeSSTables::
63-
A compaction occurs when you upgrade SSTables to the latest version.
63+
A compaction occurs when you upgrade SSTables to the latest version. 
6464
Run this after upgrading to a new major version.
6565
Cleanup::
6666
Compaction executes to remove any ranges that a node no longer owns.
@@ -71,8 +71,8 @@ Anticompaction::
7171
After repair, the ranges that were actually repaired are split out of the SSTables that existed when repair started. This type of compaction rewrites SSTables to accomplish this task.
7272
Sub range compaction::
7373
It is possible to only compact a given sub range - this action is useful if you know a token that has been misbehaving - either gathering many updates or many deletes.
74-
The command `nodetool compact -st x -et y` will pick all SSTables containing the range between x and y and issue a compaction for those SSTables.
75-
For Size Tiered Compaction Strategy, this will most likely include all SSTables, but with Leveled Compaction Strategy, it can issue the compaction for a subset of the SSTables.
74+
The command `nodetool compact -st x -et y` will pick all SSTables containing the range between x and y and issue a compaction for those SSTables. 
75+
For Size Tiered Compaction Strategy, this will most likely include all SSTables, but with Leveled Compaction Strategy, it can issue the compaction for a subset of the SSTables. 
7676
With LCS the resulting SSTable will end up in L0.
7777

7878
== Strategies
@@ -82,14 +82,14 @@ Picking the right compaction strategy for your workload will ensure the best per
8282

8383
xref:cassandra:managing/operating/compaction/ucs.adoc[`Unified Compaction Strategy (UCS)`]::
8484
UCS is a good choice for most workloads and is recommended for new workloads.
85-
This compaction strategy is designed to handle a wide variety of workloads.
86-
It is designed to be able to handle both immutable time-series data and workloads with lots of updates and deletes.
87-
It is also designed to be able to handle both spinning disks and SSDs.
88-
xref:cassandra:managing/operating/compaction/stcs.adoc[`Size Tiered Compaction Strategy (STCS)`]::
89-
STCS is the default compaction strategy, because it is useful as a fallback when other strategies don't fit the workload.
85+
This compaction strategy is designed to handle a wide variety of workloads. 
86+
It is designed to be able to handle both immutable time-series data and workloads with lots of updates and deletes. 
87+
It is also designed to be able to handle both spinning disks and SSDs.  
88+
xref:cassandra:managing/operating/compaction/stcs.adoc[`Size Tiered Compaction Strategy (STCS)`]:: 
89+
STCS is the default compaction strategy, because it is useful as a fallback when other strategies don't fit the workload. 
9090
Most useful for not strictly time-series workloads with spinning disks, or when the I/O from `LCS` is too high.
9191
xref:cassandra:managing/operating/compaction/lcs.adoc[`Leveled Compaction Strategy (LCS)`]::
92-
Leveled Compaction Strategy (LCS) is optimized for read heavy workloads, or workloads with lots of updates and deletes.
92+
Leveled Compaction Strategy (LCS) is optimized for read heavy workloads, or workloads with lots of updates and deletes. 
9393
It is not a good choice for immutable time-series data.
9494
xref:cassandra:managing/operating/compaction/twcs.adoc[`Time Window Compaction Strategy (TWCS)`]::
9595
Time Window Compaction Strategy is designed for TTL'ed, mostly immutable time-series data.
@@ -107,19 +107,6 @@ of the TTL) Cassandra will have a hard time dropping the tombstones
107107
created since the partition might span many SSTables and not all are
108108
compacted at once.
109109

110-
== Fully expired SSTables
111-
112-
If an SSTable contains only tombstones and it is guaranteed that
113-
SSTable is not shadowing data in any other SSTable, then the compaction can drop
114-
that SSTable. If you see SSTables with only tombstones (note that TTL-ed
115-
data is considered tombstones once the time-to-live has expired), but it
116-
is not being dropped by compaction, it is likely that other SSTables
117-
contain older data. There is a tool called `sstableexpiredblockers` that
118-
will list which SSTables are droppable and which are blocking them from
119-
being dropped. With `TimeWindowCompactionStrategy` it
120-
is possible to remove the guarantee (not check for shadowing data) by
121-
enabling `unsafe_aggressive_sstable_expiration`.
122-
123110
== Repaired/unrepaired data
124111

125112
With incremental repairs Cassandra must keep track of what data is
@@ -161,8 +148,8 @@ When an SSTable is written a histogram with the tombstone expiry times
161148
is created and this is used to try to find SSTables with very many
162149
tombstones and run single SSTable compaction on that SSTable in hope of
163150
being able to drop tombstones in that SSTable. Before starting this it
164-
is also checked how likely it is that any tombstones will actually will
165-
be able to be dropped how much this SSTable overlaps with other
151+
is also checked how likely it is that any tombstones will actually
152+
be able to be dropped and how much this SSTable overlaps with other
166153
SSTables. To avoid most of these checks the compaction option
167154
`unchecked_tombstone_compaction` can be enabled.
168155

@@ -178,11 +165,11 @@ How much of the SSTable should be tombstones for us to consider doing a single S
178165
`tombstone_compaction_interval` (default: 86400s (1 day))::
179166
Since it might not be possible to drop any tombstones when doing a single SSTable compaction we need to make sure that one SSTable is not constantly getting recompacted - this option states how often we should try for a given SSTable.
180167
`log_all` (default: false)::
181-
New detailed compaction logging, see `below <detailed-compaction-logging>`.
168+
New detailed compaction logging, see <<detailed-compaction-logging, below>>.
182169
`unchecked_tombstone_compaction` (default: false)::
183-
The single SSTable compaction has quite strict checks for whether it should be started, this option disables those checks and for some use cases this might be needed.
170+
The single SSTable compaction has quite strict checks for whether it should be started, this option disables those checks and for some use cases this might be needed. 
184171
Note that this does not change anything for the actual compaction, tombstones are only dropped if it is safe to do so - it might just rewrite an SSTable without being able to drop any tombstones.
185-
`only_purge_repaired_tombstone` (default: false)::
172+
`only_purge_repaired_tombstones` (default: false)::
186173
Option to enable the extra safety of making sure that tombstones are only dropped if the data has been repaired.
187174
`min_threshold` (default: 4)::
188175
Lower limit of number of SSTables before a compaction is triggered.
@@ -195,7 +182,7 @@ Further, see the section on each strategy for specific additional options.
195182

196183
== Compaction nodetool commands
197184

198-
The `nodetool <nodetool>` utility provides a number of commands related to compaction:
185+
The `nodetool` utility provides a number of commands related to compaction:
199186

200187
`enableautocompaction`::
201188
Enable compaction.
@@ -212,7 +199,7 @@ Set the min/max SSTable count for when to trigger compaction, defaults to 4/32.
212199

213200
== Switching the compaction strategy and options using JMX
214201

215-
It is possible to switch compaction strategies and its options on just a single node using JMX, this is a great way to experiment with settings without affecting the whole cluster.
202+
It is possible to switch compaction strategies and its options on just a single node using JMX, this is a great way to experiment with settings without affecting the whole cluster. 
216203
The mbean is:
217204

218205
[source,console]

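The merge described under "How does compaction work?" in overview.adoc can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which an SSTable is a dictionary from row key to columns, and each column holds a `(timestamp, value)` pair. The names `Cell`, `Row`, `SSTable`, and `compact` are illustrative only and are not Cassandra APIs. Tombstone purging and the streaming, sorted merge used by the real implementation are deliberately left out.

```python
from typing import Dict, List, Tuple

# Hypothetical model (not a Cassandra API):
# an SSTable maps row key -> {column name: (write timestamp, value)}.
Cell = Tuple[int, str]
Row = Dict[str, Cell]
SSTable = Dict[str, Row]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables, keeping the newest cell (by timestamp) per column."""
    merged: SSTable = {}
    for table in sstables:
        for key, row in table.items():
            target = merged.setdefault(key, {})
            for column, cell in row.items():
                # Keep the incoming cell only if it is newer than what we hold.
                if column not in target or cell[0] > target[column][0]:
                    target[column] = cell
    # Emit rows in key order, standing in for the sorted layout of a new SSTable.
    return {key: merged[key] for key in sorted(merged)}


# Two versions of row "k1" live in different SSTables.
older = {"k1": {"name": (1, "alice"), "city": (1, "Oslo")}}
newer = {"k1": {"city": (2, "Bergen")}, "k0": {"name": (2, "bob")}}
result = compact([older, newer])
```

Here `result["k1"]` combines the `name` column from the older SSTable with the `city` column from the newer one, which is the "one complete row" the text refers to.
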
doc/modules/cassandra/pages/managing/operating/compaction/tombstones.adoc

Lines changed: 17 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -7,8 +7,8 @@
77
{cassandra}'s processes for deleting data are designed to be efficient, and to work with {cassandra}'s native features for data distribution and fault-tolerance.
88

99
{cassandra} treats a deletion as an insertion, and inserts a time-stamped deletion marker called a tombstone.
10-
The tombstones go through {cassandra}'s write path, and are written to SSTables on one or more nodes.
11-
The key feature difference of a tombstone is that it has a built-in expiration date/time.
10+
The tombstones go through {cassandra}'s write path, and are written to SSTables on one or more nodes. 
11+
The key feature difference of a tombstone is that it has a built-in expiration date/time. 
1212
At the end of its expiration period, called the grace period, the tombstone is deleted as part of {cassandra}'s normal compaction process.
1313

1414
[NOTE]
@@ -19,7 +19,7 @@ After `gc_grace_seconds` has elapsed, the data is eligible for permanent removal
1919
====
2020

2121
== Why tombstones?
22-
22+
 
2323
The tombstone represents the deletion of an object, either a row or column value.
2424
This approach is used instead of removing values because of the distributed nature of {cassandra}.
2525
Once an object is marked as a tombstone, queries will ignore all values that are time-stamped previous to the tombstone insertion.
@@ -41,17 +41,17 @@ Its default value is 864000 seconds (ten days), after which a tombstone expires
4141
Prior to the grace period expiring, {cassandra} will retain a tombstone through compaction events.
4242
Each table can have its own value for this property.
4343

44-
The purpose of the grace period is to give unresponsive nodes time to recover and process tombstones normally.
45-
If a client writes a new update to the tombstoned object during the grace period, {cassandra} overwrites the tombstone.
44+
The purpose of the grace period is to give unresponsive nodes time to recover and process tombstones normally. 
45+
If a client writes a new update to the tombstoned object during the grace period, {cassandra} overwrites the tombstone. 
4646
If a client sends a read for that object during the grace period, {cassandra} disregards the tombstone and retrieves the object from other replicas if possible.
4747

48-
When an unresponsive node recovers, {cassandra} uses hinted handoff to replay the database mutations the node missed while it was down.
49-
{cassandra} does not replay a mutation for a tombstoned object during its grace period.
48+
When an unresponsive node recovers, {cassandra} uses hinted handoff to replay the database mutations the node missed while it was down. 
49+
{cassandra} does not replay a mutation for a tombstoned object during its grace period. 
5050
But if the node does not recover until after the grace period ends, {cassandra} may miss the deletion.
5151

5252
After the tombstone's grace period ends, {cassandra} deletes the tombstone during compaction.
5353

54-
== Deletion
54+
== Deletion 
5555

5656
Once the `gc_grace_seconds` period has passed, the tombstone can be removed, meaning there will no longer be any record indicating that a specific piece of data was deleted.
5757
However, deleting data can be complicated because the tombstone might exist in one SSTable while the data it marks for deletion is in another.
@@ -60,7 +60,7 @@ More specifically, a tombstone is only dropped when:
6060

6161
* The tombstone must be older than `gc_grace_seconds`.
6262
Note that tombstones will not be removed until a compaction event even if `gc_grace_seconds` has elapsed.
63-
* If partition X contains the tombstone, the SSTable containing the partition plus all SSTables containing data older than the tombstone containing X must be included in the same compaction.
63+
* If partition X contains the tombstone, the SSTable containing the partition plus all SSTables containing data older than the tombstone containing X must be included in the same compaction. 
6464
If all data in any SSTable containing partition X is newer than the tombstone, it can be ignored.
6565
* If the option `only_purge_repaired_tombstones` is enabled, tombstones are only removed if the data has also been repaired.
6666
This process is described in the "Deletes with tombstones" sections.
@@ -70,22 +70,21 @@ This is basically the same as in the "Deletes without Tombstones" section.
7070

7171
=== Deletes without tombstones
7272

73-
Imagine a three node cluster which has the value [A] replicated to every
74-
node.:
73+
Imagine a three node cluster which has the value [A] replicated to every node:
7574

7675
[source,none]
7776
----
7877
[A], [A], [A]
7978
----
8079

81-
If one of the nodes fails and and our delete operation only removes existing values, we can end up with a cluster that looks like:
80+
If one of the nodes fails and our delete operation only removes existing values, we can end up with a cluster that looks like:
8281

8382
[source,none]
8483
----
8584
[], [], [A]
8685
----
8786

88-
Then a repair operation would replace the value of [A] back onto the two nodes which are missing the value.:
87+
Then a repair operation would replace the value of [A] back onto the two nodes which are missing the value:
8988

9089
[source,none]
9190
----
@@ -96,7 +95,7 @@ This would cause our data to be resurrected as a zombie even though it had been
9695

9796
=== Deletes with tombstones
9897

99-
Starting again with a three node cluster which has the value [A] replicated to every node.:
98+
Starting again with a three node cluster which has the value [A] replicated to every node:
10099

101100
[source,none]
102101
----
@@ -117,16 +116,16 @@ Now when we issue a repair the tombstone will be copied to the replica, rather t
117116
[A, Tombstone[A]], [A, Tombstone[A]], [A, Tombstone[A]]
118117
----
119118

120-
Our repair operation will correctly put the state of the system to what we expect with the object [A] marked as deleted on all nodes.
121-
This does mean we will end up accruing tombstones which will permanently accumulate disk space.
119+
Our repair operation will correctly put the state of the system to what we expect with the object [A] marked as deleted on all nodes. 
120+
This does mean we will end up accruing tombstones which will permanently accumulate disk space. 
122121
To avoid keeping tombstones forever, we set `gc_grace_seconds` for every table in {cassandra}.
123122

124123
== Fully expired SSTables
125124

126125
If an SSTable contains only tombstones and it is guaranteed that SSTable is not shadowing data in any other SSTable, then the compaction can drop
127-
that SSTable.
126+
that SSTable.  
128127
If you observe SSTables that contain only tombstones or expired TTL data, and compaction is not removing them, it likely indicates that older versions of the data still exist in other SSTables.
129-
There is a tool called `sstableexpiredblockers` that will list which SSTables are droppable and which are blocking them from being dropped.
128+
There is a tool called `sstableexpiredblockers` that will list which SSTables are droppable and which are blocking them from being dropped. 
130129
With `TimeWindowCompactionStrategy` it is possible to remove the guarantee (not check for shadowing data) by enabling `unsafe_aggressive_sstable_expiration`.
131130

132131

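The conditions listed in tombstones.adoc for dropping a tombstone can be expressed as a small check. This is a sketch under stated assumptions, not Cassandra's implementation: `SSTableInfo` and `can_drop_tombstone` are hypothetical names, timestamps are plain seconds, and each SSTable is summarized by the oldest timestamp it holds for partition X.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTableInfo:
    """Hypothetical summary of one SSTable with respect to partition X."""
    name: str
    contains_partition: bool
    min_timestamp: int  # oldest write for partition X, in seconds (assumed unit)
    repaired: bool


def can_drop_tombstone(
    tombstone_ts: int,
    now: int,
    gc_grace_seconds: int,
    tombstone_sstable: SSTableInfo,
    all_sstables: List[SSTableInfo],
    compacting: List[str],
    only_purge_repaired_tombstones: bool = False,
) -> bool:
    # Condition 1: the tombstone must be older than gc_grace_seconds.
    if now - tombstone_ts <= gc_grace_seconds:
        return False
    # Condition 2: the SSTable holding the tombstone, plus every SSTable that
    # holds data for partition X older than the tombstone, must be compacted
    # together. SSTables whose data is all newer can be ignored.
    if tombstone_sstable.name not in compacting:
        return False
    for table in all_sstables:
        holds_older_data = (
            table.contains_partition and table.min_timestamp < tombstone_ts
        )
        if holds_older_data and table.name not in compacting:
            return False
    # Condition 3: optionally require that the data has been repaired.
    if only_purge_repaired_tombstones and not tombstone_sstable.repaired:
        return False
    return True


a = SSTableInfo("a", True, 100, True)   # holds the tombstone written at t=200
b = SSTableInfo("b", True, 50, True)    # holds older data for partition X
c = SSTableInfo("c", True, 300, True)   # holds only newer data
after_grace = 200 + 864000 + 1

dropped_together = can_drop_tombstone(200, after_grace, 864000, a, [a, b, c], ["a", "b"])
blocked_by_b = can_drop_tombstone(200, after_grace, 864000, a, [a, b, c], ["a"])
```

In this example `dropped_together` is true because `c` holds only newer data and can be left out, while `blocked_by_b` is false because the older data in `b` is not part of the compaction.
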