Pixels cache is the distributed columnar cache that co-locates with the query (compute) engine.
It consists of a cache coordinator on the master node and a cache manager on each worker node of the query engine cluster.
Their implementation can be found in pixels-daemon. The underlying functionalities are implemented in pixels-cache.
The cache coordinator maintains the cache plan that decides which column chunk in which row group is cached on which worker node, whereas the cache manager on each worker node listens for updates to the cache plan and replaces the cache content on that node accordingly.
The cache plan is stored in etcd with the following data model:
- layout_version -> {schema_name}.{table_name}:{layout_version}: the data layout version, updated by the user or program that wants to trigger cache loading or replacement. Only increasing layout versions are accepted.
- cache_version -> {schema_name}.{table_name}:{layout_version}: the cache version, set by the cache coordinator to notify the cache workers to start cache loading or replacement when the cache tasks are ready in etcd.
- cache_location_{layout_version}_{worker_hostname} -> files: records the array of files to be cached on the worker (cache manager) under the specified cache version.
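For illustration, the cache plan can be inspected directly with etcdctl. The endpoint, layout version, and key values below are hypothetical; substitute the ones of your deployment.
```bash
# Inspect the cache plan keys in etcd (illustrative endpoint and values).
ETCD="--endpoints=http://localhost:2379"
etcdctl $ETCD get layout_version              # e.g., tpch.lineitem:1
etcdctl $ETCD get cache_version               # e.g., tpch.lineitem:1
etcdctl $ETCD get --prefix cache_location_1_  # files assigned to each worker under layout version 1
```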
The cache is read and updated as follows:
- When the layout optimizer generates a new layout, it writes the new layout (with a new layout_version) into etcd.
- The cache coordinator monitors layout_version. Once it finds a newer layout_version, it creates the corresponding cache tasks for each cache worker in etcd, then updates cache_version to the latest layout_version.
- The cache workers monitor cache_version. When a cache worker finds a new cache_version, it reads its cache task from cache_location_{layout_version}_{worker_hostname} and sets itself to busy to avoid concurrent cache updates. After that, the cache worker begins to load or replace the local cache content.
- When a query comes, the Presto/Trino coordinator checks etcd for the cache plan, thus finding the available caches for its query splits.
- Each Presto/Trino worker executes query splits with caching information (whether the column chunks in the query split are cached or not), and calls PixelsCacheReader to read the locally cached column chunks (if any).
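This coordination can also be observed from the command line while the cache is being loaded. A minimal sketch, assuming a local etcd endpoint (run each command in its own terminal):
```bash
# Watch the two keys that drive cache loading/replacement (illustrative endpoint).
etcdctl --endpoints=http://localhost:2379 watch layout_version  # updated by the layout optimizer or load-cache.sh
etcdctl --endpoints=http://localhost:2379 watch cache_version   # updated by the cache coordinator when tasks are ready
```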
Find vmtouch-1.3.1.tar.xz in scripts/tars under the Pixels source code folder and decompress it anywhere.
Enter the decompressed folder and run:
```bash
make install
```
to build and install vmtouch into the operating system.
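For example, assuming the tarball is decompressed into the home directory, the full sequence could look like this (paths are illustrative, and sudo may be needed to install into system directories):
```bash
# Decompress, build, and install vmtouch (illustrative paths).
tar -xf scripts/tars/vmtouch-1.3.1.tar.xz -C ~
cd ~/vmtouch-1.3.1
sudo make install
```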
Install Pixels following the instructions HERE, but do not start Pixels before finishing the following configurations.
Check the following settings related to pixels-cache in PIXELS_HOME/etc/pixels.properties on each node:
```properties
# the base location of the cache zone files, the file name of each cache zone adds a postfix to the base location
cache.base.location=/mnt/ramfs/pixels.cache
# the user available size of the whole cache, including all zones, in bytes
cache.size=68719476736
# the location of the index files, the name of each index file adds a postfix to the base location
index.base.location=/mnt/ramfs/pixels.index
# the user available size of the whole index, including the global index and the zone indexes, in bytes
index.size=1073741824
# the number of zones in the cache
cache.zone.num=3
# the number of swap zones in the cache
cache.zone.swap.num=1
# the scheme of the storage system to be cached
cache.storage.scheme=hdfs
# set to true if cache.storage.scheme is a locality-sensitive storage such as hdfs
cache.absolute.balancer.enabled=true
# set to true to enable pixels-cache
cache.enabled=true
# set to true to read the cache without memory copy
cache.read.direct=true
# heartbeat lease ttl must be larger than heartbeat period
heartbeat.lease.ttl.seconds=20
# heartbeat period must be larger than 0
heartbeat.period.seconds=10
```
The above values are a good default for each node to cache up to 64GB of data of the table pixels.test_105 stored on HDFS.
Change cache.storage.scheme to cache the data stored in a different storage system.
On each worker node, create and mount an in-memory file system with 65GB capacity:
```bash
sudo mkdir -p /mnt/ramfs
sudo mount -t tmpfs -o size=65g tmpfs /mnt/ramfs
```
The size parameter of the mount command should be larger than or equal to the sum of cache.size and index.size in PIXELS_HOME/etc/pixels.properties, but must be smaller than the available physical memory size.
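With the default values above, the minimum is 68719476736 + 1073741824 = 69793218560 bytes, i.e., exactly 65GB. A quick sanity check could be scripted as follows (a sketch, assuming the default property file location):
```bash
# Compute the minimum tmpfs size from pixels.properties (default location assumed).
PROPS="$PIXELS_HOME/etc/pixels.properties"
CACHE_SIZE=$(grep '^cache.size=' "$PROPS" | cut -d= -f2)
INDEX_SIZE=$(grep '^index.size=' "$PROPS" | cut -d= -f2)
echo "minimum tmpfs bytes: $((CACHE_SIZE + INDEX_SIZE))"
```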
Enter PIXELS_HOME and start all Pixels daemons using:
```bash
./sbin/start-pixels.sh
```
If starting the daemons in a cluster of multiple nodes, set the hostnames of the worker nodes in PIXELS_HOME/sbin/workers and run start-pixels.sh on the coordinator node. Each line in PIXELS_HOME/sbin/workers is the hostname of a worker node. If a worker node has a different PIXELS_HOME environment variable than the coordinator node, append the PIXELS_HOME variable to the hostname, separated by a space, like this:
```
worker1 /home/pixels/worker1_pixels_home
worker2 /home/pixels/worker2_pixels_home
...
```
On each worker node, pin the cache in memory using:
```bash
sudo -E ./sbin/pin-cache.sh
```
Modify CACHE_PATH in pin-cache.sh if it is not consistent with the mount point of the in-memory file system storing the cache and index files.
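Under the hood, pin-cache.sh relies on vmtouch (installed earlier) to lock the cache files into physical memory. A rough sketch of the idea, assuming /mnt/ramfs is the CACHE_PATH (the actual script may differ):
```bash
# Lock all files under the in-memory file system into RAM:
# -t touches the pages, -l locks them with mlock, -d keeps a daemon running to hold the lock.
sudo vmtouch -t -l -d /mnt/ramfs/
```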
Then create a new data layout for the cached table, and update layout_version of the cached table in etcd to trigger
cache loading or replacement:
```bash
./sbin/load-cache.sh {schema_name}.{table_name}:{layout_version}
# e.g., ./sbin/load-cache.sh tpch.lineitem:1
```
schema_name and table_name specify which table to cache, whereas layout_version specifies which layout version of the table to cache.
Note that pixels-cache only caches data in the compact path of the layout, so ensure the table is compacted on the layout.
See examples of compacting tables HERE.
Currently, we only cache the full compact files with the same number of row groups as defined by
numRowGroupInFile in the LAYOUT_COMPACT field of the layout in metadata. The tail compact file
(if it exists), which has fewer row groups than numRowGroupInFile, will be ignored in cache loading or replacement.
If you have modified the etcd hostname and port in PIXELS_HOME/etc/pixels.properties, change the ENDPOINTS property
in load-cache.sh as well.
To stop Pixels, run:
```bash
./sbin/stop-pixels.sh
```
on the coordinator node to stop all Pixels daemons in the cluster.
The cache is not lost when Pixels is stopped, and it can be reused the next time Pixels is started.
To clear the cache and free the memory, run:
```bash
sudo -E ./sbin/unpin-cache.sh
```
on each worker node to release the memory pinned for the cache.
After that, you can delete the shared-memory files at cache.base.location and index.base.location on each worker node to finally release the memory occupied by the cache.
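With the default locations from the configuration above, the cleanup on a worker could look like this (paths are the assumed defaults; adjust them to your settings):
```bash
# Remove the cache zone and index files from the in-memory file system (default paths assumed).
sudo rm -f /mnt/ramfs/pixels.cache* /mnt/ramfs/pixels.index*
```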
You can also unmount the in-memory file system. This is optional; the in-memory file system will be automatically unmounted when the operating system is restarted.
Then, run:
```bash
./sbin/reset-cache.sh
```
on any node in the cluster to reset the states related to pixels-cache in etcd.
If you have modified the etcd hostname and port in PIXELS_HOME/etc/pixels.properties, change the ENDPOINTS property
in reset-cache.sh as well.
To enable cache scaling, set cache_expand_or_shrink to 1 or 2 in etcd to trigger cache expansion or shrink.
The field values are interpreted as follows:
- 0: no scaling operation (default state)
- 1: trigger cache expansion (scale out)
- 2: trigger cache shrink (scale in)
```bash
# Set cache_expand_or_shrink to 1 to trigger cache expansion
etcdctl --endpoints=$host:$port put cache_expand_or_shrink 1
```
Notes for Cache Scaling:
- The scaling operation does not modify the configuration file. Once the PixelsWorker restarts, the scaling effect is no longer active.
- Currently, shrink cannot be used together with expansion within the same PixelsWorker lifecycle.
- The cache automatically creates a new physical zone file on expansion, but the physical zone file dropped by a shrink needs to be removed manually afterwards. The dropped physical zone file is the one located before the swap zones (the swap zones are positioned before the last location).
- The added or dropped physical zone file must be pinned or unpinned manually (see the sketch below).
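A rough sketch of a shrink, assuming the default cache.base.location and an illustrative zone file name (the actual file name on your workers may differ):
```bash
# Trigger a cache shrink (illustrative etcd endpoint and file name).
etcdctl --endpoints=$host:$port put cache_expand_or_shrink 2
# After the shrink completes, unpin the dropped zone file and delete it manually on each worker:
sudo rm /mnt/ramfs/pixels.cache.zone_2   # hypothetical zone file name under cache.base.location
```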