metagenomic CRISPR analysis tool - MCAAT v1.0.0

Finds CRISPR arrays in raw, un-assembled metagenomic reads. Builds a succinct de Bruijn graph and detects multicycles - the structural signature of CRISPR repeat-spacer arrays - without any prior assembly step.
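
To illustrate why a repeat-spacer array shows up as several cycles sharing the same nodes, here is a toy sketch. It uses a plain dictionary-based de Bruijn graph and a naive depth-first search, not MCAAT's succinct graph or its actual cycle search. The sequences, `k`, and the helper names `de_bruijn_graph` and `cycles_through` are made up for the example.

```python
from collections import defaultdict

def de_bruijn_graph(seq, k):
    """Nodes are (k-1)-mers; each k-mer in seq adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return graph

def cycles_through(graph, start, max_len):
    """Enumerate simple cycles through `start` with at most max_len nodes."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == start:
                found.append(path)          # closed a cycle back to start
            elif nxt not in path and len(path) < max_len:
                stack.append(path + [nxt])  # keep the path simple and bounded
    return found

# Toy array (hypothetical): one 8 bp repeat separated by three 6 bp spacers.
repeat = "CGGTTCAT"
spacers = ["AAGAGG", "CCTCTA", "TGTGAC"]
array = repeat + "".join(s + repeat for s in spacers)

graph = de_bruijn_graph(array, k=6)
cycles = cycles_through(graph, start=repeat[:5], max_len=20)
print(len(cycles), sorted({len(c) for c in cycles}))  # prints: 3 [14]
```

Each spacer contributes one cycle that leaves the repeat, passes through the spacer, and re-enters the repeat, so the three cycles share the repeat nodes. In this toy case every cycle has 14 nodes, which equals repeat length plus spacer length.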

MCAAT is reported to outperform assembly-based workflows and other assembly-free CRISPR detectors on synthetic and real metagenomes (see Citation).

Requirements

CMake ≥ 3.12, C++17, zlib, OpenMP, BZip2
Docker (recommended for production use)

Build

git clone --recurse-submodules https://github.com/RNABioInfo/mcaat.git
cd mcaat

chmod +x ./install.sh
./install.sh

The mcaat binary will be at build/mcaat.

Optional flags:

./install.sh --install   # also installs to system
./install.sh --clean     # clean build artifacts

Docker

Note: a pre-built image is available on Docker Hub, so no manual dependency setup is required.

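# Pull the pre-built image ...
docker pull feeka94/mcaat:1.0.0

# ... or build the image locally (tagged "mcaat")
docker build -t mcaat .

# Run it; use feeka94/mcaat:1.0.0 as the image name if you pulled instead of building
docker run --rm -v "$(pwd)":/data mcaat \
  --input-files /data/reads_R1.fastq /data/reads_R2.fastq \
  --output-folder /data/results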

The image is based on debian:bookworm-slim and ships only the mcaat binary and runtime libs (libomp5, zlib1g).

Usage

Detailed usage is documented at rnabioinfo.github.io/mcaat

Exactly one input source is required — either raw reads or a pre-built graph:

# From reads (builds the graph internally)
mcaat --input-files <file1> [file2] [options]

# From a pre-built graph (skips graph construction)
mcaat --graph <path> [options]

Required (one of):

| Flag | Description |
| --- | --- |
| `--input-files <file1> [file2]` | One or two FASTA/FASTQ files, plain or gzipped. One file = single-end, two = paired-end |
| `--graph <path>` | Pre-built SDBG graph directory (or file prefix) from a previous run (skips graph construction) |

Optional:

| Flag | Default | Description |
| --- | --- | --- |
| `--output-folder <path>` | `mcaat_run_YYYY-MM-DD_HH-MM-SS/` | Output directory |
| `--ram <amount>` | 95% of system RAM | Memory cap. Units: `B`, `K`, `M`, `G` (e.g. `--ram 8G`) |
| `--threads <num>` | CPU cores − 2 | Thread count |
| `--cycle-max-length <int>` | `77` | Maximum cycle length to search |
| `--cycle-min-length <int>` | `27` | Minimum cycle length to search |
| `--threshold-multiplicity <int>` | `20` | Minimum edge multiplicity for cycle start nodes |
| `--low-abundance <true\|false>` | `true` | Enable low-abundance mode |
| `--autoclean <true\|false>` | `true` | Remove intermediate graph/cycle files after the run. Set to `false` to keep them |
| `--settings <path>` | none | Key=value settings file (CLI flags override it) |
| `--help`, `-h` | none | Show usage and exit |
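
For scripted runs, the rules above (exactly one input source, one or two read files, `true`/`false` booleans) can be checked before launching the tool. The following is a minimal sketch of a wrapper that only assembles the argument list. `build_mcaat_command` is a hypothetical helper, not part of MCAAT, and it does not verify that option names are real flags.

```python
from typing import List, Optional, Sequence

def build_mcaat_command(
    input_files: Optional[Sequence[str]] = None,
    graph: Optional[str] = None,
    **options: object,
) -> List[str]:
    """Return an argv list for mcaat; exactly one input source must be given."""
    if (input_files is None) == (graph is None):
        raise ValueError("pass exactly one of input_files or graph")
    if input_files is not None:
        if not 1 <= len(input_files) <= 2:
            raise ValueError("--input-files takes one or two paths")
        cmd = ["mcaat", "--input-files", *input_files]
    else:
        cmd = ["mcaat", "--graph", graph]
    # Keyword names map to flags: cycle_max_length -> --cycle-max-length.
    # Names are passed through unchecked (assumption: caller uses real flags).
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd

cmd = build_mcaat_command(["reads_R1.fastq", "reads_R2.fastq"],
                          output_folder="results", ram="8G", autoclean=False)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the `mcaat` binary is on the `PATH`.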

Output

<output-folder>/
├── CRISPR_Arrays_1.txt  # detected arrays (split into numbered files if large)
├── graph/               # succinct de Bruijn graph files
└── cycles/              # raw cycle data

Each CRISPR_Arrays_N.txt file has a short header followed by one block per array:

# MCAAT — CRISPR Array Output
# Generated : 2026-05-12 10:30:21
# Arrays    : 42
# Spacers   : 312

>Array_1  spacers=8
ATCGATCGATCGATCGATCGATCG
        --------------------    AACCCGGTTAATCGATCGTTTCGAGC
        --------------------    TTGGCCAATCGATCGATCAAAACGGG
        ATCGATCGATCGATCTATCG    GGAATTCCAATCGATCGAATACCCAC   ← repeat variant

The consensus repeat sequence is on its own line. Each spacer entry shows the repeat variant (or dashes when it matches the consensus exactly) followed by the spacer sequence.
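
For downstream analysis, the block layout described above can be read with a small parser. This is a sketch based only on the example shown here. It assumes that columns are separated by whitespace, that a repeat column made only of dashes means "same as consensus", and that anything after the spacer column can be ignored. `CrisprArray` and `parse_arrays` are illustrative names, not part of MCAAT.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CrisprArray:
    name: str
    consensus_repeat: str = ""
    # (repeat as observed, spacer) pairs in file order
    entries: List[Tuple[str, str]] = field(default_factory=list)

def parse_arrays(text: str) -> List[CrisprArray]:
    arrays: List[CrisprArray] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or header comment
        if line.startswith(">"):
            arrays.append(CrisprArray(name=line[1:].split()[0]))
        elif arrays and not arrays[-1].consensus_repeat:
            arrays[-1].consensus_repeat = line   # first line after ">..."
        elif arrays:
            tokens = line.split()
            if len(tokens) < 2:
                continue                  # not a repeat/spacer row
            repeat, spacer = tokens[:2]
            if set(repeat) == {"-"}:      # dashes: identical to consensus
                repeat = arrays[-1].consensus_repeat
            arrays[-1].entries.append((repeat, spacer))
    return arrays

sample = """# Arrays    : 1

>Array_1  spacers=2
ATCGATCGATCGATCGATCGATCG
        --------------------    AACCCGGTTAATCGATCGTTTCGAGC
        ATCGATCGATCGATCTATCG    GGAATTCCAATCGATCGAATACCCAC
"""
for arr in parse_arrays(sample):
    print(arr.name, len(arr.entries), [spacer for _, spacer in arr.entries])
```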

Settings file

Pass a key=value file with --settings. CLI flags override any value from the file.

input-files=/data/R1.fastq /data/R2.fastq
ram=128G
threads=26
output-folder=results/run_1
cycle-max-length=77
cycle-min-length=27
threshold-multiplicity=20
low-abundance=true
autoclean=true

input-files accepts one or two paths separated by spaces, commas, or semicolons.
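
The precedence and separator rules can be sketched as follows. This only illustrates the behaviour described above and is not MCAAT's own parser. `parse_settings` and `merge_settings` are hypothetical names, and skipping lines without an `=` is an assumption.

```python
import re
from typing import Dict, Optional

def parse_settings(text: str) -> Dict[str, object]:
    """Parse key=value lines; input-files becomes a list of paths."""
    settings: Dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # assumption: ignore blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "input-files":
            # Paths may be separated by spaces, commas, or semicolons.
            settings[key] = [p for p in re.split(r"[ ,;]+", value) if p]
        else:
            settings[key] = value
    return settings

def merge_settings(file_settings: Dict[str, object],
                   cli: Dict[str, Optional[object]]) -> Dict[str, object]:
    """CLI values win; None means the flag was not given on the command line."""
    merged = dict(file_settings)
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged

text = "input-files=/data/R1.fastq;/data/R2.fastq\nram=128G\nthreads=26\n"
effective = merge_settings(parse_settings(text), {"ram": "8G", "threads": None})
print(effective)
```

Here `ram` comes from the command line (`8G`), while `threads` keeps the value from the file because no CLI value was supplied. A consequence of the separator rule is that paths containing spaces, commas, or semicolons cannot be expressed in this key.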

v2.0.0 (planned)

CAS detection: identify and annotate CAS genes flanking detected CRISPR arrays
Protospacer detection: map spacers back to reads/contigs to find protospacer sequences and PAM sites

Citation

If you use MCAAT, please cite: https://academic.oup.com/microlife/article/doi/10.1093/femsml/uqaf016/8205558

Contact: please open an issue on our GitHub page if you run into any problems.
