Commit e9dc628

ewels and claude authored
feat: expose RustQC as a library crate (#101)
* feat: expose RustQC as a library crate

  Adds a [lib] target alongside the existing [[bin]] so RustQC's analysis modules can be consumed as a Rust library, not just via the CLI.

  - New src/lib.rs publishes config, cpu, gtf, io, rna, summary as the public API and hosts the Strandedness enum at the crate root.
  - main.rs is slimmed to bin-only modules (cli, ui, citations) and pulls the rest from rustqc::*.
  - Strandedness moves out of cli.rs (which now imports it from the lib) so the 6 analysis modules using it no longer depend on the CLI module.
  - format_count, format_pct, format_duration move from ui.rs to io.rs so library consumers can reach them; ui.rs uses them privately.

  Closes #72

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: fix pre-existing clippy warnings

  Clean up the 9 lints that were already failing on main, so the pre-commit clippy hook passes again.

  - counting.rs: replace manual zero-check + division with checked_div
  - index.rs: sort_unstable_by_key with std::cmp::Reverse instead of by-cmp
  - accumulators.rs: rewrite GC-bin compute with checked_div on read_len
  - read_distribution.rs: collapse 4 nested if-blocks into match guards
  - config.rs (tests): drop unnecessary borrow and avoid PathBuf-for-cmp

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: document the new Rust library API

  - src/lib.rs: expand crate-level rustdoc with install snippet, module index, stability note, and two compiling examples (GTF parse and Strandedness usage). Both examples are exercised by `cargo test --doc`.
  - docs/usage/library.mdx: new Astro page covering installation, the module surface, quick examples, and the current stability caveats (no pipeline-level entry point yet).
  - docs/astro.config.mjs: add the new page to the Usage sidebar.
  - README.md: short "Use as a Rust library" section pointing to the guide and to docs.rs/rustqc.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6c66851 commit e9dc628

19 files changed

Lines changed: 372 additions & 190 deletions
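
The clippy clean-ups named in the commit message (counting.rs, index.rs, accumulators.rs) are not shown in the hunks on this page. As a rough illustration of the two std idioms they mention, here is a minimal sketch; `mean_or_zero` and `sort_by_count_desc` are hypothetical helpers invented for this example and are not taken from the RustQC sources.

```rust
use std::cmp::Reverse;

/// Hypothetical helper: integer mean that yields 0 when `n` is 0.
fn mean_or_zero(total: u64, n: u64) -> u64 {
    // `checked_div` returns `None` on division by zero, which replaces a
    // manual `if n == 0 { 0 } else { total / n }` check.
    total.checked_div(n).unwrap_or(0)
}

/// Hypothetical helper: sort (name, count) pairs by descending count.
fn sort_by_count_desc(items: &mut [(String, u64)]) {
    // Wrapping the key in `Reverse` flips the ordering, which replaces a
    // hand-written `sort_by(|a, b| b.1.cmp(&a.1))` comparator.
    items.sort_unstable_by_key(|(_, count)| Reverse(*count));
}

fn main() {
    println!("{}", mean_or_zero(10, 4));
    println!("{}", mean_or_zero(10, 0));

    let mut items = vec![
        ("a".to_string(), 3),
        ("b".to_string(), 7),
        ("c".to_string(), 1),
    ];
    sort_by_count_desc(&mut items);
    let names: Vec<&str> = items.iter().map(|(name, _)| name.as_str()).collect();
    println!("{names:?}");
}
```

Both forms keep the zero-divisor and ordering logic in one expression, which is what the corresponding clippy lints tend to suggest.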

Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ categories = ["command-line-utilities", "science"]
 homepage = "https://seqeralabs.github.io/RustQC/"
 exclude = ["benchmark/", "docs/", "paper/", "tests/", ".github/", "Dockerfile", ".dockerignore", ".pre-commit-config.yaml", "netlify.toml", "CONTRIBUTING.md", "AGENTS.md"]
 
+[lib]
+name = "rustqc"
+path = "src/lib.rs"
+
 [[bin]]
 name = "rustqc"
 path = "src/main.rs"

README.md

Lines changed: 11 additions & 0 deletions
@@ -82,6 +82,17 @@ cargo install rustqc
 
 See the [documentation](https://seqeralabs.github.io/RustQC/) for full usage details, configuration options, output file descriptions, and benchmark results.
 
+## Use as a Rust library
+
+The crate is also published as a library, so the QC analysis modules (GTF parsing, dupRadar, featureCounts, RSeQC, Qualimap, preseq, samtools-style outputs) can be embedded into other Rust programs:
+
+```toml
+[dependencies]
+rustqc = "0.2"
+```
+
+See the [library guide](https://seqeralabs.github.io/RustQC/usage/library/) and the full API reference on [docs.rs/rustqc](https://docs.rs/rustqc).
+
 ## AI & Provenance
 
 RustQC was developed with substantial assistance from AI coding agents (primarily [Claude](https://claude.ai/)), using the upstream tool source code as reference. Correctness is validated by comparing output against the original tools on real sequencing data, not by manual code review alone. See the [AI & Provenance](https://seqeralabs.github.io/RustQC/about/ai-statement/) documentation for full details, including known validation gaps.

docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ export default defineConfig({
       slug: "usage/configuration",
     },
     { label: "Performance & Tuning", slug: "usage/performance" },
+    { label: "Rust Library", slug: "usage/library" },
   ],
 },
 {
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+---
+title: Rust Library
+description: Use RustQC as a Rust library crate, embedding its QC analysis modules in your own programs.
+---
+
+import { Aside } from "@astrojs/starlight/components";
+
+RustQC is published on [crates.io](https://crates.io/crates/rustqc) as both a
+binary and a library. The CLI (`rustqc rna ...`) is the primary interface, but
+the same analysis modules are also exposed as a library so they can be embedded
+into other Rust programs.
+
+Full API reference: **[docs.rs/rustqc](https://docs.rs/rustqc)**.
+
+## Adding RustQC as a dependency
+
+```toml
+[dependencies]
+rustqc = "0.2.1" # Or whatever the latest release is
+```
+
+`rust-htslib` is linked statically and a small C++ component (used by the preseq
+tool) is built from source, so a working C/C++ toolchain (`cc`, `c++`) is
+required when building. No runtime dependencies are added beyond what the binary
+already needs.
+
+## What's in the library
+
+The crate exposes these modules:
+
+| Module | Contents |
+| ----------------------------- | --------------------------------------------------------------------------------------------------------- |
+| [`gtf`][docs-gtf] | GTF gene-annotation parsing. `Gene`, `Transcript`, `Exon`, `parse_gtf`. |
+| [`io`][docs-io] | Transparent gzip-aware reader, FNV-1a hashing, number formatters. |
+| [`config`][docs-config] | Configuration types mirroring the CLI's YAML config file. |
+| [`summary`][docs-summary] | Serializable types for the JSON run summary. |
+| [`cpu`][docs-cpu] | CPU feature detection and binary-target identification. |
+| [`rna`][docs-rna] | RNA-Seq analyses: `dupradar`, `featurecounts`, `qualimap`, `preseq`, `rseqc`. |
+
+[`Strandedness`][docs-strandedness] lives at the crate root because it is used
+across most analysis modules.
+
+[docs-gtf]: https://docs.rs/rustqc/latest/rustqc/gtf/
+[docs-io]: https://docs.rs/rustqc/latest/rustqc/io/
+[docs-config]: https://docs.rs/rustqc/latest/rustqc/config/
+[docs-summary]: https://docs.rs/rustqc/latest/rustqc/summary/
+[docs-cpu]: https://docs.rs/rustqc/latest/rustqc/cpu/
+[docs-rna]: https://docs.rs/rustqc/latest/rustqc/rna/
+[docs-strandedness]: https://docs.rs/rustqc/latest/rustqc/enum.Strandedness.html
+
+## Quick examples
+
+Parse a GTF file:
+
+```rust
+use rustqc::gtf;
+
+let genes = gtf::parse_gtf("genes.gtf", &[])?;
+println!("{} genes parsed", genes.len());
+for (gene_id, gene) in genes.iter().take(3) {
+    println!("{gene_id}: {} transcripts", gene.transcripts.len());
+}
+# Ok::<(), anyhow::Error>(())
+```
+
+Open a possibly-gzipped annotation or output file with one call:
+
+```rust
+use std::io::BufRead;
+use rustqc::io::open_reader;
+
+let reader = open_reader("counts.tsv.gz")?;
+for line in reader.lines() {
+    println!("{}", line?);
+}
+# Ok::<(), anyhow::Error>(())
+```
+
+Use the `Strandedness` enum (it derives `serde::Deserialize` and clap's
+`ValueEnum`, so it integrates with both YAML configs and CLI parsers):
+
+```rust
+use rustqc::Strandedness;
+
+let s = Strandedness::Reverse;
+assert_eq!(s.to_string(), "reverse");
+```
+
+## Stability
+
+The library is at `0.2.x` and the public surface is intentionally small. Expect
+breaking changes in minor releases until `1.0`. Module visibility may be
+narrowed in future versions if internal types are inadvertently exposed.
+
+<Aside type="note">
+The full single-pass RNA-Seq pipeline (the `run_rna` orchestrator that the
+binary uses) is not yet exposed as a library entry point. For now, library
+consumers drive individual analyses themselves. Pipeline-level orchestration
+may be exposed in a future release — track [issue #72][issue-72].
+</Aside>
+
+[issue-72]: https://github.com/seqeralabs/RustQC/issues/72

src/cli.rs

Lines changed: 3 additions & 28 deletions
@@ -10,34 +10,9 @@
 //!
 //! A GTF gene annotation file is required for all analyses.
 
-use clap::{CommandFactory, Parser, Subcommand, ValueEnum};
-use serde::Deserialize;
+use clap::{CommandFactory, Parser, Subcommand};
 
-/// Library strandedness protocol.
-///
-/// Determines how read strand is interpreted relative to the gene annotation
-/// strand during counting. Accepted CLI values: `unstranded`, `forward`, `reverse`.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, ValueEnum, Deserialize)]
-#[serde(rename_all = "lowercase")]
-pub enum Strandedness {
-    /// Count reads on either strand (library is not strand-specific).
-    #[default]
-    Unstranded,
-    /// Forward stranded: read 1 maps to the transcript strand.
-    Forward,
-    /// Reverse stranded: read 2 maps to the transcript strand (e.g. dUTP).
-    Reverse,
-}
-
-impl std::fmt::Display for Strandedness {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        match self {
-            Strandedness::Unstranded => write!(f, "unstranded"),
-            Strandedness::Forward => write!(f, "forward"),
-            Strandedness::Reverse => write!(f, "reverse"),
-        }
-    }
-}
+use rustqc::Strandedness;
 
 /// Fast quality control tools for sequencing data, written in Rust.
 #[derive(Parser, Debug)]
@@ -407,7 +382,7 @@ pub fn parse_args() -> Cli {
             env!("CARGO_PKG_VERSION"),
             env!("GIT_SHORT_HASH"),
             env!("BUILD_TIMESTAMP"),
-            crate::cpu::cpu_info_line(),
+            rustqc::cpu::cpu_info_line(),
         )
         .into_boxed_str(),
     );
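
The doc comment on `Strandedness` says the variant controls how read strand is interpreted relative to the annotation strand during counting. The sketch below illustrates that idea for the simplest case only (single-end reads, or read 1 of a pair). It uses a local copy of the enum without the clap/serde derives so that it compiles on its own, and `strand_compatible` is a hypothetical helper, not a function from RustQC.

```rust
/// Local mirror of the enum for a standalone sketch; real code would
/// `use rustqc::Strandedness;` instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Strandedness {
    #[default]
    Unstranded,
    Forward,
    Reverse,
}

/// Hypothetical helper (not RustQC API): should a read aligned to the
/// given strand be counted for a gene annotated on the given strand?
/// Assumes a single-end read, or read 1 of a pair.
fn strand_compatible(s: Strandedness, read_is_reverse: bool, gene_is_reverse: bool) -> bool {
    match s {
        // Not strand-specific: any overlap counts.
        Strandedness::Unstranded => true,
        // Read 1 lies on the transcript strand.
        Strandedness::Forward => read_is_reverse == gene_is_reverse,
        // Read 1 lies opposite the transcript strand (e.g. dUTP).
        Strandedness::Reverse => read_is_reverse != gene_is_reverse,
    }
}

fn main() {
    // A forward-strand read overlapping a reverse-strand gene.
    for s in [
        Strandedness::Unstranded,
        Strandedness::Forward,
        Strandedness::Reverse,
    ] {
        println!("{s:?}: {}", strand_compatible(s, false, true));
    }
}
```

Read 2 of a pair would use the opposite test; the actual counting code in the `rna` modules may structure this differently.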

src/config.rs

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 //! like chromosome name mappings between alignment file and GTF references,
 //! per-tool output configuration, and tool enable/disable toggles.
 
-use crate::cli::Strandedness;
+use crate::Strandedness;
 use anyhow::{Context, Result};
 use serde::Deserialize;
 use serde_yaml_ng::Value;
@@ -1213,7 +1213,7 @@ preseq:
         deep_merge(&mut base, overlay);
         let m = base.as_mapping().unwrap();
         let items = m
-            .get(&Value::String("items".into()))
+            .get(Value::String("items".into()))
             .unwrap()
             .as_sequence()
             .unwrap();
@@ -1268,7 +1268,7 @@ preseq:
 
         let paths = collect_config_paths(Some("/tmp/nonexistent.yml"));
         // The -c flag should always be last
-        assert!(paths.last().unwrap().0 == PathBuf::from("/tmp/nonexistent.yml"));
+        assert!(paths.last().unwrap().0 == Path::new("/tmp/nonexistent.yml"));
         assert_eq!(paths.last().unwrap().1, "-c flag");
 
         // Restore

src/io.rs

Lines changed: 105 additions & 0 deletions
@@ -9,6 +9,7 @@ use flate2::read::GzDecoder;
 use std::fs::File;
 use std::io::{BufRead, BufReader, Read, Seek};
 use std::path::Path;
+use std::time::Duration;
 
 /// Gzip magic bytes: the first two bytes of any gzip-compressed file.
 const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
@@ -101,6 +102,56 @@ pub fn format_with_commas(n: u64) -> String {
     result
 }
 
+/// Format a count with SI suffixes (e.g. "1.5K", "48.2M", "2.3G").
+///
+/// Used for compact human-readable counts in progress messages and summaries.
+pub fn format_count(n: u64) -> String {
+    use number_prefix::NumberPrefix;
+    match NumberPrefix::decimal(n as f64) {
+        NumberPrefix::Standalone(n) => format!("{n}"),
+        NumberPrefix::Prefixed(prefix, n) => {
+            // Map SI prefixes to short single-char suffixes
+            let suffix = match prefix {
+                number_prefix::Prefix::Kilo => "K",
+                number_prefix::Prefix::Mega => "M",
+                number_prefix::Prefix::Giga => "G",
+                number_prefix::Prefix::Tera => "T",
+                _ => return format!("{:.1}{prefix:?}", n),
+            };
+            format!("{n:.1}{suffix}")
+        }
+    }
+}
+
+/// Format a percentage string (e.g. "(83.3%)").
+pub fn format_pct(n: u64, total: u64) -> String {
+    if total == 0 {
+        return "(0.0%)".to_string();
+    }
+    format!("({:.1}%)", n as f64 / total as f64 * 100.0)
+}
+
+/// Format a duration as human-friendly mm:ss or h:mm:ss.
+///
+/// - Under 60s: `"45.2s"`
+/// - Under 1h: `"1:23"`
+/// - Over 1h: `"1:02:34"`
+pub fn format_duration(d: Duration) -> String {
+    let total_secs = d.as_secs_f64();
+    if total_secs < 60.0 {
+        return format!("{total_secs:.1}s");
+    }
+    let total_secs = d.as_secs();
+    let hours = total_secs / 3600;
+    let minutes = (total_secs % 3600) / 60;
+    let seconds = total_secs % 60;
+    if hours > 0 {
+        format!("{hours}:{minutes:02}:{seconds:02}")
+    } else {
+        format!("{minutes}:{seconds:02}")
+    }
+}
+
 // ============================================================
 // Numeric helpers
 // ============================================================
@@ -181,6 +232,60 @@ mod tests {
         assert_eq!(format_with_commas(1234567), "1,234,567");
     }
 
+    #[test]
+    fn test_format_count_small() {
+        assert_eq!(format_count(0), "0");
+        assert_eq!(format_count(42), "42");
+        assert_eq!(format_count(999), "999");
+    }
+
+    #[test]
+    fn test_format_count_thousands() {
+        assert_eq!(format_count(1000), "1.0K");
+        assert_eq!(format_count(1500), "1.5K");
+        assert_eq!(format_count(50000), "50.0K");
+    }
+
+    #[test]
+    fn test_format_count_millions() {
+        assert_eq!(format_count(1_000_000), "1.0M");
+        assert_eq!(format_count(48_200_000), "48.2M");
+        assert_eq!(format_count(50_000_000), "50.0M");
+    }
+
+    #[test]
+    fn test_format_count_billions() {
+        assert_eq!(format_count(1_000_000_000), "1.0G");
+        assert_eq!(format_count(5_000_000_000), "5.0G");
+    }
+
+    #[test]
+    fn test_format_pct() {
+        assert_eq!(format_pct(833, 1000), "(83.3%)");
+        assert_eq!(format_pct(0, 0), "(0.0%)");
+        assert_eq!(format_pct(1000, 1000), "(100.0%)");
+    }
+
+    #[test]
+    fn test_format_duration_seconds() {
+        assert_eq!(format_duration(Duration::from_secs_f64(0.5)), "0.5s");
+        assert_eq!(format_duration(Duration::from_secs_f64(45.2)), "45.2s");
+        assert_eq!(format_duration(Duration::from_secs_f64(59.9)), "59.9s");
+    }
+
+    #[test]
+    fn test_format_duration_minutes() {
+        assert_eq!(format_duration(Duration::from_secs(60)), "1:00");
+        assert_eq!(format_duration(Duration::from_secs(83)), "1:23");
+        assert_eq!(format_duration(Duration::from_secs(3599)), "59:59");
+    }
+
+    #[test]
+    fn test_format_duration_hours() {
+        assert_eq!(format_duration(Duration::from_secs(3600)), "1:00:00");
+        assert_eq!(format_duration(Duration::from_secs(3754)), "1:02:34");
+    }
+
     #[test]
     fn test_open_reader_plain() {
         let content = "line1\nline2\nline3\n";
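
With `format_pct` and `format_duration` now in `io.rs`, a library consumer could combine them roughly as sketched below. To keep the sketch standalone, the two std-only helpers are copied from the io.rs hunk instead of being imported from the crate, and `progress_line` is a hypothetical function invented for this example.

```rust
use std::time::Duration;

// Copied from the io.rs hunk so this compiles without the crate; real code
// would `use rustqc::io::{format_duration, format_pct};` instead.
fn format_pct(n: u64, total: u64) -> String {
    if total == 0 {
        return "(0.0%)".to_string();
    }
    format!("({:.1}%)", n as f64 / total as f64 * 100.0)
}

fn format_duration(d: Duration) -> String {
    let total_secs = d.as_secs_f64();
    if total_secs < 60.0 {
        return format!("{total_secs:.1}s");
    }
    let total_secs = d.as_secs();
    let hours = total_secs / 3600;
    let minutes = (total_secs % 3600) / 60;
    let seconds = total_secs % 60;
    if hours > 0 {
        format!("{hours}:{minutes:02}:{seconds:02}")
    } else {
        format!("{minutes}:{seconds:02}")
    }
}

/// Hypothetical helper (not part of RustQC): one-line progress summary.
fn progress_line(mapped: u64, total: u64, elapsed: Duration) -> String {
    format!(
        "{mapped}/{total} reads mapped {} in {}",
        format_pct(mapped, total),
        format_duration(elapsed)
    )
}

fn main() {
    // prints "833/1000 reads mapped (83.3%) in 1:23"
    println!("{}", progress_line(833, 1000, Duration::from_secs(83)));
}
```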
