DavZim · DavZim · Apr 10, 2026 · Mar 27, 2026 · Mar 27, 2026 · Mar 27, 2026
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -10,3 +10,7 @@
 ^\.github$
 ^CRAN-SUBMISSION$
 ^cran-comments\.md$
+^\.lintr$
+^yellow_tripdata_2018-01\.parquet$
+^AGENTS\.md$
+^duckdb_.*\.tar\.gz$
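
Reviewer note: `R CMD build` reads each `.Rbuildignore` line as a Perl-like regular expression, matched case-insensitively, not as a shell glob. An unescaped `.` therefore matches any character, and `_*` means "zero or more underscores". The base-R sketch below compares glob-style and escaped spellings of two of the patterns. The file names are hypothetical probes, and the `ignored()` helper only approximates the build tool's matching.

```r
# Hypothetical file names used only to probe the patterns.
files <- c("duckdb_1.5.1.tar.gz", "duckdb.tar.gz", "AGENTS.md", "AGENTSxmd")

# Approximates how an ignore pattern is applied:
# Perl-like regex, case-insensitive, against paths relative to the package root.
ignored <- function(pattern) {
  files[grepl(pattern, files, perl = TRUE, ignore.case = TRUE)]
}

glob_style    <- ignored("^duckdb_*.tar.gz$")       # written like a shell glob
escaped       <- ignored("^duckdb_.*\\.tar\\.gz$")  # explicit regex
agents_loose  <- ignored("^AGENTS.md$")             # unescaped dot
agents_strict <- ignored("^AGENTS\\.md$")           # literal dot

print(glob_style)     # "duckdb.tar.gz"
print(escaped)        # "duckdb_1.5.1.tar.gz"
print(agents_loose)   # "AGENTS.md" "AGENTSxmd"
print(agents_strict)  # "AGENTS.md"
```

The glob-style spelling misses a versioned tarball name such as `duckdb_1.5.1.tar.gz`, while the escaped form matches it and nothing else in the sample.
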
diff --git a/.gitignore b/.gitignore
@@ -2,6 +2,7 @@
 .Rhistory
 .RData
 .Ruserdata
+.Renviron
 
 *.parquet
 nyc-taxi-data/*

diff --git a/.lintr b/.lintr
@@ -0,0 +1,6 @@
+linters: linters_with_defaults(
+    line_length_linter(100),
+    object_usage_linter = NULL
+  )
+exclusions: list(
+  )
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,19 @@
+# Dataverifyr
+
+An R library to verify that data exists and is valid; see README.md for comprehensive documentation.
+
+All code must be thoroughly tested and follow modern R standards.
+Test first: before writing any code, write the expected tests!
+
+All material updates to the code base itself must be registered in the NEWS.md file.
+
+Never edit .Rd files directly; instead, change the roxygen2 documentation above the function and run `devtools::document()` to regenerate the .Rd files.
+
+The package is built with efficiency in mind.
+For example, most operations are pushed down to the database layer where possible.
+Also, additional dependencies need to be explicitly approved before they are added!
+
+When invoking R, use `/opt/R/4.4.0/bin/R` or `/opt/R/4.4.0/bin/Rscript`.
+
+To check, use `rcmdcheck::rcmdcheck()`.
+
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: dataverifyr
 Type: Package
 Title: A Lightweight, Flexible, and Fast Data Validation Package that Can Handle All Sizes of Data
-Version: 0.1.9
+Version: 0.1.10
 Authors@R: c(
       person(given = "David",
              family = "Zimmermann-Kollenda",
@@ -35,4 +35,4 @@ Config/testthat/edition: 3
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.3
diff --git a/NAMESPACE b/NAMESPACE
@@ -5,13 +5,16 @@ S3method("+",ruleset)
 S3method(print,rule)
 S3method(print,ruleset)
 export(check_data)
+export(data_column)
 export(describe)
 export(detect_backend)
 export(filter_fails)
 export(plot_res)
 export(read_rules)
+export(reference_rule)
 export(rule)
 export(ruleset)
+export(sample_data)
 export(write_rules)
 importFrom(graphics,axis)
 importFrom(graphics,barplot)

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,16 @@
+# dataverifyr 0.1.10
+
+* Add `describe()` to describe a dataset
+* `check_data()` now includes schema checks in the output by default (`check_type` as first result column), including explicit rows for column existence and declared type checks
+* Add `stop_on_schema_fail` to `check_data()` to optionally stop when schema checks fail
+* Update `filter_fails()` to ignore schema/reference rows and only process row rules from `check_data()` results
+* Add explicit regression test for `detect_backend()` fallback to `dplyr` when input is a `data.frame` and `data.table` is unavailable
+* Add structured ruleset internals for schema metadata (`data_column()`, `rule_meta()`) and reference checks (`reference_rule()`)
+* Extend `ruleset()`, `check_data()`, `read_rules()`, and `write_rules()` for v1 schema-aware workflows; keep `rule()` as row-level API (no `col_rule()`)
+* Add exported `sample_data` dataset (mixed types, NAs, datetime) for examples and tests
+* Export `reference_rule()` and extend examples in `ruleset()`, `check_data()`, `reference_rule()`, and `data_column()` to show combined schema + relational workflows
+* Require DuckDB version `>= 1.5.1.9002` in all DuckDB-backed tests via `skip_if_not_installed("duckdb", "1.5.1.9002")`
+
 # dataverifyr 0.1.9
 
 * fix tests for new duckdb version (fixes #17, thanks @krlmlr for reporting)
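
Reviewer note: the DuckDB version gate named in the 0.1.10 notes relies on the `minimum_version` argument of `testthat::skip_if_not_installed()`. A minimal sketch of such a guarded test follows. The test name and the body after the guard are illustrative assumptions, not code from the package's test suite.

```r
library(testthat)

test_that("DuckDB-backed checks only run on a recent DuckDB", {
  # Skips (rather than fails) when duckdb is missing or older than the minimum.
  skip_if_not_installed("duckdb", "1.5.1.9002")

  # Placeholder body: reached only when the version requirement is met.
  expect_true(utils::packageVersion("duckdb") >= "1.5.1.9002")
})

# Version strings compare component-wise, so a plain 1.5.1 release does not
# satisfy the development-version minimum, while 1.5.2 does.
dev_minimum     <- package_version("1.5.1.9002")
release_ok      <- package_version("1.5.1") >= dev_minimum
next_release_ok <- package_version("1.5.2") >= dev_minimum
```

Because the minimum is a development version, machines with only the 1.5.1 release installed would skip these tests rather than run them.
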