Merged

V1.0 #18

4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -10,3 +10,7 @@
^\.github$
^CRAN-SUBMISSION$
^cran-comments\.md$
^\.lintr$
^yellow_tripdata_2018-01\.parquet$
^AGENTS\.md$
^duckdb_.*\.tar\.gz$
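
`R CMD build` reads each `.Rbuildignore` line as a Perl-style regular expression and matches it case-insensitively against paths relative to the package root, so an unescaped `.` or a glob-style `*` can behave differently than intended. The following is a minimal sketch of that matching in base R; the file names and the `is_ignored()` helper are hypothetical and only serve the illustration.

```r
# Hypothetical file names, used only to illustrate the matching rules.
paths <- c(".lintr", "xlintr", "AGENTS.md", "duckdb_1.0.0.tar.gz")

# Hypothetical helper mimicking how an ignore pattern is applied:
# Perl-style regex, case-insensitive.
is_ignored <- function(pattern, paths) {
  grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)
}

# An unescaped dot matches any character, so "xlintr" is caught as well.
loose_dot <- is_ignored("^.lintr$", paths)
# An escaped dot matches only a literal ".".
strict_dot <- is_ignored("^\\.lintr$", paths)

# In a regex, "_*" means "zero or more underscores", not "anything after _",
# so a glob-style pattern does not match a versioned tarball name.
glob_style <- is_ignored("^duckdb_*.tar.gz$", paths)
# ".*" is the regex spelling of "anything".
regex_style <- is_ignored("^duckdb_.*\\.tar\\.gz$", paths)

print(data.frame(paths, loose_dot, strict_dot, glob_style, regex_style))
```

Comparing the four columns shows which spelling of a pattern actually excludes a given file from the build.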
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
.Rhistory
.RData
.Ruserdata
.Renviron

*.parquet
nyc-taxi-data/*
6 changes: 6 additions & 0 deletions .lintr
@@ -0,0 +1,6 @@
linters: linters_with_defaults(
    line_length_linter(100),
    object_usage_linter = NULL
  )
exclusions: list(
  )
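
The linter settings in this config can also be tried out from an R session. The following is a minimal sketch, assuming lintr 3.0 or later is installed; the linted line is a made-up sample, and the linter is named explicitly here rather than relying on automatic naming.

```r
# Runs only when lintr is available; the linted text is a made-up sample.
if (requireNamespace("lintr", quietly = TRUE)) {
  # Same intent as the config: default linters, a 100-character line limit,
  # and object_usage_linter switched off.
  linters <- lintr::linters_with_defaults(
    line_length_linter = lintr::line_length_linter(100),
    object_usage_linter = NULL
  )

  # A single line that is clearly longer than 100 characters.
  long_line <- paste0('x <- "', strrep("a", 120), '"')

  # parse_settings = FALSE keeps the result independent of any local .lintr file.
  lints <- lintr::lint(text = long_line, linters = linters, parse_settings = FALSE)
  flagged <- as.data.frame(lints)$linter
  print(flagged)
}
```

For a whole package, `lintr::lint_package()` picks up the `.lintr` file on its own, so the explicit `linters` object is only needed for experiments like this one.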
19 changes: 19 additions & 0 deletions AGENTS.md
@@ -0,0 +1,19 @@
# Dataverifyr

An R library to verify that data exists and is valid; see the comprehensive README.md.

All code must be thoroughly tested and follow modern R standards.
Test first: before writing any code, write the expected tests!

All material updates to the code base itself must be registered in the NEWS.md file.

Never change .Rd files; instead, change the roxygen2 documentation above the function (use `devtools::document()` to create the .Rd documentation).

The package is built with efficiency in mind.
For example, most operations are pushed to the database layer where possible.
Also, additional dependencies need to be explicitly allowed before they are added!

When invoking R, use `/opt/R/4.4.0/bin/R` or `/opt/R/4.4.0/bin/Rscript`.

To check, use `rcmdcheck::rcmdcheck()`.
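
The test-first and roxygen2 rules above can be sketched together in a few lines of base R. `count_missing()` is a hypothetical helper invented for this illustration and is not part of dataverifyr; in the package itself the check would live in a testthat file rather than a `stopifnot()` call.

```r
# Step 1 (test first): write down the expected result before the function exists.
input <- data.frame(a = c(1, NA), b = c("x", "y"))
expected <- c(a = 1L, b = 0L)

# Step 2: implement the function, documenting it with roxygen2 comments placed
# directly above the definition (the .Rd file is generated, never hand-edited).

#' Count missing values per column
#'
#' @param x A data.frame.
#' @return A named integer vector with one count per column.
#' @export
count_missing <- function(x) {
  vapply(x, function(col) sum(is.na(col)), integer(1))
}

# Step 3: run the check written in step 1.
result <- count_missing(input)
stopifnot(identical(result, expected))
```

After a change like this, `devtools::document()` regenerates the .Rd file from the comments.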

4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: dataverifyr
Type: Package
Title: A Lightweight, Flexible, and Fast Data Validation Package that Can Handle All Sizes of Data
-Version: 0.1.9
+Version: 0.1.10
Authors@R: c(
person(given = "David",
family = "Zimmermann-Kollenda",
@@ -35,4 +35,4 @@ Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.3
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -5,13 +5,16 @@ S3method("+",ruleset)
S3method(print,rule)
S3method(print,ruleset)
export(check_data)
export(data_column)
export(describe)
export(detect_backend)
export(filter_fails)
export(plot_res)
export(read_rules)
export(reference_rule)
export(rule)
export(ruleset)
export(sample_data)
export(write_rules)
importFrom(graphics,axis)
importFrom(graphics,barplot)
13 changes: 13 additions & 0 deletions NEWS.md
@@ -1,3 +1,16 @@
# dataverifyr 0.1.10

* Add `describe()` to describe a dataset
* `check_data()` now includes schema checks in the output by default (`check_type` as first result column), including explicit rows for column existence and declared type checks
* Add `stop_on_schema_fail` to `check_data()` to optionally stop when schema checks fail
* Update `filter_fails()` to ignore schema/reference rows and only process row rules from `check_data()` results
* Add explicit regression test for `detect_backend()` fallback to `dplyr` when input is a `data.frame` and `data.table` is unavailable
* Add structured ruleset internals for schema metadata (`data_column()`, `rule_meta()`) and reference checks (`reference_rule()`)
* Extend `ruleset()`, `check_data()`, `read_rules()`, and `write_rules()` for v1 schema-aware workflows; keep `rule()` as row-level API (no `col_rule()`)
* Add exported `sample_data` dataset (mixed types, NAs, datetime) for examples and tests
* Export `reference_rule()` and extend examples in `ruleset()`, `check_data()`, `reference_rule()`, and `data_column()` to show combined schema + relational workflows
* Require DuckDB version `>= 1.5.1.9002` in all DuckDB-backed tests via `skip_if_not_installed("duckdb", "1.5.1.9002")`

# dataverifyr 0.1.9

* fix tests for new duckdb version (fixes #17, thanks @krlmlr for reporting)
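
The DuckDB version requirement listed under 0.1.10 relies on the `minimum_version` argument of `testthat::skip_if_not_installed()`. The following is a minimal sketch of what that guard means: the `meets_requirement()` helper, the table name, and the test body are assumptions made for illustration and are not taken from the package's test suite.

```r
# The version comparison behind the guard, in base R.
required <- package_version("1.5.1.9002")

# Hypothetical helper: TRUE when an installed version satisfies the requirement.
meets_requirement <- function(installed) {
  package_version(installed) >= required
}

release_ok <- meets_requirement("1.5.1")       # plain 1.5.1 sorts before 1.5.1.9002
dev_ok <- meets_requirement("1.5.1.9002")
later_ok <- meets_requirement("1.6.0")

# The same guard inside a test; the test is skipped, not failed, when DuckDB
# is missing or too old. Runs only when testthat is available.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("a DuckDB round trip keeps the row count", {
    testthat::skip_if_not_installed("duckdb", "1.5.1.9002")

    con <- DBI::dbConnect(duckdb::duckdb())
    DBI::dbWriteTable(con, "cars", mtcars)
    n <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cars")$n
    DBI::dbDisconnect(con)

    testthat::expect_equal(as.numeric(n), nrow(mtcars))
  })
}
```

Skipping rather than failing keeps the check usable on machines that only have an older DuckDB release installed.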