Skip to content

Add DuckDB functionality: Changes (99739ac) from SAE environment#35

Open
stephhazlitt wants to merge 1 commit into
masterfrom
develop
Open

Add DuckDB functionality: Changes (99739ac) from SAE environment#35
stephhazlitt wants to merge 1 commit into
masterfrom
develop

Conversation

@stephhazlitt
Copy link
Copy Markdown
Member

Changes from @ateucher from the DIP SRE.

@BonnieJRobert BonnieJRobert self-requested a review July 24, 2025 21:56
Comment thread DESCRIPTION
person("Sam", "Albers", , "sam.albers@gov.bc.ca", role = c("aut", "cre")),
person("Andy", "Teucher", , "andy.teucher@gov.bc.ca", role = "aut",
comment = c(ORCID = "0000-0002-7840-692X")),
person("Bonnie", "Robert", , "bonnie.robbie@gov.bc.ca", role = "aut"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
person("Bonnie", "Robert", , "bonnie.robbie@gov.bc.ca", role = "aut"),
person("Bonnie", "Ashcroft", , "bonnie.ashcroft@gov.bc.ca", role = "aut"),

Comment thread R/csv-to-duckdb.R
#' `list(thousands = ",", comment = "#", force_not_null = c("a", "b"), null_padding = TRUE, buffer_size = 5)`
#'
#' @return a character string of the query, to be called with [DBI::dbExecute()]
#' @export
Copy link
Copy Markdown
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an example here would be helpful - using one of the built-in datasets, perhaps?

Copy link
Copy Markdown
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @export
#' @export
#' @examples
#' \dontrun{
#' con <- DBI::dbConnect(duckdb::duckdb())
#' p <- "./inst/extdata/starwars-dict.csv"
#' spec <- dput(sniff_csv_duckdb(p)$schema)
#' csv_to_duckdb_query(con, p, "starwars", spec)
#' }

Comment thread R/csv-to-duckdb.R
nullstr = NULL,
opts = list()
) {

Copy link
Copy Markdown
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not necessary given the users and use cases of dipr, but I wonder about guarding against SQL injection. In this case via the tbl_name parameter:

if (grepl(";", tbl_name) stop(glue::glue("possible SQL injection introduced via tablename: {tbl_name}"))

Comment thread R/csv-to-duckdb.R
Comment on lines +74 to +77
old_column_names <- names(schema)

names(new_column_names) <- old_column_names
names(old_column_names) <- old_column_names
Copy link
Copy Markdown
Member

@BonnieJRobert BonnieJRobert Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is confusing; code comments would be helpful.

Comment thread R/csv-to-duckdb.R

check_params(con, opts)

varchar_scalar_opts <- Filter(\(x) is.character(x) & length(x) == 1, opts)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to extract the code that constructs the various *_opts_parsed variables into individual functions. Consider refactoring into smaller, focused functions like:
format_varchar_scalar_opts <- function(opts) { ... }
format_varchar_array_opts <- function(opts) { ... } etc.

Comment thread R/csv-to-duckdb.R
)
}

make_select_statement <- function(schema, col_rename = NULL, dt_override = NULL) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the final SQL CREATE or REPLACE TABLE statement is executed via DBI::dbExecute(), column names with spaces (like First Name AS first_name) will cause SQL syntax errors. The dipr functions are behaving as expected, but it could be a "nice to have" to warn against this. Here's an example:

> print(strSQL)
CREATE OR REPLACE TABLE starwars AS SELECT Name Abbr AS name_abbr, Name AS name FROM read_csv( './inst/extdata/starwars-dict.csv', types = { 'Name Abbr': 'VARCHAR', 'Name': 'VARCHAR' })
> DBI::dbExecute(con,strSQL)

This will cause dbSendQuery() to error.

@BonnieJRobert
Copy link
Copy Markdown
Member

I focused my review on the code in the csv-duckdb.R file and ran a few toy examples. The code runs as expected with no serious errors that I could reproduce. However I highlighted a couple areas of concern. Additionally, the master branch should be renamed to main and there is a stale branch from several years back with a few updates to the README.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants