Merged
38 commits
642e6de
chore(SQLParser): sync to upstream hyrise/sql-parser ccd3f68 #5370
aleks-f May 27, 2026
88755e8
chore(SQLParser): re-apply Poco local additions #5370
aleks-f May 27, 2026
1242599
chore(SQLParser): drop transitive sql/statements.h include in SQLPars…
aleks-f May 27, 2026
c2b1e62
chore(ci): run upstream hyrise/sql-parser tests on SQLParser changes …
aleks-f May 27, 2026
d2d9e58
feat(SQLParser): support postgresql $N and sqlite :name placeholders …
aleks-f May 27, 2026
aa65d55
feat(Data): port Utility class with SQL rendering and executeSQL #5371
aleks-f May 27, 2026
7678478
feat(SQLParser): record placeholder source column in Expr::ival2 #5371
aleks-f May 27, 2026
6ff024e
refactor(Data): use SQL parser for placeholder substitution in Utilit…
aleks-f May 27, 2026
103ac00
fix(Data): keep SQLParser.h out of Utility.h public surface #5371
aleks-f May 27, 2026
e3e6f46
feat(Data): add RenderingBinder for fidelity-perfect bound-SQL diagno…
aleks-f May 27, 2026
7177b12
refactor(Data): route RenderingBinder scalar capture through Utility:…
aleks-f May 27, 2026
436ed2f
feat(Data): add Utility::boundSQL(const Statement&) for diagnostic re…
aleks-f May 27, 2026
7cc3fc4
fix(SQLParser): plug ColumnConstraints leak + ASSERT_STRNEQ typo #5370
aleks-f May 27, 2026
ec57eb4
chore(Data): silence CodeQL on variadic boundSQL/executeSQL wrappers …
aleks-f May 27, 2026
5bb57ce
fix(SQLParser): plug parallel-make race between bison and flex #5370
aleks-f May 27, 2026
88e9f2e
chore(Data): silence CodeQL on boundSQLBulk pack and pos #5371
aleks-f May 27, 2026
21fe2fe
docs(Data): describe RenderingBinder::reset() as the intentional no-o…
aleks-f May 27, 2026
0cc4888
fix(SQLParser): wrap microtest assertion macros in do-while(false) #5370
aleks-f May 27, 2026
ca9bda4
perf(Data): parse template SQL once per render in boundSQLImpl #5371
aleks-f May 27, 2026
941c88a
fix(SQLite): UB in update-hook callback type on LP64 platforms #5374
aleks-f May 28, 2026
dc9ef62
fix(SQLite): lock-free reads in SQLiteStatementImpl #5375
aleks-f May 28, 2026
b08790b
fix(SQLite): race in SQLiteStatementImpl::compileImpl #5376
aleks-f May 28, 2026
87e8bb7
chore(build): make build script
aleks-f May 28, 2026
671990a
feat(Data/SQLite): add MemoryDB sharded in-memory SQLite with persist…
aleks-f May 28, 2026
b0efe72
fix(Data/SQLite): wire SQLParser include path into cmake build of Mem…
aleks-f May 28, 2026
26da86f
fix(SQLParser): sval leaks: free per iteration, include NAMED_PARAM #…
aleks-f May 28, 2026
dab6e88
fix(Data): null char pointer in renderValue is UB on string_view ctor…
aleks-f May 28, 2026
364232e
chore(SQLiteThreadSafetyTest): unregister sqlite
aleks-f May 28, 2026
e391328
fix(Data/SQLite): point testsuite sqlite3.h include at bundled source…
aleks-f May 28, 2026
a712536
fix(Data/SQLite): wrap sqlite3_limit so testrunner links on Windows s…
aleks-f May 28, 2026
827466c
fix(teststuite): timing errors #5373
aleks-f May 29, 2026
cb33288
fix(Data/SQLite): MemoryDB threading and durability fixes #5373
aleks-f May 29, 2026
fd5cda4
fix(Data/SQLite): stop leaking sqlite3.h from public headers #5377
aleks-f May 29, 2026
01f7ec2
fix(Data/SQLParser): silence bison/flex compile warnings #5370
aleks-f May 29, 2026
bc5ced8
chore: make files additions for build script
aleks-f May 29, 2026
13ebc04
enh(Data): SQLParser test coverage and integration assertions #5370
aleks-f May 29, 2026
e23f9a1
fix(Data/SQLParser): restore Windows unistd guard in flex_lexer.h #5377
aleks-f May 29, 2026
068e2ac
fix(Data/SQLParser): restore Windows unistd guard in flex_lexer.cpp t…
aleks-f May 29, 2026
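Commit 0cc4888 wraps the microtest assertion macros in `do { ... } while (false)`. The macros themselves are not part of the diff shown here, so the following is only a generic sketch of why that idiom matters. `CHECK_TRUE`, `g_failures`, `classify`, and `checkClassify` are hypothetical names, not identifiers from the test suite.

```cpp
#include <cassert>

// Hypothetical failure counter and macro; NOT the macros from the test suite.
static int g_failures = 0;

#define CHECK_TRUE(cond) \
  do {                   \
    if (!(cond)) {       \
      ++g_failures;      \
    }                    \
  } while (false)

// The macro is used as the unbraced body of an if/else. It expands to exactly
// one statement that still needs the trailing semicolon, so the `else` keeps
// pairing with `if (x > 0)`.
int classify(int x) {
  if (x > 0)
    CHECK_TRUE(x < 100);
  else
    return -1;
  return 1;
}

// Small self-check covering the passing, failing and else paths.
void checkClassify() {
  g_failures = 0;
  assert(classify(5) == 1 && g_failures == 0);
  assert(classify(500) == 1 && g_failures == 1);
  assert(classify(-3) == -1 && g_failures == 1);
}
```

If the macro body were a bare `if`, the `else` in `classify` would silently attach to the macro's inner `if`. If it were a brace-only block, the semicolon after the macro call would turn the `else` into a syntax error.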
3 changes: 3 additions & 0 deletions .github/path-filters.yml
@@ -31,3 +31,6 @@ data:
 - 'Data/**'
 - 'Redis/**'
 - 'MongoDB/**'
+
+sqlparser:
+- 'Data/SQLParser/**'
21 changes: 21 additions & 0 deletions .github/workflows/ci.yml
@@ -99,6 +99,7 @@ jobs:
 foundation_util: ${{ steps.filter.outputs.foundation_util }}
 arm_cross: ${{ steps.filter.outputs.arm_cross }}
 data: ${{ steps.filter.outputs.data }}
+sqlparser: ${{ steps.filter.outputs.sqlparser }}
 steps:
 - uses: actions/checkout@v5
 - uses: dorny/paths-filter@v4
@@ -1238,6 +1239,26 @@ jobs:
 retry_on: any
 command: nix-shell Data/ODBC/oracle.nix --pure --run "build_and_test"

+# Standalone hyrise/sql-parser upstream tests: builds the vendored
+# SQLParser sources with its own Makefile and runs the upstream test
+# suite (queries-good.sql / queries-bad.sql), a valgrind leak check,
+# and the bison grammar-conflict check. This is the only job that
+# exercises the grammar-level features (FOREIGN KEY, NULLS FIRST/LAST,
+# CSV import options, schema-qualified functions, ...). bison and flex
+# are pre-installed on ubuntu-24.04 (3.8.2 and 2.6.4); valgrind is too.
+linux-sqlparser-upstream-tests:
+runs-on: ubuntu-24.04
+needs: changes
+if: |
+needs.changes.outputs.sqlparser == 'true' ||
+needs.changes.outputs.ci_core == 'true' ||
+github.event_name == 'push' || github.event_name == 'workflow_dispatch'
+steps:
+- uses: actions/checkout@v5
+- run: sudo apt -y update && sudo apt -y install bison flex valgrind
+- run: cd Data/SQLParser && make -j$(nproc)
+- run: cd Data/SQLParser && make test
+
 linux-gcc-cmake-sqlite-no-sqlparser:
 runs-on: ubuntu-24.04
 needs: changes
2 changes: 1 addition & 1 deletion Data/Makefile
@@ -13,7 +13,7 @@ objects = AbstractBinder AbstractBinding AbstractExtraction AbstractExtractor \
 Range RecordSet Row RowFilter RowFormatter RowIterator \
 SimpleRowFormatter Session SessionFactory SessionImpl \
 SessionPool SessionPoolContainer SQLChannel \
-Statement StatementCreator StatementImpl Time Transcoder
+Statement StatementCreator StatementImpl Time Transcoder Utility RenderingBinder

 ifndef POCO_DATA_NO_SQL_PARSER
 objects += SQLParser SQLParserResult \
13 changes: 12 additions & 1 deletion Data/SQLParser/Makefile
@@ -42,7 +42,12 @@ relaxed_build ?= "off"
 ifeq ($(relaxed_build), on)
 $(warning $(NAME) will be built with most compiler warnings deactivated. This is fine if you want to test $(NAME) but will become an issue when you want to contribute code.)
 else
-LIB_CLFAGS += -Wall -Werror
+LIB_CFLAGS += -Wall -Werror
+# Clang: ensure files end with newline. Missing final newlines here triggered
+# -Wnewline-eof warnings in downstream projects (e.g., Hyrise).
+ifneq (,$(findstring clang,$(shell $(CXX) --version 2>/dev/null)))
+LIB_CFLAGS += -Wnewline-eof
+endif
 endif

 static ?= no
@@ -66,8 +71,14 @@ library: $(LIB_BUILD)
 $(LIB_BUILD): $(LIB_OBJ)
 $(LIBLINKER) $(LIB_LFLAGS) $(LIB_BUILD) $(LIB_OBJ)

+# The auto-generated code from bison and flex contains some parts the compiler complains about with -Wall.
+# bison_parser.cpp #includes flex_lexer.h, so bison_parser.o must wait for flex_lexer.cpp regeneration
+# to finish (which also produces flex_lexer.h). Without this dep, parallel make races and bison_parser.o
+# can start before hsql_lex is declared, producing 'hsql_lex was not declared' errors.
 $(SRCPARSER)/flex_lexer.o: $(SRCPARSER)/flex_lexer.cpp $(SRCPARSER)/bison_parser.cpp
 $(CXX) $(LIB_CFLAGS) -c -o $@ $< -Wno-sign-compare -Wno-unneeded-internal-declaration -Wno-register
+$(SRCPARSER)/bison_parser.o: $(SRCPARSER)/bison_parser.cpp $(SRCPARSER)/flex_lexer.cpp
+$(CXX) $(LIB_CFLAGS) -c -o $@ $< -Wno-unused-but-set-variable

 %.o: %.cpp $(PARSER_CPP) $(LIB_H)
 $(CXX) $(LIB_CFLAGS) -c -o $@ $<
4 changes: 2 additions & 2 deletions Data/SQLParser/README.md
@@ -1,6 +1,6 @@
 C++ SQL Parser
 =========================
-[![Build Status](https://github.com/hyrise/sql-parser/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/hyrise/sql-parser/actions?query=branch%3Amaster)
+[![Build Status](https://github.com/hyrise/sql-parser/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/hyrise/sql-parser/actions?query=branch%3Amain)


 This is a SQL Parser for C++. It parses the given SQL query into C++ objects.
@@ -20,7 +20,7 @@ To use the SQL parser in your own projects you simply have to follow these few s
 3. *(Optional, Recommended)* Run `make install` to copy the library to `/usr/local/lib/`
 4. Run the tests `make test` to make sure everything worked
 5. Include the `SQLParser.h` from `src/` (or from `/usr/local/lib/hsql/` if you installed it) and link the library in your project
-6. Take a look at the [example project here](https://github.com/hyrise/sql-parser/tree/master/example)
+6. Take a look at the [example project here](https://github.com/hyrise/sql-parser/tree/main/example)

 ```cpp
 #include "hsql/SQLParser.h"
18 changes: 9 additions & 9 deletions Data/SQLParser/benchmark/benchmark.cpp
@@ -7,20 +7,20 @@ int main(int argc, char** argv) {
 // Create parse and tokenize benchmarks for TPC-H queries.
 const auto tpch_queries = getTPCHQueries();
 for (const auto& query : tpch_queries) {
-std::string p_name = query.first + "-parse";
-benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
-std::string t_name = query.first + "-tokenize";
-benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
+std::string p_name = query.first + "-parse";
+benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
+std::string t_name = query.first + "-tokenize";
+benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
 }

 // Create parse and tokenize benchmarks for all queries in sql_queries array.
 for (unsigned i = 0; i < sql_queries.size(); ++i) {
-const auto& query = sql_queries[i];
-std::string p_name = getQueryName(i) + "-parse";
-benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
+const auto& query = sql_queries[i];
+std::string p_name = getQueryName(i) + "-parse";
+benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);

-std::string t_name = getQueryName(i) + "-tokenize";
-benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
+std::string t_name = getQueryName(i) + "-tokenize";
+benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
 }

 benchmark::Initialize(&argc, argv);
20 changes: 10 additions & 10 deletions Data/SQLParser/benchmark/benchmark_utils.cpp
@@ -16,8 +16,8 @@ void BM_TokenizeBenchmark(benchmark::State& st, const std::string& query) {
 st.counters["num_chars"] = query.size();

 while (st.KeepRunning()) {
-std::vector<int16_t> tokens(512);
-hsql::SQLParser::tokenize(query, &tokens);
+std::vector<int16_t> tokens(512);
+hsql::SQLParser::tokenize(query, &tokens);
 }
 }

@@ -26,19 +26,19 @@ void BM_ParseBenchmark(benchmark::State& st, const std::string& query) {
 st.counters["num_chars"] = query.size();

 while (st.KeepRunning()) {
-hsql::SQLParserResult result;
-hsql::SQLParser::parse(query, &result);
-if (!result.isValid()) {
-std::cout << query << std::endl;
-std::cout << result.errorMsg() << std::endl;
-st.SkipWithError("Parsing failed!");
-}
+hsql::SQLParserResult result;
+hsql::SQLParser::parse(query, &result);
+if (!result.isValid()) {
+std::cout << query << std::endl;
+std::cout << result.errorMsg() << std::endl;
+st.SkipWithError("Parsing failed!");
+}
 }
 }

 std::string readFileContents(const std::string& file_path) {
 std::ifstream t(file_path.c_str());
 std::string text((std::istreambuf_iterator<char>(t)),
-std::istreambuf_iterator<char>());
+std::istreambuf_iterator<char>());
 return text;
 }
4 changes: 2 additions & 2 deletions Data/SQLParser/benchmark/benchmark_utils.h
@@ -22,13 +22,13 @@ std::string readFileContents(const std::string& file_path);

 #define PARSE_QUERY_BENCHMARK(name, query)\
 static void name(benchmark::State& st) {\
-BM_ParseBenchmark(st, query);\
+BM_ParseBenchmark(st, query);\
 }\
 BENCHMARK(name);

 #define TOKENIZE_QUERY_BENCHMARK(name, query)\
 static void name(benchmark::State& st) {\
-BM_TokenizeBenchmark(st, query);\
+BM_TokenizeBenchmark(st, query);\
 }\
 BENCHMARK(name);
42 changes: 21 additions & 21 deletions Data/SQLParser/benchmark/parser_benchmark.cpp
@@ -24,14 +24,14 @@ static void BM_CharacterCount(benchmark::State& st) {
 st.counters["num_tokens"] = getNumTokens(query);
 st.counters["num_chars"] = query.size();
 while (st.KeepRunning()) {
-hsql::SQLParserResult result;
-hsql::SQLParser::parse(query, &result);
+hsql::SQLParserResult result;
+hsql::SQLParser::parse(query, &result);
 }
 }
 BENCHMARK(BM_CharacterCount)
 ->RangeMultiplier(1 << 2)
 ->Ranges({{1 << 5, 1 << 15},
-{5, 5}});
+{5, 5}});

 // Benchmark the influence of increasing number of tokens, while
 // the number of characters remains unchanged.
@@ -46,42 +46,42 @@ static void BM_ConditionalTokens(benchmark::State& st) {
 std::stringstream condStream;
 size_t missingTokens = numTokens - 4;
 if (missingTokens > 0) {
-condStream << " WHERE a";
-missingTokens -= 2;
+condStream << " WHERE a";
+missingTokens -= 2;

-while (missingTokens > 0) {
-condStream << " AND a";
-missingTokens -= 2;
-}
+while (missingTokens > 0) {
+condStream << " AND a";
+missingTokens -= 2;
+}
 }

 query += condStream.str();

 if (targetSize >= query.size()) {
-const size_t pad = targetSize - query.size();
-const std::string filler = std::string(pad, 'a');
-query.replace(7, 1, filler);
+const size_t pad = targetSize - query.size();
+const std::string filler = std::string(pad, 'a');
+query.replace(7, 1, filler);

 } else {
-// Query can't be the same length as in the other benchmarks.
-// Running this will result in unusable data.
-fprintf(stderr, "Too many tokens. Query too long for benchmark char limit (%lu > %lu).\n",
-query.size(), targetSize);
-return;
+// Query can't be the same length as in the other benchmarks.
+// Running this will result in unusable data.
+fprintf(stderr, "Too many tokens. Query too long for benchmark char limit (%lu > %lu).\n",
+query.size(), targetSize);
+return;
 }

 st.counters["num_tokens"] = getNumTokens(query);
 st.counters["num_chars"] = query.size();
 while (st.KeepRunning()) {
-hsql::SQLParserResult result;
-hsql::SQLParser::parse(query, &result);
-if (!result.isValid()) st.SkipWithError("Parsing failed!");
+hsql::SQLParserResult result;
+hsql::SQLParser::parse(query, &result);
+if (!result.isValid()) st.SkipWithError("Parsing failed!");
 }
 }
 BENCHMARK(BM_ConditionalTokens)
 ->RangeMultiplier(1 << 2)
 ->Ranges({{1 << 14, 1 << 14},
-{1 << 2, 1 << 11}});
+{1 << 2, 1 << 11}});



22 changes: 11 additions & 11 deletions Data/SQLParser/benchmark/queries.cpp
@@ -11,8 +11,8 @@ namespace filesystem = std::filesystem;

 std::string getQueryName(unsigned i) {
 if (sql_queries[i].first.empty()) {
-std::string name = "#" + std::to_string(i + 1);
-return name;
+std::string name = "#" + std::to_string(i + 1);
+return name;
 }
 return std::string("") + sql_queries[i].first;
 }
@@ -22,22 +22,22 @@ std::vector<SQLQuery> getQueriesFromDirectory(const std::string& dir_path) {
 std::vector<std::string> files;

 for (auto& entry : filesystem::directory_iterator(dir_path)) {
-if (filesystem::is_regular_file(entry)) {
-std::string path_str = filesystem::path(entry);
+if (filesystem::is_regular_file(entry)) {
+std::string path_str = filesystem::path(entry);

-if (std::regex_search(path_str, query_file_regex)) {
-files.push_back(path_str);
-}
-}
+if (std::regex_search(path_str, query_file_regex)) {
+files.push_back(path_str);
+}
+}
 }

 std::sort(files.begin(), files.end());

 std::vector<SQLQuery> queries;
 for (const std::string& file_path : files) {
-const filesystem::path p(file_path);
-const std::string query = readFileContents(file_path);
-queries.emplace_back(p.filename(), query);
+const filesystem::path p(file_path);
+const std::string query = readFileContents(file_path);
+queries.emplace_back(p.filename(), query);
 }
 return queries;
 }
35 changes: 20 additions & 15 deletions Data/SQLParser/src/SQLParser.cpp
@@ -15,9 +15,9 @@ bool SQLParser::parse(const std::string& sql, SQLParserResult* result) {
 YY_BUFFER_STATE state;

 if (hsql_lex_init(&scanner)) {
-// Couldn't initialize the lexer.
-fprintf(stderr, "SQLParser: Error when initializing lexer!\n");
-return false;
+// Couldn't initialize the lexer.
+fprintf(stderr, "SQLParser: Error when initializing lexer!\n");
+return false;
 }
 const char* text = sql.c_str();
 state = hsql__scan_string(text, scanner);
@@ -44,8 +44,8 @@ bool SQLParser::tokenize(const std::string& sql, std::vector<int16_t>* tokens) {
 // Initialize the scanner.
 yyscan_t scanner;
 if (hsql_lex_init(&scanner)) {
-fprintf(stderr, "SQLParser: Error when initializing lexer!\n");
-return false;
+fprintf(stderr, "SQLParser: Error when initializing lexer!\n");
+return false;
 }

 YY_BUFFER_STATE state;
@@ -54,16 +54,21 @@ bool SQLParser::tokenize(const std::string& sql, std::vector<int16_t>* tokens) {
 YYSTYPE yylval;
 YYLTYPE yylloc;

-// Step through the string until EOF is read.
-// Note: hsql_lex returns int, but we know that its range is within 16 bit.
-int16_t token = hsql_lex(&yylval, &yylloc, scanner);
-while (token != 0) {
-tokens->push_back(token);
-token = hsql_lex(&yylval, &yylloc, scanner);
-
-if (token == SQL_IDENTIFIER || token == SQL_STRING) {
-free(yylval.sval);
-}
+// Step through the string until EOF is read. Each lex pass that returns
+// an sval-bearing token (SQL_IDENTIFIER, SQL_STRING, SQL_NAMED_PARAM)
+// allocates yylval.sval via strdup / hsql::substr; free it before the
+// next lex call overwrites yylval. The previous loop shape lexed twice
+// before freeing, which leaked the first sval-bearing token and missed
+// SQL_NAMED_PARAM entirely. The set mirrors bison's `%destructor
+// { free($$); } <sval>` for the same tokens.
+// Note: hsql_lex returns int, but we know its range is within 16 bit.
+while (true) {
+int16_t token = hsql_lex(&yylval, &yylloc, scanner);
+if (token == 0) break;
+tokens->push_back(token);
+if (token == SQL_IDENTIFIER || token == SQL_STRING || token == SQL_NAMED_PARAM) {
+free(yylval.sval);
+}
 }

hsql__delete_buffer(state, scanner);
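
The ownership rule described in the new `tokenize` comment can be reproduced without the generated lexer. The sketch below is an illustration under stated assumptions, not the parser's code: `MockLexer`, the `k*` token ids, `tokenizeOldShape`, `tokenizeNewShape`, and `sampleInput` are all hypothetical stand-ins for `hsql_lex` and the `SQL_*` tokens. The mock records every heap copy it hands out, so the number of copies a loop shape fails to free can be counted directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

using std::int16_t;

// Hypothetical token ids standing in for SQL_IDENTIFIER, SQL_STRING and
// SQL_NAMED_PARAM; 0 marks end of input, as in the loop above.
constexpr int16_t kEof = 0;
constexpr int16_t kKeyword = 1;
constexpr int16_t kIdentifier = 2;
constexpr int16_t kString = 3;
constexpr int16_t kNamedParam = 4;

using MockInput = std::vector<std::pair<int16_t, std::string>>;

// Mock lexer (an assumption, not hsql_lex): every sval-bearing token gets a
// heap copy of its text, and the lexer remembers which copies are still alive.
class MockLexer {
 public:
  explicit MockLexer(MockInput input) : input_(std::move(input)) {}
  MockLexer(const MockLexer&) = delete;
  MockLexer& operator=(const MockLexer&) = delete;
  ~MockLexer() {
    for (char* p : alive_) std::free(p);  // keeps the demo itself leak-free
  }

  static bool carriesSval(int16_t token) {
    return token == kIdentifier || token == kString || token == kNamedParam;
  }

  // Returns the next token id; for sval-bearing tokens *sval is overwritten.
  int16_t lex(char** sval) {
    if (pos_ >= input_.size()) return kEof;
    const std::pair<int16_t, std::string>& next = input_[pos_++];
    if (carriesSval(next.first)) {
      char* copy = static_cast<char*>(std::malloc(next.second.size() + 1));
      if (copy == nullptr) std::abort();
      std::memcpy(copy, next.second.c_str(), next.second.size() + 1);
      alive_.push_back(copy);
      *sval = copy;
    }
    return next.first;
  }

  // Frees a copy previously handed out by lex().
  void release(char* sval) {
    auto it = std::find(alive_.begin(), alive_.end(), sval);
    assert(it != alive_.end());
    alive_.erase(it);
    std::free(sval);
  }

  std::size_t unreleased() const { return alive_.size(); }

 private:
  MockInput input_;
  std::size_t pos_ = 0;
  std::vector<char*> alive_;
};

// Old loop shape: lexes the next token before deciding whether to free.
std::vector<int16_t> tokenizeOldShape(MockLexer& lexer) {
  std::vector<int16_t> tokens;
  char* sval = nullptr;
  int16_t token = lexer.lex(&sval);
  while (token != kEof) {
    tokens.push_back(token);
    token = lexer.lex(&sval);
    if (token == kIdentifier || token == kString) lexer.release(sval);
  }
  return tokens;
}

// New loop shape: one lex call per iteration, freed before the next call.
std::vector<int16_t> tokenizeNewShape(MockLexer& lexer) {
  std::vector<int16_t> tokens;
  char* sval = nullptr;
  while (true) {
    const int16_t token = lexer.lex(&sval);
    if (token == kEof) break;
    tokens.push_back(token);
    if (MockLexer::carriesSval(token)) lexer.release(sval);
  }
  return tokens;
}

// Input that starts with an sval-bearing token and ends with a named param.
MockInput sampleInput() {
  return {{kIdentifier, "a"}, {kKeyword, "SELECT"}, {kString, "s"}, {kNamedParam, ":n"}};
}
```

Running both shapes over `sampleInput()` yields the same four token ids. The old shape leaves two copies unreleased (the leading identifier and the named parameter), while the new shape leaves none.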