Fix native query escaping #59

Closed
StpMax wants to merge 2 commits into main from fix-raw-query-escape

@StpMax

@StpMax StpMax commented Aug 4, 2025

Contributor

There is an issue with escaping native queries. For example:

select * from my_postgres (
 select 'a''b'
)

In this case, the query select 'a'b' is executed in the database: the escaped quote '' is collapsed into a single ', which is invalid SQL.

This PR keeps the original text for native queries.

Fixes CONN-1354
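
To make the failure mode concrete, here is a minimal sketch of it. It does not use the mindsdb_sql_parser API: `unescape_literal`, `rebuild_from_value`, and `rebuild_from_raw` are hypothetical helpers that only mimic a lexer storing the unescaped string value and a renderer that re-quotes that value without re-escaping it.

```python
def unescape_literal(raw: str) -> str:
    """Return the value of a single-quoted SQL literal ('' becomes ')."""
    assert raw.startswith("'") and raw.endswith("'")
    return raw[1:-1].replace("''", "'")


def rebuild_from_value(value: str) -> str:
    """Naive rendering: re-quote the unescaped value without re-escaping."""
    return "'" + value + "'"


def rebuild_from_raw(raw: str) -> str:
    """Rendering that keeps the original source text of the literal."""
    return raw


raw = "'a''b'"                      # literal as written inside the native query
value = unescape_literal(raw)       # what a lexer would typically store
broken = "select " + rebuild_from_value(value)
intact = "select " + rebuild_from_raw(raw)

print(broken)
print(intact)
```

The first rendering loses the doubled quote, the second passes the literal through unchanged, which is the behavior the description asks for.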

@github-actions

github-actions Bot commented Aug 4, 2025


Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| mindsdb_sql_parser | | | | |
| __about__.py | 10 | 10 | 0% | 1–10 |
| __init__.py | 119 | 22 | 82% | 44, 48, 53, 98, 115, 139–158, 165–166 |
| lexer.py | 284 | 21 | 93% | 372, 374, 376, 388, 390, 392, 398–416 |
| logger.py | 19 | 4 | 79% | 14, 17, 23, 26 |
| parser.py | 1107 | 32 | 97% | 129, 133, 288, 313, 419, 605, 622, 646–647, 868, 922, 999, 1100, 1153, 1163, 1202–1203, 1232, 1243, 1326, 1402, 1441, 1477, 1670–1671, 1846–1847, 2023, 2031, 2084–2087 |
| utils.py | 49 | 4 | 92% | 73–79 |
| mindsdb_sql_parser/ast | | | | |
| base.py | 36 | 5 | 86% | 13, 28, 31, 46, 51 |
| create.py | 80 | 12 | 85% | 23–31, 92–97 |
| drop.py | 52 | 2 | 96% | 10, 13 |
| insert.py | 63 | 4 | 94% | 39–41, 46 |
| show.py | 48 | 1 | 98% | 18 |
| update.py | 53 | 5 | 91% | 40–42, 75–76 |
| mindsdb_sql_parser/ast/mindsdb | | | | |
| knowledge_base.py | 97 | 1 | 99% | 80 |
| mindsdb_sql_parser/ast/select | | | | |
| case.py | 37 | 1 | 97% | 22 |
| constant.py | 36 | 1 | 97% | 23 |
| data.py | 11 | 4 | 64% | 10–12, 15, 19 |
| identifier.py | 83 | 11 | 87% | 56, 104–112, 122 |
| native_query.py | 13 | 1 | 92% | 25 |
| operation.py | 139 | 4 | 97% | 57, 66, 178, 202 |
| parameter.py | 15 | 2 | 87% | 17, 20 |
| select.py | 109 | 3 | 97% | 160–165 |
| star.py | 12 | 2 | 83% | 8–9 |
| TOTAL | 3399 | 152 | 96% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 303 | 0 💤 | 0 ❌ | 0 🔥 | 13.270s ⏱️ |

@entelligence-ai-pr-reviews


Review Summary

🏷️ Draft Comments (4)

Skipped posting 4 draft comments that were valid but scored below your review threshold (>13/15). Feel free to update them here.

mindsdb_sql_parser/lexer.py (1)

335-345: t.raw_value is set for QUOTE_STRING and DQUOTE_STRING tokens, but not for ID, FLOAT, or INTEGER, causing inconsistent token attributes and potential runtime errors if downstream code expects raw_value on all tokens.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb_sql_parser/lexer.py, lines 335-345, the token methods for ID, FLOAT, and INTEGER do not set t.raw_value, while QUOTE_STRING and DQUOTE_STRING do. This inconsistency can cause runtime errors if downstream code expects t.raw_value on all tokens. Please update the ID, FLOAT, and INTEGER methods to set t.raw_value = t.value before returning the token, matching the behavior of the string token methods.
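
The suggestion above could look roughly like the sketch below. The `Token` class and the `make_*` helpers are hypothetical stand-ins, not the actual lexer code in this repository; the only point shown is that every token kind carries a `raw_value`, so downstream code never has to check whether the attribute exists.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Token:
    type: str
    value: Any       # converted value used by the parser
    raw_value: str   # original source text, always present


def make_string_token(text: str) -> Token:
    # The value is unescaped, the raw text keeps the quotes and the doubled ''.
    return Token("QUOTE_STRING", text[1:-1].replace("''", "'"), text)


def make_integer_token(text: str) -> Token:
    return Token("INTEGER", int(text), text)


def make_id_token(text: str) -> Token:
    return Token("ID", text, text)


tokens = [make_id_token("col"), make_integer_token("42"), make_string_token("'a''b'")]
print([t.raw_value for t in tokens])
```

A less invasive alternative, under the same assumptions, is to leave the lexer alone and read the attribute defensively with `getattr(t, "raw_value", t.value)` at the call site.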

mindsdb_sql_parser/utils.py (2)

65-66: tokens_to_string will raise an exception if tokens is empty, as it accesses tokens[0] without checking; this causes a crash for empty input.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb_sql_parser/utils.py, lines 65-66, the function `tokens_to_string` assumes `tokens` is non-empty and accesses `tokens[0]` without checking. This will cause an exception if an empty list is passed. Add a check at the start of the function to return an empty string if `tokens` is empty.
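
A minimal sketch of the guard described above. This is a simplified, hypothetical `tokens_to_string` (the `Tok` type and the spacing rules are assumptions, not the code in `utils.py`); it only demonstrates returning early before `tokens[0]` is touched.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    value: str
    lineno: int


def tokens_to_string(tokens: List[Tok]) -> str:
    # Guard: without this check, tokens[0] below raises IndexError on empty input.
    if not tokens:
        return ""
    line = tokens[0].lineno
    out = [tokens[0].value]
    for tok in tokens[1:]:
        # Start a new line when the token's line number changes, else add a space.
        out.append("\n" if tok.lineno != line else " ")
        out.append(tok.value)
        line = tok.lineno
    return "".join(out)


print(repr(tokens_to_string([])))
print(tokens_to_string([Tok("select", 1), Tok("1", 1), Tok("from", 2)]))
```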

70-95: tokens_to_string reconstructs lines by concatenating strings in a loop, which is inefficient for large token lists due to repeated string allocations (O(n^2) time complexity for long lines).

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb_sql_parser/utils.py, lines 70-95, the function `tokens_to_string` reconstructs lines by concatenating strings in a loop, which is inefficient for large token lists due to repeated string allocations (O(n^2) time complexity for long lines). Refactor this section to use a list to accumulate lines and join them at the end, minimizing string concatenation overhead. Preserve all logic and formatting.
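
The refactor requested above can be sketched as follows. `Tok` and `render_lines` are hypothetical names, and the layout rule (pad each token to its recorded column, at least one space between tokens) is an assumption for illustration rather than the logic in `utils.py`. Pieces are collected in lists and joined once per line and once at the end, instead of growing a string with `+=`.

```python
from typing import List, NamedTuple, Optional


class Tok(NamedTuple):
    value: str
    lineno: int   # 1-based line number
    column: int   # 0-based column of the first character


def render_lines(tokens: List[Tok]) -> str:
    lines: List[str] = []       # finished lines
    parts: List[str] = []       # pieces of the line being built
    width = 0                   # characters already placed on the current line
    current: Optional[int] = None
    for tok in tokens:
        if tok.lineno != current:
            if current is not None:
                lines.append("".join(parts))
            parts, width, current = [], 0, tok.lineno
        # Pad up to the token's column; keep at least one space between tokens.
        pad = max(tok.column - width, 1 if parts else 0)
        parts.append(" " * pad)
        parts.append(tok.value)
        width += pad + len(tok.value)
    if current is not None:
        lines.append("".join(parts))
    return "\n".join(lines)


sample = [Tok("select", 1, 0), Tok("1", 1, 7), Tok("from", 2, 2), Tok("t", 2, 7)]
print(render_lines(sample))
```

This sketch does not emit blank lines for skipped line numbers; whether that matters depends on how the real function is expected to treat them.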

sly/lex.py (1)

416-444: Lexer.tokenize creates a new Token object for every token, even for ignored tokens and literals, causing unnecessary object allocations and memory churn when processing large texts.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In sly/lex.py, lines 416-444, the `Lexer.tokenize` method creates a new `Token` object for every match, including ignored tokens and literals, which leads to unnecessary object allocations and memory churn when processing large texts. Refactor the code so that `Token` objects are only created for tokens that are actually yielded, skipping object creation for ignored tokens and literals. Ensure the logic and indentation remain consistent with the original code.
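
As an illustration of the idea only, and not a patch to `sly/lex.py`, the sketch below is a small regex tokenizer that checks whether a match is ignored before allocating a `Token`. The pattern, the token names, and the `IGNORED` set are all hypothetical.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str
    value: str
    index: int


# One alternation of named groups; m.lastgroup reports which one matched.
MASTER = re.compile(
    r"(?P<WS>\s+)|(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/()=])"
)
IGNORED = {"WS"}


def tokenize(text: str) -> Iterator[Token]:
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at index {pos}")
        kind = m.lastgroup
        # Allocate a Token only for matches that are actually yielded.
        if kind not in IGNORED:
            yield Token(kind, m.group(), pos)
        pos = m.end()


print([t.type for t in tokenize("a = 1 + 2")])
```

Whether this saves anything measurable in practice would need profiling on realistic input.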

@StpMax StpMax requested a review from ea-rus August 4, 2025 13:26
@ea-rus ea-rus mentioned this pull request Aug 4, 2025
@StpMax

StpMax commented Aug 5, 2025

Contributor Author

The alternative fix-raw-query-escape2 was chosen instead.

@StpMax StpMax closed this Aug 5, 2025