
Remove unused tests #12028

Closed
ZoranPandovski wants to merge 6 commits into releases/26.1.0 from clean-tests

Conversation

@ZoranPandovski
Member

Description

This PR removes the unused/legacy tests from the repo.

@ZoranPandovski ZoranPandovski changed the base branch from main to releases/25.14.0 December 19, 2025 14:52
@entelligence-ai-pr-reviews
Contributor

Entelligence AI Vulnerability Scanner

Status: No security vulnerabilities found

Your code passed our comprehensive security analysis.

@github-actions

github-actions Bot commented Dec 19, 2025

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| mindsdb/integrations/handlers/bigquery_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 7–9 |
| `bigquery_handler.py` | 162 | 29 | 82% | 55, 59, 64–67, 70, 90, 111–115, 121, 143–145, 161–163, 304, 307, 309, 343–344, 369–371, 374–375 |
| mindsdb/integrations/handlers/file_handler | | | | |
| `file_handler.py` | 117 | 17 | 85% | 25–27, 56, 59, 62, 88–89, 113, 131–132, 155–158, 166–167 |
| mindsdb/integrations/handlers/mssql_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 10–12 |
| `mssql_handler.py` | 284 | 41 | 86% | 39–58, 79, 99, 106, 110, 121, 196, 244, 290–294, 306–308, 336, 363, 365–368, 402–407, 421, 444, 482, 524, 568, 600, 671, 710 |
| mindsdb/integrations/handlers/mysql_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 10–12 |
| `mysql_handler.py` | 201 | 11 | 95% | 33–37, 79–84, 100, 164 |
| `settings.py` | 81 | 37 | 54% | 30, 37–42, 49, 57, 71–81, 87–111, 117 |
| mindsdb/integrations/handlers/oracle_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 10–12 |
| `oracle_handler.py` | 267 | 47 | 82% | 79–80, 113, 115, 117, 194, 205–206, 229–235, 239, 242–245, 253–255, 261–263, 295–297, 303, 361–378, 516–517, 622 |
| mindsdb/integrations/handlers/postgres_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 8–10 |
| `postgres_handler.py` | 326 | 25 | 92% | 96–102, 196, 240, 278–279, 313–318, 345, 389–392, 455, 514, 545–546, 695 |
| mindsdb/integrations/handlers/redshift_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 8–10 |
| mindsdb/integrations/handlers/salesforce_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 10–12 |
| `salesforce_handler.py` | 122 | 5 | 96% | 102–104, 146, 341 |
| `salesforce_tables.py` | 88 | 12 | 86% | 140, 184–185, 231, 249–267 |
| mindsdb/integrations/handlers/slack_handler | | | | |
| `__init__.py` | 13 | 3 | 77% | 10–12 |
| `slack_handler.py` | 136 | 11 | 92% | 57, 59, 122–124, 310, 314, 319, 323, 327, 342 |
| `slack_tables.py` | 270 | 65 | 76% | 58, 64–65, 71–72, 101–103, 156, 163–165, 251–265, 285, 293–295, 310, 340–344, 375–377, 384–387, 394, 404–408, 438–440, 447–450, 459–463, 549–551, 556, 577, 585–587, 599, 630–634, 698, 705–707 |
| mindsdb/integrations/handlers/snowflake_handler | | | | |
| `__init__.py` | 14 | 3 | 79% | 8–10 |
| `auth_types.py` | 44 | 12 | 73% | 12, 38, 44, 68–77 |
| `snowflake_handler.py` | 324 | 42 | 87% | 33–34, 111, 116, 118, 122, 157, 159–163, 210, 294, 328, 383–399, 432, 577–594, 620–623, 637, 658, 693 |
| mindsdb/integrations/handlers/timescaledb_handler | | | | |
| `__init__.py` | 13 | 3 | 77% | 7–9 |
| TOTAL | 2746 | 384 | 86% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 549 | 53 💤 | 0 ❌ | 0 🔥 | 9.147s ⏱️ |

@entelligence-ai-pr-reviews
Contributor

Review Summary

🏷️ Draft Comments (40)

Skipped posting 40 draft comments that were valid but scored below your review threshold (>=13/15). Feel free to update them here.

mindsdb/__main__.py (1)

133-148: try-except inside a loop in close_api_gracefully (lines 133-148) can cause significant performance overhead when terminating many processes, especially if exceptions are frequent.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the loop in mindsdb/__main__.py lines 133-148 (`close_api_gracefully`) to move the try-except blocks outside the main for-loop where possible. This reduces exception handling overhead when terminating many processes. Only wrap the minimal code that can raise exceptions, and avoid catching exceptions for the entire loop body.

mindsdb/api/executor/command_executor.py (4)

1446-1446: statement.params is passed directly to update in knowledge base alteration, which can result in unresolved variables being used, causing incorrect updates or runtime errors.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/executor/command_executor.py, line 1446, replace `params=statement.params,` with `params=variables_controller.fill_parameters(statement.params),` in the call to `self.session.kb_controller.update` to ensure all variables in params are resolved before use.

1512-1512: statement.params is passed directly to add_agent in agent creation, which can result in unresolved variables being used, leading to incorrect agent configuration or runtime errors.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/executor/command_executor.py, line 1512, replace `params=statement.params,` with `params=variables_controller.fill_parameters(statement.params),` in the call to `self.session.agents_controller.add_agent` to ensure all variables in params are resolved before use.

1546-1546: statement.params is passed directly to update_agent in agent update, which can result in unresolved variables being used, causing incorrect agent updates or runtime errors.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/executor/command_executor.py, line 1546, replace `params=statement.params,` with `params=variables_controller.fill_parameters(statement.params),` in the call to `self.session.agents_controller.update_agent` to ensure all variables in params are resolved before use.

230-690: ExecuteCommands.execute_command is a monolithic function with excessive complexity (100+ branches/statements), making it hard to maintain and optimize, and increasing risk of performance regressions as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `execute_command` method in mindsdb/api/executor/command_executor.py (lines 230-690). The function is excessively large and complex, with over 100 branches and 70+ return statements, making it difficult to maintain and optimize. Break it into smaller, well-named methods for each major statement type or logical block, and use a dispatch pattern or mapping to route execution. This will improve maintainability, testability, and reduce the risk of performance regressions.
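
The dispatch-mapping idea in the prompt above could look roughly like the sketch below. The statement classes (`CreateDatabase`, `DropDatabase`) and the handler names are hypothetical stand-ins; they are not the actual MindsDB AST types or `ExecuteCommands` methods.

```python
from dataclasses import dataclass


# Hypothetical statement types standing in for real AST nodes.
@dataclass
class CreateDatabase:
    name: str


@dataclass
class DropDatabase:
    name: str


class CommandExecutor:
    """Routes each statement type to a small dedicated handler."""

    def __init__(self):
        self.databases = set()
        # Map statement type -> bound handler method.
        self._handlers = {
            CreateDatabase: self._create_database,
            DropDatabase: self._drop_database,
        }

    def execute_command(self, statement):
        handler = self._handlers.get(type(statement))
        if handler is None:
            raise NotImplementedError(f"Unknown statement: {type(statement).__name__}")
        return handler(statement)

    def _create_database(self, statement):
        self.databases.add(statement.name)
        return "created"

    def _drop_database(self, statement):
        self.databases.discard(statement.name)
        return "dropped"
```

With this shape, adding a statement type means adding one handler method and one mapping entry, and each handler can be tested in isolation.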

mindsdb/api/executor/datahub/datanodes/integration_datanode.py (2)

163-246: create_table method (lines 163-246) is overly complex with many branches and responsibilities, making it hard to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `create_table` method in mindsdb/api/executor/datahub/datanodes/integration_datanode.py (lines 163-246) to reduce its cyclomatic complexity and improve maintainability. Split the method into smaller, well-named helper functions for each major responsibility (e.g., dropping tables, creating tables, inserting data, adapting types). Ensure the refactored code preserves all existing logic and performance characteristics.

318-318: The use of df.replace(..., inplace=True) (line 318) on large DataFrames can cause significant memory overhead and performance degradation due to pandas' inconsistent in-place operation behavior.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/executor/datahub/datanodes/integration_datanode.py, line 318, replace the use of `df.replace(..., inplace=True)` with an assignment (`df = df.replace(...)`) to avoid pandas' inconsistent in-place operation behavior and reduce memory overhead for large DataFrames.

mindsdb/api/executor/datahub/datanodes/project_datanode.py (1)

103-173: query method (lines 103-173) is a large, complex function with many branches, making it difficult to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `query` method in mindsdb/api/executor/datahub/datanodes/project_datanode.py (lines 103-173) to reduce its cyclomatic complexity and improve maintainability. Break out major branches (Update, Delete, Select) into separate private methods, and simplify the control flow where possible. Ensure the refactored code preserves all existing functionality and performance characteristics.

mindsdb/api/executor/planner/query_planner.py (1)

291-312: get_query_info and related traversal functions repeatedly call self.resolve_database_table and self.is_predictor for each node, causing O(n^2) behavior on large/complex queries with many tables or joins.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Optimize performance in mindsdb/api/executor/planner/query_planner.py lines 291-312: The function `find_objects` repeatedly calls `self.resolve_database_table` and `self.is_predictor` for each node, which can cause O(n^2) behavior on large queries. Refactor to cache the results of these lookups on the node object (e.g., as `_integration_cache` and `_is_predictor_cache`) to avoid redundant computation during traversal.
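
One way to avoid repeated lookups during traversal is plain memoization. The sketch below keeps the cache in a dictionary keyed by table name rather than on the node object as the prompt suggests; `Planner`, `resolve`, and `_resolve_uncached` are hypothetical names, and the resolution logic is a placeholder for whatever `resolve_database_table` actually does.

```python
class Planner:
    """Toy planner that memoizes a per-table lookup during traversal."""

    def __init__(self, integrations):
        self.integrations = integrations
        self.lookups = 0          # counts uncached resolutions
        self._resolve_cache = {}  # table name -> resolved integration

    def _resolve_uncached(self, table_name):
        # Stand-in for an expensive resolution step (hypothetical).
        self.lookups += 1
        prefix = table_name.split(".", 1)[0]
        return prefix if prefix in self.integrations else None

    def resolve(self, table_name):
        # A membership test (not a truthiness test) so cached None results count.
        if table_name not in self._resolve_cache:
            self._resolve_cache[table_name] = self._resolve_uncached(table_name)
        return self._resolve_cache[table_name]

    def find_integrations(self, table_names):
        return [self.resolve(name) for name in table_names]
```

The cache must be cleared or scoped per query if the set of integrations can change between plans.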

mindsdb/api/http/namespaces/handlers.py (1)

199-200: engine_versions = [int(x) for x in engine_storage.get_connection_args()["versions"].keys()] reads all keys and builds a list just to call max(); this is O(n) memory and time for large version sets.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 6/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/handlers.py, lines 199-200, the code builds a full list of version keys and then calls max(), which is inefficient for large sets. Refactor to avoid unnecessary list construction and ensure efficient computation of the max version. Replace the current logic with a version that computes max directly and handles empty cases gracefully.
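
A minimal sketch of computing the maximum without an intermediate list, assuming `versions` is a mapping whose keys are numeric strings (the real shape of `get_connection_args()["versions"]` is not shown here). `latest_version` is a hypothetical helper name.

```python
def latest_version(versions):
    """Return the highest integer version key, or None if there are none."""
    # A generator expression avoids building an intermediate list;
    # `default` covers the empty-mapping case instead of raising ValueError.
    return max((int(key) for key in versions), default=None)
```

The work is still linear in the number of keys; the gain is the removed list allocation and the explicit handling of the empty case.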

mindsdb/api/http/namespaces/sql.py (3)

258-270: ListDatabases.get performs a separate query for each database to list tables, causing N+1 query inefficiency and high latency for many databases.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/sql.py, lines 258-270, the current implementation of ListDatabases.get performs an N+1 query pattern by issuing a separate 'SHOW TABLES' query for each database, which causes high latency and poor scalability when there are many databases. Refactor this block to batch or parallelize the table-fetching queries, for example by using ThreadPoolExecutor to run the queries concurrently, and then aggregate the results into the response. Ensure the new code preserves the original response structure and error handling.
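
The concurrent variant suggested above might be sketched as follows. `list_tables_concurrently` and the `fetch_tables` callable are hypothetical; in the real endpoint the callable would wrap whatever issues the per-database query.

```python
from concurrent.futures import ThreadPoolExecutor


def list_tables_concurrently(database_names, fetch_tables, max_workers=8):
    """Run one fetch per database in a thread pool, keeping input order."""
    if not database_names:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs and
        # re-raises any exception from a worker when its result is consumed.
        results = pool.map(fetch_tables, database_names)
        return dict(zip(database_names, results))
```

This still issues one query per database; it only overlaps their latency, so it helps when the queries are I/O-bound and the backend tolerates concurrent requests.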

42-135: Query.post is a large, complex function (54+ statements) that is difficult to maintain and reason about, impacting long-term code quality.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/sql.py, lines 42-135, the Query.post method is overly large and complex (54+ statements), making it difficult to maintain and reason about. Refactor this method by extracting logical blocks (such as request parsing/validation, query execution, error handling, and response formatting) into well-named helper methods or functions. Ensure the refactored code maintains the same functionality and error handling, but is easier to read and maintain.

144-234: ParametrizeConstants.post is a large, complex function (53+ statements) that is hard to maintain and increases risk of bugs.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 6/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/sql.py, lines 144-234, the ParametrizeConstants.post method is overly large and complex (53+ statements), making it difficult to maintain and increasing the risk of bugs. Refactor this method by extracting logical sections (such as parameter extraction, AST traversal, and response construction) into smaller, well-named helper functions or methods. Ensure the refactored code preserves all existing functionality and error handling.

mindsdb/api/mysql/mysql_proxy/mysql_proxy.py (1)

628-804: handle method (lines 628-804) is extremely large and complex, making it difficult to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `handle` method in mindsdb/api/mysql/mysql_proxy/mysql_proxy.py (lines 628-804). The method is extremely large and complex, making it difficult to maintain and optimize for performance. Break it into smaller, well-named helper methods for each command type and error handling. Ensure the main loop is concise and delegates logic to these helpers. This will improve maintainability and make future performance optimizations easier.

mindsdb/api/mysql/mysql_proxy/utilities/dump.py (2)

395-395: The np.NaN alias was removed in NumPy 2.0; using it in series.replace([np.NaN, pd.NA, pd.NaT], None) raises an AttributeError on NumPy 2.0 and later. Use np.nan instead.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/mysql/mysql_proxy/utilities/dump.py at line 395, replace `np.NaN` with `np.nan` in the call to `series.replace([np.NaN, pd.NA, pd.NaT], None)`. The `np.NaN` alias was removed in NumPy 2.0, so the current code raises an AttributeError there.

408-410: try-except inside a loop in dump_result_set_to_mysql (lines 408-410) causes significant performance overhead for large DataFrames due to repeated exception handling.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/mysql/mysql_proxy/utilities/dump.py, lines 408-410, the use of a try-except block inside a loop in `dump_result_set_to_mysql` causes significant performance overhead when processing large DataFrames. Refactor this code to avoid exception handling in the loop by explicitly checking for nulls or problematic data before attempting the operation. Replace the try-except with a conditional check to set `column_info["size"]` efficiently.

mindsdb/integrations/handlers/langchain_handler/langchain_handler.py (2)

210-279: run_agent (lines 210-279) is a large, complex function with multiple responsibilities (prompt construction, agent invocation, error handling, parallel execution), making it difficult to maintain and optimize for performance at scale.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `run_agent` method in mindsdb/integrations/handlers/langchain_handler/langchain_handler.py (lines 210-279). The function is too large and handles multiple concerns: prompt construction, agent invocation, error handling, and parallel execution. Split it into smaller, well-named helper methods for each responsibility to improve maintainability and enable targeted performance optimizations.

217-218: The for-loop that builds input_variables (lines 217-218) could be written as a list comprehension, which is more concise and typically somewhat faster.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 5/5
  • Urgency Impact: 1/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/integrations/handlers/langchain_handler/langchain_handler.py, lines 217-218, replace the for-loop that builds `input_variables` with a list comprehension for better performance and readability, especially for large templates.

mindsdb/integrations/handlers/mssql_handler/mssql_handler.py (1)

61-150: _make_table_response (lines 61-150) is excessively complex (21 branches, 58 statements), making it hard to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the function `_make_table_response` in mindsdb/integrations/handlers/mssql_handler/mssql_handler.py (lines 61-150). The function is overly complex (21 branches, 58 statements), which significantly impacts maintainability and makes future performance optimizations difficult. Break it into smaller, well-named helper functions for each major code path (e.g., ODBC type inference, pymssql type mapping, DataFrame construction). Ensure the refactored code preserves all logic and performance characteristics.

mindsdb/integrations/handlers/pgvector_handler/pgvector_handler.py (1)

494-703: raw_query and related methods construct SQL queries using string interpolation with user-controlled values (e.g., table/column names, metadata), enabling SQL injection if inputs are not strictly validated.

📊 Impact Scores:

  • Production Impact: 5/5
  • Fix Specificity: 2/5
  • Urgency Impact: 4/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/integrations/handlers/pgvector_handler/pgvector_handler.py, lines 494-703, several SQL queries are constructed using f-strings with user-controlled values (table/column names, metadata), which allows SQL injection. Refactor all such SQL constructions to strictly sanitize identifiers (allow only alphanumeric and underscores), and use parameterized queries for values wherever possible. Add a helper function to sanitize identifiers and apply it to all dynamic SQL identifiers. Ensure all SQL queries are safe from injection.
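
An identifier allow-list of the kind described above could be sketched like this. `sanitize_identifier` and `build_select` are hypothetical helpers, not functions from the handler; the `%s` placeholder is the style psycopg uses, and other drivers may differ.

```python
import re

_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def sanitize_identifier(name):
    """Return `name` unchanged if it is a plain identifier, else raise."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Unsafe SQL identifier: {name!r}")
    return name


def build_select(table, column):
    """Build a query with validated identifiers and a value placeholder."""
    table = sanitize_identifier(table)
    column = sanitize_identifier(column)
    # The value itself stays a driver placeholder and must be passed
    # separately to cursor.execute(), never interpolated into the string.
    return f'SELECT * FROM "{table}" WHERE "{column}" = %s'
```

Rejecting unexpected identifiers outright is stricter than escaping them, which may be too restrictive if legitimate table names contain other characters.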

mindsdb/integrations/handlers/sheets_handler/sheets_handler.py (2)

50-62,108-129: self.connect() in native_query and check_connection always re-downloads and re-registers the sheet, causing repeated large network and memory overhead for every query.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/integrations/handlers/sheets_handler/sheets_handler.py, lines 50-62 and 108-129: The current implementation of `connect()` and its usage in `native_query` causes the Google Sheet to be re-downloaded and re-registered with DuckDB on every query, resulting in significant network and memory overhead for frequent queries. Refactor `connect()` to only download and register the sheet if not already connected, and ensure `native_query` uses the existing connection if available. Update both methods to avoid redundant downloads and registrations.
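
The connect-once behavior described above can be illustrated with a toy class. `SheetConnection` and its `download` callable are hypothetical; the real handler downloads a Google Sheet and registers it with DuckDB, which this sketch does not attempt.

```python
class SheetConnection:
    """Toy handler that loads its data once and reuses it afterwards."""

    def __init__(self, download):
        self._download = download   # callable that fetches the sheet (hypothetical)
        self._data = None
        self.is_connected = False

    def connect(self):
        # Skip the download when a previous call already succeeded.
        if not self.is_connected:
            self._data = self._download()
            self.is_connected = True
        return self._data

    def disconnect(self):
        # Drop the cached data so the next connect() fetches a fresh copy.
        self._data = None
        self.is_connected = False

    def native_query(self, predicate):
        rows = self.connect()
        return [row for row in rows if predicate(row)]
```

The trade-off is staleness: a cached sheet does not reflect edits made after the first download until `disconnect()` is called.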

110-110: native_query executes unfiltered query string directly in DuckDB, allowing SQL injection if user input is not strictly controlled.

📊 Impact Scores:

  • Production Impact: 5/5
  • Fix Specificity: 2/5
  • Urgency Impact: 4/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/integrations/handlers/sheets_handler/sheets_handler.py, lines 110-110, the code executes the `query` string directly in DuckDB, which could allow SQL injection if user input is not strictly controlled. Add a check to ensure only queries generated by the AST renderer are executed, and reject any query containing SQL injection patterns such as semicolons, comments, or other suspicious characters.

mindsdb/integrations/libs/response.py (1)

115-115: self.data_frame.replace([numpy.nan, pandas.NA], None, inplace=True) raises an AttributeError at runtime if self.data_frame is None.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/integrations/libs/response.py, lines 115-115, the code calls `self.data_frame.replace([numpy.nan, pandas.NA], None, inplace=True)` without checking if `self.data_frame` is None. This can cause an AttributeError if `self.data_frame` is None. Please add a check to ensure `self.data_frame` is not None before calling `replace`.

mindsdb/interfaces/agents/agents_controller.py (2)

375-375: params argument in update_agent is used without a default value, which can cause a TypeError if not provided, breaking contract for optional arguments.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/agents/agents_controller.py, line 375, the `params` argument in the `update_agent` method is used as an optional argument but is not explicitly typed as Optional. Change the line to use `Optional[Dict[str, str]] = None` to prevent runtime errors when the argument is omitted.

203-364,365-528: add_agent and update_agent methods are excessively large and complex (over 50 statements, >20 branches), making them hard to maintain and error-prone for future changes.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `add_agent` (lines 203-364) and `update_agent` (lines 365-528) methods in mindsdb/interfaces/agents/agents_controller.py to reduce their cyclomatic complexity and number of statements. Extract logical blocks (such as skill validation, parameter normalization, and agent/skill association management) into well-named private helper methods. Ensure the refactoring preserves all existing functionality and error handling, and improves maintainability for future changes.

mindsdb/interfaces/agents/langchain_agent.py (1)

514-523: _langchain_tools_from_skills uses a nested loop to flatten tool groups, but iterates over both keys and values, causing unnecessary iteration and potential performance issues for large skill sets.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/agents/langchain_agent.py, lines 514-523, the function `_langchain_tools_from_skills` unnecessarily iterates over both keys and values of `tools_groups` when only the values are needed. Refactor the code to iterate directly over `tools_groups.values()` to avoid unnecessary iteration and improve performance for large skill sets.
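
Iterating over the values directly might look like the sketch below, assuming `tools_groups` maps a group name to a list of tools (the actual structure is not shown in this comment). `flatten_tool_groups` is a hypothetical name.

```python
def flatten_tool_groups(tools_groups):
    """Flatten a mapping of group name -> list of tools into one list."""
    # Only the values are needed, so iterate over them directly.
    return [tool for tools in tools_groups.values() for tool in tools]
```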

mindsdb/interfaces/knowledge_base/controller.py (2)

278-457: select method in KnowledgeBaseTable (lines 278-457) is excessively large and complex, with deep nesting and many branches, making it hard to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `select` method in `KnowledgeBaseTable` (mindsdb/interfaces/knowledge_base/controller.py, lines 278-457) to reduce complexity and improve maintainability. Break it into smaller, well-named helper methods for each major logical block (e.g., condition extraction, hybrid search, reranking, result limiting). Ensure the refactor preserves all existing logic and performance characteristics.

1156-1308: add method in KnowledgeBaseController (lines 1156-1308) is excessively complex with many branches and statements, making it hard to maintain and optimize for performance as knowledge base creation logic evolves.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `add` method in `KnowledgeBaseController` (mindsdb/interfaces/knowledge_base/controller.py, lines 1156-1308) to reduce cyclomatic complexity and improve maintainability. Split the method into smaller, well-named helper functions for parameter validation, embedding model setup, vector DB/table creation, and final KB object creation. Ensure all current logic and performance optimizations are preserved.

mindsdb/interfaces/knowledge_base/preprocessing/document_preprocessor.py (2)

131-146: ContextualPreprocessor.__init__ does not check for missing optional dependency before using create_chat_model, causing a crash if langchain is not installed.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 3/5
  • Urgency Impact: 3/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/knowledge_base/preprocessing/document_preprocessor.py, lines 131-146, the constructor of ContextualPreprocessor uses create_chat_model without first checking if the optional dependency (langchain) is installed. This will cause a crash if the dependency is missing. Insert a call to _require_agent_extra("Contextual preprocessing") before using create_chat_model to ensure a proper ImportError is raised if the dependency is not present.
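
A guard for an optional dependency can be written with the standard library alone. The helper below, `require_extra`, is a hypothetical illustration; it is not the `_require_agent_extra` function named above, whose implementation is not shown here.

```python
import importlib.util


def require_extra(module_name, feature):
    """Raise a descriptive ImportError if an optional top-level module is missing."""
    # find_spec returns None for a top-level module that cannot be found,
    # without importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'."
        )
```

Calling such a guard at the top of a constructor turns a late, obscure failure into an immediate error that names the missing package.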

192-262: ContextualPreprocessor.process_documents has high cyclomatic complexity and too many branches, making it difficult to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the method `process_documents` in `mindsdb/interfaces/knowledge_base/preprocessing/document_preprocessor.py` (lines 192-262) to reduce cyclomatic complexity and the number of branches. The function currently has too many nested conditionals and branches, making it hard to maintain and optimize. Split the logic into smaller helper methods and simplify the control flow while preserving all existing functionality.

mindsdb/interfaces/skills/retrieval_tool.py (2)

133-133: document_chunks_df.sort_values(by=sort_col) does not sort in-place or assign, so the DataFrame remains unsorted, potentially returning document chunks in the wrong order.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/skills/retrieval_tool.py, line 133, the code `document_chunks_df.sort_values(by=sort_col)` does not sort the DataFrame in-place or assign the result, so the DataFrame remains unsorted. Update this line to assign the sorted DataFrame back to `document_chunks_df` to ensure document chunks are processed in the correct order.

92-155: _build_name_lookup_tool is a large, complex function (over 60 lines, multiple nested functions and logic branches), making it difficult to maintain and reason about.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the function `_build_name_lookup_tool` in `mindsdb/interfaces/skills/retrieval_tool.py` (lines 92-155) to reduce complexity and improve maintainability. Extract the nested functions `_get_document_by_name` and `_lookup_document_by_name` as top-level functions that accept all required parameters explicitly. Update `_build_name_lookup_tool` to use these refactored functions, passing in the necessary arguments. Ensure the refactored code preserves all original logic and functionality.

mindsdb/interfaces/tabs/tabs_controller.py (2)

132-146: get_all and get both read and parse every tab file individually, resulting in O(n) file reads and JSON parses per call; this can cause significant I/O and CPU overhead as the number of tabs grows.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/tabs/tabs_controller.py, lines 132-146, the `get_all` method reads and parses each tab file sequentially, which causes significant I/O and CPU overhead as the number of tabs increases. Refactor `get_all` to batch read and parse tab files in parallel using ThreadPoolExecutor, so that performance scales better with large numbers of tabs. Preserve error handling and sorting by index.

227-240: The modify method re-sorts and rewrites all tab files when changing a tab's index, resulting in O(n) file writes and JSON serializations per index change; this is a major bottleneck for large tab sets.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/interfaces/tabs/tabs_controller.py, lines 227-240, the `modify` method rewrites every tab file when changing a tab's index, causing O(n) file writes and JSON serializations. Refactor this block to only update and write tab files whose index actually changes, minimizing unnecessary file operations for large tab sets.
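
The "write only what changed" idea could be sketched as below. The tab representation (dicts with `id` and `index` keys) and the `reindex_tabs` helper are assumptions for illustration; the real controller stores tabs as files.

```python
def reindex_tabs(tabs, tab_id, new_index):
    """Move one tab to `new_index` and return only the tabs whose index changed.

    `tabs` is a list of dicts with unique "id" values and an "index" key.
    """
    ordered = sorted(tabs, key=lambda tab: tab["index"])
    moved = next(tab for tab in ordered if tab["id"] == tab_id)
    ordered.remove(moved)
    ordered.insert(new_index, moved)

    changed = []
    for position, tab in enumerate(ordered):
        if tab["index"] != position:
            tab["index"] = position
            changed.append(tab)  # only these would need to be rewritten
    return changed
```

Moving a tab still shifts every tab between the old and new positions, so the worst case remains linear; the saving is for moves that affect only a small range.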

mindsdb/utilities/fs.py (2)

71-75: clean_process_marks uses nested loops and unguarded file.unlink() which can be slow for directories with many files, causing significant delays on large process mark sets.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 4/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/utilities/fs.py, lines 71-75, the `clean_process_marks` function uses nested loops and an early return, which can cause it to skip cleaning directories and be slow for large numbers of files. Refactor this block to use `continue` instead of `return` for non-directories, and consider batching or pre-listing files to avoid repeated directory scans. Ensure all files in all subdirectories are unlinked efficiently.
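
The difference between `continue` and `return` in such a cleanup loop is shown in the sketch below. The directory layout (a root containing subdirectories of mark files) is assumed from the description, and `clean_marks` is a hypothetical name rather than the actual function.

```python
from pathlib import Path


def clean_marks(root):
    """Delete every file inside each subdirectory of `root`."""
    removed = 0
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            # `continue` skips this entry; `return` here would abandon
            # every remaining subdirectory.
            continue
        for mark in entry.iterdir():
            if mark.is_file():
                # missing_ok tolerates a file removed by another process.
                mark.unlink(missing_ok=True)
                removed += 1
    return removed
```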

106-127: clean_unlinked_process_marks performs a try-except inside a loop for every process/thread, causing significant CPU overhead when many marks are present.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/utilities/fs.py, lines 106-127, the `clean_unlinked_process_marks` function uses a `try`-`except` block inside a loop for every process/thread, which is inefficient for large numbers of marks. Refactor to minimize exception handling inside the loop, batch thread id checks, and reduce repeated exception overhead. Use set lookups for thread ids and combine exception handling where possible.

mindsdb/utilities/log.py (1)

461-474: try-except inside a loop in resources_log_thread (lines 461-474) causes repeated exception handling overhead when iterating many child processes, degrading performance on systems with many children.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/utilities/log.py, lines 461-474, the code uses a try-except block inside a loop over child processes in `resources_log_thread`, which causes performance overhead when there are many children. Refactor this so that exception handling is minimized and only wraps the minimal set of statements that can fail, and avoid repeated exception handling for each attribute access. Use a pre-check like `child.is_running()` and collect all child info in a list, updating `total_memory_info["children"]` at the end.

setup.py (3)

115-116: extra_requirements[extra_name] is overwritten for each requirements file in a handler, so only the last file's requirements are kept, causing missing extras for handlers with multiple requirements files.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In setup.py, lines 115-116, the code overwrites extra_requirements[extra_name] for each requirements file in a handler, so only the last file's requirements are kept. Update this logic so that requirements from multiple files for the same handler are accumulated (appended) instead of overwritten. Ensure that all requirements for a handler are included in its extra.
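
One way to accumulate instead of overwrite, sketched under the assumption that each requirements file yields an `(extra_name, requirements)` pair. The helper `build_extras` and its input shape are hypothetical and do not mirror the real `define_deps` code:

```python
from collections import defaultdict


def build_extras(requirements_by_file):
    """Merge (extra_name, requirements) pairs, keeping every file's entries.

    Hypothetical helper: duplicates are dropped, first-seen order is kept.
    """
    extras = defaultdict(list)
    for extra_name, requirements in requirements_by_file:
        for req in requirements:
            if req not in extras[extra_name]:
                # Append rather than assign, so earlier files are not lost.
                extras[extra_name].append(req)
    return dict(extras)
```

With plain assignment (`extras[extra_name] = requirements`) a handler with two requirements files would end up with only the second file's entries.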

53-140: define_deps is a large, complex function (over 80 lines, multiple nested loops/branches), making it difficult to maintain and reason about, which increases risk of performance and maintainability issues as requirements logic grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the `define_deps` function in setup.py (lines 53-140) to reduce its complexity. Break it into smaller, well-named helper functions for reading requirements, expanding links, and building extras. This will improve maintainability and reduce the risk of performance issues as requirements logic evolves.

25-26: The use of exec(fp.read(), about) to load metadata from __about__.py is inefficient and risky, as it executes all code in the file, potentially increasing startup time and memory usage for large files.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Replace the use of `exec(fp.read(), about)` in setup.py (lines 25-26) with a safer and more efficient method for loading metadata, such as parsing the file for variable assignments or using importlib to import the module as a namespace, to avoid unnecessary code execution and improve performance.
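
A sketch of the "parse the file for variable assignments" option using the standard `ast` module. It assumes the metadata file contains simple top-level `name = <literal>` assignments; the function name `read_about` is illustrative, and the real `__about__.py` may contain constructs this does not cover:

```python
import ast


def read_about(source: str) -> dict:
    """Collect top-level `name = <literal>` assignments without executing code."""
    about = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name):
                try:
                    about[target.id] = ast.literal_eval(node.value)
                except (ValueError, TypeError):
                    # Non-literal right-hand side (e.g. a function call): skip it.
                    continue
    return about
```

Usage would look like `about = read_about(fp.read())` in place of the `exec` call; anything that is not a literal assignment is ignored rather than run.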

🔍 Comments beyond diff scope (4)
mindsdb/integrations/handlers/mssql_handler/mssql_handler.py (1)

429-710: get_columns, get_tables, meta_get_tables, meta_get_columns, meta_get_primary_keys, and meta_get_foreign_keys construct SQL queries using direct string interpolation of user-controlled values (e.g., table_name, self.schema), leading to exploitable SQL injection vulnerabilities.
Category: security


mindsdb/interfaces/skills/retrieval_tool.py (1)

114-114: User input name is directly interpolated into a SQL ILIKE clause, allowing SQL injection if name contains malicious SQL.
Category: security
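
For the value-interpolation case flagged above, the usual remedy is a bound parameter. The sketch below uses the standard `sqlite3` module as a stand-in, with `LIKE` in place of `ILIKE` (SQLite's `LIKE` is case-insensitive for ASCII by default); the table `items` and the function `find_by_name` are hypothetical and not taken from `retrieval_tool.py`:

```python
import sqlite3


def find_by_name(conn: sqlite3.Connection, name: str) -> list:
    """Search with a bound parameter instead of formatting `name` into the SQL."""
    # The wildcard characters are added to the value, not to the SQL text.
    pattern = f"%{name}%"
    cur = conn.execute("SELECT name FROM items WHERE name LIKE ?", (pattern,))
    return [row[0] for row in cur.fetchall()]
```

Bound parameters cover values only. Identifiers such as table or schema names cannot be bound this way and need separate validation or quoting.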


mindsdb/utilities/config.py (1)

122-125: self.user_config is referenced instead of self._user_config, which can cause an AttributeError if user_config property is not set up to return a dict with 'paths'.
Category: correctness


mindsdb/utilities/context.py (1)

44-44: __delattr__ deletes the key 'name' from storage instead of the attribute specified by the name parameter, causing incorrect attribute deletion and potential data corruption.
Category: correctness
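
To illustrate the `__delattr__` issue, here is a toy dict-backed container. The class `Context` and its `_storage` attribute are assumptions made for this sketch and are not the actual implementation in `mindsdb/utilities/context.py`:

```python
class Context:
    """Toy attribute container backed by a dict (stand-in for the real class)."""

    def __init__(self):
        # Bypass our own __setattr__ so `_storage` lands in the instance dict.
        object.__setattr__(self, "_storage", {})

    def __setattr__(self, name, value):
        self._storage[name] = value

    def __getattr__(self, name):
        try:
            return self._storage[name]
        except KeyError:
            raise AttributeError(name) from None

    def __delattr__(self, name):
        # Use the `name` argument; `del self._storage["name"]` would always
        # remove the literal key "name", whatever attribute was requested.
        try:
            del self._storage[name]
        except KeyError:
            raise AttributeError(name) from None
```

With the quoted-key version, `del ctx.user` would leave `user` in place and silently drop an unrelated `name` entry if one exists.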


@ea-rus
Collaborator

ea-rus commented Jan 12, 2026

I briefly checked the contents of the files. Here is the list of tests that might be kept:

Potentially useful; worth checking more closely and reworking or restoring the useful pieces:

  • tests/unused/integration/metrics/test_metrics.py
  • tests/unused/unit/handler_tests/test_handler_metrics.py
  • tests/unused/integrations/utilities/rag/test_file_loader.py
  • tests/unused/unit/executor/test_udf.py
  • tests/unused/integration/knowledge_bases/test_knowledge_bases.py
  • tests/unused/integrations/utilities/rag/test_file_splitter.py
  • tests/unused/unit/api/http/knowledge_bases_test.py

This is a test for an existing module. We can move it to the module folder (mindsdb/utilities/ml_task_queue/tests/):

  • tests/unused/integration/flows/test_ml_task_queue.py

Move to the handler folders:

  • tests/unused/unit/handler_tests/*
  • tests/unused/unit/ml_handlers/*

@dusvyat, are these tests used or useful? Which should we keep, and which of them should we enable in CI?

  • tests/unused/integration/a2a/test_a2a_streaming.py
  • tests/unused/integration/rag/test_rag_search_kwargs.py
  • tests/unused/integrations/utilities/rag/retrievers/test_multi_hop_retriever.py
  • tests/unused/unit/broken/test_map_reduce_summarizer_chain.py
  • tests/unused/unit/broken/test_sql_retriever.py

@dusvyat
Contributor

dusvyat commented Jan 12, 2026

@ea-rus I think we can remove them. I'm not sure whether the parts of the code they test are being actively tested.

@entelligence-ai-pr-reviews
Contributor

Review Summary

🔍 Comments beyond diff scope (2)
mindsdb/integrations/handlers/dynamodb_handler/dynamodb_handler.py (1)

172-172: connection.close() is called on a boto3 DynamoDB client, but boto3 clients do not have a close() method; this will raise an AttributeError and crash at runtime.
Category: correctness


mindsdb/integrations/utilities/rag/rerankers/base_reranker.py (1)

123-127: self.base_url is no longer added to kwargs, so the model may not use the intended API endpoint, causing incorrect or failed requests.
Category: correctness


@ZoranPandovski ZoranPandovski changed the base branch from releases/25.14.0 to releases/26.0.0 January 13, 2026 21:47
@ZoranPandovski ZoranPandovski requested a review from a team as a code owner January 13, 2026 21:47
@StpMax StpMax changed the base branch from releases/26.0.0 to releases/26.1.0 February 3, 2026 13:02
@martyna-mindsdb
Contributor

@sejubar
As you've worked on that recently, can you please assess this PR? Should it be closed?

@ea-rus
Collaborator

ea-rus commented Mar 31, 2026

Yes, it should be replaced by #12299.

@martyna-mindsdb
Contributor

Thanks @ea-rus
Please feel free to close this PR if it was replaced.

@martyna-mindsdb
Contributor

Closing, as this was handled by @sejubar in #12299.

@github-actions github-actions Bot locked and limited conversation to collaborators Apr 10, 2026

5 participants