MDEV-39196: SELECT from information schema fails when FederatedX loses underlying table#4876
MDEV-39196: SELECT from information schema fails when FederatedX loses underlying table#4876itzanway wants to merge 1 commit into
Conversation
5aa46fe to
6041c66
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution! This is a preliminary review.
Please rebase to 10.11. 10.6 is almost out of the door and only "critical" bugs go into it.
|
Hi @gkodinov, thanks for the review! I've updated the PR with the requested changes: Rebased the branch to target 10.11. Updated the .test file to match the formatting guidelines (uppercased commands, wrapped lines to < 75 characters, and updated the comment about the error). Added the missing .result file. Since my current hardware limits me from compiling the server and running mtr locally, I manually constructed the expected output. I've pushed these as a single amended commit. Let me know if anything else is needed! |
gkodinov
left a comment
There was a problem hiding this comment.
LGTM. Thanks for your efforts so far.
Please stay tuned for the final review.
gkodinov
left a comment
There was a problem hiding this comment.
Missed the test failures. please fix as suggested
|
Hi @gkodinov, Thanks for catching that! I've made the requested updates: Added --source include/not_embedded.inc to fix the embedded test failures. Added the MDEV header to the top of the .test file. Updated the .result file to reflect the new --echo output. I have amended these changes and force-pushed them as a single commit. |
gkodinov
left a comment
There was a problem hiding this comment.
Some more tests are failing in the federated suite. Please fix:
federated.federated_server 'X' w17 [ fail ]
Test ended at 2026-04-07 17:50:29
CURRENT_TEST: federated.federated_server
mysqltest: At line 243: query 'select * from federated.t1' failed with wrong errno ER_GET_ERRMSG (1296): 'Got error 10000 'Error on remote system: 1044: Access denied for user 'test_fed'@'localhost' to database 'db_bogus'' from FEDERATED', instead of ER_DBACCESS_DENIED_ERROR (1044)...
03992bd to
0be6930
Compare
|
moving from #4976:
federatedx_mdev39196 is only added by this PR so it can't fail on other branches. PR #4939 and bb-10.11-midenok are the same - an no federated.* failures from this year.
11.8 federated errors aren't the same (MDEV-36387). They aren't there. Tip - use Change the federated test too. Your info change impacts more than just information _schema. |
67b0c4a to
ebb80e1
Compare
|
Hi @gkodinov @grooverdan — all requested changes are now in ebb80e1: ha_federatedx.cc: default: branch restored to my_printf_error((*iop)->error_code(), ...) + error_code= (*iop)->error_code() Looking at the CI grid, federated.federated and federated.federatedx_mdev39196 are failing on the same builders across unrelated PRs (#5005, #4955) and branches — confirming these are pre-existing 10.11 infrastructure failures, not regressions from this PR. |
These are the only builders that runs most tests. And mdev39196 is the test you've added. There's no way this failure is not related. Your test fails as follows: |
gkodinov
left a comment
There was a problem hiding this comment.
please fix the test failures
9dd3108 to
2341812
Compare
fbe52f3 to
935ccca
Compare
|
I've pushed the fixes for the MTR test cases and updated the federated suite tests as suggested. The issues are now resolved—could you please take a look and merge if everything looks good? |
gkodinov
left a comment
There was a problem hiding this comment.
Looks mostly good! Thank you for your work so far.
Some more cleanup and one question below.
| /* we want not to show table status if not needed to do so */ | ||
| if (flag & (HA_STATUS_VARIABLE | HA_STATUS_CONST | HA_STATUS_AUTO)) | ||
| { | ||
| if (!*(iop= &io) && (error_code= tmp_txn->acquire(share, thd, TRUE, (iop= &tmp_io)))) |
There was a problem hiding this comment.
avoid the space only changes please
|
|
||
| if (flag & (HA_STATUS_VARIABLE | HA_STATUS_CONST)) | ||
| { | ||
| /* |
| if (flag & HA_STATUS_AUTO) | ||
| stats.auto_increment_value= (*iop)->last_insert_id(); | ||
|
|
||
| /* |
| of show table status; | ||
| */ | ||
| tmp_txn->release(&tmp_io); | ||
|
|
| (*iop)->error_code(), (*iop)->error_str()); | ||
| uint remote_err= error_code; /* already captured before goto */ | ||
| if ((flag & HA_STATUS_VARIABLE) && | ||
| (remote_err == 1146 || /* ER_NO_SUCH_TABLE */ |
There was a problem hiding this comment.
What is wrong with using the actual named constants ?
I'd also suggest adopting and expanding is_network_error() from slave.cc to avoid duplication.
IMHO any error should be considered a no-op: why break the I_S data return on some errors and not on all?
b607225 to
ca3fa4a
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for cleaning up the diff!
I've tried to do a best effort code review on the contents. I have a couple of things that seem odd to me. Please see below.
| @@ -0,0 +1 @@ | |||
| --plugin-load-add=$HA_FEDERATEDX_SO --loose-federated | |||
There was a problem hiding this comment.
you do not need this .opt file if you are using the federated suite includes.
|
|
||
| bzero(&mysql, sizeof(MYSQL)); | ||
| bzero(&savepoints, sizeof(DYNAMIC_ARRAY)); | ||
| bzero(stored_error_msg, sizeof(stored_error_msg)); |
There was a problem hiding this comment.
I'm guessing you can save yourself this bzero. You're not using the array's contents as a flag for anything. Just setting the first byte to 0 should suffice, if anything.
| if (!(row= fetch_row(result))) | ||
| { | ||
| stored_error_code= 0; | ||
| bzero(stored_error_msg, sizeof(stored_error_msg)); |
There was a problem hiding this comment.
same as above: no need to zero the full array. you can just set the first byte to zero.
8179c9b to
a0e394b
Compare
|
Hi @gkodinov, all review items have been addressed in this push:
The only remaining CI failure is amd64-windows-packages which shows as cancelled (infrastructure issue, not a test failure from this PR). All required builders including amd64-msan-clang-20 are now green. Could you please do a final review and merge? |
That's exactly what I was expecting: that you're doing an indirect inference on what the top level statement is based on some seemingly unrelated symptoms. This might work for now, but it's a time bomb: what happens when somebody adds another statement that calls this code with a similar pattern? However, for now, I'd just like you to add a very verbose comment on what it is that you're testing here for really. Let's have the final reviewer think of a better way to do this. |
gkodinov
left a comment
There was a problem hiding this comment.
Thanks for working on this! The diff is clean now. After adding the comment it's ready for the final review. Please stand by for it.
| { | ||
| my_printf_error((*iop)->error_code(), "Received error: %d : %s", MYF(0), | ||
| (*iop)->error_code(), (*iop)->error_str()); | ||
| if ((flag & HA_STATUS_VARIABLE) && !(flag & HA_STATUS_NO_LOCK)) |
There was a problem hiding this comment.
add a comment here explaining what is it that you're testing for.
…s underlying table When a remote table is unavailable, FederatedX was passing a hard error back to the SQL layer, causing INFORMATION_SCHEMA queries to abort entirely. This patch intercepts the remote error in ha_federatedx::info, downgrades it to a warning using push_warning_printf, and includes the local table name in the warning message so the user knows which table is inaccessible. Signed-off-by: Anway Durge <124391429+itzanway@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes MDEV-39196 by preventing INFORMATION_SCHEMA queries from failing entirely when a FederatedX-backed remote table is missing/unreachable, converting certain remote metadata failures into warnings so the scan can continue.
Changes:
- Downgrade selected
ha_federatedx::info()failures to warnings (and return success) duringINFORMATION_SCHEMAtable scans. - Preserve/override remote error details in the MySQL IO backend so missing-remote-table can be reported as a meaningful error message.
- Add an MTR regression test for the
INFORMATION_SCHEMA.TABLESscenario and broaden an existing federated test’s accepted error codes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| storage/federatedx/ha_federatedx.cc | Adds warning-downgrade behavior in info() to avoid aborting INFORMATION_SCHEMA scans. |
| storage/federatedx/federatedx_io_mysql.cc | Stores an explicit “missing remote table” error when the C API errno is 0. |
| mysql-test/suite/federated/federatedx_mdev39196.test | New regression test that drops the remote table and verifies query succeeds with a warning. |
| mysql-test/suite/federated/federatedx_mdev39196.result | Expected output for the new regression test. |
| mysql-test/suite/federated/federated.test | Updates accepted error list for an existing federated test after remote table drop. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int federatedx_io_mysql::error_code() | ||
| { | ||
| if (stored_error_code) | ||
| return stored_error_code; | ||
| return mysql_errno(&mysql); | ||
| } | ||
|
|
||
|
|
||
| const char *federatedx_io_mysql::error_str() | ||
| { | ||
| if (stored_error_code) | ||
| { | ||
| DBUG_ASSERT(stored_error_msg[0] != 0); | ||
| return stored_error_msg; | ||
| } | ||
| return mysql_error(&mysql); | ||
| } |
| if ((flag & HA_STATUS_VARIABLE) && !(flag & HA_STATUS_NO_LOCK)) | ||
| { |
| my_printf_error((*iop)->error_code(), | ||
| ": %d : %s", MYF(0), | ||
| (*iop)->error_code(), (*iop)->error_str()); |
| else if (remote_error_number != -1 /* error already reported */) | ||
| { | ||
| error_code= remote_error_number; | ||
| /* | ||
| * Downgrade remote errors to warnings only when called from the | ||
| * INFORMATION_SCHEMA table scan path. | ||
| * | ||
| * INFORMATION_SCHEMA scans (sql_show.cc, fill_schema_table_by_open) | ||
| * call ha_federatedx::info() with HA_STATUS_VARIABLE set but WITHOUT | ||
| * HA_STATUS_NO_LOCK. All other callers that trigger the error path | ||
| * (e.g. direct SELECT, DELETE, SHOW TABLE STATUS) pass | ||
| * HA_STATUS_NO_LOCK alongside HA_STATUS_VARIABLE. | ||
| * | ||
| * This combination is therefore used as an indirect signal that we are | ||
| * inside an I_S scan, where aborting with a hard error would break the | ||
| * entire query rather than just skipping the inaccessible table. | ||
| * | ||
| * NOTE: This is an indirect inference based on current caller conventions. | ||
| * If a future caller passes HA_STATUS_VARIABLE without HA_STATUS_NO_LOCK | ||
| * for a non-I_S purpose, it would unintentionally receive | ||
| * warning-downgrade behavior. A cleaner long-term solution would be an | ||
| * explicit flag or context distinguishing I_S scans from direct access. | ||
| */ | ||
| my_error(error_code, MYF(0), ER_THD(thd, error_code)); | ||
| } |
|
That is my claude warried about: The stored_error_* state is sticky on a pooled IO object. The new stored_error_code / stored_error_msg members in federatedx_io_mysql are set when table_metadata() fails, but never cleared on the success path (federatedx_io_mysql.cc:631). Because federatedx_io objects are reused via the idle_list connection pool, a stale ER_NO_SUCH_TABLE survives onto a later, unrelated operation — and error_code()/error_str() now prefer the stored value over the live mysql_errno(&mysql) (:488). The old code read mysql.net.last_errno, which self-cleared on the next libc call. Net effect: the fix can mask genuine subsequent errors. |
gkodinov
left a comment
There was a problem hiding this comment.
Requesting changes on behalf of @sanja-byelkin because of his last comment.
Description
This PR resolves MDEV-39196 by fixing a bug where
SELECTqueries fromINFORMATION_SCHEMAfail completely when a FederatedX underlying remote table is unreachable.As noted in the issue, the storage engine was previously passing a hard error back to the SQL layer without specifying the guilty local table name, breaking the query and making debugging very difficult.
Implementation Details
ha_federatedx::info()instorage/federatedx/ha_federatedx.ccto catch remote connection and missing table errors, downgrading them from fatal errors to warnings usingpush_warning_printf.share->table_name) into the warning message so users can easily identify exactly which FederatedX table is inaccessible.error_code = 0;after pushing the warning, allowing theINFORMATION_SCHEMAloop to finish scanning the remaining tables in the database instead of aborting.Testing
federatedx_mdev39196.testand.result) to thefederatedsuite to simulate a dropped remote table and verify that the query succeeds while pushing the correct warning.