
THREESCALE-14244 Optimize Oracle Enhanced adapter performance #4261

Closed
jlledom wants to merge 1 commit into master from THREESCALE-14244-oracle-ddl-performance

@jlledom
Contributor

@jlledom jlledom commented Mar 27, 2026

Note: the task db:schema:dump is extremely slow when running against an Oracle DB. Apparently the problem is not on our side; it comes from the oracle-enhanced adapter. I didn't want to spend time investigating this, so I told Claude to fix it. It worked on it for a while and actually fixed it, but I haven't reviewed this and have no idea what it did. Below are its code and explanation, for reference, in case we consider committing this, or at least want to understand better what the problem is here.


What this PR does / why we need it

The oracle-enhanced ActiveRecord adapter fires expensive Oracle data-dictionary queries on every DDL operation (CREATE TABLE, CREATE INDEX) and for every table during db:schema:dump. On Oracle XE this made DDL ~12x slower and db:schema:dump effectively broken (4+ minute timeout on a 90-table schema).

This PR adds performance patches to config/initializers/oracle.rb — the same file where all our other oracle-enhanced monkey-patches live.


Root cause analysis

Why DDL was slow

Every add_index call fired 3 data-dictionary queries before the actual CREATE INDEX:

  1. table_exists? → SELECT owner, table_name FROM all_tables WHERE ...
  2. index_name_exists? → calls describe() first (see below), then SELECT 1 FROM all_indexes WHERE ...
  3. describe() → a 4-way UNION: all_tables UNION ALL all_views UNION ALL all_synonyms UNION ALL all_synonyms (PUBLIC)

Every create_table with force: true also called data_source_exists? which calls describe().

On Oracle XE, describe() alone takes ~300ms. So 3 queries × ~300ms each = ~0.9s of overhead per index before any DDL runs.

Benchmark (10 iterations, 4 operations: create table + add column + add index + drop table):

Oracle (vanilla):  0.924s/iter
Oracle (patched):  0.074s/iter  →  12.5x faster
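
A per-iteration figure like the ones above could be measured with a small harness along these lines. This is only a sketch: the helper name seconds_per_iteration and the bench_tmp table in the commented usage are hypothetical and not part of this PR.

```ruby
require "benchmark"

# Average wall-clock seconds per iteration of the given block.
def seconds_per_iteration(iterations)
  raise ArgumentError, "iterations must be positive" unless iterations.positive?

  total = Benchmark.realtime { iterations.times { yield } }
  total / iterations
end

# Hypothetical usage against a real connection (not run here):
#
#   conn = ActiveRecord::Base.connection
#   per_iter = seconds_per_iteration(10) do
#     conn.create_table(:bench_tmp, force: true) { |t| t.string :name }
#     conn.add_column(:bench_tmp, :age, :integer)
#     conn.add_index(:bench_tmp, :name)
#     conn.drop_table(:bench_tmp)
#   end
#   puts format("%.3fs/iter", per_iter)
```

Running the same block on master and on this branch gives the two numbers to compare.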

Why db:schema:dump was slow

db:schema:dump iterates every table and calls these methods per table:

  • indexes(table) — complex 5-way JOIN across all_indexes, all_ind_columns, all_ind_expressions, all_tab_cols, all_constraints
  • column_definitions(table) — JOIN across all_tab_cols + all_col_comments
  • foreign_keys(table) — 4-table JOIN across all_constraints + all_cons_columns
  • primary_keys(table) — calls describe() + queries all_constraints/all_cons_columns
  • table_comment(table) — calls describe() + queries all_tab_comments

With 90 tables, measured times:

Method                             Per table   × 90 tables
indexes()                          2.93s       264s
column_definitions()               0.27s       24s
foreign_keys()                     0.29s       26s
primary_keys() + table_comment()   ~0.3s       ~27s
Total                                          ~341s

The schema dump effectively never finished.
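
The totals follow directly from multiplying each per-table time by the table count. A quick check of that arithmetic, using only the numbers reported above:

```ruby
TABLES = 90

# Per-table seconds as reported in the measurements above.
PER_TABLE_SECONDS = {
  "indexes()" => 2.93,
  "column_definitions()" => 0.27,
  "foreign_keys()" => 0.29,
  "primary_keys() + table_comment()" => 0.3
}.freeze

# Seconds spent in each method across the whole schema.
TOTALS = PER_TABLE_SECONDS.transform_values { |s| (s * TABLES).round(1) }
GRAND_TOTAL = TOTALS.values.sum.round(1)

TOTALS.each { |name, secs| puts format("%-34s %6.1fs", name, secs) }
puts format("%-34s %6.1fs", "Total", GRAND_TOTAL)
```

indexes() alone accounts for roughly three quarters of the total, which is why it is the main target of the prefetch.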


The fix (6 patches)

All changes are in config/initializers/oracle.rb, using the same module_eval/prepend/alias_method patterns already used throughout that file.

Patch 1: Cache describe() per connection

ActiveRecord::ConnectionAdapters::OracleEnhanced::Connection.prepend(Module.new do
  private
  def describe(name)
    @describe_cache ||= {}
    key = name.to_s.upcase
    return @describe_cache[key] if @describe_cache.key?(key)
    @describe_cache[key] = super
  end
end)

describe() is the most expensive single call (~300ms). It's called by index_name_exists?, data_source_exists?, foreign_keys, primary_keys, column_definitions, and table_comment. The cache is a per-connection instance variable, which is safe because table metadata doesn't change mid-connection during migrations or schema dumps.
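
The prepend-and-memoize pattern can be tried in isolation without an Oracle connection. In the sketch below, FakeConnection and its return value are made up for illustration and stand in for the adapter's real connection class; only the shape of the caching module mirrors Patch 1.

```ruby
# Stand-in for the adapter's connection class (hypothetical).
class FakeConnection
  attr_reader :describe_calls

  def initialize
    @describe_calls = 0
  end

  # Pretends to run the expensive data-dictionary lookup.
  def describe(name)
    @describe_calls += 1
    ["RAILS", name.to_s.upcase]
  end
end

# Same shape as Patch 1: memoize per instance, keyed by upcased name.
DescribeCache = Module.new do
  def describe(name)
    @describe_cache ||= {}
    key = name.to_s.upcase
    return @describe_cache[key] if @describe_cache.key?(key)

    @describe_cache[key] = super
  end
end
FakeConnection.prepend(DescribeCache)

conn = FakeConnection.new
conn.describe("accounts")
conn.describe(:ACCOUNTS)   # same key after upcasing, served from the cache
conn.describe("users")
puts conn.describe_calls   # 2
```

Because the hash lives in an instance variable, each connection object gets its own cache and nothing is shared across connections.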

Patch 2: Simplify data_source_exists?

def data_source_exists?(table_name)
  table_exists?(table_name)
end

The original called describe() (4-way UNION). We only need to know if a table exists, so a direct all_tables query suffices.

Patch 3: Skip validation in add_index_options

The original add_index_options called table_exists? and index_name_exists? before every index creation to raise a friendly Ruby error on duplicates. We remove these checks — Oracle itself raises ORA-00955: name is already used by an existing object if an index name is duplicated. The guard is pure overhead during migrations.

This is a verbatim copy of the upstream method with the guard block removed:

# Removed:
# if table_exists?(table_name) && index_name_exists?(table_name, index_name)
#   raise ArgumentError, "Index name '#{index_name}' on table '#{table_name}' already exists"
# end
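
With the guard gone, a duplicate index name surfaces as a database error instead of an ArgumentError. If a caller still wanted the friendlier message, one option is to translate the error after the fact. This is a hypothetical sketch, not part of this PR: the helper name is invented, FakeStatementInvalid stands in for the exception ActiveRecord would raise, and matching on the ORA-00955 text is an assumption about how the error message looks.

```ruby
# Stand-in for the database exception so the sketch runs without Rails.
class FakeStatementInvalid < StandardError; end

ORA_NAME_IN_USE = "ORA-00955"

# Hypothetical helper: runs the block and converts a duplicate-name database
# error into the ArgumentError the removed guard used to raise.
# Any other database error is re-raised untouched.
def with_friendly_duplicate_error(table_name, index_name, error_class: FakeStatementInvalid)
  yield
rescue error_class => e
  raise unless e.message.include?(ORA_NAME_IN_USE)

  raise ArgumentError, "Index name '#{index_name}' on table '#{table_name}' already exists"
end
```

Note that ORA-00955 covers any object name clash, not only indexes, so the translated message is a best guess. The check costs nothing on the success path because it only runs after a failure.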

Patches 4–6: Schema dump batch prefetch

Instead of calling indexes(), columns(), foreign_keys(), primary_keys(), and table_comment() 90 times each, we hook into SchemaDumper#tables to run 5 bulk queries upfront and cache results:

ActiveRecord::ConnectionAdapters::OracleEnhanced::SchemaDumper.prepend(Module.new do
  private
  def tables(stream)
    @connection.prefetch_schema_dump!
    super
  end
end)

prefetch_schema_dump! runs:

  • prefetch_schema_dump_columns! queries all_tab_cols + all_col_comments for all tables, populates @columns_cache
  • prefetch_schema_dump_indexes! — full indexes query for all tables at once, stores in @prefetched_indexes
  • prefetch_schema_dump_primary_keys! — all PKs from all_constraints/all_cons_columns, stores in @prefetched_primary_keys
  • prefetch_schema_dump_table_comments! — all table comments from all_tab_comments, stores in @prefetched_table_comments
  • prefetch_schema_dump_foreign_keys! — all FKs from all_constraints/all_cons_columns, stores in @prefetched_foreign_keys

The patched indexes(), table_comment(), foreign_keys() check for the prefetched instance variables and return cached data immediately. primary_keys() is patched on OracleEnhancedAdapter (where it's defined, not on SchemaStatements).
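
The core of the prefetch idea is one bulk query grouped by table name, with per-table lookups served from the resulting hash. A minimal stand-alone sketch follows. FakeSchemaConnection, the row layout, and the method name prefetch_indexes! are illustrative assumptions and do not reproduce the adapter's actual queries or result shape.

```ruby
# Illustrative rows, shaped like the result of one bulk data-dictionary query.
ROWS = [
  { table_name: "ACCOUNTS", index_name: "IDX_ACCOUNTS_ORG", column_name: "ORG_NAME" },
  { table_name: "ACCOUNTS", index_name: "IDX_ACCOUNTS_STATE", column_name: "STATE" },
  { table_name: "USERS", index_name: "IDX_USERS_EMAIL", column_name: "EMAIL" }
].freeze

class FakeSchemaConnection
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # One bulk query, grouped by table, instead of one query per table.
  def prefetch_indexes!
    @queries += 1
    @prefetched_indexes = @rows.group_by { |r| r[:table_name] }
  end

  # Serves from the prefetched hash when present; otherwise falls back to a
  # per-table query (simulated here by filtering the rows).
  def indexes(table_name)
    key = table_name.to_s.upcase
    return @prefetched_indexes.fetch(key, []) if @prefetched_indexes

    @queries += 1
    @rows.select { |r| r[:table_name] == key }
  end
end

conn = FakeSchemaConnection.new(ROWS)
conn.prefetch_indexes!
%w[accounts users missing].each { |t| conn.indexes(t) }
puts conn.queries   # 1
```

Without the prefetch call, the same three lookups would each count as a separate query, which is the per-table cost the patches remove.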

Result:

Method                           Before       After
indexes() per table              2.93s        ~0s (cache hit)
column_definitions() per table   0.27s        ~0s (cache hit)
foreign_keys() per table         0.29s        ~0s (cache hit)
Prefetch (one-time)              -            ~2.3s total
db:schema:dump total             4+ minutes   ~9s

The generated db/oracle_schema.rb is byte-for-byte identical to the original (verified by diff).


Verification steps

Prerequisites: Oracle XE running locally (oracle-enhanced://rails:railspass@127.0.0.1:1521/systempdb)

To reproduce the slow schema dump (before this PR):

git checkout master  # without this PR
time DATABASE_URL='oracle-enhanced://...' bundle exec rails db:schema:dump
# Expected: hangs for 4+ minutes or times out

To verify the fix:

git checkout THREESCALE-14244-oracle-ddl-performance
time DATABASE_URL='oracle-enhanced://...' bundle exec rails db:schema:dump
# Expected: completes in ~9s

To verify the schema is correct:

# Run schema dump twice and diff — should be identical (modulo schema version)
DATABASE_URL='oracle-enhanced://...' bundle exec rails db:schema:dump
cp db/oracle_schema.rb /tmp/schema_a.rb
DATABASE_URL='oracle-enhanced://...' bundle exec rails db:schema:dump
diff /tmp/schema_a.rb db/oracle_schema.rb  # should be empty

To verify migrations still work:

DATABASE_URL='oracle-enhanced://...' bundle exec rails db:migrate
# Should complete without ORA-XXXXX errors

Oracle CI pipeline: Please trigger the Oracle pipeline on CircleCI to run the full test suite.


Special notes for your reviewer

  • All 6 patches use module_eval, prepend, and alias_method — the same patterns used in the ~15 other monkey-patches already in oracle.rb
  • The describe() cache is never invalidated within a connection. This is intentional and safe: the cache is only used during schema-introspection operations (DDL and dump), not during normal query execution where stale metadata would matter
  • The add_index_options patch (Patch 3) is a verbatim copy of the upstream method minus the duplicate-check guard. If the adapter is upgraded, this method should be re-checked for changes
  • The prefetch data (@prefetched_indexes, @prefetched_primary_keys, etc.) is stored on the connection/adapter instance. It's populated once per db:schema:dump invocation and not used during normal app operation
  • The indexes_with_prefetch / indexes_without_prefetch alias chain means the existing add_index override in this same file still calls through correctly: add_index → add_index_options (patched, no guard) → execute CREATE INDEX

Jira: https://issues.redhat.com/browse/THREESCALE-14244

The oracle-enhanced adapter fires expensive Oracle data-dictionary
queries on every DDL operation and for every table during db:schema:dump.

DDL fixes:
- Cache describe() results per connection to avoid repeated 4-way UNION
  queries across all_tables/all_views/all_synonyms on every add_index
- Simplify data_source_exists? to use table_exists? (single all_tables
  query) instead of the full describe() UNION
- Skip redundant table_exists?/index_name_exists? validation in
  add_index_options — Oracle raises ORA-00955 on duplicates anyway

Schema dump prefetch (db:schema:dump):
- Prefetch columns, indexes, primary keys, table comments, and foreign
  keys in 5 bulk queries before iterating tables, replacing ~450
  per-table data-dictionary queries with 5 single queries

Measured on Oracle XE (90 tables):
- DDL operations (create table + index): 0.92s → 0.07s (~12x faster)
- db:schema:dump: 4+ minutes → ~9s (~40x faster)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@akostadinov
Contributor

Wow, this is really too complicated. It seems like people have found a couple of simpler solutions: rsim/oracle-enhanced#2467 but didn't follow through with either of them.

@codecov

codecov Bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 36.08247% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.21%. Comparing base (066a7c2) to head (e2c5894).

Files with missing lines        Patch %   Lines
config/initializers/oracle.rb   36.08%    62 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4261      +/-   ##
==========================================
+ Coverage   82.12%   88.21%   +6.09%     
==========================================
  Files         204     1765    +1561     
  Lines        3888    44451   +40563     
  Branches      686      686              
==========================================
+ Hits         3193    39213   +36020     
- Misses        679     5222    +4543     
  Partials       16       16              


@jlledom
Contributor Author

jlledom commented Apr 15, 2026

> Wow, this is really too complicated. It seems like people have found a couple of simpler solutions: rsim/oracle-enhanced#2467 but didn't follow through with either of them.

I tried this and it doesn't work. Regrettably, that PR is wrong.

@jlledom
Contributor Author

jlledom commented Apr 15, 2026

I don't think it's worth keeping this PR open. I'll apply the changes from here every time I need to work with Oracle to make the schema dump faster, but I don't want to take the time to really review this in order to merge it. Also, I'm not going to merge it without review.

@jlledom jlledom closed this Apr 15, 2026
@akostadinov
Contributor

Did you try the just-updated rsim/oracle-enhanced#2521?

@akostadinov
Contributor

Or rsim/oracle-enhanced#2531?
