Skip to content

Intermitant 500 error reading timeseries when threading returns ORA-20047 / "Database Error" or "System Error" #1753

@msweier

Description

@msweier

Describe the bug
When using the GET /timeseries with cwms-python (which spawns 20 threads by default), I get intermittent 500 errors on dev server (running 2026.05.11-develop ) returning "Database Error" or "System Error". When looking at the CDA logs, I see ORA-20047: SESSION_OFFICE_ID_NOT_SET: Session office id is not set by the application, raised from CWMS_UTIL.GET_DB_OFFICE_ID. The store will succeed on retry, so it has something to do with the frequent requests.

Expected behavior
I would think the office would always be assigned to the request.

To Reproduce
Steps to reproduce the behavior:

  1. Use cwms-python to read period of record for Baldhill_Dam.Flow-Out.Inst.~15Minutes.0.best (e.g. cwms.get_timeseries(ts_id = 'Baldhill_Dam.Flow-Out.Inst.~15Minutes.0.best', office_id=MVP, begin= datetime.datetime(1950,1,1))) on dev server or issue concurrent requests to any office-scoped endpoint (e.g. 20 parallel GET /timeseries?...&office=<OFFICE> requests, chunked by time window).
  2. Observe that a fraction of responses come back as HTTP 500 with body "Database Error" or "System Error", while the rest succeed.

Logs/Incident Identifier

The Error Chain

Server log shows the following sequence:

  1. ORA-01422: exact fetch returns more than requested number of rows

    • Occurs inside CWMS_UTIL.GET_DB_OFFICE_ID when trying to resolve the office code
    • This is detection, not the root cause — without session office context, the office lookup query returns multiple rows instead of exactly one
  2. ORA-20047: SESSION_OFFICE_ID_NOT_SET: Session office id is not set by the application

    • The PL/SQL catches ORA-01422, recognizes it means session context is broken, and re-raises with a clear error message
    • Stack trace shows: CWMS_UTIL.GET_DB_OFFICE_IDCWMS_TS.* or other package calls
  3. Client sees: HTTP 500 with body "Database Error" (for main query failures) or "System Error" (for async count query failures)

    • The ORA code is wrapped by jOOQ DataAccessException and caught by ApiServlet's generic handler, which hard-codes the response body
    • Clients cannot distinguish this from other database errors

Example Log Sequence

ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "CWMS_20.CWMS_UTIL", line 599
ORA-20047: SESSION_OFFICE_ID_NOT_SET: Session office id is not set by the application
ORA-06512: at "CWMS_20.CWMS_ERR", line 80
ORA-06512: at "CWMS_20.CWMS_UTIL", line 605

CDA Version (please complete the following information):
2026.05.11-develop

Additional context
I didn't see these errors in test or prod so I'm wondering if the new timeseries query may expose this a bit more?

AI suggests the reason is the request-time DSL builder never installs SessionOfficePreparer on pooled connections, but I'm wondering why we haven't seen this before. It's been that way for a long time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions