Skip to content

fix(prof): Add running PHP language tests for profiler and fix discovered bugs#3364

Draft
realFlowControl wants to merge 24 commits into
masterfrom
florian/run-language-tests-for-profiler
Draft

fix(prof): Add running PHP language tests for profiler and fix discovered bugs#3364
realFlowControl wants to merge 24 commits into
masterfrom
florian/run-language-tests-for-profiler

Conversation

@realFlowControl

@realFlowControl realFlowControl commented Jul 28, 2025

Copy link
Copy Markdown
Member

Description

In the light of #3360 it came to my attention that we are not running the PHP language tests for the profiler.

This PR wires up the profiler "PHP language tests" job so the upstream PHP test suite actually runs with the profiling extension loaded, across the supported PHP versions and for both the nts and zts builds. Getting that suite green surfaced several real bugs (including genuine profiler crashes) plus a few environment issues, all addressed here. The xfail lists are documented in profiling/tests/README.md.

Profiler bugs fixed

Leaked stderr fd → hang. The profiler duplicates STDERR_FILENO to keep a logging target alive after PHP closes stderr, but used a plain dup() without O_CLOEXEC, so the duplicate leaked across proc_open()/exec() into child processes. ext/curl/tests/curl_setopt_ssl.phpt spawns an openssl s_server, which inherited the leaked fd and held a run-tests.php worker's pipe open, so the worker blocked on pipe_read until the 60-minute CI timeout. Fixed by using fcntl(F_DUPFD_CLOEXEC).

NULL file dereference in the timeline error observer (PHP 8.0 segfault). The error observer called CStr::from_ptr(file) with no NULL check. An uncaught exception whose file property is NULL is reported as a fatal error with file=NULL, and strlen(NULL) segfaulted. PHP 8.0 only, where the observer receives a raw C string. Fixed with a NULL guard, regression test added.

Async signals delivered to profiler helper threads (ZTS segfault). The helper threads (ddprof_time, ddprof_upload) only masked the fixed set of signals the Zend Engine registers. A script using pcntl_async_signals(true) + pcntl_signal(SIGCHLD, …) could have SIGCHLD delivered to a helper thread, where pcntl's handler ran without a PHP/TSRM context and dereferenced the thread-local PCNTL_G → segfault (ZTS only). Fixed by blocking all non-fault signals on the helper threads so async signals go to a PHP thread, regression test added.

CI / harness fixes

Wrong conf.d path. run_php_language_tests.sh hardcoded /opt/php/debug/conf.d when removing the rdkafka/memcached extensions, but the profiler job runs nts/zts (different paths), so rdkafka stayed loaded and its CONFWARN output broke Zend/tests/instantiate_all_classes.phpt. The scan dir is now derived from the active PHP (PHP_CONFIG_FILE_SCAN_DIR).

SKIP_ONLINE_TESTS typo. The job exported SKIP_ONLINE_TEST (singular) while the tests check SKIP_ONLINE_TESTS, so online tests were never skipped (and had to be xfailed). Fixed, and the two online tests removed from the xfail list.

parallel, and why we disable it for the ZTS language tests

The CI image auto-loads ext-parallel on the zts build. parallel replaces sapi_module.ub_write at MINIT, and as a side effect, on ZTS the engine's PHP Startup diagnostics (deprecation/warning notices) are flushed after the script's first output instead of before it. ~56 upstream tests assert the normal ordering and therefore fail — purely because parallel is loaded; this reproduces on a vanilla ZTS PHP with no Datadog extension. Rather than xfail 56 tests, we remove parallel.ini for the language-test run (alongside rdkafka/memcached), which restores the correct ordering and keeps full
coverage; these tests don't need parallel.

The former is not a problem for the tracer, because the tracer does not run ZTS upstream tests.

Also: UBSAN found a parallel bug when throwing exceptions from a parallel thread and was reported upstream and a fixed parallel 1.2.14 has since been released — the profiler UBSAN job now pecl installs it over the image's version until the CI images are rebuilt.

xfails (not profiler bugs)

  • opcache optimizer dumps, PHP < 8.4 (prop_types, gh11170, nullsafe_002): the wall-time profiler hooks zend_execute_internal on < 8.4, so internal calls compile to DO_FCALL instead of DO_ICALL and the optimized-opcode dumps differ. xfailed for < 8.4 only.
  • bug60634 (die() in a session save handler): a parallel-run session-collision flake, not the profiler (passes in isolation with the profiler enabled); xfailed.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

AI Usage

99% of the PR have been written by Claude Opus 4.8, Claude Sonnet 4.6 and Kimi K2.5

@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from 40554c4 to 5c1f77c Compare July 28, 2025 14:59
@codecov-commenter

codecov-commenter commented Jul 28, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.64%. Comparing base (2f00cda) to head (71fbd31).
⚠️ Report is 560 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3364      +/-   ##
==========================================
- Coverage   61.73%   61.64%   -0.10%     
==========================================
  Files         139      139              
  Lines       13051    13051              
  Branches     1712     1712              
==========================================
- Hits         8057     8045      -12     
- Misses       4231     4241      +10     
- Partials      763      765       +2     

see 2 files with indirect coverage changes


Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f00cda...71fbd31. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch 6 times, most recently from 3174248 to 5e65c41 Compare July 29, 2025 08:14
@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from 5e65c41 to bf7f2a1 Compare September 5, 2025 06:46
@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from bf7f2a1 to 71fbd31 Compare January 7, 2026 08:20
@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from 71fbd31 to 5d80878 Compare June 16, 2026 19:09
@datadog-official

datadog-official Bot commented Jun 16, 2026

Copy link
Copy Markdown

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 4 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | pecl tests: [8.2]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [8.5]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.5]   View in Datadog   GitLab

View all 4 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 54.09% (-0.03%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 21a9a8f | Docs | Datadog PR Page | Give us feedback!

@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from 8e69df3 to a0fd4fc Compare June 16, 2026 19:49
@pr-commenter

pr-commenter Bot commented Jun 17, 2026

Copy link
Copy Markdown

Benchmarks [ profiler ]

Benchmark execution time: 2026-06-18 05:27:49

Comparing candidate commit 21a9a8f in PR branch florian/run-language-tests-for-profiler with baseline commit a8cd140 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 26 metrics, 10 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch 4 times, most recently from ea2df51 to 8ee8887 Compare June 17, 2026 18:35
ext/curl/tests/curl_setopt_ssl.phpt hangs indefinitely when the profiling
extension is loaded. The test opens an openssl s_server subprocess via
proc_open() and cleans it up in a finally block, but with the profiling
extension present the cleanup fails: openssl inherits the run-tests.php
worker pipe write-end and keeps it open, leaving the worker blocked on
pipe_read forever.

Without the profiling extension the test passes in ~1s.

Needs further investigation to fix the root cause.
The profiler duplicates STDERR_FILENO to keep a logging target alive after
the PHP CLI closes stderr at rshutdown. The plain dup() did not set
O_CLOEXEC, so the duplicate leaked into child processes spawned via
proc_open()/exec().

This hung the PHP language tests: ext/curl/tests/curl_setopt_ssl.phpt
spawns an `openssl s_server` subprocess via proc_open(). In a run-tests.php
parallel worker the test PHP process's stderr is a pipe the parent reads to
detect test completion. openssl inherited the leaked stderr dup and kept the
pipe write-end open after the test PHP exited (proc_close), so the worker
never saw EOF and blocked on pipe_read until the 60-minute CI timeout.

Use fcntl(F_DUPFD_CLOEXEC) instead of dup() at both dup sites (env_logger in
logging.rs and the tracing-subscriber feature in lib.rs) so the duplicate is
not inherited across exec(). With this fix the test passes in ~1s, so the
xfail entry added previously is removed.
ext/standard/tests/file/include_userstream_003.phpt and
ext/standard/tests/strings/highlight_file.phpt fail in the profiler PHP
language tests job, but the failures are not profiler-related: they
reproduce bit-for-bit on vanilla PHP in the bookworm-8 image with no
Datadog extension loaded.

Both tests set allow_url_include=1 (a deprecated directive). Their
--EXPECTF-- expects the 'Directive allow_url_include is deprecated'
notice at the top (module-startup timing), but the PHP build in the CI
image emits it during request activation, so it lands after the first
script-level warning. The diff is purely the position of the deprecation
line; display_startup_errors does not change it.

The existing tracer PHP Language Tests job masks this because it loads
ddtrace, which alters startup/output ordering. The new profiler job runs
near-vanilla PHP and surfaces the upstream mismatch directly. xfail since
there is no profiler-side fix.
The assert.* tests and the allow_url_include deprecation-ordering tests
fail only on ZTS PHP builds; they pass on NTS. The failures are upstream
PHP behaviour (startup INI-deprecation notices are flushed after the
first request output on ZTS, while the bundled --EXPECTF-- expects them
before) and are unrelated to the profiler: they reproduce on vanilla ZTS
PHP with no Datadog extension loaded. The tracer PHP Language Tests job
never sees them because it only runs the NTS debug build.

Rather than dropping these tests on NTS too (where they pass), add a
separate profiling/tests/php-language-xfail-zts.list that the PHP
language tests job only appends when FLAVOUR=zts. Move the two
previously-xfailed deprecation-ordering tests (include_userstream_003,
highlight_file) from the shared list into the ZTS list, since they are
also NTS-passing.

Tests added to the ZTS list:
  ext/standard/tests/file/include_userstream_003.phpt
  ext/standard/tests/strings/highlight_file.phpt
  ext/standard/tests/assert/assert_basic{,1,2,3,4,5}.phpt
  ext/standard/tests/assert/assert_closures.phpt
  ext/standard/tests/assert/assert_error2.phpt
  ext/standard/tests/assert/assert_return_value.phpt
  ext/standard/tests/assert/assert_variation.phpt
  ext/standard/tests/assert/assert_warnings.phpt
… tests

run_php_language_tests.sh hardcoded /opt/php/debug/conf.d when removing
the memcached and rdkafka ini files. That path is only correct for the
tracer PHP Language Tests job (debug build). The profiler PHP language
tests job runs the nts and zts builds, whose ini files live in
/opt/php/{nts,zts}/conf.d, so the removal silently missed and rdkafka
stayed loaded.

With rdkafka loaded, Zend/tests/instantiate_all_classes.phpt fails: the
test instantiates every class, and constructing RdKafka\Consumer/Producer
prints 'No bootstrap.servers configured' CONFWARN lines that are not in
the expected output.

Derive the scan dir from the active PHP via PHP_CONFIG_FILE_SCAN_DIR so
the removal works for every build variant. Verified on nts and debug:
rdkafka is unloaded and the test passes.
Triaged the full PHP 8.4 ZTS language-test failure set from CI. Every
failing test passes on NTS and fails on vanilla ZTS (no profiler, no
ddtrace), confirming they are all the same upstream ZTS-specific issue:
PHP flushes 'PHP Startup' diagnostics after the SAPI begins emitting
request output on ZTS, while the bundled --EXPECTF-- expects them first.

This commit adds the 43 cases whose moved startup message is a
Deprecated notice (deprecation-ordering). Verified: each passes on NTS,
fails on vanilla ZTS, and matches expected output once the misplaced
Deprecated lines are removed.

The remaining ZTS-only failures move a Warning/Fatal-error startup
diagnostic instead of a Deprecated notice (date.timezone, mbstring
encoding, session.name/upload_progress, session save-handler, zend_test
quantity). Those are handled separately.
Add the 13 remaining ZTS-only language-test failures whose misplaced
'PHP Startup' diagnostic is a Warning or Fatal error rather than a
Deprecated notice (date.timezone, mbstring encoding,
session.name/upload_progress, session save-handler, multibyte encoding,
zend_test quantity). Same upstream ZTS root cause as the deprecation
cases: startup diagnostics are flushed after the SAPI begins emitting
request output on ZTS but before it on NTS. All pass on NTS and fail on
vanilla ZTS with no Datadog extension loaded, so they are not
profiler-related.
Add profiling/tests/README.md explaining the two xfail lists consumed by
the profiler PHP language tests job: the shared all-flavours list and the
ZTS-only list. Documents how run_php_language_tests.sh consumes them
(deletes the listed .phpt files) and groups every entry by its root cause
(profiler/FFI crash, perf-sensitive timing, unskipped online tests, ZTS
startup-message ordering, etc.).
The PHP language tests job exported SKIP_ONLINE_TEST (singular), but the
upstream tests' --SKIPIF-- sections and the tracer job both use
SKIP_ONLINE_TESTS (plural). As a result online tests were never skipped
in the profiler job and had to be xfailed.

Fix the variable name to SKIP_ONLINE_TESTS and remove the two online
tests that only failed because of this (ext/soap/tests/bugs/bug76348.phpt
and ext/standard/tests/network/bug80067.phpt). Verified both now SKIP
with reason 'online test'. Update the tests README accordingly.
Make the xfail-lists README slim and dry: drop the verbose consumption
walkthrough, the redundant online-tests note, and the per-file test
enumerations (the lists themselves are the source of truth).
With the O_CLOEXEC stderr-dup fix in place, ext/curl/tests/bug48203_multi.phpt
passes reliably. Remove it from the shared xfail list and the README.
The ~56 'ZTS-only' language-test failures were not an upstream PHP quirk:
they are caused by the parallel extension (loaded only on the ZTS build in
the CI image), which defers 'PHP Startup' diagnostics until after the
script's first output, so deprecation/warning notices land in the wrong
place relative to the bundled --EXPECTF--.

Remove parallel.ini in run_php_language_tests.sh (alongside rdkafka and
memcached; no-op for the tracer NTS debug job where parallel is not
loaded), which makes all those tests pass on ZTS. Delete the ZTS-only
xfail list and its conditional in generate-profiler.php, restoring full
ZTS coverage. Verified 57/57 formerly-failing tests pass on ZTS with the
profiler loaded and parallel removed.
Four opcache optimizer-output tests fail in the PHP language tests job only
with the profiler loaded on PHP <= 8.3:
  ext/opcache/tests/opt/prop_types.phpt
  ext/opcache/tests/opt/gh11170.phpt
  ext/opcache/tests/opt/nullsafe_002.phpt
  ext/opcache/tests/bug66251.phpt

On PHP < 8.4 the wall-time profiler overrides zend_execute_internal (to
handle VM interrupts while an internal function is still on the call stack);
on 8.4+ this is unnecessary (frameless calls) and the hook is not installed
(#[cfg(not(php_frameless))]). With the hook installed the compiler emits
DO_FCALL instead of DO_ICALL for internal calls (zend_get_call_op gates
DO_ICALL on !zend_execute_internal), so the optimized opcodes the tests
assert differ. zend_execute_ex is NOT hooked, so user-call dispatch is
unaffected.

prop_types/gh11170/nullsafe_002 are cosmetic opcode-dump mismatches.
bug66251 is a real constant-folding divergence (a same-file runtime constant
gets folded when it should stay dynamic; reproduces with default opcache
settings on a cached file) and needs a proper fix/upstream report - tracked
in profiling/tests/INVESTIGATE-opcache-do_icall.md.

All four pass on 8.4+ and without the profiler, so they are xfailed only for
PHP < 8.4 via a new php-language-xfail-pre84.list appended by the job.
ddog_php_prof_zend_error_observer dereferenced the error file pointer
without a NULL check. When an uncaught exception with a NULL file location
is reported as a fatal error, the engine calls the error observer with
file=NULL (location "Unknown"); on PHP 8.0 the observer receives a raw C
string and CStr::from_ptr(NULL) segfaulted in strlen().

This crashed any PHP 8.0 CLI process with the profiler loaded and timeline
enabled (the default) whenever such a fatal error occurred, e.g. the
upstream tests Zend/tests/bug50005.phpt and Zend/tests/bug64821.3.phpt
(both build an exception with a NULL file and throw it uncaught).

NULL-guard the file pointer (PHP 8.1+ already handled NULL via the
ZendString path). Add a regression test
profiling/tests/phpt/timeline_fatal_error_null_filename.phpt. Verified the
crash is gone on 8.0 ZTS and no regression on 8.4.
ext/session/tests/bug60634.phpt (die() in a session save handler) fails
intermittently in the PHP language tests job with "Cannot call session
save handler in a recursive manner". It is not a profiler issue - it
passes in isolation with the profiler enabled, including with
DD_PROFILING_ENABLED=1; the failure is a concurrency/session-save-path
collision in the 64-worker parallel run. Both possible test paths are
listed (some versions place it under user_session_module/). Documented in
the tests README.
The php-8.5_bookworm-8 image ships parallel 1.2.13, which has a bug that
intermittently trips UBSAN in the profiler ASAN/UBSAN workflow's zts leg.
A fixed parallel 1.2.14 has been released, so install it over the image's
version before running the phpt tests on the zts build.

pecl refuses to reinstall while the extension is loaded, so its ini is
moved aside during the build and restored afterwards; the direct package
URL is used because the image's channel REST cache lags behind new
releases. nts is unaffected (parallel is ZTS-only).

TODO: remove this step once the CI images are rebuilt with
parallel >= 1.2.14.
The profiler helper threads (ddprof_time, ddprof_upload) only masked the
fixed set of signals the Zend Engine registers (SIGPROF, SIGHUP, SIGINT,
SIGTERM, SIGUSR1, SIGUSR2). A PHP script can install an async handler for
any other signal via pcntl_async_signals(true) + pcntl_signal(), e.g.
SIGCHLD. The kernel could then deliver that signal to a helper thread,
where pcntl_signal_handler runs without a PHP/TSRM context and dereferences
the thread-local PCNTL_G, segfaulting (ZTS only). Seen on PHP 8.4 ZTS via
ext/pcntl/tests/waiting_on_sigchild_pcntl_wait.phpt.

Block every signal on the helper threads (sigfillset) so async signals are
delivered to a PHP thread instead, leaving only the synchronous fault
signals (SIGSEGV/SIGBUS/SIGFPE/SIGILL/SIGABRT/SIGTRAP) deliverable so a real
fault on a helper thread is still reported (crashtracker).

Add a regression test
profiling/tests/phpt/pcntl_async_signal_helper_thread.phpt. Verified: it
fails ~15/20 on the old code and passes 20/20 with the fix; full profiler
phpt suite still green.
@realFlowControl realFlowControl force-pushed the florian/run-language-tests-for-profiler branch from 8ee8887 to 414c2e5 Compare June 17, 2026 19:03
The DO_ICALL->DO_FCALL behavior (and the bug66251 constant-folding
divergence) are understood: on PHP < 8.4 the profiler hooks
zend_execute_internal, so internal calls compile to DO_FCALL instead of
DO_ICALL, which changes opcache's optimized-opcode output. The affected
tests are xfailed for < 8.4 and the reason is summarized in the tests
README; the standalone investigation note is no longer needed.
Replace the '< 8.4' sort -V conditional with a data-driven per-version
file: the job appends profiling/tests/php-language-xfail-${PHP_MAJOR_MINOR}.list
when it exists. PHP 7.1-8.3 symlink to the shared php-language-xfail-pre84.list
(the opcache DO_ICALL->DO_FCALL / bug66251 cases that only fail with the
profiler's zend_execute_internal hook on < 8.4); 8.4+ has no file, so those
tests run there. Update the tests README.
@realFlowControl realFlowControl changed the title feat(prof): Add running PHP language tests for profiler fix(prof): Add running PHP language tests for profiler and fix discovered bugs Jun 17, 2026
morrisonlevi and others added 2 commits June 17, 2026 17:02
Drop the 8 php-language-xfail-{7.1..8.3}.list symlinks that all pointed
at php-language-xfail-pre84.list. Use a php version check in the CI
script instead:

  if php -r 'exit(PHP_VERSION_ID < 80400 ? 0 : 1);'; then
      cat profiling/tests/php-language-xfail-pre84.list >> ...
  fi

Also remove the accidental 'env' dump from the language tests script
step (was printing all CI env vars to the job log).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants