start_run context manager dominates from_tesseract_api overhead #531

@jpbrodrick89

For the noop Tesseract used for benchmarking, >90% (potentially much closer to 100%) of the 0.4 ms runtime is spent in the start_run context manager. This context manager creates a file backend for MPA logging and redirects stdio to set up the tee piping. The cost seems to split into roughly equal parts:

  • Creating directories
  • Writing the csv
  • Redirecting stdio
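
For anyone wanting to sanity-check a breakdown like this outside of Tesseract, here is a rough standalone sketch that times the three kinds of work in isolation. Everything in it is an assumption for illustration: `time_step` and `measure_setup_costs` are made-up helpers, the directory layout and CSV header are invented, and `contextlib.redirect_stdout` only swaps `sys.stdout` rather than redirecting file descriptors, so the numbers will not match what start_run actually does.

```python
import contextlib
import csv
import io
import itertools
import tempfile
import time
from pathlib import Path


def time_step(fn, repeats=200):
    """Return the mean wall-clock time of fn() in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def measure_setup_costs(repeats=200):
    """Time stand-ins for the three setup steps (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        run_ids = itertools.count()

        def make_dirs():
            # A fresh run directory per call, as a per-request setup would need.
            (root / f"run_{next(run_ids)}" / "artifacts").mkdir(parents=True)

        def write_csv():
            # Create a metrics file containing only a header row.
            with (root / "metrics.csv").open("w", newline="") as f:
                csv.writer(f).writerow(["key", "value"])

        def redirect_stdio():
            # Python-level redirection only; no file descriptors are touched.
            with contextlib.redirect_stdout(io.StringIO()):
                with contextlib.redirect_stderr(io.StringIO()):
                    pass

        return {
            "make_dirs": time_step(make_dirs, repeats),
            "write_csv": time_step(write_csv, repeats),
            "redirect_stdio": time_step(redirect_stdio, repeats),
        }


if __name__ == "__main__":
    for name, seconds in measure_setup_costs().items():
        print(f"{name}: {seconds * 1e6:.1f} µs")
```

Profiling the real start_run is still the authoritative measurement; this only shows that filesystem setup of this kind can plausibly add up to a few hundred microseconds.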

In from_tesseract_api there is very little value in this: errors typically bubble up through Python and we use the results straight away. Some options for dealing with this:

  1. Lazy initialization: don't create the FileBackend/logs infrastructure unless someone actually calls log_metric/log_parameter/log_artifact
  2. Reuse the backend across calls instead of creating a new one per request
  3. Skip stdio redirection when no log_sink is provided and the server is already handling its own logging
  4. For served Tesseracts, set up stdio redirection at serve time, not on every request
  5. Make MPA opt-in: if the user isn't using metrics/artifacts, skip the whole thing entirely

I'd probably go for a combination of the last three, but would be happy with 1, 3 and 4 instead.
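
To make option 1 concrete, here is a minimal sketch of lazy initialization. `LazyFileBackend`, its `log_metric` signature, and the `metrics.csv` layout are hypothetical stand-ins, not the actual tesseract-core classes; the point is only that no directory or file is created until the first logging call.

```python
import csv
import tempfile
from pathlib import Path


class LazyFileBackend:
    """Hypothetical file backend that defers all disk I/O until first use."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)
        self._metrics_file = None  # nothing has touched the disk yet

    @property
    def initialized(self):
        return self._metrics_file is not None

    def _ensure_initialized(self):
        # Create the directory and CSV header only on the first log call.
        if self._metrics_file is None:
            self.base_dir.mkdir(parents=True, exist_ok=True)
            metrics_file = self.base_dir / "metrics.csv"
            with metrics_file.open("w", newline="") as f:
                csv.writer(f).writerow(["key", "value"])
            self._metrics_file = metrics_file
        return self._metrics_file

    def log_metric(self, key, value):
        path = self._ensure_initialized()
        with path.open("a", newline="") as f:
            csv.writer(f).writerow([key, value])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backend = LazyFileBackend(Path(tmp) / "run")
        print(backend.initialized)  # False: a noop call pays no setup cost
        backend.log_metric("loss", 0.5)
        print(backend.initialized)  # True: setup happened on first use
```

With this shape, a noop Tesseract that never logs anything would skip directory creation and CSV writing entirely, while Tesseracts that do log pay the cost once on the first call.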

I know it's a tiny amount of time, but I think it would be really awesome if from_tesseract_api could be more performant for mini-Tesseracts (e.g. with runtimes of 100 µs or less) in tesseract-jax, to take advantage of the convenient JAX primitive wrapping and the resulting language interop.

Tagging @linusseelinger and @sakoush for visibility, in case you have any opinions on whether this affects P4D interactions.
