Add format() method for transformations and measurements #99

Open
tmager wants to merge 1 commit into main from tmager/format-components

Contributor

@tmager tmager commented May 29, 2026

Adds a format() method for converting transformations and measurements into human-readable strings. Most components use a default implementation based on inspecting each class to get its public properties, but some (most notably any component with multiple children) use custom formatting logic. Also adds a tmlt.core.utils.format module containing shared helpers used during formatting.
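
A rough sketch of how a default implementation based on public-property inspection could work. This is an illustration only, not the code in this PR: the body of `default_format_attrs` and the `ExampleComponent` class are assumptions, and only the helper's name and call shape are taken from the diff.

```python
import inspect
from typing import AbstractSet, Any, Iterator, Tuple


def default_format_attrs(
    obj: Any, excluded: AbstractSet[str] = frozenset()
) -> Iterator[Tuple[str, Any]]:
    """Yield (name, value) for each public property on obj's class.

    Hypothetical implementation: the real helper may select attributes
    differently.
    """
    # inspect.getmembers on the class returns members sorted by name;
    # properties show up as `property` objects there.
    for name, member in inspect.getmembers(type(obj)):
        if name.startswith("_") or name in excluded:
            continue
        if isinstance(member, property):
            yield name, getattr(obj, name)


class ExampleComponent:
    """Hypothetical component, used only to exercise the helper."""

    _FORMAT_EXCLUDED_ATTRS = frozenset({"input_domain"})

    def __init__(self, grouping_column: str, threshold: int):
        self._grouping_column = grouping_column
        self._threshold = threshold

    @property
    def grouping_column(self) -> str:
        return self._grouping_column

    @property
    def threshold(self) -> int:
        return self._threshold

    @property
    def input_domain(self) -> str:
        return "excluded from the formatted output"

    def format(self) -> str:
        parts = [type(self).__name__]
        parts.extend(
            f"{name}={value}"
            for name, value in default_format_attrs(
                self, self._FORMAT_EXCLUDED_ATTRS
            )
        )
        return " ".join(parts)


print(ExampleComponent("id", 2).format())
```

Components whose output needs more structure (for example, ones with several children) would override `format()` rather than rely on this default.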

An example output pulled from one of the Analytics tests:

┌ AugmentDictTransformation
│   ┌ Identity
│   ├ AugmentDictTransformation
│   │   ┌ GetValue key=TableCollection(a)
│   │   ├ LimitRowsPerGroupValue key=NamedTable(part1) new_key=TemporaryTable(140191554920432)
│   │   │   LimitRowsPerGroup grouping_columns={'id'} threshold=2
│   │   └ CreateDictFromValue key=''
│   ├ Subset keys=['']
│   ├ AugmentDictTransformation
│   │   ┌ GetValue key=''
│   │   ├ Identity
│   │   └ CreateDictFromValue key=TableCollection(a)
│   ├ Subset keys=[TableCollection(a)]
│   ├ GetValue key=TableCollection(a)
│   ├ GetValue key=TemporaryTable(140191554920432)
│   ├ LimitRowsPerGroup grouping_columns={'id'} threshold=2
│   └ CreateDictFromValue key=TemporaryTable(140191555056496)
├ GetValue key=TemporaryTable(140191555056496)
└ NonInteractivePostProcess f=<function BaseMeasurementVisitor._build_adaptive_groupby_agg_and_noise_info.<locals>.perform_groupby_agg>
    DecorateQueryable preprocess_query=<function create_adaptive_composition.<locals>.preprocess_query> postprocess_answer=<function create_adaptive_composition.<locals>.postprocess_answer>
      SequentialComposition d_in=2*sqrt(2) privacy_budget=oo
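
For readers who want to see how a layout like the one above can be produced, here is a minimal sketch of a tree renderer. It is inferred from the example output only: the `Node` class and both function names are hypothetical, and the actual formatting logic in the PR may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a component: a label plus child components."""

    label: str
    children: List["Node"] = field(default_factory=list)


def format_siblings(nodes: List[Node]) -> List[str]:
    """Render sibling components with box-drawing connectors."""
    lines: List[str] = []
    last = len(nodes) - 1
    for i, node in enumerate(nodes):
        connector = "┌" if i == 0 else ("└" if i == last else "├")
        # Lines nested under a non-final sibling keep a vertical bar.
        continuation = " " if i == last else "│"
        first, *rest = format_lines(node)
        lines.append(f"{connector} {first}")
        lines.extend(f"{continuation} {line}" for line in rest)
    return lines


def format_lines(node: Node) -> List[str]:
    """Render one component followed by its indented children."""
    lines = [node.label]
    if len(node.children) == 1:
        # A single child is simply indented, with no connector.
        lines.extend("  " + line for line in format_lines(node.children[0]))
    elif node.children:
        lines.extend("  " + line for line in format_siblings(node.children))
    return lines


chain = [
    Node("AugmentDictTransformation", [Node("Identity"), Node("Subset keys=['']")]),
    Node("GetValue key=''"),
    Node("NonInteractivePostProcess", [Node("DecorateQueryable")]),
]
print("\n".join(format_siblings(chain)))
```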

#31

@tmager tmager self-assigned this May 29, 2026
@tmager tmager force-pushed the tmager/format-components branch 2 times, most recently from 60b7653 to 6243739 Compare May 29, 2026 21:18
Contributor

@Maegereg Maegereg left a comment

This will be SUCH an improvement!

Comment thread src/tmlt/core/measurements/base.py Outdated
Comment on lines +123 to +126
parts.extend(
    f"{name}={value}"
    for name, value in default_format_attrs(self, self._FORMAT_EXCLUDED_ATTRS)
)
Contributor

Part of me wonders whether parentheses around the attributes would be more readable? But I'm guessing you thought about that.

Contributor Author

Hmm, I actually hadn't considered that specific way of formatting it; there definitely are some cases where the current formatting isn't ideal, e.g. column names containing spaces. We could also just quote all string-valued attributes? That might be clearer, especially if there are any cases where the attribute value is a tuple or something -- I'll expand format_value a bit and see if I can come up with something better.

Contributor Author

I've pushed an alternative version that wraps quotes around all strings. It doesn't escape quotes that appear inside a string -- considering that this is just for human readability, I think that's fine; if you have quotes in your column names you can figure it out. It also recurses into list- and tuple-valued attributes, so that e.g. the elements of keys on Subset and key on GetValue are formatted the same way.

Original:

. AugmentDictTransformation
  . Identity
  | AugmentDictTransformation
    . GetValue key=TableCollection(a)
    | LimitRowsPerGroupValue key=NamedTable(part1) new_key=TemporaryTable(140371022854176)
        LimitRowsPerGroup grouping_column=id threshold=2
    | CreateDictFromValue key=
  | Subset keys=['']
  | AugmentDictTransformation
    . GetValue key=
    | Identity
    | CreateDictFromValue key=TableCollection(a)
  | Subset keys=[TableCollection(name='a')]
  | GetValue key=TableCollection(a)
  | GetValue key=TemporaryTable(140371022854176)
  | LimitRowsPerGroup grouping_column=id threshold=2
  | CreateDictFromValue key=TemporaryTable(140371022851056)
| GetValue key=TemporaryTable(140371022851056)
| NonInteractivePostProcess f=<function BaseMeasurementVisitor._build_adaptive_groupby_agg_and_noise_info.<locals>.perform_groupby_agg>
    DecorateQueryable preprocess_query=<function create_adaptive_composition.<locals>.preprocess_query> postprocess_answer=<function create_adaptive_composition.<locals>.postprocess_answer>
      SequentialComposition d_in=2*sqrt(2) privacy_budget=oo

New:

. AugmentDictTransformation
  . Identity
  | AugmentDictTransformation
    . GetValue key=TableCollection(a)
    | LimitRowsPerGroupValue key=NamedTable(part1) new_key=TemporaryTable(139997773750640)
        LimitRowsPerGroup grouping_column='id' threshold=2
    | CreateDictFromValue key=''
  | Subset keys=['']
  | AugmentDictTransformation
    . GetValue key=''
    | Identity
    | CreateDictFromValue key=TableCollection(a)
  | Subset keys=[TableCollection(a)]
  | GetValue key=TableCollection(a)
  | GetValue key=TemporaryTable(139997773750640)
  | LimitRowsPerGroup grouping_column='id' threshold=2
  | CreateDictFromValue key=TemporaryTable(139997773744496)
| GetValue key=TemporaryTable(139997773744496)
| NonInteractivePostProcess f=<function BaseMeasurementVisitor._build_adaptive_groupby_agg_and_noise_info.<locals>.perform_groupby_agg>
    DecorateQueryable preprocess_query=<function create_adaptive_composition.<locals>.preprocess_query> postprocess_answer=<function create_adaptive_composition.<locals>.postprocess_answer>
      SequentialComposition d_in=2*sqrt(2) privacy_budget=oo

I don't think we should spend too much effort on handling every possible case in this MR, though -- it should be pretty painless to tweak the format later.
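
To make the quoting and recursion behaviour concrete, here is a small sketch of a value formatter along those lines. It is an assumption about the shape of such a helper, not the `format_value` in this PR, and the `TableCollection` class below is a hypothetical stand-in for the real key types.

```python
from dataclasses import dataclass
from typing import Any


def format_value(value: Any) -> str:
    """Format an attribute value for human-readable output (sketch)."""
    if isinstance(value, str):
        # Quotes are added but embedded quotes are not escaped.
        return f"'{value}'"
    if isinstance(value, list):
        return "[" + ", ".join(format_value(v) for v in value) + "]"
    if isinstance(value, tuple):
        inner = ", ".join(format_value(v) for v in value)
        return f"({inner},)" if len(value) == 1 else f"({inner})"
    return str(value)


@dataclass
class TableCollection:
    """Hypothetical key type whose str() is shorter than its repr()."""

    name: str

    def __str__(self) -> str:
        return f"TableCollection({self.name})"


keys = [TableCollection("a")]
print(str(keys))           # str() on a list uses repr() for the elements
print(format_value(keys))  # recursing formats elements like a bare key
```

Recursing matters because `str()` on a list calls `repr()` on each element, which is why a key inside `keys=[...]` could render differently from the same key passed on its own.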

Contributor

String quoting seems great! I still find it a little hard to distinguish the component name from the attributes, and think parentheses would help, e.g.:
LimitRowsPerGroupValue(key=NamedTable(part1), new_key=TemporaryTable(139997773750640), LimitRowsPerGroup grouping_column='id', threshold=2)

But I agree that this is a relatively minor thing.
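
For comparison, a minimal sketch of the two layouts under discussion. The function names are hypothetical and attribute values are assumed to be already formatted as strings.

```python
from typing import List, Tuple

Attrs = List[Tuple[str, str]]


def format_spaced(name: str, attrs: Attrs) -> str:
    """Space-separated layout: name followed by key=value pairs."""
    return " ".join([name] + [f"{key}={value}" for key, value in attrs])


def format_parenthesized(name: str, attrs: Attrs) -> str:
    """Parenthesized layout: attributes are visually grouped after the name."""
    if not attrs:
        return name
    inner = ", ".join(f"{key}={value}" for key, value in attrs)
    return f"{name}({inner})"


attrs = [("grouping_column", "'id'"), ("threshold", "2")]
print(format_spaced("LimitRowsPerGroup", attrs))
print(format_parenthesized("LimitRowsPerGroup", attrs))
```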

Comment thread src/tmlt/core/utils/format.py Outdated
Comment thread src/tmlt/core/utils/format.py Outdated
Comment thread src/tmlt/core/utils/format.py
Comment thread src/tmlt/core/utils/format.py Outdated
@tmager tmager force-pushed the tmager/format-components branch 2 times, most recently from 26a7a85 to 6838cfd Compare June 2, 2026 00:37
@tmager tmager force-pushed the tmager/format-components branch from 6838cfd to bd2e2e1 Compare June 3, 2026 16:31