task(logger): accept serializable objects as metadata (#437)
### Description
Run `metadata` through `bt_safe_deep_copy` before validating it, so that
Pydantic models and other objects that dump to dictionaries can be
logged as metadata.
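The copy-then-validate order can be sketched as follows. This is a minimal, hypothetical illustration, not the SDK's `bt_safe_deep_copy`. It assumes an object "dumps to a dictionary" when it exposes a Pydantic-style `model_dump()` method, and the helper names `_to_plain` and `normalize_metadata` are made up for this example.

```python
from collections.abc import Mapping
from typing import Any, Dict, Optional


def _to_plain(value: Any) -> Any:
    """Recursively copy ``value`` into plain dicts and lists.

    Objects exposing a Pydantic-style ``model_dump()`` method are dumped
    first; this duck-typing check is an assumption made for the sketch.
    """
    dump = getattr(value, "model_dump", None)
    if callable(dump):
        value = dump()
    if isinstance(value, Mapping):
        return {k: _to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_to_plain(v) for v in value]
    return value


def normalize_metadata(metadata: Any) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: copy first, then validate the copied result."""
    if metadata is None:
        return None
    copied = _to_plain(metadata)
    if not isinstance(copied, dict):
        raise ValueError("metadata must serialize to a dictionary")
    if not all(isinstance(key, str) for key in copied):
        raise ValueError("metadata keys must be strings")
    return copied


class RunInfo:
    """Stands in for a Pydantic model so the sketch needs no dependencies."""

    def model_dump(self) -> Dict[str, Any]:
        return {"prompt_version": 3, "tags": ("eval", "nightly")}


print(normalize_metadata(RunInfo()))
# {'prompt_version': 3, 'tags': ['eval', 'nightly']}
```

Copying before validating means the dictionary and string-key checks run against the dumped data rather than the original object, which is why a model instance passes even though it is not itself a `dict`. The real `bt_safe_deep_copy` likely differs in detail from this sketch.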
This change covers:
- `Logger.log`
- `Experiment.log`
- `Span.log`
- `Logger.log_feedback`
- `Experiment.log_feedback`
- `update_span`
- `Dataset.__init__`
- `Dataset.insert`
- `Dataset.update`
### Testing
- Added type tests
- Added unit tests for the following:
  - Mocked Pydantic `model_dump` behavior, to avoid needing the Pydantic dependency, for:
    - `span.log`
    - `logger.log`
    - `experiment.log`
    - `logger.log_feedback`
    - `experiment.log_feedback`
    - `dataset.insert`
    - `dataset.update`
  - Rejecting `metadata` with non-string keys
  - Rejecting `metadata` that does not serialize into a dictionary
  - `span.log` with an actual Pydantic model
- Manually tested the repro script
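The mocking approach described in the testing notes can be illustrated with `unittest.mock`. This is a hedged sketch, not the PR's actual test code: `prepare_metadata` is a hypothetical, simplified stand-in for the SDK's copy-then-validate step, and `Mock(spec=["model_dump"])` plays the role of a Pydantic model so that Pydantic itself is not required.

```python
from typing import Any, Dict
from unittest.mock import Mock


def prepare_metadata(metadata: Any) -> Dict[str, Any]:
    """Hypothetical copy-then-validate step used as the unit under test."""
    if callable(getattr(metadata, "model_dump", None)):
        metadata = metadata.model_dump()
    if not isinstance(metadata, dict):
        raise ValueError("metadata must serialize to a dictionary")
    if not all(isinstance(k, str) for k in metadata):
        raise ValueError("metadata keys must be strings")
    return dict(metadata)


def fake_model(dumped: Any) -> Mock:
    # Only ``model_dump`` exists on this mock, mimicking a Pydantic model.
    model = Mock(spec=["model_dump"])
    model.model_dump.return_value = dumped
    return model


def test_accepts_model_like_object() -> None:
    model = fake_model({"user_id": "u-123"})
    assert prepare_metadata(model) == {"user_id": "u-123"}
    model.model_dump.assert_called_once_with()


def test_rejects_non_string_keys() -> None:
    try:
        prepare_metadata(fake_model({1: "one"}))
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-string keys")


def test_rejects_non_dict_dump() -> None:
    try:
        prepare_metadata(fake_model(["not", "a", "dict"]))
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-dict output")
```

Under pytest, `pytest.raises(ValueError)` would be the more idiomatic check; the `try`/`except` form keeps the sketch free of third-party dependencies.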
py/src/braintrust/logger.py
Lines changed: 44 additions & 35 deletions
@@ -1800,7 +1800,8 @@ def init_dataset(
 key is specified, will prompt the user to login.
 :param org_name: (Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
 :param project_id: The id of the project to create the dataset in. This takes precedence over `project` if specified.
-:param metadata: (Optional) a dictionary with additional data about the dataset. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
+:param metadata: (Optional) a dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the dataset. The values in `metadata` can be any
+JSON-serializable type, but its keys must be strings.
 :param use_output: (Deprecated) If True, records will be fetched from this dataset in the legacy format, with the "expected" field renamed to "output". This option will be removed in a future version of Braintrust.
 :param _internal_btql: (Internal) If specified, the dataset will be created with the given BTQL filters.
 :param state: (Internal) The Braintrust state to use. If not specified, will use the global state. For advanced use only.
# Note that this only checks properties that are expected of a complete event.
@@ -3832,7 +3844,7 @@ def log(
 :param expected: (Optional) the ground truth value (an arbitrary, JSON serializable object) that you'd compare to `output` to determine if your `output` value is correct or not. Braintrust currently does not compare `output` to `expected` for you, since there are so many different ways to do that correctly. Instead, these values are just used to help you navigate your experiments while digging into analyses. However, we may later use these values to re-score outputs or fine-tune your models.
 :param error: (Optional) The error that occurred, if any. If you use tracing to run an experiment, errors are automatically logged when your code throws an exception.
 :param scores: A dictionary of numeric values (between 0 and 1) to log. The scores should give you a variety of signals that help you determine how accurate the outputs are compared to what you expect and diagnose failures. For example, a summarization app might have one score that tells you how accurate the summary is, and another that measures the word similarity between the generated and grouth truth summary. The word similarity score could help you determine whether the summarization was covering similar concepts or not. You can use these scores to help you sort, filter, and compare experiments.
-:param metadata: (Optional) a dictionary with additional data about the test example, model outputs, or just about anything else that's relevant, that you can use to help find and analyze examples later. For example, you could log the `prompt`, example's `id`, or anything else that would be useful to slice/dice later. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
+:param metadata: (Optional) a dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the test example, model outputs, or just about anything else that's relevant, that you can use to help find and analyze examples later. For example, you could log the `prompt`, example's `id`, or anything else that would be useful to slice/dice later. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
 :param tags: (Optional) a list of strings that you can use to filter and group records later.
 :param metrics: (Optional) a dictionary of metrics to log. The following keys are populated automatically: "start", "end".
 :param id: (Optional) a unique identifier for the event. If you don't provide one, BrainTrust will generate one for you.
@@ -3881,7 +3893,7 @@ def log_feedback(
 :param expected: (Optional) the ground truth value (an arbitrary, JSON serializable object) that you'd compare to `output` to determine if your `output` value is correct or not.
 :param tags: (Optional) a list of strings that you can use to filter and group records later.
 :param comment: (Optional) an optional comment string to log about the event.
-:param metadata: (Optional) a dictionarywith additional data about the feedback. If you have a `user_id`, you can log it here and access it in the Braintrust UI. Note, this metadata does not correspond to the main event itself, but rather the audit log attached to the event.
+:param metadata: (Optional) a dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the feedback. If you have a `user_id`, you can log it here and access it in the Braintrust UI. Note, this metadata does not correspond to the main event itself, but rather the audit log attached to the event. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
 :param source: (Optional) the source of the feedback. Must be one of "external" (default), "app", or "api".
 :param error: (Optional) The error that occurred, if any. If you use tracing to run an experiment, errors are automatically logged when your code throws an exception.
 :param tags: (Optional) a list of strings that you can use to filter and group records later.
 :param scores: (Optional) a dictionary of numeric values (between 0 and 1) to log. The scores should give you a variety of signals that help you determine how accurate the outputs are compared to what you expect and diagnose failures. For example, a summarization app might have one score that tells you how accurate the summary is, and another that measures the word similarity between the generated and grouth truth summary. The word similarity score could help you determine whether the summarization was covering similar concepts or not. You can use these scores to help you sort, filter, and compare logs.
-:param metadata: (Optional) a dictionary with additional data about the test example, model outputs, or just about anything else that's relevant, that you can use to help find and analyze examples later. For example, you could log the `prompt`, example's `id`, or anything else that would be useful to slice/dice later. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
+:param metadata: (Optional) a dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the test example, model outputs, or just about anything else that's relevant, that you can use to help find and analyze examples later. For example, you could log the `prompt`, example's `id`, or anything else that would be useful to slice/dice later. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
 :param metrics: (Optional) a dictionary of metrics to log. The following keys are populated automatically: "start", "end".
 :param id: (Optional) a unique identifier for the event. If you don't provide one, BrainTrust will generate one for you.
 :param allow_concurrent_with_spans: (Optional) in rare cases where you need to log at the top level separately from using spans on the logger elsewhere, set this to True.
@@ -5313,7 +5322,7 @@ def log_feedback(
 :param expected: (Optional) the ground truth value (an arbitrary, JSON serializable object) that you'd compare to `output` to determine if your `output` value is correct or not.
 :param tags: (Optional) a list of strings that you can use to filter and group records later.
 :param comment: (Optional) an optional comment string to log about the event.
-:param metadata: (Optional) a dictionarywith additional data about the feedback. If you have a `user_id`, you can log it here and access it in the Braintrust UI. Note, this metadata does not correspond to the main event itself, but rather the audit log attached to the event.
+:param metadata: (Optional) a dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the feedback. If you have a `user_id`, you can log it here and access it in the Braintrust UI. Note, this metadata does not correspond to the main event itself, but rather the audit log attached to the event. The values in `metadata` can be any JSON-serializable type, but its keys must be strings.
 :param source: (Optional) the source of the feedback. Must be one of "external" (default), "app", or "api".