Add usage analytics utils #16

Closed
liam-sbhoo wants to merge 14 commits into main from add-usage-analytics-utils

Collaborator

@liam-sbhoo liam-sbhoo commented Apr 12, 2025

Change Description

Brief (a few bullet points describing your changes, use full sentences and try to link lines in the code whenever needed)

Implement minimal usage tracking feature:

  • AnalyticsHttpClient retrieves some basic usage info from the caller and sends it in the HTTP headers.
  • In tabpfn_client, we replace the standard httpx.Client with AnalyticsHttpClient.
  • In tabpfn_server, we extract this info based on ANALYTICS_TO_TRACK.
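
The flow described above can be sketched without httpx as follows. This is only an illustration under assumptions: the two entries in ANALYTICS_TO_TRACK and the helper names with_analytics_headers and extract_analytics are hypothetical, and in the PR the client-side step lives inside AnalyticsHttpClient rather than in a free function.

```python
import platform
import uuid
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical stand-in for ANALYTICS_TO_TRACK: (header name, zero-argument provider).
ANALYTICS_TO_TRACK: Tuple[Tuple[str, Callable[[], str]], ...] = (
    ("X-Python-Version", platform.python_version),
    ("X-Unique-Call-Id", lambda: str(uuid.uuid4())),
)


def with_analytics_headers(
    headers: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Client side: copy the caller's headers and add one header per tracked item."""
    merged = dict(headers or {})
    for name, provider in ANALYTICS_TO_TRACK:
        merged[name] = provider()
    return merged


def extract_analytics(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Server side: read only the tracked headers; missing ones become None."""
    return {name: headers.get(name) for name, _ in ANALYTICS_TO_TRACK}
```

Because both sides iterate over the same definition, adding a new tracked item only requires a new entry in ANALYTICS_TO_TRACK.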

Details (add details if your pull request is more complicated and harder to understand from the code alone)

Standard Qs (leave questions that do not apply blank)

If you broke behavior: Please describe what behavior you broke and how you inform people to not get stuck trying to use the old behavior.

If you used new dependencies: Did you add them to requirements.txt?
No new dependencies.

Who did you ping on Mattermost to review your PR? Please ping that person again whenever you are ready for another review.
@Jabb0


Please do not mark comments/conversations as resolved unless you are the assigned reviewer. This helps maintain clarity during the review process.

@liam-sbhoo liam-sbhoo requested review from Jabb0, LeoGrin and noahho April 12, 2025 13:53
Comment thread usage_analytics/analytics_definition.py Outdated
)


ANALYTICS_TO_TRACK = [
Contributor

Use another prefix here please. Something like PL- for prior labs.

From Gemini:

Deprecation (RFC 6648): In June 2012, RFC 6648 was published, which deprecated the use of the X- prefix for new, non-standard parameters (including HTTP headers).

Reasoning: The practice caused problems [1]. When headers starting with X- became widely adopted and effectively standard (like X-Forwarded-For or X-Frame-Options), removing the X- prefix later would cause compatibility issues. The prefix didn't reliably prevent name collisions and added confusion.

[1] Why we need to deprecate x prefix for HTTP headers? - Tony Xu Blog (tonyxu.io)

Recommendation: RFC 6648 advises against using the X- prefix for new non-standard or experimental headers. Instead, developers creating new headers should try to register them officially if appropriate, or choose names carefully, perhaps using vendor-specific identifiers or choosing names that are unlikely to clash with future standards, without relying on the X- prefix.

Comment thread usage_analytics/analytics_definition.py Outdated
)


ANALYTICS_TO_TRACK = [
Contributor

This is a list and thus globally mutable. Use a tuple please.
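
A minimal sketch of what the tuple buys, assuming entries are (header name, provider) pairs. The entries and the helper try_to_append are hypothetical, and the PL- prefix follows the suggestion made elsewhere in this review.

```python
import platform
from typing import Callable, Tuple

AnalyticsEntry = Tuple[str, Callable[[], str]]

# Hypothetical entries, defined as a tuple so the global cannot be mutated in place.
ANALYTICS_TO_TRACK: Tuple[AnalyticsEntry, ...] = (
    ("PL-Python-Version", platform.python_version),
    ("PL-Module-Name", lambda: __name__),
)


def try_to_append(entry: AnalyticsEntry) -> bool:
    """Return True if the global could be mutated in place, False otherwise."""
    try:
        ANALYTICS_TO_TRACK.append(entry)  # type: ignore[attr-defined]
    except AttributeError:
        # Tuples have no append(), so accidental mutation fails loudly.
        return False
    return True
```

Note that a tuple only prevents in-place mutation; the name can still be rebound, so this guards against accidents rather than deliberate changes.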

Comment thread usage_analytics/analytics_func.py Outdated
If no such frame is found, returns 'StandaloneFunction'.
"""

import inspect
Contributor

Importing here likely adds runtime overhead and error potential. If this method is used, add the inspect import at the top of the module.

Comment thread usage_analytics/analytics_func.py Outdated
if not recursive:
break

# If no class context was found, assume it's a standalone function
Contributor

Remove that comment. It is already apparent from the docstring and the default value of outmost_caller.

Comment thread usage_analytics/analytics_func.py Outdated
@@ -0,0 +1,72 @@
def get_calling_class(recursive=True):
Contributor

The return type annotation is missing.
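
For illustration, here is a sketch of how such a function could look with a module-level import and a return annotation. The body is an assumption, not the PR's code: it detects a class context by looking for a local named self, and it reads recursive=False as "stop at the first class found". Inner and Outer are hypothetical demo classes.

```python
import inspect


def get_calling_class(recursive: bool = True) -> str:
    """Return the name of the calling class, or 'StandaloneFunction' if none is found."""
    outmost_caller = "StandaloneFunction"
    frame = inspect.currentframe()
    try:
        caller = frame.f_back if frame is not None else None
        while caller is not None:
            # Assumption: a local named "self" marks a method call on an instance.
            instance = caller.f_locals.get("self")
            if instance is not None:
                outmost_caller = type(instance).__name__
                if not recursive:
                    break
            caller = caller.f_back
    finally:
        # Drop the frame reference to avoid reference cycles.
        del frame
    return outmost_caller


class Inner:
    def call(self, recursive: bool) -> str:
        return get_calling_class(recursive=recursive)


class Outer:
    def call(self, recursive: bool) -> str:
        return Inner().call(recursive)
```

With this sketch, Outer().call(recursive=False) returns "Inner", the nearest class on the stack. With recursive=True the result depends on everything further up the call stack, including any test runner or framework frames that hold a self.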

Comment thread tests/test_analytics_http_client.py Outdated

# Call request method
self.client.request(
"GET", "https://example.com", headers={"Existing": "Header"}
Contributor

You can use this enum to avoid typing out the request methods. This enables autocompletion and, although veeeery unlikely, ensures you get the correct verb if the global community ever decides to change GET to GET2.

https://docs.python.org/3/library/http.html#http.HTTPMethod

Collaborator Author

Unfortunately, this is only available from Python 3.11 onwards.
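
One possible way to reconcile both points is a small fallback, sketched below under the assumption that only a handful of verbs is needed. The fallback class and the helper method_name are hypothetical and cover just a subset of http.HTTPMethod.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    # Available in the standard library since Python 3.11.
    from http import HTTPMethod
else:

    class HTTPMethod(str, Enum):
        """Minimal stand-in for http.HTTPMethod on older Python versions."""

        GET = "GET"
        POST = "POST"
        PUT = "PUT"
        DELETE = "DELETE"


def method_name(method: HTTPMethod) -> str:
    """Return the plain verb string, which behaves the same on both code paths."""
    return method.value
```

Using .value instead of str() keeps the behavior identical across versions, since str() on a plain str-based Enum member does not return the bare verb before Python 3.11.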

Comment thread tests/test_analytics_http_client.py Outdated
self.assertEqual(headers.get("X-Module-Name"), self.module_name)
self.assertIn("X-Unique-Call-Id", headers)
self.assertIn("X-Python-Version", headers)
self.assertIn("X-Calling-Class", headers)
Contributor

This test does not check for all options in ANALYTICS_TO_TRACK.

The stream one does.

Ideally, you can share code between the two tests.


# Verify all analytics headers from ANALYTICS_TO_TRACK were added
for header_name, _ in ANALYTICS_TO_TRACK:
self.assertIn(header_name, headers)
Contributor

You do not test the functions that populate these headers, although they are defined here.

Add tests that verify they work, and check here that the values are correct.
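
A rough sketch of what such tests could look like, using a shared assertion mixin so the request and stream tests can reuse the same check. The provider functions get_python_version and get_unique_call_id, the mixin, and the two entries in ANALYTICS_TO_TRACK are hypothetical stand-ins for the real definitions.

```python
import platform
import unittest
import uuid


# Hypothetical stand-ins for the header-populating functions under test.
def get_python_version() -> str:
    return platform.python_version()


def get_unique_call_id() -> str:
    return str(uuid.uuid4())


ANALYTICS_TO_TRACK = (
    ("X-Python-Version", get_python_version),
    ("X-Unique-Call-Id", get_unique_call_id),
)


class AnalyticsHeaderAssertions:
    """Mixin holding the check shared by the request and stream tests."""

    def assert_analytics_headers(self, headers):
        for header_name, _ in ANALYTICS_TO_TRACK:
            with self.subTest(header=header_name):
                self.assertIn(header_name, headers)
                self.assertTrue(headers[header_name])


class TestAnalyticsFunctions(AnalyticsHeaderAssertions, unittest.TestCase):
    def test_python_version_value(self):
        self.assertEqual(get_python_version(), platform.python_version())

    def test_unique_call_id_is_uuid_and_changes(self):
        first, second = get_unique_call_id(), get_unique_call_id()
        # Round-tripping through uuid.UUID validates the format.
        self.assertEqual(str(uuid.UUID(first)), first)
        self.assertNotEqual(first, second)

    def test_all_tracked_headers_present(self):
        headers = {name: provider() for name, provider in ANALYTICS_TO_TRACK}
        self.assert_analytics_headers(headers)
```

The subTest context reports each missing header separately instead of stopping at the first failure.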

self.assertIsNotNone(headers.get("X-Calling-Class"))


if __name__ == "__main__":
Contributor

This works; however, I'd prefer running the tests from the CLI or using the IDE's built-in features for better debugging.

There should be no harm in it, so keeping it is fine too.

Comment thread requirements.txt Outdated
scikit-learn>=1.6.1
typing_extensions>=4.12.2 No newline at end of file
typing_extensions>=4.12.2
httpx>=0.28.1 No newline at end of file
Contributor

These requirements are too broad! You cannot guarantee that your code works with every upcoming version.

Use SemVer to make sure this does not break, please.

Example: 0.28.1 is compatible with 0.28.x but not with 0.29.0. However, the current specifier allows 0.29.0 to be installed.

httpx~=0.28.1

Gemini:

~= (Compatible Release):

Example: numpy~=1.21.0 means >=1.21.0, <1.22.0 (allows PATCH updates).
Example: numpy~=1.21 means >=1.21.0, <2.0.0 (allows MINOR and PATCH updates).
This is specifically designed with SemVer in mind. It allows updates that should be backward-compatible (PATCH or MINOR+PATCH fixes/features) but prevents updates that might break things (MAJOR). This is often a good balance for libraries.
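
The bounds quoted above can be reproduced with a small standard-library sketch. It is a simplification under assumptions: it only handles plain dotted numeric versions (no pre-releases, epochs, or zero padding), and the function names are hypothetical. A real project would normally rely on pip or the packaging library for this.

```python
from typing import Tuple

Release = Tuple[int, ...]


def parse_release(version: str) -> Release:
    """Parse a plain dotted version such as '0.28.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release_bounds(version: str) -> Tuple[Release, Release]:
    """Return (inclusive lower, exclusive upper) bounds for `~=version`."""
    release = parse_release(version)
    if len(release) < 2:
        raise ValueError("~= needs at least two release segments, e.g. 0.28")
    prefix = release[:-1]
    # Drop the last segment and bump the new last one: 0.28.1 -> 0.29, 1.21 -> 2.
    upper = prefix[:-1] + (prefix[-1] + 1,)
    return release, upper


def satisfies_compatible_release(candidate: str, spec_version: str) -> bool:
    """Check whether `candidate` falls inside the `~=spec_version` range."""
    lower, upper = compatible_release_bounds(spec_version)
    return lower <= parse_release(candidate) < upper
```

For httpx~=0.28.1 this accepts 0.28.5 but rejects 0.29.0, which is the behavior requested in this thread.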

@liam-sbhoo liam-sbhoo requested a review from Jabb0 April 17, 2025 13:20
)


class ANALYTICS_KEYS(str, Enum):
Contributor

nice!

@safaricd safaricd closed this Sep 18, 2025