Commit 91a2b0a

abedef and mashalifshin authored
Add context on MARS data collection to streamline data reviews (#14)
* Clarify README for stewards
* Add more background and links on mars data collection
* Add referencing the README to instructions for data review requests
* Clarify concept of users vs context ids in the Data Deletion section
* Fix spelling of aggregated
* Reword to avoid implying MARS is receiving PII from Firefox
* Reword for clarity
* Remove hypothetical future speculation about delete_user
* Final tweaks

---------

Co-authored-by: Masha Lifshin <mlifshin@mozilla.com>
1 parent 8caf63b commit 91a2b0a

1 file changed

Lines changed: 56 additions & 6 deletions

README.md

@@ -1,21 +1,71 @@
# mars-telemetry

(removed) Contains the metrics.yaml file documenting the metrics collected by the Unified API.

This repo contains the YAML files specifying the metrics collected by the Mozilla Ads Routing Service (MARS).

(removed) Used by https://github.com/mozilla-services/mars

[MARS repo](https://github.com/mozilla-services/mars)

(removed) ## Making Changes to Collected Data

[MARS Glean dictionary](https://dictionary.telemetry.mozilla.org/apps/ads_backend)

# Background

Commonly in the ads industry, client apps or websites request ads directly from ad servers. These direct requests allow ad partners to see a wealth of information, which can be used to identify the specific person using an app or site and to build profiles about that person across many different apps and sites. The ads returned and shown also commonly contain tracking code that detects that person's activities and adds to their profile.

MARS is a backend API service that prevents ad partners from gaining this kind of information. It functions as a privacy-preserving proxy between Firefox clients and third-party ad providers. MARS takes requests for ads from the Firefox browser, redacts or anonymizes any information that could be used to identify users or build profiles, forwards these anonymized requests to third-party ad providers, and returns privacy-respecting, tracker-free ads to Firefox.

Examples of these ads include the Sponsored Shortcuts and Sponsored Stories found on Firefox's Home and New Tab pages.

An example of how MARS preserves user privacy:

* Instead of passing the Firefox user's potentially fingerprintable User Agent string to ad partners, MARS sends only the user's OS and whether they are on Desktop, Tablet, or Phone. The ad partner gets enough information to return an ad appropriate for the device, but has no way to identify, fingerprint, or track any user.
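
The User Agent reduction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MARS's actual implementation: the `CoarseDevice` type, the `coarsen_user_agent` function, the substring heuristics, and the OS labels are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical coarse device info: in this sketch, the only fields forwarded
# to an ad partner in place of the full User Agent string.
@dataclass(frozen=True)
class CoarseDevice:
    os_family: str    # e.g. "Windows", "macOS", "Linux", "Android", "iOS", "Other"
    form_factor: str  # "Desktop", "Tablet", or "Phone"


def coarsen_user_agent(user_agent: str) -> CoarseDevice:
    """Reduce a full UA string to OS family and form factor (illustrative heuristics only)."""
    ua = user_agent.lower()

    # Order matters: Android UAs may contain "linux", and iOS UAs contain "mac os x".
    if "android" in ua:
        os_family = "Android"
    elif "iphone" in ua or "ipad" in ua:
        os_family = "iOS"
    elif "windows" in ua:
        os_family = "Windows"
    elif "mac os x" in ua or "macintosh" in ua:
        os_family = "macOS"
    elif "linux" in ua:
        os_family = "Linux"
    else:
        os_family = "Other"

    # Tablets first, since some tablet UAs would otherwise look like phones.
    if "ipad" in ua or "tablet" in ua or (os_family == "Android" and "mobile" not in ua):
        form_factor = "Tablet"
    elif "mobile" in ua or "iphone" in ua:
        form_factor = "Phone"
    else:
        form_factor = "Desktop"

    # Everything else in the original string (versions, build IDs, etc.) is discarded.
    return CoarseDevice(os_family, form_factor)
```

The point of the design is that the output can take only a handful of values, so it carries far less identifying information than the full User Agent string. Real User Agent parsing has many more edge cases than these substring checks handle.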

# Data Collection in MARS with Glean

MARS is a stateless service and doesn't store any data itself. The only data it persists is what it sends in Glean pings to Mozilla's data warehouse. In this way, MARS also acts as a privacy-preserving proxy between Firefox users and our own data warehouse.

## Necessary data

MARS needs to collect some anonymized, aggregated data about user interactions with ads in order to do business with our third-party ad partners and advertisers, and for our financial record keeping. This is because we are paid based on these interactions: for example, by how many times we show an ad, or by how many times an ad gets clicked. This is Category 2 "Interaction" data, captured via our [interaction ping](https://dictionary.telemetry.mozilla.org/apps/ads_backend/pings/interaction).
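
To make "aggregated" concrete, here is a hypothetical Python sketch that turns individual ad interactions into per-ad impression and click totals. The `InteractionEvent` shape, the `aggregate_interactions` function, and the kind labels are illustrative assumptions; they do not describe the actual schema of the interaction ping or where aggregation happens in the pipeline.

```python
from typing import Iterable, NamedTuple

KINDS = ("impression", "click")  # hypothetical interaction kinds


class InteractionEvent(NamedTuple):
    """Hypothetical event shape: which ad was involved and how. No user or client fields."""
    ad_id: str
    kind: str  # one of KINDS


def aggregate_interactions(events: Iterable[InteractionEvent]) -> dict[str, dict[str, int]]:
    """Tally impressions and clicks per ad: the kind of totals an invoice can be based on."""
    totals: dict[str, dict[str, int]] = {}
    for event in events:
        if event.kind not in KINDS:
            raise ValueError(f"unknown interaction kind: {event.kind!r}")
        # Start each ad at zero for every kind, then count this event.
        row = totals.setdefault(event.ad_id, dict.fromkeys(KINDS, 0))
        row[event.kind] += 1
    return totals
```

The resulting totals say how often each ad was shown or clicked, but nothing about who saw or clicked it.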

MARS also collects some anonymized, aggregated data to ensure our systems are functioning correctly and our third-party partners are meeting their contractual obligations. This is Category 1 "Technical" data, captured via our [request-stats](https://dictionary.telemetry.mozilla.org/apps/ads_backend/pings/request-stats) and [provider-request-stats](https://dictionary.telemetry.mozilla.org/apps/ads_backend/pings/provider-request-stats) pings.

## Server-side Glean

MARS's Glean integration is server-side, and all our pings are sent without any of the `client_info.*` fields that would be populated in a typical client-side Glean integration. This means no client info is sent by default. Instead, we pick a few coarse, non-identifying client fields needed for reporting, and place them under [the `ad_client` category](https://dictionary.telemetry.mozilla.org/apps/ads_backend?page=1&search=ad_client.) of the ping's metrics.
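
Picking "a few coarse, non-identifying client fields" amounts to an allowlist. A minimal Python sketch, assuming hypothetical field names (`os_family`, `form_factor`, `context_id`) rather than the real `ad_client` metric names listed in the Glean dictionary, and not using the Glean SDK itself:

```python
# Hypothetical allowlist of coarse client fields. The text above mentions the OS,
# form factor, and ContextId; the exact metric names here are made up.
AD_CLIENT_ALLOWLIST = frozenset({"os_family", "form_factor", "context_id"})


def build_ad_client_metrics(request_fields: dict[str, str]) -> dict[str, str]:
    """Copy only allowlisted fields into the ad_client.* namespace; drop everything else."""
    return {
        f"ad_client.{name}": value
        for name, value in request_fields.items()
        if name in AD_CLIENT_ALLOWLIST
    }
```

Using an allowlist rather than a denylist means any new field that appears on an incoming request is dropped unless someone deliberately adds it, which matches the goal that no client info is sent by default.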

## Opt-out

At Mozilla we always give users meaningful choice and control over data collection. To opt out of Mozilla Ads data collection, a user must opt out of ads entirely by going to Preferences > Home and either unchecking "Support Firefox" or unchecking both "Sponsored shortcuts" and "Sponsored stories".

This is because we invoice our third-party ad partners and advertisers based on this data. If we are showing ads, we must keep anonymized, aggregated data about user interactions with those ads, both for doing business and for our financial record keeping.
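
The opt-out rule above can be written as a single predicate. A minimal Python sketch; `HomePrefs` and `ads_data_collected` are hypothetical names that mirror the settings labels, not Firefox's actual preference names or code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomePrefs:
    """Hypothetical stand-in for the Home settings named above (not Firefox's real pref names)."""
    support_firefox: bool
    sponsored_shortcuts: bool
    sponsored_stories: bool


def ads_data_collected(prefs: HomePrefs) -> bool:
    """True when ads can be shown, and therefore ads interaction data is collected."""
    # Unchecking "Support Firefox" opts out of ads (and their data collection) entirely.
    if not prefs.support_firefox:
        return False
    # Otherwise, the user is opted out only if BOTH sponsored ad types are unchecked.
    return prefs.sponsored_shortcuts or prefs.sponsored_stories
```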

## Preventing Persistent Identifiers

Firefox clients that use MARS are required to send a `ContextId`, a UUID, with their ad requests. The `ContextId` enables features such as letting users block specific ads they don't want to see again. The `ContextId` is one of the metrics we send in Glean pings, stored under the `ad_client.*` category.

Clients of the MARS API are required to rotate their `ContextId` at least every 3 days, so that no single `ContextId` becomes a persistent identifier within our data warehouse.
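
A client-side rotation policy that satisfies the 3-day requirement could look like the following Python sketch. `ContextIdStore` and `MAX_CONTEXT_ID_AGE` are hypothetical, this is not Firefox's implementation, and the use of a random version 4 UUID is an assumption (the text above only says the `ContextId` is a UUID).

```python
import uuid
from datetime import datetime, timedelta

# Rotation limit stated above: clients must rotate at least every 3 days.
MAX_CONTEXT_ID_AGE = timedelta(days=3)


class ContextIdStore:
    """Hypothetical client-side holder for the ContextId sent with ad requests."""

    def __init__(self, now: datetime) -> None:
        self._rotate(now)

    def _rotate(self, now: datetime) -> None:
        # A fresh random (version 4) UUID has no link to the previous one.
        self.context_id = str(uuid.uuid4())
        self.created_at = now

    def current(self, now: datetime) -> str:
        """Return the ContextId to send, replacing it once it is 3 days old."""
        if now - self.created_at >= MAX_CONTEXT_ID_AGE:
            self._rotate(now)
        return self.context_id
```

The current time is passed in explicitly so the rotation logic is easy to test; a real client would also need to persist the ID and its creation time across restarts.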

## Data Deletion

MARS is currently a stateless service: we do not store any data on users' behalf, and we have no way to identify clients or users aside from the ephemeral `ContextId` described above. The only data we store is the Category 1 and Category 2 data that we send in our Glean pings. We need to retain this data for our business purposes, so we do not provide a way for users to delete it.

# Making Changes to Collected Data

> At Mozilla, like at many other organizations, we rely on data to make product decisions. But here, unlike many other organizations, we balance our goal of collecting useful, high-quality data with our goal to give users meaningful choice and control over their own data. The Mozilla data collection program was created to ensure we achieve both goals whenever we make a change to how we collect data in our products.

[*Data Collection at Mozilla*](https://wiki.mozilla.org/Data_Collection)

(removed) Making changes to our data collection practices requires additional review by a Data Steward.

Making changes to the metrics and pings in this repo requires review by a Data Steward, in addition to the usual Ads Engineering code review.

## Data Steward Review

1. Submit your PR to `mars-telemetry` (but do **not** merge it yet!)
2. Fill out a [data collection review form](https://github.com/mozilla/data-review/blob/main/request.md) (example requests: [bug 1900898](https://bugzilla.mozilla.org/show_bug.cgi?id=1900898), [bug 2006440](https://bugzilla.mozilla.org/show_bug.cgi?id=2006440)).
3. Submit the request to Bugzilla.
4. Add a comment to your PR with a link to your Bugzilla request and a copy of your proposed measurements table (from the data collection review form).
(removed) 5. Send a friendly message to [#data-stewardship-help](https://mozilla.enterprise.slack.com/archives/C07LMRQ5Q6B) to request a review. Make sure to give some brief context on the change, and include a link to the PR and Bugzilla request.
5. Send a friendly message to [#data-stewardship-help](https://mozilla.enterprise.slack.com/archives/C07LMRQ5Q6B) to request a review. Make sure to:
   * Give some brief context on the change.
   * Include a link to the PR and the Bugzilla request.
   * Include a link to this `README.md` for our reviewer to reference for background on MARS data collection.

## Sensitive Data Review

(removed) Please note that any data collection modifications involving category 3 or 4 data will also need to follow the [Sensitive Data Collection Review Process](https://wiki.mozilla.org/Data_Collection#Step_3:_Sensitive_Data_Collection_Review_Process) outlined in the Data Collection wiki page.

If the data collection changes involve Category 3 or Category 4 data, or if it is initially unclear which category the data falls under, the change will also need a [Sensitive Data Collection Review](https://wiki.mozilla.org/Data_Collection#Step_3:_Sensitive_Data_Collection_Review_Process), as outlined in the Data Collection wiki page.
