Enrich cli_command_push telemetry with actorId and wasCreated flag #1151

@patrikbraborec

Background

The apify push command currently fires a cli_command_push telemetry event (via the generic cli_command Segment event in src/lib/hooks/telemetry/trackEvent.ts), but the payload does not include:

  • the actorId of the Actor being pushed
  • whether the push created a new Actor or updated an existing one

This makes it impossible to use Mixpanel/Snowflake to answer basic product questions like:

  • How many new Actors were created via CLI in the last N days?
  • What share of platform-wide Actor creation goes through the CLI vs the Console UI (Quick Start, templates, Git import)?
  • For an A/B test on the Console "new Actor" flow, how do we attribute the CLI-created baseline?

Today we can count cli_command_push events, but every push (including code updates to existing Actors) is indistinguishable from a brand-new Actor creation, and we can't dedupe by actorId.

Proposal

Surface two values that the push command already has in scope.

In src/commands/actors/push.ts (around lines 178 / 237), both pieces of state already exist:

let actorId: string | undefined;     // line 176
let isActorCreatedNow = false;       // line 178
// ...
const actor = await apifyClient.actors().create(newActor); // line 235
actorId = actor.id;
isActorCreatedNow = true;            // line 237

We just need to attach them to telemetry before the command exits.

Change 1 — extend the event type

src/lib/hooks/telemetry/trackEvent.ts — add to the CliCommandEvent interface (~line 50):

push?: {
    actorId?: string;
    wasCreated?: boolean;
};

Change 2 — populate from the push command

src/commands/actors/push.ts — after actorId and isActorCreatedNow are settled (~line 240):

this.telemetryData.push = {
    actorId,
    wasCreated: isActorCreatedNow,
};

Change 3 — tests

Update test/api/commands/push.test.ts if it asserts on the telemetry payload shape.
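
If the existing test does not yet assert on the payload, a check along these lines might be enough. This is a minimal sketch using plain assertions rather than the repo's test runner; `PushTelemetry`, `buildPushTelemetry`, `expectEqual`, and the sample ID are hypothetical names for illustration, not code from the repository.

```typescript
// Shape proposed in Change 1 (hypothetical standalone copy for this sketch).
type PushTelemetry = {
    actorId?: string;
    wasCreated?: boolean;
};

// Hypothetical helper mirroring the assignment proposed in Change 2.
function buildPushTelemetry(actorId: string | undefined, isActorCreatedNow: boolean): PushTelemetry {
    return { actorId, wasCreated: isActorCreatedNow };
}

// Minimal assertion helper so the sketch does not depend on a test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
    if (actual !== expected) {
        throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
    }
}

// Case 1: the push created a brand-new Actor.
const created = buildPushTelemetry('hypothetical-actor-id', true);
expectEqual(created.actorId, 'hypothetical-actor-id', 'actorId on create');
expectEqual(created.wasCreated, true, 'wasCreated on create');

// Case 2: the push updated an existing Actor.
const updated = buildPushTelemetry('hypothetical-actor-id', false);
expectEqual(updated.wasCreated, false, 'wasCreated on update');
```

Covering both the create and update paths matters here, since the whole point of the flag is to tell them apart.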

What this unlocks

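After the Keboola ETL refresh, cli_command_push rows in Snowflake should gain two new flattened columns (push_actor_id, push_was_created), assuming the nested push object flattens the same way the existing create object does. Queries like this then become trivial: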

-- New actors created via CLI in the last 30 days
SELECT COUNT(DISTINCT "push_actor_id") AS cli_created_actors
FROM mixpanel_events
WHERE "event" = 'cli_command_push'
  AND "push_was_created" = TRUE
  AND "event_timestamp_utc" >= DATEADD('day', -30, CURRENT_TIMESTAMP());
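
To sanity-check the query above against a raw event export, the same logic can be sketched in TypeScript. The `PushEventRow` shape and the `countCliCreatedActors` function are hypothetical; they only mirror the column names used in the SQL and are not part of the CLI or the warehouse tooling.

```typescript
// Hypothetical row shape, mirroring the columns referenced in the SQL query.
interface PushEventRow {
    event: string;
    push_actor_id: string | null;
    push_was_created: boolean | null;
    event_timestamp_utc: Date;
}

// Counts distinct Actors created via `apify push` within the last `days` days.
function countCliCreatedActors(rows: PushEventRow[], now: Date, days = 30): number {
    const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
    const actorIds = new Set<string>();
    for (const row of rows) {
        if (row.event !== 'cli_command_push') continue;
        if (row.push_was_created !== true) continue; // skip updates to existing Actors
        if (row.event_timestamp_utc.getTime() < cutoff) continue;
        // Like COUNT(DISTINCT ...), ignore rows with no actor ID.
        if (row.push_actor_id !== null) actorIds.add(row.push_actor_id);
    }
    return actorIds.size;
}
```

Using a `Set` gives the dedupe by actorId that the current payload cannot support.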

And platform-wide attribution becomes exact:

total_new_actors = (template clicks) + (Git imports) + (CLI where wasCreated) + (other)

Notes

  • Confirm with the data team that the nested push object flattens to push_actor_id / push_was_created in Snowflake. The existing create nested object on cli_command (used by apify create) is a good reference for the expected ETL behavior.
  • New columns typically take ~24h to appear in the warehouse after the first event lands.
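
For discussion with the data team, the flattening this issue expects can be sketched as follows. This is an assumption about the naming convention only (nested keys joined with underscores, camelCase converted to snake_case); the actual Keboola ETL may behave differently, and `toSnakeCase` and `flattenEvent` are hypothetical helpers.

```typescript
// Converts camelCase to snake_case, e.g. "wasCreated" -> "was_created".
function toSnakeCase(key: string): string {
    return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

type Scalar = string | number | boolean | null;
type Nested = { [key: string]: Scalar | Nested | undefined };

// Recursively flattens nested objects into prefix_key columns.
// ASSUMPTION: this mimics the column naming the issue expects, not the real ETL.
function flattenEvent(obj: Nested, prefix = ''): Record<string, Scalar> {
    const out: Record<string, Scalar> = {};
    for (const [key, value] of Object.entries(obj)) {
        if (value === undefined) continue; // dropped, as JSON serialization would do
        const column = prefix ? `${prefix}_${toSnakeCase(key)}` : toSnakeCase(key);
        if (value !== null && typeof value === 'object') {
            Object.assign(out, flattenEvent(value, column));
        } else {
            out[column] = value;
        }
    }
    return out;
}

const columns = flattenEvent({ push: { actorId: 'hypothetical-actor-id', wasCreated: true } });
// columns -> { push_actor_id: 'hypothetical-actor-id', push_was_created: true }
```

If the real pipeline produces different column names, the SQL in this issue would need to be adjusted to match.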

Related context

Surfaced while building a baseline funnel for the Console /actors/new page in preparation for the upcoming actor-new-wizard A/B test — we wanted to validate per-surface creation counts against a platform-wide total and found the CLI path was the only one that couldn't be counted precisely.

Metadata

Labels

enhancement (New feature or request), t-dx (Issues owned by the DX team)
