# package_unified_features.yaml
contract:
  name: package_unified_features
  catalog: ddc_databricks
  schema: gold
  table: package_unified_features
  description: >
    Gold-layer wide feature table joining all three silver feature tables
    (GitHub, PyPI, Stack Overflow) plus overall_sentiment for each tracked
    package. This is the primary input to the health scoring step in
    05_gold_health_score.ipynb.
  owner: openlens-team
  update_frequency: on-demand (manual pipeline run)
  freshness_sla: 7 days
  columns:
    # Identity
    - name: package_name
      type: string
      nullable: false
      description: Canonical PyPI package name.
    # GitHub features
    - name: stars
      type: long
      nullable: true
      description: GitHub star count at time of ingestion.
    - name: forks_count
      type: long
      nullable: true
      description: GitHub fork count.
    - name: open_issues_count
      type: long
      nullable: true
      description: Open GitHub issues count.
    - name: contributors_total
      type: long
      nullable: true
      description: Total unique contributors with at least 1 commit.
    - name: contributions_sum
      type: long
      nullable: true
      description: Total commits across all contributors.
    - name: core_contributors_5plus
      type: long
      nullable: true
      description: Contributors with 5 or more commits.
    - name: top_contributor_share
      type: double
      nullable: true
      description: Fraction of total commits made by the top contributor [0–1].
    - name: days_since_last_push
      type: double
      nullable: true
      description: Days elapsed since the most recent push to the repository.
    - name: readme_char_count
      type: long
      nullable: true
      description: Length of the cleaned README in characters.
    - name: readme_heading_count
      type: long
      nullable: true
      description: Number of Markdown headings in the README.
    - name: readme_code_block_count
      type: long
      nullable: true
      description: Number of fenced code blocks in the README.
    - name: repo_age_days
      type: double
      nullable: true
      description: Age of the GitHub repository in days since creation.
    - name: is_archived
      type: boolean
      nullable: true
      description: Whether the GitHub repository is marked as archived.
    - name: has_wiki
      type: boolean
      nullable: true
      description: Whether the GitHub repository has the wiki feature enabled.
    # PyPI features
    - name: downloads_last_month
      type: long
      nullable: true
      description: >
        PyPI downloads in the last 30-day rolling window
        (pypistats.org recent endpoint).
    - name: downloads_30d
      type: double
      nullable: true
      description: >
        Sum of PyPI downloads in the most recent 30 days
        (overall time-series endpoint, without mirrors).
    - name: downloads_90d
      type: double
      nullable: true
      description: Sum of PyPI downloads in the most recent 90 days.
    - name: downloads_momentum_30d_vs_90d
      type: double
      nullable: true
      description: >
        Ratio of the 30-day download rate to the 90-day download rate.
        Values > 1 indicate accelerating adoption.
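      # Worked example, assuming "rate" means average downloads per day
      # (an assumption; this contract does not define it):
      #   downloads_30d = 3000, downloads_90d = 6000
      #   momentum = (3000 / 30) / (6000 / 90) = 1.5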
    - name: release_count
      type: long
      nullable: true
      description: Total number of releases published on PyPI.
    - name: download_velocity_month_over_week
      type: double
      nullable: true
      description: >
        Ratio of monthly downloads to 4× weekly downloads.
        Values > 1 mean monthly downloads exceed four times the weekly
        figure, so the weekly pace is below the monthly average.
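      # Worked example, assuming "monthly" is downloads_last_month and
      # "weekly" is the pypistats last-week count (neither mapping is
      # stated in this contract):
      #   monthly = 4000, weekly = 800
      #   velocity = 4000 / (4 × 800) = 1.25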
    # Stack Overflow features
    - name: questions_total
      type: long
      nullable: true
      description: Total question count on Stack Overflow for this package tag.
    - name: question_answered_rate
      type: double
      nullable: true
      description: Fraction of sampled questions with an answer [0–1].
    - name: avg_question_score
      type: double
      nullable: true
      description: Average upvote score of sampled questions.
    - name: unique_question_askers
      type: long
      nullable: true
      description: Unique user IDs who asked sampled questions.
    - name: answers_total
      type: long
      nullable: true
      description: Total answer count across sampled questions.
    - name: accepted_answer_rate
      type: double
      nullable: true
      description: Fraction of sampled questions with an accepted answer [0–1].
    - name: avg_answer_score
      type: double
      nullable: true
      description: Average upvote score of sampled answers.
    - name: unique_answerers
      type: long
      nullable: true
      description: Unique user IDs who answered sampled questions.
    - name: qa_ratio
      type: double
      nullable: true
      description: Ratio of total answers to total questions in the sample.
    - name: community_activity_30d
      type: long
      nullable: true
      description: Number of questions asked in the last 30 days.
    # Sentiment
    - name: overall_sentiment
      type: double
      nullable: true
      description: >
        Weighted aggregate sentiment score [-1, +1] from gold.package_sentiment.
        Passed through directly for use in the composite health scoring step.
  upstream:
    - ddc_databricks.silver.github_package_features
    - ddc_databricks.silver.pypi_package_features
    - ddc_databricks.silver.so_package_features
    - ddc_databricks.gold.package_sentiment
  served_by: []