# package_sentiment.yaml
contract:
  name: package_sentiment
  catalog: ddc_databricks
  schema: gold
  table: package_sentiment
  description: >
    Gold-layer table of per-package sentiment scores aggregated from four
    text sources: Stack Overflow questions, Stack Overflow answers, GitHub
    READMEs, and PyPI package descriptions. Sentiment is scored with
    DistilBERT (distilbert-base-uncased-finetuned-sst-2-english), which runs
    as a Spark UDF on Databricks cluster workers, with a keyword-heuristic
    fallback when the model is unavailable.
  owner: openlens-team
  update_frequency: on-demand (manual pipeline run)
  freshness_sla: 7 days
  columns:
    - name: package_name
      type: string
      nullable: false
      description: Canonical PyPI package name.
    - name: so_question_sentiment_avg
      type: double
      nullable: true
      description: >
        Average compound sentiment score in [-1, +1] across sampled
        Stack Overflow questions about this package. Null if no questions
        were available.
    - name: so_answer_sentiment_avg
      type: double
      nullable: true
      description: >
        Average compound sentiment score in [-1, +1] across sampled
        Stack Overflow answers about this package. Null if no answers
        were available.
    - name: readme_sentiment_compound
      type: double
      nullable: true
      description: >
        Compound sentiment score in [-1, +1] of the GitHub repository README,
        computed on the cleaned text after HTML/Markdown stripping. Null if
        the README was not ingested.
    - name: pypi_desc_sentiment_compound
      type: double
      nullable: true
      description: >
        Compound sentiment score in [-1, +1] of the PyPI long description.
        Null if the description was empty or unavailable.
    - name: overall_sentiment
      type: double
      nullable: true
      description: >
        Weighted aggregate sentiment score in [-1, +1].
        Weights: SO questions 30%, SO answers 30%, README 25%, PyPI description 15%.
        Null if all four source fields are null.
  upstream:
    - ddc_databricks.silver.so_package_features
    - ddc_databricks.silver.github_readmes
    - ddc_databricks.silver.pypi_metadata
  served_by:
    - endpoint: "GET /packages/{name}/sentiment"
    - endpoint: "GET /packages/{name}"
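The `overall_sentiment` weighting rule can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function name `overall_sentiment` and the dict-of-columns input are hypothetical. The contract only says the result is null when all four sources are null, so the sketch assumes that when some sources are missing, the weights of the non-null sources are renormalized to sum to 1 (treating missing sources as 0 would be the other common choice).

```python
from typing import Dict, Optional

# Source weights as stated in the contract for overall_sentiment (sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "so_question_sentiment_avg": 0.30,
    "so_answer_sentiment_avg": 0.30,
    "readme_sentiment_compound": 0.25,
    "pypi_desc_sentiment_compound": 0.15,
}


def overall_sentiment(row: Dict[str, Optional[float]]) -> Optional[float]:
    """Hypothetical sketch: weighted aggregate of per-source scores in [-1, +1].

    Returns None when every source score is null (None or absent).
    ASSUMPTION: with partial nulls, the weights of the non-null sources are
    rescaled to sum to 1; the contract does not specify this case.
    """
    # Keep only sources that actually have a score; a missing key counts as null.
    present = {col: w for col, w in WEIGHTS.items() if row.get(col) is not None}
    if not present:
        return None
    total_weight = sum(present.values())
    weighted_sum = sum(row[col] * w for col, w in present.items())
    # Dividing by the retained weight makes this a convex combination,
    # so the result stays within [-1, +1] whenever the inputs do.
    return weighted_sum / total_weight
```

For a package with only Stack Overflow scores of 0.5 each, the sketch returns 0.5, because the two 30% weights are rescaled to 50% each. In the Databricks pipeline the same rule would more likely be written as a Spark column expression than as a Python function; the logic is the same.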