Add WikiCommons Data Source #180
Labels
help wanted (open to participation from the community); ✨ goal: improvement (improvement to an existing feature); 🏁 status: ready for work; 💻 aspect: code (concerns the software code in the repository); 🟩 priority: low (low priority and doesn't need to be rushed)
Status
Backlog
Problem
Hello, right now the project collects data from Google Custom Search and GitHub, and work on adding Wikipedia is already in progress via PRs #176 and #167. Also, @TimidRobot commented about “more meaningful data” (for Wikipedia), suggesting they expect more than just basic counts, but WikiCommons hasn’t been addressed yet.
However, WikiCommons is also an important source for Creative Commons–licensed media, and it’s not yet part of the automated system.
There’s an older version of it under pre-automation/wikicommons/, but it hasn’t been updated to the new structure.
Description
WikiCommons could be added as a new data source using the MediaWiki API.
This would collect counts of CC-licensed media files (like images, videos, and audio) by license type.
The plan is to:
pre-automation/wikicommons_scratcher.py script.
This will help the project measure CC-licensed media content more accurately.
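As a rough illustration of the counting idea described above, here is a minimal sketch that asks the MediaWiki Action API on Wikimedia Commons for `categoryinfo` on a few license categories and reads the `files` count for each. The function names, the `LICENSE_CATEGORIES` mapping, the specific category titles, and the User-Agent string are all assumptions made for this example, not part of the existing project; the real fetch script may be structured quite differently.

```python
import json
import urllib.parse
import urllib.request

# Public Action API endpoint for Wikimedia Commons.
API_URL = "https://commons.wikimedia.org/w/api.php"

# Hypothetical mapping of license labels to Commons category titles.
# The category titles are assumptions and should be verified on Commons.
LICENSE_CATEGORIES = {
    "CC BY 4.0": "Category:CC-BY-4.0",
    "CC BY-SA 4.0": "Category:CC-BY-SA-4.0",
    "CC0 1.0": "Category:CC-Zero",
}


def build_params(category_titles):
    """Build query parameters requesting categoryinfo for each title."""
    return {
        "action": "query",
        "prop": "categoryinfo",
        "titles": "|".join(category_titles),
        "format": "json",
        "formatversion": "2",  # "pages" is returned as a list
    }


def parse_file_counts(response):
    """Map each category title to its 'files' count.

    Categories that are missing (no 'categoryinfo' key) are skipped.
    """
    counts = {}
    for page in response.get("query", {}).get("pages", []):
        info = page.get("categoryinfo")
        if info is None:
            continue
        counts[page["title"]] = info.get("files", 0)
    return counts


def fetch_file_counts(category_titles):
    """Query the API once and return {category title: file count}."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(category_titles))
    # Placeholder User-Agent; replace with real contact details before use.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikicommons-count-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_file_counts(json.load(resp))


if __name__ == "__main__":
    for title, files in fetch_file_counts(LICENSE_CATEGORIES.values()).items():
        print(f"{title}: {files}")
```

Note that `categoryinfo` only counts files placed directly in a category, not those in its subcategories, and it does not distinguish images, video, and audio. A breakdown by media type would need additional queries.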
Alternatives
It could be combined with the Wikipedia data, but keeping it separate makes it easier to track media content specifically.
Additional context
pre-automation/wikicommons/
Implementation