Merged
16 changes: 16 additions & 0 deletions .github/workflows/mine-cargo-packageurls.yml
@@ -0,0 +1,16 @@
on: [workflow_dispatch]

jobs:
  mine-pypi-purls:
    runs-on: ubuntu-24.04
    name: Mine cargo PackageURLs
    steps:
      - uses: aboutcode-org/scancode-action@beta
        with:
          scancodeio-repo-branch: "collect-purl-metadata#egg=scancodeio[mining]"
          pipelines: "mine_cargo"
        env:
          FEDERATEDCODE_GIT_ACCOUNT_URL: https://github.com/aboutcode-data/minecode-data-cargo-test
          FEDERATEDCODE_GIT_SERVICE_TOKEN: ${{ secrets.MINING_GITHUB_TOKEN }}
          FEDERATEDCODE_GIT_SERVICE_NAME: "AboutCode Automation"
          FEDERATEDCODE_GIT_SERVICE_EMAIL: "automation@aboutcode.org"
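Note: the workflow passes four FEDERATEDCODE_GIT_* settings to the step through its environment. How the pipeline consumes them is not shown in this diff. As a minimal sketch only, code on the receiving side could collect and validate them as below; `GitServiceConfig` and `load_git_service_config` are hypothetical names introduced for illustration and are not taken from scancode.io or this repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GitServiceConfig:
    # Hypothetical container for the four settings; not part of this PR.
    account_url: str
    token: str
    name: str
    email: str


def load_git_service_config(environ=os.environ):
    """Collect the FEDERATEDCODE_GIT_* settings, failing early if any is unset."""
    # Variable names are the ones set in the workflow's ``env`` block.
    keys = {
        "account_url": "FEDERATEDCODE_GIT_ACCOUNT_URL",
        "token": "FEDERATEDCODE_GIT_SERVICE_TOKEN",
        "name": "FEDERATEDCODE_GIT_SERVICE_NAME",
        "email": "FEDERATEDCODE_GIT_SERVICE_EMAIL",
    }
    missing = [var for var in keys.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return GitServiceConfig(**{field: environ[var] for field, var in keys.items()})
```

Failing on a missing variable before any mining starts would surface a misconfigured secret (for example an unset `MINING_GITHUB_TOKEN`) at the start of the run rather than at the first push.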
2 changes: 1 addition & 1 deletion .github/workflows/mine-pypi-packageurls.yml
@@ -12,5 +12,5 @@ jobs:
         env:
           FEDERATEDCODE_GIT_ACCOUNT_URL: https://github.com/aboutcode-data/minecode-data-pypi-test
           FEDERATEDCODE_GIT_SERVICE_TOKEN: ${{ secrets.MINING_GITHUB_TOKEN }}
-          FEDERATEDCODE_GIT_SERVICE_NAME: "the AboutCode bot"
+          FEDERATEDCODE_GIT_SERVICE_NAME: "AboutCode Automation"
           FEDERATEDCODE_GIT_SERVICE_EMAIL: "automation@aboutcode.org"
22 changes: 14 additions & 8 deletions README.rst
@@ -21,22 +21,28 @@ Configuration format

* last serial number processed (used in indexes at pypi, npm etc)
* last processed commit (where the data is stored in git repos)
-* directory to store last fetched index data (like the JSON fetched from pypi simple with package names and last updated info)
+* directory to store last fetched index data
+(like the JSON fetched from pypi simple with package names and last updated info)
* state information in ``state``:

* ``null``: mining has not started.
-* ``initital-sync`` : at the start of mining we need to mine a huge amount of packages for packageURL to catch up.
-This is typically very large and could take several hours to several days depending on the ecosystem size.
-We fetch and save an index state and mine all packageURLs till there. Once we reach a state where remaining
-new packageURLs can be mined in a couple hours, we can move on to the next state where we mine new packageURLs
-added in a periodic manner.
-* ``periodic-sync`` : This is a periodic update of new packageURLs added in the index in a period, and typically this
+* ``initital-sync`` : at the start of mining we need to mine a huge
+amount of packages for packageURL to catch up.
+This is typically very large and could take several hours to several days
+depending on the ecosystem size.
+We fetch and save an index state and mine all packageURLs till there.
+Once we reach a state where remaining
+new packageURLs can be mined in a couple hours, we can move on to
+the next state where we mine new packageURLs
+added in a periodic manner.
+* ``periodic-sync`` : This is a periodic update of new packageURLs
+added in the index in a period, and typically this
should not take more than a couple hours.

* optional elements to improve readability/debugging:

* ``last_updated``: date and time of last checkpoint update

* ``packages_checkpoints.json``: stores checkpoint related to:

* ``packages_mined``: which packages have been mined in the ``initital-sync`` state.
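
Note: the ``state`` values described in this README hunk form a small state machine (``null``, then ``initital-sync``, then ``periodic-sync``). The sketch below illustrates one way those transitions could be recorded in a checkpoint dictionary. It is not the project's implementation: `next_state`, `update_checkpoint`, and the `periodic_threshold` cut-off are hypothetical stand-ins, and the threshold merely approximates "remaining packageURLs can be mined in a couple hours".

```python
import json
from datetime import datetime, timezone

# State names copied verbatim from the README, including its spelling.
INITIAL_SYNC = "initital-sync"
PERIODIC_SYNC = "periodic-sync"


def next_state(state, remaining_packages, periodic_threshold):
    """Return the state to record after one mining run.

    ``periodic_threshold`` is a hypothetical cut-off, not a project setting.
    """
    if state is None:
        # JSON ``null``: mining has not started, so begin catching up.
        return INITIAL_SYNC
    if state == INITIAL_SYNC and remaining_packages <= periodic_threshold:
        # The backlog is small enough to switch to periodic updates.
        return PERIODIC_SYNC
    return state


def update_checkpoint(checkpoint, remaining_packages, periodic_threshold=10_000):
    """Return a copy of ``checkpoint`` with ``state`` and ``last_updated`` refreshed."""
    updated = dict(checkpoint)
    updated["state"] = next_state(
        checkpoint.get("state"), remaining_packages, periodic_threshold
    )
    updated["last_updated"] = datetime.now(timezone.utc).isoformat()
    return updated


checkpoint = json.loads("{}")  # an empty checkpoint, i.e. mining not started
checkpoint = update_checkpoint(checkpoint, remaining_packages=500_000)
print(checkpoint["state"])
```

Returning a copy instead of mutating the input keeps the previous checkpoint available if the write to the git-backed data repository fails.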
1 change: 1 addition & 0 deletions cargo/checkpoints.json
@@ -0,0 +1 @@
{}
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -1,5 +1,5 @@
Welcome to minecode-pipelines documentation!
-=========================================
+=============================================

This is released at pypi: https://pypi.org/project/minecode-pipelines/

Empty file added etc/.gitkeep