|
1 | | -################################################### |
2 | | - Adding ability to store/query downloaded packages |
3 | | -################################################### |
4 | 1 |
|
5 | | -**Organization:** `AboutCode <https://aboutcode.org>`_ |
| 2 | +===================================================== |
| 3 | +Adding Ability to Store and Query Downloaded Packages |
| 4 | +===================================================== |
6 | 5 |
|
7 | | -**Project:** `ScanCode.io |
8 | | -<https://github.com/aboutcode-org/scancode.io>`_ |
| 6 | +**Organization:** `AboutCode <https://aboutcode.org>`__ |
9 | 7 |
|
10 | | -| **Varsha U N** |
11 | | -| GitHub: `VarshaUN <https://github.com/VarshaUN>`_ |
12 | | -| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_ |
| 8 | +**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`__ |
13 | 9 |
|
14 | | -**Mentors:** |
15 | | - |
16 | | -- `Philippe Ombredanne <https://github.com/pombredanne>`_ |
17 | | -- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_ |
18 | | - |
19 | | -********** |
20 | | - Overview |
21 | | -********** |
| 10 | +| **Contributor:** Varsha U N |
| 11 | +| **GitHub:** `VarshaUN <https://github.com/VarshaUN>`__ |
| 12 | +| **LinkedIn:** `Varsha U N <https://www.linkedin.com/in/varsha-un/>`__ |
22 | 13 |
|
23 | | -Currently ScanCode.io scans the packages but doesn’t store it. This |
24 | | -makes it difficult for users to maintain a reference of packages used in |
25 | | -their projects, meet source redistribution obligations, or revisit |
26 | | -scanned packages for future. |
| 14 | +**Mentors:** |
| 15 | +- `Philippe Ombredanne <https://github.com/pombredanne>`__ |
| 16 | +- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`__ |
27 | 17 |
|
28 | | -This project enhanced ScanCode.io by adding the ability to store and |
29 | | -query downloaded packages locally and re-use packages that were already |
30 | | -scanned. |
| 18 | +Overview |
| 19 | +-------- |
31 | 20 |
|
32 | | ----- |
| 21 | +ScanCode.io currently stores scanned packages on disk without a centralized index.
| 22 | +As a result, packages are duplicated, stored data stays tied to a single project,
| 23 | +and data can be lost when inputs are deleted. This project enhances ScanCode.io
| 24 | +with structured package storage and querying, which enables indexing, reuse
| 25 | +across projects, and reliable preservation.
33 | 26 |
|
34 | | -**************** |
35 | | - Implementation |
36 | | -**************** |
| 27 | +Implementation |
| 28 | +-------------- |
37 | 29 |
|
38 | 30 | The project involved the following key components and steps: |
39 | 31 |
|
| 32 | + |
40 | 33 | .. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png |
41 | 34 | :alt: Project Flow Diagram |
42 | 35 | :align: center |
43 | 36 | :width: 70% |
44 | 37 |
|
45 | | - Currently ScanCode.io downloads packages but does not store them. The new archiving system stores downloaded packages on the local filesystem and allows querying them. |
| 38 | +This project addresses the limitations of ScanCode.io's unstructured package |
| 39 | +storage by adding a system to index, reuse, and preserve packages reliably. |
| 40 | + |
46 | 41 |
|
47 | 42 | Storage System Development: |
48 | 43 |
|
@@ -79,37 +74,76 @@ Validation and Testing: |
79 | 74 | ``find``), testing normal cases, edge cases (e.g., empty files), and
80 | 75 | errors (e.g., duplicate origins). |
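
The three kinds of cases named above (normal, edge, error) can be illustrated with a small sketch. It is not the project's test suite: ``InMemoryStore``, ``DuplicateOriginError``, ``put`` and the ``example.com`` URLs are hypothetical stand-ins, and treating a repeated origin URL as an error is an assumption about what "duplicate origins" means here.

```python
import hashlib


class DuplicateOriginError(Exception):
    """Hypothetical error raised when an origin URL is registered twice."""


class InMemoryStore:
    """Toy content-addressed store; a stand-in, not a ScanCode.io class."""

    def __init__(self):
        self._blobs = {}    # sha256 hex digest -> content bytes
        self._origins = {}  # origin URL -> sha256 hex digest

    def put(self, content: bytes, origin: str) -> str:
        # Assumption: registering the same origin twice is an error.
        if origin in self._origins:
            raise DuplicateOriginError(origin)
        digest = hashlib.sha256(content).hexdigest()
        # Identical content is kept only once, whatever its origin.
        self._blobs.setdefault(digest, content)
        self._origins[origin] = digest
        return digest

    def find(self, origin: str):
        # Return the stored bytes for an origin, or None if unknown.
        digest = self._origins.get(origin)
        return None if digest is None else self._blobs[digest]


def run_checks():
    store = InMemoryStore()

    # Normal case: stored content can be found again by origin.
    url = "https://example.com/pkg-1.0.tar.gz"
    digest = store.put(b"package bytes", url)
    assert store.find(url) == b"package bytes"

    # Edge case: an empty file is still stored and retrievable.
    store.put(b"", "https://example.com/empty.bin")
    assert store.find("https://example.com/empty.bin") == b""

    # Error case: the same origin cannot be registered a second time.
    try:
        store.put(b"other bytes", url)
    except DuplicateOriginError:
        duplicate_rejected = True
    else:
        duplicate_rejected = False

    return digest, duplicate_rejected
```

Calling ``run_checks()`` exercises all three cases in one pass; in a real test suite each case would normally be its own test function.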
81 | 76 |
|
82 | | -********************** |
83 | | - Linked Pull Request: |
84 | | -********************** |
85 | 77 |
|
86 | | -Add download archiving system with local filesystem provider - |
87 | | -(https://github.com/aboutcode-org/scancode.io/pull/1815) |
| 78 | +Linked Pull Requests |
| 79 | +-------------------- |
| 80 | + |
| 81 | +.. list-table:: |
| 82 | + :widths: 10 40 20 |
| 83 | + :header-rows: 1 |
| 84 | + |
| 85 | + * - Sr. No |
| 86 | + - Name |
| 87 | + - Link |
| 88 | + * - 1 |
| 89 | + - Add download archiving system |
| 90 | + - `scancode.io#1815 <https://github.com/aboutcode-org/scancode.io/pull/1815>`__ |
| 91 | + * - 2 |
| 92 | + - Support local package storage |
| 93 | + - `scancode.io#1685 <https://github.com/aboutcode-org/scancode.io/pull/1685>`__ |
| 94 | + |
| 95 | +Related Issues |
| 96 | +-------------- |
| 97 | + |
| 98 | +.. list-table:: |
| 99 | + :widths: 10 40 20 |
| 100 | + :header-rows: 1 |
| 101 | + |
| 102 | + * - Sr. No |
| 103 | + - Name |
| 104 | + - Link |
| 105 | + * - 1 |
| 106 | + - Store and retrieve scanned packages |
| 107 | + - `#1063 <https://github.com/aboutcode-org/scancode.io/issues/1063>`__ |
| 108 | + * - 2 |
| 109 | + - Support local package storage |
| 110 | + - `#1683 <https://github.com/aboutcode-org/scancode.io/issues/1683>`__ |
| 111 | + |
| 112 | +Pre-GSoC Work |
| 113 | +------------- |
| 114 | + |
| 115 | +Here are some PRs submitted before GSoC: |
| 116 | + |
| 117 | +- `Add bluefin-container image support <https://github.com/aboutcode-org/scancode.io/pull/1620>`__ |
| 118 | +- `Tag whitedout files <https://github.com/aboutcode-org/scancode.io/pull/1529>`__ |
| 119 | +- `Support python-private-classifier <https://github.com/aboutcode-org/scancode-toolkit/pull/4075>`__ |
| 120 | +- `Parse labels in Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__ |
| 121 | +- `Add OCI labels to Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__ |
| 122 | +- `Extract LibreOffice documents <https://github.com/aboutcode-org/extractcode/pull/67>`__ |
| 123 | + |
| 124 | +Links |
| 125 | +----- |
| 126 | + |
| 127 | +- **Project Idea:** `GSoC 2025 Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`__ |
| 128 | +- **GSoC Project Page:** `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`__ |
| 129 | +- **Proposal:** `Project Proposal <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`__ |
| 130 | + |
| 131 | +Future Work |
| 132 | +----------- |
88 | 133 |
|
89 | | -**************** |
90 | | - Related Issue: |
91 | | -**************** |
| 134 | +Future enhancements include a web UI for the ``LocalFilesystemProvider`` that
| 135 | +supports package uploads, searches, listings, and retrievals in ScanCode.io,
| 136 | +built with Django views, templates, and URL routes and covered by tests.
| 137 | +A further step is an external cloud storage option (e.g., AWS S3) built on
| 138 | +the ``DownloadStore`` interface alongside the local filesystem, providing
| 139 | +scalable, remote storage.
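
To make the idea of a second backend concrete, here is a minimal sketch of how providers could share one interface. Only the names ``DownloadStore`` and ``LocalFilesystemProvider`` come from this report; the ``put``/``get`` methods, the SHA-256 directory layout, the ``archive`` helper and ``InMemoryRemoteProvider`` (a stand-in for a cloud backend) are assumptions made for illustration and may differ from the actual ScanCode.io code.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DownloadStore(ABC):
    """Assumed shape of the storage interface (method names are hypothetical)."""

    @abstractmethod
    def put(self, content: bytes) -> str:
        """Store content and return its SHA-256 hex digest."""

    @abstractmethod
    def get(self, digest: str) -> bytes:
        """Return the content stored under the given digest."""


class LocalFilesystemProvider(DownloadStore):
    """Stores each blob at <root>/<first two hex chars>/<digest>."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, digest: str) -> Path:
        return self.root / digest[:2] / digest

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        path = self._path(digest)
        if not path.exists():  # identical content is written only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()


class InMemoryRemoteProvider(DownloadStore):
    """Stand-in for a cloud backend; a real one would call the vendor SDK."""

    def __init__(self):
        self._objects = {}  # digest -> content bytes

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._objects[digest] = content
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]


def archive(store: DownloadStore, content: bytes) -> str:
    # Callers depend only on the interface, so backends are interchangeable.
    return store.put(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (LocalFilesystemProvider(tmp), InMemoryRemoteProvider()):
            digest = archive(store, b"example package bytes")
            assert store.get(digest) == b"example package bytes"
```

Because ``archive`` only sees the abstract interface, a cloud-backed provider could be swapped in without changing calling code, which is the property this design is meant to show.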
92 | 140 |
|
93 | | -Store and retrieve on demand scanned packages/archives - |
94 | | -(https://github.com/aboutcode-org/scancode.io/issues/1063) |
| 141 | +Closing Note |
| 142 | +------------ |
95 | 143 |
|
96 | | -******** |
97 | | - Links: |
98 | | -******** |
| 144 | +During GSoC 2025, my mentors and I held weekly meetings to discuss progress, |
| 145 | +challenges, and next steps. I am deeply grateful to my mentors for their guidance |
| 146 | +and support, which greatly enriched my learning experience. |
99 | 147 |
|
100 | | -| Project Idea: `Idea Link |
101 | | - <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_ |
102 | | -| GSoC Project Page: `GSoC 2025 |
103 | | - <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_ |
104 | | -| Proposal: `Proposal Link |
105 | | - <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_ |
106 | 148 |
|
107 | | -*************** |
108 | | - Closing Notes |
109 | | -*************** |
110 | 149 |
|
111 | | -During the GSoC coding period, my mentors and I had weekly meetings to |
112 | | -discuss progress, challenges, and next steps. Thank you so much to my |
113 | | -mentors for being there every step of the way during GSoC 2025. Your |
114 | | -encouragement and insights made a huge difference in my learning |
115 | | -journey. |
|