Commit 635026c

Merge pull request #217 from VarshaUN/GSoC-report
Add the final report of gsoc

I have not reviewed in detail, but trust Varsha to update her own report.
2 parents 18c859f + ea2f56b commit 635026c

1 file changed (+90, -56 lines changed)

docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst

Lines changed: 90 additions & 56 deletions
@@ -1,48 +1,43 @@
-###################################################
-Adding ability to store/query downloaded packages
-###################################################
 
-**Organization:** `AboutCode <https://aboutcode.org>`_
+=====================================================
+Adding Ability to Store and Query Downloaded Packages
+=====================================================
 
-**Project:** `ScanCode.io
-<https://github.com/aboutcode-org/scancode.io>`_
+**Organization:** `AboutCode <https://aboutcode.org>`__
 
-| **Varsha U N**
-| GitHub: `VarshaUN <https://github.com/VarshaUN>`_
-| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_
+**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`__
 
-**Mentors:**
-
-- `Philippe Ombredanne <https://github.com/pombredanne>`_
-- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_
-
-**********
-Overview
-**********
+| **Contributor:** Varsha U N
+| **GitHub:** `VarshaUN <https://github.com/VarshaUN>`__
+| **LinkedIn:** `Varsha U N <https://www.linkedin.com/in/varsha-un/>`__
 
-Currently ScanCode.io scans the packages but doesn’t store it. This
-makes it difficult for users to maintain a reference of packages used in
-their projects, meet source redistribution obligations, or revisit
-scanned packages for future.
+**Mentors:**
+- `Philippe Ombredanne <https://github.com/pombredanne>`__
+- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`__
 
-This project enhanced ScanCode.io by adding the ability to store and
-query downloaded packages locally and re-use packages that were already
-scanned.
+Overview
+--------
 
-----
+ScanCode.io currently stores scanned packages on disk without a centralized index,
+leading to duplicate storage, project-specific data, and potential data loss when
+inputs are deleted. This project enhances ScanCode.io by introducing structured
+package storage and querying, enabling indexing, reuse across projects, and
+reliable preservation.
 
-****************
-Implementation
-****************
+Implementation
+--------------
 
 The project involved the following key components and steps:
 
+
 .. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
    :alt: Project Flow Diagram
    :align: center
    :width: 70%
 
-Currently ScanCode.io downloads packages but does not store them. The new archiving system stores downloaded packages on the local filesystem and allows querying them.
+This project addresses the limitations of ScanCode.io's unstructured package
+storage by adding a system to index, reuse, and preserve packages reliably.
+
 
 Storage System Development:
 
@@ -79,37 +74,76 @@ Validation and Testing:
 `find`), testing normal cases, edge cases (e.g., empty files), and
 errors (e.g., duplicate origins).
 
-**********************
-Linked Pull Request:
-**********************
 
-Add download archiving system with local filesystem provider -
-(https://github.com/aboutcode-org/scancode.io/pull/1815)
+Linked Pull Requests
+--------------------
+
+.. list-table::
+   :widths: 10 40 20
+   :header-rows: 1
+
+   * - Sr. No
+     - Name
+     - Link
+   * - 1
+     - Add download archiving system
+     - `scancode.io#1815 <https://github.com/aboutcode-org/scancode.io/pull/1815>`__
+   * - 2
+     - Support local package storage
+     - `scancode.io#1685 <https://github.com/aboutcode-org/scancode.io/pull/1685>`__
+
+Related Issues
+--------------
+
+.. list-table::
+   :widths: 10 40 20
+   :header-rows: 1
+
+   * - Sr. No
+     - Name
+     - Link
+   * - 1
+     - Store and retrieve scanned packages
+     - `#1063 <https://github.com/aboutcode-org/scancode.io/issues/1063>`__
+   * - 2
+     - Support local package storage
+     - `#1683 <https://github.com/aboutcode-org/scancode.io/issues/1683>`__
+
+Pre-GSoC Work
+-------------
+
+Here are some PRs submitted before GSoC:
+
+- `Add bluefin-container image support <https://github.com/aboutcode-org/scancode.io/pull/1620>`__
+- `Tag whitedout files <https://github.com/aboutcode-org/scancode.io/pull/1529>`__
+- `Support python-private-classifier <https://github.com/aboutcode-org/scancode-toolkit/pull/4075>`__
+- `Parse labels in Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
+- `Add OCI labels to Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
+- `Extract LibreOffice documents <https://github.com/aboutcode-org/extractcode/pull/67>`__
+
+Links
+-----
+
+- **Project Idea:** `GSoC 2025 Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`__
+- **GSoC Project Page:** `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`__
+- **Proposal:** `Project Proposal <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`__
+
+Future Work
+-----------
 
-****************
-Related Issue:
-****************
+Future enhancements include implementing the web UI for the `LocalFilesystemProvider`
+to enable package uploads, searches, listings, and retrievals in ScanCode.io, with
+Django views, templates, and URL routes, backed by comprehensive testing. Additionally,
+integrating an external cloud storage option (e.g., AWS S3) alongside the local
+filesystem will extend the `DownloadStore` interface, providing scalable and remote
+storage capabilities.
 
-Store and retrieve on demand scanned packages/archives -
-(https://github.com/aboutcode-org/scancode.io/issues/1063)
+Closing Note
+------------
 
-********
-Links:
-********
+During GSoC 2025, my mentors and I held weekly meetings to discuss progress,
+challenges, and next steps. I am deeply grateful to my mentors for their guidance
+and support, which greatly enriched my learning experience.
 
-| Project Idea: `Idea Link
-<https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_
-| GSoC Project Page: `GSoC 2025
-<https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_
-| Proposal: `Proposal Link
-<https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_
 
-***************
-Closing Notes
-***************
 
-During the GSoC coding period, my mentors and I had weekly meetings to
-discuss progress, challenges, and next steps. Thank you so much to my
-mentors for being there every step of the way during GSoC 2025. Your
-encouragement and insights made a huge difference in my learning
-journey.
