3 files changed: +15 -21 lines changed
@@ -81,10 +81,7 @@ Migrated from the original Python150k preprocessing pipeline:
 # Install dependencies
 pip install -e ".[dev]"

-# Download the seed dataset
-cd data/raw/python-method && bash get_data.sh && cd -
-
-# Convert to HuggingFace format
+# Convert to HuggingFace format (requires dataset access, see below)
 python -m src.data.convert_seed \
   --input-dir data/raw/python-method \
   --output-dir data/processed/python-method
@@ -95,6 +92,17 @@ python -m src.data.convert_seed \
 The seed dataset comes from the [NeuralCodeSum](https://github.com/wasiahmad/NeuralCodeSum)
 project (ACL 2020): 92,545 Python function-docstring pairs split into train/dev/test.

+### Dataset Access
+
+The python-method dataset was previously available via a Google Drive download script
+(`data/raw/python-method/get_data.sh`). This script has been removed as the Google Drive
+link (file ID: `1XPE1txk9VI0aOT_TdqbAeI58Q8puKVl2`) is no longer accessible.
+
+To obtain the dataset, you can:
+1. Contact the [NeuralCodeSum](https://github.com/wasiahmad/NeuralCodeSum) authors
+2. Download from the original source if available at the project repository
+3. Use the alternative python150k dataset from [ETH Zurich SRI Lab](https://www.sri.inf.ethz.ch/py150)
+
 ## Acknowledgments

 - Original C2NL dataset: [A Transformer-based Approach for Source Code Summarization](https://arxiv.org/abs/2005.00653)
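
With `get_data.sh` removed, the conversion command now depends on the raw files having been placed under `data/raw/python-method` by hand. A pre-flight check along the following lines could catch a missing download before `convert_seed` runs. This is only a sketch: the helper `missing_splits` and the one-directory-per-split layout are assumptions for illustration. The README states only that the data is split into train/dev/test, so the real file names may differ.

```python
from pathlib import Path

# Hypothetical layout: one subdirectory per split. The README only says the
# data is split into train/dev/test; adjust the names to the actual files.
EXPECTED_SPLITS = ("train", "dev", "test")


def missing_splits(input_dir, splits=EXPECTED_SPLITS):
    """Return the split names with no non-empty directory under input_dir."""
    root = Path(input_dir)
    missing = []
    for name in splits:
        split_dir = root / name
        # A split counts as present only if its directory exists and holds
        # at least one entry.
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_splits("data/raw/python-method")
    if absent:
        print("Missing splits:", ", ".join(absent))
    else:
        print("All splits present; the conversion step can be run.")
```

Running the check before the `python -m src.data.convert_seed` step would turn a confusing conversion failure into a clear message about which split still needs to be obtained.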
This file was deleted.
@@ -32,6 +32,9 @@ dev = [
     "ruff>=0.1.0",
 ]

+[tool.hatch.build.targets.wheel]
+packages = ["src"]
+
 [tool.ruff]
 line-length = 100
 target-version = "py310"