Commit 4da4edc

Update README.md
1 parent be31599 commit 4da4edc

1 file changed

Lines changed: 20 additions & 16 deletions

README.md

@@ -1,43 +1,46 @@
1-
### Source Code
1+
# Using Sequences of Life-events to Predict Human Lives
2+
3+
This repository contains code for the [Using Sequences of Life-events to Predict Human Lives](https://doi.org/10.21203/rs.3.rs-2975478/v1)
24

3-
This repository contains scripts and several notebooks for the data processing, life2vec training, statistical analysis, and visualization. The model weights, experiment logs, and associated model outputs can be obtained in accordance with the rules of [Statistics Denmark's Research Scheme](https://www.dst.dk/en/TilSalg/Forskningsservice/Dataadgang).
5+
### Source Code
46

5-
Paths (e.g. to data, or model weights) were **redacted** before submitting scripts to the CodeOcean platform.
7+
This repository contains scripts and several notebooks for data processing, life2vec training, statistical analysis, and visualization. The model weights, experiment logs, and associated model outputs can be obtained in accordance with the rules of [Statistics Denmark's Research Scheme](https://www.dst.dk/en/TilSalg/Forskningsservice/Dataadgang).
68

9+
Paths (e.g., to data, or model weights) were **redacted** before submitting scripts to GitHub.
710

811
### Overall Structure
912

10-
We use [Hydra](https://hydra.cc/docs/intro/) to run the experiments. The `/conf` folder contain configs for the experiments:
13+
We use [Hydra](https://hydra.cc/docs/intro/) to run the experiments. The `/conf` folder contains configs for the experiments:
1114
1. `/experiment` contains configuration `yaml` for pretraining and finetuning,
12-
2. `/tasks` contain the specification for data augmentation in MLM, SOP, CLS etc.,
15+
2. `/tasks` contain the specification for data augmentation in MLM, SOP, etc.,
1316
3. `/trainer` contains configuration for logging (not used) and multithread training (not used),
1417
4. `/data_new` contains configs for data loading and processing,
1518
5. `/datamodule` contains configs that specify how data should be loaded to PyTorch and PyTorch Lightning
16-
6. `callbacks.yaml` specifies configuration for the PyTorch Lightning Callbacks ,
19+
6. `callbacks.yaml` specifies the configuration for the PyTorch Lightning Callbacks ,
1720
7. `prepare_data.yaml` can be used to run data preprocessing.
1821

1922
The `/analysis` folder contains `ipynb` notebooks for post-hoc evaluation:
2023
1. `/embedding` contains the analysis of the embedding spaces,
21-
2. `/metric` containt notebooks for the model evaluation
24+
2. `/metric` contains notebooks for the model evaluation
2225
3. `/visualisation` contains notebooks for the visualisation of spaces.
2326

24-
The source folder, `/src`, contains the codes related to the data loading and model training. Due to specifics of the `hydra` package, this folder includes have TCAV implementation (in `/src/analysis/tcav`) and hyperparameter tuning (in `/src/analysis/hyperparameter`). Here is the overview of the `/src` folder:
27+
The source folder, `/src`, contains the data loading and model training codes. Due to the specifics of the `hydra` package, this folder includes TCAV implementation (in `/src/analysis/tcav`) and hyperparameter tuning (in `/src/analysis/hyperparameter`). Here is the overview of the `/src` folder:
2528
1. The `/src/data_new` contains scripts to preprocess data as well as prepare data to load into the PyTorch or PyTorch Lightning,
2629
2. The `/src/models` contains the implementation of baseline models,
27-
3. The `/src/tasks` include code specific to the particular task, aka MLM, SOP, Mortality Prediction, Emigration Prediction etc.
30+
3. The `/src/tasks` include code specific to the particular task, aka MLM, SOP, Mortality Prediction, Emigration Prediction, etc.
2831
4. `/src/tranformer` contains the implementation of the life2vec model:
2932
1. In `performer.py`, we overwrite the functionality of the `performer-pytorch` package,
30-
2. In `cls_model.py`, we have implementation of the finetuning stage (aka life2vec for personality, mortality, emigration),
33+
2. In `cls_model.py`, we have an implementation of the finetuning stage (aka life2vec for personality, mortality, emigration),
3134
3. `models.py` contains the code for the life2vec pretraining (aka the base life2vec model),
32-
4. The `transformer_utils.py` contains the implementation of custom modules, like losses, ac tivation functions etc.
35+
4. The `transformer_utils.py` contains the implementation of custom modules, like losses, activation functions, etc.
3336
5. The `metrics.py` contains code for the custom metric,
34-
6. The `modules.py`, `attention.py`, `att_utils.py`, `embeddings.py` contain the implementation of modules used in the transformer-network (aka life2vec encoders).
37+
6. The `modules.py`, `attention.py`, `att_utils.py`, and `embeddings.py` contain the implementation of modules used in the transformer network (aka life2vec encoders).
3538

36-
Scripts such as `train.py`, `test.py`, `tune.py`, `val.py` used to run a particular stage of the training; while `prepare_data.py` used to run the data processing (see below the example).
39+
Scripts such as `train.py`, `test.py`, `tune.py`, and `val.py` are used to run a particular stage of the training, while `prepare_data.py` is used to run the data processing (see the example below).
3740

3841

3942
### Run the script
40-
To run the code you would use the following commands:
43+
To run the code, you would use the following commands:
4144

4245
```
4346
# run the pretraining:
@@ -60,12 +63,13 @@ python -m src.prepare_data +data_new/sources=labour target=\${data_new.sources}
6063
HYDRA_FULL_ERROR=1 python -m src.train experiment=emm trainer.devices=[0] version=0.01
6164
```
6265

63-
## How to cite
66+
### How to cite
67+
6468
**Research Square Preprint**
6569
```bibtex
6670
@article{savcisens2023using,
6771
title={Using Sequences of Life-events to Predict Human Lives},
6872
author={Savcisens, Germans and Eliassi-Rad, Tina and Hansen, Lars Kai and Mortensen, Laust and Lilleholt, Lau and Rogers, Anna and Zettler, Ingo and Lehmann, Sune},
6973
year={2023}
7074
}
71-
```
75+
```
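
The run commands in the README pass Hydra-style overrides such as `experiment=emm trainer.devices=[0] version=0.01`. As a rough illustration of how dotted `key=value` overrides map onto a nested configuration, here is a minimal standard-library sketch. It is not Hydra's implementation: `parse_value` and `apply_overrides` are hypothetical helper names, and the sketch treats every override as a plain key assignment, whereas in Hydra an override like `experiment=emm` typically selects an option from a config group (here, presumably a file under `/conf/experiment`).

```python
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Convert a raw override value to a list, int, float, or str (simplified)."""
    if raw.startswith("[") and raw.endswith("]"):
        inner = raw[1:-1].strip()
        return [parse_value(item.strip()) for item in inner.split(",")] if inner else []
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `dotted.key=value` overrides to a nested dict, in place."""
    for item in overrides:
        key, _, raw = item.partition("=")
        # A leading '+' (Hydra's marker for adding a new key) is simply dropped here.
        *parents, leaf = key.lstrip("+").split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


# Overrides taken from the training command in the README.
overrides = ["experiment=emm", "trainer.devices=[0]", "version=0.01"]
config = apply_overrides({}, overrides)
print(config)
```

With real Hydra, the same overrides are resolved against the YAML files in `/conf`, so the resulting configuration contains the full experiment settings rather than only the three keys set here.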
