Commit f4cc735

Added / fixed some docs.
Author: jjhenkel
1 parent 5e8e271 commit f4cc735

2 files changed

Lines changed: 9 additions & 1 deletion

data/README.md

Lines changed: 8 additions & 0 deletions
@@ -27,3 +27,11 @@ in the `./data/build-results/*/` folder.
 This directory includes the (compressed) pre-processed version of the broken Dockerfiles that we used as input to
 our clustering algorithm (BERT + HDBSCAN). We hope that, by providing the original pre-processed data, others can
 build new clustering techniques or refine the clustering approach and parameters.
+
+## `./data/non-clustered-data`
+
+This directory contains a (compressed) pre-processed version of the broken Dockerfiles that _did not cluster_ (HDBSCAN, under most configurations, does not place every element into a cluster). As part of `rq3` we analyze clustered and non-clustered data to compare how shipwright performs on either set. This data is used by `./shipwright.sh run-rq3`.
+
+## `./data/clustered-data`
+
+In this folder you can find the clusters we generated and corresponding metadata for each of the broken Dockerfiles that are in each cluster. This data is used by `./shipwright.sh run-rq3`.

rq2/README.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ If you wish to run things from this directory, you would need `python3` with the
 
 ## Too Long Didn't Read (TLDR)
 
-We provided the pre-processed form of our broken Dockerfiles in the `./data/for-clustering` directory (one gzipped json file per broken Dockerfile). You can use these files to run your own clustering, or our clustering, or you can use our pre-generated clusters in the `./rq3/Clusters` directory.
+We provided the pre-processed form of our broken Dockerfiles in the `./data/for-clustering` directory (one gzipped json file per broken Dockerfile). You can use these files to run your own clustering, or our clustering, or you can use our pre-generated clusters in the `./data/clustered-data` directory.
 
 To spit out some quick output data (and verify things can run) run `./clustering.py` --- this is quick and should print something like the following:
 
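
Reader's note: the README text above says `./data/for-clustering` holds one gzipped JSON file per broken Dockerfile. A minimal sketch of reading such a directory with the Python standard library might look like the following. The function name `load_preprocessed` and the `*.gz` file pattern are assumptions made for illustration; the repository's actual file names and JSON schema are not shown in this commit.

```python
import gzip
import json
from pathlib import Path


def load_preprocessed(directory):
    """Yield (file name, parsed JSON) for each gzipped JSON file in directory.

    Hypothetical helper: the "*.gz" pattern is an assumption, so adjust it
    to match the real file names in the data directory.
    """
    for path in sorted(Path(directory).glob("*.gz")):
        # "rt" opens the compressed stream in text mode so json.load can read it.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield path.name, json.load(handle)
```

Calling `load_preprocessed("./data/for-clustering")` would then iterate over the files lazily, which avoids holding every decompressed record in memory at once.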
