Commit 84730bf

Merge pull request #24 from IloBe/20-github-action-isssue-with-pytest-fixture-creation-for-unittests
Fix: change testing fixture
2 parents e4b9e2d + 244d520 commit 84730bf

8 files changed

Lines changed: 1869 additions & 19 deletions

.github/workflows/python-app.yml

Lines changed: 4 additions & 4 deletions
@@ -43,7 +43,7 @@ jobs:
 # # see: https://dvc.org/doc/user-guide/troubleshooting#missing-files,
 # # no remote cloud service available yet
 # dvc pull
-# - name: Test with pytest
-# run: |
-# # Final action: runs pytest for all tests in the ./tests dir
-# pytest ./tests/*.py -vv
+    - name: Test with pytest
+      run: |
+        # Final action: runs pytest for all tests in the ./tests dir
+        pytest ./tests/*.py -vv

README.md

Lines changed: 5 additions & 2 deletions
@@ -31,7 +31,7 @@ Regarding software engineering principles, beside documentation, logging and pyt
 The Unit tests are written via _pytest_for GET and POST prediction requests for the FastAPI component as well as for the mentioned data and model task parts. All unit test results are reported in associated html files of the [tests directory](https://github.com/IloBe/US_CensusData_Classifier_PipelineWithDeployment/tree/master/tests).
 
 All project relevant configuration values, including model hyperparameter ranges for the cross validation concept, are handled via specific configuration _config.yml_ file.<br>
-For versioning tasks, [_git_](https://git-scm.com/) and [_dvc_](https://dvc.org/doc/use-cases/versioning-data-and-models), handled with ignore files content, are chosen.
+For versioning tasks, [_git_](https://git-scm.com/) and [_dvc_](https://dvc.org/doc/use-cases/versioning-data-and-models), handled with ignore files content, are chosen. If a remote storage, like AWS S3 or Azure shall be used as future task, dvc[all] for the selected dvc version is installed via requirements.txt file as well for specific configuration. By now, only dvc 'local' remote is set.
 
 
 ## Environment Set up
@@ -58,6 +58,7 @@ or use
 * In our GitHub repository an automatic Action script is set up to check amongst others dependencies, linting and unit testing.
 ![github action][image1]
 
+<br>
 
 ## Data
 * The download raw _census.csv_ file is preprocessed and stored as new .csv file. Both files are committed and versioned with _dvc_.
@@ -103,6 +104,7 @@ Several other insights are visualised and stored as .png files. So, have a look
 ...
 ```
 * As mentioned, the model card informs about our found insights of the binary classification estimator including evaluation diagrams and general metrics.
+<br>
 
 
 ## API Creation
@@ -148,7 +150,8 @@ As an examples regarding the use case of having a person earning <=50K as income
 
 <br>
 
-* after selection, render starts its advanced deployment configuation, some parameters are already set, some have to be set manually appropriately. Render guides you through with easy to handle UI's.
+* Because default render Python version is 3.7 and this version has issues with dvc, the environment variable PYTHON_VERSION has to be configured being version 3.10.9.
+* After selection, render starts its advanced deployment configuation, some parameters are already set, some have to be set manually appropriately. Render guides you through with easy to handle UI's.
 * That's it. Implement coding changes, push to the GitHub repository, and the app will automatically redeploy each time, but it will only deploy if your continuous integration action passes.
 * Have in mind: if you rely on your CI/CD to fail before fixing an issue, it slows down your deployment. Fix issues early, e.g. by running an ensemble linter like flake8 locally before committing changes.
 * For checking the render deployment, a python file exists that uses the httpx module to do one GET and POST on the live render web service and prints its results.
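
The deployment check script mentioned in the last README bullet is not part of this diff. A minimal sketch of such a check might look like the following. The function name `check_service`, the `predict_path` argument and the payload are hypothetical; the only assumption about the client is that it offers `get` and `post` methods returning objects with `status_code` and `json()`, as `httpx.Client` does.

```python
def check_service(client, base_url: str, predict_path: str, payload: dict) -> dict:
    """Send one GET to the service root and one POST to the prediction endpoint.

    `client` is any object with httpx-style `get` and `post` methods,
    for example an `httpx.Client` instance. `predict_path` and `payload`
    are placeholders; the real endpoint and fields are not shown in this diff.
    """
    get_resp = client.get(base_url)
    post_resp = client.post(base_url.rstrip("/") + predict_path, json=payload)
    return {
        "get_status": get_resp.status_code,
        "post_status": post_resp.status_code,
        "post_body": post_resp.json(),
    }
```

With httpx installed, the function could be called as `check_service(httpx.Client(), <service URL>, <prediction path>, <payload dict>)` and its result printed. Passing the client in keeps the check testable without network access.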

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 dvc==3.23.0
+dvc[all]==3.23.0
 flake8==6.1.0
 numpy==1.23.5
 pandas==2.0.3
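
After this change the file pins `dvc` twice, once bare and once with the `all` extra. Since the extras line already implies the base package, the bare pin looks redundant. A small sketch for spotting such repeats is given below; the helper name `find_duplicate_pins` is hypothetical and the name parsing is deliberately simple, not a full requirement-specifier parser.

```python
import re


def find_duplicate_pins(lines):
    """Return {base_package: [requirement lines]} for packages listed more than once.

    Extras such as `[all]` and version specifiers are ignored when comparing
    names, so `dvc==3.23.0` and `dvc[all]==3.23.0` count as the same package.
    """
    seen = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:  # e.g. option lines such as "-r other.txt"
            continue
        name = re.sub(r"[-_.]+", "-", match.group(0)).lower()
        seen.setdefault(name, []).append(line)
    return {name: reqs for name, reqs in seen.items() if len(reqs) > 1}
```

Applied to the five lines of this hunk, only the two `dvc` entries are reported.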

src/config/config.yml

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ etl:
 preproc_census: "preproc_census.csv"
 orig_census_dvc_url: "data/census.csv"
 preproc_census_dvc_url: "data/preproc_census.csv"
+test_data_orig_census: "tests/df_test_1500raw.csv"
 fnlgt_range: range(0, 1550000, 50000)
 non_converting_countries: ['Canada', 'United-States', 'Mexico', 'Hongkong', 'China',
                            'Japan', 'Taiwan', 'Russia', 'Outlying-US']

tests/TestDataframeCreation.ipynb

Lines changed: 339 additions & 0 deletions
Large diffs are not rendered by default.
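
Because the notebook diff is not rendered, how `tests/df_test_1500raw.csv` was produced is not visible here. One plausible way, assuming the file simply holds the first 1500 data rows of the raw census CSV (the slice the previous fixture took with `[:1500]`), is sketched below with the standard library. The function name `write_row_subset` is hypothetical.

```python
import csv
from itertools import islice


def write_row_subset(src_csv: str, dst_csv: str, n_rows: int = 1500) -> int:
    """Copy the header and the first `n_rows` data rows from src_csv to dst_csv.

    Returns the number of data rows written, which is smaller than `n_rows`
    when the source file is shorter.
    """
    with open(src_csv, newline="") as src, open(dst_csv, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row
        written = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
    return written
```

With pandas, which the project already uses, the same result could be obtained with `pd.read_csv(src).head(1500).to_csv(dst, index=False)`.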

tests/conftest.py

Lines changed: 14 additions & 9 deletions
@@ -2,6 +2,9 @@
 
 """
 This conftest script delivers the fixture setup used for pytest testsuite.
+Having had issues with dvc cache and github actions part of unittests,
+a subset of data is stored once in the tests directory.
+
 author: I. Brinkmeier
 date: 2023-09
 
@@ -43,13 +46,14 @@ def raw_test_data() -> pd.DataFrame:
 
     Returns:
         Subset dataframe with 1500 rows loaded from original csv file
-    """
-    data = config_file['etl']['orig_census_dvc_url']
-
+    """
     try:
-        return pd.read_csv(data)[:1500]
+        # data = './df_test_1500raw.csv'
+        ROOT = os.getcwd()
+        filepath = os.path.join(ROOT, config_file['etl']['test_data_orig_census'])
+        return pd.read_csv(filepath)
     except Exception as e:
-        pytest.fail(f"Fixture creation with 1500 orig rows: e.g. Data not found at path: {data}, exc: {e}")
+        pytest.fail(f"Fixture with 1500 orig rows: exc: {e}")
 
 
 @pytest.fixture(scope='session')
@@ -60,9 +64,10 @@ def cleaned_test_data() -> pd.DataFrame:
     Returns:
         Subset dataframe with 200 rows loaded from original csv file and cleaned
     """
-    data = config_file['etl']['orig_census_dvc_url']
-
     try:
-        return clean_data(pd.read_csv(data)[:200], config_file)
+        # data = './df_test_1500raw.csv'
+        ROOT = os.getcwd()
+        filepath = os.path.join(ROOT, config_file['etl']['test_data_orig_census'])
+        return clean_data(pd.read_csv(filepath)[:200], config_file)
     except Exception as e:
-        pytest.fail(f"Fixture creation with 200 cleaned rows: Data not found at path: {data}, exc: {e}")
+        pytest.fail(f"Fixture with 200 cleaned rows: exc: {e}")
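
Both fixtures now build the path from `os.getcwd()`, so they depend on pytest being started from the repository root, as the workflow step `pytest ./tests/*.py -vv` does. A possible alternative, shown only as a sketch, anchors the path on the location of `conftest.py` instead. The helper name `resolve_test_csv` is hypothetical, and the sketch assumes `conftest.py` sits directly in `<repo>/tests/`.

```python
from pathlib import Path


def resolve_test_csv(config: dict, conftest_file: str) -> Path:
    """Build the absolute path of the test CSV named in the config.

    Assumes `conftest_file` lives in <repo>/tests/, so its grandparent
    directory is the repository root (an assumption of this sketch).
    """
    repo_root = Path(conftest_file).resolve().parent.parent
    csv_path = repo_root / config["etl"]["test_data_orig_census"]
    if not csv_path.is_file():
        raise FileNotFoundError(f"Test data not found at {csv_path}")
    return csv_path
```

Inside a fixture this could be used as `pd.read_csv(resolve_test_csv(config_file, __file__))`, which would give the same result whether pytest is invoked from the repository root or from within `tests/`.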

tests/data_test_report.html

Lines changed: 4 additions & 4 deletions
Large diffs are not rendered by default.
