This repository was archived by the owner on Feb 16, 2023. It is now read-only.

Commit 80878e3

Merge pull request #833 from OpenMined/dev ("Master")

2 parents: dbcd06f + 2ceaf7e

File tree

511 files changed: +45470 additions, −8897 deletions

.github/workflows/tests.yml

Lines changed: 8 additions & 7 deletions

@@ -5,6 +5,7 @@ on:
     branches:
       - master
       - dev
+      - pygrid_0.4.0
   pull_request:
     types: [opened, synchronize, reopened]

@@ -14,7 +15,7 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        python-version: [3.7]
+        python-version: [3.8]

     steps:
       - uses: actions/checkout@v1

@@ -26,17 +27,17 @@ jobs:
       - name: Install Poetry
        run: |
          pip install --upgrade pip
-         pip install poetry==1.0
+         pip install poetry==1.1.2

-      - name: Test Grid Node
+      - name: Test Grid Domain
        run: |
-         cd ./apps/node/
+         cd ./apps/domain/

          # Install dependencies
          poetry install

          # Run black
-         poetry run black --check --verbose --exclude src/syft .
+         poetry run black --check --verbose .

          # Run docformatter
          poetry run docformatter --check --recursive .

@@ -78,7 +79,7 @@ jobs:

      - name: Run Integration Tests
        run: |
-         cd ./apps/node/
+         cd ./apps/domain/

          # Run Integration tests
-         poetry run coverage run -m pytest -v ../../tests
+         # poetry run coverage run -m pytest -v ../../tests

.gitignore

Lines changed: 36 additions & 0 deletions

@@ -28,3 +28,39 @@ test_flask_grid_server.db
 # Sphinx documentation
 docs/_build/

+# Terraform .gitignore (https://github.com/github/gitignore/blob/master/Terraform.gitignore)
+
+# Local .terraform directories
+**/.terraform/*
+
+# .tfstate files
+*.tfstate
+*.tfstate.*
+
+# Crash log files
+crash.log
+
+# Exclude all .tfvars files, which are likely to contain sensitive data, such as
+# passwords, private keys, and other secrets. These should not be part of version
+# control as they are data points which are potentially sensitive and subject
+# to change depending on the environment.
+#
+*.tfvars
+
+# Ignore override files as they are usually used to override resources locally and so
+# are not checked in
+override.tf
+override.tf.json
+*_override.tf
+*_override.tf.json
+
+# Include override files you do wish to add to version control using negated pattern
+#
+# !example_override.tf
+
+# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
+# example: *tfplan*
+
+# Ignore CLI configuration files
+.terraformrc
+terraform.rc
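The Terraform ignore patterns added above can be sanity-checked with `git check-ignore` in a scratch repository. The sketch below is illustrative only and is not part of this commit; it copies just the new patterns into a throwaway repo:

```shell
# Create a throwaway repo containing only the new Terraform ignore patterns.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' '**/.terraform/*' '*.tfstate' '*.tfstate.*' '*.tfvars' 'override.tf' > .gitignore

# Exit status 0 means git would ignore the path.
git check-ignore -q terraform.tfstate && echo "tfstate ignored"
git check-ignore -q secrets.tfvars && echo "tfvars ignored"
```

Paths such as `main.tf` should still be tracked; only state, variable, and override files are excluded.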

.pre-commit-config.yaml

Lines changed: 14 additions & 0 deletions

@@ -18,3 +18,17 @@ repos:
     rev: v1.1.0
     hooks:
       - id: python-use-type-annotations
+
+  - repo: https://github.com/aflc/pre-commit-jupyter
+    rev: v1.0.0
+    hooks:
+      - id: jupyter-notebook-cleanup
+        args:
+          - --remove-kernel-metadata
+          - --pin-patterns
+          - "[pin];[donotremove]"
+
+  - repo: https://github.com/pre-commit/mirrors-isort
+    rev: 'v5.4.2' # Use the revision sha / tag you want to point at
+    hooks:
+      - id: isort

.vscode/settings.json

Lines changed: 3 additions & 2 deletions

@@ -1,3 +1,4 @@
 {
-  "python.pythonPath": "/Users/cereallarceny/.pyenv/versions/3.7.5/bin/python"
-}
+  "python.formatting.provider": "black",
+  "editor.tabSize": 4
+}

README.md

Lines changed: 46 additions & 30 deletions

@@ -1,18 +1,18 @@
 ![PyGrid logo](https://raw.githubusercontent.com/OpenMined/design-assets/master/logos/PyGrid/horizontal-primary-trans.png)

-[![Run Tests](https://github.com/OpenMined/PyGrid/workflows/Run%20tests/badge.svg)](https://github.com/OpenMined/PyGrid/actions?query=workflow%3A%22Run+tests%22) [![Docker build](https://github.com/OpenMined/PyGrid/workflows/Docker%20build/badge.svg)](https://github.com/OpenMined/PyGrid/actions?query=workflow%3A%22Docker+build%22)
+[![Tests](https://github.com/OpenMined/PyGrid/workflows/Run%20tests/badge.svg)](https://github.com/OpenMined/PyGrid/actions?query=workflow%3A%22Run+tests%22)

 PyGrid is a peer-to-peer network of data owners and data scientists who can collectively train AI models using [PySyft](https://github.com/OpenMined/PySyft/). PyGrid is also the central server for conducting both model-centric and data-centric federated learning.

-_**A quick note about PySyft 0.3.x:** Currently, PyGrid is designed to work with the PySyft 0.2.x product line only. We are working on support for 0.3.x and hope to have this released by early 2021. Thanks for your patience!_
+You may control PyGrid via our user interface, [PyGrid Admin](https://github.com/OpenMined/pygrid-admin).

 ## Architecture

 The PyGrid platform is composed of three different components:

-- **Network** - A Flask-based application used to manage, monitor, control, and route instructions to various PyGrid Nodes.
-- **Node** - A Flask-based application used to store private data and models for federated learning, as well as to issue instructions to various PyGrid Workers.
-- **Worker** - An ephemeral instance, managed by a PyGrid Node, that is used to compute data.
+- **Network** - A Flask-based application used to manage, monitor, control, and route instructions to various PyGrid Domains.
+- **Domain** - A Flask-based application used to store private data and models for federated learning, as well as to issue instructions to various PyGrid Workers.
+- **Worker** - An ephemeral instance, managed by a PyGrid Domain, that is used to compute data.

 ## Use Cases
@@ -30,41 +30,41 @@ Model-centric FL is when the model is hosted in PyGrid. This is really useful wh
 4. Once training is completed, a "diff" is generated between the new and the original state of the model
 5. The diff is reported back to PyGrid and it's averaged into the model

-This takes place potentially with hundreds or thousands of devices simultaneously. **For model-centric federated learning, you only need to run a Node. Networks and Workers are irrelevant for this specific use-case.**
+This takes place potentially with hundreds or thousands of devices simultaneously. **For model-centric federated learning, you only need to run a Domain. Networks and Workers are irrelevant for this specific use-case.**

 _Note:_ For posterity's sake, we previously referred to this process as "static federated learning".

-![Cycled MCFL](https://github.com/OpenMined/PyGrid/blob/dev/assets/MCFL-cycled.png?raw=true)
+![Cycled MCFL](assets/MCFL-cycled.png)

 #### Data-centric FL

 Data-centric FL is the same problem as model-centric FL, but from the opposite perspective. The most likely scenario for data-centric FL is where a person or organization has data they want to protect in PyGrid (instead of hosting the model, they host data). This allows a data scientist who is not the data owner to make requests for training or inference against that data. The following workflow will take place:

-1. A data scientist searches for data they would like to train on (they can search either an individual Node, or a Network of Nodes)
+1. A data scientist searches for data they would like to train on (they can search either an individual Domain, or a Network of Domains)
 2. Once the data has been found, they may write a training plan and optionally pre-train a model
-3. The training plan and model are sent to the PyGrid Node in the form of a job request
-4. The PyGrid Node will gather the appropriate data from its database and send the data, the model, and the training plan to a Worker for processing
+3. The training plan and model are sent to the PyGrid Domain in the form of a job request
+4. The PyGrid Domain will gather the appropriate data from its database and send the data, the model, and the training plan to a Worker for processing
 5. The Worker performs the plan on the model using the data
-6. The result is returned to the Node
+6. The result is returned to the Domain
 7. The result is returned to the data scientist

 For the last step, we're working on adding the capability for privacy budget tracking, which will allow a data owner to "sign off" on whether or not a trained model should be released.

 _Note:_ For posterity's sake, we previously referred to this process as "dynamic federated learning".

-**Node-only data-centric FL**
+**Domain-only data-centric FL**

-Technically speaking, it isn't required to run a Network when performing data-centric federated learning. Alternatively, as a data owner, you may opt to only run a Node, but participate in a Network hosted by someone else. The Network host will not have access to your data.
+Technically speaking, it isn't required to run a Network when performing data-centric federated learning. Alternatively, as a data owner, you may opt to only run a Domain, but participate in a Network hosted by someone else. The Network host will not have access to your data.

-![Node-only DCFL](https://github.com/OpenMined/PyGrid/blob/dev/assets/DCFL-node.png?raw=true)
+![Domain-only DCFL](assets/DCFL-node.png)

 **Network-based data-centric FL**

-Many times you will want to use a Network to allow multiple Nodes to be connected together. As a data owner, it's not strictly necessary to own and operate multiple Nodes. PyGrid doesn't prescribe one way to organize Nodes and Networks, but we expose these applications to allow you and various related stakeholders to make the correct decision about your infrastructure needs.
+Many times you will want to use a Network to allow multiple Domains to be connected together. As a data owner, it's not strictly necessary to own and operate multiple Domains. PyGrid doesn't prescribe one way to organize Domains and Networks, but we expose these applications to allow you and various related stakeholders to make the correct decision about your infrastructure needs.

-![Network-based DCFL](https://github.com/OpenMined/PyGrid/blob/dev/assets/DCFL-network.png?raw=true)
+![Network-based DCFL](assets/DCFL-network.png)

-## Getting started
+## Local Setup

 Currently, we suggest two ways to run PyGrid locally: Docker and manually running from source. With Docker, we can organize all the services we'd like to use and then boot them all in one command. With manually running from source, we have to run them as separate tasks.
@@ -74,14 +74,14 @@ To install Docker, just follow the [docker documentation](https://docs.docker.co

 #### 1. Setting up your hostfile

-Before starting the grid platform locally using Docker, we need to set up the domain names used by the bridge network. In order to use these nodes from outside of the container context, you should add the following domain names to your `/etc/hosts`:
+Before starting the grid platform locally using Docker, we need to set up the domain names used by the bridge network. In order to use these Domains from outside of the container context, you should add the following domain names to your `/etc/hosts`:

 ```
 127.0.0.1 network
-127.0.0.1 bob
 127.0.0.1 alice
-127.0.0.1 bill
-127.0.0.1 james
+127.0.0.1 bob
+127.0.0.1 charlie
+127.0.0.1 dan
 ```

 Note that you're not restricted to running 4 nodes and a network. You could instead run just a single node if you'd like - this is often all you need for model-centric federated learning. For the sake of our example, we'll use the network running 4 nodes underneath, but you're welcome to modify it to your needs.
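As a convenience, the host entries from the hunk above can be appended idempotently with a small shell loop. This is an illustrative sketch, not part of the commit; `HOSTS_FILE` defaults to a local scratch file so it runs without sudo — point it at `/etc/hosts` (with sudo) for real use:

```shell
# Append the PyGrid dev hostnames, skipping entries that already exist.
HOSTS_FILE="${HOSTS_FILE:-./hosts.local}"
touch "$HOSTS_FILE"
for name in network alice bob charlie dan; do
  grep -q "127.0.0.1 $name" "$HOSTS_FILE" || echo "127.0.0.1 $name" >> "$HOSTS_FILE"
done
```

Running the loop twice leaves the file unchanged, so it is safe to re-run after adding or removing nodes.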
@@ -90,8 +90,9 @@ Note that you're not restricted to running 4 nodes and a network. You could inst

 The latest PyGrid Network and Node images are also available on the Docker Hub.

+- [PyGrid Domain - `openmined/grid-domain`](https://hub.docker.com/repository/docker/openmined/grid-domain)
+- [PyGrid Worker - `openmined/grid-worker`](https://hub.docker.com/repository/docker/openmined/grid-worker)
 - [PyGrid Network - `openmined/grid-network`](https://hub.docker.com/repository/docker/openmined/grid-network)
-- [PyGrid Node - `openmined/grid-node`](https://hub.docker.com/repository/docker/openmined/grid-node)

 To set up and start the PyGrid platform, you just need to start the docker-compose process.
@@ -106,19 +107,29 @@ This will download the latest OpenMined Docker images and start a grid platform
 If you want to build your own custom images, you may do so using the following command for the Domain:

 ```
-docker build . --file ./apps/node/Dockerfile --tag openmined/grid-node:mybuildname
+docker build ./apps/domain --file ./apps/domain/Dockerfile --tag openmined/grid-domain:mybuildname
+```
+
+Or for the Worker:
+
+```
+docker build ./apps/worker --file ./apps/worker/Dockerfile --tag openmined/grid-worker:mybuildname
 ```

 Or for the Network:

 ```
-docker build . --file ./apps/node/Dockerfile --tag openmined/grid-node:mybuildname
+docker build ./apps/network --file ./apps/network/Dockerfile --tag openmined/grid-network:mybuildname
 ```

 ### Manual Start

 #### Running a Node

+> ##### Installation
+>
+> First install [`poetry`](https://python-poetry.org/docs/) and run `poetry install` in `apps/node`.

 To start the PyGrid Node manually, run:

 ```
@@ -132,15 +143,15 @@ You can pass the arguments or use environment variables to set the network confi

 - `-h, --help` - Shows the help message and exits
 - `-p [PORT], --port [PORT]` - Port to run the server on (default: 5000)
-- `--host [HOST]` - The Network host
+- `--host [HOST]` - The Node host
 - `--num_replicas [NUM]` - The number of replicas to provide fault tolerance for model hosting
 - `--id [ID]` - The ID of the Node
 - `--start_local_db` - If this flag is used, a SQLAlchemy DB URI is generated to use a local db

 **Environment Variables**

-- `GRID_NETWORK_PORT` - Port to run the server on
-- `GRID_NETWORK_HOST` - The Network host
+- `GRID_NODE_PORT` - Port to run the server on
+- `GRID_NODE_HOST` - The Node host
 - `NUM_REPLICAS` - Number of replicas to provide fault tolerance for model hosting
 - `DATABASE_URL` - The Node database URL
 - `SECRET_KEY` - The secret key
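The renamed variables above pair each CLI flag with an environment variable. A common convention (assumed here, not verified against PyGrid's source) is that an explicit flag wins over the environment variable, which wins over the default. The sketch below illustrates that precedence with a hypothetical `resolve_port` helper:

```shell
# Hypothetical helper mirroring the documented options: an explicit port
# argument wins, then GRID_NODE_PORT, then the documented default of 5000.
resolve_port() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$GRID_NODE_PORT" ]; then
    echo "$GRID_NODE_PORT"
  else
    echo 5000
  fi
}

GRID_NODE_PORT=6000
resolve_port        # env var beats the default
resolve_port 5001   # explicit argument beats the env var
unset GRID_NODE_PORT
resolve_port        # falls back to the default
```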
@@ -151,15 +162,15 @@ To start the PyGrid Network manually, run:

 ```
 cd apps/network
-./run.sh --port 5000 --start_local_db
+./run.sh --port 7000 --start_local_db
 ```

 You can pass the arguments or use environment variables to set the network configs.

 **Arguments**

 - `-h, --help` - Shows the help message and exits
-- `-p [PORT], --port [PORT]` - Port to run the server on (default: 5000)
+- `-p [PORT], --port [PORT]` - Port to run the server on (default: 7000)
 - `--host [HOST]` - The Network host
 - `--start_local_db` - If this flag is used, a SQLAlchemy DB URI is generated to use a local db
@@ -170,13 +181,18 @@ You can pass the arguments or use environment variables to set the network confi
 - `DATABASE_URL` - The Network database URL
 - `SECRET_KEY` - The secret key

+## Deployment & CLI
+
+[Please check the instructions for deployment and the CLI here.](deployment.md)
+
 ## Contributing

 If you're interested in contributing, check out our [Contributor Guidelines](CONTRIBUTING.md).

 ## Support

-For support in using this library, please join the **#lib_pygrid** Slack channel. If you'd like to follow along with any code changes to the library, please join the **#code_pygrid** Slack channel. [Click here to join our Slack community!](https://slack.openmined.org)
+For support in using this library, please join the **#support** Slack channel. [Click here to join our Slack community!](https://slack.openmined.org)

 ## License

apps/domain/Dockerfile

Lines changed: 17 additions & 0 deletions

@@ -0,0 +1,17 @@
+FROM python:3.8
+
+RUN mkdir /app
+WORKDIR /app
+
+RUN apt-get update
+RUN apt-get install -y git python-dev python3-dev
+
+RUN pip install poetry
+COPY poetry.lock pyproject.toml entrypoint.sh /app/
+COPY /src /app/src
+
+WORKDIR /app/
+RUN poetry export -f requirements.txt --output requirements.txt --without-hashes
+RUN pip3 install -r requirements.txt
+
+ENTRYPOINT ["sh", "entrypoint.sh"]

File renamed without changes.

apps/domain/entrypoint.sh

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+#!/bin/bash
+exec poetry run gunicorn --chdir ./src -k flask_sockets.worker --bind 0.0.0.0:$PORT wsgi:app \
+  "$@"
