Skip to content

Commit 49e2ed7

Browse files
Merge pull request #7 from LCSB-BioCore/develop
Develop
2 parents bd34c68 + 677d834 commit 49e2ed7

11 files changed

Lines changed: 846 additions & 8 deletions

File tree

.github/workflows/docs.yml

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# ref: https://juliadocs.github.io/Documenter.jl/stable/man/hosting/#GitHub-Actions-1
2+
name: Documentation
3+
4+
on:
5+
push:
6+
branches:
7+
- develop
8+
tags: '*'
9+
pull_request:
10+
release:
11+
types: [published, created]
12+
13+
jobs:
14+
build:
15+
runs-on: ubuntu-latest
16+
steps:
17+
- uses: actions/checkout@v2
18+
- uses: julia-actions/setup-julia@latest
19+
with:
20+
version: 1.5
21+
- name: Install dependencies
22+
run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
23+
- name: Build and deploy
24+
env:
25+
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
26+
run: julia --project=docs/ docs/make.jl

.gitlab-ci.yml

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
image: $CI_REGISTRY/r3/docker/julia-custom
2+
3+
stages:
4+
- build
5+
6+
variables:
7+
GIT_STRATEGY: clone
8+
9+
.global_settings: &global_settings
10+
tags:
11+
- artenolis
12+
- slave01
13+
14+
.global_testing: &global_testing
15+
script:
16+
- $ARTENOLIS_SOFT_PATH/julia/$JULIA_VER/bin/julia --color=yes --project=@. -e 'import Pkg; Pkg.test(; coverage = true)'
17+
18+
.global_testing_win: &global_testing_win
19+
script:
20+
- Invoke-Expression $Env:ARTENOLIS_SOFT_PATH"\julia\"$Env:JULIA_VER"\bin\julia --color=yes --project=@. -e 'import Pkg; Pkg.test(; coverage = true)'"
21+
22+
linux julia v1.5:
23+
stage: build
24+
variables:
25+
JULIA_VER: "v1.5.3"
26+
<<: *global_settings
27+
<<: *global_testing
28+
29+
windows10:
30+
stage: build
31+
tags:
32+
- artenolis
33+
- windows10
34+
variables:
35+
JULIA_VER: "v1.5.3"
36+
<<: *global_testing_win
37+
38+
windows8:
39+
stage: build
40+
tags:
41+
- artenolis
42+
- windows8
43+
variables:
44+
JULIA_VER: "v1.5.3"
45+
<<: *global_testing_win

.travis.yml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
language: julia
2+
3+
branches:
4+
only:
5+
- develop
6+
- master
7+
8+
os:
9+
- linux
10+
11+
julia:
12+
- 1.0
13+
- 1.5
14+
15+
notifications:
16+
email: false

README.md

Lines changed: 136 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,140 @@
1-
# DiDa.jl
1+
# <img src="docs/src/assets/logo.svg" alt="DiDa.jl logo" height="32px"> DiDa.jl
22

3-
Simple Distributed Data manipulation and processing routines for Julia.
3+
4+
| Build status | Documentation |
5+
|:---:|:---:|
6+
| ![CI](https://github.com/LCSB-BioCore/DiDa.jl/workflows/CI/badge.svg?branch=develop) | [![doc](https://img.shields.io/badge/docs-stable-blue)](https://lcsb-biocore.github.io/DiDa.jl/stable/) [![doc](https://img.shields.io/badge/docs-dev-blue)](https://lcsb-biocore.github.io/DiDa.jl/dev/) |
7+
8+
Simple distributed data manipulation and processing routines for Julia.
49

510
This was originally developed for
6-
[GigaSOM.jl](https://github.com/LCSB-BioCore/GigaSOM.jl), this package contains
7-
the separated-out lightweight distributed-processing framework that can be used
8-
with GigaSOM.
11+
[`GigaSOM.jl`](https://github.com/LCSB-BioCore/GigaSOM.jl); DiDa.jl package
12+
contains the separated-out lightweight distributed-processing framework that
13+
was used in `GigaSOM.jl`.
14+
15+
## Why?
16+
17+
DiDa.jl provides a very simple, imperative and straightforward way to move your
18+
data around a cluster of Julia processes created by the
19+
[`Distributed`](https://docs.julialang.org/en/v1/stdlib/Distributed/) package,
20+
and run computation on the distributed data pieces. The main aim of the package
21+
is to avoid anything complicated-- the first version used in
22+
[GigaSOM](https://github.com/LCSB-BioCore/GigaSOM.jl) had just under 500 lines
23+
of relatively straightforward code (including the doc-comments).
24+
25+
Compared to plain `Distributed` API, you get more straightforward data
26+
manipulation primitives, some extra control over the precise place where code
27+
is executed, and a few high-level functions. These include a distributed
28+
version of `mapreduce`, simpler work-alike of the
29+
[DistributedArrays](https://github.com/JuliaParallel/DistributedArrays.jl)
30+
functionality, and easy-to-use distributed dataset saving and loading.
31+
32+
Most importantly, the main motivation behind the package is that the
33+
distributed processing should be simple and accessible.
34+
35+
## Brief how-to
36+
37+
The package provides a few very basic primitives that lightly wrap the
38+
`Distributed` package functions `remotecall` and `fetch`. The most basic one is
39+
`save_at`, which takes a worker ID, variable name and variable content, and
40+
saves the content to the variable on the selected worker. `get_from` works the
41+
same way, but takes the data back from the worker.
42+
43+
You can thus send some random array to a few distributed workers:
44+
45+
```julia
46+
julia> using Distributed, DiDa
47+
48+
julia> addprocs(2)
49+
2-element Array{Int64,1}:
50+
2
51+
3
52+
53+
julia> @everywhere using DiDa
54+
55+
julia> save_at(2, :x, randn(10,10))
56+
Future(2, 1, 4, nothing)
57+
```
58+
59+
The `Future` returned from `save_at` is the normal Julia future from
60+
`Distributed`, you can even `fetch` it to wait until the operation is really
61+
done on the other side. Fetching the data is done the same way:
62+
63+
```julia
64+
julia> get_from(2,:x)
65+
Future(2, 1, 15, nothing)
66+
67+
julia> get_val_from(2,:x) # auto-fetch()ing variant
68+
10×10 Array{Float64,2}:
69+
-0.850788 0.946637 1.78006
70+
-0.49596 0.497829 -2.03013
71+
72+
```
73+
74+
All commands support full quoting, which allows you to easily distinguish
75+
between code parts that are executed locally and remotely:
76+
77+
```julia
78+
julia> save_at(3, :x, randn(1000,1000)) # generates a matrix locally and sends it to the remote worker
79+
80+
julia> save_at(3, :x, :(randn(1000,1000))) # generates a matrix right on the remote worker and saves it there
81+
82+
julia> get_val_from(3, :x) # retrieves the generated matrix and fetches it
83+
84+
85+
julia> get_val_from(3, :(randn(1000,1000))) # generates the matrix on the worker and fetches the data
86+
87+
```
88+
89+
Notably, this is different from the approach taken by `DistributedArrays` and
90+
similar packages -- all data manipulation is explicit, and any data type is
91+
supported as long as it can be moved among workers by the `Distributed`
92+
package. This helps with various highly non-array-ish data, such as large text
93+
corpora and graphs.
94+
95+
There are various goodies for easy work with matrix-style data, namely
96+
scattering, gathering and running distributed algorithms:
97+
98+
```julia
99+
julia> x = randn(1000,3)
100+
1000×3 Array{Float64,2}:
101+
-0.992481 0.551064 1.67424
102+
-0.751304 -0.845055 0.105311
103+
-0.712687 0.165619 -0.469055
104+
105+
106+
julia> dataset = scatter_array(:myDataset, x, workers()) # sends slices of the array to workers
107+
Dinfo(:myDataset, [2, 3]) # a helper for holding the variable name and the used workers together
108+
109+
julia> get_val_from(3, :(size(myDataset)))
110+
(500, 3) # there's really only half of the data
111+
112+
julia> dmapreduce(dataset, sum, +) # MapReduce-style sum of all data
113+
-51.64369103751014
114+
115+
julia> dstat(dataset, [1,2,3]) # get means and sdevs in individual columns
116+
([-0.030724038974465212, 0.007300925745200863, -0.028220577808245786],
117+
[0.9917470012495775, 0.9975120525455358, 1.000243845434252])
118+
119+
julia> dmedian(dataset, [1,2,3]) # distributed iterative median in columns
120+
3-element Array{Float64,1}:
121+
0.004742259615849834
122+
0.039043266340824986
123+
-0.05367799062404967
124+
125+
julia> dtransform(dataset, x -> 2 .^ x) # exponentiate all data (medians should now be around 1)
126+
Dinfo(:myDataset, [2, 3])
127+
128+
julia> gather_array(dataset) # download the data from workers to a sing
129+
1000×3 Array{Float64,2}:
130+
0.502613 1.46517 3.1915
131+
0.594066 0.55669 1.07573
132+
0.610183 1.12165 0.722438
133+
134+
```
135+
136+
## What does the name `DiDa` mean?
137+
138+
**Di**stributed **Da**ta.
9139

140+
There is no consensus on how to pronounce the shortcut.

docs/Project.toml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
[deps]
2+
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
3+
DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"

docs/make.jl

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
using Documenter, DiDa
2+
3+
makedocs(modules = [DiDa],
4+
clean = false,
5+
format = Documenter.HTML(prettyurls = !("local" in ARGS)),
6+
sitename = "DiDa.jl",
7+
authors = "The developers of DiDa.jl",
8+
linkcheck = !("skiplinks" in ARGS),
9+
pages = [
10+
"Documentation" => "index.md",
11+
"Tutorial" => "tutorial.md",
12+
"Function reference" => "functions.md",
13+
],
14+
)
15+
16+
deploydocs(
17+
repo = "github.com/LCSB-BioCore/DiDa.jl.git",
18+
target = "build",
19+
branch = "gh-pages",
20+
devbranch = "develop",
21+
versions = "stable" => "v^",
22+
)

docs/src/assets/logo.svg

Lines changed: 126 additions & 0 deletions
Loading

0 commit comments

Comments
 (0)