103 commits
e2d77fc
initial import
Nov 2, 2018
5ec01c5
randomforest initial work
Nov 30, 2018
82f8b63
moved around include
Nov 30, 2018
f814358
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Nov 30, 2018
a412109
renamed impl file
Dec 4, 2018
add0231
comment update
Dec 4, 2018
196847c
almost done getting randomforest into a primitive
Dec 4, 2018
ce81381
Merge branch 'master' of https://github.com/ct-clmsn/phylanx into ran…
Dec 4, 2018
d89cd11
fixed compile issues with randomforest
Dec 4, 2018
8f7b4d7
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Dec 4, 2018
4f01400
recompile test
Dec 4, 2018
f1646cd
compilation fixes part 2
Dec 4, 2018
2bf0a0f
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Dec 4, 2018
9e27773
fixed compilation issues v3
Dec 5, 2018
ece2a10
fixed exaplanation message
Dec 5, 2018
d6b22c2
whitespace alterations to pass linter
Dec 5, 2018
3f285a8
fixed argument test and error message
Dec 5, 2018
37d5d90
err w/param test
Dec 5, 2018
10b93c1
more wspace fixes
Dec 5, 2018
d43b953
more wspace fixes
Dec 5, 2018
c4cd8cf
fixed more code formatting issues
Dec 5, 2018
ec5db06
added ifdefs
Dec 5, 2018
e83899f
updated ifndefs
Dec 5, 2018
45430b4
starting to add serialization
Dec 7, 2018
f580364
removed tuple-of-vectors
Dec 13, 2018
a7ddc25
replaced boost variant function calls for phylanx variant calls
Dec 14, 2018
d017dba
resolved issues storing randomforest implementation into phylanx::ir:…
Dec 14, 2018
422cd12
storage into phylanx types
Dec 16, 2018
5f41619
mapping fit/predict functions into primitive
Dec 16, 2018
3107bdb
implemented fit/predict for randomforest
Dec 17, 2018
fe87a07
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Dec 17, 2018
ebd7427
stores forest classes properly now
Dec 17, 2018
5d4c273
added randomforest unit test
Dec 17, 2018
ff90ed7
fixed arg should be double issue
Dec 17, 2018
5cc50ff
added randomforest unit test to example dir
Dec 17, 2018
0614c40
print output
Dec 17, 2018
8fdf583
code formatting issues resolved
Dec 17, 2018
5be1847
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Dec 18, 2018
5c25a4d
renamed cpp demo
Dec 18, 2018
f11855f
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Dec 19, 2018
1a5e83d
added randomforest
Dec 19, 2018
bbd78a5
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jan 5, 2019
84d9c49
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jan 12, 2019
3c4b5e9
adding randomforest cpp test
Jan 22, 2019
437ed4d
fixing issues in randomforest
Jan 22, 2019
21083cc
fixing src issues
Jan 22, 2019
50d7c20
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jan 22, 2019
62c6552
update cmake
Jan 22, 2019
03fcf63
added example code
Jan 22, 2019
d3e97d5
resolved merge
Jan 22, 2019
536bf90
space modifications
Jan 22, 2019
9d9eb5f
added plugin
Jan 22, 2019
9664882
cd ../hpxMerge branch 'master' of https://github.com/STEllAR-GROUP/ph…
Jan 26, 2019
e5854f0
clean compile rf
Jan 31, 2019
0107053
added license
Jan 31, 2019
1f9531d
added cpp examples
Jan 31, 2019
92ee48f
updated cmake
Jan 31, 2019
eba9fe1
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Feb 14, 2019
a8b4813
git Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx…
Feb 15, 2019
c6080a0
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Feb 19, 2019
571794c
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Feb 27, 2019
8ce305f
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Feb 28, 2019
00d3660
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 1, 2019
95a6778
git statusMerge branch 'master' of https://github.com/STEllAR-GROUP/p…
Mar 2, 2019
e577b51
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 3, 2019
be71a03
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 3, 2019
3a0d51f
git remote -vMerge branch 'master' of https://github.com/STEllAR-GROU…
Mar 5, 2019
7d5dfe3
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 6, 2019
1abf34a
cd buildMerge branch 'master' of https://github.com/STEllAR-GROUP/phy…
Mar 7, 2019
3f99de0
added phylanx compiled randomforest
Mar 7, 2019
c231959
print out the line number in python where a multi-target (tuple) assi…
Mar 7, 2019
eba2ff1
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 8, 2019
a3898de
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 19, 2019
aba9385
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 20, 2019
83033db
cd buildMerge branch 'master' of https://github.com/STEllAR-GROUP/phy…
Mar 21, 2019
a3c5081
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 22, 2019
583026b
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Mar 26, 2019
0491875
cd ../hpxMerge branch 'master' of https://github.com/STEllAR-GROUP/ph…
Apr 20, 2019
dc9c912
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Apr 23, 2019
7426f87
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Apr 24, 2019
f64c618
randomforest with new node types works
May 14, 2019
e0e3b29
working version
May 17, 2019
8640668
clean compile of phylanx plugin wrapper
May 18, 2019
3228b90
cleaning up phylanx plugin wrapper
May 18, 2019
f85d0fd
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
May 18, 2019
1455f64
fixing up more flake8 issues
May 26, 2019
c30a2d1
fixed more spacing issues
Jun 5, 2019
95b7df1
fixed more spacing issues
Jun 5, 2019
46cbae6
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jun 5, 2019
95b250c
cd ../hpxMerge branch 'master' of https://github.com/STEllAR-GROUP/ph…
Jul 10, 2019
bb14237
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jul 25, 2019
0b61b9d
updated flake8 issues
Jul 25, 2019
cacffb9
format fixes for ci
Jul 31, 2019
48af936
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Jul 31, 2019
47f3624
ci inspection test
Jul 31, 2019
8b71581
more ci edits
Jul 31, 2019
b272f63
formatting fixes
Aug 11, 2019
2201286
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Aug 11, 2019
3fd5722
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Sep 7, 2019
e6bec46
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Sep 29, 2019
8d3be78
Merge branch 'master' of https://github.com/STEllAR-GROUP/phylanx int…
Oct 12, 2019
580686d
formatting fixes
Oct 12, 2019
6bfb509
fixed more formatting issues
Oct 12, 2019
1 change: 1 addition & 0 deletions examples/algorithms/CMakeLists.txt
@@ -8,6 +8,7 @@ set(subdirs
lir
lra
kmeans
randomforest
nn
)

36 changes: 36 additions & 0 deletions examples/algorithms/randomforest/CMakeLists.txt
@@ -0,0 +1,36 @@
# Copyright (c) 2017 Hartmut Kaiser
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

set(example_programs
    randomforest
    simple_randomforest
   )

foreach(example_program ${example_programs})

  set(${example_program}_FLAGS DEPENDENCIES iostreams_component)

  set(sources ${example_program}.cpp)

  source_group("Source Files" FILES ${sources})

  # add example executable
  add_phylanx_executable(${example_program}
    SOURCES ${sources}
    ${${example_program}_FLAGS}
    FOLDER "Examples/Algorithms")

  # add a custom target for this example
  add_phylanx_pseudo_target(examples.algorithms_.${example_program})

  # make pseudo-targets depend on master pseudo-target
  add_phylanx_pseudo_dependencies(examples.algorithms_
    examples.algorithms_.${example_program})

  # add dependencies to pseudo-target
  add_phylanx_pseudo_dependencies(examples.algorithms_.${example_program}
    ${example_program}_exe)
endforeach()

244 changes: 244 additions & 0 deletions examples/algorithms/randomforest/phylanx_randomforest.py
@@ -0,0 +1,244 @@
# Copyright (c) 2018 Christopher Taylor
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#
# significant re-working of the algorithm implementation found on this site:
#
# https://machinelearningmastery.com/implement-random-forest-scratch-python/
#

from numpy import floor, argsort, sqrt
from numpy import float64, int64, zeros
from numpy import argmax, inf, genfromtxt
from numpy import vstack, iinfo, finfo, unique
from numpy.random import randint, rand
from phylanx import Phylanx


def test_split(idx, val, dataset):
    left, right = list(), list()
    for i in range(dataset.shape[0]):
        row = dataset[i, :]
        if row[idx] < val:
            left.append(row)
        else:
            right.append(row)

    # an empty side is represented by a zero-length array
    if len(left) < 1 and len(right) > 0:
        return (zeros((0,)), vstack(right))
    elif len(left) > 0 and len(right) < 1:
        return (vstack(left), zeros((0,)))

    return (vstack(left), vstack(right))


def gini_index(groups, classes):
    groups_len = list([len(x) for x in groups])
    n_instances = float64(sum(groups_len))
    gini = 0.0

    p = zeros(len(classes), dtype=float64)
    for (group, group_len) in filter(
        lambda x: x[1] > 0, zip(groups, groups_len)
    ):
        for row in group:
            p[classes[int64(row[-1])]] += 1.0
        score = sum(((p / float64(group_len)) ** 2.0))
        gini += (1.0 - score) * float64(group_len / n_instances)
        p[:] = 0.0

    return gini


def get_split(dataset, n_features, classes):
    cls_values = zeros(len(classes), dtype=int64)
    for i, v in enumerate(classes):
        cls_values[v] = i

    b_idx = iinfo(int64).max
    b_val = finfo(float64).max
    b_score = finfo(float64).max
    b_groups = (list(), list())
    idx_w = randint(0, dataset.shape[1] - 1, size=dataset.shape[1] - 1)
    idx = zeros(dataset.shape[1] - 1, dtype=int64)

    for i in range(dataset.shape[1] - 1):
        idx[i] = i

    features = idx[argsort(idx_w)][:n_features]
    for feature in features:
        for r in range(dataset.shape[0]):
            groups = test_split(feature, dataset[r, feature], dataset)
            gini = gini_index(groups, cls_values)
            if gini < b_score:
                b_idx = feature
                b_val = dataset[r, feature]
                b_score = gini
                b_groups = groups

    return {'index': b_idx,
            'value': b_val,
            'groups': b_groups,
            'lw': inf,
            'rw': inf}


def to_terminal(group, classes):
    outcome_hist = zeros(len(classes), dtype=int64)
    for g in group:
        k = int64(g[-1])
        outcome_hist[classes[k]] += 1

    return argmax(outcome_hist)


def split(node, max_depth, min_sz, n_features, depth, classes):
    GRP, LFT, RHT, LW, RW = 'groups', 'left', 'right', 'lw', 'rw'

    (left, right) = node[GRP]
    del(node[GRP])

    if left.shape == (0,) or right.shape == (0,):
        if left.shape == (0,):
            term = to_terminal(right, classes)
        else:
            term = to_terminal(left, classes)

        node[LW] = term
        node[RW] = term
        return

    if depth >= max_depth:
        lterm = to_terminal(left, classes)
        rterm = to_terminal(right, classes)
        node[LW] = lterm
        node[RW] = rterm
        return

    if len(left) <= min_sz:
        node[LW] = to_terminal(left, classes)
    else:
        node[LFT] = get_split(left, n_features, classes)
        split(node[LFT], max_depth, min_sz, n_features, depth + 1, classes)

    if len(right) <= min_sz:
        node[RW] = to_terminal(right, classes)
    else:
        node[RHT] = get_split(right, n_features, classes)
        split(node[RHT], max_depth, min_sz, n_features, depth + 1, classes)


def build_tree(train, max_depth, min_sz, n_features, classes):
    root = get_split(train, n_features, classes)
    split(root, max_depth, min_sz, n_features, 1, classes)
    return root


def node_predict(node, r):
    if r[node['index']] < node['value']:
        if node['lw'] == inf:
            return node_predict(node['left'], r)
        else:
            return node['lw']
    else:
        if node['rw'] == inf:
            return node_predict(node['right'], r)
        else:
            return node['rw']


def subsample(dataset, ratio):
    n_sample = int64(floor(len(dataset) * ratio))
    idx_w = list(map(lambda x: rand(), range(dataset.shape[0])))
    idx_s = argsort(idx_w)
    # vstack needs a sequence, not a bare map iterator
    sample = vstack(
        list(map(lambda x: dataset[idx_s[x], :], range(n_sample))))
    return sample


def bagging_predict(trees, row, classes):
    predictions = list(map(lambda tree: node_predict(tree, row), trees))
    # parallel
    #
    # predictions =
    #     list(map(lambda tree:
    #              node_predict(trees[tree], row),
    #              prange(len(trees))))
    #
    classes_vec = zeros(len(classes), dtype=int64)
    for p in predictions:
        classes_vec[classes[p]] += 1

    idx = argmax(classes_vec)
    for (k, v) in classes.items():
        if v == idx:
            return k
    return inf


@Phylanx
def random_forest(train, max_depth, min_sz, sample_sz, n_trees):
    cls = unique(train[:, -1])
    classes = dict()
    print(cls.shape)
    for c in range(cls.shape[0]):
        classes[int64(cls[c])] = c

    # features tried per split: sqrt of the number of feature columns
    # (the last column of `train` holds the class label)
    n_features = int64(floor(sqrt(train.shape[1] - 1)))
    trees = list(
        map(lambda i:
            build_tree(
                subsample(train, sample_sz),
                max_depth,
                min_sz,
                n_features,
                classes
            ),
            range(n_trees))
    )

    # parallel
    #
    # trees =
    #     list(map(lambda i:
    #              build_tree(subsample(train, sample_sz),
    #                         max_depth, min_sz, n_features, classes),
    #              prange(n_trees)))
    #
    return {'trees': trees, 'classes': classes}


@Phylanx
def predict(randomforest, test):
    trees, classes = randomforest['trees'], randomforest['classes']
    predictions = list(
        map(lambda row:
            bagging_predict(
                trees,
                test[row, :],
                classes),
            range(len(test)))
    )

    return predictions


if __name__ == "__main__":
    file_name = "../datasets/breast_cancer.csv"
    dataset = genfromtxt(file_name, skip_header=1, delimiter=",")
    max_depth = 10
    min_size = 1
    sample_size = 1.0
    n_trees = [1, 5, 10]
    train = int64(dataset.shape[0] / 2)
    trees = random_forest(
        dataset[:train, :],
        max_depth,
        min_size,
        sample_size,
        n_trees[1]
    )

    print('predict')
    # do not rebind the name `predict`; it is the compiled function above
    predictions = predict(trees, dataset[train:, :])
    print(predictions)
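The split criterion computed by `gini_index` above is the weighted Gini impurity: for each non-empty group, one minus the sum of squared class proportions, weighted by the group's share of all rows. The sketch below restates that formula on plain label lists instead of NumPy rows so it can be checked in isolation; `gini_impurity` is a hypothetical helper and not part of this PR.

```python
from collections import Counter


def gini_impurity(groups):
    """Weighted Gini impurity of a candidate split.

    Each group is a list of class labels (the last column of each row
    in gini_index); lower is better, 0.0 means every group is pure.
    """
    n_total = sum(len(g) for g in groups)
    gini = 0.0
    for g in groups:
        if not g:
            continue  # empty groups contribute nothing, as in gini_index
        counts = Counter(g)
        score = sum((c / len(g)) ** 2 for c in counts.values())
        gini += (1.0 - score) * (len(g) / n_total)
    return gini


# A perfect split: each side holds a single class.
print(gini_impurity([[0, 0], [1, 1]]))  # 0.0
# The worst two-class split: both sides are 50/50.
print(gini_impurity([[0, 1], [0, 1]]))  # 0.5
```

`get_split` evaluates this score for every candidate (feature, threshold) pair among the sampled features and keeps the pair with the lowest value.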
109 changes: 109 additions & 0 deletions examples/algorithms/randomforest/randomforest.cpp
@@ -0,0 +1,109 @@
// Copyright (c) 2018 Christopher Taylor
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <phylanx/phylanx.hpp>
#include <phylanx/plugins/algorithms/impl/randomforest.hpp>
#include <phylanx/version.hpp>
#include <phylanx/config.hpp>
#include <phylanx/config/version.hpp>

#include <blaze/Blaze.h>

#include <hpx/hpx_init.hpp>
#include <hpx/include/agas.hpp>
#include <hpx/runtime_fwd.hpp>


#include <blaze/Math.h>
#include <boost/program_options.hpp>

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

#include <phylanx/ir/node_data.hpp>
#include "impl/randomforest.hpp"

using namespace phylanx::algorithms::impl;

int hpx_main(boost::program_options::variables_map& vm)
{
    // read command-line options
    auto ntrees = vm["trees"].as<std::uint64_t>();
    auto mnsize = vm["minsize"].as<std::uint64_t>();
    auto mxdepth = vm["maxdepth"].as<std::uint64_t>();
    auto samplesize = vm["samples"].as<double>();

    blaze::DynamicMatrix<double> train{ { 1.0, 1.0, 1.0, 1.0, 0.0 }
        , { 1.0, 1.0, 1.0, 1.0, 0.0 }
        , { 1.0, 1.0, 1.0, 1.0, 1.0 }
        , { 1.0, 1.0, 1.0, 1.0, 1.0 }
    };

    blaze::DynamicVector<double> labels { 1.0, 1.0, 1.0, 1.0 };

    auto const train_submat_data = blaze::submatrix( train, 0UL
        , 0UL, train.rows(), train.columns()-1UL );

    randomforest_impl rf(ntrees);

    // Measure execution time
    hpx::util::high_resolution_timer traintimer;

    rf.fit(train, labels, mxdepth, mnsize, samplesize);

    auto trainelapsed = traintimer.elapsed();

    blaze::DynamicVector<double> results(train.rows());

    // Make sure all counters are properly initialized,
    // don't reset current counter values
    hpx::reinit_active_counters(false);

    hpx::util::high_resolution_timer predicttimer;

    rf.predict(train, results);

    auto predictelapsed = predicttimer.elapsed();

    // Make sure all counters are properly initialized, don't reset current
    // counter values
    hpx::reinit_active_counters(false);

    std::cout << "fit elapsed\t" << trainelapsed << std::endl;
    std::cout << "predict elapsed\t" << predictelapsed << std::endl;

    for (auto& r : results)
    {
        std::cout << r << std::endl;
    }

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // command line handling
    boost::program_options::options_description desc(
        "usage: randomforest [options]");

    desc.add_options()
        ("trees,t",
            boost::program_options::value<std::uint64_t>()->default_value(5),
            "number of trees (default: 5)")
        ("samples,s",
            boost::program_options::value<double>()->default_value(1.0),
            "ratio of sample size (default: 1.0)")
        ("minsize,m",
            boost::program_options::value<std::uint64_t>()->default_value(1),
            "min size (default: 1)")
        ("maxdepth,d",
            boost::program_options::value<std::uint64_t>()->default_value(10),
            "max depth (default: 10)");

    return hpx::init(desc, argc, argv);
}
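The final step of `bagging_predict` in phylanx_randomforest.py is a majority vote: it builds a histogram of the per-tree predictions and returns the label with the most votes. The sketch below isolates that step on plain Python lists. `majority_vote` is a hypothetical helper, and breaking ties toward the smallest label is an assumption chosen to mirror `numpy.argmax`, which returns the first maximum of a histogram whose indices follow the sorted order produced by `numpy.unique`.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among per-tree predictions.

    Ties go to the smallest label (assumed to mirror argmax over a
    class histogram indexed in sorted label order).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


print(majority_vote([1, 0, 1, 1, 0]))  # 1
print(majority_vote([0, 1]))  # 0 (tie broken toward the smaller label)
```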