-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathml.Rmd
More file actions
172 lines (112 loc) · 5.86 KB
/
ml.Rmd
File metadata and controls
172 lines (112 loc) · 5.86 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
title: "Machine Learning For Retail"
author: "Roby"
date: "8/5/2020"
output: html_document
---
This is a project from DqLab that conducts basic data science on some simple retail data and machine learning to produce product package recommendations that can solve stock problems and increase sales.
## About Store
DQLab.id Fashion is a fashion shop that sells various products such as jeans, shirts, cosmetics, and others. Although it is quite developed, but with the increasing number of competitors and many products whose stocks are still large, it certainly worries DQLab.id Fashion managers.
One solution is to create innovative packages. Where products that were previously unsold but have market share can even be packaged and sold.
## What We Need To Do
Several steps were taken to complete this project. First is prepare the library and download the required dataset. Then the raw data is cleaned up and scaled using standardization and normalization prior to modeling. And to realize this, using the `Apriori algorithm` of the `Arules` package throughout this project.
1. Get insight into the top 10 and bottom 10 of the products sold.
2. Get a list of all product package combinations with strong correlations.
3. Get a list of all product package combinations with specific items.
## Prepare Required
install.packages("arules")
library(arules)
### Import Dataset
```{r}
tabular_transaction <- read.transactions(file="data_input/transaksi_dqlab_retail.tsv", format="single", sep="\t", cols=c(1,2), skip=1)
write(tabular_transaction, file="data_input/project_retail_1.txt", sep=",")
```
### Data summaries can be seen to get initial understanding as follow:
```{r}
head(tabular_transaction)
```
```{r}
dim(tabular_transaction)
```
```{r}
summary(tabular_transaction)
```
## Get insight into the top 10 and bottom 10 of the products sold
### Initial Output: Top 10 Statistics
```{r}
data_item <- itemFrequency(tabular_transaction, type="absolute")
data_item <- sort(data_item, decreasing = TRUE)
data_item <- data_item[1:10]
data_item <- data.frame("Nama Produk"=names(data_item), "Jumlah"=data_item, row.names=NULL)
data_item
```
The results are saved in the `top10_item_retail.txt` file.
```{r}
write.csv(data_item, file="data_output/top10_item_retail.txt")
```
### Initial Output: Bottom 10 Statistics
```{r}
data_item2 <- itemFrequency(tabular_transaction, type="absolute")
data_item2 <- sort(data_item2, decreasing = FALSE)
data_item2 <- data_item2[1:10]
data_item2 <- data.frame("Nama Produk"=names(data_item2), "Jumlah"=data_item2, row.names=NULL)
data_item2
```
The results are saved in the `bottom10_item_retail.txt` file.
```{r}
write.csv(data_item2, file="data_output/bottom10_item_retail.txt")
```
## Get a list of all product package combinations with strong correlations
## Get interesting product combinations:
1. Have close associations or relationships.
2. Product combinations of at least 2 items, and a maximum of 3 items.
3. The combination of products that appear at least 10 of all transactions.
4. Have a minimum confidence level of 50 percent.
```{r}
apriori_rules <- apriori(tabular_transaction, parameter=list(supp=10/length(tabular_transaction), conf=0.5, minlen=2, maxlen=3))
apriori_rules <- head(sort(apriori_rules, by='lift', decreasing = TRUE),n=10)
inspect(apriori_rules)
```
The results are saved in the `retail_combination.txt` file.
```{r}
write(apriori_rules, file="data_output/retail_combination.txt")
```
## Get a list of all product package combinations with specific items
### Look for Product Packages that can be paired with a Slow-Moving Item
Slow-moving items are products whose sales movements are slow or not fast enough. This will be problematic if the product items are still piling up.
Sometimes this item may not necessarily sell, it's just that the price may not be good and rarely needed if sold in units. Now, if the units are not sold we need to find a strong association of these product items with other products so that if it is packaged it will be more attractive.
From the previous analysis `apriori_rules`, the two product items are `Tas Makeup` and `Baju Renang Pria Anak`. The manager wants to ask for a combination that can be bundled with the two products.
Each of these products was issued 3 rules with the strongest associations, so there are a total of 6 rules. The requirements of this strong association are still the same as those previously mentioned by the Manager, unless the confidence level is tried at a minimum level of 0.1.
```{r}
transaction_file <- "data_input/transaksi_dqlab_retail.tsv"
tabular_transaction2 <- read.transactions(
file = transaction_file,
format = "single",
sep = "\t",
cols = c(1,2),
skip = 1
)
transaction_amount <- length(tabular_transaction2)
minimal_appearance_count <- 10
apriori_rules_a <- apriori(
tabular_transaction2,
parameter= list(
supp=minimal_appearance_count/transaction_amount,
conf=0.1,
minlen=2,
maxlen=3)
)
#Filter
apriori_rules1 <- subset(apriori_rules_a, lift > 1 & rhs %in% "Tas Makeup")
apriori_rules1 <- sort(apriori_rules1, by='lift', decreasing = T)[1:3]
apriori_rules2 <- subset(apriori_rules_a, lift > 1 & rhs %in% "Baju Renang Pria Anak-anak")
apriori_rules2 <- sort(apriori_rules2, by='lift', decreasing = T)[1:3]
apriori_rules_a <- c(apriori_rules1, apriori_rules2)
inspect(apriori_rules_a)
```
The results are saved in the `retail_combination_slow_moving.txt` file.
```{r}
write(apriori_rules_a,file="data_output/retail_combination_slow_moving.txt")
```
## Conclusion
Finally, product package recommendations can helps managers to identify attractive product packages to be packaged so that they can ultimately increase the profitability and loyalty of DQLab.id Fashion customers.