README.md (+6 -2: 6 additions & 2 deletions)
@@ -480,17 +480,21 @@ General:
 - [ ] helpers to get schema (generate pgdump/mysqldump commands, get index stats, ...)
 - [x] protect against foreign key cycles. Both explicits and implicits (avoid generating implicits that would end up causing loops)
 - [x] detect selfpointing foreign keys
-- [ ] have some graph to show --coin-flip-percent with --bulk-size
 - [x] using --values-freq-map to make query parameters work
 
+Sampling:
+- [x] normal law through box-muller, select sqrt(-2*log(random()))*sin(2*pi()*random());
+- [x] pareto laws
+- [ ] have some graph to show --coin-flip-percent with --bulk-size
+
 Stepping stones to fully reproduce cardinalities:
 - [x] incorporating arbitrary values with fixed frequency into the bulk inserts
 - [x] table-per-table override for --rows, --null-frequency
 - [ ] coin-flip-percent per relationship basis. Current thought: adding it to --binomial this way --binomial="parent=child:70" to set the coinflip to 70 for this link
 - [ ] parse col/index stats (cardinality + most_common_elems + most_common_freqs for postgres, cardinalities for MySQL)
 
 Without clear plan:
-- [] More random algorithms (as of now, no good implementations has been found for pareto that wouldn't provoke huge runtime and/or huge memory consumption, unless implemented fields are restricted to integers)
+- [x] More random algorithms (as of now, no good implementations has been found for pareto that wouldn't provoke huge runtime and/or huge memory consumption, unless implemented fields are restricted to integers)
 - [ ] guessing joins on subqueries/cte. Joins wouldn't be based on columns, but on expressions
 - [ ] be able to "suplement" existing foreign keys with additional columns ?
return fmt.Sprintf("(SELECT %s, ROW_NUMBER() OVER (ORDER BY %s) as rownumber FROM %s.%s ) f", escapedFields, escapedFields, Escape(schema), Escape(table))
generate/generate.go (+26 -12: 26 additions & 12 deletions)
@@ -17,33 +17,40 @@ import (
 )
 
 type Insert struct {
-	table        *db.Table
-	writer       io.Writer
-	NotifyChan   chan int64
-	fklinks      ForeignKeyLinks
-	workersCount int
-	insertMutex  sync.Mutex
-	maxTextSize  int64
-	uuidVersion  int
-	maxRetries   int
-	frequencies  frequency.ColumnFrequency
+	table             *db.Table
+	writer            io.Writer
+	NotifyChan        chan int64
+	fklinks           ForeignKeyLinks
+	workersCount      int
+	insertMutex       sync.Mutex
+	maxTextSize       int64
+	uuidVersion       int
+	maxRetries        int
+	frequencies       frequency.ColumnFrequency
+	expectedTableSize int64
 }
 
 type ForeignKeyLinks struct {
-	DefaultRelationship string `name:"default-relationship" help:"Will define the default foreign-key relationship to apply. Possible values: ${BinomialFlag},${SequentialFlag}. The default relation can be overriden with other parameters --${BinomialFlag} or --${SequentialFlag}" enum:"${BinomialFlag},${SequentialFlag}" default:"${BinomialFlag}"`
+	DefaultRelationship string `name:"default-relationship" help:"Will define the default foreign-key relationship to apply. Possible values: ${BinomialFlag},${SequentialFlag}. The default relation can be overriden with other parameters --${BinomialFlag} or --${SequentialFlag}" enum:"${BinomialFlag},${SequentialFlag},${NormalFlag},${ParetoFlag}" default:"${BinomialFlag}"`
 	Binomial            map[string]string ` help:"Defines a 1-N foreign key relationships using repeated coin flips. Postgres' tablesamples Bernouilli or mysql RAND() < 0.1 (can be tuned with --coin-flip-percent). Format should be \"parent_table=child_table\" E.g: --${BinomialFlag}=\"customers=orders;orders=items\""`
 	Sequential          map[string]string `name:"sequential" help:"Defines a sequential foreign key links relationships, using SELECT ... LIMIT x OFFET y. Format should be \"parent_table=child_table\" E.g: --${SequentialFlag}=\"citizens=ssns\""`
 	CoinFlipPercent     float64 `name:"coin-flip-percent" help:"When used with ${BinomialFlag}, it will set the likeliness of each rows to be sampled or not. 10 would mean each rows have only 10%% chance to be selected when sampling a parent table. Using large values will favor hot rows: the coin flips are done with a table full scan, with a limit set at --bulk-size, so with a large percent chance most of the time the first rows will be selected. No effects when used with --${SequentialFlag}. Lower value (e.g 0.001) will also slow down the sampling speed" default:"1"`
+	Normal              map[string]string `help:"Defines a 1-N foreign key relationships using box-muller transformation to provide normal distribution"`
+	Pareto              map[string]string `help:"Defines a 1-N foreign key relationships using zipf (pareto) distribution"`