0.2.0
1. Add support for multiple windows + static data in the RSP engine
2. Modify webui to support stream ingestion for RSP
3. Refactoring of materialisation & knowledge graph code
4. Fixes DStream and IStream R2S operators
5. Adds Reasoning support to RSP (single window materialization)
6. Add fraud detection system scenario and fix a bug with windowing
7. Fix a problem with combination_ml example by creating temperature predictor
8. Cover reasoning with tests
9. Fix a bug in reasoning related to filtering
10. Slightly extend the reasoning tests
11. Add RDF-star and SPARQL-star support (quoted triples, Turtle-star/N-Triples-star parsing, SPARQL-star querying)
12. Add SPARQL-star built-in functions: TRIPLE, SUBJECT, PREDICATE, OBJECT, isTRIPLE
13. Add DELETE keyword support (basic and DELETE WHERE)
14. Fix INSERT to handle both normal and quoted triples
15. Add N-Triples-star and Turtle-star serialization (generate_ntriples, generate_turtle)
16. Improve Streamertail optimizer with quoted triple cardinality estimation
17. Fix ArithmeticExpression support in FILTER (properly store and evaluate AST instead of raw strings)
18. Add probabilistic RULE syntax with PROB annotations (combination, threshold, confidence)
19. Add ProbabilityStore for per-triple confidence score tracking
20. Add probabilistic semi-naive materialisation algorithm
21. Fix literal-in-quoted-triple bug in parse_ntriples_line
22. Add provenance semiring framework and TagStore for semiring-based reasoning
23. Remove old probabilistic reasoning path and migrate RULE PROB inference to provenance-based materialisation
24. Update combined syntax and combined rules examples to use provenance reasoning
25. Add stratified negation-as-failure (NAF) support: `negative_premise` field in Rule, safety check, `try_add_rule`
26. Add NOT body atom syntax in SPARQL rule parser
27. Add single-negative-stratum evaluation in provenance semi-naive materialisation
28. Add `WmcProvenance` (exact DNF-based WMC with signed literals, De Morgan negation, contradictory-clause pruning)
29. Add `TopKProofs::negate()` approximate complement via synthetic probability seed
30. Add polarity-aware RDF-star explanation export (`prob:hasNegatedSeed`)
31. Add SDD (Sentential Decision Diagram) engine: `SddManager`, vtree, apply, negate, WMC — O(|SDD|) inference
32. Add `SddProvenance` implementing `Provenance` trait backed by SDD
33. Add `PROB(combination=sdd)` parser arm
34. Rename `WmcProvenance` to `DnfWmcProvenance` with backward-compat type alias
35. Add `pub mod sdd` to shared crate
36. Add `SeedSpec`, `ExclusiveChoice`, and differentiable SDD utilities for neural fact seeding
37. Extend `SddManager` with explicit negative weights, variable kinds, and `exactly_one` support
38. Add neural training query structures: loss functions, optimizer kinds, and training clause metadata
39. Add seeded SDD materialisation bridge and semi-naive inference with initial tags
40. Add Candle-based CPU `MlpNeuralPredicate` with binary and categorical outputs
41. Add ML feature loader for numeric training rows from SPARQL queries
42. Add end-to-end neural training execution with exact SDD probabilities and gradient backpropagation
43. Add parser support for first-class neural training and wire training into combined rule execution
44. Add tests for exclusive SDD semantics, differentiable WMC gradients, feature loading, parser coverage, and end-to-end ML training
45. Add first-class neurosymbolic syntax: `MODEL`, `NEURAL RELATION`, and `TRAIN NEURAL RELATION`
46. Extend parser and query model to store neural declarations alongside `PREFIX`, `RULE`, and normal queries
47. Add neural relation registry and model artifact tracking in `SparqlDatabase`
48. Add runtime support to materialize neural relations as normal RDF predicates in `WHERE` and `RULE` patterns
49. Keep `ML.PREDICT` compatibility lowering into the first-class neural relation path
50. Add relation-driven training with `DATA { ... }` and fallback training with `QUERY { SELECT ... }`
51. Rewrite MNIST neurosymbolic example to use the new first-class neural syntax
52. Add parser and runtime tests for first-class neural syntax, `ML.PREDICT` alias lowering, and neural relation execution
53. Remove the legacy public `ML.TRAIN(...)` compatibility syntax in favor of `TRAIN NEURAL RELATION`
54. Extend `ML.PREDICT` with Candle-backed execution for models trained via `TRAIN NEURAL RELATION`, automatic rule-time materialization of ML outputs, Python fallback for legacy models, and probability companion facts for binary predictions
55. Small refactor of SparqlDatabase
56. Add cross-window SDS/SDS+ representation for RSP-QL reasoning under annotated predicates
57. Add `ExpirationProvenance` semiring with max/min expiry propagation for cross-window entailment
58. Add naive cross-window SDS+ materialisation algorithm for full recomputation
59. Add incremental cross-window SDS+ materialisation algorithm with `SdsWithExpiry`
60. Fix provenance semi-naive materialisation to re-trigger derivations when existing fact tags improve
61. Add explicit initial-delta API for provenance semi-naive reasoning so incremental SDS+ starts from `D_new`
62. Extend N3 document parser for shared prefix blocks, multiple rules, full IRI terms, optional final conclusion dots, and leftover-input validation
63. Add SDS-aware N3 rule parser wrapper with `WindowContext` and component/window IRI discovery
64. Add correctness tests comparing naive and incremental cross-window SDS+ outputs and expiry behavior
65. Add RSP builder API for opt-in cross-window rules via `add_cross_window_rules`
66. Integrate cross-window SDS+ reasoning into the RSP engine for `Triple` + `SimpleR2R`
67. Preserve event timestamps in RSP window contents for SDS construction
68. Add selectable cross-window reasoning mode for RSP evaluation: naive vs incremental
69. Add RSP integration tests for cross-window derived facts, no-rule baseline, and expiry behavior
70. Add CityBench-inspired cross-window benchmarks for naive vs incremental SDS+ reasoning
71. Add RSP-engine CityBench-inspired benchmark using generated streams, S2R windowing, and cross-window rules
72. Add family-tree cross-window SDS+ benchmark with parentOf facts in SDS stream 1 and all other family facts in SDS stream 2
73. Add old/new data-ratio evaluation for family-tree benchmark to compare incremental update against naive from-scratch recomputation
74. Add first-class top-level `ML.PREDICT` syntax outside `RULE`, with neural-relation-based materialization after training, ambiguity checks for model-to-relation mapping, parser/runtime tests, and updated `predict_after_train` example
75. Clean Python bindings API by exposing `kolibrie.SparqlDatabase`, `KnowledgeGraph`, and related classes without `Py*` prefixes; add `SparqlDatabase.load_file(path, format=None)` with Turtle, N-Triples, and RDF/XML dispatch; update Python examples
76. Fix Streamertail star-join planning to avoid object-position star explosions in path queries
77. Add graph-aware `DatasetIndex` with `GraphId`, `Quad`, `QuadPattern`, graph-scoped indexes, default-graph compatibility methods, and named-graph lookup support
78. Migrate `SparqlDatabase` canonical storage from `UnifiedIndex`/`BTreeSet<Triple>` to `DatasetIndex`, keeping triples as the default-graph public API
79. Add default and named graph insert/delete/query paths, N-Quads parsing/serialization, graph clearing, and same-triple-in-multiple-graphs handling
80. Add SPARQL named graph execution support for `GRAPH <g>`, `GRAPH ?g`, `FROM NAMED`, and mixed default/named graph joins
81. Update optimizer, Streamertail execution, query builder/engine, RSP code, ML execution, HTTP server, and Datalog reasoning to read/write through `DatasetIndex`
82. Remove legacy `index_manager` module usage and direct `SparqlDatabase.triples` storage access from runtime code
83. Add named graph tests for default graph isolation, named graph selection, graph variable binding, `FROM NAMED` filtering, and N-Quads roundtrip
84. Re-run WatDiv 10M benchmark on the DatasetIndex-backed implementation and record that default graph queries complete without catastrophic regression

0.1.1
1. Modify whole project by making Cargo workspace
2. Modify GPU CUDA
3. Add proc_macro for GPU ```[gpu::main]```
4. Add cuda example
5. Add possibility to make a user defined function
6. Add user defined function example
7. Add 'CONCAT' keyword
8. Add indexing optimization
9. Modify WHERE clause to accept a URI in addition to a variable or literal
10. Modify query by adding semicolon
11. Modify query, ability to write nested query
12. Add Datalog engine
13. Add Trie indexing for Datalog engine
14. Modify workspace split everything into parts (triple&dict -> shared, kg -> datalog)
15. Modify N3 logic by including nested rules
16. Modify N3 parser to utilize rayon
17. Modify Indexing by removing Trie algorithm and using my own from triple
18. Modify Indexing by adding Rule Index, add some examples
19. Modify knowledge Graph adding parallel processing for semi-naive
20. Modify SPARQL syntax by combining SPARQL + LP
21. Modify SPARQL syntax make execution of SPARQL + LP (N3 logic)
22. Integrate project with Python (currently only Datalog)
23. Modify SPARQL by making CUDA as a feature (by default disabled)
24. Add ARM instructions to 'FILTER'
25. Small cleaning
26. Modify Datalog by adding inconsistency
27. Add example in Rust and Python with inconsistency
28. Fix problem with inconsistency
29. Fix SIMD part for ARM instructions
30. Modify SPARQL + LP syntax by adding multiple parameter set
31. Add parsing for machine learning SPARQL + LP + ML
32. Ability to use machine learning models
33. Modify execution, make it as a separate file
34. Add ARM support for machine learning wrapper
35. Minor fix of example with backward chaining
36. Modify machine learning execution, more depend on query
37. Add error handling
38. Modify ML handler to use MLSchema and ability to run multiple ML models
39. Modify FILTER to support expressions such as: ?age > 10 && ?age < 15 or ?age < 10 || ?age > 15
40. Add multiple conclusion, add example with mqtt (real scenario)
41. Modify multiple conclusion
42. Fix mqtt real scenario
43. Modify ML handler for using different machine learning algorithms
44. Modify ML execution to make it more generic
45. Modify ML handler by making MLSchema global, clean some parts of ML
46. Minor update of ML handler
47. Fix problem with handling different ML models
48. Modify ML handler by taking and compare models from Turtle file
49. Modify FILTER by adding arithmetic expression
50. Clean SPARQL
51. Design QueryBuilder class for functional API
52. Add into Python wrapper QueryBuilder
53. Modify License
54. Add RSP
55. Modify RULE syntax to use CONSTRUCT instead of an N3 logic conclusion
56. Separate RULE syntax and SPARQL syntax
57. Add tests for parser
58. Improve License
59. Add Dockerfile
60. Minor fix in ML examples
61. Minor fix of dependency vulnerability
62. Modify ML by adding ability to link OUTPUT with CONSTRUCT clause
63. Add an ML example with link OUTPUT and CONSTRUCT clause
64. Fix ML example with link OUTPUT and CONSTRUCT clause
65. Integrate RSP with QueryBuilder
66. Add tests for QueryBuilder
67. Modify wrapper of QueryBuilder by adding RSP
68. Add an example in Python for RSP
69. Add N-Triples parser
70. Add in some files license header
71. Ability to query 10M triples
72. Improve optimizer and joining algorithm for processing 10M triples
73. Use sorted-merge join algorithm instead of nested loop for big data
74. Modify sorted-merge join algorithm by using IDs instead of string
75. Integrate RSP with a parser
76. Minor fix of RULE syntax
77. Modify parser for execution when RSP
78. Add Hierarchy reasoning
79. Major update hierarchy reasoning
80. Add LIMIT
81. Major fix of RULE syntax
82. Update README and minor change
83. Add CLI for Kolibrie
84. Update one dependency to a new version
85. Update dependencies for parsing
86. Add combination stream
87. Add new queries for processing 10M triples
88. Improve volcano optimizer
89. Improve volcano optimizer by using IDs instead of string
90. Improve volcano optimizer by improving cardinality estimator and join algorithms
91. Minor change
92. Minor fix of queries in combination example
93. Improve parser by adding Retrieve RSP-QL syntax and add example
94. Minor fix of volcano optimizer
95. Add ORDER BY keyword
96. Separate join by adding into shared workspace
97. Modify naive and semi-naive algorithms by using hash join algorithm
98. Small cleaning
99. Cargo cleaning
100. Minor fix of predicate
101. Add initial RSP support
102. Modify SparqlDatabase to isolate parsing, encoding and adding of N-Triples data
103. Minor fix of predicate
104. Modify SparqlDatabase and UnifiedIndex by adding deletion for triples, with unit tests
105. Multiple window support for RSP-QL syntax
106. Fix a small bug with windowing (retrieve the queries that are executed in each window)
107. Major refactoring of Volcano Optimizer (splitting into separate files)
108. Major update of Volcano Optimizer: correctly execute all query patterns in optimizer
109. Store into cache database stats
110. Properly estimate operators
111. Replace BTreeMap with HashMap
112. Improve join algorithms
113. Add ability to recognize different shapes of query such as snowflake, line, star
114. Rename knowledge_graph.rs to reasoning.rs and rename class KnowledgeGraph to Reasoner & remove unused class
115. Improve Volcano Optimizer by including filtering
116. Improve Volcano Optimizer by adding nested query support
117. Improve Volcano Optimizer by adding BIND, UDF, INSERT, VALUES support
118. Modified RSP engine
119. Improve windowing of RSP engine
120. Add slicing for BIND instead of Vec
121. Add multi-window support for SingleThread
122. Rename Volcano Optimizer to Streamertail
123. Improve error handler
124. Add StorageManager and trait for coordinating between static and streaming data
125. Add join operation for RSP engine
126. Modify RSP engine by using crossbeam instead of mpsc
127. Full rename of volcano optimizer into streamertail optimizer
128. Modify LSM-Tree by supporting disk storage layer
129. Fix problem with ML execution
130. Add full combination example (RSP + Reasoning + ML)
131. Split LSM-Tree
132. MLPredict operator for Streamertail optimizer
133. Improve docker, webui and add support of n-triples
134. Fix a bug with reasoning n-triples
135. Fix a bug with a select all query
136. Add syntactic sugar 'a' for rdf:type
137. Modify test for parser
138. Modify webui and support of rule chains
139. Fix a bug with reasoning
140. Fix a bug with the dictionary in the parser: clone a pointer instead of cloning everything
141. Modify webui by adding support of N3 logic and Turtle

0.1.0
1. Parsing RDF/XML
2. Parsing Turtle
3. Parsing N3
4. Design SQL syntax
5. Design JOINing
6. Add aggregation functions (MIN, MAX, AVG, SUM)
7. Add 'FILTER' keyword
8. Add 'INSERT' keyword
9. Ability to make 'SELECT *' or select all
10. Add 'VALUES' keyword
11. Ability to read files (dataset files)
12. Add benchmark
13. Add unit tests
14. Add examples
15. Modify join by using rayon
16. Modify parse_rdf by using rayon and crossbeam
17. Modify join by using hash join and use rayon for parallel computation
18. Add volcano optimizer and cardinality estimator
19. Add knowledge graph
20. Add forward and backward chaining
21. Ability to process N3 logic
22. Add IStream, RStream, DStream
23. Add sliding window
24. Add policies (window close policy, content change policy, non-empty content policy, periodic policy)
25. Add REST API for database engine
26. Ability to generate synthetic dataset