Enable custom aggregate functions (take 2)#529
Conversation
|
cc @lovasoa |
lovasoa
left a comment
There was a problem hiding this comment.
This looks good, thank you !
A few things:
- Can we avoid calling
initin create_aggregate itself ? I think this is a very surprising behavior for the programmer.initshould be called every time the aggregate function is called. - Can we make the step function return the new state, like the standard reduce ? This would avoid the awkward
function init() { return {sum: 0} } - Taking three functions as arguments makes the code using create_aggregate a little bit hard to read. Can we take an object instead ?
db.create_aggregate("json_array", {
init: () => [],
step: (state, value) => [...state, value],
finalize: state => JSON.stringify(state)
})- Can we make init and finalize optional ? init should default to
() => nulland finalize tostate => state. In the end we should be able to call
db.create_aggregate("js_sum", { step: (a,b) => a+b })- Can we add memory leak tests like the ones we have in
test/test_functions_recreate.js
Yes, thank you! Bad mental model striking there. I like all your proposed changes, will do. |
|
A question: when an instance of That is, if I say: I'm assuming I can? If not, storing the state is tricky, because I need to account for this scenario:
I would try to answer this for myself but the web platform is a really tricky runtime so I'm asking in case you know off hand? |
|
The way that proper sqlite3 extensions do it is by hanging their state off the execution context like so. Currently, |
I don't think so? We're not the ones calling the step function, that's sqlite doing it
We can make We can't make (Maybe I'm not being clever enough and there is a way for both!) |
Basically it seems that the sqlite extension pattern of 'allocate a struct and stick it in the context pointer' is not going to work for us here. I wonder if using the id of the pointer returned by sqlite3_aggregate_context would be enough? Since no two functions could use the same pointer, per https://www.sqlite.org/c3ref/aggregate_context.html ?
|
I managed to make This branch is WIP status, and I apologize for the noise on it. I will look into getting it down to a single function argument tomorrow. |
|
About defaults for init and finalize: Can't we just have create_aggregate_func start like that : function create_aggregate_func<T>(
name: string,
methods: {
init?: () => T,
step: (state: T, value: any) => T,
finalize?: (state: T) => any
}
) {
let init = methods.init || (() => null);
let step = (state: T, value: any) => {
state = methods.step(state, value);
__update_sqlite_internal_state(state);
};
let finalize = methods.finalize || (state => state);
...
}About thread-safety: there is no such thing as multi-threading with concurrent access in javascript. The javascript runtime guarantees that there is a single execution thread that can access mutable javascript objects. That said, even within a single thread, there can be multiple aggregate functions that are at various stages of there execution simultaneously. The following has to work: SELECT js_agg_fn(x), js_agg_fn(y) FROM t;A test case like this one should probably be added to the tests too. |
|
Having to return values from db.create_aggregate(
"json_agg", {
step: function(state, val) { state = (state || []); state.push(val); return state; },
finalize: function(state) { return JSON.stringify(state); }
}
);And with modifying state (and with state defaulting to db.create_aggregate(
"json_agg", {
step: function(state, val) { state.push(val); },
finalize: function(state) { return JSON.stringify(state); }
}
);I think we might even want to make
(I also wonder if there's efficency trickery we could do to accumulate values in a typed array?) edit: standard deviation is another example of an aggregation that can be completed without accumulating all values |
No, that's ugly, you wouldn't write it like that ! 🙂 This is a very common pattern in javascript, in many state-management libraries. You would write it as
Let's just stick to what developers are used to. Reduce (sometimes called fold) is a very common pattern, and users will appreciate it if you don't try to reinvent the wheel here. |
|
Can we also add a test with
|
|
|
|
You can see from the changes in ac548d4 how much that simplifies the step functions |
|
OK, I think this is ready for re-review now. Thanks for working through this with me! |
lovasoa
left a comment
There was a problem hiding this comment.
It looks good to me ! Can you just fix the documentation, I'll read it one last time, and we'll merge :)
Co-authored-by: Ophir LOJKINE <contact@ophir.dev>
* initial commit. * documentation * remove no-longer-valid type * close over state initialization for performance * link documentation in comment * more testing * run tests if they're main * accept a single arg * this kind of works but I'm abandoning this branch Basically it seems that the sqlite extension pattern of 'allocate a struct and stick it in the context pointer' is not going to work for us here. I wonder if using the id of the pointer returned by sqlite3_aggregate_context would be enough? Since no two functions could use the same pointer, per https://www.sqlite.org/c3ref/aggregate_context.html ? * a middle road sqlite3_agg_context solution * try out auto-updating state * improve quantile test, add multiple agg test * add a null to the test * acorn fails to parse ||=, whatever * make eslint happy * make initial_value an argument * test step and finalize exceptions * add memory leak test * update docs to current interface * delete state in exception handlers * remove null state * return init function and document object * more tests and update back to init function * update redefinition test for new interface * update README to match fixed signature * more consistent test formatting * Update README.md Co-authored-by: Ophir LOJKINE <contact@ophir.dev> * clarify what exactly the result will contain * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Improve documentation and type annotations * ignore documentation in eslintrc * reduce code size Thanks a lot, @llimllib ! Co-authored-by: dogquery <> Co-authored-by: Ophir LOJKINE <contact@ophir.dev>
This PR builds on #407, updating it to current HEAD, adding more tests, and documenting it.
This PR closes #204, and is composed mostly of the work that @AnyhowStep did so long ago in that thread.
#407 notes that aggregate window functions do not work, but I think that's a different issue that should be tackled in another PR.
The full list of changes follows:
create_aggregateparseFunctionArgumentsinto a function that can be used by bothcreate_aggregateandcreate_functionsetFunctionResultsinto a functionsqlite3_aggregate_contextfunctioncreate_aggregatefunctioncreate_aggregate(function_name, initial_value, { step: (state, ...value) =>, finalize: (state) =>})create_aggregatecallssqlite3_aggregate_contextto allocate 1 byte of memory per instance of the aggregate function, which it uses as a key for its state array.sqlite3_aggregate_contextdocumentation