45 changes: 27 additions & 18 deletions lib/backend.js
@@ -881,29 +881,38 @@ Backend.prototype._fetchSnapshotByTimestamp = function(collection, id, timestamp
var from = 0;
var to = null;

var shouldGetLatestSnapshot = timestamp === null;
if (shouldGetLatestSnapshot) {
return backend.db.getSnapshot(collection, id, null, null, function(error, snapshot) {
if (error) return callback(error);

callback(null, snapshot);
});
}

milestoneDb.getMilestoneSnapshotAtOrBeforeTime(collection, id, timestamp, function(error, snapshot) {
// Always fetch the current snapshot first. We request its metadata so that we
// can read its mtime, which lets us serve the current snapshot directly when
// the requested timestamp is after it. This avoids replaying ops when they
// aren't needed, and - crucially - still works when older ops have been
// deleted/TTLed and the current version can no longer be rebuilt from ops.
db.getSnapshot(collection, id, null, {metadata: true}, function(error, currentSnapshot) {
if (error) return callback(error);
milestoneSnapshot = snapshot;
if (snapshot) from = snapshot.v;

milestoneDb.getMilestoneSnapshotAtOrAfterTime(collection, id, timestamp, function(error, snapshot) {
var mtime = currentSnapshot.m && currentSnapshot.m.mtime;
var shouldGetLatestSnapshot = timestamp === null || (mtime != null && timestamp > mtime);
if (shouldGetLatestSnapshot) {
// Strip the metadata that we only fetched in order to compare the mtime,
// so that the returned snapshot is consistent with the op-replayed path.
currentSnapshot.m = null;
return callback(null, currentSnapshot);
}
Comment on lines +892 to +899

Contributor


Could this be a performance hit? It looks like we now do an extra round trip (getSnapshot for the current version) on every timestamp request, including the cases where we still end up replaying ops.

I guess it depends on what the expected access pattern is. My intuition (numbers pulled from a hat 🎩) is that ~90% of timestamp requests are for an older point in time, where we'll want to fetch and replay the older ops anyway — so for those we now pay for the current-snapshot fetch on top of the op replay, with no benefit.

That said, I don't have a better alternative for the case this is solving (rebuilding when older ops have been TTLed), so this might just be the necessary trade-off. Mostly flagging it to check the assumption — do we have a sense of how often the requested timestamp is actually after the latest snapshot?
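
To make the cases in question concrete, here is a minimal sketch of the check the diff adds, pulled out into a standalone function. The name `shouldServeCurrentSnapshot` is hypothetical (the PR inlines this logic in `_fetchSnapshotByTimestamp`), and the snapshot objects are hand-built stand-ins with `mtime` assumed to be in milliseconds.

```javascript
// Hypothetical helper for illustration; the PR inlines this condition.
// Returns true when the current snapshot can be returned without replaying ops.
function shouldServeCurrentSnapshot(timestamp, currentSnapshot) {
  var mtime = currentSnapshot.m && currentSnapshot.m.mtime;
  return timestamp === null || (mtime != null && timestamp > mtime);
}

// Stand-in snapshots (assumed shapes, not real ShareDB output).
var current = {v: 3, data: {title: 'The Time Machine'}, m: {mtime: 3000}};
var withoutMetadata = {v: 3, data: {title: 'The Time Machine'}, m: null};

shouldServeCurrentSnapshot(null, current); // true: latest snapshot requested
shouldServeCurrentSnapshot(4000, current); // true: timestamp is after mtime
shouldServeCurrentSnapshot(1000, current); // false: historic request, replay ops
shouldServeCurrentSnapshot(4000, withoutMetadata); // false: no mtime to compare
```

Only the third case pays for the current-snapshot fetch without benefiting from it, which is the case the question above is about.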

Collaborator Author


Yes, this is the trade-off we're making. I think it's basically impossible to get numbers on this, since we have no idea how other consumers are using this.

My feeling is that because this is a historic snapshot fetch:

  1. You probably don't care too much about speed (fetching arbitrary numbers of ops and replaying them is already quite a slow path)
  2. Current snapshot fetch should be pretty optimized, and we already do it on every op submission

If we wanted to be super conservative about this change, I guess I could hide it behind an opt-in flag that would leave existing performance untouched, but let users fetch snapshots in projects where ops are TTLed. We've done that in the past with sharedb-mongo and strict op linking. The downside of this approach is that ShareDB won't work quite as smoothly out of the box, and consumers will have to rummage through documentation to find this flag, which doesn't feel like a great developer experience to me.
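
For illustration only, an opt-in version might look roughly like the sketch below. The option name `checkCurrentSnapshotFirst` and the function `planTimestampFetch` are invented for this example and are not ShareDB APIs. The function just returns the list of database calls each path would make, so the round-trip cost of each choice is visible.

```javascript
// Calls made by the existing op-replay path (names taken from the diff).
var REPLAY_CALLS = [
  'getMilestoneSnapshotAtOrBeforeTime',
  'getMilestoneSnapshotAtOrAfterTime',
  'getOps'
];

// Hypothetical planner: `options.checkCurrentSnapshotFirst` is an invented flag.
function planTimestampFetch(timestamp, currentMtime, options) {
  var optedIn = Boolean(options && options.checkCurrentSnapshotFirst);

  // Requesting the latest snapshot never needs ops, with or without the flag.
  if (timestamp === null) return ['getSnapshot'];

  // Flag off: previous behaviour, no extra round trip, but no result if ops are gone.
  if (!optedIn) return REPLAY_CALLS.slice();

  // Flag on: the current snapshot is fetched first and served if it is old enough.
  if (currentMtime != null && timestamp > currentMtime) return ['getSnapshot'];

  // Flag on, historic request: one extra call on top of the replay path.
  return ['getSnapshot'].concat(REPLAY_CALLS);
}

planTimestampFetch(1000, 3000, {checkCurrentSnapshotFirst: true}).length; // 4
planTimestampFetch(1000, 3000, {}).length; // 3
planTimestampFetch(4000, 3000, {checkCurrentSnapshotFirst: true}).length; // 1
```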

Collaborator Author


We discussed this over a call and we'll go ahead and release without a flag: this is a bugfix and performance may get better or worse depending on use case.

If this change affects you badly as a consumer, please raise an issue describing your use case and we can add a flag for it (or improve things in some other way).


milestoneDb.getMilestoneSnapshotAtOrBeforeTime(collection, id, timestamp, function(error, snapshot) {
if (error) return callback(error);
if (snapshot) to = snapshot.v;
milestoneSnapshot = snapshot;
if (snapshot) from = snapshot.v;

var options = {metadata: true};
db.getOps(collection, id, from, to, options, function(error, ops) {
milestoneDb.getMilestoneSnapshotAtOrAfterTime(collection, id, timestamp, function(error, snapshot) {
if (error) return callback(error);
filterOpsInPlaceBeforeTimestamp(ops, timestamp);
backend._buildSnapshotFromOps(id, milestoneSnapshot, ops, callback);
if (snapshot) to = snapshot.v;

var options = {metadata: true};
db.getOps(collection, id, from, to, options, function(error, ops) {
if (error) return callback(error);
filterOpsInPlaceBeforeTimestamp(ops, timestamp);
backend._buildSnapshotFromOps(id, milestoneSnapshot, ops, callback);
});
});
});
});
15 changes: 15 additions & 0 deletions test/client/snapshot-timestamp-request.js
@@ -139,6 +139,21 @@ describe('SnapshotTimestampRequest', function() {
], done);
});

it('fetches the current snapshot when ops are missing and the timestamp is after the latest op', function(done) {
var connection = backend.connect();
async.series([
// Simulate ops having been deleted/TTLed so the snapshot can't be rebuilt from ops.
backend.db.deleteOps.bind(backend.db, 'books', 'time-machine', null, null, null),
function(next) {
connection.fetchSnapshotByTimestamp('books', 'time-machine', day4, function(error, snapshot) {
if (error) return next(error);
expect(snapshot).to.eql(v3);
next();
});
}
], done);
});

it('fetches the most recent version when not specifying a timestamp', function(done) {
var connection = backend.connect();
async.waterfall([