Fetch All Replies v2 - Service Edition #44
Conversation
Working on this again, leaving it in a little bit of a rough state because I need to go home and don't have my remote workflow down quite yet, but we are successfully getting the JSON-LD collection of replies. Just need a few more tweaks to handle pagination, write tests, and then it'll be time to clean up and do a little perf work before we roll it out for live testing.
Force-pushed from 01afb6e to 1696221
ok this works now :) The only thing that is not working is that you have to refresh the page to get the new replies, rather than them being inserted as they are fetched. Going to see if I can make that happen now, though.
```diff
@@ -47,6 +48,22 @@ def context
   ancestors_limit = ANCESTORS_LIMIT
   descendants_limit = DESCENDANTS_LIMIT
   descendants_depth_limit = DESCENDANTS_DEPTH_LIMIT
+else
+  unless @status.local? && !@status.should_fetch_replies?
+    json_status = fetch_resource(@status.uri, true, current_account)
```
This will take a while and possibly fail due to networking issues. What if you extracted this section to a worker, and returned HTTP 206 Partial Content to signal to the client that there might be more replies coming in a future response?
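The suggested shape can be sketched roughly like this. All names here (`ContextSketch`, the `enqueue` callable, the hash-based status) are invented for illustration; in the real codebase the enqueue call would be something like a Sidekiq worker's `perform_async`, and the controller would render a serializer rather than a bare hash.

```ruby
# Hedged sketch: enqueue the remote fetch in a background worker and
# signal "more replies may be coming" with HTTP 206 instead of blocking
# the context request on network I/O.
class ContextSketch
  def initialize(enqueue) # enqueue: callable standing in for a worker's perform_async
    @enqueue = enqueue
  end

  # Returns [http_status, body]; 206 tells the client to poll again later.
  def context_for(status)
    if status[:local]
      [200, { replies: status[:replies] }]
    else
      @enqueue.call(status[:id]) # kick off the slow fetch out-of-band
      [206, { replies: status[:replies], partial: true }]
    end
  end
end

enqueued = []
ctx = ContextSketch.new(->(id) { enqueued << id })
code, _body = ctx.context_for({ id: 1, local: false, replies: [] })
puts code               # => 206
puts enqueued.inspect   # => [1]
```

The client then treats 206 as "render what you have, poll again", which matches the media-processing pattern mentioned later in this thread.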
I was thinking of adding the reply URI to the status model to avoid the network request in the context endpoint, because we have this at the time any status is created via ActivityPub (and should be able to fill it in for local posts without too much difficulty). I didn't, partly just to get it working, but also because I figured that adding new columns would be discouraged. It makes sense to have, though, since this at least doubles the number of requests per status we have to do.

thx for the tip on 206, I didn't even think of that
Saving the replies collection URI kind of makes sense, but it's also going to add gigabytes of data to the statuses table (on mastodon.social this table has roughly 701,977,540 rows; an example replies URI has 73 characters, so I think that's roughly 51 GB of data if all the rows had it from the start?), so if it can be avoided maybe it should be.
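A quick back-of-the-envelope check of that estimate, using the row count and URI length from the comment above (pure arithmetic, assuming one byte per character and ignoring per-row overhead):

```ruby
# Storage cost of adding a ~73-character URI column to every statuses row.
rows = 701_977_540     # statuses rows on mastodon.social, per the comment
uri_bytes = 73         # length of the example replies URI
total_gb = rows * uri_bytes / 1_000_000_000.0
puts total_gb.round(1) # => 51.2
```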
More on HTTP 206: we do a similar trick when uploading media. The GET /api/v1/media/:id endpoint returns HTTP 206 while the media is processing; the client keeps polling it until it gets HTTP 200. This is how the "Processing..." progress bar state is implemented in the composer.
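That polling loop can be sketched like this. `fetch_status` is a hypothetical stand-in for "issue GET /api/v1/media/:id and return the HTTP status code"; a real client would go through its API wrapper and back off between attempts rather than polling immediately.

```ruby
# Hedged sketch of the client-side 206 polling loop described above.
def poll_until_processed(fetch_status, max_attempts: 10)
  max_attempts.times do |attempt|
    code = fetch_status.call
    return attempt + 1 if code == 200 # processing finished
    raise "unexpected HTTP #{code}" unless code == 206
    # a real client would sleep / back off here before retrying
  end
  raise "media still processing after #{max_attempts} attempts"
end

# Simulated server: 206 (still processing) twice, then 200.
responses = [206, 206, 200]
puts poll_until_processed(-> { responses.shift }) # => 3
```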
Would it help the case for storing the replies URI if we stored it as a relative URI vs. the status URI? We could hide that behind a getter/setter so it always appeared as the full URL everywhere else, but on disk we could store it cheaply.

There are probably only a few dozen unique relative paths from a status to its replies (I would bet ~hundreds, realistically, given wacky single-user homebrew impls). Not sure if the db compresses at all, or if there's a cheap way to store discrete values as an int:string map in Postgres, but that would probably be orders of magnitude cheaper.
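The getter/setter idea might look something like this. Everything here is hypothetical (the `StatusSketch` class and `replies_path` column are invented for illustration, not part of this PR): only the relative suffix is stored, while callers always see the absolute URI.

```ruby
# Sketch: store the replies collection location as a path relative to the
# status URI, expose the full URL through an accessor.
class StatusSketch
  attr_accessor :uri, :replies_path

  def initialize(uri:, replies_path:)
    @uri = uri
    @replies_path = replies_path # the only part that hits the disk
  end

  # Externally this always looks like the absolute replies collection URL.
  def replies_uri
    replies_path && "#{uri}/#{replies_path}"
  end
end

s = StatusSketch.new(
  uri: "https://example.social/users/alice/statuses/1",
  replies_path: "replies"
)
puts s.replies_uri # => https://example.social/users/alice/statuses/1/replies
```

If the distinct suffixes really do number in the hundreds, the column could further shrink to an integer foreign key into a tiny lookup table, which is the int:string map idea above.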
Force-pushed from 9bdd7c7 to 33b5964
Committing some recent changes from a needed refactor to mitigate the infinite-tree abuse scenario. Need to go to bed for the night, so I didn't test the changes; I'm pulling this back into draft status until I can write tests. The strategy here, I think, will be to make this the minimal deployable 'fetch all replies sans abuse' backend patch, and then I will follow up with the frontend changes for streaming replies as they are fetched in another PR. Some notes for now before I can write the full docs on this.
Alright, I'm marking this as ready for review. I'm genuinely not sure what tests we should/can have here, but I want to get this deployed ASAP and start testing it live. If I can get an independent confirmation that it does indeed fetch all replies, I'm going to merge this and punt rigorous testing to later revisions if we want to pull upstream. I think this is reasonably clean. Moving out of draft status. Recall that the plan is to split the PR in two - this one that does the service that fetches all the replies when a context endpoint is requested, and another that will do the work of injecting the fetched replies into the web UI and handle the changes to the context API endpoint to reflect that.

edit: added some more tests for the worker, that should be mostly good?
@Gargron I think this is in a reviewable state - it's not finished yet (splitting off the web UI/streaming/modifying the context API response to indicate fetching for a second PR), but when we finish this, if y'all would be interested in having it upstream, is there anything else you'd be looking for here? Tests, limits, etc.? I think I have it covered so the worker can't run off into infinity and handles fetch failures gracefully, but you obviously know way better what kind of stuff we'd need here. I split it off into a new worker/service because it got to a point where I was mangling the existing service/workers and it felt like I was shoving too many unrelated things in together, so hopefully that makes it easier to consider as something that might be mergeable - the only changes I made to the existing service were to improve pagination handling and add an instance attribute for switching the filtering by matching domain in reply and OP. The behavior can be globally turned on/off by adding an env variable flag.

edit: ofc no hurry or obligation, appreciate all the time you've given so far. We might merge this relatively soon, but we can open a draft PR against upstream if you can't get to it until after we do. The second PR will come from this branch, so it should be additive.
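The "can't run off into infinity" guardrails can be sketched as follows. All names here are invented for illustration (the actual worker in this PR is structured differently): the two safety properties are a hard cap on total statuses fetched and deduplication of already-visited URIs, so an adversarial self-referencing or endless reply collection terminates.

```ruby
require "set"

# Hedged sketch of the abuse guardrails: cap total fetched statuses and
# deduplicate URIs so an "infinite reply tree" can't keep the worker busy.
def crawl_replies(root_uri, fetch_page, max_statuses: 10)
  seen = Set.new
  queue = [root_uri]
  fetched = []
  until queue.empty? || fetched.size >= max_statuses
    uri = queue.shift
    next unless seen.add?(uri)  # skip URIs we've already fetched (cycle guard)
    page = fetch_page.call(uri) # returns { statuses: [...], next: uri_or_nil }
    fetched.concat(page[:statuses])
    queue << page[:next] if page[:next]
  end
  fetched.first(max_statuses)
end

# Adversarial collection that pages forever; the cap still terminates it.
infinite = lambda do |uri|
  n = uri[/\d+/].to_i
  { statuses: ["status-#{n}"], next: "page/#{n + 1}" }
end
puts crawl_replies("page/0", infinite).size # => 10
```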
```diff
-SELECT user_id,
-    ip,
-    max(used_at) AS used_at
+SELECT t0.user_id,
```
Anyone know why Rails always wants to do this for me when I run a migration? I didn't do anything special to the config or anything, and it seems to not do anything, but it keeps happening.
There is one privacy concern here - as is, we include the current account in the

edit: this turns out not to be true - when fetching replies, Mastodon will always use its internal instance representative account.
Oop, I seem to have broken it while adding env vars. Hold up, I'll be able to fix it tonight.
OK, actually it wasn't broken, it was just a problem with my env. Added some more debug messages (I assume the log level of most prod servers isn't debug, so those should be free, right?) and was watching it and fetching a bunch of requests tonight. There are a lot of instances that will return

I wanna merge this, see how it works, and keep iterating until we get to something we can pull upstream, sound gud?
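The "debug should be free" assumption mostly holds, with one caveat: a string-argument `log.debug("...")` still evaluates its argument even when suppressed, while the block form is only evaluated if the message would actually be emitted. A quick check with the stdlib logger:

```ruby
require "logger"

# With the logger at INFO (a typical prod setting), a block-form debug
# message is never evaluated, so the formatting cost is skipped too.
log = Logger.new($stdout)
log.level = Logger::INFO

evaluated = false
log.debug { evaluated = true; "expensive reply-fetch trace" }
puts evaluated # => false
```

So debug messages built with the block form really are close to free on an INFO-level server; interpolated-string debug calls still pay the interpolation cost.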
… are almost there baby
…requested from context endpoint, but async
This reverts commit 98c280e.
…mit to global maximum statuses fetched (untested, this might not work yet)
…ways caused us to only do one page). Rename some variables to make purpose clearer. Return the array of all fetched uris instead of just the number we got
…this context anyway
…, deduplicate uris to fetch, prevent multiple fetch-alls for the parent status
Force-pushed from aa9f4a2 to 8048524
let's do it
Feel free to submit the patch to the upstream repository and we'll try to review it as soon as we can.
Fix: #43
More discussion/description at #43
Current status:

Sub-branches:
- fetch-all-replies-service-stash-streaming - some partial work on getting streaming to work, stashing to work on other things

TODO:
- `context` endpoint. Need to figure out some way of limiting to a certain n statuses or only doing a request once (i.e. if the next TODO is completed, this one should be)
- ~~Get replies on fake other server on loading context endpoint~~ (testing context endpoint deferred to next PR)

Pushed to next PR:
- `timelines:thread:{root_id}` streaming channel (almost done)