After a failed file download, all subsequent downloads fail for same session #2194

Closed
rocodes opened this issue Sep 3, 2024 · 5 comments · Fixed by #2195
Labels: bug (Something isn't working), sdk

rocodes (Contributor) commented Sep 3, 2024

Description

From a user report:

Once any download has failed, even though the "Download" button is restored, all subsequent attempts to download [any] files fail during that app session. This is resolved by quitting and relaunching the app.

Related: #1633 (general issue)

Steps to Reproduce

See previous

Expected Behavior

  • After a failed download, when the Download button is restored, the original file can be re-attempted successfully
  • A failed download of one file has no bearing on the ability to download subsequent files

Actual Behavior

See above

Comments

  • This is a pretty impactful bug; it should go into either 0.16.0 or a hotfix.
  • We recently made changes to support automatic retries of stalled downloads; needs investigation to see whether that is related.
rocodes added the bug (Something isn't working) and needs/reproducing labels Sep 3, 2024
rocodes added this to the 0.16.0 milestone Sep 3, 2024
zenmonkeykstop moved this to Ready to go in SecureDrop dev cycle Sep 3, 2024
legoktm (Member) commented Sep 3, 2024

I enabled debug logging for the client with LOGLEVEL=debug securedrop-client. I then kicked off a large download (5MB), while running pkill -9 -e securedrop in the sd-proxy VM a few times because of the built-in retries. It then failed but left the download spinner going. Trying to download any other file fails with:

Sep  3 13:48:18 sd-app 2024-09-03 13:48:18,852 - securedrop_client.sdk:298(_send_json_request) DEBUG: Sending request to proxy: GET api/v1/sources/f526ad1b-9d97-4d27-9ae4-aa3d555aebde/submissions/57be23ba-2ddb-4024-8a61-f7058fe122e4/download
Sep  3 13:48:18 sd-app 2024-09-03 13:48:18,852 - securedrop_client.sdk:152(_streaming_download) DEBUG: Streaming download, retry 0
Sep  3 13:48:18 sd-app 2024-09-03 13:48:18,852 - securedrop_client.sdk:164(_streaming_download) DEBUG: Retry 0, opening process
Sep  3 13:48:18 sd-app 2024-09-03 13:48:18,853 - securedrop_client.sdk:175(_streaming_download) DEBUG: Retry 0, sending data
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,929 - securedrop_client.sdk:191(_streaming_download) DEBUG: Retry 0, bytes written: 319
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,929 - securedrop_client.sdk:185(_streaming_download) DEBUG: Retry 0, download finished
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,931 - securedrop_client.sdk:195(_streaming_download) DEBUG: Retry 0, process ended with return code 0
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,931 - securedrop_client.sdk:210(_streaming_download) DEBUG: Retry 0, reading contents from disk
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.sdk:246(_streaming_download) DEBUG: Retry 0, base error: 'Unknown error'
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.api_jobs.downloads:170(_download) ERROR: Download failed
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.api_jobs.downloads:171(_download) DEBUG: Download failed: This should be unreachable, we should've already returned or raised a different exception
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.queue:244(process) ERROR: Skipping job
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.queue:245(process) DEBUG: Skipping job: DownloadDecryptionException: Failed to download 57be23ba-2ddb-4024-8a61-f7058fe122e4
Sep  3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.logic:983(on_file_download_failure) ERROR: Failed to decrypt 57be23ba-2ddb-4024-8a61-f7058fe122e4
Sep  3 13:48:20 sd-app 2024-09-03 13:48:20,248 - securedrop_client.gui.widgets:2399(_set_file_state) DEBUG: Changing file 57be23ba-2ddb-4024-8a61-f7058fe122e4 state to not downloaded

legoktm (Member) commented Sep 3, 2024

Sep 3 13:48:19 sd-app 2024-09-03 13:48:19,932 - securedrop_client.sdk:246(_streaming_download) DEBUG: Retry 0, base error: 'Unknown error'

With some more logging, we're getting an HTTP 416 error, which means an invalid Range header.

For some reason the Range header is being sent even on request 0:

2024-09-03 11:14:47,520 - securedrop_client.sdk:152(_streaming_download) DEBUG: Streaming download, retry 0
2024-09-03 11:14:47,520 - securedrop_client.sdk:164(_streaming_download) DEBUG: Retry 0, opening process
2024-09-03 11:14:47,667 - securedrop_client.sdk:175(_streaming_download) DEBUG: Retry 0, sending data: {"method": "GET", "path_query": "api/v1/sources/829bfc89-22d2-4239-8dd2-f09fbfa82010/submissions/fc994fdf-9570-4177-9dbc-ee9ef0272c31/download", "stream": true, "headers": {"Authorization": "Token ...", "Content-Type": "application/json", "Accept": "application/json", "Range": "bytes=396556-"}, "timeout": 26}

But:

                # Update the range request if we're retrying
                if retry > 0:
                    data["headers"]["Range"] = f"bytes={bytes_written}-"
                    logger.debug(f"Retry {retry}, range: {bytes_written}-")

Feels like there's some cross-request data collision here...
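A minimal sketch (plain Python, not the SDK code) of how a header that is only set when retry > 0 can still show up on a later retry-0 request, if the same dict object is reused across requests:

    # Illustration only: the dict passed into the request helper is the
    # same object as the session-wide headers, so a mutation made during
    # a retry survives into every later request.
    session_headers = {"Accept": "application/json"}

    def send(headers, retry):
        if retry > 0:
            # Mirrors the guarded Range update above; this writes into the
            # shared dict because `headers` is a reference, not a copy.
            headers["Range"] = "bytes=396556-"
        return headers

    send(session_headers, retry=1)          # a download that retried
    print(send(session_headers, retry=0))   # a brand-new download
    # {'Accept': 'application/json', 'Range': 'bytes=396556-'}
    # The stale Range header tags along even though retry == 0.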

legoktm (Member) commented Sep 3, 2024

OK, got it. In download_submission() we have:

        response = self._send_json_request(
            method,
            path_query,
            stream=True,
            headers=self.req_headers,
            timeout=timeout or self.default_download_timeout,
        )

Importantly, headers=self.req_headers means that any modifications to headers will be persistently stored in self.req_headers. So when we set the Range header, it is stored for all future requests!! (🦀: the fact that Python dicts are passed by reference is an annoying footgun; in Rust this type of error is far less likely because you have to explicitly state mutability)

If we instead do headers=self.req_headers.copy() it works fine. So it's a small fix that I think we can ship as a hotfix; I'm still trying to think of what a regression test looks like.
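For reference, roughly what that one-line change looks like against the snippet quoted above (simplified sketch, not the exact shipped diff):

    # Sketch of the fix: hand each request its own copy of the session
    # headers so per-request mutations (e.g. Range on a retry) cannot
    # leak back into self.req_headers.
    response = self._send_json_request(
        method,
        path_query,
        stream=True,
        headers=self.req_headers.copy(),  # fresh dict per request
        timeout=timeout or self.default_download_timeout,
    )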

legoktm (Member) commented Sep 3, 2024

Practically, this means anything that triggered a retry would still work, but would break any future download. I'm not sure this is the sole bug causing problems here; there's at least one other logic issue.

legoktm self-assigned this Sep 3, 2024
legoktm moved this from Ready to go to In Progress in SecureDrop dev cycle Sep 3, 2024
legoktm (Member) commented Sep 3, 2024

A regression test for this would need to 1) trigger a download request that fails, but only after downloading a non-zero number of bytes; and then 2) trigger other downloads that succeed.

I don't think we have a good way to simulate 1 yet, so I'm punting on it in favor of manual testing for the hotfix.
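For illustration only, a purely hypothetical pytest-style sketch of the kind of regression test described above; make_client and simulate_partial_download_failure are assumed helpers that don't exist in the test suite:

    # Hypothetical sketch: asserts the property the bug violated, namely
    # that a retried/failed download never mutates the shared session
    # headers used by later requests.
    def test_failed_download_does_not_pollute_session_headers():
        api = make_client()                      # assumed test helper
        before = dict(api.req_headers)

        # Assumed helper: makes the first streaming attempt die after a
        # non-zero number of bytes so the Range retry path is exercised.
        simulate_partial_download_failure(api)

        # No stale Range header may survive into future requests.
        assert "Range" not in api.req_headers
        assert api.req_headers == before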

github-project-automation bot moved this from In Progress to Done in SecureDrop dev cycle Sep 3, 2024
zenmonkeykstop pushed a commit that referenced this issue Sep 3, 2024
When we introduced retries with `Range` support, it was the first time
we started mutating the request headers. At the time we didn't realize
that headers are stored in a single dict that is used across requests,
so the `Range` header would get reused in other requests, and then of
course fail.

This affected any session in which a Range request was needed, so if no
download ever failed, you'd never hit it.

To fix it, we now generate a fresh dictionary on each request. This will
add a small amount of performance overhead, but it's not going to be
noticeable and more importantly, makes it impossible to re-use the
header dictionary across requests.

Add some more debug entries for future operations as well.

Fixes #2194.

(cherry picked from commit ba01651)