On-the-fly running plus frontend feedback of GCD public key finder on emails during gmail upload #91

Open
foolo opened this issue Jun 12, 2024 · 3 comments
Labels
enhancement New feature or request high

Comments

@foolo
Contributor

foolo commented Jun 12, 2024

Created new issue from "Part B" here: #70 (comment)

When a user uploads emails via Gmail upload, and a mail has a domain/selector for which we don't know the key, then pass that to a "GCD solver server" which brute-forces the public key and streams the result back, plus frontend feedback from the GCD public key finder on emails during the Gmail upload.
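
For reference, a minimal sketch of the GCD idea the solver relies on. It is not the code from foolo/sigs2rsa; the function name and parameters are made up for illustration. It assumes the public exponent is known (65537 is the common default) and that each hash has already been padded into the integer the signer actually exponentiated. A valid RSA signature satisfies s^e ≡ m (mod n), so n divides s^e - m, and the GCD of that quantity over two or more messages is n times a (usually small) cofactor.

```python
from math import gcd


def recover_modulus_multiple(pairs, e=65537, small_factor_bound=1000):
    """Return a multiple of the RSA modulus n (often n itself).

    pairs: iterable of (m, s) integers, where m is the padded hash and
    s is the signature. Both names are assumptions of this sketch.
    """
    g = 0
    for m, s in pairs:
        # n divides s**e - m for every valid signature; gcd(0, x) == abs(x)
        g = gcd(g, pow(s, e) - m)
    # The GCD can carry a small cofactor, so divide out small factors.
    # This is a heuristic: it assumes n has no prime factor below the bound.
    for p in range(2, small_factor_bound):
        while g > 1 and g % p == 0:
            g //= p
    return g
```

With real 1024/2048-bit signatures and e=65537 the intermediate numbers are enormous, which is why a fast bignum library such as gmpy2 matters for the actual solver; plain Python integers are used here only to keep the sketch dependency-free.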

Steps needed:

  • Set up a "GCD solver server" that will run on Modal/AWS/etc. Either the web server or the solver server needs a queuing/blocking mechanism if the capacity of the solver is limited. Authorization is needed between the web server and the solver server.
  • If we are to use modal.com, we probably need to modify the GCD solver to move away from Docker/Sagemath (which most likely won't work on Modal), but instead use for example gmpy2, which seems to have about the same performance. Done: foolo/sigs2rsa@f432e78
  • During user email upload, if we don't already have the public key for the domain+selector, and cannot find it via DNS, then search for that domain+selector in EmailSignatures table, and if found, call the GCD solver server with the hash+signature of the newly contributed message and the message from EmailSignatures.
  • (*) Implement the mechanism to send the requests from the web server to the GCD solver server, and then handle the results asynchronously and put them in the database.
  • (*) Propagate those results back to the frontend, to display to the user while uploading.
  • Needs a fairly large redesign of the frontend, which currently looks like the screenshot below.
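
The lookup order in the third step could be sketched roughly like this. All names here (`choose_key_source`, the return labels, the shape of the arguments) are hypothetical and only illustrate the decision order; the real code would query the database and DNS instead of taking plain Python containers.

```python
from typing import Callable, Dict, Optional, Set, Tuple

DomainSelector = Tuple[str, str]  # (domain, selector)


def choose_key_source(
    ds: DomainSelector,
    known_keys: Set[DomainSelector],
    dns_lookup: Callable[[DomainSelector], Optional[str]],
    email_signatures: Dict[DomainSelector, Tuple[int, int]],
) -> str:
    """Decide what to do for one uploaded mail's domain+selector."""
    if ds in known_keys:
        return "known"        # public key is already in our database
    if dns_lookup(ds) is not None:
        return "dns"          # key is still published on DNS
    if ds in email_signatures:
        return "gcd_solver"   # an earlier hash+signature exists: call the solver
    return "unresolved"       # nothing to pair the new signature with yet
```

Only the "gcd_solver" outcome would trigger a request to the solver server, with the hash+signature of the new message and the one found in EmailSignatures.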

Cost: > 4 weeks of work.

Cost/benefit:
Personal take: Similar reasoning as with #90, i.e. high development and maintenance cost but a low probability of finding anything. It would be cool when there is actually a result, but I would say the chance here is even smaller than with issue #90.

(*) These steps are the ones that are least clear at the moment.
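
One possible shape for the (*) steps, as a sketch only: the web server caps the number of in-flight solver requests with a semaphore and hands each result to a callback that would write to the database and notify the frontend. `run_solver_jobs`, `solve` and `on_result` are hypothetical names; the actual transport (HTTP to Modal/AWS, websocket to the browser, etc.) is left open here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, Tuple


async def run_solver_jobs(
    jobs: Iterable[Tuple[str, Any]],
    solve: Callable[[Any], Awaitable[Any]],
    on_result: Callable[[str, Any], Awaitable[None]],
    max_concurrent: int = 2,
) -> None:
    """Run solver requests with bounded concurrency and report each result."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(job_id: str, payload: Any) -> None:
        async with sem:  # blocks while the solver is at capacity
            try:
                result = await solve(payload)
            except Exception:
                result = None  # a failed solve is reported as "no key found"
        await on_result(job_id, result)

    await asyncio.gather(*(run_one(job_id, payload) for job_id, payload in jobs))
```

Results arrive in completion order, not submission order, so the frontend would need to match them to uploads by job id.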

Screenshot_2024-06-12_14-42-58

@foolo
Contributor Author

foolo commented Jun 19, 2024

Statistics about which percentage of selectors are still available on DNS for emails of different ages:

Note that this does not account for the fact that old selectors may still be on DNS, but with an updated public key, so the DNS record no longer corresponds to the email in question.

I will add more statistics where we only account for "probably key-bound" selectors, such as "202306", "zj6feok33gleqrx3lyj6wcf777va63fa"

src/util/statistics.py --dkimDnsStatsMbox ~/Documents/dkim/mbox/yahoo.mbox ~/Documents/dkim/mbox/gmail_priv.mbox ~/Documents/dkim/mbox/oa146.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/yahoo.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/gmail_priv.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/oa146.mbox
INFO:root:checking DNS for domainkeys
2007_Q1Q2: 0 of 1 active domainkeys (0.00%)
2007_Q3Q4: 0 of 4 active domainkeys (0.00%)
2008_Q1Q2: 0 of 4 active domainkeys (0.00%)
2008_Q3Q4: 0 of 4 active domainkeys (0.00%)
2009_Q1Q2: 3 of 5 active domainkeys (60.00%)
2009_Q3Q4: 2 of 11 active domainkeys (18.18%)
2010_Q1Q2: 3 of 11 active domainkeys (27.27%)
2010_Q3Q4: 8 of 21 active domainkeys (38.10%)
2011_Q1Q2: 6 of 16 active domainkeys (37.50%)
2011_Q3Q4: 12 of 29 active domainkeys (41.38%)
2012_Q1Q2: 10 of 20 active domainkeys (50.00%)
2012_Q3Q4: 8 of 26 active domainkeys (30.77%)
2013_Q1Q2: 14 of 30 active domainkeys (46.67%)
2013_Q3Q4: 16 of 38 active domainkeys (42.11%)
2014_Q1Q2: 22 of 34 active domainkeys (64.71%)
2014_Q3Q4: 15 of 25 active domainkeys (60.00%)
2015_Q1Q2: 23 of 41 active domainkeys (56.10%)
2015_Q3Q4: 18 of 35 active domainkeys (51.43%)
2016_Q1Q2: 20 of 36 active domainkeys (55.56%)
2016_Q3Q4: 19 of 45 active domainkeys (42.22%)
2017_Q1Q2: 32 of 50 active domainkeys (64.00%)
2017_Q3Q4: 32 of 44 active domainkeys (72.73%)
2018_Q1Q2: 57 of 79 active domainkeys (72.15%)
2018_Q3Q4: 89 of 126 active domainkeys (70.63%)
2019_Q1Q2: 53 of 81 active domainkeys (65.43%)
2019_Q3Q4: 42 of 66 active domainkeys (63.64%)
2020_Q1Q2: 78 of 114 active domainkeys (68.42%)
2020_Q3Q4: 91 of 126 active domainkeys (72.22%)
2021_Q1Q2: 117 of 138 active domainkeys (84.78%)
2021_Q3Q4: 143 of 174 active domainkeys (82.18%)
2022_Q1Q2: 111 of 141 active domainkeys (78.72%)
2022_Q3Q4: 82 of 111 active domainkeys (73.87%)
2023_Q1Q2: 115 of 148 active domainkeys (77.70%)
2023_Q3Q4: 141 of 170 active domainkeys (82.94%)
2024_Q1Q2: 248 of 266 active domainkeys (93.23%)
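
As an illustration of how the rows above are grouped, here is a small sketch of the half-year bucketing and the per-bucket percentage. It is not the code in src/util/statistics.py; the helper names are invented, and it only assumes that each message date is mapped to a `<year>_Q1Q2` or `<year>_Q3Q4` label as seen in the output.

```python
from datetime import date


def half_year_label(d: date) -> str:
    """Map a message date to the bucket label used in the output above."""
    return f"{d.year}_{'Q1Q2' if d.month <= 6 else 'Q3Q4'}"


def summary_line(label: str, active: int, total: int) -> str:
    """Format one row: how many domainkeys in the bucket still resolve on DNS."""
    pct = 100 * active / total if total else 0.0
    return f"{label}: {active} of {total} active domainkeys ({pct:.2f}%)"


print(summary_line(half_year_label(date(2009, 3, 1)), 3, 5))
# prints "2009_Q1Q2: 3 of 5 active domainkeys (60.00%)"
```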

@Divide-By-0
Member

Very interesting! So seems to hover around 50%, validating the idea that we could get quite a few old keys.

@foolo
Contributor Author

foolo commented Jun 19, 2024

Very interesting! So seems to hover around 50%, validating the idea that we could get quite a few old keys.

yep!
And here are new statistics where we only count "probably key-bound" selectors (for example, selectors that contain something that looks like a year, or that look like "zj6feok33gleqrx3lyj6wcf777va63fa"):

src/util/statistics.py --dkimDnsStatsMbox ~/Documents/dkim/mbox/yahoo.mbox ~/Documents/dkim/mbox/gmail_priv.mbox ~/Documents/dkim/mbox/oa146.mbox --includeOnlyKeyboundSelectors
INFO:root:loading /home/olof/Documents/dkim/mbox/yahoo.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/gmail_priv.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/oa146.mbox
INFO:root:processing messages
2009_Q1Q2: 0 active domainkeys of total 1 (0.00%)
2009_Q3Q4: 0 active domainkeys of total 1 (0.00%)
2010_Q1Q2: 0 active domainkeys of total 2 (0.00%)
2010_Q3Q4: 1 active domainkeys of total 3 (33.33%)
2011_Q1Q2: 0 active domainkeys of total 3 (0.00%)
2011_Q3Q4: 3 active domainkeys of total 4 (75.00%)
2012_Q1Q2: 0 active domainkeys of total 3 (0.00%)
2012_Q3Q4: 1 active domainkeys of total 6 (16.67%)
2013_Q1Q2: 3 active domainkeys of total 9 (33.33%)
2013_Q3Q4: 4 active domainkeys of total 12 (33.33%)
2014_Q1Q2: 5 active domainkeys of total 11 (45.45%)
2014_Q3Q4: 1 active domainkeys of total 5 (20.00%)
2015_Q1Q2: 5 active domainkeys of total 14 (35.71%)
2015_Q3Q4: 5 active domainkeys of total 10 (50.00%)
2016_Q1Q2: 6 active domainkeys of total 18 (33.33%)
2016_Q3Q4: 5 active domainkeys of total 20 (25.00%)
2017_Q1Q2: 10 active domainkeys of total 16 (62.50%)
2017_Q3Q4: 10 active domainkeys of total 15 (66.67%)
2018_Q1Q2: 13 active domainkeys of total 26 (50.00%)
2018_Q3Q4: 19 active domainkeys of total 42 (45.24%)
2019_Q1Q2: 16 active domainkeys of total 31 (51.61%)
2019_Q3Q4: 8 active domainkeys of total 24 (33.33%)
2020_Q1Q2: 12 active domainkeys of total 39 (30.77%)
2020_Q3Q4: 14 active domainkeys of total 37 (37.84%)
2021_Q1Q2: 21 active domainkeys of total 37 (56.76%)
2021_Q3Q4: 25 active domainkeys of total 44 (56.82%)
2022_Q1Q2: 10 active domainkeys of total 30 (33.33%)
2022_Q3Q4: 12 active domainkeys of total 37 (32.43%)
2023_Q1Q2: 19 active domainkeys of total 47 (40.43%)
2023_Q3Q4: 31 active domainkeys of total 54 (57.41%)
2024_Q1Q2: 45 active domainkeys of total 53 (84.91%)
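
To make "probably key-bound" concrete, a rough heuristic along the lines described above might look like the following. This is a guess at the idea, not the actual logic behind --includeOnlyKeyboundSelectors; the function name, the year range and the length threshold are all assumptions, and it will have both false positives and false negatives.

```python
import re

# A 4-digit run that looks like a year between 1980 and 2039, not preceded by a digit.
YEAR_LIKE = re.compile(r"(?<!\d)(?:19[89]\d|20[0-3]\d)")


def looks_key_bound(selector: str) -> bool:
    """Guess whether a selector name is tied to one specific key."""
    if YEAR_LIKE.search(selector):
        return True  # e.g. "202306"
    # Long alphanumeric strings mixing letters and digits look machine-generated.
    long_and_mixed = (
        len(selector) >= 24
        and selector.isalnum()
        and any(c.isdigit() for c in selector)
        and any(c.isalpha() for c in selector)
    )
    return long_and_mixed
```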

@Divide-By-0 Divide-By-0 added enhancement New feature or request high labels Sep 28, 2024