Ooh, what are we going to do about those? Mark them all as succeeded? |
@rohitpaulk I guess so? @kaguillera and I were thinking we should approach this month by month alongside gratipay/finances#3, rather than trying to get the backfilling all done before the catching up. For June, 2012 we only have 66 exchanges, and I think we can assume they all succeeded? |
@kaguillera points out that payday logfiles include username and payment network ("charging foo on Stripe"), which should be helpful here. |
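As a minimal sketch of how those log lines could be mined: the pattern below assumes the messages look like the quoted example ("charging foo on Stripe") and nothing more. The real payday logs may carry timestamps, different casing, or other wording, and the `parse_charge_line` name is hypothetical.

```python
import re

# Assumed format, based only on the quoted example "charging foo on Stripe".
CHARGE_RE = re.compile(
    r'charging (?P<username>\S+) on (?P<network>\w+)',
    re.IGNORECASE,
)


def parse_charge_line(line):
    """Return (username, network) if the line looks like a charge message, else None."""
    match = CHARGE_RE.search(line)
    if match is None:
        return None
    # Lowercase the network so it can be compared against values like 'stripe'.
    return match.group('username'), match.group('network').lower()
```

Lines that don't match return `None`, so a whole logfile can be passed through line by line and the misses reviewed by hand.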
Force-pushed from c758d96 to 3a8076e.
Script (and |
@rohitpaulk @aandis et al. If this script looks good then @kaguillera and I can start loading up data with it once we are able to produce the input CSVs. |
sql/branch.sql
Outdated
ALTER TYPE payment_net ADD VALUE 'samurai'; | ||
ALTER TYPE payment_net ADD VALUE 'stripe'; | ||
|
||
ALTER TABLE exchanges ADD UNIQUE (ref); |
Shouldn't this unique constraint be for a combination of (network + ref)?
Oh wait - network is in the exchange routes table. Hmm, how would this work out? Isn't it possible that the ref from a braintree and that from a balanced payment collide?
Yeah, fair enough. Fixed in 8bec476.
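To illustrate the collision concern, here is a small sketch using an in-memory SQLite table rather than the real Postgres schema. The `exchanges_sketch` table and its columns are hypothetical, and since network actually lives in the exchange routes table, this is not necessarily how 8bec476 resolved it. It only shows that a composite constraint lets two networks reuse a ref while still rejecting a true duplicate.

```python
import sqlite3


def demo_composite_unique():
    """Return (row_count, duplicate_rejected) for a hypothetical composite constraint."""
    conn = sqlite3.connect(':memory:')
    conn.execute("""
        CREATE TABLE exchanges_sketch (
            id INTEGER PRIMARY KEY,
            network TEXT NOT NULL,
            ref TEXT NOT NULL,
            UNIQUE (network, ref)
        )
    """)
    insert = "INSERT INTO exchanges_sketch (network, ref) VALUES (?, ?)"
    conn.execute(insert, ('braintree', 'abc123'))
    # Same ref on a different network: allowed under UNIQUE (network, ref),
    # but it would be rejected under a plain UNIQUE (ref).
    conn.execute(insert, ('balanced', 'abc123'))
    try:
        # Same network and same ref: a real duplicate.
        conn.execute(insert, ('braintree', 'abc123'))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    count = conn.execute("SELECT COUNT(*) FROM exchanges_sketch").fetchone()[0]
    conn.close()
    return count, duplicate_rejected
```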
P.S. I'm not expecting this PR to be merged. I'm expecting to use this PR to track code and comments for backfilling the database (#2779). |
Blech. We don't have |
And I guess we'll have to merge this PR or another one like it, to apply the schema changes. |
@kaguillera and I are working on a script to match up Stripe transfer details with exchanges in our database. The problem we're up against is ... usernames as primary key (#835). 😊 |
The way we log username changes, we don't record the old username, only the new one. This makes it impossible to trace back to what username a participant had at a particular point in time. We do have old database snapshots we could look at, but we didn't add user id until late enough that an old snapshot wouldn't help us map old username to current user id anyway. |
Let this be a lesson, kids! 😳 |
Immutable user id implemented in #680. |
Event logging added in #2006. Username change events were backfilled at that point using the username at that time but with the original claimed time (+ 0.01 s). |
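For contrast, here is a sketch of the lookup that would be possible if every username change had been logged with an immutable user id at the time it actually happened. The `(timestamp, user_id, new_username)` tuple layout and the `holder_at` name are assumptions for illustration, not the real events schema. Because the backfilled events carry the username as of the backfill rather than the original one, this lookup would give wrong answers for the period before event logging was added.

```python
from datetime import datetime


def holder_at(events, username, when):
    """Return the user id holding `username` at time `when`, or None if unclear.

    `events` is an iterable of (timestamp, user_id, new_username) tuples
    (a hypothetical layout).
    """
    latest = {}  # user_id -> username most recently set at or before `when`
    for ts, user_id, new_username in sorted(events, key=lambda e: e[0]):
        if ts > when:
            break
        latest[user_id] = new_username
    holders = [uid for uid, name in latest.items() if name == username]
    # Zero or multiple candidates means the log can't answer the question.
    return holders[0] if len(holders) == 1 else None


events = [
    (datetime(2012, 6, 1), 1, 'alice'),   # user 1 claims 'alice'
    (datetime(2013, 1, 1), 1, 'alice2'),  # user 1 renames
    (datetime(2013, 2, 1), 2, 'alice'),   # user 2 picks up the freed name
]
```

With complete logs the same username resolves to different user ids at different times, which is exactly the information that is missing here.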
Tech debt, anyone? :) |
🍷 🍒 |
But we can still use the old database backups to get the exchange id and username-at-the-time from |
Our offline backups only go back to August of 2013. |
So I guess we manually reconstruct before then? |
Making progress on this, I'm pretty close to matching all Stripe transactions with exchanges in our database. Now it looks like I have to back up a step because I'm seeing duplicate entries in the Stripe input files. |
Here's the script I used for https://github.com/gratipay/logs/commit/f0ab1ba9f3fce55b1a7b0b7d4e8e393a8aa89872:

    #!/usr/bin/env python2 -u
    from __future__ import absolute_import, division, print_function, unicode_literals

    import os


    def process_month(year, month):
        seen = set()
        lines = list()
        filepath = '3912/{}/{}/_stripe-payments.csv'.format(year, month)
        for line in open(filepath, 'r'):
            if line not in seen:
                lines.append(line)
                seen.add(line)
        open(filepath, 'w+').writelines(lines)


    def main():
        for year in os.listdir('3912'):
            if not year.isdigit(): continue
            for month in os.listdir('3912/' + year):
                if not month.isdigit(): continue
                process_month(year, month)


    if __name__ == '__main__':
        main()

And here's the variant I used for https://github.com/gratipay/logs/commit/1ea21621d84b1fb1c8292c2d0fcc420cbcfdc0dd:

    --- find-dupes.py 2016-03-09 13:39:39.000000000 -0500
    +++ find-dupes.py.new 2016-03-09 13:39:31.000000000 -0500
    @@ -9,9 +9,10 @@
         lines = list()
         filepath = '3912/{}/{}/_stripe-payments.csv'.format(year, month)
         for line in open(filepath, 'r'):
    -        if line not in seen:
    +        stripped = line.strip()
    +        if stripped not in seen:
                 lines.append(line)
    -            seen.add(line)
    +            seen.add(stripped)
         open(filepath, 'w+').writelines(lines)
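A small sketch of the difference between the two passes, using made-up rows rather than real Stripe data. One plausible case the stripped comparison catches is a duplicate that differs only in surrounding whitespace, such as a final line with no trailing newline. Whether that was the actual cause in the input files is an assumption. The `dedupe` helper is hypothetical and works on lists instead of files.

```python
def dedupe(lines, normalize=False):
    """Keep the first occurrence of each line, optionally comparing stripped text."""
    seen = set()
    kept = []
    for line in lines:
        # With normalize=True this mirrors the variant: compare stripped text,
        # but keep the original line in the output.
        key = line.strip() if normalize else line
        if key not in seen:
            kept.append(line)
            seen.add(key)
    return kept


# Made-up rows; the last one repeats the first but lacks a trailing newline.
rows = ['ch_1,10.00\n', 'ch_2,5.00\n', 'ch_1,10.00']

exact = dedupe(rows)                      # exact comparison keeps all three rows
stripped = dedupe(rows, normalize=True)   # stripped comparison drops the repeat
```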
Okay! The That means we're about ready to dial back out and run the |
@kaguillera and I are thinking that we'll backfill exchanges in the database month by month in lockstep with gratipay/finances#3, because we'll be able to learn and adapt more easily as we discover bugs and edge cases and whatnot.
Also output kind for debugging
Rebased, was 43804a3. |
Force-pushed from 43804a3 to 063a8cf.
Gosh. Bringing @dmk246 up to speed here, and it's getting worse. We're seeing a route for yours truly that is of |
Alright, this is A Big Deal™. It's by far our biggest piece of technical debt. This underpins our whole accounting function, so we have to get this done. And it's going to be a lot of work. It sounds like @kaguillera is going to dive back into this one, along with @dmk246 as she gets up to speed. As @dmk246 pointed out IRL, the immediate next task is to ensure that we're not adding new bad data ... which we are: we are not storing |
Diving back into this.
This is taken care of by #4361. Now back to figuring out the corrupted data problem. @whit537 I think I will need a snapshot of the tables that are involved (exchanges, routes and exchange_route), if that is at all possible.
@whit537 can I have the actual records from exchanges and exchange_routes for the month of 2012-06 to begin with? I guess a CSV would be good, or any format that would be easiest and fastest for you. I don't want to take you away from what you are working on. Maybe you could drop it somewhere in the logs repo.
I exported the |
tanx...going to start looking at this |
The comments on this ticket are getting too long, so I am moving the research into the errors to #4442.
Bah! You can never have too many comments! :-) |
We added |
We haven't had an
|
We plugged the |
The last unknown networks are for PayPal masspays. Those run through record an exchange which depends on existing routes. How do we have a route with a null network and why did it stop seven months ago? Last change to routes/associate.json.spt is four months ago, then before that is two years. |
|
We didn't fully plug the
|
Looking at those 1,089 it is clear from the |
Closing per gratipay/inside.gratipay.com#1196. |
Picks up where #3807 left off. Hopefully we can close #2779, which is blocking gratipay/finances#3.