Issue 30 find partial page citations #116

Merged

Conversation

mattdahl
Contributor

@mattdahl mattdahl commented Jul 16, 2022

This enables eyecite to find citations that don't yet have a final page number, but just have a placeholder ___ space instead. This is common with the U.S. reporter. This simple parsing change is the first half of addressing #30; figuring out how to match these partial citations will have to be done in CL.
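To illustrate the parsing change described above, here is a minimal standalone sketch. It is not eyecite's actual regex or API: the pattern `CITE_RE`, the helper `find_cites`, and the hard-coded "U.S." reporter are all assumptions made for this example. The only idea it shows is that the page group accepts either digits or a run of underscores.

```python
import re

# Hypothetical pattern (not eyecite's real one): a volume, a hard-coded
# "U.S." reporter, then either a numeric page or a run of underscores.
CITE_RE = re.compile(
    r"(?P<volume>\d+)\s+(?P<reporter>U\.\s?S\.)\s+(?P<page>\d+|_+)"
)


def find_cites(text):
    """Return (volume, reporter, page) tuples for every match in text."""
    return [
        (m["volume"], m["reporter"], m["page"])
        for m in CITE_RE.finditer(text)
    ]


cites = find_cites(
    "See Foo v. Bar, 597 U.S. ___ (2022); Roe v. Wade, 410 U.S. 113 (1973)."
)
print(cites)
```

Both the placeholder citation and the ordinary one are returned, with the underscores kept verbatim as the page value.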

Some other things I thought of but didn't include:

  1. Should we add a metadata flag to the citation objects like has_proper_page_number or something? This would be False if the page number is all underscores, True otherwise. I wasn't sure if this would actually be helpful or not.
  2. How should this change affect hashing? Would we expect 1 U.S. __ to hash equivalently to 1 U.S. ___ (identical except that one "placeholder page" has more underscores than the other)? Or should neither hash to the other at all, since maybe these are two completely different citations that just happen to have the same volume and reporter?
  3. Are there other characters besides underscores that courts use to indicate a placeholder for an eventual page number?
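
As a rough sketch of the flag floated in item 1, assuming the page is available as a plain string: the name `has_proper_page_number` comes from the question above, but this standalone function is only an illustration, and nothing in this thread confirms that such a flag was added to eyecite.

```python
def has_proper_page_number(page: str) -> bool:
    """Return False when the page is nothing but underscores, True otherwise."""
    return set(page) != {"_"}


# Map a few sample page strings to the flag value they would get.
flags = {page: has_proper_page_number(page) for page in ["113", "__", "___"]}
print(flags)
```

Note that the number of underscores does not matter here, which is the same ambiguity item 2 raises for hashing.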

Member

@mlissner mlissner left a comment

Wow, this looks simple enough, thanks Matt!

@jcushman do you have time to take a look and chime in before we merge?

(A couple questions/comments in the review.)

tests/test_ResolveTest.py (outdated, resolved)
tests/test_FindTest.py (outdated, resolved)
@mlissner
Member

  1. Should we add a metadata flag to the citation objects like has_proper_page_number or something? This would be False if the page number is all underscores, True otherwise. I wasn't sure if this would actually be helpful or not.

See my thought about this above.

  2. How should this change affect hashing? Would we expect 1 U.S. __ to hash equivalently to 1 U.S. ___ (identical except that one "placeholder page" has more underscores than the other)? Or should neither hash to the other at all, since maybe these are two completely different citations that just happen to have the same volume and reporter?

I forget how we use this hash. Can you elaborate? Maybe that will help me form an opinion.

  3. Are there other characters besides underscores that courts use to indicate a placeholder for an eventual page number?

Not that I know of. @flooie have you seen anything weird in the wild?

@mattdahl
Contributor Author

I forget how we use this hash. Can you elaborate? Maybe that will help me form an opinion.

It's primarily for resolve_citations() to see if citations are equal enough to be grouped together. I think that we should be conservative and say that missing page full citations do NOT hash to any other full citations so there's no risk of false positives. If the user wants to handle these differently (e.g., in CL we're going to do that "search +-1 year" thing) they can implement a custom resolve_full_citation() function.
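
A minimal sketch of that conservative rule, under stated assumptions: the `Cite` class below is a hypothetical stand-in rather than eyecite's citation model, and the dictionary grouping only mimics what a resolver does with hashes. A citation with a placeholder page is equal only to itself, while ordinary citations compare by volume, reporter, and page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Cite:
    """Hypothetical stand-in for a full citation (not eyecite's model)."""

    volume: str
    reporter: str
    page: str

    def _is_placeholder(self) -> bool:
        # A non-empty page made up only of underscores is a placeholder.
        return self.page != "" and set(self.page) == {"_"}

    def _key(self):
        return (self.volume, self.reporter, self.page)

    def __eq__(self, other):
        if not isinstance(other, Cite):
            return NotImplemented
        if self._is_placeholder() or other._is_placeholder():
            # Conservative: a placeholder citation only equals itself.
            return self is other
        return self._key() == other._key()

    def __hash__(self):
        if self._is_placeholder():
            # Identity-based hash, so placeholders never share a bucket by value.
            return object.__hash__(self)
        return hash(self._key())


# Group citations the way a resolver might, using the objects as dict keys.
groups = defaultdict(list)
for cite in [
    Cite("1", "U.S.", "__"),
    Cite("1", "U.S.", "___"),
    Cite("1", "U.S.", "5"),
    Cite("1", "U.S.", "5"),
]:
    groups[cite].append(cite)

print(len(groups))
```

The two placeholder citations land in separate groups even though they share a volume and reporter, and the two ordinary citations are grouped together, so a missing page can never produce a false positive match.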

@jcushman
Contributor

@jcushman do you have time to take a look and chime in before we merge?

Not with anything well informed. :)

For Id. cites I imagine they tend to also give a parallel cite with page numbers, e.g. S.Ct. or a docket number, and you could cite to that? Like "Foo v. Bar, 1 S.Ct. 3, 4 U.S. __ (2000). ... Id. at (something)". I'm not sure how that's handled, though. It wouldn't be a terrible idea to grep through for 2 or 3 underscores and see how these are actually used. Possibly one answer is there's always a parallel cite that's better for resolution of the citation.

@mlissner
Member

I think that we should be conservative and say that missing page full citations do NOT hash to any other full citations

Sounds good to me.

@mlissner
Member

I see you working, but let me know when this is ready for another review.

@mattdahl
Contributor Author

For Id. cites I imagine they tend to also give a parallel cite with page numbers, e.g. S.Ct. or a docket number, and you could cite to that? Like "Foo v. Bar, 1 S.Ct. 3, 4 U.S. __ (2000). ... Id. at (something)". I'm not sure how that's handled, though. It wouldn't be a terrible idea to grep through for 2 or 3 underscores and see how these are actually used. Possibly one answer is there's always a parallel cite that's better for resolution of the citation.

I found this to be sometimes the case but often the Id. cite is just Id., at ___, so I changed our behavior to just not match those. Enhancing this with the parallel cite info could be done as a part of #76 perhaps.

Sounds good to me.

Hashability changes done in 08b9625.


N.B. I also had to make a change to the joke cite (5ee4145) because it was causing an issue with the CitationBase.__post_init__ method for some reason after I implemented the missing page normalization logic.

@mattdahl mattdahl requested a review from mlissner July 20, 2022 17:59
Member

@mlissner mlissner left a comment

A couple little things, but looks close.

eyecite/models.py (outdated, resolved)
tests/test_FindTest.py (outdated, resolved)
@mattdahl mattdahl requested a review from mlissner July 21, 2022 15:43
@mlissner mlissner merged commit 7de3b9e into freelawproject:main Jul 22, 2022
@mlissner
Member

very nice, thank you!

Labels
None yet
Projects
Status: Done
3 participants