Detect historical or non-standard citations #103

Open
lmullen opened this issue Nov 19, 2021 · 6 comments

@lmullen

lmullen commented Nov 19, 2021

First, thanks for eyecite. @kfunk074 and I are historians working on American legal history, and we intend to use eyecite for a project in progress.

eyecite does very well with citations from the twentieth century on (post-Bluebook?) but it does not detect citations from case reporters in the nineteenth century. To give one example: before Georgia created official case reports, the de facto standard reporter for Georgia's case law was Kelly's Reports. When Georgia began official reports, it adopted the first five volumes of Kelly as its official reports. So, 1 Kelly 254 = 1 Ga. 254 and so on. Of course citations before the Georgia reports all go to Kelly, but even after the official reports, Kelly might still be cited directly. eyecite will detect the Georgia reports, but not Kelly.
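To make the equivalence concrete, here is a minimal sketch of translating a nominative citation into its official form. The lookup table, the regular expression, and the `to_official` helper are all hypothetical names made up for illustration; none of this is eyecite or reporters-db code, and only the Kelly/Ga. pairing for volumes 1 to 5 comes from the description above.

```python
import re
from typing import Optional

# Hypothetical lookup table: nominative reporter -> official reporter.
# Only the Kelly -> Ga. pairing (volumes 1-5) is taken from the comment above.
NOMINATIVE_TO_OFFICIAL = {
    "Kelly": {"official": "Ga.", "volumes": range(1, 6), "offset": 0},
}

# Very simplified "volume reporter page" pattern (single-word reporters only).
CITE_RE = re.compile(
    r"(?P<volume>\d+)\s+(?P<reporter>[A-Z][A-Za-z.]*)\s+(?P<page>\d+)"
)


def to_official(cite: str) -> Optional[str]:
    """Return the official citation for a known nominative cite, else None."""
    m = CITE_RE.fullmatch(cite.strip())
    if m is None:
        return None
    entry = NOMINATIVE_TO_OFFICIAL.get(m["reporter"])
    volume = int(m["volume"])
    if entry is None or volume not in entry["volumes"]:
        return None
    return f'{volume + entry["offset"]} {entry["official"]} {m["page"]}'


print(to_official("1 Kelly 254"))  # 1 Ga. 254
print(to_official("6 Kelly 10"))   # None (outside the adopted volumes)
```

The `offset` field is zero for Kelly because the volume numbers line up; it is included only because other jurisdictions number their nominative volumes differently from the official series.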

The same is true for basically every state jurisdiction in the U.S. I believe there are issues on this repository that are subsets of this problem. I suspect, e.g., that the reporters listed in this issue (#27) are the same kind of problem as described for Georgia. It depends on the corpus, of course, but such citations can be a substantial body of material that eyecite misses.

We are currently compiling a list of these "antique" reporters. We would like to contribute a pull request that adds these reporters to eyecite. A few questions.

  1. Would such a pull request be welcome?
  2. If so, could you please give us some guidance about how best to do that? I've looked through the eyecite code, though not in great detail just yet. My understanding is that we would contribute the data to the reporters-db repo, but we could use some advice about how best to do so.
  3. On a secondary issue: Bluebook usually standardizes citations, e.g., to 3 Or. 534 for Oregon. But historically, it's common for such citations to be written as 3 Oreg. 534. We'd also like to contribute some variant abbreviations, and aren't sure what the best way to do that is.
@mlissner
Member

Wow, this sounds great. A few replies:

  1. Would this be welcome?

Yes, absolutely. This is exactly the kind of thing we want eyecite to excel at.

  2. How?

The reporters_db is exactly the right answer for how to do this. The basic trick there is to add these reporters to reporters.json. Once they're added there, we'll update the reporters_db dependency here in eyecite, and that usually will do it. Tests are nice to have too, to make sure things work as expected.

As for how to update reporters_db, the readme should get you pretty far. If it doesn't answer something, I'd be happy to field questions. I'd suggest doing a small contribution first, then adding more and more as you get comfy.

  3. Variants?

reporters_db is organized around reporter abbreviations, and each has the ability to have variants. Here's Or. for example (using jq to filter to it):

↪ curl -s https://raw.githubusercontent.com/freelawproject/reporters-db/main/reporters_db/data/reporters.json | jq '."Or."'
[
  {
    "cite_type": "state",
    "editions": {
      "Or.": {
        "end": null,
        "start": "1853-01-01T00:00:00"
      }
    },
    "mlz_jurisdiction": [
      "us:or;supreme.court"
    ],
    "name": "Oregon Reports",
    "variations": {
      "O.": "Or.",
      "Or": "Or.",
      "Ore.": "Or."
    }
  }
]

You can see that there's a variations key that includes O., Or, and Ore. If you wanted to add Oreg., that'd be the place to do it.
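As a rough sketch of that change, the snippet below adds an Oreg. variation to an in-memory copy of the Or. entry shown in the jq output. The `add_variation` helper is a hypothetical name for illustration, not part of reporters_db; in practice the edit would be made directly in reporters.json.

```python
import copy

# Entry copied from the jq output above.
OR_ENTRY = {
    "cite_type": "state",
    "editions": {"Or.": {"end": None, "start": "1853-01-01T00:00:00"}},
    "mlz_jurisdiction": ["us:or;supreme.court"],
    "name": "Oregon Reports",
    "variations": {"O.": "Or.", "Or": "Or.", "Ore.": "Or."},
}


def add_variation(entries, variant, canonical):
    """Return a copy of `entries` with `variant` mapped to `canonical`.

    Hypothetical helper: the mapping is only added to entries that
    actually list `canonical` as one of their editions.
    """
    updated = copy.deepcopy(entries)
    for entry in updated:
        if canonical in entry["editions"]:
            entry.setdefault("variations", {})[variant] = canonical
    return updated


reporters = {"Or.": [OR_ENTRY]}
reporters["Or."] = add_variation(reporters["Or."], "Oreg.", "Or.")
print(sorted(reporters["Or."][0]["variations"]))
```

Each variation maps the variant spelling to the canonical edition key, so "Oreg." would point at "Or." just as "Ore." already does.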

Thanks for this offer of help. It sounds like it'll add a lot of nice citation formats.

@lmullen
Author

lmullen commented Nov 19, 2021

@mlissner Cool. Thanks for the prompt reply. Seems straightforward enough.

I wasn't clear whether the CSV or the JSON in reporters-db was the standard, but now I see it in the documentation. And the variations key makes sense.

Two follow up questions:

  1. If we don't know the start and end dates for reporters, can we leave those fields as null? We have not specifically been collecting that data. What's the best approach?
  2. I'm not familiar with multilingual Zotero. Is there a list of the jurisdictions somewhere that we can consult, or should that field also be left blank if unknown? The README seems to indicate that it can be left blank, but wanted to verify.

@mlissner
Member

  1. If we don't know the start and end dates for reporters, can we leave those fields as null?

You can, but if it's possible to have them, they're helpful. We use them in CourtListener to disambiguate citations. I don't have an example offhand, but what often happens is that there is a reporter of decisions who works in one court for a period of time and then moves to another one later. By having the dates of each, we can then figure out if a citation is from the first court or the second one.

So, null is fine, yes, but having them helps.
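To illustrate the idea, here is a minimal sketch of date-based disambiguation. It is not CourtListener's actual logic: the two candidate entries are made up, and `candidates_for_date` is a hypothetical helper. It assumes the same ISO date strings used in reporters.json and treats a null start or end as an open bound, which is why null is workable but less useful.

```python
from datetime import datetime

# Made-up entries for illustration only: one abbreviation used by a
# reporter of decisions who moved from one court to another.
CANDIDATES = [
    {"name": "Example Reports (first court)",
     "start": "1820-01-01T00:00:00", "end": "1835-12-31T00:00:00"},
    {"name": "Example Reports (second court)",
     "start": "1836-01-01T00:00:00", "end": None},
]


def _parse(value):
    """Parse an ISO timestamp, passing None (null) through unchanged."""
    return datetime.fromisoformat(value) if value else None


def candidates_for_date(candidates, decided):
    """Keep the candidates whose date range could contain `decided`."""
    matches = []
    for cand in candidates:
        start, end = _parse(cand["start"]), _parse(cand["end"])
        after_start = start is None or start <= decided
        before_end = end is None or decided <= end
        if after_start and before_end:
            matches.append(cand)
    return matches


for year in (1830, 1850):
    names = [c["name"] for c in candidates_for_date(CANDIDATES, datetime(year, 5, 1))]
    print(year, names)
```

With both dates null, an entry matches every decision date, so it never helps narrow the candidates.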

  2. I'm not familiar with multilingual Zotero. Is there a list of the jurisdictions somewhere that we can consult, or should that field also be left blank if unknown? The README seems to indicate that it can be left blank, but wanted to verify.

Honestly, I'm not either, and I don't think these are used anywhere. We'll probably yank them eventually, but for now, what I do is just copy the values from other nearby reporters. Usually that works pretty well, if that'd be OK.

@lmullen
Author

lmullen commented Nov 19, 2021

Okay, makes sense. And yes, I completely agree about disambiguation via dates.

We will send a sample pull request to make sure we are doing it right once we've got the right data.

@jcushman
Contributor

jcushman commented Nov 20, 2021

When Georgia began official reports, it adopted the first five volumes of Kelly as its official reports.

In case you haven't come across this yet, the term in legal citation for this practice is "nominative", e.g. "1 Kelly" is the nominative citation and "1 Ga." is the official citation. There's a bug over on reporters-db here for this that includes some more examples.

reporters-db knows about some nominatives and not others -- it knows about all of the nominative citations that the Indigo Book knows about. So for example for Massachusetts it can recognize these nominative cites:

Allen 1861–1867 e.g., 83 Mass. (1 Allen)
Gray 1854–1860 e.g., 67 Mass. (1 Gray)
Cushing 1848–1853 e.g., 55 Mass. (1 Cush.)
Metcalf 1840–1847 e.g., 42 Mass. (1 Met.)
Pickering 1822–1839 e.g., 18 Mass. (1 Pick.)
Tyng 1805–1822 e.g., 2 Mass. (1 Tyng)
Williams 1804–1805 1 Mass. (1 Will.)

But the Bluebook (and thus Indigo Book, and thus reporters-db so far) doesn't know about any nominative cites for Georgia, so eyecite can't recognize 1 Kelly yet.
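For readers unfamiliar with the parallel format in the Massachusetts list, the sketch below pulls the official and nominative halves out of a string like 83 Mass. (1 Allen). The pattern and the `parse_parallel` helper are hypothetical and deliberately simplified (single-word reporter abbreviations only); this is not how eyecite itself tokenizes citations.

```python
import re
from typing import NamedTuple, Optional


class ParallelCite(NamedTuple):
    official_volume: int
    official_reporter: str
    nominative_volume: int
    nominative_reporter: str


# Simplified pattern for "<vol> <official> (<vol> <nominative>)".
PARALLEL_RE = re.compile(
    r"(?P<ov>\d+)\s+(?P<orep>[A-Z][A-Za-z.]*)\s+"
    r"\(\s*(?P<nv>\d+)\s+(?P<nrep>[A-Z][A-Za-z.]*)\s*\)"
)


def parse_parallel(text: str) -> Optional[ParallelCite]:
    """Return the first official/nominative pair found in `text`, if any."""
    m = PARALLEL_RE.search(text)
    if m is None:
        return None
    return ParallelCite(int(m["ov"]), m["orep"], int(m["nv"]), m["nrep"])


cite = parse_parallel("83 Mass. (1 Allen)")
print(cite)
# The gap between the two volume numbers is how far the official
# series had advanced when the nominative reporter began.
print(cite.official_volume - cite.nominative_volume)
```

A plain "1 Ga. 254" has no parenthetical, so `parse_parallel` returns None for it.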

(reporters-db also knows about whatever else it's been told about, of course, so for example it knows about whatever nominative reporters came into CAP's collection listed over at https://cite.case.law/ )

I don't know why the Bluebook includes parallel nominative citations for some states and not others, but my hunch is that there's a line-drawing problem about figuring out whether a given nominative cite was ever actually used. When I was doing the research behind the bug I linked, you can see I petered out around 86 N.J. Eq., published in 1916, because there continue to be nominative citations for volumes after that but it's not clear they were ever used for anything -- and even the earlier ones were only used a handful of times in published cases. The books continued to be published with front matter suggesting that 86 N.J. Eq. may also be cited as 1 B. Stockton, but no one actually did.

Of course it's great to include any cite format that was in fact used, and it also makes sense to include every time a book itself provided nominative numbering, whether or not it was cited that way. I'm glad you're taking a look at it!

I'll also throw in for those with unrestricted CAP access that a reasonably quick way to check for nominatives is to look at the volume PDFs (which is how I generated the tables in that other bug). For example over in Georgia I can see that the second reporter, Cobb, numbered his first volume as 6 instead of 1, so I know not to expect nominative cites in Georgia after Kelly.

[Edit: but if cases in that volume were variously cited as "1 Cobb" and "6 Cobb" in addition to "6 Ga." I will not be shocked, of course. 😆]

@lmullen
Author

lmullen commented Nov 23, 2021

@jcushman Thanks for the very helpful background. The term "nominative cite" is helpful; we had been using "antique cite" as shorthand.
