Correct order of disambiguation methods #30
@fbennett, I noticed that the spec currently doesn't mention how disambiguating cites by adding and expanding names affects the corresponding bibliographic entries. Can you quickly recall how this is supposed to work? A citation like "(Doe 2000, Doe 2000)" that is expanded to "(Joe Doe 2000, Jane Doe 2000)" requires that the bibliographic entries use a similarly or more detailed name form (e.g. "J. Doe" wouldn't cut it). Is that how it works? |
Yep, that's exactly how it works. The settings in the bibliography are defaults, and will be overridden by the extended name parameters applied during disambiguation, if they are more specific. |
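A minimal sketch of that override rule, assuming a hypothetical three-level ranking of name forms. The names `SPECIFICITY` and `bibliography_name_form` are illustrative only and are not taken from the spec or from any processor:

```python
from typing import Optional

# Hypothetical ranking of name forms, from least to most specific.
SPECIFICITY = {"family": 0, "initials": 1, "full": 2}


def bibliography_name_form(style_default: str, disambiguated: Optional[str]) -> str:
    """Pick the name form to render in a bibliography entry.

    `style_default` is the form the style asks for in the bibliography.
    `disambiguated` is the form cite disambiguation settled on for this
    name, or None if the name was never touched by disambiguation.
    """
    if disambiguated is None:
        return style_default
    # The bibliography setting is only a default: a more specific form
    # chosen during disambiguation wins, a less specific one does not.
    if SPECIFICITY[disambiguated] > SPECIFICITY[style_default]:
        return disambiguated
    return style_default


# "J. Doe" in the bibliography is overridden when cites needed "Joe Doe".
print(bibliography_name_form("initials", "full"))
```

In this sketch a bibliography that already prints full names is never downgraded, which matches the "if they are more specific" condition above.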
Thanks! I'll add something to that effect. |
I included some text relating to the effects of name disambiguation on the involved bibliographic entries. |
Another try:
|
After another off-list chat, another version:
|
Bumping. Obviously if we merge, the merge conflict would need to be resolved. I think a |
It's not clear to me how to reconcile this paragraph:
With the statement that adding names always takes place before expanding names:
With the ordering rule, that would imply that there is never a condition when you expand names. The order of the disambiguation methods is wrong, I think. The first method should be Expand names. |
A few places where I am unsure of the meaning or if the ordering is correct. @rmzelle @cormacrelf @adam3smith @fbennett, can you take a look?
1. Show more names
2. Expand names (adding initials or full given names)
1. Show more names | |
2. Expand names (adding initials or full given names) | |
1. Expand names (adding initials or full given names) | |
2. Show more names |
If expand names is used, that should take place before adding names, no? Otherwise, this paragraph never activates:
If cites cannot be (fully) disambiguated by expanding the rendered names, and if ``disambiguate-add-names`` is set to "true", then the names still hidden as a result of et-al abbreviation after the disambiguation attempt of ``disambiguate-add-names`` are added one by one to all members of a set of ambiguous cites, until no more cites in the set can be disambiguated by adding expanded names.
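To make the proposed order concrete, here is a rough sketch, not the citeproc-js implementation. It assumes cites are lists of hypothetical `(given, family)` tuples, ignores years, and applies each step uniformly to the whole ambiguous set, which is simpler than what the spec describes. `render`, `ambiguous`, and `disambiguate` are made-up names:

```python
from collections import Counter


def render(names, shown, expanded):
    """Render the first `shown` names; use given names only when `expanded`."""
    parts = [f"{given} {family}" if expanded else family
             for given, family in names[:shown]]
    suffix = " et al." if shown < len(names) else ""
    return ", ".join(parts) + suffix


def ambiguous(rendered):
    """Return True if any two cites render identically."""
    return any(count > 1 for count in Counter(rendered).values())


def disambiguate(cites, et_al_use_first=1):
    """Expand names first; only then add names hidden by et-al abbreviation."""
    shown, expanded = et_al_use_first, False
    rendered = [render(c, shown, expanded) for c in cites]
    if ambiguous(rendered):
        # Step 1: expand names (given names added to the rendered names).
        expanded = True
        rendered = [render(c, shown, expanded) for c in cites]
    max_names = max(len(c) for c in cites)
    while ambiguous(rendered) and shown < max_names:
        # Step 2: add hidden names one at a time, already in expanded form.
        shown += 1
        rendered = [render(c, shown, expanded) for c in cites]
    return rendered


first = [[("Joe", "Doe"), ("Ann", "Smith")], [("Jane", "Doe"), ("Ann", "Smith")]]
second = [[("Joe", "Doe"), ("Ann", "Smith")], [("Joe", "Doe"), ("Bob", "Jones")]]
print(disambiguate(first))   # expansion alone is enough
print(disambiguate(second))  # expansion fails, so a name is added
```

With the opposite order, the second step would have nothing left to do after the first step had already added every name, which is the problem raised above.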
In the description of disambiguation methods (1) and (2) above, we assumed that
each (disambiguated) cite has an unambiguous link to its bibliographic entry. To
assure that each cite does in fact uniquely identify its entry in the
bibliography, detail that distinguishes cites (such as names, initials, and full
given names) must be shown in the corresponding bibliography entries. If this is
not the case, disambiguation methods (1) and (2) also act on all members of a
set of ambiguously cited bibliographic entries, until no more entries in the set
can be unambiguously cited by adding (expanded) names. Each method only takes
effect on the involved bibliographic entries after it has been used to
disambiguate cites.

A disambiguation attempt can also be made by rendering ambiguous cites with the
``disambiguate`` condition testing "true" [Method (3)] (see `Choose`_).
I don't understand what these lines mean. What are some examples here?
I'm not sold. There are two things going on. One is that if your cite has to add initials/given names to unambiguously refer to a bib entry, then the bib entry should also end up with initials/givens. The way this was ensured previously is that people would have bibliographies with full names in them. Are there even any styles that don't do this? What sort of bibliography declines to print authors' full names? Also it's a bit tricky to implement (but not impossible) -- is it a real problem that people are experiencing? Are there actual style guides out there that require cite-style name disambiguation within the bibliography itself in order to safely use initials? Because if that's not the case, don't bother reading the rest here, it's not worth your time :) But generally the interpretation I've managed to eke out of the rest seems like a lot of work before I've seen it to be a problem at all.
It's a bit of a slog to make sense of the rest, but I guess that's the game we're playing. The best reading I can do is that "The overall goal is for bib entries to be uniquely citable such that a person reading the document can tell which entry a cite refers to, so we try to expand or add names in bib entries, only insofar as the ambiguous cites that refer to them get less ambiguous." Honestly, you can stop there, you don't need to be more detailed than that in the disambiguation spec and I'd prefer we didn't. I think what's meant by "Each method only takes effect..." is that the un-expanded versions of the rendered cites (i.e. <citation>) built specifically for disambiguation against a specific entry should NOT stop being the basis for cites matching that entry during cite disambiguation, but you can just say "This does not affect the cite disambiguation process." What this requires is:
- add/expand a name in a bibliography entry
- look up the same name in the dummy cites used to match other cites against, propagate the changes
- run through all the cites that matched previously again to check if any of them would be more finely disambiguated after the change. You can ignore the ones that actually refer to the entry.
- stop when they can't.
I think the intention is that this improves the yield for cite-disambiguation. You'd have to mount a tricky argument to show me it would help, because cite disambiguation is meant to stop just before it ceases to be effective; if you've already added the names/initials/givens from all the cites referred to in step (3) already (in their "corresponding bibliography entries"), can you even devise a test case that will trigger this uniqueness failsafe? I have no energy left to do that.
Altogether, it's like... OK... but disambiguation is already slow. This not only adds a pretty big computation, it makes it more difficult to make the existing stuff fast. If you think of matching a cite against a possible matching reference as a list of normalised, rendered cites built with a dummy cite for that reference, then you are no longer so free to optimise that list. (With your weapon of choice, DFAs, RegexSet, ...) You have to have the ability to do step (2) on those dummy rendered cites, and none of those code weapons can withstand that mutation. That list was one of the very few things that never changes as long as a reference doesn't change! So -- render them again, and each time you add a single initial to the bib entry, render more? At the very least these things are not "the same disambiguation methods" as for cites. They'll have to be re-written to reflect the is-cite-ambiguity-improved test. Any way you cut it, this is a LOT of work that's already solved by using full names in your bibliographies.
Pinging @rmzelle @cormacrelf @adam3smith @fbennett. Could you folks take a look at this? |
@denismaier @bdarcus Could you also weigh in? |
Current citeproc-js behavior does apply expand-names first, before add-names, and this is the logical order. |
@bwiernik: Sorry for failing to respond on this. @cormacrelf and I had a discussion about this in the |
Great! After running a bunch of tests against current citeproc-js behavior, I figured this had to be the right setup. Did you @fbennett or @cormacrelf write tests for that? |
yup, I agree. If both types of disambiguation are used, add-givenname should be applied first (that's the right summary, yes?) |
I don't have a strong view on the add-givenname ordering but looks like this is a stylistic choice which y'all appear to agree is good. So it's good! |
@adam3smith Yes. That's right. |
This might not be relevant here, but ... setting aside the use of shortened
names in bibliographies, it can happen that adding names in the
bibliography, beyond its et-al-use-first constraint, is necessary in cites
and bib.
|
@fbennett Yes? I think that's clear from the spec?
|
In response to #25 and https://bitbucket.org/bdarcus/citeproc-test/issue/10/disambiguate_bycitedisambiguateconditiontx .
I'll accept this pull request once people have had a chance to review.