bibtex support #94

Open
jeisner opened this issue Dec 25, 2023 · 9 comments

@jeisner

jeisner commented Dec 25, 2023

Might you consider adding BibTeX support?
Many people miscapitalize titles in their BibTeX entries.
The title is supposed to look like the output of the titlecase module, which is why this person used the module.

However, substrings that have "inherent case" are supposed to be enclosed in protective {} to prevent individual BibTeX styles from changing their case. It seems easier to handle that within titlecase than by some kind of postprocessing.

@ppannuto
Owner

IIUC, title casing rules in the *tex universe should be defined by the .bst file controlling bibliography generation. LaTeX, BibTeX, natbib, etc. have their own built-in title-casing engines; the casing in the source BibTeX doesn't matter (beyond {} as an escape).

I don't think this is something that makes sense to build into titlecase (though feel free to re-open if you feel strongly otherwise).

@jeisner
Author

jeisner commented Mar 20, 2024

No, bibtex doesn't do titlecasing! That's why I spend too much time fixing my co-authors' incorrect .bib files. :-/

.bib files have to already provide titlecase as input because bibtex is not intended to be smart enough to know which words would be uppercased by titlecase.

This titlecase entry can then be modified by the .bst style, for example to leave book titles alone but do more lowercasing in article titles (when not protected by {}).

I wasn't very explicit about the kind of bibtex support that I was suggesting. If I recall, I was thinking of a bibtex=True flag that says

  • don't change anything that is already enclosed by {}
  • enclose words like IBM and iPhone in {} because they have internal capitalization
  • ideally, figure out when to enclose words like Bayesian in {} because they are nearly always capitalized in a corpus even when not at the start of a sentence
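
The first two bullets can be sketched in a few lines of Python. This is only an illustration of the proposed behavior, not part of titlecase: the names `protect_for_bibtex` and `has_internal_caps` are hypothetical, and the sketch assumes existing {} groups are not nested.

```python
import re

# Hypothetical helpers for illustration; none of this is titlecase API.
BRACE_GROUP = re.compile(r"(\{[^{}]*\})")  # one level of {...}, no nesting
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def has_internal_caps(word):
    """True for words like IBM or iPhone: an uppercase letter after position 0."""
    return any(ch.isupper() for ch in word[1:])


def _wrap_if_needed(match):
    word = match.group(0)
    return "{" + word + "}" if has_internal_caps(word) else word


def protect_for_bibtex(title):
    """Wrap words with internal capitalization in {}; leave existing {} groups alone."""
    pieces = []
    # Splitting on a capturing group keeps the {...} groups in the result.
    for segment in BRACE_GROUP.split(title):
        if BRACE_GROUP.fullmatch(segment):
            pieces.append(segment)  # already protected in the input
        else:
            pieces.append(WORD.sub(_wrap_if_needed, segment))
    return "".join(pieces)


print(protect_for_bibtex("IBM Ships the iPhone"))
print(protect_for_bibtex("A Method for {CRISPR} Control"))
```

In a real implementation the unprotected segments would also be passed through the title-casing step; that part is left out here to keep the sketch self-contained.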

@jeisner
Author

jeisner commented Mar 20, 2024

> though feel free to re-open if you feel strongly otherwise).

I don't seem to have the power to do so, though I can still comment.

@ppannuto
Owner

Ah, I understand what you're going for now—more like a "include escape sequences for bibtex in output" mode?

That seems like a reasonable thing to add. Not something I'll have a chance to implement any time in the foreseeable future, but I'm happy to take a PR that adds support.

@jeisner
Author

jeisner commented Mar 22, 2024

> more like a "include escape sequences for bibtex in output" mode?

Yes, and also respect them if they're in the input.

If you can reopen the issue, maybe I or someone else can get around to it at some point. Thanks.

ppannuto reopened this Mar 25, 2024

@csjaugustus

csjaugustus commented Jul 31, 2024

Not sure if this will help but for me, wrapping a correctly capitalized title inside {{}} in the bib.ref file results in correctly rendered titles.

I just put the result of titlecase in a pair of double curly braces. Doesn't seem to matter how the rest of the bibtex entry is formatted. They could be wrapped in curly braces or quotes.


@jeisner
Author

jeisner commented Jul 31, 2024

> Not sure if this will help but for me, wrapping a correctly capitalized title inside {{}} in the bib.ref file results in correctly rendered titles.

Many people commit this sin. But the extra layer of {} turns off all of the bibtex functionality about changing case. So that is the wrong way to make a .bib file.

The titles are "correctly rendered" "for you" because you are using a bibliography style (.bst file) that actually wants to show the titlecase version that you put between {{}}: "A Method for CRISPR Control".

But the whole point of a .bib file is that it should work with different .bst files. E.g., if you resubmit to another journal that wants titles to be formatted like "A method for CRISPR control", you should only change the .bst file and it will handle the intended downcasing for you.

You do want to surround the single word CRISPR with the extra layer of {} in order to protect just that word from downcasing:

title = {A Method for {CRISPR} Control}

This was the subject of my original issue about "inherent case".
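
To make the difference concrete, here is a rough Python imitation of the downcasing a sentence-case bibliography style applies to a title. It is a simplified sketch, not BibTeX's actual algorithm: the function name `downcase_unprotected` is made up, and real styles also handle cases this ignores (for example, text after a colon and special characters).

```python
def downcase_unprotected(title):
    """Lowercase everything outside {} except the very first character.

    A simplified stand-in for what a sentence-case .bst style does;
    the function name and the exact rules here are assumptions.
    """
    out = []
    depth = 0  # current brace nesting level
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        protected = depth > 0 or i == 0
        out.append(ch if protected else ch.lower())
    return "".join(out)


# Field value with only CRISPR protected: the style can still downcase the rest.
print(downcase_unprotected("A Method for {CRISPR} Control"))
# No protection at all: CRISPR is wrongly downcased.
print(downcase_unprotected("A Method for CRISPR Control"))
# The {{...}} habit: the whole value is one brace group, so nothing changes.
print(downcase_unprotected("{A Method for CRISPR Control}"))
```

The braces stay in the output string, as they do in BibTeX's output; LaTeX treats them as grouping and does not print them.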

@alerque

alerque commented Jul 31, 2024

I would suggest this is out of scope for a casing library. That being said, I would also suggest it would be a great tool to have! I would just envision it as a separate project dedicated to bibtex casing that loads the bibliographies, parses out format-specific issues like {}-wrapped words, passes the remaining segments to a casing library like this one for the title-case work, and then returns the normalized file.

If I were to sit down and implement this I would probably do it in Rust and use my own decasify project as a library to handle the casing. It could also be done in Python using this project as a library for English, and optionally my decasify as a Python library to handle languages other than English or style guides other than this one.

Is anyone interested in working on such a project? Besides casing of titles, are there other normalization tasks that might benefit from automating on the input content side rather than the output style side?

@jeisner
Author

jeisner commented Jul 31, 2024

> I would just envision it as a separate project dedicated to bibtex casing

Note that it is nontrivial to figure out which words should be protected with {} in the first place.
People often leave out this protection, and then words with inherent capitalization such as person names and system names are incorrectly downcased (or upcased) by the .bst file.
A good heuristic is to see whether the word consistently retains capitalization in a text corpus even when not at the start of a sentence (Bayesian). Another is to see whether it has internal capitalization (CamelCase, iPhone, IBM).
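
The corpus heuristic could look roughly like the following Python sketch. The function name `mostly_capitalized`, the naive sentence splitter, and the `min_count` and `threshold` defaults are all assumptions chosen for illustration; a real tool would need a proper tokenizer and a much larger corpus.

```python
import re
from collections import Counter


def mostly_capitalized(corpus, min_count=2, threshold=0.9):
    """Return lowercased words that are capitalized in at least `threshold`
    of their occurrences, counting only non-sentence-initial positions."""
    capitalized = Counter()
    total = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", corpus):
        words = re.findall(r"[A-Za-z]+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            key = word.lower()
            total[key] += 1
            if word[0].isupper():
                capitalized[key] += 1
    return {
        word
        for word, count in total.items()
        if count >= min_count and capitalized[word] / count >= threshold
    }


corpus = (
    "We use Bayesian inference here. The Bayesian model is simple. "
    "A simple model can help. Our model uses inference."
)
print(mostly_capitalized(corpus))
```

Words found this way would then be candidates for {} protection, alongside words with internal capitalization.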

> Besides casing of titles, are there other normalization tasks that might benefit from automating on the input content side rather than the output style side?

Quite a few, although casing is the worst offender.
Some are minor, like using -- rather than - for page ranges.
There are also questions of consistency within the bibfile: should you warn -- or even fix -- if only some conference proceedings have publisher or editor fields? if only some papers have URLs? if the same conference/journal name is abbreviated in one place but not in another?

You might call such a tool biblint.
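
Two of the checks mentioned above are easy to sketch in Python. This is a hypothetical illustration only: entries are assumed to be plain dicts already produced by some BibTeX parser, and the `"type"` key and both function names are invented for the example.

```python
import re


def normalize_page_range(pages):
    """Rewrite a hyphenated page range such as '10-20' as '10--20'."""
    return re.sub(r"(?<=\d)\s*-+\s*(?=\d)", "--", pages)


def inconsistent_fields(entries, entry_type, fields):
    """Return the fields that some, but not all, entries of a type provide."""
    same_type = [e for e in entries if e.get("type") == entry_type]
    flagged = []
    for field in fields:
        have = sum(1 for e in same_type if e.get(field))
        if 0 < have < len(same_type):
            flagged.append(field)
    return flagged


# Hypothetical parsed entries; "type" is an assumed key, not a parser's API.
entries = [
    {"type": "inproceedings", "publisher": "ACL", "pages": "10-20"},
    {"type": "inproceedings", "pages": "1--9"},
    {"type": "article", "url": "https://example.org/paper"},
]

print([normalize_page_range(e["pages"]) for e in entries if "pages" in e])
print(inconsistent_fields(entries, "inproceedings", ("publisher", "editor", "url")))
```

Whether such a tool should only warn or also rewrite the file is the open design question raised above; the first function fixes, the second only reports.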

A year ago I wrote a GPT-3.5 prompt that is good at cleaning up bib files, and it explicitly lists the main issues I know about. Happy to send it to you. My workflow is to use AI to edit the file, and then a diff tool to review the changes.
