Handle non-ASCII Unicode symbols in CIF dictionary files #182

Closed
vaitkus opened this issue Oct 30, 2023 · 3 comments · Fixed by #181
Labels
bug Something isn't working dic2owl Issue or PR related specifically to the dic2owl Python package

Comments

@vaitkus
Collaborator

vaitkus commented Oct 30, 2023

The CIF2 format permits files to contain non-ASCII Unicode symbols. This is a relatively recent development, as earlier versions of the CIF format and the related STAR format were restricted to the ASCII character set.

The CIF_CORE DDLm dictionary has recently been updated to use proper Unicode characters for Greek symbols instead of the LaTeX-like markup (e.g. α instead of \a). However, this now seems to trip up the STAR parser used by dic2owl:

> dic2owl cif_core.dic

Fail value check, match only 0-208 in string '\n    The reciprocal space matrix for converting the U(ij) matrix of\n    atomic displacement parameters to a dimensionless beta(IJ) matrix.\n    The ADP factor in a structure factor expression:\n\n    t = exp - 2π^2^ ( U11 h h a* a* + ...... 2 U23 k l b* c* )\n    t = exp - 0.25  ( B11 h h a* a* + ...... 2 B23 k l b* c* )\n      = exp -       ( β11 h h + ............ 2 β23 k l )\n\n    The conversion of the U or B matrices to the β matrix\n\n        β =   C U C   =    C B C /8π^2^\n\n    where C is conversion matrix defined here.'
Traceback (most recent call last):
...
CifFile.StarFile.StarError: 
Star Format error: Data item "'\n    The reciprocal space matrix for converting the U(ij) matrix of\n    atomic displacement parameters to a dimensionless beta(IJ) matrix.\n    The ADP factor in a structure factor expression:\n\n    t = exp - 2π^2^ ( U11 h h a* a* + ...... 2 U23 k l b* c* )\n    t = exp - 0.25  ( B11 h h a* a* + ...... 2 B23 k l b* c* )\n      = exp -       ( β11 h h + ............ 2 β23 k l )\n\n    The conversion of the U or B matrices to the β matrix\n\n        β =   C U C   =    C B C /8π^2^\n\n    where C is conversion matrix defined here.'"... contains forbidden characters
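
To see which characters in a dictionary file fall outside the ASCII range, something like the following could be used. This is only a minimal sketch using the Python standard library; `find_non_ascii` is a hypothetical helper written for illustration, not part of dic2owl or PyCIFRW, and the sample text is shortened from the error message above.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line number, column, character, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything above 127 is outside ASCII
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Shortened excerpt of the description text quoted in the error message.
sample = "t = exp - 2π^2^ ( U11 h h a* a* )\n  = exp - ( β11 h h )"

for hit in find_non_ascii(sample):
    print(hit)
```

To scan a real dictionary, the file would need to be read as UTF-8 (for example `open("cif_core.dic", encoding="utf-8").read()`) and the result passed to the helper.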


@jamesrhester

I think this is likely a bug in PyCIFRW, so it should be raised there.

@CasperWA CasperWA added bug Something isn't working dic2owl Issue or PR related specifically to the dic2owl Python package labels Oct 31, 2023
@CasperWA
Contributor

CasperWA commented Nov 1, 2023

The new version v4.4.6 of PyCIFRW fixes the issue 👍

@CasperWA CasperWA closed this as completed Nov 1, 2023
@CasperWA
Contributor

CasperWA commented Nov 1, 2023

Reopened, as the issue will not be fixed until the requirements specify PyCIFRW v4.4.6 as the minimum version.
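
For anyone wanting to check locally whether their environment already has a new enough PyCIFRW, a rough sketch follows. It assumes the distribution is installed under the name `PyCIFRW` and uses a deliberately simple version comparison; the helper names are hypothetical and this is not how dic2owl itself declares or checks its requirements.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '4.4.6' into (4, 4, 6), using only the leading digits of each part."""
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first part that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name is "PyCIFRW".
found = installed_version("PyCIFRW")
if found is None:
    print("PyCIFRW is not installed")
elif meets_minimum(found, "4.4.6"):
    print(f"PyCIFRW {found} should include the fix")
else:
    print(f"PyCIFRW {found} is older than 4.4.6")
```

The comparison ignores pre-release suffixes, so a dedicated version-parsing library would be a better choice for anything beyond a quick check.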

@CasperWA CasperWA reopened this Nov 1, 2023
CasperWA added a commit that referenced this issue Nov 1, 2023
Fixes #182

Generate and update `cif-core.ttl` under ontologies.
@CasperWA CasperWA linked a pull request Nov 1, 2023 that will close this issue

3 participants