Best format of input files? #85
Comments
Thanks @gassmoeller for this extensive issue. We totally agree with you that widely used input formats such as the ones you mentioned should be used instead of writing our own parsers. The main reason we adopted this approach was practicality: there is no need to define the Python variable names (i.e., fewer characters in the file), and users only change values in the configuration files provided in the examples folder. To address this issue and comply with the guiding principles of FAIR research software you mentioned, the coming PR will add support for TOML files: we will keep the existing parser for compatibility with previous configuration files, while the TOML format will be used for input files with the toml extension, e.g.,
Support for TOML input files has been added in #102, thanks again @gassmoeller.
Thanks for the quick implementation, I am glad you picked up the idea. 🎉
I understand the wish to stay compatible with existing input files. In that case I would recommend clearly stating in the documentation that the text-based format is frozen/deprecated, and removing it after a reasonable time. Keeping two input formats, one of which is increasingly outdated, will split your user base and eventually confuse new users. This is something for the future, of course.
This is a slightly difficult issue to write, because I need to strike a balance between the purpose of this project (which is good and which I support) and one of the fundamental decisions you made when creating the project (which I would like to start a discussion about). I would like to discuss the format of the input files that you chose and hear your opinion on it.
Just as a basis for discussion, this is the format you chose:
Just from a data perspective, this is a:
x=y
)"""
stands for that)
Such a type of input format has a number of serious challenges:
utils/inputvalues.py
)
I am not pointing this out to diminish your effort. Constructing data formats and writing parsers is just hard (e.g., one of the projects I contribute to uses a custom parser that takes 2500 lines of C++ code), which is why much state-of-the-art software relies on existing file formats and established parser libraries that can read them.
Currently, the following formats are in wide use:
For example, the input file listed above, converted to TOML, could look like:
And this file could be read into a Python dictionary as simply as:
You can then check that all necessary entries in the dict exist with the correct type, or create default values if parameters are not given in the file.
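The check described above could be sketched roughly as follows. The schema layout, the `REQUIRED` marker, the `validate` function, and the parameter names are all assumptions made for illustration, not part of the project:

```python
# Marker object for parameters that have no default (hypothetical convention).
REQUIRED = object()

# Hypothetical schema: key -> (accepted type(s), default value or REQUIRED).
SCHEMA = {
    "nx": (int, REQUIRED),
    "spacing": ((int, float), 1.0),
    "label": (str, "run"),
}


def validate(section: dict, schema: dict) -> dict:
    """Return a copy of `section` with defaults filled in and types checked."""
    unknown = set(section) - set(schema)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")

    result = {}
    for key, (expected_type, default) in schema.items():
        if key not in section:
            if default is REQUIRED:
                raise KeyError(f"missing required parameter '{key}'")
            result[key] = default
            continue
        value = section[key]
        # bool is a subclass of int, so reject it explicitly for other types.
        if isinstance(value, bool) and expected_type is not bool:
            raise TypeError(f"parameter '{key}' must not be a boolean")
        if not isinstance(value, expected_type):
            raise TypeError(f"parameter '{key}' has the wrong type: {value!r}")
        result[key] = value
    return result


checked = validate({"nx": 10}, SCHEMA)
print(checked)
```

Rejecting unknown keys is a design choice in this sketch: it catches misspelled parameter names early instead of silently ignoring them.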
I understand that this is one of the basic assumptions about the user interface of pyopenspe11 and that you have already distributed it to a number of users, but I would like to at least urge you to consider switching to a better data format. You will make your own life and that of your users unbelievably hard with the current data format, and better formats are available and easy to use and implement. Additionally, using standard data formats is one of the guiding principles of FAIR research software (https://www.go-fair.org/fair-principles/, [I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.](https://www.go-fair.org/fair-principles/i1-metadata-use-formal-accessible-shared-broadly-applicable-language-knowledge-representation/)).
Related to openjournals/joss-reviews#7357.