Add basic GAF pass-through check for UTF-8 #1515

Open
wants to merge 2 commits into
base: master
16 changes: 15 additions & 1 deletion metadata/rules/gorule-0000001.md
@@ -9,11 +9,25 @@ contact: "[email protected]"
implementations:
- language: python
source: https://github.com/biolink/ontobio/blob/master/ontobio/io/gafparser.py
examples:
repair:
- comment: (UTF-8) CJK can pass through without modification (隣のトトロ)
format: gaf
input: "FB FBgn0033449 隣のトトロ GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 FlyBase"
output: "FB FBgn0033449 隣のトトロ GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 FlyBase"
- comment: (UTF-8) Accent marks can pass through without modification (Astérix_le_Gaulois)
format: gaf
input: "FB FBgn0033449 123_456 GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 Astérix_le_Gaulois"
output: "FB FBgn0033449 123_456 GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 Astérix_le_Gaulois"
- comment: (UTF-8) Greek letters can pass through without modification (αΩ)
format: gaf
input: "α FBgn0033449 123_456 GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 Ω"
output: "α FBgn0033449 123_456 GO:1902361 FB:FBrf0202953|GO_REF:0000024 ISS UniProtKB:Q05516 F protein taxon:7227 20171127 Ω"
---
Each line of a GAF file is checked for general conformance to the GAF 2.1 spec and to some
GO-specific specifications. The GAF 2.1 spec is here: http://geneontology.org/page/go-annotation-file-gaf-format-21.
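
The UTF-8 pass-through examples in the front matter can be illustrated with a minimal sketch. This is not the ontobio implementation; `roundtrip_gaf_line` is a hypothetical helper, and the positions of the empty columns in the sample line are an assumption, since the examples above show the tab separators flattened to spaces.

```python
GAF_COLUMNS = 17  # GAF 2.1 lines have 17 tab-separated columns


def roundtrip_gaf_line(raw: bytes) -> bytes:
    """Decode one GAF line (no line terminator) as UTF-8, split it into
    columns, and re-encode it. Hypothetical helper for illustration only."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    fields = text.split("\t")
    if len(fields) != GAF_COLUMNS:
        raise ValueError(f"expected {GAF_COLUMNS} columns, got {len(fields)}")
    return "\t".join(fields).encode("utf-8")


# Sample line modeled on the CJK example; empty strings mark columns that
# are assumed to be empty (qualifier, name, synonym, and the last two).
sample_fields = [
    "FB", "FBgn0033449", "隣のトトロ", "", "GO:1902361",
    "FB:FBrf0202953|GO_REF:0000024", "ISS", "UniProtKB:Q05516", "F",
    "", "", "protein", "taxon:7227", "20171127", "FlyBase", "", "",
]
sample_line = "\t".join(sample_fields).encode("utf-8")

# A pass-through check succeeds when the bytes come back unchanged.
passed = roundtrip_gaf_line(sample_line) == sample_line
```

Comparing bytes rather than decoded strings means that any silent re-encoding or normalization would show up as a failure.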

The qualifier, evidence, aspect and DB object columns must contain only values from the lists
of allowed values (as per the spec).
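
An allowed-values check of this kind might look like the sketch below. It covers only the qualifier and aspect columns; the value sets are my reading of the GAF 2.1 spec and should be confirmed against it, and `check_allowed_values` is a hypothetical name, not a function from ontobio.

```python
# Assumed value sets, based on a reading of the GAF 2.1 spec.
ALLOWED_ASPECTS = {"C", "F", "P"}
ALLOWED_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}

QUALIFIER_COL = 3  # 0-based index of column 4 (Qualifier)
ASPECT_COL = 8     # 0-based index of column 9 (Aspect)


def check_allowed_values(fields):
    """Return a list of error messages for one split GAF line."""
    errors = []
    qualifier = fields[QUALIFIER_COL]
    if qualifier:  # the qualifier column is optional
        # Multiple qualifiers are pipe-separated, e.g. "NOT|contributes_to".
        for q in qualifier.split("|"):
            if q not in ALLOWED_QUALIFIERS:
                errors.append(f"invalid qualifier: {q!r}")
    aspect = fields[ASPECT_COL]
    if aspect not in ALLOWED_ASPECTS:
        errors.append(f"invalid aspect: {aspect!r}")
    return errors


good = [
    "FB", "FBgn0033449", "123_456", "", "GO:1902361",
    "FB:FBrf0202953|GO_REF:0000024", "ISS", "UniProtKB:Q05516", "F",
    "", "", "protein", "taxon:7227", "20171127", "FlyBase", "", "",
]
bad = list(good)
bad[ASPECT_COL] = "X"

good_errors = check_allowed_values(good)
bad_errors = check_allowed_values(bad)
```

Returning a list of messages rather than raising on the first problem lets a caller count errors per file and report the details separately.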

Error report: the number of errors is in [db_species]-summary.txt and the details are in owltools-check.txt.