Health Sector Git Repo Topic Ontology #35
Comments
Feedback from Jonny: Consider removing the Meta-tag - we will already filter by organisation anyways, and the maturity tag basically fulfils the same role.
Suggestion to have an opt-out (black list) topic rather than a white-list meta tag
maybe "not-optimised-for-reuse"
Technique and Domain suggested as the key areas that need topics mandating.
I'd find useful the release status, WIP, done and ready, active (continuously being improved), and inactive (WIP but with no plans to keep working on it).
Some of these might be an overkill, but just for consideration...
Some interesting suggestions. A question I have... would they help you find useful code? For example, you can see if a repo is still "active" by whether it has been updated recently, so perhaps a "topic" for this doesn't add so much (though it would make it slightly faster to see). They would indicate the code was potentially higher quality... but I suppose so would the "gold, silver etc." RAP topics...
I think specific types of algorithms might be good, but I wonder if by making the topics too granular we reduce their usefulness? It's difficult to know where to set the threshold though - so if you think it would benefit people to add those in, we can try it. I think the database used, e.g. Databricks, Chroma, Postgres, SQL Server, probably is important, under technology, as it really will affect how the functions are written and so whether they're useful to you. Development tools, do you mean Jupyter, VS Code, etc.? I think if they are very... proprietary, such as Databricks, and ipynb notebooks more generally, it might be good to add a "notebook" topic. But VS Code / PyCharm probably shouldn't change how the code is... I'd assume!
I'm going to move this to the RAP website and then we can continue to develop it there.
Health Sector Git Repo Topic Ontology
!!! tip "TLDR"
- Apply topics to each of your published repos following the ontology described below
- Focus initially on topics related to technique and domain - these are what people are usually most interested in
- Then, you can add even more value by adding other topics
- There is a website which scans GitHub for NHS repositories and displays them by topic, making it easier to find useful code
??? question "Why should we care?"
- Applying topics to your repos will make it much easier for you and others to find and reuse useful bits of code
- Using a common ontology will make the topics more useful - we will all be speaking the same language
??? success "Pre-requisites"
* Some information on what someone might need to be familiar with before they can use this page
A key aim of RAP is not only to automate our pipelines but also to re-use useful code in other work. This relies on us publishing the code as publicly as possible, and then making it easy to find these useful bits of code. Topics in GitHub can help with this; however, we will get the most benefit from topics by using a common topic vocabulary to describe our GitHub code repos.
The topic ontology described in this guide will ensure our code can be searched by:
!!! warning
## The Differences between "topics" and "tags"
In GitHub, tags and topics are different:
- Topics are labels applied to whole repos which describe them, like keywords. Each repo can have up to twenty, and GitHub is good at searching and sorting results by topic.
- Tags are labels applied to specific commits within a git repo, and they are how releases are marked, e.g. v0.1.0 might be a tag applied to a specific commit, locking in that this commit is Version 0.1.0.
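
As a rough illustration of the limits on topics, the sketch below checks a proposed list of topics before it is applied to a repo. The twenty-topic limit comes from the text above; the 50-character limit and the "lowercase letters, digits and hyphens" rule are assumptions based on GitHub's documentation at the time of writing and may change. The function name `find_topic_problems` is hypothetical.

```python
import re

# Assumptions: limits as documented by GitHub at the time of writing.
MAX_TOPICS_PER_REPO = 20
MAX_TOPIC_LENGTH = 50

# A topic starts with a lowercase letter or digit, followed by
# lowercase letters, digits or hyphens.
_TOPIC_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]*")


def find_topic_problems(topics):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if len(topics) > MAX_TOPICS_PER_REPO:
        problems.append(
            f"{len(topics)} topics given, but the limit is {MAX_TOPICS_PER_REPO}"
        )
    for topic in topics:
        if len(topic) > MAX_TOPIC_LENGTH:
            problems.append(f"'{topic}' is longer than {MAX_TOPIC_LENGTH} characters")
        elif not _TOPIC_PATTERN.fullmatch(topic):
            problems.append(
                f"'{topic}' should use only lowercase letters, digits and hyphens"
            )
    return problems
```

For example, `find_topic_problems(["record-linkage", "pyspark", "gold-rap"])` returns an empty list, whereas a topic such as `"Record Linkage"` would be reported because of the capitals and the space.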
## Topics
Our aim with topics is to allow people to find code which might be useful to them, so they can reuse it. With this in mind, they usually want to know what kind of data the code was used on, which language it is written in, whether it uses compatible data structures (e.g. pandas or pyspark) and how recently it was made or updated (people are less trusting of ancient, dead code).
When applying topics to your code:
primary-care
hospital-episode-statistics
gpdpr
civil-registration-of-deaths
gdppr
artificial (perhaps if it was using artificial data)
forecasting
classification
regression
statistical-disclosure-control
deduplication
entity-resolution
record-linkage
summarisation
data-cleansing
data-validation
hyperparameter-tuning
artificial-data-generation
etc.
sparklyr
pandas
pyspark
polars
sqlalchemy
sqlalchemy-orm
numpy
sklearn
tensorflow
pytorch
scipy
etc.
r
sql
silver-rap
gold-rap
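
Topics are usually added through the repo's "About" settings on GitHub, but they can also be set programmatically. The sketch below only builds (and does not send) a request for GitHub's REST endpoint that replaces a repo's topics. The owner, repo and token values are placeholders, and the helper name `build_set_topics_request` is hypothetical. Note that this endpoint replaces the whole topic list, so include any existing topics you want to keep.

```python
import json
import urllib.request


def build_set_topics_request(owner, repo, topics, token):
    """Build a PUT request that would replace all topics on owner/repo."""
    url = f"https://api.github.com/repos/{owner}/{repo}/topics"
    # De-duplicate and sort so the request is predictable.
    body = json.dumps({"names": sorted(set(topics))}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, for illustration only.
request = build_set_topics_request(
    "example-org",
    "example-repo",
    ["hospital-episode-statistics", "record-linkage", "pyspark", "silver-rap"],
    token="YOUR_TOKEN_HERE",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect exactly which topics would be applied before anything changes on GitHub.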
## Using topics to find useful repos (and code)
You can search for repos by topic within GitHub using the search bar (e.g., as seen here, with tips on GitHub search syntax here) or you can use this helpful website which gathers the repos and topics from the various NHS organisations on GitHub.
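
As a minimal sketch of the search-bar approach, the snippet below builds a GitHub search URL from a list of topics using the `topic:` search qualifier, optionally narrowed to one organisation with `org:`. The function name `build_topic_search_url` and the organisation name in the usage note are placeholders, not part of the ontology.

```python
from urllib.parse import urlencode


def build_topic_search_url(topics, org=None):
    """Build a GitHub repository search URL matching all of the given topics."""
    qualifiers = [f"topic:{topic}" for topic in topics]
    if org:
        # Optionally restrict results to a single organisation.
        qualifiers.append(f"org:{org}")
    query = " ".join(qualifiers)
    return "https://github.com/search?" + urlencode(
        {"q": query, "type": "repositories"}
    )
```

Calling `build_topic_search_url(["record-linkage", "pyspark"], org="example-org")` gives a link that can be pasted into a browser. Because the qualifiers are combined, only repos carrying every listed topic should match.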