
Github actions and tests #166

Closed
wants to merge 41 commits into 0.1-dev from github-actions

Conversation

Lathrisk (Contributor) commented Nov 2, 2022

Related issues
#test

Description
Adding tests and GitHub Actions workflows.

Lathrisk self-assigned this Nov 2, 2022
Lathrisk marked this pull request as ready for review Nov 2, 2022 12:07
Lathrisk marked this pull request as draft Nov 2, 2022 12:07
Lathrisk changed the title from "actions: Link check" to "Github actions and tests" Nov 2, 2022
Lathrisk (Contributor, Author) commented Nov 3, 2022

@duncandewhurst @lgs85 (cc @odscjames)
I think this covers the Generic tests described in the linked comment.

I thought it might make sense to apply fixes for the tests on this branch, either by correcting files or adjusting the parameters for the tests themselves, and merge once we have all tests passing. mdformat will correct in place, I assume safely, but other approaches might be needed for CSV and JSON.

I'm going to start looking at the schema-specific tests, but thought it might be useful to continue those in a separate PR, so let me know how you'd like to split that work out.

duncandewhurst (Collaborator) commented Nov 4, 2022

Thanks, @Lathrisk!

I think that mdformat --check ./docs/**/*.md isn't checking the full depth of the docs directory so the files in /docs/reference/publication_formats/ are not checked.

> I thought it might make sense to apply fixes for the tests on this branch, either by correcting files or adjusting the parameters for the tests themselves, and merge once we have all tests passing.

Yes, that makes sense. I've run mdformat and committed the changes. (Edit: pytest reports some issues that still need to be resolved).

Can you configure linkcheck to report only broken links? Currently, it's quite hard to spot the broken links amongst all the successful links.

> I'm going to start looking at the schema-specific tests, but thought it might be useful to continue those in a separate PR, so let me know how you'd like to split that work out.

I don't have a preference; whatever makes sense to you. I don't know whether you need to hang the schema-specific tests off the GitHub Actions workflow that you have added in this PR.

Lathrisk (Contributor, Author) commented:

@duncandewhurst cc @odscjames

> Edit: pytest reports some issues that still need to be resolved

  • Some of the codelist files use capitalised headers and others don't. The tests expect the first header to be code, not Code, for example.
  • Many of the CSV files have Windows line endings (\r\n) and the tests expect Unix line endings (\n).
  • I think a lot of the JSON files have the incorrect indentation to pass the tests.

I'm not sure how opinionated we are on each of these issues. We could update the code, adapt the tests, or remove the tests in each instance. The current tests are opinionated, based on the JSCC library.
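
For illustration, the generic checks have roughly this shape (a minimal sketch assuming codelist CSVs live under a codelists/ directory, a hypothetical path; this is not the actual JSCC code):

import csv
from pathlib import Path

import pytest

# The codelists/ path is an assumption about the repo layout.
CSV_FILES = sorted(Path("codelists").glob("*.csv"))

@pytest.mark.parametrize("path", CSV_FILES, ids=str)
def test_unix_line_endings(path):
    # Read raw bytes so universal-newline translation doesn't hide \r\n.
    assert b"\r" not in path.read_bytes(), f"{path} has Windows line endings"

@pytest.mark.parametrize("path", CSV_FILES, ids=str)
def test_first_header_is_code(path):
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    assert header[0] == "code", f"{path} first header is {header[0]!r}"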

> Can you configure linkcheck to report only broken links? Currently, it's quite hard to spot the broken links amongst all the successful links.

I'll have a look, I assume that will be relatively trivial.
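
One simple approach (a sketch, assuming the linkcheck output marks failing lines with the word "broken", as Sphinx's linkcheck builder does; the marker would need adjusting for other tools) would be to pipe the log through a filter:

import sys

# Keep only the lines that report a failure; everything else is noise.
for line in sys.stdin:
    if "broken" in line:
        sys.stdout.write(line)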

> I think that mdformat --check ./docs/**/*.md isn't checking the full depth of the docs directory so the files in /docs/reference/publication_formats/ are not checked.

Double asterisk should match to any depth. I'll look into that and see if there's a reason it isn't working.

duncandewhurst (Collaborator) commented Nov 29, 2022

@Lathrisk I reverted your last commit (458b6ff) because it erroneously removed whitespace within codelist descriptions.

Let's go with the JSCC defaults, which I understand to be:

  • Capitalised headers for codelist files
  • Unix line endings for CSV files
  • 2-space indentation for JSON files
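
A minimal sketch of applying those defaults mechanically, assuming codelists/*.csv and schema/*.json as hypothetical locations:

import csv
import json
from pathlib import Path

# Rewrite codelist CSVs with a capitalised first letter in each header
# and Unix line endings.
for path in Path("codelists").glob("*.csv"):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    rows[0] = [h[:1].upper() + h[1:] for h in rows[0]]  # code -> Code, etc.
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)

# Re-serialise JSON files with 2-space indentation.
for path in Path("schema").glob("*.json"):
    data = json.loads(path.read_text())
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n")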

I've merged in 0.1-dev and updated everything so that the tests pass. I also had to make some changes to the tests.

A couple of questions:

  • Should we be using a codelist schema like OCDS?
  • Why are there separate requirements files for the tests?

Edit: I forgot to say that the change to the pre-commit command in manage.py depends on OpenDataServices/flatten-tool#410 being merged.

Lathrisk (Contributor, Author) commented:

@duncandewhurst
The separate requirements files replicate the division of dependencies in the BODS standard. AFAIK it's just a separation of the dependencies needed to build vs. test. I don't have strong opinions on that, though.

I reckon a codelist schema would remove any ambiguity in things like header formatting etc. It seems like a good addition. I've not written any JSON schema but can take a look. I suspect we'll need a modified version of the one you linked to. It's the regex that might catch me out on the code property (and mapping that to what's in the CSVs currently)!

Simple codelist schema missing regex for the Code value. WIP.
duncandewhurst (Collaborator) commented:

Thanks!

Unless there's a particular reason to separate the dependencies for building and testing, I'd prefer a single requirements file to keep the process for installing and updating requirements simpler.

I think that we can start with the pattern below for Code, check which codes fail and update it accordingly. It's based on the pattern from the OCDS codelist schema with the following changes:

^([a-z]+([A-Z][a-z]+)*|[A-Z]+[0-9]?|[a-z]+(.[a-z]+)*|LineString|Point|(application|audio|font|example|image|message|model|multipart|text|video)/[\\w.+-]+|[A-Z]{2}([-_][A-Z]+)+|G.65[1-9](\.1)?|tag:opentelecomdata\.net,2022:(nodes|spans)(API|File))$
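
To sanity-check the pattern against real codes, a quick sketch (the doubled backslash in [\\w.+-] above looks like JSON-string escaping, so the raw regex below uses [\w.+-]; the sample codes are only illustrative):

import re

CODE_PATTERN = re.compile(
    r"^([a-z]+([A-Z][a-z]+)*|[A-Z]+[0-9]?|[a-z]+(.[a-z]+)*|LineString|Point"
    r"|(application|audio|font|example|image|message|model|multipart|text|video)/[\w.+-]+"
    r"|[A-Z]{2}([-_][A-Z]+)+|G.65[1-9](\.1)?"
    r"|tag:opentelecomdata\.net,2022:(nodes|spans)(API|File))$"
)

for code in ["camelCase", "GB-ENG", "text/csv", "G.652", "Point"]:
    print(code, bool(CODE_PATTERN.fullmatch(code)))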

odscjames (Collaborator) commented:

> Double asterisk should match to any depth. I'll look into that and see if there's a reason it isn't working.

From what I've read, this only works if a special option (bash's globstar) is turned on in the shell. It looks like "mdformat docs" works recursively, so I'll use that. (Will put it in pre-commit too.)
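
To see which files the shallow expansion misses, a quick sketch with Python's pathlib (not part of the workflow itself):

from pathlib import Path

# Without globstar, ./docs/**/*.md expands like ./docs/*/*.md (one level deep).
shallow = set(Path("docs").glob("*/*.md"))
# rglob covers every depth, which is what "mdformat docs" effectively checks.
recursive = set(Path("docs").rglob("*.md"))
for missed in sorted(recursive - shallow):
    print(missed)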

odscjames (Collaborator) commented:

This PR is being split into separate ones, and applied one at a time. There were too many commits and too many things being done at once here to comment on and finish up effectively. If we had tried to finish this one, it would have been ages before any improvements made it to a working branch; by splitting it up, we have already merged some improvements to a working branch.

We'll leave this PR open, and only close it when we are sure all work from here has been moved to other PRs.

This is going to be careful manual work; there aren't any shortcuts we can take here. We'll just have to check things carefully as we make new PRs, and check this one carefully before closing it.

E.g. in #208 we missed that something we asked a question about was already fixed in commit 1ea4598 - however, the commit message for that commit just said "docs: Run mdformat" and made no reference to the fact that some content had been edited by hand. There was also no changelog entry. A nice example of why we want to ensure a good commit history - it really helps when going back through things later!

odscjames (Collaborator) commented Dec 14, 2022

I think all relevant bits from this have been put into 0.1-dev and this can be closed?

This was tested by manually copying all the files from this PR over the latest files, then looking at the git diffs to make sure that nothing unexpected turned up.

duncandewhurst (Collaborator) commented Dec 15, 2022

> I think all relevant bits from this have been put into 0.1-dev and this can be closed?

I'm struggling to validate this: why does the diff between this branch and 0.1-dev still show lots of changes? E.g.:

[screenshot of the diff between the branches]

odscjames (Collaborator) commented:

How are you getting that diff? I just tried the /compare function on GitHub (cos it looks like a GitHub screenshot) and the results were not useful at all - it looked like it was getting confused by the branch separation.

For the example you highlighted, if you look at the two files directly, you can see they are actually the same.

duncandewhurst (Collaborator) commented:

I used the following link: 0.1-dev...github-actions

I don't want to check individual files for differences manually, as it'll be time-consuming and I'm likely to miss things, so I was hoping that I could get a usable diff to compare.

odscjames (Collaborator) commented:

Yeah, GitHub is not helping here.

Try:

git diff ded3cb39d35ced8d41388adf32fe8be48831f061 062de09446dc5cb96968c462f85f8c5d7e609da8

It starts:

diff --git a/.github/workflows/linkcheck.yml b/.github/workflows/linkcheck.yml
index 7c03d61..284fd8e 100644
--- a/.github/workflows/linkcheck.yml
+++ b/.github/workflows/linkcheck.yml
@@ -4,7 +4,6 @@ on: [push, pull_request]
 jobs:
   linkcheck:
     runs-on: ubuntu-latest
-    continue-on-error: true
     steps:
     - uses: actions/checkout@v2
     - name: Setup python

In this case the minus shows things that are in the PR but not in the live branch - so continue-on-error is in the PR but not in the main branch. (For that one: I couldn't see any message explaining why it was needed, and I couldn't see any good reason for it to be there, so I didn't copy it across.)

duncandewhurst (Collaborator) commented:

Thanks for the advice! I reviewed the diff and it all looks as expected so I'll close this PR.

duncandewhurst deleted the github-actions branch January 30, 2023 20:26