This repository has been archived by the owner on May 8, 2024. It is now read-only.

v0.14.0 #481

Merged
merged 85 commits into from
Feb 23, 2024
85 commits
e525a3a
docs: generate a new accuracy plot
github-actions[bot] Jan 16, 2024
0fbf3a4
docs: update README
github-actions[bot] Jan 16, 2024
f0c614b
chore: build module
github-actions[bot] Jan 16, 2024
4855bb3
add swerik id
SimonHallen Jan 17, 2024
a0cee5e
2024-01-18_reviewing_chairs
SimonHallen Jan 18, 2024
f345139
2024-01-19_reviewing_chairs
SimonHallen Jan 19, 2024
e66b7b8
feat: add unittest based on manually curated data
BobBorges Jan 22, 2024
6fd827e
2024-01-22_reviewing_chairs
SimonHallen Jan 22, 2024
d544b1c
fix: actually run the test
BobBorges Jan 22, 2024
b4c245f
fix: renamed pyriksdagen function
BobBorges Jan 22, 2024
007c628
2022_01_23_1/2_reviewing_chairs
SimonHallen Jan 23, 2024
e48dcec
2022-01-23_reviewing_chairs_2/2
SimonHallen Jan 23, 2024
9f19b1d
2023_01_24_reviewing_chairs
SimonHallen Jan 24, 2024
f8c1178
feat: pass args to unittests when running locally
BobBorges Jan 25, 2024
11659df
fix: handle dates better (sorry for this hot mess)
BobBorges Jan 25, 2024
3e761dd
2023-01-25_reviewing_chairs
SimonHallen Jan 25, 2024
cd0556a
2024-01-26_reviewing_chairs
SimonHallen Jan 26, 2024
b4ddbae
2024_01_29_reviewing_chairs
SimonHallen Jan 29, 2024
bfc6521
feat: impute MP dates based on riksdagår start-end
BobBorges Jan 31, 2024
667c689
fix: 1923 did not start in 1905
BobBorges Jan 31, 2024
cc98dfd
refactor: work with new date handling implementation
BobBorges Jan 31, 2024
f1b1450
fix: most recent version line on top of z-index
BobBorges Jan 31, 2024
13357e3
chore: update after new date handling
BobBorges Jan 31, 2024
a5b60b2
refactor: rewrite pipeline from scratch
ninpnin Feb 2, 2024
c592ce2
refactor: use external library for ALTO XML
ninpnin Feb 2, 2024
5bcec68
refactor: add XML_NS as a shared variable for the package
ninpnin Feb 2, 2024
f01f516
fix: correct amount of padding when printing XML
ninpnin Feb 2, 2024
6bb0190
refactor: rewrite pipeline from scratch
ninpnin Feb 2, 2024
68a3853
fix: digital originals pipeline
ninpnin Feb 2, 2024
e57b77c
fix: missing dependency
ninpnin Feb 2, 2024
1fcf4e3
refactor: remove unnecessary code
ninpnin Feb 2, 2024
8d6277b
fix: make digital originals first ID unique
ninpnin Feb 2, 2024
b466c4f
fix: archive login bug
ninpnin Feb 2, 2024
f1bb9c5
feat: check what protocols exist and run pipeline with the same script
ninpnin Feb 2, 2024
75e5f5e
fix: remove debugging 'break' statement
ninpnin Feb 2, 2024
c2ab1e2
fix: remove unnecessary imports
ninpnin Feb 2, 2024
5355f73
refactor: split ALTO processing into three functions for generalizabi…
ninpnin Feb 5, 2024
560160f
fix: add docstring and wrong var bug
BobBorges Feb 5, 2024
3a0f185
feat:pipe local alto files
BobBorges Feb 5, 2024
d5995aa
pull from other working branch
BobBorges Feb 6, 2024
54fdaf2
fix: date inclusivity
BobBorges Feb 6, 2024
e75b468
chore: reformat csv (comma, no extra cols)
BobBorges Feb 6, 2024
4525787
fix: paragraph ID seed
ninpnin Feb 6, 2024
ae88210
chore: merge branch 'pipeline-refactor' of github.com:welfare-state-a…
ninpnin Feb 6, 2024
b63bade
fix: some errors
BobBorges Feb 6, 2024
92ec370
fix: better date handling
BobBorges Feb 6, 2024
4b5431f
feat: mapping between swerik_ids and biobook references
BobBorges Feb 8, 2024
34a116d
fix: rm edition statement
BobBorges Feb 9, 2024
6ac68db
Merge branch 'pipeline-refactor' of github.com:welfare-state-analytic…
BobBorges Feb 9, 2024
44297b0
fix: sort local input filenames
BobBorges Feb 9, 2024
2e9a6ab
fix: remove default version number
ninpnin Feb 9, 2024
0038534
fix: errors failing the tests
BobBorges Feb 9, 2024
2ef1eed
fix: errors failing the tests
BobBorges Feb 9, 2024
944011b
chore: merge pull request #468 from welfare-state-analytics/pipeline-…
ninpnin Feb 9, 2024
126a16f
fix: change stupid filename
BobBorges Feb 9, 2024
d9109b4
chore: pull from wd
BobBorges Feb 9, 2024
205c53c
docs: describe references_map file
BobBorges Feb 9, 2024
8ada544
Merge pull request #474 from welfare-state-analytics/biobook-map
BobBorges Feb 9, 2024
4bc2158
Merge pull request #452 from welfare-state-analytics/date-handling
BobBorges Feb 9, 2024
b88dbbe
reviewing_chairs_2024-02-12
SimonHallen Feb 13, 2024
d37555f
reviewing_chairs_2024-02-14
SimonHallen Feb 14, 2024
46f4b6a
reviewing_chairs_2024-02-19
SimonHallen Feb 19, 2024
7c64156
feat: post-1994 iorter, and refactor: start rearranging corpus as pla…
BobBorges Feb 20, 2024
ce2594a
refactor: relocate db unittest data
BobBorges Feb 20, 2024
6aba700
refactor: db test infiles in new location, write local output accordi…
BobBorges Feb 20, 2024
41ca7e1
chore: query metadata after adding moder orter
BobBorges Feb 20, 2024
b149a46
chore: add readme
BobBorges Feb 20, 2024
bc10d80
fix: path stuff
BobBorges Feb 20, 2024
8ea6f33
reviewing_chairs_2024-02-21
SimonHallen Feb 21, 2024
cd3a97b
fix: sort chair_mp.csv
ninpnin Feb 22, 2024
3ecb915
fix: sort chair_mp.csv properly
ninpnin Feb 22, 2024
1cebd40
fix: sort chair_mp.csv properly
ninpnin Feb 22, 2024
ba397b6
fix: remove duplicates
ninpnin Feb 22, 2024
8374790
chore: merge
ninpnin Feb 22, 2024
ff3fa04
feat: new test
BobBorges Feb 22, 2024
ed5c349
feat: get outpath for all test results
BobBorges Feb 22, 2024
fa4e149
feat: run empty speeches test on push
BobBorges Feb 22, 2024
fac4e4b
style: cleaning up formatting
BobBorges Feb 22, 2024
dd89986
fix: filename
BobBorges Feb 22, 2024
bde9c99
chore: merge pull request #480 from welfare-state-analytics/empty-spe…
ninpnin Feb 23, 2024
636f0a4
Merge branch 'dev' into iorter
BobBorges Feb 23, 2024
7280c5b
refactor: input file paths as argparse args
BobBorges Feb 23, 2024
5ed5cff
Merge branch 'iorter' of github.com:welfare-state-analytics/riksdagen…
BobBorges Feb 23, 2024
eca20de
chore: merge pull request #463 from welfare-state-analytics/seats
ninpnin Feb 23, 2024
5494fd6
Merge pull request #477 from welfare-state-analytics/iorter
BobBorges Feb 23, 2024
40 changes: 40 additions & 0 deletions .github/workflows/push.yml
@@ -68,6 +68,26 @@ jobs:
       run: |
         python -m unittest test.chairs

+  empty-speech:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.8]
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install .
+        pip install PyPDF2
+    - name: Test there are no empty u or seg elements
+      run: |
+        python -m unittest test.empty-speech
+
   mp:
     runs-on: ubuntu-latest
     strategy:
@@ -189,3 +209,23 @@ jobs:
     - name: Check that README updating script works
       run: |
         PYTHONPATH="$PYTHONPATH:." python scripts/stats-dashboard/generate-markdown.py -v v1.1.1
+
+  mandates:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.8]
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install .
+        pip install PyPDF2
+    - name: Test manually curated mandate dates do not change
+      run: |
+        python -m unittest test.mandates
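
The `empty-speech` job added to `push.yml` runs `test.empty-speech`, whose source is not part of this diff. As a rough illustration of what a check for empty `u` or `seg` elements could look like, here is a minimal sketch using only the Python standard library. The function name `find_empty_elements` and the sample document are hypothetical and are not taken from the repository; only the TEI namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard TEI and XML namespaces in ElementTree's {uri}tag notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_empty_elements(xml_text, tags=("u", "seg")):
    """Return (tag, xml:id) for every listed element with no text content.

    Hypothetical helper for illustration; not the repository's test code.
    """
    root = ET.fromstring(xml_text)
    empty = []
    for tag in tags:
        for elem in root.iter(TEI_NS + tag):
            # itertext() covers the element and all descendants, so a <u>
            # that only holds whitespace-only <seg> children counts as empty.
            if not "".join(elem.itertext()).strip():
                empty.append((tag, elem.get(XML_ID)))
    return empty


# Made-up sample document: the second utterance has no real text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u xml:id="u-1"><seg xml:id="s-1">Herr talman!</seg></u>
    <u xml:id="u-2"><seg xml:id="s-2">   </seg></u>
  </div></body></text>
</TEI>"""

print(find_empty_elements(SAMPLE))
```

In a `unittest` test case, the same helper could be applied to each protocol file, with the test asserting that the returned list is empty.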
16 changes: 8 additions & 8 deletions README.md
@@ -2,7 +2,7 @@
 [![Validate Parla-Clarin XML](https://github.com/welfare-state-analytics/riksdagen-corpus/actions/workflows/validate.yml/badge.svg)](https://github.com/welfare-state-analytics/riksdagen-corpus/actions/workflows/validate.yml)


-# Swedish parliamentary proceedings --- 1867--today --- v0.13.0
+# Swedish parliamentary proceedings --- 1867--today --- v0.13.1

 _Westac Project_, 2020--2024 |
 _Swerik Project_, 2023--2025
@@ -39,18 +39,18 @@ The data in the corpus is delivered as TEI XML files to follow established pract

 Currently, we have an extensive set of Parliamentary Records (Riksdagens Protokoll) from 1867 until now. We are in the process of preparing Motions for inclusion in the corpus and other document types will follow.

-| | v0.13.0 | v0.12.0 | v0.10.0 |
+| | v0.13.1 | v0.13.0 | v0.12.0 |
 |--------------------------------------|------------|------------|------------|
-| Corpus size (GB) | 5.48 | 5.59 | 5.58 |
+| Corpus size (GB) | 5.48 | 5.48 | 5.59 |
 | Number of parliamentary records | 17642 | 17642 | 17642 |
-| Total parliamentary record pages* | 1045458 | 1045458 | 1041807 |
-| Total parliamentary record speeches | 1014214 | 1014214 | 1127027 |
-| Total parliamentary record words | 442634322 | 442634322 | 441525242 |
+| Total parliamentary record pages* | 1045458 | 1045458 | 1045458 |
+| Total parliamentary record speeches | 1014214 | 1014214 | 1014214 |
+| Total parliamentary record words | 442634322 | 442634322 | 442634322 |
 | Number of Motions | 0 | 0 | 0 |
 | Total motion pages | 0 | 0 | 0 |
 | Total motion words | 0 | 0 | 0 |
 | Number of people with MP role | 5975 | 5975 | 5975 |
-| Number of people with minister role | 546 | 535 | 535 |
+| Number of people with minister role | 546 | 546 | 535 |

 \* Digital original parliamentary records for some years in the 1990s are not paginated and thus do not contribute to the page count. See also §_Number of Pages in Parliamentary Records_.

@@ -110,4 +110,4 @@ If you find any errors, it is possible to submit corrections to them. This is do
 <img src="scripts/stats-dashboard/figures/logos/vr.png" width="250"/>

 ---
-Last update: 2024-01-15, 08:56:24
+Last update: 2024-01-16, 12:52:56
8 changes: 8 additions & 0 deletions corpus/metadata/README.md
@@ -72,6 +72,14 @@ Individual level data such as born, gender, etc. for all persons in the metadata
 - *gender*: gender
 - *riksdagen_id*: id for riksdagen open data individual

+### references_map.csv
+
+This links SWERIK person_ids to references in the biography books.
+- `swerik_id`: person ID
+- `bibtex_key`: bibtex key from `../references/`
+- `wiki_id`: wiki_id of the book referenced (volume can be identified by the bibtex key)
+- `page`: page where the person identified is referenced
+
 ### speaker.csv

 Same as member_of_parliament.csv but for speakers.
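
The `references_map.csv` columns documented in this hunk (`swerik_id`, `bibtex_key`, `wiki_id`, `page`) suggest a simple lookup from a person to their biography-book references. The sketch below shows one possible way to build that lookup with the standard library. It assumes the file is comma-separated with a header row using exactly those column names; the function name `load_references` and all sample values are made up for illustration and are not real corpus data.

```python
import csv
import io
from collections import defaultdict


def load_references(csv_file):
    """Group biography-book references by swerik_id.

    Hypothetical helper; assumes a header row with the four
    documented column names.
    """
    refs = defaultdict(list)
    for row in csv.DictReader(csv_file):
        refs[row["swerik_id"]].append(
            (row["bibtex_key"], row["wiki_id"], row["page"])
        )
    return dict(refs)


# Made-up rows for illustration only; IDs and keys are placeholders.
SAMPLE = io.StringIO(
    "swerik_id,bibtex_key,wiki_id,page\n"
    "i-EXAMPLE1,example_book_vol1,Q0000001,12\n"
    "i-EXAMPLE1,example_book_vol2,Q0000002,340\n"
    "i-EXAMPLE2,example_book_vol1,Q0000001,57\n"
)

refs = load_references(SAMPLE)
print(refs["i-EXAMPLE1"])
```

With a real file, `load_references(open(path, newline="", encoding="utf-8"))` would take the place of the in-memory sample. Page values stay as strings here, since the diff does not say whether they are always numeric.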