DMED-119 - BC-8072 - Update "master_schulcloud" from "master" #50

Merged on Oct 7, 2024 (673 commits)
Commits
128cd92
feat: LanguageMapper helper utility
Criamos Oct 25, 2023
d699b1f
feat: NormLanguagePipeline (normalize strings to 2-letter-language-co…
Criamos Oct 26, 2023
cb2d694
docs: explain SERLO_INSTANCE default setting
Criamos Oct 27, 2023
d9a998f
Merge pull request #94 from openeduhub/feat_languagemapper
Criamos Oct 27, 2023
6a37ea8
build/chore: upgrade dependencies (Scrapy 2.11 ...)
Criamos Nov 27, 2023
34c24cf
build: upgrade Dockerfile to Python v3.11.6
Criamos Nov 27, 2023
2c38142
build: upgrade valuespace_converter.Dockerfile to Python v3.11.6
Criamos Nov 27, 2023
9528934
build: introduce 'httpx' dependency for async requests
Criamos Nov 28, 2023
6e869d8
change: replace sync 'requests'-methods with async 'httpx'-methods
Criamos Nov 28, 2023
fb0512f
change: enable 'asyncio'-support in Scrapy settings.py
Criamos Nov 28, 2023
31e5459
change: replace synchronous 'requests'-calls with async 'httpx'-requests
Criamos Nov 28, 2023
ba51735
change: make web_tools.py methods async
Criamos Nov 28, 2023
2b1c12d
change: async "parse"-method / await web_tools
Criamos Nov 28, 2023
37a5044
change: enable Scrapy "PeriodicLog"-extension
Criamos Nov 28, 2023
ce6a5d8
change: async parse method
Criamos Nov 29, 2023
60f9f70
change: use a shared requests Session for GraphQL queries
Criamos Nov 30, 2023
0c2af18
fix: Scrapy DeprecationWarning ("REQUEST_FINGERPRINTER_IMPLEMENTATION")
Criamos Nov 30, 2023
9e10759
change: shared 'httpx.AsyncClient()' for es_connector.py
Criamos Nov 30, 2023
f1cfb39
change: shared 'httpx.AsyncClient()' for Thumbnail-Pipeline
Criamos Nov 30, 2023
fdbbd7f
build: introduce 'async-lru'-package to dependencies
Criamos Nov 30, 2023
83e525e
feat: LRU-Cache for Thumbnail-URLs, drop 'httpx' in Thumbnail-Pipelin…
Criamos Nov 30, 2023
0aafa96
feat: control async WebTools with Semaphores
Criamos Dec 6, 2023
be2e6d7
change: disable timeouts, use Semaphore
Criamos Dec 6, 2023
04b1739
fix: processThumbnailPipeline returned coroutine instead of item
Criamos Dec 6, 2023
80fb90c
build: upgrade to browserless v2
Criamos Dec 6, 2023
f9ec308
change: WebTools use 'Playwright' by default
Criamos Dec 6, 2023
7dcf2a7
feat: fallback to Playwright screenshot on failed Splash or Thumbnail…
Criamos Dec 7, 2023
4bae38f
change: browserless/chrome 'timeout'-setting to 120s
Criamos Dec 7, 2023
8b7541f
change: LomBase parse / mapResponse / getUrlData methods to async
Criamos Dec 7, 2023
e5ede09
fobizz_spider v0.0.5
Criamos Dec 7, 2023
630c934
rpi_virtuell_spider v0.0.7
Criamos Dec 5, 2023
9715f1a
fix: mapping for COPYRIGHT_LAW / COPYRIGHT_FREE ('license.internal')
Criamos Dec 5, 2023
8bd4ae7
change: async-await "mapResponse"- and "parse"-methods
Criamos Dec 7, 2023
37f1379
change: drop Semaphore from serlo_spider
Criamos Dec 7, 2023
342159b
change: enable Autothrottle / Playwright for rpi_virtuell
Criamos Dec 8, 2023
79f48d7
feat: additional (MIME-Type) checks for thumbnail URLs
Criamos Dec 8, 2023
b1cfc45
change: increase priority of thumbnail downloads
Criamos Dec 8, 2023
3a3334e
fix: fallback to Playwright on failed thumbnail download
Criamos Dec 9, 2023
989fe8a
change: decrease WebTools 'trafilatura' logging verbosity
Criamos Dec 9, 2023
3e7d799
change: increase autothrottle concurrency settings
Criamos Dec 9, 2023
8baf28d
feat: Exception Handling for failed Thumbnail downloads
Criamos Dec 10, 2023
8052ac7
oersi_spider v0.1.6
Criamos Dec 13, 2023
e171470
change: allow MIME-Type 'application/octet-stream' in Thumbnail-Pipeline
Criamos Dec 13, 2023
3831798
build: version pin "browserless v2" docker image to 2023-12-13 build
Criamos Dec 14, 2023
a00f37b
oersi_spider v0.1.7
Criamos Dec 14, 2023
398d48b
feat: clean up lifecycle 'name' strings before trying to split them i…
Criamos Dec 15, 2023
117a69a
fix: thumbnails URLs fail to download when obeying robots.txt directive
Criamos Dec 15, 2023
8e8f150
change: disable crawling of "OpenRub" (because all URLs are 404)
Criamos Dec 15, 2023
2e45dcb
change: enable two metadata providers (BC Campus / Finnish Library of…
Criamos Dec 17, 2023
c0eb051
oersi_spider v0.1.8 ("offline"-import-mode)
Criamos Dec 20, 2023
49b9cd0
debug: use class-based logger instead of 'root'-logger
Criamos Dec 21, 2023
d40cab9
build: update Pillow to 10.1.0
Criamos Dec 21, 2023
32e1542
change: Thumbnail fallback, Thumbnail URL handling for PNGs
Criamos Dec 21, 2023
45dda08
feat: WebTools file extension check before parsing
Criamos Dec 21, 2023
c20c333
change: use Playwright in LomBase as default
Criamos Dec 22, 2023
d705cc7
docs: DocStrings, explanations regarding Semaphore usage
Criamos Jan 16, 2024
067ff2f
Merge pull request #96 from openeduhub/feat_oersi_offline_import
Criamos Jan 18, 2024
3770841
Merge pull request #97 from openeduhub/oersi_bird_connector_v2
Criamos Jan 19, 2024
2b885a1
feat: improve Error-Handling for broken image files ("PIL.Unidentifie…
Criamos Jan 23, 2024
566ea48
rpi_virtuell_spider v0.0.9
Criamos Jan 24, 2024
0ca33df
Merge pull request #98 from openeduhub/feat_handle_broken_pngs
Criamos Jan 25, 2024
457351c
feat: EduSharingSourceTemplateHelper utility (squashed)
Criamos Nov 17, 2023
792c868
feat: attach whitelisted metadata properties from an "edu-sharing sou…
Criamos Nov 17, 2023
1f1e1b1
change: wait for 'load'-event before fetching HTML / screenshots with…
Criamos Jan 24, 2024
3da1131
docs: add edu-sharing "source template"-related documentation
Criamos Jan 24, 2024
9a86ef4
improve readability of source template variable name
Criamos Jan 25, 2024
84a4154
Merge pull request #99 from openeduhub/feat_crawler_source_dataset
Criamos Jan 25, 2024
9b6d447
Merge pull request #95 from openeduhub/feat_async_es_connector
Criamos Jan 26, 2024
96f0985
kmap_spider v0.0.7 (rework with additional metadata)
Criamos Jan 25, 2024
7912b55
change: improve readability of logging messages during hash check
Criamos Jan 25, 2024
d60a648
chore: update dependencies
Criamos Jan 26, 2024
87ef589
tutory_spider v0.2.0
Criamos Jan 26, 2024
e9aa79a
Merge pull request #100 from openeduhub/develop
Criamos Feb 6, 2024
c6c620c
dilertube_spider v0.0.2
Criamos Feb 6, 2024
ac487e6
docs: add LicenseMapper ToDos
Criamos Feb 7, 2024
1819a1c
tutory_spider v0.2.1
Criamos Feb 8, 2024
8208252
chore: update dependencies
Criamos Feb 8, 2024
7e33bf0
Fix warnings in 'getLRMI()'-method
Criamos Feb 9, 2024
9da72da
Fix broken API navigation of digitallearninglab_spider
Criamos Feb 9, 2024
211a7d3
change: lower Autothrottle target concurrency
Criamos Feb 13, 2024
5bad991
add: bpb_spider pyCharm runConfiguration
Criamos Feb 13, 2024
e469be5
bpb_spider v0.2.1 (complete rework after bpb website relaunch)
Criamos Feb 13, 2024
bc7756c
change/perf: drop SitemapSpider usage in favor of scrapy.Spider
Criamos Feb 14, 2024
df42bf5
change: reduce concurrent requests
Criamos Feb 15, 2024
fd663ca
logging: use spider-specific logger
Criamos Feb 15, 2024
6a98177
change: try to mitigate "/big_pipe/no-js?..." 404s by ignoring cookies
Criamos Feb 15, 2024
d35819d
change: extend deny_list (undesired "Impressum"-like URL paths)
Criamos Feb 16, 2024
9e809c5
change: increase autothrottle target concurrency
Criamos Feb 16, 2024
e62cbfb
logging: improve counters of expected (unique) URLs
Criamos Feb 16, 2024
e83a9a8
change: extend URL filter
Criamos Feb 20, 2024
bca0565
change: add missing (legacy) licenses
Criamos Feb 20, 2024
6173908
change: use class-specific logger instead of 'root' logging
Criamos Mar 6, 2024
482755b
tests: add edge-case from DiLerTube to LicenseMapper test-suite
Criamos Mar 6, 2024
8222d5f
dilertube_spider v0.0.3
Criamos Mar 6, 2024
68a133b
docs: add missing DocStrings for "ResponseItem" properties
Criamos Mar 15, 2024
6ba7be0
docs: update DocStrings with regard to 'full text' metadata
Criamos Mar 15, 2024
3ae37e1
dilertube_spider v0.0.4
Criamos Mar 20, 2024
b6f1921
dilertube_spider v0.0.5
Criamos Mar 21, 2024
1882c09
bpb_spider v0.2.2
Criamos Apr 9, 2024
24d7738
docs/style: add ToDos for YouTube captions API (fulltext extraction f…
Criamos Mar 22, 2024
c6bd717
add YouTube channel suggestions from ITSJOINTLY-1323 to youtube.csv
Criamos Mar 22, 2024
78a3b4b
youtube_spider v0.2.3 ("YouTube Handle" URLs)
Criamos Mar 22, 2024
a33b88a
disable "robots.txt" parsing for youtube_spider
Criamos Apr 10, 2024
409ad8b
Merge pull request #101 from openeduhub/feat_youtube_handles
Criamos Apr 10, 2024
b57aa52
bpb_spider v0.2.3
Criamos Apr 10, 2024
be402ec
Merge pull request #102 from openeduhub/2024-01-crawler-updates
Criamos Apr 10, 2024
79fae39
Merge pull request #103 from openeduhub/develop
Criamos Apr 11, 2024
dd4502d
Fix SkoHub "altLabel" processing in pipelines.py
Criamos Apr 23, 2024
de74758
logging: log transformed item for easier debugging in "edu-sharing"-mode
Criamos Apr 23, 2024
2c97de1
oersi_spider v0.1.9
Criamos Apr 16, 2024
b0094b1
add 3 "CourseItem" properties to data model (work-in-progress)
Criamos Apr 18, 2024
90c6351
oersi_spider v0.2.0 (work-in-progress)
Criamos Apr 18, 2024
e31cdec
feat: add BIRD metadata properties "course_description_short" and "co…
Criamos Apr 23, 2024
1958a5d
feat: activate BIRD metadata properties "course_learningoutcome", "co…
Criamos Apr 23, 2024
c8c04ed
BREAKING: oersi_spider v0.2.1
Criamos Apr 25, 2024
741f90d
oersi_spider v0.2.2
Criamos Apr 25, 2024
abe42fe
fix: typecheck "affiliation"-objects before trying to parse them
Criamos Apr 26, 2024
9b867cc
add BIRD CourseItem properties ("course_availability_from" and "cours…
Criamos May 8, 2024
8051028
feat: connect BIRD CourseItem properties ("course_availability_from" …
Criamos May 8, 2024
8516590
oersi_spider v0.2.3
Criamos May 8, 2024
7014d3c
introduce "course_schedule"-property to CourseItem and es_connector
Criamos May 14, 2024
490423b
oersi_spider v0.2.4
Criamos May 14, 2024
f7b5162
oersi_spider v0.2.5 (new public data API)
Criamos May 16, 2024
fab2136
todo: typicalLearningTime ToDos
Criamos May 16, 2024
72253b0
fix: add missing license constants to OER pipeline
Criamos May 16, 2024
a69a209
change: rename "CourseItem.course_availability_to" to "..._until"
Criamos May 27, 2024
0a8beaf
oersi_spider v0.2.6 (additional "iMoox"-metadata)
Criamos May 27, 2024
8e08e6c
refactor: "iMoox" and "vhb" metadata enrichment
Criamos May 28, 2024
3bc450c
chore: update "scrapy"-related dependencies (security bugfixes)
Criamos Jun 28, 2024
0ab56f0
chore: update dependencies and drop "pyOpenSSL"-dependency
Criamos Jun 28, 2024
dbc3e9c
change: drop "allowed_domains" custom setting
Criamos Jun 28, 2024
5b2bdb3
change: drop "overrides"-package from dependencies and crawlers
Criamos Jun 28, 2024
58bd2ec
change: upgrade docker files to Python 3.12.4 (+ browserless v2.14)
Criamos Jun 28, 2024
7e5a44d
Merge pull request #104 from openeduhub/chore_update_py311_dependencies
Criamos Jul 2, 2024
27a317c
refactor: "course_availability_..."-properties
Criamos Jul 2, 2024
3f7d4b0
fix: 2 weak warnings (PEP8: E713 & E721)
Criamos Jul 2, 2024
5807f6b
fix: if "course_availability_..."-property can't be handled, delete i…
Criamos Jul 2, 2024
be30b20
feat: implement CourseItemPipeline properties (course_description, co…
Criamos Jul 2, 2024
84031f3
feat: implement remaining "CourseItem"-properties in CourseItem pipeline
Criamos Jul 3, 2024
98bfdc1
chore: update pyproject.toml classifiers
Criamos Jul 3, 2024
da56704
change: move responsibility of course_duration transformation (to ms)…
Criamos Jul 4, 2024
f7586f9
refactor "duration"-handling in pipelines
Criamos Jul 4, 2024
8763c18
style: code formatting via black
Criamos Jul 5, 2024
98c28cd
Merge pull request #105 from openeduhub/feat_bird_vhb_hook
Criamos Jul 5, 2024
de63dbd
fix/test: duration conversion
Criamos Jul 11, 2024
6f664ed
feat: introduce (physical) "address"-properties to LomLifecycleItem
Criamos Jul 12, 2024
ec72564
todo: add "address"-related ToDos
Criamos Jul 12, 2024
0c45fca
Merge branch 'master' of https://github.com/openeduhub/oeh-search-etl…
bergatco Jul 16, 2024
5be831e
Remove unused GitHub Actions workflow file
bergatco Jul 16, 2024
4238fd8
Merge branch 'openeduhub-master' into update-from-openeduhub
bergatco Jul 16, 2024
dc10ffc
Merge pull request #49 from hpi-schul-cloud/update-from-openeduhub
bergatco Jul 16, 2024
b4f24b7
Merge branch 'master' into update_master-schulcloud_from_master
bergatco Jul 16, 2024
432728c
Re-add `LomAnnotationItem` and missing dependencies
bergatco Jul 16, 2024
4f9780a
"loosen" used `urllib3` version
bergatco Jul 16, 2024
59fec86
refactor `OehImporter` to use `async` for sending items to pipeline
bergatco Jul 17, 2024
7ada697
fix async / await issues
bergatco Jul 18, 2024
6a54960
remove unnecessary `await`
bergatco Jul 18, 2024
38033ef
add missing field `relation` to `LomBaseItem`
bergatco Jul 18, 2024
0e7fdff
Add fix for SkoHub "altLabel" processing in pipelines.py
bergatco Jul 18, 2024
c8d4456
only use await on `process_item` coroutines
bergatco Jul 19, 2024
ffe96d1
log/tests: improve logging messages related to handling of "0"-durations
Criamos Jul 23, 2024
6afa838
feat: 'CourseItem.course_learning_outcome'-pipeline handling for list…
Criamos Jul 23, 2024
b34d157
oersi_spider v0.2.7 (twillo metadata enrichment) (squashed)
Criamos Jul 23, 2024
b42d8b5
fix: convert course_learningoutcome to string
Criamos Jul 24, 2024
6205744
fix: course_duration handling of strings
Criamos Jul 24, 2024
e821734
feat: connect "LomEducationalItem.description" with the edu-sharing b…
Criamos Jul 25, 2024
661c54e
fix: LomLifecycle contributor 'date' parsing when encountering 'datet…
Criamos Jul 26, 2024
dedce7b
bne_portal_spider v0.0.1 (squashed)
Criamos Jul 31, 2024
03186e5
DMED-119 - update import paths in brb_sportinhalte files
bergatco Aug 2, 2024
f1cc630
DMED-119 - fix issues in spiders to be able to use `scrapy crawl ...`
bergatco Aug 2, 2024
f9cfd9d
DMED-119 - apply nest_asyncio to improve asyncio compatibility as wel…
bergatco Aug 2, 2024
58f282c
bne_portal_spider v0.0.2 (squashed)
Criamos Aug 2, 2024
5927a2e
add pyCharm run configuration
Criamos Aug 6, 2024
b8ecb8b
change: allow higher image resolutions during image-to-thumbnail-conv…
Criamos Aug 6, 2024
88fa333
feat: enable spiders to have more control over playwright HTTP reques…
Criamos Aug 8, 2024
3104610
bne_portal_spider v0.0.3 (skip cookie banner during screenshot fallback)
Criamos Aug 8, 2024
505cee8
Merge pull request #106 from openeduhub/feat_bne_crawler
Criamos Aug 9, 2024
a3e4497
Merge pull request #107 from openeduhub/feat_bird_twillo
Criamos Aug 9, 2024
74f9d1a
fix: forgot to convert 'discipline'-set to list
Criamos Aug 13, 2024
7a0870a
remove hard-coded value for LOM Technical Format and increase autothr…
Criamos Aug 15, 2024
110e8f9
docs: remove hard-coded LOM Technical Format recommendation for web-s…
Criamos Aug 15, 2024
6eeda80
remove hard-coded value recommendation for LOM Technical Format from …
Criamos Aug 15, 2024
fbed559
serlo_spider v0.3.3
Criamos Aug 16, 2024
32534e2
remove "swagger"-generated API client
Criamos Aug 16, 2024
86c6c70
add "openapi-generator-cli"-generated API client for edu-sharing v9.x
Criamos Aug 16, 2024
68974e9
change: rework es_connector for edu-sharing v9 (work-in-progress)
Criamos Aug 16, 2024
5025419
change: filepath of edu-sharing v9.x openAPI client
Criamos Aug 27, 2024
559940b
add edu_sharing_client related dependencies and update lockfile
Criamos Aug 27, 2024
cf81695
DMED-119 - add `test_spider` and revert some of the changes to be mor…
bergatco Aug 28, 2024
d258b17
Fix SkoHub "altLabel" processing in pipelines.py
bergatco Aug 28, 2024
ab24bbf
fix: pydantic ValidationErrors for several properties
Criamos Aug 28, 2024
6765e5b
fix: pydantic ValidationError while setting permissions ("unexpected …
Criamos Aug 28, 2024
2fcb460
chore: update dependencies
Criamos Aug 28, 2024
d785419
DMED-119 - Remove unused import statement in `TestSpider`
bergatco Aug 29, 2024
377fc23
fix: edu-sharing API client init
Criamos Aug 29, 2024
591455c
fix: ValidationErrors (keywords / typicalAgeRange)
Criamos Aug 29, 2024
c991bf7
docs: URL to documentation of openapi-generator-cli commands
Criamos Aug 30, 2024
a4388ea
fix: flake8 E999 SyntaxError (f-string)
Criamos Aug 30, 2024
5b99a66
DMED-119 - revert `run.py` to its original state
bergatco Sep 3, 2024
b2c52a7
DMED-119 - remove `oeh_importer` and `test_spider`
bergatco Sep 3, 2024
1c2b200
DMED-119 - revert and minor changes in `Readme.md`
bergatco Sep 3, 2024
8a8f047
fix: missing dependencies (pydantic) in requirements.txt
Criamos Sep 3, 2024
d0238b2
change: GitHub workflow from Python 3.10 to 3.12
Criamos Sep 3, 2024
8b5485a
change/logging: edu-sharing API client init fallback when "services" …
Criamos Sep 3, 2024
1f74f23
fix: ValidationError for int values in "CourseItem.course_duration"
Criamos Sep 3, 2024
bb321bb
feat: enable EduSharingTypeValidationPipeline
Criamos Sep 3, 2024
f5475e2
change: remove outdated swagger config json
Criamos Sep 3, 2024
3176cb9
build: DEPRECATE requirements.txt in favor of poetry builds
Criamos Sep 3, 2024
9363807
change: use poetry to run shellscript
Criamos Sep 4, 2024
09c4d71
docs: update README
Criamos Sep 4, 2024
cb3b053
Merge pull request #109 from openeduhub/feat_edu_sharing_api_client_v9
Criamos Sep 4, 2024
a8013ff
Merge pull request #111 from openeduhub/develop
Criamos Sep 4, 2024
265d769
Merge branch 'openeduhub-master' into update-from-openeduhub
bergatco Sep 9, 2024
4e8b241
Merge pull request #53 from hpi-schul-cloud/update-from-openeduhub
bergatco Sep 9, 2024
b450ee8
fix merge issues
bergatco Sep 9, 2024
e12abb2
Merge pull request #54 from hpi-schul-cloud/fix-merge-issues
bergatco Sep 9, 2024
8036158
Merge branch 'master' into update_master-schulcloud_from_master
bergatco Sep 9, 2024
a0dddf2
DMED-119 - update Python version requirement and package dependencies
bergatco Sep 9, 2024
87d7515
DMED-119 - add missing dependencies to `pyproject.toml` and update `D…
bergatco Sep 9, 2024
ca1fc38
DMED-119 - remove `docker-compose-dev.yml`
bergatco Sep 9, 2024
37e00a3
DMED-119 - use synchronous requests for setting node
bergatco Sep 25, 2024
09bd034
DMED-119 - refactor SodixApi credentials to use spider-specific envir…
bergatco Sep 25, 2024
106f281
DMED-119 - fix needed envs for SodixApi
bergatco Sep 25, 2024
aad1615
DMED-119 - refactor Uploader class to handle cases where multiple nod…
bergatco Sep 25, 2024
cc5038d
DMED-119 - fix issue with wrong initialization of FoundTooManyExcepti…
bergatco Sep 25, 2024
2aa1a2a
DMED-119 - add missing await to LomBase.getUrlData function call in m…
bergatco Sep 25, 2024
34d4748
DMED-119 - fix getUrlData self issue
bergatco Sep 25, 2024
dcde346
change: replace 'httpx' async client with 'requests'-session
Criamos Sep 25, 2024
fbade3b
logging: reflect 'resetVersion'/'forceUpdate'-setting in log messages
Criamos Sep 25, 2024
ab2c80a
fix 10 weak warnings w.r.t. variable names and too broad exception cl…
Criamos Sep 26, 2024
5a9dc04
DMED-119 - fix merlin_spider issues
bergatco Sep 26, 2024
b683869
fix missing awaits and getHash()
Criamos Sep 26, 2024
dd6f082
add merlin_spider pyCharm runConfiguration
Criamos Sep 26, 2024
eda6ea4
Merge pull request #113 from openeduhub/fix_httpx_readerrors
Criamos Sep 26, 2024
e5e2446
DMED-119 - fixed `delete_too_many_children` function in `h5p_upload`
bergatco Sep 26, 2024
8e43c33
Merge pull request #114 from openeduhub/develop
Criamos Sep 26, 2024
993aa2f
DMED-119 - (hopefully) fix await issue in `mediothek_pixiothek_spider`
bergatco Sep 26, 2024
62c7e50
DMED-119 - remove `env.local`
bergatco Sep 26, 2024
090da20
DMED-119 - fix validation error for "cclom:duration"
bergatco Sep 26, 2024
af3abe4
fix: ValidationError during handling of "cclom:duration"-values in es…
Criamos Sep 26, 2024
b75e140
fix: add missing 'license.internal' mapping for "NONPUBLIC" licenses
Criamos Sep 26, 2024
ffc9f75
feat: convert "BaseItem.hash"-values to a string
Criamos Sep 26, 2024
084804f
fix: convert LOM General aggregationLevel int values to str
Criamos Sep 26, 2024
879875a
improve robustness of website-screenshot fallback in the thumbnail-pi…
Criamos Sep 26, 2024
d453684
style: code formatting via black
Criamos Sep 26, 2024
6c196ab
Merge pull request #115 from openeduhub/2024-09-26-fixes
Criamos Sep 26, 2024
63aa77b
Merge pull request #116 from openeduhub/develop
Criamos Sep 26, 2024
984a6de
Merge branch 'openeduhub-master' into update-from-openeduhub
bergatco Sep 27, 2024
211fbb7
Merge pull request #55 from hpi-schul-cloud/update-from-openeduhub
bergatco Sep 27, 2024
9084dc8
Merge branch 'master' into update_master-schulcloud_from_master
bergatco Sep 27, 2024
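Several commits in this PR (e.g. 04b1739 "processThumbnailPipeline returned coroutine instead of item" and c8d4456 "only use await on `process_item` coroutines") deal with pipelines whose `process_item` is sometimes synchronous and sometimes a coroutine. A minimal sketch of that pattern, assuming hypothetical `SyncPipeline`/`AsyncPipeline` classes and a hypothetical `run_pipelines` helper (this is not the project's actual importer code): await a pipeline's result only when it is awaitable.

```python
import asyncio
import inspect


class SyncPipeline:
    """Hypothetical pipeline whose process_item returns the item directly."""

    def process_item(self, item, spider):
        item["sync"] = True
        return item


class AsyncPipeline:
    """Hypothetical pipeline whose process_item is a coroutine function."""

    async def process_item(self, item, spider):
        await asyncio.sleep(0)  # stand-in for an awaited HTTP request
        item["async"] = True
        return item


async def run_pipelines(item, pipelines, spider=None):
    """Pass the item through each pipeline, awaiting only awaitable results."""
    for pipeline in pipelines:
        result = pipeline.process_item(item, spider)
        # Awaiting a plain dict raises TypeError; *not* awaiting a coroutine
        # would hand the coroutine object to the next pipeline instead of the item.
        if inspect.isawaitable(result):
            result = await result
        item = result
    return item


if __name__ == "__main__":
    processed = asyncio.run(run_pipelines({}, [SyncPipeline(), AsyncPipeline()]))
    print(processed)  # {'sync': True, 'async': True}
```

Checking with `inspect.isawaitable` rather than `asyncio.iscoroutinefunction` also covers pipelines that return futures or other awaitables from a regular method.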
54 changes: 54 additions & 0 deletions .github/workflows/python.yaml
@@ -0,0 +1,54 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches: [ $default-branch, develop ]
  pull_request:
    branches: [ $default-branch, develop ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    - name: Cache pip
      uses: actions/cache@v2
      with:
        # This path is specific to Ubuntu
        path: ~/.cache/pip
        # Look to see if there is a cache hit for the corresponding requirements file
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-
          ${{ runner.os }}-
    - name: Install Poetry via pip
      run: |
        python -m pip install --upgrade pip
        python -m pip install poetry
    - name: Configure Poetry to use in-project .venv
      run: |
        python -m poetry config virtualenvs.in-project true
    - name: Install Dependencies with Poetry
      run: |
        python -m poetry install
    - name: Lint with flake8 (via Poetry)
      run: |
        # stop the build if there are Python syntax errors or undefined names
        poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=.venv/,edu_sharing_openapi/
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude=.venv/,edu_sharing_openapi/
    - name: Test with pytest
      run: |
        poetry run pytest
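The first flake8 call is the blocking one: it selects E9 (files that cannot be read or parsed, such as the E999 SyntaxError fixed in commit a4388ea), F63, F7 and F82 (undefined names). As a rough, stdlib-only approximation of just the E9 part, the sketch below walks a source tree, prunes the same excluded directories (`.venv`, `edu_sharing_openapi`) and reports files that fail to parse. The function name `find_syntax_errors` is hypothetical, and this is not a replacement for flake8.

```python
import ast
import os
from pathlib import Path

# Directory names excluded in the workflow's flake8 calls (--exclude=.venv/,edu_sharing_openapi/)
EXCLUDED_DIRS = {".venv", "edu_sharing_openapi"}


def find_syntax_errors(root):
    """Return (path, line, message) for every .py file under root that fails to parse."""
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = Path(dirpath) / filename
            try:
                ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            except SyntaxError as exc:
                errors.append((str(path), exc.lineno, exc.msg))
    return errors
```

Calling `find_syntax_errors(".")` from the repository root returns an empty list when every non-excluded file parses under the running interpreter, which matters here because the workflow now runs on Python 3.12 with its relaxed f-string grammar.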
15 changes: 15 additions & 0 deletions .github/workflows/trivy-cron.yaml
@@ -0,0 +1,15 @@
---
name: Docker Image Trivy Image Vulnerability Scan Cron Job
on:
  schedule:
    - cron: '0 2 * * *'
permissions:
  # security-events required for all workflows; action, contents only required for workflows in private repositories
  security-events: write
  actions: read
  contents: read
jobs:
  trivy_image_scan_cron:
    uses: hpi-schul-cloud/infra-tools/.github/workflows/trivy-scan.yaml@master
    with:
      image-ref: 'ghcr.io/hpi-schul-cloud/oeh-search-etl:latest'
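The schedule `'0 2 * * *'` means minute 0, hour 2, every day. GitHub Actions evaluates cron schedules in UTC, so the image scan is triggered daily at 02:00 UTC (03:00 in German winter time, 04:00 in summer time), and scheduled runs can start late under high load. A small sketch of when the next trigger falls, assuming a hypothetical helper `next_daily_run` that only handles this fixed daily pattern (it is not a general cron parser):

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=2, minute=0):
    """Next time a '<minute> <hour> * * *' cron fires strictly after `now`.

    `now` must be timezone-aware; the result is expressed in UTC,
    which is how GitHub Actions interprets schedule expressions.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's slot has already passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, a push at 03:30 Berlin summer time (01:30 UTC) is followed by a scan at 02:00 UTC the same day.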
2 changes: 0 additions & 2 deletions .gitignore
@@ -6,7 +6,5 @@ converter/.env
converter/.env.*
*.csv
.env
.env.*
nohups
out
docker_run.sh
25 changes: 25 additions & 0 deletions .run/biologie_lernprogramme_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="biologie_lernprogramme_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/biologie_lernprogramme_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl biologie_lernprogramme_spider -O &quot;../../logs/bio_lernprogramme.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
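Each of these PyCharm run configurations runs `SCRIPT_NAME` (the Scrapy executable inside the in-project `.venv`) with `PARAMETERS` as its arguments, from `WORKING_DIRECTORY`. A minimal sketch that recovers that command line from the XML, assuming a hypothetical helper `run_command` and a caller-supplied value for `$PROJECT_DIR$` (PyCharm substitutes the real project path itself):

```python
import shlex
import xml.etree.ElementTree as ET


def run_command(config_xml, project_dir):
    """Rebuild (working_dir, argv) from a PyCharm Python run configuration."""
    configuration = ET.fromstring(config_xml).find("configuration")
    # Only direct <option> children; the nested <envs>/<env> entries are skipped.
    options = {
        opt.get("name"): opt.get("value")
        for opt in configuration.findall("option")
    }
    working_dir = options["WORKING_DIRECTORY"].replace("$PROJECT_DIR$", project_dir)
    # ElementTree already decodes &quot;, so shlex sees a normally quoted path.
    argv = [options["SCRIPT_NAME"], *shlex.split(options["PARAMETERS"])]
    return working_dir, argv
```

For the configuration above this yields `./.venv/bin/scrapy crawl biologie_lernprogramme_spider -O ../../logs/bio_lernprogramme.json`, with the relative output path resolved against the working directory.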
26 changes: 26 additions & 0 deletions .run/bne_portal_spider.run.xml
@@ -0,0 +1,26 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="bne_portal_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/bne_portal_spider_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="ENV_FILES" value="" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl bne_portal_spider -O &quot;../../logs/bne_portal_spider.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
26 changes: 26 additions & 0 deletions .run/bpb_spider.run.xml
@@ -0,0 +1,26 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="bpb_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/bpb_spider_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="ENV_FILES" value="" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl bpb_spider -O &quot;../../logs/bpb_spider.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
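All of these configurations pass `-O` to `scrapy crawl`, which writes the scraped items to the given file and overwrites any existing file; lowercase `-o` appends instead, and appending to a `.json` file leaves it as invalid JSON. Scrapy maps both flags onto its `FEEDS` setting, inferring the feed format from the file extension. A sketch of that mapping, assuming a hypothetical helper `feeds_from_cli` that ignores the optional `URI:format` suffix Scrapy also accepts:

```python
from pathlib import PurePosixPath


def feeds_from_cli(flag, uri):
    """Approximate the FEEDS entry produced by `scrapy crawl <spider> -o/-O <uri>`."""
    if flag not in ("-o", "-O"):
        raise ValueError(f"unsupported flag: {flag}")
    # Format is inferred from the extension, e.g. ".json" -> "json", ".jsonl" -> "jsonl".
    feed_format = PurePosixPath(uri).suffix.lstrip(".")
    # -O overwrites the target file, -o appends to it.
    return {uri: {"format": feed_format, "overwrite": flag == "-O"}}
```

The resulting dict has the shape Scrapy expects for `FEEDS`, so the same output could instead be configured in a spider's `custom_settings` rather than on the command line.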
25 changes: 25 additions & 0 deletions .run/br_rss_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="br_rss_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/br_rss_spider_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl br_rss_spider -O &quot;../../logs/br_rss_spider.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
25 changes: 25 additions & 0 deletions .run/chemie_lernprogramme_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="chemie_lernprogramme_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/chemie_lernprogramme_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl chemie_lernprogramme_spider -O &quot;chemie_lernprogramme.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
25 changes: 25 additions & 0 deletions .run/digitallearninglab_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="digitallearninglab_spider" type="PythonConfigurationType" factoryName="Python">
    <output_file path="$PROJECT_DIR$/logs/digitallearninglab_spider_console.log" is_save="true" />
    <module name="oeh-search-etl" />
    <option name="INTERPRETER_OPTIONS" value="" />
    <option name="PARENT_ENVS" value="true" />
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="SDK_HOME" value="" />
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="IS_MODULE_SDK" value="true" />
    <option name="ADD_CONTENT_ROOTS" value="true" />
    <option name="ADD_SOURCE_ROOTS" value="true" />
    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl digitallearninglab_spider -O &quot;../../logs/digitallearninglab_spider.json&quot;" />
    <option name="SHOW_COMMAND_LINE" value="false" />
    <option name="EMULATE_TERMINAL" value="false" />
    <option name="MODULE_MODE" value="false" />
    <option name="REDIRECT_INPUT" value="false" />
    <option name="INPUT_FILE" value="" />
    <method v="2" />
  </configuration>
</component>
25 changes: 25 additions & 0 deletions .run/dilertube_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="dilertube_spider" type="PythonConfigurationType" factoryName="Python">
<output_file path="$PROJECT_DIR$/logs/dilertube_spider_console.log" is_save="true" />
<module name="oeh-search-etl" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
<option name="PARAMETERS" value="crawl dilertube_spider -O &quot;../../logs/dilertube_spider.json&quot;" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>
25 changes: 25 additions & 0 deletions .run/dwu_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="dwu_spider" type="PythonConfigurationType" factoryName="Python">
<output_file path="$PROJECT_DIR$/logs/dwu_spider_console.log" is_save="true" />
<module name="oeh-search-etl" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
<option name="PARAMETERS" value="crawl dwu_spider -O &quot;../../logs/dwu_spider.json&quot;" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>
25 changes: 25 additions & 0 deletions .run/edulabs_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="edulabs_spider" type="PythonConfigurationType" factoryName="Python">
<output_file path="$PROJECT_DIR$/logs/edulabs_spider_console.log" is_save="true" />
<module name="oeh-search-etl" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
<option name="PARAMETERS" value="crawl edulabs_spider -O &quot;../../logs/edulabs_spider.json&quot;" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>
25 changes: 25 additions & 0 deletions .run/fobizz_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="fobizz_spider" type="PythonConfigurationType" factoryName="Python">
<output_file path="$PROJECT_DIR$/logs/fobizz_spider_console.log" is_save="true" />
<module name="oeh-search-etl" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
<option name="PARAMETERS" value="crawl fobizz_spider -O &quot;../../logs/fobizz_spider.json&quot;" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>
25 changes: 25 additions & 0 deletions .run/ginkgomaps_spider.run.xml
@@ -0,0 +1,25 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="ginkgomaps_spider" type="PythonConfigurationType" factoryName="Python">
<output_file path="$PROJECT_DIR$/logs/ginkgomaps_spider_console.log" is_save="true" />
<module name="oeh-search-etl" />
<option name="INTERPRETER_OPTIONS" value="" />
<option name="PARENT_ENVS" value="true" />
<envs>
<env name="PYTHONUNBUFFERED" value="1" />
</envs>
<option name="SDK_HOME" value="" />
<option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
<option name="IS_MODULE_SDK" value="true" />
<option name="ADD_CONTENT_ROOTS" value="true" />
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
<option name="PARAMETERS" value="crawl ginkgomaps_spider -O &quot;../../logs/ginkgomaps_spider.json&quot;" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
<option name="REDIRECT_INPUT" value="false" />
<option name="INPUT_FILE" value="" />
<method v="2" />
</configuration>
</component>