This repo scales up the ideas of the ETH Hack4Good Hackathon 2021 with SDSN & GIZ:
https://www.blog-datalab.com/policy-tracing-nlp-h4g
The world's main problem is progressing climate change and too little effort to stop it. The 26th UN Climate Change Conference of the Parties (COP26) in 2021 was filled with promises of governmental action to help tackle climate change. Each country's national adaptation and mitigation goals are laid down in its nationally determined contribution (NDC).
Example NDC from South Africa: NDC South Africa
Example policy document: National REDD SA
The goal of policy implementation tracing is to help policy advisors connect the NDCs with national policy documents and check whether the goals are being realised. We are therefore building a web application where policy analysts can upload a policy document, which is then analyzed with the following steps:
- TF-IDF / text summarization
- Classification using a Transformer and data from https://osdg.ai/
- Keyword ontologies and semantic search
- Comparison of the policy with the NDC document
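As a rough illustration of the TF-IDF and comparison steps, the following is a minimal sketch in plain Python and not the code used in this repo. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `top_keywords`), the smoothed IDF formula, and the three example texts are all assumptions made for this example. It extracts the highest-weighted terms of a policy text and scores how closely each policy text matches an NDC text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_keywords(vector, k=5):
    """Return the k terms with the highest TF-IDF weight (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical example texts, not taken from any real NDC or policy document.
docs = [
    "Reduce greenhouse gas emissions and expand renewable energy by 2030.",
    "The energy policy expands renewable energy and cuts greenhouse gas emissions.",
    "The tourism strategy promotes coastal hotels and airport upgrades.",
]
vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])

print("Keywords of policy 1:", top_keywords(vectors[1], k=3))
print("NDC vs. energy policy:", round(sim_related, 3))
print("NDC vs. tourism strategy:", round(sim_unrelated, 3))
```

A bag-of-words score like this only captures shared vocabulary. The Transformer-based semantic search listed above would be the natural replacement for `cosine` over TF-IDF vectors when paraphrases need to be matched.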
Open Questions:
- Processing only machine-readable PDF and DOCX (Haystack?)
- How to split long documents
- Summarization possible with OS resources (Transformer; runtime, usage limits)
- Topic modelling on NDCs?
- Named Entity Recognition?
- SDG classification with OS resources (Transformer; runtime, usage limits)
- Vector search with OS resources (Transformer; runtime, usage limits)
- Query by keyword(s) / question / sentence / paragraph?
- Coherence measurement: BLEU score vs. text similarity vs. ...
- Streamlit vs. Gradio (Spaces)
- Adapting model to domain
- Multilingual models vs. translation
- Deployment of Azure Cloud resources
- Carbon Footprint
- Transparency on transformer model (training data)
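On the open question of how to split long documents, one simple option is a sliding window over words with a small overlap, so that a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks. The sketch below is only an illustration under assumptions: the function name `split_into_chunks` and the default values for `max_words` and `overlap` are made up for this example and are not taken from this repo.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words. Both defaults are
    illustrative assumptions, not tuned values.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


# Tiny synthetic example: ten placeholder words, windows of four, overlap of one.
sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(sample, max_words=4, overlap=1):
    print(chunk)
```

Word-based windows ignore sentence and section boundaries. Splitting on paragraphs or headings first and applying the window only to oversized paragraphs would be an alternative worth comparing.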
Streamlit Demo:
https://huggingface.co/spaces/GIZ/SDSN-demo