Repo for NLP tasks built by adapting the existing framework developed by Haystack and HuggingFace.

Climate Policy Analysis Machine - utils

This repository is built primarily for NLP tasks and relies mainly on ready-made Haystack and Hugging Face components.

The tasks performed include:

  1. Document processing: extracting the text from docx/text/pdf files and creating a list of paragraphs.
  2. Search: performing lexical or semantic search over the paragraphs list created in step 1.
  3. SDG classification: classifying the paragraph text by Sustainable Development Goal (SDG).
  4. Keyword extraction: extracting keywords using TextRank, TF-IDF, or KeyBERT.
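The first, second, and fourth steps above can be sketched in plain Python. This is a conceptual illustration only: the function names, paragraph splitting, and scoring logic are simplified assumptions, not the repo's actual API (which builds on Haystack and Hugging Face components), and SDG classification is omitted because it requires a trained model.

```python
import math
import re
from collections import Counter

def to_paragraphs(text):
    """Step 1 (sketch): split raw document text into a paragraphs list."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def lexical_search(query, paragraphs):
    """Step 2 (sketch): rank paragraphs by query-token overlap.

    The real repo delegates this to Haystack retrievers; token overlap
    just illustrates what 'lexical search' means here.
    """
    query_tokens = set(re.findall(r"\w+", query.lower()))
    scored = []
    for p in paragraphs:
        tokens = set(re.findall(r"\w+", p.lower()))
        scored.append((len(query_tokens & tokens), p))
    # Keep only matching paragraphs, best score first.
    return [p for score, p in sorted(scored, key=lambda s: -s[0]) if score > 0]

def tfidf_keywords(paragraphs, top_n=3):
    """Step 4 (sketch): naive TF-IDF keyword scoring over the paragraphs."""
    docs = [re.findall(r"\w+", p.lower()) for p in paragraphs]
    doc_freq = Counter(w for d in docs for w in set(d))
    n = len(docs)
    scores = Counter()
    for d in docs:
        term_freq = Counter(d)
        for word, count in term_freq.items():
            scores[word] += (count / len(d)) * math.log(n / doc_freq[word])
    return [w for w, _ in scores.most_common(top_n)]
```

In the actual utils these steps are chained: the paragraph list produced by document processing is the input to both search and keyword extraction.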

Please use the colab notebook to get familiar with basic usage of the utils (use branch=main for non-Streamlit usage). For a more detailed walkthrough, use the advanced colab notebook. There are two branches in the repo: one for use in a Streamlit environment and another for generic usage, e.g. in Colab or on a local machine. You can clone the repo for your own use, or install it as a package.

To install as a package (non-Streamlit use):

pip install -e "git+https://github.com/gizdatalab/haystack_utils.git@main#egg=utils"

To install as a package for a Streamlit app:

pip install -e "git+https://github.com/gizdatalab/haystack_utils.git@streamlit#egg=utils"

To install as a package for the CPU-trac Streamlit app (https://huggingface.co/spaces/GIZ/cpu_tracs):

pip install -e "git+https://github.com/gizdatalab/haystack_utils.git@cputrac#egg=utils"
