Skip to content

Notebooks for LAK 2023 workshop "Lowering the Technical Barriers to Trustworthy NLP"

License

Notifications You must be signed in to change notification settings

andersonh-uta/Lak2023

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LAK 2023 Workshop: Lowering the Technical Barriers to Trustworthy NLP

This repository contains information and materials from the Lowering the Technical Barriers to Trustworthy NLP workshop at LAK 2023.

Contents

There are a few directories containing some simple demo code for doing NLP analyses in various programming languages. Each one shows off different libraries/tools that exist in that language, and all use the same common dataset.

  • data/: Contains the common dataset in .csv format. This is the English language subset of the amazon_reviews_multi dataset from the Huggingface datasets hub, converted to CSV format, and with some extraneous columns removed.
  • demos/: demos on a few different common NLP approaches using Python, Julia, and R. Each set of demos has Jupyter notebooks (pre-run, so you can look at the code and outputs), plus plain .py/.r/.jl files with just the final code and less annotations.
  • demos/python/: Python demos.
  • demos/r/: R demos.
  • demos/julia/: Julia demos.

A note about the different languages

If you're going to be doing ny moderately intensive NLP work, you should probably be using Python. Python has, by far, the most mature and cohesive ecosystem for doing NLP as of right now (March 2023), and provides the best balance between high-level abstractions when you want them, and fairly easy (and relatively efficient) access to lower-level primitives when you need them.

R has some tools for NLP work, but they're usually not as feature-rich and have a few odd quirks. Julia is in a similar position--there aren't a lot of very mature NLP libraries right now, but Julia makes it much easier than Python or R to re-implement something yourself and not worry about the performance (because Julia is very fast). I like Julia a lot for this reason; most of the stuff we do in NLP isn't particularly weird or esoteric, and the lack of pre-built tools means you have to build more yourself, and see how simple most of it really is.

But, to try to encourage you to use Python, the Python notebooks will contain more details about the more general "what" and "why" of a given approach, and you should go to them first if you need some conceptual grounding. The R and Julia notebooks will be much more "just the code."

Licensing

The dataset used for these demos (the Multilingual Amazon Reviews Corpus) is made accessible by Amazon under their own license (https://docs.opendata.aws/amazon-reviews-ml/license.txt).

All code in these demos is licensed under an MIT license.

All text and non-code artifacts are licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/). The dataset used (

About

Notebooks for LAK 2023 workshop "Lowering the Technical Barriers to Trustworthy NLP"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published