LAK 2023 Workshop: Lowering the Technical Barriers to Trustworthy NLP

This repository contains information and materials from the Lowering the Technical Barriers to Trustworthy NLP workshop at LAK 2023.

data/: Contains the common dataset in .csv format. This is the English language subset of the amazon_reviews_multi dataset from the Huggingface datasets hub, converted to CSV format, and with some extraneous columns removed.
demos/: demos on a few different common NLP approaches using Python, Julia, and R. Each set of demos has Jupyter notebooks (pre-run, so you can look at the code and outputs), plus plain .py/.r/.jl files with just the final code and less annotations.
demos/python/: Python demos.
demos/r/: R demos.
demos/julia/: Julia demos.

A note about the different languages

If you're going to be doing ny moderately intensive NLP work, you should probably be using Python. Python has, by far, the most mature and cohesive ecosystem for doing NLP as of right now (March 2023), and provides the best balance between high-level abstractions when you want them, and fairly easy (and relatively efficient) access to lower-level primitives when you need them.

R has some tools for NLP work, but they're usually not as feature-rich and have a few odd quirks. Julia is in a similar position--there aren't a lot of very mature NLP libraries right now, but Julia makes it much easier than Python or R to re-implement something yourself and not worry about the performance (because Julia is very fast). I like Julia a lot for this reason; most of the stuff we do in NLP isn't particularly weird or esoteric, and the lack of pre-built tools means you have to build more yourself, and see how simple most of it really is.

But, to try to encourage you to use Python, the Python notebooks will contain more details about the more general "what" and "why" of a given approach, and you should go to them first if you need some conceptual grounding. The R and Julia notebooks will be much more "just the code."

Licensing

The dataset used for these demos (the Multilingual Amazon Reviews Corpus) is made accessible by Amazon under their own license (https://docs.opendata.aws/amazon-reviews-ml/license.txt).

All code in these demos is licensed under an MIT license.

All text and non-code artifacts are licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/). The dataset used (

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
demos		demos
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LAK 2023 Workshop: Lowering the Technical Barriers to Trustworthy NLP

Contents

A note about the different languages

Licensing

About

Releases

Packages

Languages

License

andersonh-uta/Lak2023

Folders and files

Latest commit

History

Repository files navigation

LAK 2023 Workshop: Lowering the Technical Barriers to Trustworthy NLP

Contents

A note about the different languages

Licensing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages