
An efficient communication analysis of modals in natural language



nathimel/modals-effcomm


An Efficient Communication Analysis of Modal Typology

This code accompanies the following paper:

N. Imel and S. Steinert-Threlkeld, Modals in natural language optimize the simplicity/informativeness trade-off, in Proceedings of Semantics and Linguistic Theory (SALT 32), 2022.

and an extension thereof. The codebase is structured to support computational experiments measuring natural and artificial modal vocabularies for communicative efficiency.

Setting up an experiment

A single YAML file specifies the parameters and file paths for an experiment, e.g. half_credit_literal.yml. These include:

  • the size of the semantic space to measure (number of quantificational forces and modal flavors)
  • the vocabulary size for artificial modal languages
  • the total number of languages to generate
  • how long to run the evolutionary algorithm that estimates the optimal languages
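The exact YAML keys are not listed in this README, so the sketch below uses hypothetical field names to show how such a config might be validated once loaded (for instance with PyYAML's yaml.safe_load) and how the meaning space follows from it: every (force, flavor) pair is one meaning point. The class and field names are illustrative assumptions, not the repo's schema.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical field names: the real keys live in the YAML files under configs/.
@dataclass(frozen=True)
class ExperimentConfig:
    forces: tuple[str, ...]   # quantificational forces, e.g. ("weak", "strong")
    flavors: tuple[str, ...]  # modal flavors, e.g. ("epistemic", "deontic")
    max_vocab_size: int       # upper bound on modals per artificial language
    sample_size: int          # total number of languages to generate
    generations: int          # how long to run the evolutionary algorithm

    def __post_init__(self) -> None:
        if not self.forces or not self.flavors:
            raise ValueError("meaning space needs at least one force and one flavor")
        if self.max_vocab_size < 1 or self.sample_size < 1 or self.generations < 0:
            raise ValueError("sizes must be positive and generations non-negative")

    @property
    def meaning_points(self) -> list[tuple[str, str]]:
        # Every (force, flavor) pair is one point in the semantic space.
        return [(force, flavor) for force in self.forces for flavor in self.flavors]
```

With two forces and three flavors (illustrative values), the meaning space has 2 × 3 = 6 points.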

Sampling languages

Generate a large and diverse sample of mathematically possible languages.

This is accomplished by the scripts generate_expressions.py, sample_languages.py, and estimate_pareto_frontier.py, which perform the following steps:

  • Generating expressions from the set of meanings
  • Sampling expressions into languages
  • Using an evolutionary algorithm to estimate the optimal languages
  • Exploring the space of possible languages using the same algorithm
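A minimal sketch of the first two steps, plus one mutation operator of the kind an evolutionary search could use, assuming that an expression's meaning is a set of (force, flavor) points and a language is a set of expressions. The function names and the add/remove/swap mutation set are hypothetical illustrations, not the API of src/modals.

```python
import itertools
import random


def generate_expressions(points):
    """Every non-empty subset of meaning points is a possible modal meaning."""
    return [
        frozenset(combo)
        for r in range(1, len(points) + 1)
        for combo in itertools.combinations(points, r)
    ]


def sample_languages(expressions, max_vocab_size, n, seed=0, max_attempts=100_000):
    """Draw up to n distinct languages, each with 1..max_vocab_size distinct expressions."""
    rng = random.Random(seed)
    languages = set()
    attempts = 0
    while len(languages) < n and attempts < max_attempts:
        size = rng.randint(1, min(max_vocab_size, len(expressions)))
        languages.add(frozenset(rng.sample(expressions, size)))
        attempts += 1
    return list(languages)


def mutate(language, expressions, max_vocab_size, rng):
    """Apply one random edit (hypothetical mutation set): add, remove, or swap an expression."""
    lang = list(language)
    unused = [e for e in expressions if e not in language]
    ops = ["swap"] if unused else []
    if unused and len(lang) < max_vocab_size:
        ops.append("add")
    if len(lang) > 1:
        ops.append("remove")
    if not ops:
        return frozenset(lang)
    op = rng.choice(ops)
    if op in ("remove", "swap"):
        lang.pop(rng.randrange(len(lang)))
    if op in ("add", "swap"):
        lang.append(rng.choice(unused))
    return frozenset(lang)
```

With 6 meaning points there are 2^6 − 1 = 63 possible expressions. The max_attempts guard keeps sampling from looping forever when n exceeds the number of distinct languages that exist.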

Adding natural languages

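To add the natural language modal inventories to be measured in an experiment, we use the add_natural_languages.py script. The inventories are stored under data/natural_languages, with one modals.csv file per language (e.g. data/natural_languages/Gitksan/modals.csv).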
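The column layout of modals.csv is not documented here. The sketch below assumes a hypothetical schema in which each row records one (force, flavor) point that a modal can express, and groups the rows into one meaning set per modal. The column names and the modal names in the usage example are placeholders.

```python
import csv
import io
from collections import defaultdict


def load_modal_inventory(csv_file):
    """Group rows into {modal: frozenset of (force, flavor) points}.

    Assumes hypothetical columns "expression", "force", and "flavor".
    """
    inventory = defaultdict(set)
    for row in csv.DictReader(csv_file):
        inventory[row["expression"]].add((row["force"], row["flavor"]))
    return {modal: frozenset(points) for modal, points in inventory.items()}


# Toy rows with placeholder modal names, not real language data.
toy = io.StringIO(
    "expression,force,flavor\n"
    "modal_a,weak,epistemic\n"
    "modal_a,strong,epistemic\n"
    "modal_b,strong,deontic\n"
)
inventory = load_modal_inventory(toy)
```

For a real file, pass an open handle, e.g. open(path, newline="") as the csv module recommends.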

Estimating the communicative need distribution

To estimate the prior probability of a (force, flavor) meaning point, we obtain relative frequency statistics using the Modality Corpus. Run the script extract_prior.py to:

  • Load the data annotated for modal flavors, stored in this repo under data/modality_corpus.
  • Run each sentence through a parser to extract the verbal auxiliaries.
  • Automatically annotate the modals for force.
  • Count the occurrences of the (force, flavor) pairs used in the experiment's meaning space.
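The last step turns counts into a probability distribution over meaning points. A minimal sketch, assuming the annotated corpus has already been reduced to a list of (force, flavor) pairs: pairs outside the meaning space are ignored and the rest are normalized by relative frequency. The optional alpha pseudocount is an added assumption (not described above) for meaning points that never occur in the corpus.

```python
from collections import Counter


def estimate_prior(annotations, meaning_points, alpha=0.0):
    """Relative frequency of each (force, flavor) point in the meaning space.

    alpha is a hypothetical pseudocount; alpha=0 gives plain relative frequencies.
    """
    space = set(meaning_points)
    counts = Counter(pair for pair in annotations if pair in space)
    total = sum(counts[p] + alpha for p in meaning_points)
    if total == 0:
        raise ValueError("no annotations fall inside the meaning space")
    return {p: (counts[p] + alpha) / total for p in meaning_points}
```

Because Counter returns 0 for missing keys, unseen points receive probability alpha / total.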

Analyzing the simplicity/informativeness trade-off

Finally, analyze the resulting pool of languages for the relationship between efficiency and naturalness. We directly measure natural languages for their efficiency, and the hypothetical languages for both their efficiency and their satisfaction of semantic universals (e.g. IFF, SAV, DLSAV).
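As an illustration of checking a universal, the sketch below assumes the reading of IFF (Independence of Force and Flavor) in which a modal satisfies the universal when its meaning is exactly the Cartesian product of the forces and the flavors it can express, and scores a language by the proportion of its modals that satisfy it. The repo's scoring of IFF, SAV, and DLSAV may differ; SAV and DLSAV are not sketched.

```python
def satisfies_iff(meaning):
    """True if the meaning equals {its forces} x {its flavors} (assumed IFF reading)."""
    forces = {force for force, _ in meaning}
    flavors = {flavor for _, flavor in meaning}
    return set(meaning) == {(f, fl) for f in forces for fl in flavors}


def iff_degree(language):
    """Proportion of a language's expressions that satisfy IFF (assumed scoring)."""
    if not language:
        raise ValueError("language must contain at least one expression")
    return sum(satisfies_iff(m) for m in language) / len(language)
```

A modal covering weak and strong epistemic meanings satisfies IFF; one covering only weak epistemic and strong deontic does not, since the product of its forces and flavors would also include weak deontic and strong epistemic.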

Measures computed for each language:

  • Complexity
  • Communicative cost
  • Satisfaction of semantic universal(s)
  • Optimality w.r.t. a Pareto frontier
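The config name half_credit_literal suggests a half-credit utility with literal agents. The sketch below assumes that reading: a literal speaker picks uniformly among the expressions that can convey the intended meaning, a literal listener guesses uniformly among that expression's meanings, and a guess earns full credit for an exact match and half credit when only the force or only the flavor matches. Cost is 1 minus expected utility under the need prior. The frontier helper keeps (complexity, cost) points not dominated by any other point, with both axes minimized. Complexity itself is not sketched, and every definition here is an assumption rather than the repo's code.

```python
import math


def half_credit(m, m_hat):
    """Hypothetical utility: 1 for an exact match, 0.5 if force or flavor matches, else 0."""
    if m == m_hat:
        return 1.0
    return 0.5 if (m[0] == m_hat[0] or m[1] == m_hat[1]) else 0.0


def communicative_cost(language, prior):
    """1 minus expected utility with a literal speaker and literal listener (sketch)."""
    expected = 0.0
    for m, p in prior.items():
        words = [w for w in language if m in w]  # expressions able to convey m
        for w in words:
            speaker = 1 / len(words)  # literal speaker: uniform over usable words
            for m_hat in w:
                listener = 1 / len(w)  # literal listener: uniform over w's meanings
                expected += p * speaker * listener * half_credit(m, m_hat)
    return 1 - expected  # inexpressible meanings contribute zero utility


def pareto_frontier(points):
    """(complexity, cost) points not dominated by any other point; both are minimized."""
    return [
        p for p in points
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
    ]


def distance_to_frontier(point, frontier):
    """Euclidean distance from a language's (complexity, cost) to the nearest frontier point."""
    return min(math.dist(point, f) for f in frontier)
```

One natural optimality score is 1 minus this distance after rescaling both axes to [0, 1]; whether the repo normalizes this way is likewise an assumption.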

Analysis:

  • perform statistical analyses, including the correlation between naturalness and optimality
  • plot the simplicity/informativeness trade-off
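A minimal sketch of the correlation step, assuming each language has already been assigned a naturalness score (such as a degree of universal satisfaction) and an optimality score. It computes Pearson's r; the repo's analysis may use a different statistic, so treat this as an illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

On Python 3.10 and later, statistics.correlation computes the same Pearson coefficient.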

Structure of the codebase

Map of repo
.
├── configs
│ # YAML files that define experimental parameters for
│ # modal languages, sample size, the type of naturalness to measure,
│ # file output paths, etc.
│   ├── half_credit_literal.yml
│   └── ...
├── data
│   └── natural_languages
│       ├── Gitksan
│       │   └── modals.csv
│       └── ...
├── outputs
│ # readable intermediate output and experimental results, e.g.
│   └── half_credit_literal
│       ├── analysis
│       │   │  # resulting dataframes and figures
│       │   ├── ...
│       │   ├── all_data.csv
│       │   └── plot.png
│       ├── expressions.yml
│       ├── languages
│       │   │  # generated languages
│       │   ├── artificial.yml
│       │   └── natural.yml
│       └── system_output.txt # progress of the experiment printed to stdout
├── scripts
│   └── run_full_experiment.sh # the main script to run
└── src
    │ # python scripts to construct the space of possible languages,
    │ # sample from this space,
    │ # and measure the communicative efficiency of the sample
    │ # by estimating a Pareto frontier using an evolutionary algorithm
    ├── ...
    ├── sample_languages.py
    └── modals
        │ # module that defines the meaning space for modals,
        │ # the modal language data structure,
        │ # measures of complexity and communicative cost,
        │ # and mutations that may apply during the evolutionary algorithm
        ├── ...
        └── modal_language.py

Requirements

Get the required packages by running

conda env create -f environment.yml

Additionally, this project requires the artificial language toolkit (ALTK). Install it via git with

python3 -m pip install git+https://github.com/nathimel/altk.git@e20657a122a54ff607344f6dc8c4f04a34a06bd0

Replicating the experimental results

The main experimental results can be reproduced by running ./scripts/run_full_experiment.sh configs/salt.yml.

This script runs the following Python scripts in order, each of which can also be run individually:

python3 src/create_folders.py path_to_config

python3 src/build_meaning_space.py path_to_config

python3 src/generate_expressions.py path_to_config

python3 src/sample_languages.py path_to_config

python3 src/add_natural_languages.py path_to_config

python3 src/extract_prior.py path_to_config

python3 src/estimate_pareto_frontier.py path_to_config

python3 src/measure_tradeoff.py path_to_config

python3 src/analyze.py path_to_config
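For environments without a POSIX shell, the same sequence can be driven from Python. This is a hypothetical sketch that only reproduces the order listed above; the contents of run_full_experiment.sh are not shown here, so any extra steps it performs are not reflected. Run it from the repository root.

```python
import subprocess
import sys

# Step order as listed above; run_full_experiment.sh itself is not reproduced.
PIPELINE = [
    "create_folders",
    "build_meaning_space",
    "generate_expressions",
    "sample_languages",
    "add_natural_languages",
    "extract_prior",
    "estimate_pareto_frontier",
    "measure_tradeoff",
    "analyze",
]


def pipeline_commands(config_path, python=sys.executable):
    """Build one argv list per step: python src/<step>.py <config>."""
    return [[python, f"src/{name}.py", config_path] for name in PIPELINE]


def run_pipeline(config_path, dry_run=False):
    for cmd in pipeline_commands(config_path):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError at the first failing step
```

For example, run_pipeline("configs/salt.yml", dry_run=True) prints the nine commands without executing them.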

Citation

To cite this work, please use the following:

@article{Imeletal2023,
  author    = {Imel, Nathaniel and Guo, Qingxia and Steinert-Threlkeld, Shane},
  title     = {An efficient communication analysis of modal typology},
  year      = {2023},
  journal = {lingbuzz},
  url = {https://ling.auf.net/lingbuzz/007392},
}
