A test-bed for the Attainable Utility Preservation (AUP) method, which quantifies and penalizes the impact an agent has on the world around it. Current AUP approaches, however, assume the existence of a no-op action in the environment's action space, which limits AUP to tasks where doing nothing for a single time-step is a viable option. Depending on the environment, this cannot always be guaranteed. We introduce four different baselines that do not rely on such an action and therefore extend AUP to a broader class of environments. We evaluate all introduced variants on different AI safety gridworlds and show that this approach generalizes AUP to a broader range of tasks with only small performance losses.
This repository builds on and further extends DeepMind's AI safety gridworlds. For a discussion of AUP's potential contributions to long-term AI safety, see here.
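At its core, AUP shapes the environment reward with a penalty that compares the agent's auxiliary action-values under the chosen action against those under a baseline. Below is a minimal, hypothetical sketch of that shaping step; the function names, the `"mean"` reference (one possible no-op-free choice, not necessarily one of the paper's four baselines), and the default scaling are illustrative assumptions, not this repository's API.

```python
import numpy as np

def aup_penalty(aux_q, action, baseline="noop", noop_index=None):
    """Illustrative AUP penalty for a single state.

    aux_q:    array of shape (n_aux_rewards, n_actions) holding the auxiliary
              action-values Q_i(s, a) for the current state s.
    action:   index of the action the agent is about to take.
    baseline: "noop" compares against a designated no-op action (classic AUP);
              "mean" compares against the per-reward average over all actions,
              an illustrative baseline that requires no no-op action.
    """
    if baseline == "noop":
        reference = aux_q[:, noop_index]
    elif baseline == "mean":
        reference = aux_q.mean(axis=1)
    else:
        raise ValueError("unknown baseline: {}".format(baseline))
    return np.abs(aux_q[:, action] - reference).sum()

def shaped_reward(env_reward, penalty, lam=0.1, scale=1.0):
    # R_AUP(s, a) = R(s, a) - (lambda / scale) * penalty
    return env_reward - lam * penalty / scale
```

With `baseline="noop"` this reduces to the original AUP formulation; any no-op-free reference vector can be slotted in at the same point, which is the degree of freedom the alternative baselines explore.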
- Using Python 2.7 as the interpreter, acquire the libraries in `requirements.txt`.
- Clone using `--recursive` to snag the `pycolab` submodule: `git clone --recursive https://github.com/fkabs/attainable-utility-preservation.git`.
- Run `python -m experiments.charts` or `python -m experiments.ablation`, tweaking the code to include the desired environments.
This work was published by Springer Nature Switzerland.
Please use the following reference to cite this work:
@inproceedings{eresheim2023,
  title     = {Standing Still Is Not an Option: Alternative Baselines for Attainable Utility Preservation},
  booktitle = {Machine Learning and Knowledge Extraction},
  author    = {Eresheim, Sebastian and Kovac, Fabian and Adrowitzer, Alexander},
  editor    = {Holzinger, Andreas and Kieseberg, Peter and Cabitza, Federico and Campagner, Andrea and Tjoa, A. Min and Weippl, Edgar},
  year      = {2023},
  series    = {Lecture Notes in Computer Science},
  pages     = {239--257},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  doi       = {10.1007/978-3-031-40837-3_15},
  isbn      = {978-3-031-40837-3},
  langid    = {english}
}