Chance-constrained POMDP solver integrated into POMDPs.jl

sisl/ConstrainedZero.jl

ConstrainedZero.jl

arXiv

See the BetaZero.jl package for the code.

Belief-state planning algorithm for chance-constrained POMDPs (CC-POMDPs) using learned approximations; integrated into the POMDPs.jl ecosystem.

Citation

@inproceedings{moss2024constrainedzero,
  title={{ConstrainedZero: Chance-Constrained POMDP Planning Using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints}},
  author={Moss, Robert J. and Jamgochian, Arec and Fischer, Johannes and Corso, Anthony and Kochenderfer, Mykel J.},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2024},
}

Installation

ConstrainedZero is implemented on the safety branch of BetaZero.jl. To install the constrained BetaZero solver, run:

using Pkg
Pkg.add(url="https://github.com/sisl/BetaZero.jl", rev="safety")

(Optional) To install the supporting example POMDP models (e.g., LightDark and MinEx), the RemoteJobs package, and the ParticleBeliefs wrapper, run:

using BetaZero
install_extras()

Usage

The following code defines the required interface function BetaZero.input_representation and the optional BetaZero.accuracy for the LightDark POMDP, then solves the problem with the constrained BetaZero solver (ConstrainedZero).

using BetaZero
using LightDark

pomdp = LightDarkPOMDP()
pomdp.incorrect_r = 0 # ConstrainedZero: remove the failure penalty for the LightDark CC-POMDP
up = BootstrapFilter(pomdp, 500)

function BetaZero.input_representation(b::ParticleCollection{LightDarkState})
    # Function to get belief representation as input to neural network.
    μ, σ = mean_and_std(s.y for s in particles(b))
    return Float32[μ, σ]
end

function BetaZero.accuracy(pomdp::LightDarkPOMDP, b0, s0, states, actions, returns)
    # Function to determine accuracy of agent's final decision.
    return returns[end] == pomdp.correct_r
end

solver = BetaZeroSolver(pomdp=pomdp,
                        updater=up,
                        is_constrained=true, # <-- ConstrainedZero flag.
                        params=BetaZeroParameters(
                            n_iterations=50,
                            n_data_gen=50,
                        ),
                        nn_params=BetaZeroNetworkParameters(
                            pomdp, up;
                            training_epochs=50,
                            n_samples=100_000,
                            batchsize=1024,
                            learning_rate=1e-4,
                            λ_regularization=1e-5,
                            use_dropout=true,
                            p_dropout=0.2,
                        ),
                        verbose=true,
                        collect_metrics=true,
                        plot_incremental_data_gen=true)

# ConstrainedZero-specific parameters
solver.mcts_solver.Δ0 = 0.01
solver.mcts_solver.η = 0.00001
solver.mcts_solver.final_criterion = MaxZQNS(zq=1, zn=1)

# Run ConstrainedZero
policy = solve(solver, pomdp)
save_policy(policy, "policy.bson")
save_solver(solver, "solver.bson")
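
To make the belief representation concrete, the following is a minimal standalone sketch of the summary that BetaZero.input_representation computes above: the mean and standard deviation of the particles' y-values, packed into a Float32 vector. The particle values and the helper name belief_features are hypothetical and used only for illustration; the sketch uses Statistics from the standard library rather than the package's own types.

```julia
using Statistics

# Hypothetical stand-in for the y-values of the particles in a belief.
ys = [1.0, 2.0, 3.0]

# Summarize the particles by their mean and (sample) standard deviation,
# returned as a Float32 vector suitable as neural network input.
belief_features(ys) = Float32[mean(ys), std(ys)]

x = belief_features(ys)
```

A two-element summary like this keeps the network input small. Whether it is expressive enough depends on the problem, so richer belief features may be needed for other POMDPs.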

ConstrainedZero usage

To run ConstrainedZero, turn on the is_constrained flag in the BetaZeroSolver:

solver.is_constrained = true

NOTE: Also remove any failure penalties from the POMDP reward function, e.g.:

pomdp.incorrect_r = 0 # for LightDark CC-POMDP

Set the $\Delta_0$ chance constraint and the $\eta$ adaptive conformal inference learning rate via:

solver.mcts_solver.Δ0 = 0.01
solver.mcts_solver.η = 0.00001
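
For intuition about the role of η, here is a minimal sketch of the generic adaptive conformal inference update, in which a running threshold is nudged by a step of size η after each observed error indicator. This is not the BetaZero.jl implementation: the helper name aci_update, the clamping to [0, 1], and the error sequence are assumptions for illustration, and the package may use a different sign convention or different bounds.

```julia
# Generic adaptive conformal inference (ACI) step.
# Hypothetical helper for illustration; not part of BetaZero.jl.
#   Δ   : current threshold
#   Δ0  : target rate
#   η   : learning rate
#   err : 1 if the last prediction violated the threshold, otherwise 0
aci_update(Δ, Δ0, η, err) = clamp(Δ + η * (Δ0 - err), 0.0, 1.0)

Δ0, η = 0.01, 0.00001

# Made-up sequence of error indicators; start the threshold at the target Δ0.
errs = (0, 0, 1, 0)
Δ = foldl((Δ, err) -> aci_update(Δ, Δ0, η, err), errs; init=Δ0)
```

With a small η such as 0.00001, each step moves the threshold only slightly, so it stays close to Δ0 unless errors accumulate.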

ConstrainedZero criteria:

# ConstrainedZero: To use the CC-PUCT criterion, which is subject to the failure probability threshold, use:
solver.mcts_solver.final_criterion = SampleZQNS(τ=1, zq=1, zn=1)

# ConstrainedZero: Or for the argmax, use:
solver.mcts_solver.final_criterion = MaxZQNS(zq=1, zn=1)
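
As a rough illustration of what selecting an action "subject to the failure probability threshold" can mean, the sketch below picks the highest-scoring action among those whose estimated failure probability is at most Δ. The helper constrained_argmax, the fallback to the safest action when nothing is feasible, and all of the numbers are assumptions for illustration. This is not how MaxZQNS or SampleZQNS are implemented in BetaZero.jl.

```julia
# Hypothetical helper for illustration; not part of BetaZero.jl.
# scores     : per-action scores (higher is better)
# fail_probs : per-action estimated failure probabilities
# Δ          : failure probability threshold
function constrained_argmax(scores, fail_probs, Δ)
    feasible = findall(≤(Δ), fail_probs)           # indices of actions within the threshold
    isempty(feasible) && return argmin(fail_probs) # assumed fallback: safest action
    return feasible[argmax(scores[feasible])]      # best score among feasible actions
end

scores     = [1.0, 3.0, 2.0]       # made-up values
fail_probs = [0.005, 0.2, 0.008]   # made-up values

best = constrained_argmax(scores, fail_probs, 0.01)
```

Here the second action has the highest score but is excluded because its failure probability exceeds the threshold.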
