
## What is belief in a POMDP?

In a partially observable Markov decision process (POMDP), the agent cannot observe the state directly, so it maintains a belief about which state it is in. This belief is called a belief state and is expressed as a probability distribution over the states. The solution of the POMDP is a policy prescribing which action is optimal for each belief state. The POMDP framework is general enough to model a variety of real-world sequential decision-making problems.
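As an illustration of how a belief state can be maintained, the sketch below applies the standard Bayes-filter update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The function name `update_belief`, the two-state model, the `"listen"` action, and all probabilities are hypothetical values chosen for the example, not something taken from the text above.

```python
def update_belief(belief, action, observation, T, O):
    """Return the new belief after taking `action` and seeing `observation`.

    T[action][s][s_next] is the transition probability P(s_next | s, action).
    O[action][s_next][o] is the observation probability P(o | s_next, action).
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of landing in s_next before observing.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical model: two hidden states and one action that leaves the state
# unchanged but yields a noisy reading that is correct 85% of the time.
T = {"listen": [[1.0, 0.0], [0.0, 1.0]]}
O = {"listen": [[0.85, 0.15], [0.15, 0.85]]}

belief = [0.5, 0.5]  # start with no information about the state
belief = update_belief(belief, "listen", 0, T, O)
print(belief)
```

Starting from a uniform belief, a single reading that points to state 0 shifts the belief toward that state; further readings would sharpen or reverse it in the same way.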

## What are AI rewards?

A reward function assigns a numerical reward to the actions an agent takes. Reward functions are used in reinforcement learning models, where they let an AI system arrive at decisions rather than only produce a prediction. Reward function engineering is the task of determining the rewards for actions.

## What is the relation in POMDP for policy π?

The value function $V^{\pi}_t(s)$ for a given policy π is the expected sum of rewards gained by starting in state s and executing the non-stationary policy π for t steps.

Relation: the value function evaluates a policy by the long-run value that the agent expects to gain from executing it.
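The t-step value of a non-stationary policy can be computed by working backward from zero steps to go. The sketch below assumes a fully observed model with known transition probabilities and rewards, and an undiscounted sum; the name `evaluate_policy`, the two-state model, and the policy are hypothetical choices for illustration.

```python
def evaluate_policy(policy, T, R, horizon):
    """Expected undiscounted sum of rewards over `horizon` steps.

    policy[k][s] is the action taken in state s with k steps to go.
    T[s][a][s_next] is P(s_next | s, a); R[s][a] is the immediate reward.
    """
    n = len(R)
    values = [0.0] * n  # with zero steps to go, no reward can be collected
    for steps_to_go in range(1, horizon + 1):
        action_for = policy[steps_to_go]
        # The comprehension reads the previous `values` before it is rebound.
        values = [
            R[s][action_for[s]]
            + sum(T[s][action_for[s]][s2] * values[s2] for s2 in range(n))
            for s in range(n)
        ]
    return values


# Hypothetical model: action 0 moves to state 0, action 1 moves to state 1.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 1
]
R = [[0.0, 1.0], [0.0, 2.0]]  # R[s][a]

# Non-stationary: the action chosen in state 0 depends on the steps remaining.
policy = {1: [1, 1], 2: [0, 1]}
values = evaluate_policy(policy, T, R, horizon=2)
print(values)  # [1.0, 4.0]
```

Because the policy is indexed by the number of steps remaining, the same state can map to different actions at different times, which is what makes it non-stationary.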

## What are the preliminaries of the POMDP tutorial?

The preliminaries cover the problem definition: the agent model, POMDPs, and Bayesian RL.

[Figure: agent model. Labels: World (transition dynamics), Actor (belief b, policy π), Action, Observation.]

A Markov decision process is specified by:

- X: the set of states [x_s, x_r], with a state component x_s and a reward component x_r
- A: the set of actions
- T = P(x'|x, a): the transition and reward probabilities
- O: the observation function

## How do you find a policy that maximizes total reward over some duration?

The goal is to find a policy that maximizes the total reward over some duration. Two notions are central:

- Value function: a measure of how good it is to be in a belief state.
- Policy: a function that describes how to select an action in each belief state.

In RL, the environment is modeled as a Markov decision process (MDP).

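## Which MDP model maximizes the expected future reward?

In the MDP model, the task is to find a policy π: s ∈ S → a ∈ A(s), which may be stochastic, that maximizes the value (expected future reward) of each state s. In standard notation, with discount factor γ ∈ [0, 1) and reward r_t at step t:

$$V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s, \pi\right]$$

and of each state-action pair (s, a):

$$Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s, a_{0} = a, \pi\right]$$

The objective is to maximize the long-term total discounted reward.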
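One common way to find such a policy when the transition probabilities and rewards are known is value iteration. The text above does not name a solution method, so this is a minimal sketch under that assumption; the function name `value_iteration`, the two-state model, and the discount factor 0.9 are hypothetical.

```python
def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a small MDP using value iteration.

    T[s][a][s_next] is P(s_next | s, a); R[s][a] is the immediate reward.
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    while True:
        # Q[s][a]: reward now plus discounted value of the next state.
        Q = [
            [
                R[s][a]
                + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        new_V = [max(q_row) for q_row in Q]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy: pick the action with the highest Q-value in each state.
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Hypothetical model: action 0 moves to state 0, action 1 moves to state 1.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 1
]
R = [[0.0, 1.0], [0.0, 2.0]]  # R[s][a]

V, policy = value_iteration(T, R, gamma=0.9)
print(policy)  # [1, 1]
```

In this toy model, moving to and staying in state 1 earns a reward of 2 per step, so the discounted value of state 1 approaches 2 / (1 - 0.9) = 20 and the value of state 0 approaches 1 + 0.9 × 20 = 19.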