Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes
Quasi-open-loop policies consist of sequences of Markovian decision rules that are insensitive to one component of the state space. We consider a semi-Markov decision process (SMDP) in which one state component is independent of all other state components, the decision-maker's actions, and the times between decision epochs. When this exogenous component is a multiplicative compound Poisson process, we provide an almost-everywhere multiplicative separability condition on the reward function sufficient for the optimality of a quasi-open-loop policy. Depending on the relationship between the structure of the exogenous state process and the shape of the reward function, we can replace the almost-everywhere condition with one that holds only in expectation. These results hold even if the times between decision epochs depend on the decision-maker's actions, the endogenous state process, and the Poisson process underlying the exogenous state component.
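To make the key structural assumptions concrete, the following is a minimal sketch under assumed notation (the symbols $r$, $f$, $g$, $s$, $x$, $a$, $X_t$, $N_t$, and $Z_i$ are illustrative choices, not necessarily the paper's):

% Illustrative sketch, assuming the state is a pair (s, x) with x the
% exogenous component and a the action. A multiplicative separability
% condition on the reward, holding almost everywhere, might read:
\[
  r(s, x, a) = f(x)\, g(s, a) \qquad \text{for almost every } (s, x, a).
\]
% A multiplicative compound Poisson process for the exogenous component,
% with N_t a Poisson process and Z_1, Z_2, \dots i.i.d. positive jump
% factors, can be written as
\[
  X_t = X_0 \prod_{i=1}^{N_t} Z_i .
\]

Intuitively, under such a factorization the exogenous factor $f(x)$ scales values multiplicatively and does not change which action attains the maximum, which is why a policy insensitive to $x$ can remain optimal.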