Markov Decision Processes

Home
Full List of Titles
1: Proceedings of CDC2000
Discrete Event Systems
Control in Communication Systems
Optimal Control and Applications I
Optimisation Approaches and Methods
Model Predictive Control
Advances in Linear Estimation
Stochastic and Uncertain Systems
Nonlinear Control and Applications
Nonlinear Estimation and Filtering
Formation Control and its Applications
New Approaches to Fuzzy Control
Manufacturing Systems
Automotive Applications
Stability Issues in Hybrid Control
Recent Advances in Stochastic Networks
Optimal Control and Applications II
Robust Controller Design - mu, L1 and H2
Constrained and Receding Horizon Control
Identification and Control around the World
Markov Decision Processes
Nonlinear Optimisation
Observers for Nonlinear Systems
Motion Planning
Neural / Fuzzy Stability and Control
Motor Control
Control of Quantum Phenomena I
Hybrid Systems Methods
Control in Communication Networks
Robustness and Optimisation
Bumpless Transfer, Antiwindup and Saturation
Adaptive Control: Linear Systems
Estimation and Closed Loop Identification
Control of Markov Processes
Nonlinear Filtering and Control
Modelling, Identification and Validation of Nonlinear Systems
Differential Geometric Control Theory for Mechanical Systems
Nonlinear Output Feedback Control
Pneumatics and Compression Systems
Control of Quantum Phenomena II
Stability of Hybrid Systems
Performance Analysis in Communication Networks
Adaptive Control of Nonlinear Systems
LMI Methods in Design
Robust Control of Time Delay Systems
Subspace Identification Methods
Nonlinear Stochastic Filtering and Estimation
Bifurcations, Chaos and Control I
New Progress in Synthesis of Nonlinear Systems I
Implementation Issues of Sliding Mode Control Theory
Control of Mixing in Shear Flows
Novel Neural Network Control Techniques for Industrial Motion Control Systems
Physiological Control Systems
Optimal Control of Hybrid Systems
Stochastic Models for Communication Networks
Control and Stabilisation of Nonlinear Systems
New Directions in Robust Control
Linear Systems Theory
Advanced Topics in Systems Theory
Estimation in Action
Bifurcations, Chaos and Control II
New Progress in Synthesis of Nonlinear Systems II
Numerical Design and Analysis Techniques for Nonlinear Systems
Analysis and Control of Underactuated Systems
Sliding Mode Control I
Challenges in the Application of Control to Computer Systems
Estimation and Diagnosis of Discrete Event Systems
Communications and Games
Optimal Control
Stochastic Systems
Model Reduction Methodologies
Identification and Subspace Methods
Applications of Nonlinear Adaptive Control
Advances in Nonlinear Output Feedback Design
The Behavioural Approach to Systems and Control
Vision Based Estimation and Control: Recent Advances and Open Problems
Agile Control of Military Operations
Sliding Mode Control II
Model-based Fault Diagnosis of Industrial Processes
Discrete Event Systems / Petri Nets
System Identification and Confidence Estimation
New Approaches to H-Infinity Control I
Probabilistic Approaches to Robust Control
Time Delay System Stabilisation
Identification Methods
Controlled Stochastic Processes
Output Feedback of Nonlinear Systems
Topics in Nonlinear Stabilisation
Mobile Robots: Tracking Control
Robust Control of Nonlinear Systems
Power Systems Stabilisation and Control
Disk Drive Control
Hybrid Control Applications
Discrete Time Systems
New Approaches to H-Infinity Control II
Linear Systems with Saturating Actuators
New Theories in Distributed Parameter Systems
Applications of Estimation and Identification
Stochastic Control and Tuning Methodologies
Control of Nonlinear Systems
Iterative Learning and Control
Coordinating Robot Systems
Nonlinear Time Varying Systems
Novel Applications of Neural Networks
Aerospace Applications
Switched Systems
Implicit and Descriptor Systems
LQG
Periodic Systems and Disturbances
New Horizons for Distributed Parameter Systems
State Estimation
Learning and Neuro-Control
Nonlinear Control and Stabilisation I
Tracking
Vision Servoing
Controllability of Nonlinear Systems
Control of Flexible Systems
Electro-Mechanical Systems
Robust Control Methods and Applications
Fault Detection and Diagnosis
Optimisation and Applications
Robust Stability Analysis
Numerical Methods in Control
Filtering in Continuous Time Stochastic Systems
Interplay between Control and Signal Processing
Fault Detection and Analysis
Nonlinear Dynamical Systems
Nonlinear Time Delay Systems
Computational Issues in Nonlinear Control
Disturbance Rejection
Process Control Industry Applications
Linear Parameter Varying Systems
Linear Control Systems
Dynamic and Nonlinear Programming
Model Reduction Applications
New Techniques for Control and Systems: Numerical Linear Algebra
Estimation and Identification using Hidden Markov Models
Applications of Stochastic Control
Topics in Linear Design
Nonlinear Control and Stabilisation II
Ambulatory Robot Systems
Chaotic and Oscillatory Systems
Biomedical System Control
Integrated Control and CPU Scheduling
Linear Design Techniques
Adaptive Disturbance / Noise Compensation
Nonlinear Model Predictive Control
Sensitivity Design, Analysis and Limitations
Analysis of Linear Systems
Linear Matrix Inequalities in Design
Lyapunov's 2nd Method
Robotics: Tracking Control
Lagrangian and Hamiltonian Theory
Variable Structure Control
Machine Vision
Signal Processing Methods in Control
Applied Nonlinear Control

Author Index
A B C D E F G H I
J K L M N O P Q R
S T U V W X Y Z

Markov Decision Processes with Constrained Stopping Times

Authors:

Masayuki Horiguchi, Masami Kurano, Masami Yasuda,

Volume: 1, Page 706 Paper number 1901

Abstract:

The optimization problem for a stopped Markov decision process is considered to be taken over stopping times (tau) constrained so that E (tau) <=q (alpha) for some fixed (alpha)>0. We introduce the concept of a randomized stationary stopping time which is a mixed extension of the entry time of a stopping region and prove the existence of an optimal constrained pair of stationary policy and stopping time by utilizing a Lagrange multiplier approach. Also, applying the idea of the one-step look ahead (OLA) policy the optimal constrained pair is sought concretely. As an example, constrained Markov deteriorating system is explained.

CD001901.PDF (From Author)

TOP


Decomposition and Parallel Processing Techniques for Two-Time Scale Controlled Markov Chains

Authors:

Jerzy A. Filar, Jacek Gondzio, Alain B. Haurie, Francesco Moresino, Jean-Philippe Vial,

Volume: 1, Page 711 Paper number 1902

Abstract:

This paper deals with a class of ergodic control problems for systems described by Markov chains with strong and weak interactions. These systems are composed of a set of m subchains that are weakly coupled. Using results recently established by Abbad et al. one formulates a limit control problem the solution of which can be obtained via an associated nondifferentiable convex programming (NDCP) problem. The technique used to solve the NDCP problem is the Analytic Center Cutting Plane Method (ACCPM) which implements a dialogue between, on one hand, a master program computing the analytical center of a localization set containing the solution and, on the other hand, an oracle proposing cutting planes that reduce the size of the localization set at each main iteration. The interesting aspect of this implementation comes from two characteristics: (i) the oracle proposes cutting planes by solving reduced sized Markov Decision Problems (MDP) via a linear program (LP) or a policy iteration method; (ii) several cutting planes can be proposed simultaneously through a parallel implementation on m processors. The paper concentrates on these two aspects and shows, on a large scale MDP obtained from the numerical approximation a la Kushner-Dupuis of a singularly perturbed hybrid stochastic control problem, the important computational speed-up obtained.

CD001902.PDF (From Author)

TOP


Adaptive Zero-Sum Stochastic Game for Two Finite Markov Chains

Authors:

Alexander S. Poznyak, Kaddour Najim,

Volume: 1, Page 717 Paper number 1903

Abstract:

A two finite Markov chains repeated zero-sum stochastic game with unknown transition matrices and payoffs is considered. The control objective is to obtain the equilibrium point based only on current measurements. The behavior of each players is modelled by a finite controlled Markov chain. A novel adaptive policy is developed based on Lagrange multipliers involved into ''learning through reinforcement'' procedure. A regularized Lagrange function and a new normalization procedure are introduced. The saddle-point of this function is shown to be unique. The convergence properties are proved and the order of almost sure convergence is estimated.

CD001903.PDF (From Author)

TOP


Nonatomic Total Rewards Markov Decision Processes with Multiple Criteria

Authors:

Eugene A. Feinberg, Aleksey B. Piunovskiy,

Volume: 1, Page 723 Paper number 1904

Abstract:

We consider a Markov decision process with an uncountable state space for which the vector performance functional has the form of expected total rewards. Under the single condition that initial distribution and transition probabilities are nonatomic, we prove that the performance space coincides with that generated by nonrandomized Markov policies.

CD001904.PDF (From Author)

TOP


Limiting Discounted-Cost Control Of Partially Observable Stochastic Systems

Authors:

Onésimo Hernández-Lerma, Rosario Romera,

Volume: 1, Page 729 Paper number 1905

Abstract:

This paper presents two main results on partially observable (PO) stochastic systems. In the first one, we consider a general PO system x_t+1=F(x_t, a_t, (xi)_t), y_t=G(x_t, (eta)_t) (t=0,1,...) on Borel spaces, with possibly unbounded cost-per- stage functions, and give conditions for the existence of (alpha)-discount optimal control policies (0<(alpha) <1). In the second result we specialize () to additive-noise systems x_t+1=F_n(x_t, a_t)+ (xi)_t, y_t=G_n(x_t)+ (eta)_t (t=0,1,...) in Euclidean spaces, with F_n(x, a) and G_n(x) converging pointwise to functions F_(infinity)(x, a) and G_(infinity)(x), respectively, and give conditions for the limiting PO model x_t+1=F_(infinity)(x_t, a_t)+ (xi)_t, y_t=G_(infinity)(x_t)+ (eta)_t to have an (alpha)-discount optimal policy.

CD001905.PDF (From Author)

TOP


The Averaging Principle For Perturbations Of Continuous Time Control Problems With Fast Controlled Jump Parameters

Authors:

Rachid El Azouzy, Eitan Altman, Vladimir Gaitsgory,

Volume: 1, Page 730 Paper number 1906

Abstract:

We consider a class of singularly perturbed zero-sum differential games with piecewise deterministic dynamics, where the changes from one structure (for the dynamics) to another are governed by a finite-state Markov process. Player 1 controls the continuous dynamics, whereas Player 2 controls the rate of transition for the finite-state Markov process; both have access to the states of both processes. Player 1 wishes to minimize a given quantity. For player 2, we consider two possible scenarios: one in which it wishes to minimize the same quantity (team framework), and one in which it wishes to maximize it (zero sum game). The transition rates of the Markov process are fast, of the order of 1/(epsilon). To solve the above problem, we use the dynamic programming approach. In particular, we study the asymptotic properties of the underlying system for sufficiently small epsilon. The viscosity solution method is employed to verify the convergence of the value function, which allows us to obtain the convergence in a general setting and helps us to characterize the structure of the limit system. We apply this to the special case of linear quadratic games with jump parameters, which allows us to obtain an explicit solution for the limiting problem.

CD001906.PDF (From Author)

TOP


On the Value of Learning for Bernoulli Bandits with Unknown Parameters

Authors:

Sandjai Bhulai, Ger Koole,

Volume: 1, Page 736 Paper number 1907

Abstract:

In this paper we investigate the multi-armed bandit problem, where each arm generates an infinite sequence of Bernoulli distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be Beta-distributed. Every time a bandit is selected its Beta-distribution is updated to new information in a Bayesian way. The objective is to maximize the long term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations which occur when a bandit has been played N times; the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between this lower and upper bound goes to zero as N grows large.

CD001907.PDF (From Author)

TOP