CDC2000 Markov Decision Processes

Markov Decision Processes
Markov Decision Processes with Constrained Stopping Times
Authors: Masayuki Horiguchi, Masami Kurano, Masami Yasuda
Volume: 1, Page 706
Paper number 1901
Abstract: The optimization problem for a stopped Markov decision process is considered to be taken over stopping times (tau) constrained so that E[(tau)] <= (alpha) for some fixed (alpha) > 0.
We introduce the concept of a randomized stationary stopping time, which is a mixed extension of the entry time of a stopping region, and prove the existence of an optimal constrained pair of stationary policy and stopping time by utilizing a Lagrange multiplier approach. Also, applying the idea of the one-step look ahead (OLA) policy, the optimal constrained pair is sought concretely. As an example, a constrained Markov deteriorating system is explained.
CD001901.PDF (From Author)

Decomposition and Parallel Processing Techniques for Two-Time Scale Controlled Markov Chains
Authors: Jerzy A. Filar, Jacek Gondzio, Alain B. Haurie, Francesco Moresino, Jean-Philippe Vial
Volume: 1, Page 711
Paper number 1902
Abstract: This paper deals with a class of ergodic control problems for systems described by Markov chains with strong and weak interactions. These systems are composed of a set of m subchains that are weakly coupled. Using results recently established by Abbad et al., one formulates a limit control problem, the solution of which can be obtained via an associated nondifferentiable convex programming (NDCP) problem. The technique used to solve the NDCP problem is the Analytic Center Cutting Plane Method (ACCPM), which implements a dialogue between, on one hand, a master program computing the analytic center of a localization set containing the solution and, on the other hand, an oracle proposing cutting planes that reduce the size of the localization set at each main iteration. The interesting aspect of this implementation comes from two characteristics: (i) the oracle proposes cutting planes by solving reduced-size Markov Decision Problems (MDP) via a linear program (LP) or a policy iteration method; (ii) several cutting planes can be proposed simultaneously through a parallel implementation on m processors.
The paper concentrates on these two aspects and shows, on a large scale MDP obtained from the numerical approximation à la Kushner-Dupuis of a singularly perturbed hybrid stochastic control problem, the important computational speed-up obtained.
CD001902.PDF (From Author)

Adaptive Zero-Sum Stochastic Game for Two Finite Markov Chains
Authors: Alexander S. Poznyak, Kaddour Najim
Volume: 1, Page 717
Paper number 1903
Abstract: A repeated zero-sum stochastic game for two finite Markov chains, with unknown transition matrices and payoffs, is considered. The control objective is to obtain the equilibrium point based only on current measurements. The behavior of each player is modelled by a finite controlled Markov chain. A novel adaptive policy is developed, based on Lagrange multipliers incorporated into a "learning through reinforcement" procedure. A regularized Lagrange function and a new normalization procedure are introduced. The saddle-point of this function is shown to be unique. The convergence properties are proved and the order of almost sure convergence is estimated.
CD001903.PDF (From Author)

Nonatomic Total Rewards Markov Decision Processes with Multiple Criteria
Authors: Eugene A. Feinberg, Aleksey B. Piunovskiy
Volume: 1, Page 723
Paper number 1904
Abstract: We consider a Markov decision process with an uncountable state space for which the vector performance functional has the form of expected total rewards. Under the single condition that the initial distribution and transition probabilities are nonatomic, we prove that the performance space coincides with that generated by nonrandomized Markov policies.
CD001904.PDF (From Author)

Limiting Discounted-Cost Control Of Partially Observable Stochastic Systems
Authors: Onésimo Hernández-Lerma, Rosario Romera
Volume: 1, Page 729
Paper number 1905
Abstract: This paper presents two main results on partially observable (PO) stochastic systems.
In the first one, we consider a general PO system x_(t+1) = F(x_t, a_t, (xi)_t), y_t = G(x_t, (eta)_t) (t = 0, 1, ...) on Borel spaces, with possibly unbounded cost-per-stage functions, and give conditions for the existence of (alpha)-discount optimal control policies (0 < (alpha) < 1). In the second result we specialize this system to additive-noise systems x_(t+1) = F_n(x_t, a_t) + (xi)_t, y_t = G_n(x_t) + (eta)_t (t = 0, 1, ...) in Euclidean spaces, with F_n(x, a) and G_n(x) converging pointwise to functions F_(infinity)(x, a) and G_(infinity)(x), respectively, and give conditions for the limiting PO model x_(t+1) = F_(infinity)(x_t, a_t) + (xi)_t, y_t = G_(infinity)(x_t) + (eta)_t to have an (alpha)-discount optimal policy.
CD001905.PDF (From Author)

The Averaging Principle For Perturbations Of Continuous Time Control Problems With Fast Controlled Jump Parameters
Authors: Rachid El Azouzy, Eitan Altman, Vladimir Gaitsgory
Volume: 1, Page 730
Paper number 1906
Abstract: We consider a class of singularly perturbed zero-sum differential games with piecewise deterministic dynamics, where the changes from one structure (for the dynamics) to another are governed by a finite-state Markov process. Player 1 controls the continuous dynamics, whereas Player 2 controls the rate of transition for the finite-state Markov process; both have access to the states of both processes. Player 1 wishes to minimize a given quantity. For Player 2, we consider two possible scenarios: one in which it wishes to minimize the same quantity (team framework), and one in which it wishes to maximize it (zero-sum game). The transition rates of the Markov process are fast, of the order of 1/(epsilon). To solve the above problem, we use the dynamic programming approach. In particular, we study the asymptotic properties of the underlying system for sufficiently small (epsilon).
The viscosity solution method is employed to verify the convergence of the value function, which allows us to obtain the convergence in a general setting and helps us to characterize the structure of the limit system. We apply this to the special case of linear quadratic games with jump parameters, which allows us to obtain an explicit solution for the limiting problem.
CD001906.PDF (From Author)

On the Value of Learning for Bernoulli Bandits with Unknown Parameters
Authors: Sandjai Bhulai, Ger Koole
Volume: 1, Page 736
Paper number 1907
Abstract: In this paper we investigate the multi-armed bandit problem, where each arm generates an infinite sequence of Bernoulli distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be Beta-distributed. Every time a bandit is selected, its Beta distribution is updated with the new information in a Bayesian way. The objective is to maximize the long-term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations which occur when a bandit has been played N times: the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between this lower and upper bound goes to zero as N grows large.
CD001907.PDF (From Author)
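
Note on paper 1907: the Bayesian update of a Beta-distributed belief after each Bernoulli reward, as described in that abstract, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the class name BetaBelief, the uniform Beta(1, 1) starting prior, and the sample reward sequence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about one arm's unknown success probability."""

    a: float = 1.0  # assumption: uniform Beta(1, 1) prior
    b: float = 1.0

    def update(self, reward: int) -> "BetaBelief":
        """Return the posterior after observing one Bernoulli reward (0 or 1)."""
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        # Conjugacy: a success increments a, a failure increments b.
        return BetaBelief(self.a + reward, self.b + (1 - reward))

    def mean(self) -> float:
        """Posterior mean of the success probability."""
        return self.a / (self.a + self.b)


# Hypothetical reward sequence from playing one arm four times.
belief = BetaBelief()
for r in [1, 0, 1, 1]:
    belief = belief.update(r)
```

With three successes and one failure, the belief moves from Beta(1, 1) to Beta(4, 2), so the posterior mean is 4/6. The sketch covers only the belief update; it does not compute the discounted value or the bounds studied in the paper.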