Title:
Mean Field for Markov Decision Processes: From Discrete to Continuous Optimization
Source:
IEEE Transactions on Automatic Control, 57(9):2266-2280
Publisher Information:
New York, NY: Institute of Electrical and Electronics Engineers, 2012.
Publication Year:
2012
Physical Description:
print, 25 references
Original Material:
INIST-CNRS
Subject Terms:
Control theory, operational research, exact sciences and technology, sciences and techniques of general use, mathematics, probability and statistics, probability theory and stochastic processes, Markov processes, applied sciences, operational research and management science, decision theory and utility theory, biological and medical sciences, fundamental and applied biological sciences, psychology, mathematics in biology, statistical analysis, models, metrology, data processing in biology (general aspects), medical sciences, public health, hygiene and occupational medicine, epidemiology, mean field approximation, biomathematics, optimal control, population dynamics, Markov decision, Bellman equation, Hamilton-Jacobi equation, differential equation, queue, investment, optimization, discrete programming, reward, epidemic model, Hamilton-Jacobi-Bellman (HJB), Markov decision processes, mean field
Document Type:
Journal Article
File Description:
text
Language:
English
Author Affiliations:
EPFL IC-LCA2, 1015 Lausanne, Switzerland
INRIA Grenoble-Rhône-Alpes and LIG, 38330 Montbonnot, France
ISSN:
0018-9286
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Biological sciences. Generalities. Modelling. Methods

Generalities in biological sciences

Mathematics

Operational research. Management

Public health. Hygiene-occupational medicine. Information processing
Accession Number:
edscal.26323720
Database:
PASCAL Archive

Further Information

We study the convergence of Markov decision processes, composed of a large number of objects, to optimization problems on ordinary differential equations. We show that the optimal reward of such a Markov decision process, which satisfies a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov decision process. We give bounds on the difference of the rewards, and an algorithm for deriving an approximating solution to the Markov decision process from a solution of the HJB equations. We illustrate the method on three examples pertaining, respectively, to investment strategies, population dynamics control, and scheduling in queues. They are used to illustrate and justify the construction of the controlled ODE and to show the advantage of solving a continuous HJB equation rather than a large discrete Bellman equation.
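The core idea behind the controlled ODE construction can be sketched numerically. The following is a minimal, illustrative example, not the paper's exact model: an SIS-type epidemic system of N agents, each susceptible or infected, under a fixed control u (treatment intensity). The parameters `beta`, `gamma`, `u`, and the dynamics are assumptions chosen for this sketch; it shows only the mean field principle that, as N grows, the fraction of infected agents tracks the solution of a deterministic controlled ODE.

```python
import random

# Illustrative mean-field sketch (not the paper's model): SIS-type epidemic
# with N agents, each susceptible (0) or infected (1), under a fixed control
# u (treatment intensity). As N grows, the infected fraction m_N(t)
# concentrates around the solution of the controlled ODE
#     dm/dt = beta * m * (1 - m) - u * gamma * m
# All parameter names and values here are assumptions for the sketch.

beta, gamma, u = 0.8, 0.5, 1.0      # infection rate, recovery rate, control
N, m0, dt, T = 1000, 0.1, 0.01, 10.0
steps = int(T / dt)

rnd = random.Random(0)

def binomial(n, p):
    """Number of successes in n Bernoulli(p) trials (pure-stdlib helper)."""
    return sum(1 for _ in range(n) if rnd.random() < p)

# Stochastic N-agent system: only the count of infected agents matters.
infected = int(m0 * N)
for _ in range(steps):
    susceptible = N - infected
    new_inf = binomial(susceptible, beta * (infected / N) * dt)
    new_rec = binomial(infected, u * gamma * dt)
    infected += new_inf - new_rec

m_sim = infected / N

# Deterministic mean-field limit: Euler integration of the controlled ODE.
m = m0
for _ in range(steps):
    m += dt * (beta * m * (1.0 - m) - u * gamma * m)
m_ode = m

print(f"stochastic N={N}: m(T)={m_sim:.3f}   mean-field ODE: m(T)={m_ode:.3f}")
```

Running it with a larger N shrinks the gap between the two final values, which is the convergence phenomenon the paper quantifies; optimizing the control over the ODE (via the HJB equation) then replaces value iteration over the N+1 discrete states.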