*Under normal circumstances, an order arrives in Taiwan about two weeks after it is completed.
*The information in this listing is for reference only; the item actually delivered is authoritative.
Print date: 2024-07. *If more than two years have passed since the print date, please message us first to check stock availability. Thank you.
We are a local publisher based in Taipei, Taiwan. A uniform invoice (統一發票) is issued for every transaction; we wish you luck winning the invoice lottery's top prize of NT$10,000,000.

Title: 強化學習的數學原理 (English edition)
ISBN: 9787302658528
Publisher: Tsinghua University Press (清華大學出版社)
Author: 趙世鈺 (Shiyu Zhao)
Pages: 301
Ships from: Mainland China
*This is a proxy-purchase item.
Item no.: 1674977
Bulk pre-orders are welcome; please contact us first.

Description
This book starts from the most basic concepts of reinforcement learning and introduces the fundamental analysis tools, including the Bellman equation and the Bellman optimality equation. It then proceeds to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning algorithms based on function approximation. The emphasis is on introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective, rather than on the programming implementation of the algorithms. No prior knowledge of reinforcement learning is required; readers only need some background in probability theory and linear algebra. Readers who already have a foundation in reinforcement learning will find that the book deepens their understanding of certain problems and offers new perspectives. The book is intended for undergraduates, graduate students, researchers, and practitioners in industry or research institutes who are interested in reinforcement learning.
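The description names the Bellman equation and the Bellman optimality equation as the book's core analysis tools. For reference, their standard elementwise forms in conventional MDP notation (our rendering; the book's own notation may differ slightly):

% Bellman equation: the state value of a given policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \Big[ \sum_{r} p(r \mid s, a)\, r + \gamma \sum_{s'} p(s' \mid s, a)\, v_\pi(s') \Big]

% Bellman optimality equation: the fixed policy is replaced by a maximization
v_*(s) = \max_{a} \Big[ \sum_{r} p(r \mid s, a)\, r + \gamma \sum_{s'} p(s' \mid s, a)\, v_*(s') \Big]

Here \pi(a \mid s) is the policy, p gives the reward and state-transition probabilities of the model, \gamma \in [0, 1) is the discount rate, and v_\pi (resp. v_*) is the state value of policy \pi (resp. the optimal state value).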
About the Author
趙世鈺 (Shiyu Zhao) is a Distinguished Researcher in the AI branch of the School of Engineering at Westlake University, head of its Intelligent Unmanned Systems Laboratory, and a recipient of the Young Talent category of the national program for recruiting high-level overseas talent. He received his bachelor's and master's degrees from Beihang University and his Ph.D. from the National University of Singapore, and was previously a Lecturer in the Department of Automatic Control and Systems Engineering at the University of Sheffield, UK. His work aims at interesting, useful, and challenging next-generation robotic systems, with a focus on control, decision-making, and perception in multi-robot systems.

Table of Contents
Overview of this Book
Chapter 1 Basic Concepts
1.1 A grid world example
1.2 State and action
1.3 State transition
1.4 Policy
1.5 Reward
1.6 Trajectories, returns, and episodes
1.7 Markov decision processes
1.8 Summary
1.9 Q&A
Chapter 2 State Values and the Bellman Equation
2.1 Motivating example 1: Why are returns important?
2.2 Motivating example 2: How to calculate returns?
2.3 State values
2.4 The Bellman equation
2.5 Examples for illustrating the Bellman equation
2.6 Matrix-vector form of the Bellman equation
2.7 Solving state values from the Bellman equation
2.7.1 Closed-form solution
2.7.2 Iterative solution
2.7.3 Illustrative examples
2.8 From state value to action value
2.8.1 Illustrative examples
2.8.2 The Bellman equation in terms of action values
2.9 Summary
2.10 Q&A
Chapter 3 Optimal State Values and the Bellman Optimality Equation
3.1 Motivating example: How to improve policies?
3.2 Optimal state values and optimal policies
3.3 The Bellman optimality equation
3.3.1 Maximization of the right-hand side of the BOE
3.3.2 Matrix-vector form of the BOE
3.3.3 Contraction mapping theorem
3.3.4 Contraction property of the right-hand side of the BOE
3.4 Solving an optimal policy from the BOE
3.5 Factors that influence optimal policies
3.6 Summary
3.7 Q&A
Chapter 4 Value Iteration and Policy Iteration
4.1 Value iteration
4.1.1 Elementwise form and implementation
4.1.2 Illustrative examples
4.2 Policy iteration
4.2.1 Algorithm analysis
4.2.2 Elementwise form and implementation
4.2.3 Illustrative examples
4.3 Truncated policy iteration
4.3.1 Comparing value iteration and policy iteration
4.3.2 Truncated policy iteration algorithm
4.4 Summary
4.5 Q&A
Chapter 5 Monte Carlo Methods
5.1 Motivating example: Mean estimation
5.2 MC Basic: The simplest MC-based algorithm
5.2.1 Converting policy iteration to be model-free
5.2.2 The MC Basic algorithm
5.2.3 Illustrative examples
5.3 MC Exploring Starts
5.3.1 Utilizing samples more efficiently
5.3.2 Updating policies more efficiently
5.3.3 Algorithm description
5.4 MC ε-Greedy: Learning without exploring starts
5.4.1 ε-greedy policies
5.4.2 Algorithm description
5.4.3 Illustrative examples
5.5 Exploration and exploitation of ε-greedy policies
5.6 Summary
5.7 Q&A
Chapter 6 Stochastic Approximation
6.1 Motivating example: Mean estimation
6.2 Robbins-Monro algorithm
6.2.1 Convergence properties
6.2.2 Application to mean estimation
6.3 Dvoretzky's convergence theorem
6.3.1 Proof of Dvoretzky's theorem
6.3.2 Application to mean estimation
6.3.3 Application to the Robbins-Monro theorem
6.3.4 An extension of Dvoretzky's theorem
6.4 Stochastic gradient descent
6.4.1 Application to mean estimation
6.4.2 Convergence pattern of SGD
6.4.3 A deterministic formulation of SGD
6.4.4 BGD, SGD, and mini-batch GD
6.4.5 Convergence of SGD
6.5 Summary
6.6 Q&A
Chapter 7 Temporal-Difference Methods
7.1 TD learning of state values
7.1.1 Algorithm description
7.1.2 Property analysis
7.1.3 Convergence analysis
7.2 TD learning of action values: Sarsa
7.2.1 Algorithm description
7.2.2 Optimal policy learning via Sarsa
7.3 TD learning of action values: n-step Sarsa
7.4 TD learning of optimal action values: Q-learning
7.4.1 Algorithm description
7.4.2 Off-policy vs. on-policy
7.4.3 Implementation
7.4.4 Illustrative examples
7.5 A unified viewpoint
7.6 Summary
7.7 Q&A
Chapter 8 Value Function Approximation
8.1 Value representation: From table to function
8.2 TD learning of state values with function approximation
8.2.1 O…

For detailed information or other titles, please check with 台灣高等教育出版社; then message us on PChome商店街 with the ISBN or item number, and we will list the book as soon as possible.