Mathematical Foundations of Reinforcement Learning (English Edition), Zhao Shiyu, ISBN 9787302658528 [Taiwan Higher Education Press (台灣高等教育出版社)]

Name: Mathematical Foundations of Reinforcement Learning (English Edition), Zhao Shiyu, ISBN 9787302658528 [Taiwan Higher Education Press (台灣高等教育出版社)]
Brand: abooksthep
Price: 750.0 TWD
Availability: InStock

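All books are purchased on order from overseas. Under normal circumstances, they arrive in Taiwan about two weeks after the order is placed.
Item location: Mainland China
Original publisher: Tsinghua University Press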

Product Description

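*After the order is completed, delivery to Taiwan normally takes about two weeks.
*Information provided by this store is for reference only; the item as delivered is the authoritative source.
Publication date: July 2024
*If the publication date is more than two years past, please message us first to confirm stock. Thank you.
We are a local publisher in Taiwan (Taipei City). A uniform invoice is issued for every transaction; we wish you luck in the invoice lottery, with a top prize of NT$10 million.
Title: Mathematical Foundations of Reinforcement Learning (English Edition)
ISBN: 9787302658528
Publisher: Tsinghua University Press
Author/Editor/Translator: Zhao Shiyu
Pages: 301
Location: Mainland China *This item is purchased on order
Book No.: 1674977
Bulk pre-orders are available; please contact us first.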

Synopsis

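Starting from the most basic concepts of reinforcement learning, this book introduces the fundamental analytical tools, including the Bellman equation and the Bellman optimality equation, then extends to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning methods based on function approximation. The book emphasizes introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective; it does not emphasize programming implementations of the algorithms. It assumes no prior background in reinforcement learning and requires only some knowledge of probability theory and linear algebra. For readers who already have a foundation in reinforcement learning, the book can help them understand certain issues more deeply and offers new perspectives. It is intended for undergraduates, graduate students, researchers, and practitioners in companies or research institutes who are interested in reinforcement learning.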

About the Author

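Zhao Shiyu is a Distinguished Research Fellow in the AI division of the School of Engineering at Westlake University and head of the Intelligent Unmanned Systems Laboratory. Zhao is a recipient of the youth program of the national plan for recruiting high-level overseas talent, holds bachelor's and master's degrees from Beihang University (Beijing University of Aeronautics and Astronautics) and a Ph.D. from the National University of Singapore, and previously served as a Lecturer in the Department of Automatic Control and Systems Engineering at the University of Sheffield, UK. Zhao works on developing interesting, useful, and challenging next-generation robotic systems, with a focus on control, decision-making, and perception in multi-robot systems.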

Overview of this Book
Chapter 1 Basic Concepts
1.1 A grid world example
1.2 State and action
1.3 State transition
1.4 Policy
1.5 Reward
1.6 Trajectories, returns, and episodes
1.7 Markov decision processes
1.8 Summary
1.9 Q&A
Chapter 2 State Values and the Bellman Equation
2.1 Motivating example 1: Why are returns important?
2.2 Motivating example 2: How to calculate returns?
2.3 State values
2.4 The Bellman equation
2.5 Examples for illustrating the Bellman equation
2.6 Matrix-vector form of the Bellman equation
2.7 Solving state values from the Bellman equation
2.7.1 Closed-form solution
2.7.2 Iterative solution
2.7.3 Illustrative examples
2.8 From state value to action value
2.8.1 Illustrative examples
2.8.2 The Bellman equation in terms of action values
2.9 Summary
2.10 Q&A
Chapter 3 Optimal State Values and the Bellman Optimality Equation
3.1 Motivating example: How to improve policies?
3.2 Optimal state values and optimal policies
3.3 The Bellman optimality equation
3.3.1 Maximization of the right-hand side of the BOE
3.3.2 Matrix-vector form of the BOE
3.3.3 Contraction mapping theorem
3.3.4 Contraction property of the right-hand side of the BOE
3.4 Solving an optimal policy from the BOE
3.5 Factors that influence optimal policies
3.6 Summary
3.7 Q&A
Chapter 4 Value Iteration and Policy Iteration
4.1 Value iteration
4.1.1 Elementwise form and implementation
4.1.2 Illustrative examples
4.2 Policy iteration
4.2.1 Algorithm analysis
4.2.2 Elementwise form and implementation
4.2.3 Illustrative examples
4.3 Truncated policy iteration
4.3.1 Comparing value iteration and policy iteration
4.3.2 Truncated policy iteration algorithm
4.4 Summary
4.5 Q&A
Chapter 5 Monte Carlo Methods
5.1 Motivating example: Mean estimation
5.2 MC Basic: The simplest MC-based algorithm
5.2.1 Converting policy iteration to be model-free
5.2.2 The MC Basic algorithm
5.2.3 Illustrative examples
5.3 MC Exploring Starts
5.3.1 Utilizing samples more efficiently
5.3.2 Updating policies more efficiently
5.3.3 Algorithm description
5.4 MC ε-Greedy: Learning without exploring starts
5.4.1 ε-greedy policies
5.4.2 Algorithm description
5.4.3 Illustrative examples
5.5 Exploration and exploitation of ε-greedy policies
5.6 Summary
5.7 Q&A
Chapter 6 Stochastic Approximation
6.1 Motivating example: Mean estimation
6.2 Robbins-Monro algorithm
6.2.1 Convergence properties
6.2.2 Application to mean estimation
6.3 Dvoretzky's convergence theorem
6.3.1 Proof of Dvoretzky's theorem
6.3.2 Application to mean estimation
6.3.3 Application to the Robbins-Monro theorem
6.3.4 An extension of Dvoretzky's theorem
6.4 Stochastic gradient descent
6.4.1 Application to mean estimation
6.4.2 Convergence pattern of SGD
6.4.3 A deterministic formulation of SGD
6.4.4 BGD, SGD, and mini-batch GD
6.4.5 Convergence of SGD
6.5 Summary
6.6 Q&A
Chapter 7 Temporal-Difference Methods
7.1 TD learning of state values
7.1.1 Algorithm description
7.1.2 Property analysis
7.1.3 Convergence analysis
7.2 TD learning of action values: Sarsa
7.2.1 Algorithm description
7.2.2 Optimal policy learning via Sarsa
7.3 TD learning of action values: n-step Sarsa
7.4 TD learning of optimal action values: Q-learning
7.4.1 Algorithm description
7.4.2 Off-policy vs on-policy
7.4.3 Implementation
7.4.4 Illustrative examples
7.5 A unified viewpoint
7.6 Summary
7.7 Q&A
Chapter 8 Value Function Approximation
8.1 Value representation: From table to function
8.2 TD learning of state values with function approximation
8.2.1 O
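For full details or other titles, please search the Taiwan Higher Education Press (台灣高等教育出版社) catalog, then send us the ISBN or book number by private message on PChome Store Street (PChome商店街), and we will list the item as soon as possible.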
