https://wiki.dong-min.kim/w/index.php?title=%EB%B2%A8%EB%A7%8C_%EB%B0%A9%EC%A0%95%EC%8B%9D&feed=atom&action=history
Bellman equation - Revision history
2024-03-29T10:28:18Z
Revision history for this page
MediaWiki 1.37.0-beta
https://wiki.dong-min.kim/w/index.php?title=%EB%B2%A8%EB%A7%8C_%EB%B0%A9%EC%A0%95%EC%8B%9D&diff=20&oldid=prev
Edit by Kim135797531 at 08:06, 29 October 2019 (Tue)
2019-10-29T08:06:57Z
<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="ko">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 08:06, 29 October 2019 (Tue)</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">=</del>==Bellman equation<del style="font-weight: bold; text-decoration: none;">=</del>==</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>==Bellman equation==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Isn't it about time I memorized this one too?</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Isn't it about time I memorized this one too?</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>====Background====</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>====Background====</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l11">Line 11:</td>
<td colspan="2" class="diff-lineno">Line 11:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* If we assume a deterministic policy, the inner expectation over a' disappears, which is what allows Q to be learned off-policy.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* If we assume a deterministic policy, the inner expectation over a' disappears, which is what allows Q to be learned off-policy.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Q(s, a) = E[r(s, a) + γQ(s', μ(s'))], where μ is the deterministic policy</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Q(s, a) = E[r(s, a) + γQ(s', μ(s'))], where μ is the deterministic policy</div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">==DDPG==</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* Actor-critic, off-policy</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* Create a Q-function (critic) and a policy function (actor)</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* Create a target Q-function and a target policy function as copies of the Q-function and the policy function</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* Training</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">** Q-function (critic) loss</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** Compute y = r + γQ'(s', μ'(s')) (the reward plus the value computed by the target Q-function Q', with the next action chosen by the target policy μ')</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** Compute Q(s, a)</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** Loss = mean of (y - Q(s, a))² over the sampled transitions</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** In other words, Q(s, a) should equal the reward plus the discounted value of the next state</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** This does not require the current state to already have the highest value; Q only has to be consistent with the one-step target</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** If s' is terminal (the goal has already been reached), the next state has no value and y = r</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** Optimize the critic by minimizing this loss</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">** Policy (actor) update</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">*** Update the policy in the direction that increases Q(s, μ(s)) (gradient ascent, i.e. minimize -Q(s, μ(s)))</ins></div></td></tr>
</table>
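The DDPG training steps listed above can be sketched in plain Python with scalar states and actions, which keeps the arithmetic visible. This is a sketch of the update rules only, not a DDPG implementation: real implementations use neural networks for the critic and actor and rely on automatic differentiation. The toy critic Q(s, a) = -(a - 2s)², the linear actor μ(s) = θs, the `dones` flag marking terminal transitions, γ = 0.9, the learning rate and all function names are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical toy setting: scalar state s and scalar action a.
QFn = Callable[[float, float], float]   # critic Q(s, a)
PolicyFn = Callable[[float], float]     # deterministic actor mu(s)


def critic_targets(rewards: Sequence[float], next_states: Sequence[float],
                   dones: Sequence[float], target_q: QFn, target_mu: PolicyFn,
                   gamma: float) -> List[float]:
    """y = r + gamma * Q'(s', mu'(s')); a terminal transition (done = 1) gives y = r."""
    return [r + gamma * (1.0 - d) * target_q(s2, target_mu(s2))
            for r, s2, d in zip(rewards, next_states, dones)]


def critic_loss(states: Sequence[float], actions: Sequence[float],
                targets: Sequence[float], q: QFn) -> float:
    """Mean squared error between Q(s, a) and the targets y (treated as constants)."""
    errors = [(y - q(s, a)) ** 2 for s, a, y in zip(states, actions, targets)]
    return sum(errors) / len(errors)


def actor_step(theta: float, states: Sequence[float],
               dq_da: Callable[[float, float], float], lr: float) -> float:
    """One gradient-ascent step for the hypothetical linear actor mu(s) = theta * s.

    Chain rule: dQ(s, mu(s))/dtheta = (dQ/da at a = mu(s)) * (dmu/dtheta = s).
    """
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    return theta + lr * grad  # "+": move in the direction that increases Q


def toy_q(s: float, a: float) -> float:
    # Hypothetical critic whose best action is a = 2s.
    return -(a - 2.0 * s) ** 2


def toy_dq_da(s: float, a: float) -> float:
    # Derivative of toy_q with respect to the action a.
    return -2.0 * (a - 2.0 * s)


ys = critic_targets([1.0, 0.0], [0.5, 1.0], [0.0, 1.0],
                    target_q=toy_q, target_mu=lambda s: 0.0, gamma=0.9)
print(ys)                                              # approximately [0.1, 0.0]
print(critic_loss([0.0, 1.0], [0.0, 2.0], ys, toy_q))  # approximately 0.005

theta = 0.0
for _ in range(100):
    theta = actor_step(theta, [1.0, -1.0, 0.5], toy_dq_da, lr=0.1)
print(theta)                                           # approaches 2.0
```

The targets are computed by separate functions (standing in for the target networks Q' and μ'), so they act as fixed labels for the critic loss, while the actor step adds the gradient and θ climbs toward the action the critic rates highest.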
Kim135797531
https://wiki.dong-min.kim/w/index.php?title=%EB%B2%A8%EB%A7%8C_%EB%B0%A9%EC%A0%95%EC%8B%9D&diff=17&oldid=prev
Kim135797531: New page: ===Bellman equation=== * Isn't it about time I memorized this one too? ====Background==== * Given state s at time t and action a taken at that time, we want to know the '''value'''...
2019-10-23T07:40:57Z
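<p><b>New page</b></p><div>===Bellman equation===<br />
* Isn't it about time I memorized this one too?<br />
====Background====<br />
* We want to know the '''value''' of taking action a in state s at time t.<br />
* But that would require knowing the rewards for every situation after time t, which is hard.<br />
* Q(s, a) = E[R | s, a], where R = r_t + γr_{t+1} + γ²r_{t+2} + ... is the discounted return from time t<br />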
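A small sketch of why this definition is hard to use directly: Q(s, a) = E[R | s, a] can be estimated by averaging discounted returns over many complete trajectories that start with action a in state s. The function names, the discount γ = 0.9 and the two-step toy episode are hypothetical illustrations, not part of the original notes.

```python
import random
from typing import Callable, List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """R = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for one trajectory."""
    total = 0.0
    # Walk backwards so each earlier reward sees the tail discounted once more.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def mc_q_estimate(sample_rewards: Callable[[], List[float]],
                  gamma: float, n: int) -> float:
    """Monte Carlo estimate of Q(s, a) = E[R | s, a].

    sample_rewards() is assumed (hypothetically) to play one full episode that
    starts by taking action a in state s and to return its reward sequence.
    """
    return sum(discounted_return(sample_rewards(), gamma) for _ in range(n)) / n


def toy_episode() -> List[float]:
    # Hypothetical environment: reward 1 now, then 0 or 2 with equal probability.
    return [1.0, random.choice([0.0, 2.0])]


random.seed(0)
print(discounted_return([1.0, 1.0, 1.0], 0.5))          # 1.75
print(mc_q_estimate(toy_episode, gamma=0.9, n=10_000))  # close to 1 + 0.9 * 1 = 1.9
```

Every estimate needs whole episodes played to the end, which is exactly the cost that the recursive form avoids.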
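====Bellman equation====<br />
* The idea is to rewrite this in recursive form<br />
* (value at time t) = (reward at exactly time t) + (discounted value from time t+1 onward)<br />
* Q(s, a) = E[r(s, a) + γE[Q(s', a')]], where the inner expectation is over the next action a' chosen by the policy in s'<br />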
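* If we assume a deterministic policy, the inner expectation over a' disappears, which is what allows Q to be learned off-policy.<br />
* Q(s, a) = E[r(s, a) + γQ(s', μ(s'))], where μ is the deterministic policy</div>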
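To see the deterministic recursion at work, here is a minimal sketch on a hypothetical four-state chain with deterministic transitions and a deterministic policy `mu` that always moves right. The states, rewards, γ = 0.9 and all function names are illustrative assumptions. Repeatedly applying Q(s, a) ← r(s, a) + γQ(s', μ(s')) reaches the same values as rolling out the full discounted return, but each backup only looks one step ahead.

```python
from typing import Dict, Tuple

GAMMA = 0.9
TERMINAL = 3
STATES = [0, 1, 2]   # non-terminal states of a toy chain 0 -> 1 -> 2 -> 3 (terminal)
ACTIONS = [0, 1]     # 0 = stay, 1 = move right


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic toy dynamics: returns (s', r(s, a)); reward 1 on reaching the end."""
    s_next = s + 1 if a == 1 else s
    return s_next, (1.0 if s_next == TERMINAL else 0.0)


def mu(s: int) -> int:
    """Hypothetical deterministic policy: always move right."""
    return 1


def evaluate_q(n_iters: int = 100) -> Dict[Tuple[int, int], float]:
    """Apply Q(s, a) <- r(s, a) + GAMMA * Q(s', mu(s')) repeatedly until it settles."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_iters):
        new_q = {}
        for s in STATES:
            for a in ACTIONS:
                s_next, r = step(s, a)
                # a' = mu(s') is fixed, so no expectation over a' is needed;
                # a terminal next state contributes no future value.
                future = 0.0 if s_next == TERMINAL else q[(s_next, mu(s_next))]
                new_q[(s, a)] = r + GAMMA * future
        q = new_q
    return q


def rollout_return(s: int, a: int, max_steps: int = 50) -> float:
    """Direct discounted return: take a in s, then follow mu until the end."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        s, r = step(s, a)
        total += discount * r
        discount *= GAMMA
        if s == TERMINAL:
            break
        a = mu(s)
    return total


q = evaluate_q()
for key in sorted(q):
    print(key, round(q[key], 4), round(rollout_return(*key), 4))
```

Each printed row shows matching values in its last two columns, for example 0.729 for staying in state 0 and then moving right three times.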
Kim135797531