Python-Based Reinforcement Learning Algorithms (Data Science) [Acorn Publishing]
Reinforcement learning (RL) is a popular and promising branch of artificial intelligence: a family of algorithms for building agents and smart models that automatically determine ideal behavior in response to changing requirements. This book helps you master reinforcement learning algorithms and understand how to implement self-learning agents. Starting with an introduction to the tools, libraries, and setup that reinforcement learning requires, it covers in detail the building blocks of reinforcement learning and value-based methods such as the Q-learning and SARSA algorithms.
Author: Andrea Lonza
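Part 1. Algorithms and Environments
Chapter 1. An Overview of Reinforcement Learning
__Introduction to reinforcement learning
____Comparing reinforcement learning and supervised learning
____History of reinforcement learning
____Deep reinforcement learning
__Elements of reinforcement learning
____Policy
____Value function
____Reward
____Model
__Applications of reinforcement learning
____Games
____Robotics and Industry 4.0
____Machine learning
____Economics and finance
____Healthcare
____Intelligent transportation systems
____Energy optimization and smart grids
__Summary
__Questions
__Further reading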
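Chapter 2. Implementing the Reinforcement Learning Cycle and OpenAI Gym
__Setting up the environment
____Installing OpenAI Gym
____Installing Roboschool
__OpenAI Gym and the reinforcement learning cycle
____Developing a reinforcement learning cycle
____Getting used to spaces
____TensorFlow 2.X
________Eager execution
________AutoGraph
__Developing machine learning models with TensorFlow
____Tensors
________Constants
________Variables
________Creating a graph
____A simple linear regression example
____Introducing TensorBoard
__Types of reinforcement learning environments
____Why different environments?
____Open source environments
__Summary
__Questions
__Further reading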
Chapter 3. Solving Problems with Dynamic Programming (DP)
__MDP
____Policy
____Discount factor and return
____Value function
____Bellman equation
__Categorizing reinforcement learning algorithms
____Model-free algorithms
________Value-based algorithms
________Policy gradient algorithms
________Actor-critic algorithms
________Hybrid algorithms
____Model-based reinforcement learning
____Algorithm diversity
__DP
____Policy evaluation and policy improvement
____Policy iteration
________Policy iteration applied to FrozenLake
____Value iteration
________Value iteration applied to FrozenLake
__Summary
__Questions
__Further reading
Part 2. Model-Free Reinforcement Learning Algorithms
Chapter 4. Q-Learning and SARSA Applications
__Learning without a model
____User experience
____Policy evaluation
____The exploration problem
________Why explore?
________How to explore
__Temporal difference learning
____Temporal difference update
____Policy improvement
____Comparing Monte Carlo and temporal difference
__SARSA
____The algorithm
__Applying SARSA to Taxi-v2
__Q-learning
____Theory
____The algorithm
__Applying Q-learning to Taxi-v2
____Comparing SARSA and Q-learning
__Summary
__Questions
Chapter 5. Deep Q-Network
__Deep neural networks and Q-learning
____Function approximation
____Q-learning with neural networks
____Instabilities of deep Q-learning
__DQN
____The solution
________Replay memory
________Target network
____The DQN algorithm
________Loss function
________Pseudocode
____Model architecture
__Applying DQN to Pong
____Atari games
____Preprocessing
____DQN implementation
________DNN
________Experience buffer
________Computational graph and training loop
____Results
__DQN improvement algorithms
____Double DQN
________DDQN implementation
________Results
____Dueling DQN
________Dueling DQN implementation
________Results
____N-step DQN
________Implementation
________Results
__Summary
__Questions
__Further reading
Chapter 6. Learning Stochastic and PG Optimization
__Policy gradient methods
____The gradient of the policy
____Policy gradient theorem
____Computing the gradient
____The policy
____On-policy PG
__Understanding the REINFORCE algorithm
____Implementing REINFORCE
____Landing a spacecraft using REINFORCE
________Analyzing the results
__REINFORCE with baseline
____Implementing REINFORCE with baseline
__Learning the AC algorithm
____Using a critic to help an actor learn
____The n-step AC model
____AC implementation
____Landing a spacecraft using AC
____Advanced AC tips and tricks
__Summary
__Questions
__Further reading
Chapter 7. TRPO and PPO Implementation
__Roboschool
____Controlling a continuous system
__Natural Policy Gradient
____The idea behind NPG
____Mathematical concepts
________FIM and KL divergence
____NG complications
__TRPO
____The TRPO algorithm
____Implementing the TRPO algorithm
____Application of TRPO
__Proximal Policy Optimization
____Overview of PPO
____The PPO algorithm
____Implementation of PPO
____Application of PPO
__Summary
__Questions
__Further reading
Chapter 8. DDPG and TD3 Applications
__Combining policy gradient optimization with Q-learning
____Deterministic policy gradient
____The DDPG algorithm
____DDPG implementation
____Applying DDPG to BipedalWalker-v2
__TD3 policy gradient
____Addressing overestimation bias
________Implementation of TD3
____Addressing variance reduction
________Delayed policy updates
________Target regularization
____Applying TD3 to BipedalWalker
__Summary
__Questions
__Further reading
Part 3. Model-Free Algorithms and Improvements
Chapter 9. Model-Based Reinforcement Learning
__Model-based methods
____A broad perspective on model-based learning
________A known model
________An unknown model
____Advantages and disadvantages
__Combining model-based and model-free learning
____A useful combination of model-based and model-free approaches
____Building a model from images
__ME-TRPO applied to an inverted pendulum
____Understanding ME-TRPO
____Implementing ME-TRPO
____Experimenting with Roboschool
________Results of the Roboschool inverted pendulum experiment
__Summary
__Questions
__Further reading
Chapter 10. Imitation Learning with the DAgger Algorithm
__Technical requirements
____Installing Flappy Bird
__The imitation approach
____The driving assistant example
____Comparing IL and RL
____The role of the expert in imitation learning
____The IL structure
________Comparing passive and active imitation
__Playing Flappy Bird
____How to use the environment
__Understanding the dataset aggregation algorithm
____The DAgger algorithm
____Implementation of DAgger
________Loading the expert inference model
________Creating the learner's computational graph
________Creating a DAgger loop
____Analyzing the Flappy Bird results
__IRL
__Summary
__Questions
__Further reading
Chapter 11. Understanding Black-Box Optimization Algorithms
__Alternatives to reinforcement learning
____A brief recap of reinforcement learning
____The alternative
________EAs
__The core of EAs
____Genetic algorithms (GA)
____Evolution strategies
________CMA-ES
________ES versus RL
__Scalable evolution strategies
____The core
________Parallelizing ES
________Other tricks
________Pseudocode
____Scalable implementation
________The main function
________Workers
__Applying scalable ES to LunarLander
__Summary
__Questions
__Further reading
Chapter 12. Developing the ESBAS Algorithm
__Exploration versus exploitation
____Multi-armed bandit
__Approaches to exploration
____The greedy strategy
____The UCB algorithm
________UCB1
____Exploration complexity
__ESBAS
____Understanding algorithm selection
____The internals of ESBAS
____Implementation
____Running Acrobot
________Results
__Summary
__Questions
__Further reading
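Chapter 13. Practical Implementation for Solving Reinforcement Learning Problems
__Best practices of deep reinforcement learning
____Choosing the appropriate algorithm
____Developing a reinforcement learning algorithm
__Challenges in deep reinforcement learning
____Stability and reproducibility
____Efficiency
____Generalization
__Advanced techniques
____Unsupervised reinforcement learning
________Intrinsic reward
____Transfer learning
________Types of transfer learning
__Reinforcement learning in the real world
____Problems to solve when applying reinforcement learning in the real world
____Bridging the gap between simulation and the real world
____Creating your own environment
__The future of reinforcement learning and its impact on society
__Summary
__Questions
__Further reading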