AI puzzle

How I taught an agent to assemble the 2048 tile in the game "2048"

AI assembles the 2048 tile

Hello! My name is Rinat Maksutov. I work in the Intelligent Engineering Services division of the Technology department at Accenture's Russian office, where I lead custom development projects. Over my long career at Accenture I have tried many different areas: mobile development, front-end, back-end, and even data science with machine learning. This story, however, is not about work but about a hobby. I really enjoy learning and exploring new fields through my own pet projects. Today I will tell you about one of them: how I taught a reinforcement learning (RL) agent to play the famous puzzle "2048". The article deliberately contains no code, mathematics, state-of-the-art approaches, or the latest discoveries in the field, so people who know RL well will not find anything new here. It is a story for a general audience about how I set myself an unusual goal and achieved it.

. , , Nanodegree Udacity (Nanodegree - ). Deep Learning Nanodegree , . 

RL, : , , - , , , - . , .

, RL , . , , , - , ( , ). 

, - , ( , RL), . - 2048 ( : https://play2048.co/). , (, , , ), , . , ( 0.9) ( 0.1). , , .

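Reaching 2048 is not the end of the game: play can continue to 4096, 8192, and beyond. The largest tile that is theoretically possible is 131 072, or 2^17: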

Source: Wikipedia

. , , . , . , , , (, ), , - . , “” , , .

  1. - , “” , , , .

  2. ( , ) . , “” , . 

, , , . 

Reinforcement learning

, RL, - . - , . (, ), , . , , , , . , .

Source: https://medium.com/@dgquintero02/how-to-explain-machine-learning-to-your-family-77a3bac3593a

, , , .  , , , , . “”. , , . - “” - , . - “”, , , - ( discourage) . ( , ) .

Udacity . , , . : , , , . , , . , , . - - - , .

: AlphaGo, StarCraft . , , - , . , , . , , , . 

, . , , . , . 

, , : 1) , 2) , 3) . , , , - , . , , : , , . 

. - ( , ) , . - -, , , , StarCraft . , , , . , , , . , . , , - . , . , . , .

Another meme with Boromir

2048 ( - , 2048 - ) - , , , , . 

: , Deep Q-network Udacity, , . . 

, 3 ( , ):

  • One-hot encoded (16 * 18 )

  • “ ”

  • Log2

  • 4 4

  • log2

  • log2

  • 10 , 1024, ε: 0.05, ε: 0.9999, 

  • 1, 3, 5, 20

  • ε ( ) 1.0 0.01

  • 100 000

  • ( )

  • 50 000 200 000

  • , , , , ..

()

  • “ ”: N , ,

  • “ ”: 3 ,

  • 2

  • 5-: 288-31024-4, ReLU Adam optimizer

  • 2, 4

  • 256, 512

  • learning rate
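The ε values in the list above (a start of 1.0, floors such as 0.05 or 0.01, and a factor of 0.9999) suggest an exploration rate that decays multiplicatively. The sketch below illustrates that idea only and is not the code used in the experiments. The names `epsilon_schedule` and `choose_action` are hypothetical, and pairing 0.05 with 0.9999 as "minimum" and "decay" is an assumption.

```python
import random


def epsilon_schedule(start=1.0, minimum=0.05, decay=0.9999):
    """Yield exploration rates: multiply by `decay` each step, never below `minimum`.

    The default values are assumptions taken from the numbers in the list above.
    """
    eps = start
    while True:
        yield eps
        eps = max(minimum, eps * decay)


def choose_action(q_values, eps, rng=random):
    """Epsilon-greedy choice over the four moves.

    With probability `eps` pick a random move (exploration),
    otherwise pick the move with the highest estimated value.
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With these assumed defaults, ε reaches its 0.05 floor after roughly 30 000 steps. A larger decay factor or a lower floor keeps the agent exploring for longer.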

- , , - , - . .

, - . , .

, . - . “” , . , , 44, . fully-connected , , 116:

. , 512. , . , : 0 . , : , . 

- . , . , , - . , , , , , . , , , . 

. : , log2 . , , :

, . 512, 1024. . , . 

- , , . , . :

, a+a = b, b+b=c .., , a, b . (“+” - , “”). ? , , . , one-hot encoded . , 18, , , , . - . , , , , .
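The two board representations mentioned in the text, log2 values and a one-hot encoding with 18 classes per cell, can be sketched as follows. This is an illustration under assumptions, not the original implementation. The function names are hypothetical, and the 18 classes are assumed to be "empty" plus the exponents 1 to 17, so that tiles up to 2^17 = 131 072 fit.

```python
NUM_CELLS = 16     # 4x4 board
NUM_CLASSES = 18   # assumed: empty cell plus exponents 1..17


def log2_encode(board):
    """Replace every tile with its base-2 exponent; empty cells (0) stay 0."""
    # For a power of two v, v.bit_length() - 1 equals log2(v) exactly.
    return [0 if v == 0 else v.bit_length() - 1 for row in board for v in row]


def one_hot_encode(board):
    """Flatten the board into a 16 * 18 = 288-element vector of 0s and 1s."""
    vec = [0] * (NUM_CELLS * NUM_CLASSES)
    for cell, exponent in enumerate(log2_encode(board)):
        vec[cell * NUM_CLASSES + exponent] = 1
    return vec


board = [
    [2, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2048],
]
state = one_hot_encode(board)
```

In the one-hot form each cell becomes a category, not a magnitude, so the network does not have to treat 2048 as "1024 times larger" than 2. The cost is a longer input vector: 288 values in place of 16.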

. , , , , . , . - . 

, , Space Invaders. Google .

Space Invaders.

, , “ ”. (“”), (“”) .

2048 . . , 2 , . , … . . , , . , , , 2 4. , , [ +  2 4]. , , , . - . 

-. , . , : , , - . , , : , . , , . 

, . , , . , , , . , . , , - . , , . 

, “” - . , , , . , , . . - . 1.0 0.1. , , , . , , , . - “” , . 

RL , , . , - , - , . , ( ) . , . , , . , , , , . , , - , “” - .

Distribution of the shares of the chosen directions of moves in each of the games.

, : , “” - .

, . , . , , , , , . , . - - , - , , , . , - , . , , . , , . , “” ( - - , , , ). , “” , .

The WOW signal

. - - 2048.

, 2048 60 . , , . , , 1024. , 1024 , - 30 1024. , “” 2048, , , , , , - 4096.

, , . 20- , 2048 ( 16:40).

( !), . , 2048 - . , - GitHub ! , . !

PS: , back-end Python Java, front-end React. , --. , , proof-of-concept . , , !



