Wednesday, March 9, 2016

AlphaGo



It brings to mind a human fighting the Terminator, or a Titan from Attack on Titan.
Go, Lee Sedol!

-------------------

Watching the commentary on AlphaGo's game 2, I can't help feeling frustrated.

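The commentators keep asking whether a given move is "in AlphaGo's database". That the question keeps coming up means they have not done even a basic search for articles on how AlphaGo's algorithm works.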

AlphaGo continually recomputes a value for every point on the board.
When the time limit is up, it outputs the move with the highest value, and that is all there is to it.
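
As a rough illustration of "keep re-evaluating, then answer when the clock runs out", here is a minimal Python sketch. It is not AlphaGo's code: `choose_move`, `toy_simulate`, and the `WIN_PROB` numbers are hypothetical names and values made up for the example, and a noisy win/loss sample stands in for whatever evaluation the real program performs.

```python
import random
import time

# Made-up win probabilities for three candidate moves (demo data only).
WIN_PROB = {"a": 0.4, "b": 0.7, "c": 0.5}

def toy_simulate(move, rng):
    """Hypothetical noisy evaluation: 1.0 for a simulated win, 0.0 for a loss."""
    return 1.0 if rng.random() < WIN_PROB[move] else 0.0

def choose_move(moves, simulate, time_limit_s, rng=None, clock=time.monotonic):
    """Keep refining a running mean value for every candidate move until
    the deadline passes, then return the best move and all the means."""
    rng = rng or random.Random()
    totals = {m: 0.0 for m in moves}
    counts = {m: 0 for m in moves}
    deadline = clock() + time_limit_s
    while True:
        for m in moves:                     # one sweep over every candidate
            totals[m] += simulate(m, rng)
            counts[m] += 1
        if clock() >= deadline:             # at least one sweep always runs
            break
    means = {m: totals[m] / counts[m] for m in moves}
    return max(means, key=means.get), means
```

A call such as `choose_move(list(WIN_PROB), toy_simulate, time_limit_s=0.05)` returns the move with the best running average. Nothing is looked up in a database of moves; a longer time limit just buys more sweeps and a steadier estimate.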

Seeing professional Go players who don't do their homework, I feel I can see the future of Go.

-------------------


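Policy network - proposes candidate moves for the Monte Carlo tree search to explore. The tree search itself is the same as in existing game AI.
Value network - evaluates positions, which shrinks the search space and makes the tree search more efficient. A recent achievement. The ability to read how the game stands.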

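- It can probably keep improving on its own by feeding data from self-play games into the tree search and adjusting the value network's weights. This is the part humans will find hard to keep up with.
- Games against opponents are probably for gauging its level and for practice in adapting to match conditions.
- The longer the time limit, the more the human is probably at a disadvantage, and exponentially so. The time limit should be cut way down...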
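
How the two networks might plug into a tree search can be sketched on a toy game. Everything here is an assumption for illustration: the game (take 1 to 3 stones, whoever takes the last stone wins), the names `policy_net`, `value_net`, `Node`, `simulate`, `search`, and the constant `c_puct=1.5`. The two "networks" are hand-written stand-ins, not trained models, and the selection rule is only loosely modeled on the one in the Nature paper. The sketch also skips the random rollouts that the paper combines with the value network.

```python
import math

def legal_moves(stones):
    """Toy game for this sketch: take 1-3 stones; taking the last one wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def policy_net(stones):
    """Stand-in for a policy network: a uniform prior over legal moves."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(stones):
    """Stand-in for a value network: result in [-1, 1] for the player to move.
    Uses the known solution of this toy game instead of a trained model."""
    return -1.0 if stones % 4 == 0 else 1.0

class Node:
    def __init__(self, prior):
        self.prior = prior       # prior probability from the policy stand-in
        self.visits = 0
        self.value_sum = 0.0     # from the viewpoint of the player to move here
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def simulate(node, stones, c_puct):
    """Run one simulation; return the value for the player to move at `node`."""
    if stones == 0:
        value = -1.0                           # opponent took the last stone
    elif not node.children:
        for move, prior in policy_net(stones).items():
            node.children[move] = Node(prior)  # expand the leaf
        value = value_net(stones)              # evaluate instead of playing out
    else:
        sqrt_n = math.sqrt(node.visits)

        def score(item):
            _, child = item
            # child.q() is from the opponent's viewpoint, hence the minus sign
            return -child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)

        move, child = max(node.children.items(), key=score)
        value = -simulate(child, stones - move, c_puct)
    node.visits += 1
    node.value_sum += value
    return value

def search(stones, n_simulations, c_puct=1.5):
    """Return the most-visited move; assumes stones >= 1 and n_simulations >= 1."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, stones, c_puct)
    return max(root.children, key=lambda m: root.children[m].visits)
```

`search(10, 2000)` runs 2000 simulations from a pile of 10 stones and returns the most-visited move. In this toy game the theoretically correct move there is to take 2, leaving a multiple of 4. The priors decide which moves get tried first, the value stand-in replaces playing each line out to the end, and more simulations (which is what a longer time limit buys) give the visit counts more time to settle.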


"The Monte-Carlo Revolution in Go".
http://www.remi-coulom.fr/JFFoS/JFFoS.pdf


Mastering the game of Go with deep neural networks and tree search
http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

How AlphaGo (black, to play) selected its move in an informal game against Fan Hui.


http://www.willamette.edu/~levenick/cs448/goNature.pdf

https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

https://en.wikipedia.org/wiki/Deep_learning

https://en.wikipedia.org/wiki/AlphaGo

https://deepmind.com/alpha-go.html

https://googleblog.blogspot.kr/2016/01/alphago-machine-learning-game-go.html

"The first game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952. Then fell checkers in 1994. In 1997 Deep Blue famously beat Garry Kasparov at chess. It’s not limited to board games either—IBM's Watson [PDF] bested two champions at Jeopardy in 2011, and in 2014 our own algorithms learned to play dozens of Atari games just from the raw pixel inputs. But to date, Go has thwarted AI researchers; computers still only play Go as well as amateurs."
"Traditional AI methods—which construct a search tree over all possible positions—don’t have a chance in Go"

http://www.theverge.com/2016/3/9/11184362/google-alphago-go-deepmind-result

http://www.theguardian.com/technology/2016/mar/07/go-board-game-google-alphago-lee-se-dol
“The big jump was the discovery of the value network, which was last summer,” Hassabis says. That was the realisation that a finely tuned neural network could solve one of the problems previously thought impossible, and learn to predict the winner of a game by looking at the board.
From there, progress was rapid. The value network, paired with a second neural network, the policy network, would work to pick a few possible moves (based on similar plays seen in previous matches) and then estimate which of the resulting board states would be strongest for the AlphaGo player.

The Guardian article seems to be the most helpful for understanding what AlphaGo is like.
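
The division of labour in the Guardian passage (one network picks a few possible moves, the other judges the resulting positions) can be sketched in a few lines of Python. This is only my reading of that description, with no tree search at all. `pick_move` and every `toy_*` function are hypothetical, and the "game" is just a number that should end up close to `TARGET`.

```python
def pick_move(state, legal_moves, apply_move, policy, value, top_k=3):
    """Policy narrows the field to top_k moves; value ranks what they lead to."""
    priors = policy(state)                       # move -> prior probability
    ranked = sorted(legal_moves(state),
                    key=lambda m: priors.get(m, 0.0), reverse=True)
    candidates = ranked[:top_k]                  # only a few moves survive
    return max(candidates, key=lambda m: value(apply_move(state, m)))

# Toy stand-ins (assumptions for the demo, not a real game or trained model).
TARGET = 10

def toy_legal_moves(state):
    return [1, 2, 3, 4, 5]

def toy_apply(state, move):
    return state + move

def toy_policy(state):
    return {1: 0.05, 2: 0.40, 3: 0.30, 4: 0.20, 5: 0.05}

def toy_value(state):
    return -abs(TARGET - state)                  # closer to TARGET is better
```

From state 7, `pick_move(7, toy_legal_moves, toy_apply, toy_policy, toy_value)` keeps moves 2, 3 and 4 and then chooses among them by the value of the position each one leads to. The flip side of this design is visible from state 5: the move that would reach `TARGET` exactly has a low prior, so it is pruned before the value function ever sees it.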


http://www.theverge.com/2016/3/9/11185030/google-deepmind-alphago-go-artificial-intelligence-impact
There are certainly parts of Go that require very deep search but it’s more a game about intuition and evaluation of features and seeing how they interact. In chess there’s really no substitute for search, and modern programs — the best program I know is a program called Komodo — it’s incredibly efficient at searching through the many possible moves and searching incredibly deeply as well.
Hassabis makes a distinction between "narrow" AIs like Deep Blue and artificial "general" intelligence (AGI), the latter being more flexible and adaptive.

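In chess, unlike Go, there is supposedly no substitute for search, and the search has to go deeper than in Go.
DeepMind's AI, on the other hand, is said to be more flexible and adaptable.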


----------
Addendum

http://www.economist.com/news/science-and-technology/21694540-win-or-lose-best-five-battle-contest-another-milestone

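[Special contribution] Prof. Gam Dong-geun (감동근): "AlphaGo plays only after precisely calculating even the areas we used to dismiss as a matter of choice"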
http://www.hankookilbo.com/v/a47e9dcb03854a2d9bc528b817c6c517

This is the closest to my own understanding of how AlphaGo plays Go.