Learning the Game of Go by Scalable Network without Prior Knowledge of Komi

2020 
AlphaGo trains a value network to predict the win rate of the current state under a komi of 7.5 on a 19 × 19 board. For most rectangular boards the komi is unknown, so the winner cannot be determined at the end of the game. One must rely on human experience to guess a komi and then train the value network with that guess, so the accuracy of the value network depends on the accuracy of the guess. This article instead uses a board value network to compute the score of the current state and tries to maximize that score, which removes the need to guess the komi. We also modify the network structure to accept boards of arbitrary size as input. Furthermore, we can transfer knowledge learned on a small board to a large board. We propose an algorithm that can adapt to the bonus rule. Experiments show that our method is effective on small boards and can transfer knowledge to larger boards. To better understand the learning process, we visualize the policy and score of some major branches. Finally, we present the solutions that our program obtained on 6 × 6, 6 × 7, and 7 × 8 boards.
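As a rough illustration of why maximizing the score avoids guessing the komi, the sketch below assumes the board value output is a grid of per-point values in [-1, 1], where +1 means Black owns the point, and takes the score to be their sum. This representation and the names `score_margin`, `best_move`, and `winner` are assumptions made for the example, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed representation: one value per board point in [-1, 1],
# where +1 means Black owns the point and -1 means White owns it.
Board = List[List[float]]
Move = Tuple[int, int]


def score_margin(board_values: Board) -> float:
    """Sum the per-point values into a score margin from Black's view."""
    return sum(sum(row) for row in board_values)


def best_move(predictions: Dict[Move, Board]) -> Move:
    """Pick the move whose predicted board values give the highest margin.

    No komi appears here: the choice depends only on the predicted score.
    """
    return max(predictions, key=lambda move: score_margin(predictions[move]))


def winner(margin: float, komi: float) -> str:
    """Decide the winner only after the fact, once a komi is supplied."""
    if margin > komi:
        return "black"
    if margin < komi:
        return "white"
    return "draw"


if __name__ == "__main__":
    # Works for any rectangular board, here 6 x 7 with every point Black's.
    full_board = [[1.0] * 7 for _ in range(6)]
    print(score_margin(full_board))
```

Because move selection uses only the score margin, the komi matters only when a result has to be reported, and the same functions apply unchanged to any rectangular board size.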