\usepackage[utf8]{inputenc}
\usepackage{geometry}
\usepackage{hyperref}
+\usepackage{todonotes}
\geometry{a4paper}
\title{Imago -- Go Image Recognition\\
\small \texttt{tomik.musil@gmail.com}}
\date{\small summer 2012}
+\newcommand{\todoi}[1]{\todo[inline]{#1}}
+\newcommand{\todoig}[1]{\todo[inline,color=green!40]{#1}}
+\newcommand{\todof}[1]{\todo[fancyline]{#1}}
+\newcommand{\todog}[1]{\todo[color=green!40]{#1}}
+
\newcommand{\pclass}[2]{\subsection*{#1}
#2
}
The grid detection algorithm will be based on a method described in~\cite{thirsima05}.
Since this grid-finding algorithm does not work on boards filled with stones, another algorithm will be devised for these. It will be based on further research.
+\todoig{We found that boards with a lot of stones are not problematic after all, except for
+the obvious extreme cases, for which we propose an algorithm but have not
+implemented it.}
+
\section{User interface specification}
As the process should be as automatic as possible, there will be very few options. Therefore there is no need for a GUI (with exceptions mentioned below) and all the options (output format etc.) will be specified on the command line.
\paragraph{Single image processing}
Ideally, the user would supply an image and get a position representation in the specified output format right away, no further questions asked.
Should the automatic grid-finding system fail, the user will be prompted to run a simple GUI and point out corners of the grid.
+
+\todoig{Mostly done. The GUI is ready, but we are not able to detect failure of the automatic system, as it is stochastic and just returns its best guess no matter how wrong it might be.}
+
\paragraph{Multiple images processing}
It will be possible to process a set of images as a game record. The grid will either be found in the first picture and remembered for the following ones (assuming the camera position has not changed in between) or be found in each picture separately. When a player's hand or another object hides part of the board, Imago will try to infer the position from the previous and following pictures. If it fails to do so, it will ask the user to open a simple GUI, look at the pictures and fill in the move.
+
+\todoi{Mostly abandoned. We can process a set of images, but with no inference
+or user interaction. The motivation for this feature was selecting one frame
+for each move; to move towards video analysis, we propose better methods
+(namely HMMs) instead.}
+
\paragraph{Output formats}
The ASCII output will be similar to the {\em GNU Go} command-line mode output. Each situation will be described by 19 rows of 19 characters each, with the symbols '.', 'X' and 'O' denoting an empty intersection, a black stone and a white stone, respectively.
It will be possible to export both single positions and complete game records to {\em Smart Game Format} (its specification can be found at~\cite{sgf}).
+
+\todoig{Done. We use '.', 'B' and 'W'
+for ASCII output, but that can be changed.}
+
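As a minimal sketch of the two output formats described above, the following Python code renders one position as 19 rows of 19 characters and as a single-node SGF setup position. The function names \texttt{board\_to\_ascii} and \texttt{board\_to\_sgf}, and the encoding of cells as 0 (empty), 1 (black) and 2 (white), are assumptions made for illustration; they are not taken from the Imago source.

```python
# Illustrative sketch only: names and the 0/1/2 cell encoding are hypothetical.
BOARD_SIZE = 19
SGF_COORDS = "abcdefghijklmnopqrs"  # SGF uses letters a..s for a 19x19 board


def check_board(board):
    """Raise ValueError unless `board` is a 19x19 list of rows."""
    if len(board) != BOARD_SIZE or any(len(row) != BOARD_SIZE for row in board):
        raise ValueError("expected a 19x19 board")


def board_to_ascii(board, symbols=('.', 'X', 'O')):
    """Render the board as 19 lines of 19 characters.

    `symbols` gives the characters for empty, black and white; pass
    ('.', 'B', 'W') to get the alternative symbols instead.
    """
    check_board(board)
    return "\n".join("".join(symbols[cell] for cell in row) for row in board)


def board_to_sgf(board):
    """Export a single position as an SGF game tree with setup stones.

    Uses the SGF properties AB (add black) and AW (add white); each point
    is written as column letter followed by row letter.
    """
    check_board(board)
    black, white = [], []
    for y, row in enumerate(board):
        for x, cell in enumerate(row):
            point = "[%s%s]" % (SGF_COORDS[x], SGF_COORDS[y])
            if cell == 1:
                black.append(point)
            elif cell == 2:
                white.append(point)
    sgf = "(;GM[1]FF[4]SZ[%d]" % BOARD_SIZE
    if black:
        sgf += "AB" + "".join(black)
    if white:
        sgf += "AW" + "".join(white)
    return sgf + ")"
```

Keeping the symbols as a parameter lets the same routine produce either the 'X'/'O' convention of the specification or the 'B'/'W' variant mentioned in the note.
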
\paragraph{Supporting scripts}
There will be a script to capture images from a web camera. It will be capable of taking pictures periodically, or of running a go clock and taking a picture with every clock press.
+\todoig{Done.}
+
\section{Technical specification}
\paragraph{Image processing}
Imago will first try to find the grid based on visible lines. The procedure used here relies on the Hough transform and is similar to the method described in~\cite{thirsima05}. If it fails to do so, it will try to locate a sufficient number of stones to infer the grid position. The algorithm for finding stones will be subject to further research. If neither of the aforementioned methods succeeds, it will prompt the user to open a GUI and locate the corners of the grid manually. Once it has the grid, it will determine the position and color of the stones based on the color around each intersection.
Processing multiple images is essentially the above-mentioned process repeated, possibly without the grid-locating part, which could be omitted after the first picture. Methods for inferring missing moves will be based on further research.
+\todoig{We found that methods based on the Hough transform and RANSAC are sufficient for reasonable images, therefore we abandoned the idea of locating the stones first. Another reason is that, because our method is stochastic and quite flexible (it works even with a limited number of visible lines), failure is hard to detect.}
+
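To illustrate the line-finding step mentioned above, here is a minimal sketch of the basic Hough transform for straight lines. It is a textbook version operating on a list of edge points, not the procedure used in Imago or in~\cite{thirsima05}; the function names and the parameters \texttt{n\_theta} and \texttt{rho\_step} are assumptions made for this example.

```python
# Textbook Hough transform sketch; names and parameters are hypothetical.
import math
from collections import Counter


def hough_votes(points, n_theta=180, rho_step=1.0):
    """Accumulate votes for lines in normal form.

    Every edge point (x, y) votes for all lines passing through it,
    parametrized as rho = x*cos(theta) + y*sin(theta), with theta
    sampled at n_theta angles in [0, pi) and rho quantized to rho_step.
    """
    votes = Counter()
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(i, int(round(rho / rho_step)))] += 1
    return votes


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the accumulator cell with most votes."""
    votes = hough_votes(points, n_theta, rho_step)
    (i, r), count = votes.most_common(1)[0]
    return math.pi * i / n_theta, r * rho_step, count
```

For a board photograph, the points would come from an edge detector, and the grid lines would show up as many strong accumulator cells rather than a single one.
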
\paragraph{Programming languages and libraries}
Most parts of the program will be written in Python 2.7.
{\em Python Imaging Library} (PIL,~\cite{pil}) will be used for basic image manipulation. {\em Pygame} will be used for the GUI. The web camera will be accessed using {\em OpenCV} on Unix and the {\em VideoCapture} module (\cite{vidcap}) on Windows.