% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode

\documentclass[12pt]{article}

\usepackage[utf8]{inputenc}
\usepackage{geometry}
\usepackage{hyperref}
\usepackage{todonotes}
\geometry{a4paper}

\title{Imago -- Go Image Recognition\\
        \small specification}
\author{Tomáš Musil\\
        \small \texttt{tomik.musil@gmail.com}}
\date{\small summer 2012}

\newcommand{\todoi}[1]{\todo[inline]{#1}}
\newcommand{\todoig}[1]{\todo[inline,color=green!40]{#1}}
\newcommand{\todof}[1]{\todo[fancyline]{#1}}
\newcommand{\todog}[1]{\todo[color=green!40]{#1}}

\newcommand{\pclass}[2]{\subsection*{#1}
        #2
}

\newcommand{\pfunc}[4]{
        \begin{center}
                \begin{tabular*}{0.8\textwidth}{ll}
                        \multicolumn{2}{l}{{\bf function} #1} \\
                        {\bf input} #2 & {\bf output} #3 \\
                \multicolumn{2}{p{0.8\textwidth}}{#4} \\
        \end{tabular*}
        \end{center}
}

\begin{document}

\maketitle

\section{General information}
\paragraph{}
Imago will be a program for automatic processing of Go images. It will take an image (or a set of images), find the board grid and the stones, and produce an abstract representation of the game situation (or a game record).

\paragraph{}
It will support JPEG, BMP, TIFF (and other) image formats as input. It will be capable of output in ASCII and SGF formats. As the process should be fully automatic, the program will be operated from the command line. There will, however, be a GUI for manual grid location in case the automatic one should fail.

\paragraph{}
The program will be written mainly in Python, with performance-critical parts later rewritten in C. It will be distributed under a free (but not copyleft) license.
It will be platform independent.
The grid detection algorithm will be based on a method described in~\cite{thirsima05}.
Since this grid-finding algorithm does not work on boards filled with stones, another algorithm will be devised for these. It will be based on further research.

\todoig{We found that boards with a lot of stones are not problematic after all,
except for the obvious extreme cases, for which we propose an algorithm but
have not implemented it.}

\section{User interface specification}
As the process should be as automatic as possible, there will be very few options. There is therefore no need for a GUI (with the exceptions mentioned below), and all options (output format etc.) will be specified on the command line.
\paragraph{Single image processing}
Ideally, the user would supply an image and get a representation of the position in the specified output format right away, no further questions asked.
Should the automatic grid-finding system fail, the user will be prompted to run a simple GUI and point out the corners of the grid.

\todoig{Mostly done. The GUI is ready, but we are not able to detect failure of the automatic system, as it is stochastic and just returns its best guess no matter how wrong it might be.}

\paragraph{Multiple image processing}
It will be possible to process a set of images as a game record. The grid will either be found in the first picture and remembered for the following ones (assuming the camera position has not changed in between) or be found in each picture separately. When a player's hand or another object hides part of the board, Imago will try to infer the position from the previous and following pictures. If it fails to do so, it will ask the user to open a simple GUI, look at the pictures and fill in the move.

\todoi{Mostly abandoned. We can process a set of images, but with no inference
or user interaction. The motivation for this feature was selecting one frame
for each move; for moving towards video analysis we propose better methods
(namely HMMs).}

\paragraph{Output formats}
The ASCII output will be similar to the {\em GNU Go} command-line mode output. Each situation will be described by 19 rows of 19 characters each, with the symbols \texttt{.}, \texttt{X} and \texttt{O} used for an empty intersection, a black stone and a white stone, respectively.
It will be possible to export both single positions and complete game records to {\em Smart Game Format} (see~\cite{sgf} for its specification).

\todoig{Done. We use \texttt{.}, \texttt{B} and \texttt{W}
for ASCII output, but that can be changed.}

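For illustration, a single position could be written in both formats as sketched below. This is a hypothetical example, not actual Imago output: it assumes the \texttt{.}, \texttt{B}, \texttt{W} alphabet, rows printed starting from the top edge of the board, and three made-up stones (black at SGF coordinates \texttt{dd} and \texttt{pp}, white at \texttt{dp}).

\begin{verbatim}
...................
...................
...................
...B...............
...................
...................
...................
...................
...................
...................
...................
...................
...................
...................
...................
...W...........B...
...................
...................
...................
\end{verbatim}

The same position as a minimal SGF record~\cite{sgf}, using the setup properties \texttt{AB} (add black) and \texttt{AW} (add white):

\begin{verbatim}
(;GM[1]FF[4]SZ[19]AB[dd][pp]AW[dp])
\end{verbatim}
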
\paragraph{Supporting scripts}
There will be a script to capture images from a web camera. It will be capable of taking pictures periodically, or of running a go clock and taking a picture on every clock press.

\todoig{Done.}

\section{Technical specification}
\paragraph{Image processing}
Imago will first try to find the grid based on visible lines. The procedure used here relies on the Hough transform and is similar to the method described in~\cite{thirsima05}. If this fails, it will try to locate a sufficient number of stones to infer the grid position. The algorithm for finding stones will be subject to further research. If neither of the aforementioned methods succeeds, it will prompt the user to open a GUI and locate the corners of the grid manually. Once it has the grid, it will determine the position and color of the stones from the color around each intersection.

Processing multiple images is essentially the above-mentioned process repeated, possibly without the grid-locating part, which could be omitted after the first picture. Methods for inferring missing moves will be based on further research.

\todoig{We found that methods based on the Hough transform and RANSAC are sufficient for reasonable images, so we abandoned the idea of locating the stones first. Another reason is that, because our method is stochastic and quite flexible (it works even with a limited number of visible lines), failure is hard to detect.}

\paragraph{Programming languages and libraries}
Most parts of the program will be written in Python 2.7.
{\em Python Imaging Library} (PIL,~\cite{pil}) will be used for basic image manipulation. {\em Pygame}~\cite{pygame} will be used for the GUI. The web camera will be accessed using {\em OpenCV} on Unix and the {\em VideoCapture} module~\cite{vidcap} on Windows.

\section{Code architecture}
The code will be divided into the following modules and corresponding classes:
\pclass{camera}
{accessing the web camera}
\pclass{capture}
{capturing images}
\pclass{filters}
{graphic filters}
        \pfunc{components}
                {grayscale image}{binary image}
                {Returns an image where each pixel of value 1 represents the center of one connected component in the input image.}
        \pfunc{edge\_detection}
                {grayscale image}{grayscale image}
                {Provides some method of edge detection.}
        \pfunc{high-pass}
                {grayscale image, threshold}{binary image}
                {High-pass filter.}
        \pfunc{peaks}
                {grayscale image}{grayscale image}
                {Finds peaks in the image.}
\pclass{grid}
{grid finding}
        \pfunc{find}
                {list of lines}{two lists of 19 lines each}
                {Given the lines found by the Hough transform, tries to find the grid.}
\pclass{hough}
{Hough transform and related functions}
        \pfunc{all\_lines}
                {binary image}{list of lines}
                {Gets the (filtered) result of the Hough transform. Returns a list of lines represented by angle and distance from the center of the image.}
        \pfunc{transform}
                {binary image}{grayscale image}
                {Hough transform of the image.}
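
The line representation by angle and distance from the center of the image can be made concrete with the usual normal form of a line. The following is a sketch under assumed conventions, which are not necessarily the ones Imago uses: let $(x_c, y_c)$ be the image center, $\theta$ the angle of the line's normal and $\rho$ the signed distance of the line from the center. A pixel $(x, y)$ then lies on the line exactly when
\[
        (x - x_c)\cos\theta + (y - y_c)\sin\theta = \rho .
\]
In the Hough transform, each foreground pixel of the binary input votes for all pairs $(\theta, \rho)$ that satisfy this equation, so straight lines in the input appear as bright peaks in the transformed image.
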
\pclass{imago}
{the main program}
\pclass{output}
{output to different formats}
\pclass{record}
{game record processing}
\pclass{stones}
{stone finding}
        \pfunc{stone\_color}
                {image, coordinates}{char}
                {Returns a character representing the stone color or an empty intersection at the given coordinates in the given image.}
\pclass{timer}
{go clock with image capture}

\begin{thebibliography}{9}

        \bibitem{thirsima05}
                Teemu Hirsimäki,
                Extracting Go Game Positions from Photographs,
                Helsinki University of Technology,
                2005.
                \url{http://www.cis.hut.fi/thirsima/gocam/gocam.pdf}

        \bibitem{sgf}
                SGF File Format FF[4],
                last updated 2006.
                \url{http://www.red-bean.com/sgf/}

        \bibitem{pil}
                Python Imaging Library (PIL).
                \url{http://www.pythonware.com/products/pil/}

        \bibitem{pygame}
                pygame.
                \url{http://www.pygame.org/}

        \bibitem{vidcap}
                VideoCapture.
                \url{http://videocapture.sourceforge.net/}

\end{thebibliography}

\end{document}