Contextual-RNN-GAN/index.html at master · arnabgho/Contextual-RNN-GAN · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
<!DOCTYPE html>
<html lang="en-us">
  <head>
    <meta charset="UTF-8">
    <title>Contextual RNN-GANs for Abstract Reasoning Diagram Generation</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" type="text/css" href="stylesheets/normalize.css" media="screen">
    <link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
    <link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen">
    <link rel="stylesheet" type="text/css" href="stylesheets/github-light.css" media="screen">
  </head>
  <body>
    <script>
     (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
     (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
     m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
     })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

     ga('create', 'UA-71335999-2', 'auto');
     ga('send', 'pageview');

    </script>
    <section class="page-header">
      <h1 class="project-name">Contextual RNN-GANs for Abstract Reasoning Diagram Generation</h1>
      <a href="https://arxiv.org/abs/1609.09444" class="btn" target="_blank">Link to Paper</a>
    </section>
    <section class="main-content">
      <h1>
<a id="contextual-rnn-gan" class="anchor" href="#contextual-rnn-gan" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a><a href="https://arxiv.org/abs/1609.09444" target="_blank">Contextual RNN-GANs for Abstract Reasoning Diagram Generation</a></h1>

<p>Arnab Ghosh<sup>*</sup>, Viveka Kulharia<sup>*</sup>, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal</p>

<p><sup>*</sup>Equal contribution</p>

<h2>
<a id="motivation" class="anchor" href="#motivation" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Motivation</h2>

<ul>
<li><p>Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence.</p></li>
<li><p>Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting, simulation, or video generation.</p></li>
<li><p>Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence.</p></li>
</ul>

<h2>
<a id="an-example-with-an-explanation" class="anchor" href="#an-example-with-an-explanation" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>An Example with an Explanation</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-6.png" width="850px" height="180px"></p>

<p>An explanation of the ground truth is that the dashed line first goes to the left, then to the right, and then on both sides, and also changes from single to double, hence the ground truth should have double dashed lines on both the sides. On the corners, the number of slanted lines increase by one after every two images, hence the ground truth should have four slant lines on both the corners.</p>

<h2>
<a id="some-more-example-problems-from-dat-dar-dataset" class="anchor" href="#some-more-example-problems-from-dat-dar-dataset" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Some More Example Problems From DAT-DAR Dataset</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-8.png" width="900px" height="440px"></p>

<hr>
<h2>
<a id="the-model" class="anchor" href="#the-model" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>The Model</h2>
<hr>
<h2>
<a id="contextual-rnn-gan-1" class="anchor" href="#contextual-rnn-gan-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Contextual RNN-GAN</h2>

<ul>
<li><p>GANs have been shown to be useful in several image generation and manipulation tasks and hence it was a natural choice to prevent the model make fuzzy generations.</p></li>
<li><p>In Context-RNN-GAN, 'context' refers to the adversary receiving previous images (modeled as an RNN) and the generator is also an RNN. The name distinguishes it from our simpler RNN GAN model where the adversary is not contextual (as it only uses a single image) and only the generator is an RNN.</p></li>
<li><p>The discriminator is modeled as a GRU-RNN which gets all the preceding images to decide whether the generation by the Generator is the correct image for the timestep. </p></li>
<li><p>The generator is modeled as a GRU-RNN which tries to generate an image using the preceding images. It is guided by the contextual discriminator to produce real looking images.</p></li>
</ul>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model.png"></p>

<p>The above figure corresponds to our Context-RNN-GAN model, where the generator G and the discriminator D (where D<sub>i</sub> represents its ith timestep snapshot) are both RNNs. G generates an image at every timestep while D receives all the preceding images as context to decide whether the current image output by G is real vs generated for that particular timestep. x<sub>i</sub> are the input images.</p>

<h2>
<a id="impact-of-adversarial-loss-" class="anchor" href="#impact-of-adversarial-loss-" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Impact of Adversarial Loss </h2>

<ul>
<li><p>When using an L-2 loss function, some of the generated images were superimpositions of the component parts and were too cluttered.</p></li>
<li><p>When using an L-1 loss function, although it was sharper than using an L-2 loss, it was missing some components of the actual diagrams.</p></li>
<li><p>A weighted combination of L-1 and adversarial loss (as defined for the context based discriminator model described above) was used for the Context-RNN-GAN model to produce the best results based on empirical evaluation.</p></li>
</ul>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-3.png" width="400px" height="220px"></p>

<h2>
<a id="some-generations-contextual-rnn-gan" class="anchor" href="#some-generations-contextual-rnn-gan" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Some Generations (Contextual-RNN-GAN)</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-10.png"></p>

<hr>

<h2>
<a id="modeling-of-consecutive-timesteps-using-siamese-networks-for-better-accuracy" class="anchor" href="#modeling-of-consecutive-timesteps-using-siamese-networks-for-better-accuracy" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Modeling of Consecutive Timesteps using Siamese Networks for better accuracy</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-13.png"></p>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-14.png"></p>

<hr>

<h2>
<a id="comparison-with-human-performance" class="anchor" href="#comparison-with-human-performance" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Comparison With Human Performance</h2>

<table>
  <tr>
    <th></th>
    <th>College-grade</th>
    <th>10th-grade</th>
  </tr>
  <tr>
    <td>Age range</td>
    <td>20-22</td>
    <td>14-16</td>
  </tr>
  <tr>
    <td>#Students</td>
    <td>21</td>
    <td>48</td>
  </tr>
  <tr>
    <td>Mean</td>
    <td>44.17%</td>
    <td>36.67%</td>
  </tr>
  <tr>
    <td>Std</td>
    <td>16.67%</td>
    <td>17.67%</td>
  </tr>
  <tr>
    <td>Max</td>
    <td>66.67%</td>
    <td>75.00%</td>
  </tr>
  <tr>
    <td>Min</td>
    <td>8.33%</td>
    <td>8.33%</td>
  </tr>
</table>
<p>Context-RNN-GAN with features obtained from Siamese CNN is competitive with humans in 10th grade in the sense that it is able to achieve accuracy of 35.4% when the generated features are compared with the features obtained from actual answer images.
It needs to be noted that humans can see the options to get the best possible overall sequence of six images and hence can select the best choice while our model is just comparing the generations (obtained using sequence of five images in the problem) with options to get the best option. So, we can say that our model is very good generator and comparable to even 10th grade humans. An interesting aspect is that the model is never trained on the correct answers, it is just trained on multiple sequences from the problem images and still performs remarkably well.
</p>

<hr>

<h2>
<a id="interesting-cases" class="anchor" href="#interesting-cases" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Interesting Cases</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-12.png" width="900px" height="440px"></p>

<ul>
<li><p><strong>First Generation</strong> In the first example generation it is interesting to note that the model correctly predicted elements in off diagonal while faltered in the shape of the elements in leading diagonals.</p></li>
<li><p><strong>Second Generation</strong> Second example shows an interesting case whereby the image generated by our model is also plausible (if by symmetry it is considered that first and third the semicircular ring is solid and hence fourth and sixth should be solid) while the actual answer is of course plausible according to the reasoning that the (solid vs hollow) flipped in the first two cases then stayed the same for the next two timesteps. Even more interesting is its analysis of the spatial dynamics of the ball and the semicircular ring which it almost correctly captured.</p></li>
<li><p><strong>Third Generation</strong> Another very interesting case is the generation which it gets correct. However, in this case the answer figure is exactly similar to the second figure in the sequence. Therefore, it is not illustrative of the ability of the model to generate the sequence of the pattern.</p></li>
</ul>

<hr>

<h2>
<a id="application-of-the-model-to-moving-mnist" class="anchor" href="#application-of-the-model-to-moving-mnist" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Application of the model to Moving-MNIST</h2>

<p style="text-align:center;"><img src="https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-11.png" width="900px" height="440px"></p>
<p>The model can be applied to video prediction tasks as illustrated by above figure. The <a href="http://www.cs.toronto.edu/~nitish/unsupervised_video/"  target="_blank">Moving MNIST task</a> consists of videos of two moving MNIST digits and the next frame has to be predicted from the preceding frames.</p>
      <footer class="site-footer">
        <span class="site-footer-owner"><a href="https://github.com/arnabgho/Contextual-RNN-GAN"  target="_blank">Contextual-rnn-gan</a> is maintained by <a href="https://github.com/arnabgho"  target="_blank">arnabgho</a>.</span>

        <span class="site-footer-credits">This page was generated by <a href="https://pages.github.com"  target="_blank">GitHub Pages</a> using the <a href="https://github.com/jasonlong/cayman-theme"  target="_blank">Cayman theme</a> by <a href="https://twitter.com/jasonlong"  target="_blank">Jason Long</a>.</span>
      </footer>

    </section>


  </body>
</html>