Contextual-RNN-GAN/params.json at master · arnabgho/Contextual-RNN-GAN · GitHub

{
  "name": "Contextual-rnn-gan",
  "tagline": "Contextual RNN-GANs for Abstract Reasoning Diagram Generation",
  "body": "# CONTEXTUAL RNN-GAN\r\n\r\nArnab Ghosh<sup>*</sup>, Viveka Kulharia<sup>*</sup>, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal\r\n\r\n<sup>*</sup> Equal contribution\r\n\r\nProject page for the paper [Contextual RNN-GANs for Abstract Reasoning Diagram Generation](https://arxiv.org/abs/1609.09444)\r\n\r\nThe Task\r\n---\r\n\r\n* Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence.\r\n\r\n* Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting, simulation, or video generation.\r\n\r\n* Diagrammatic Abstract Reasoning is a setting in which diagrams evolve in complex patterns, and one must infer the underlying pattern and generate the next image in the sequence.\r\n\r\nAn Example with an Explanation\r\n------\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-6.png\" width=\"850px\" height=\"180px\"/>\r\n\r\nThe ground truth can be explained as follows: the dashed line first appears on the left, then on the right, and then on both sides, and it also changes from single to double. Hence the ground truth should have double dashed lines on both sides. 
On the corners, the number of slanted lines increases by one after every two images, hence the ground truth should have four slanted lines on both corners.\r\n\r\nSome More Example Problems From DAT-DAR Dataset\r\n------\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-8.png\" width=\"900px\" height=\"440px\"/>\r\n\r\n---\r\n\r\nThe Model\r\n---\r\n\r\nContextual RNN-GAN\r\n----\r\n* GANs have been shown to be useful in several image generation and manipulation tasks, so they were a natural choice to counter the model's tendency to produce fuzzy generations.\r\n\r\n* Contextual RNN-GAN uses all the images preceding the current timestep as the context for that timestep.\r\n\r\n* The discriminator is modeled as a GRU-RNN that receives all the preceding images and decides whether the generator's output is a correct image for the timestep.\r\n\r\n* The generator is modeled as a GRU-RNN that uses the preceding images to generate a realistic image for the particular timestep, guided by the contextual discriminator toward better-looking images.\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model.png\" width=\"500px\" height=\"440px\"/>\r\n\r\nImpact of Adversarial Loss\r\n-----\r\n\r\n* With an L-2 loss function, some of the generated images were superimpositions of the component parts and were too cluttered.\r\n\r\n* With an L-1 loss function, the generated images were sharper than with an L-2 loss but were missing some components of the actual diagrams.\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-3.png\" width=\"400px\" height=\"220px\"/>\r\n\r\nSome Generations (Contextual-RNN-GAN)\r\n----\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-7.png\" width=\"900px\" height=\"300px\"/>\r\n\r\n---\r\nModeling of Consecutive Timesteps using Siamese Networks for better 
accuracy\r\n-----\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-4.png\" width=\"900px\" height=\"440px\"/>\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-5.png\" width=\"900px\" height=\"440px\"/>\r\n\r\n---\r\n\r\nInteresting Cases\r\n------\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-9.png\" width=\"900px\" height=\"440px\"/>\r\n\r\n* **First Generation** In the first example, the model correctly predicted the elements on the off-diagonal but faltered on the shape of the elements on the leading diagonal.\r\n\r\n* **Second Generation** The second example shows an interesting case in which the image generated by our model is also plausible (if, by symmetry, the semicircular ring is solid in the first and third images, then it should also be solid in the fourth and sixth), while the actual answer is plausible under the reasoning that the ring (solid vs. hollow) flipped in the first two cases and then stayed the same for the next two timesteps. Even more interesting is the model's handling of the spatial dynamics of the ball and the semicircular ring, which it captured almost correctly.\r\n\r\n* **Third Generation** In the third example, the model's generation is correct. However, the answer figure is identical to the second figure in the sequence, so this case does not illustrate the model's ability to continue the pattern of the sequence.\r\n\r\n---\r\n\r\nApplication of the model to Moving-MNIST\r\n-----\r\n\r\n<img src=\"https://arnabgho.github.io/Contextual-RNN-GAN/images/RNN-GAN%20model-2.png\" width=\"900px\" height=\"440px\"/>",
  "note": "Don't delete this file! It's used internally to help with page regeneration."
}