In addition to the annotated bibliography which I generated for the 2010 course, I’ve created a supplement that includes my notes and references for the “controversial hypotheses” paper. In the supplement, you’ll find many relevant papers, a good number of them directly linked to PDF versions so you can quickly check out a paper and decide whether it is relevant to your interests. I compile this document using a Python script and will update it during the quarter. It is offered in the spirit of one of the central tenets of this class, namely, that building tools, and in particular information-mining tools, is a critical ingredient in conducting science today.
There is still a good deal of mystery shrouding exactly how shape is represented in the later stages of the ventral pathway and, in particular, debate about the role of the inferotemporal (IT) area of the ventral pathway. Shimon Ullman has interesting computational theories concerning both the image-coding function of neurons in IT and the role of overlap in constructing compositional features from simpler ones. Stu Geman’s paper [8] addresses a fundamental tradeoff between invariance and selectivity which is hinted at in the Ullman and Soloviev paper [51]. For interesting insight into the Gestalt psychology / psychophysics perspective on how primates perceive spatial and temporal structure, check out the papers by Gepshtein and Kubovy [9, 18] or the work by Jitendra Malik on segmentation, which draws inspiration from the Gestalt spatial and temporal grouping principles. Papers coming out of Manabu Tanifuji’s lab investigate candidate optimal stimuli for IT neurons as well as hypotheses and evidence for how the associated features might be combined to explain responses to more complex stimuli:
A neural code for three-dimensional object shape in macaque inferotemporal cortex [56] (PDF)
Invariance and Selectivity in the Ventral Visual Pathway [8] (PDF)
Representation of the spatial relationship among object parts by neurons in macaque inferotemporal cortex [55] (PDF)
Visual Features of Intermediate Complexity and their use in Classification [52] (PDF)
Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns [50] (PDF)
Perceptual grouping in space and in space-time: An exercise in phenomenological psychophysics [18] (PDF)
Computation of pattern invariance in brain-like structures [51] (PDF)
We still do not have an adequate explanation for the extensive feedback connections that originate in extrastriate regions and terminate in V1, nor do we completely understand the role of lateral connections within V1. The following papers explore some of the related issues and posit functional roles for these less-well-studied striate circuits:
Surround Suppression of V1 Neurons Mediates Orientation-Based Representation of High-Order Visual Features [48]
Contour and boundary detection improved by surround suppression [11]
The contribution of feedforward, lateral and feedback connections to the classical receptive field center and extra-classical receptive field surround of primate V1 neurons [26]
It has become common in the practice of machine learning involving visual data either to whiten the training data or to perform an operation called local contrast normalization. There is some evidence that the latter operation is carried out by a nonlinear transformation called divisive normalization, which is also implicated in surround suppression. Here are two classic papers investigating divisive normalization, followed by a short sketch of the computation itself:
Natural image statistics and divisive normalization: Modeling nonlinearity and adaptation in cortical neurons [53]
Local contrast in natural images: normalisation and coding efficiency [3]
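To make the operation concrete, here is a minimal NumPy sketch of divisive normalization applied to an image patch. The Gaussian neighborhood width and the semi-saturation constant are illustrative choices of mine, not parameters taken from either paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def divisive_normalization(image, sigma=0.1, neighborhood=2.0):
    """Divide each response by a measure of local contrast energy.

    The local energy is a Gaussian-weighted sum of squared responses;
    sigma is a semi-saturation constant that keeps the denominator
    away from zero in low-contrast regions.
    """
    energy = gaussian_filter(image ** 2, neighborhood)
    return image / np.sqrt(sigma ** 2 + energy)

# Example: normalize a random "image"; response magnitudes are
# roughly equalized across high- and low-contrast regions.
rng = np.random.default_rng(0)
patch = rng.normal(size=(64, 64))
normalized = divisive_normalization(patch)
```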
Here are a few influential papers that don’t fit easily into a single category but deserve to be singled out for their impact on the field. The paper by Peter Földiák influenced Wiskott and Sejnowski’s slow feature analysis [54] (sketched in code after these references) and played a key role in a number of practical applications, including the high-throughput method of Pinto et al [39], which we note elsewhere in this bibliography. The work by Riesenhuber and Poggio gave rise to what is called, at least by the authors, the standard model. A variant of this model was implemented by Thomas Serre and compared with state-of-the-art computer vision algorithms:
Hierarchical Bayesian Inference in the Visual Cortex [24]
Hierarchical models of object recognition in cortex [42] (PDF)
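For a concrete sense of what slow feature analysis [54] computes, here is a minimal linear SFA sketch: whiten the input signals, then find the directions in which the whitened signal changes most slowly over time. The toy signal is purely illustrative:

```python
import numpy as np

def linear_sfa(x, n_features=1):
    """Linear slow feature analysis on a (time, dims) signal.

    Whiten the centered signal, then take the eigenvectors of the
    covariance of its time derivative with the smallest eigenvalues:
    these are the directions of slowest variation.
    """
    x = x - x.mean(axis=0)
    # Whitening via eigendecomposition of the covariance matrix.
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    whitener = vecs / np.sqrt(vals)
    z = x @ whitener
    # Covariance of the temporal derivative of the whitened signal.
    dz = np.diff(z, axis=0)
    dvals, dvecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    # Smallest eigenvalues correspond to the slowest features.
    return whitener @ dvecs[:, :n_features]

# Toy example: a slow sinusoid hidden in a fast, noisy mixture.
t = np.linspace(0, 4 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(37 * t)])
mixed = sources @ np.random.default_rng(1).normal(size=(2, 2))
w = linear_sfa(mixed)
slow = mixed @ w  # recovers the slow sinusoid up to scale and sign
```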
In the second lecture, I mentioned a number of projects and startups interested in building large-scale models of the cortex that operate at the individual cell level. The Blue Brain project at EPFL is directed by Henry Markram and is aimed at modeling all of the cells in a single cortical column. Eugene Izhikevich and Gerald Edelman at the Neurosciences Institute have developed several models that use the leaky-integrate-and-fire model of neural activation and the spike-timing-dependent-plasticity model of learning; a minimal sketch of both appears after these references. Eugene’s NSI web page has a number of interesting papers which include the details of his large-scale simulations. A presentation that covers both the leaky-integrate-and-fire and the spike-timing-dependent-plasticity models and summarizes one or two of Eugene Izhikevich’s papers would be interesting and is encouraged. There is less recently published work from Paul Rhodes since he started Evolved Machines, but you can find some of his early work on thalamo-cortical relays; David Mumford’s papers on the interaction between the cortex and thalamus might provide additional insight into this interesting topic:
Large-scale model of mammalian thalamo-cortical systems [13]
The blue brain project [27]
A model of thalamocortical relay cells [41]
On the computational architecture of the neocortex II: The role of cortico-cortical loops [30]
On the computational architecture of the neocortex I: The role of the thalamo-cortical loop [29]
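Here is a minimal sketch of the two building blocks mentioned above: a leaky-integrate-and-fire neuron and a pair-based spike-timing-dependent-plasticity rule. The membrane parameters and learning constants are illustrative defaults, not values taken from any of the cited models:

```python
import numpy as np

def simulate_lif(current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_reset=-70.0, v_threshold=-50.0, resistance=10.0):
    """Leaky integrate-and-fire: integrate input current with a leak
    toward the resting potential; emit a spike and reset on threshold.

    dv/dt = (-(v - v_rest) + R * I(t)) / tau
    """
    v = v_rest
    voltages, spikes = [], []
    for i in current:
        v += dt * (-(v - v_rest) + resistance * i) / tau
        if v >= v_threshold:
            spikes.append(len(voltages))  # record spike time (step)
            v = v_reset
        voltages.append(v)
    return np.array(voltages), spikes

def stdp_update(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate when the presynaptic spike precedes
    the postsynaptic spike (delta_t = t_post - t_pre > 0, in ms),
    depress otherwise; magnitude decays with the timing difference."""
    if delta_t > 0:
        return a_plus * np.exp(-delta_t / tau)
    return -a_minus * np.exp(delta_t / tau)

# Constant 2 nA input for 100 ms produces a regular spike train.
voltages, spike_times = simulate_lif(np.full(1000, 2.0))
```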
Mammals, and primates in particular, are the product of millions of years of natural selection in which functions of the peripheral nervous system were incorporated into the cortex, a process called encephalization. There is still a great deal of processing that happens in the eye and ear, and the following paper is a classic on the processing carried out in the frog’s eye and a lesson to us all in the degree to which visual representations are dictated by the demands of the organism and not by any abstract principle that can be applied across sensory modalities. Of course, there’s always the possibility that the brain only hit upon the right abstract principle relatively late in the vast time scale of natural selection:
What the Frog’s Eye Tells the Frog’s Brain [25]
Here is a small sample of work on analyzing fMRI data as a means of inferring what an experimental subject is looking at, or thinking about in the case of a word that evokes a visual memory. Some of this work has been exaggerated in the popular press, with newspaper articles using phrases like “mind reading” to dramatize the experimental studies, but it is very interesting for what it says about patterns of neural activity that are common across different subjects and the same subjects at different times. A sketch of the basic decoding idea follows these references:
Predicting Human Brain Activity Associated with the Meanings of Nouns [28]
Identifying natural images from human brain activity [16]
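At its core, this kind of study trains a classifier to map voxel activity patterns to stimulus labels and tests it on held-out trials. Here is a minimal sketch of that idea using scikit-learn; the data are synthetic placeholders, not anything resembling the actual recordings in these papers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for fMRI data: 200 trials x 500 voxels, with a
# small subset of voxels weakly informative about a binary stimulus.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
voxels = rng.normal(size=(200, 500))
voxels[:, :20] += 0.5 * labels[:, None]  # signal in the first 20 voxels

# Cross-validated accuracy well above 50% indicates the activity
# pattern carries information about the stimulus.
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, voxels, labels, cv=5).mean()
print(f"decoding accuracy: {accuracy:.2f}")
```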
Pawan Sinha and his colleagues wrote the first article below to inspire computer vision researchers interested in face recognition to look at what is known in cognitive neuroscience about human face recognition. The next two articles by Michael Tarr and Heinrich Bülthoff are similar in their intent, and the final article by Marty Banks and P. Bennett is a classic that attempts to explain the developmental mechanism that limits visual acuity in infants. I include this last paper to pique your curiosity about why neonates might have such a deficit and whether it might confer some advantage, say, in learning a primitive shape vocabulary:
Face Recognition by Humans: Nineteen Results All Computer Vision Researchers Should Know About [45] (PDF)
Visual Object Recognition: Can a Single Mechanism Suffice? [Tarr03] (PDF)
Image-based object recognition in man, monkey and machine [49]
Optical and photoreceptor immaturities limit the spatial and chromatic vision of human neonates [1]
Here is a sample of papers by Bruno Olshausen, his colleagues (David Field and David Van Essen are of special note) and his students. In addition to his seminal work with Field on sparse coding as a model of simple cells in V1 [33], he has also applied sparse coding to representing video [36, 4], and, building on earlier work of Geoff Hinton, he has looked at the possibility of neural circuits that transform retinal patches into a standard scale and pose [32]. If you become interested in sparse coding, you might seriously consider looking into the work of Horace Barlow [2], which inspired the efficient coding hypothesis. A minimal sketch of sparse coding follows these references:
Learning Transformational Invariants from Time-Varying Natural Images [4]
Learning Horizontal Connections in a Sparse Coding Model of Natural Images [7]
Learning invariant and variant components of time-varying natural images [36]
The recognition of partially visible natural objects in the presence and absence of their occluders [15]
Learning Sparse, Overcomplete Representations of Time-varying Natural Images [35]
Processing Shape, Motion and Three-dimensional Shape-from-motion in the Human Cortex [31] (HTML)
Sparse coding with an overcomplete basis set: A strategy employed by V1? [33]
A neurobiological model of visual attention and pattern recognition based on dynamic routing of information [32]
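To make the sparse coding objective concrete, here is a minimal sketch that infers sparse coefficients for a signal given a fixed dictionary, using iterative shrinkage-thresholding (ISTA). The full Olshausen-Field model also learns the dictionary by gradient descent on the same objective; this sketch keeps the dictionary fixed for brevity, and the toy problem is illustrative:

```python
import numpy as np

def ista(signal, dictionary, sparsity=0.1, n_iters=200):
    """Infer sparse coefficients a minimizing
        (1/2) * ||signal - dictionary @ a||**2 + sparsity * ||a||_1
    via iterative shrinkage-thresholding (ISTA)."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2
    a = np.zeros(dictionary.shape[1])
    for _ in range(n_iters):
        # Gradient step on the reconstruction error.
        residual = dictionary @ a - signal
        a = a - step * dictionary.T @ residual
        # Soft threshold: most coefficients are driven exactly to zero.
        a = np.sign(a) * np.maximum(np.abs(a) - step * sparsity, 0.0)
    return a

# Toy example: recover a 3-sparse code from a random dictionary.
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
truth = np.zeros(256)
truth[rng.choice(256, 3, replace=False)] = 1.0
code = ista(D @ truth, D)               # should be close to truth
```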
Geoff Hinton is generally credited with the development of so-called deep belief networks, and his most recent work on restricted Boltzmann machines is cited whenever deep networks are invoked. However, there have been many hierarchical models developed over the years, and they employ diverse learning and inference algorithms. One recurring theme is the idea of using a great deal of unlabeled data to perform unsupervised learning of the layers, one layer at a time; a minimal sketch of this greedy layer-wise strategy follows these references. Yann LeCun’s work on convolutional networks has been particularly influential for its parsimonious use of parameters. I’ve also included work that was described by Jim DiCarlo in his presentation at the Clark Center back in March. Note that the paper by Jarrett et al [14] looks more deeply at the architectural components in shallow (one or two layer) convolutional networks, with some interesting conclusions in light of the Pinto et al [39] results. Here is a sampling of papers on deep networks, including Geoff’s recent Science article and an early paper by LeCun on convolutional networks:
A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation [39] (HTML)
What is the Best Multi-Stage Architecture for Object Recognition? [14] (PDF)
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations [23] (PDF)
Why is Real-World Visual Object Recognition Hard? [38]
Learning a non-linear embedding by preserving class neighbourhood structure [43]
Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition [40]
Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting [21]
Handwritten Zip Code Recognition with Multilayer Networks [20]
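Here is a minimal sketch of the greedy layer-wise idea using tied-weight autoencoders trained on reconstruction error; the deep belief networks in Hinton’s work instead stack restricted Boltzmann machines trained by contrastive divergence, but the layer-by-layer recipe is the same. All sizes and hyperparameters here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(data, n_hidden, lr=0.1, n_epochs=50):
    """Train one tied-weight autoencoder layer by gradient descent on
    squared reconstruction error; return the encoder weights."""
    rng = np.random.default_rng(0)
    w = 0.01 * rng.normal(size=(data.shape[1], n_hidden))
    for _ in range(n_epochs):
        h = sigmoid(data @ w)          # encode
        recon = h @ w.T                # decode with tied weights
        err = recon - data
        # Gradient of the squared error w.r.t. the shared weights,
        # summing the encoder and decoder paths.
        dh = (err @ w) * h * (1 - h)
        w -= lr * (data.T @ dh + err.T @ h) / len(data)
    return w

def greedy_pretrain(data, layer_sizes):
    """Stack layers, training each on the previous layer's codes."""
    weights = []
    for n_hidden in layer_sizes:
        w = train_autoencoder_layer(data, n_hidden)
        weights.append(w)
        data = sigmoid(data @ w)  # feed codes forward, unsupervised
    return weights

# Toy example: pretrain a 64-32-16 stack on random "patches".
patches = np.random.default_rng(1).random((500, 64))
stack = greedy_pretrain(patches, [32, 16])
```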
I mentioned in the introductory lecture that Eugene Charniak wrote a now-classic and still frequently cited and read introduction to Bayesian networks, which constitute an important class of probabilistic graphical models. I’ve included both the reference and a link to the document on Kevin Murphy’s web site at the University of British Columbia; Kevin has written a somewhat longer and more comprehensive survey paper that also covers what are now called dynamic Bayesian networks. The original work by Judea Pearl that introduced the basic idea of Bayesian networks as well as their mathematical and algorithmic foundations is still an excellent introduction to the subject. By far the most comprehensive, mathematically detailed and up-to-date treatment of the field is the book by Koller and Friedman. I have also included two papers by Erik Sudderth which I believe represent some of the most interesting graphical models of visual representation and which would be interesting to explore in terms of their utility for modeling primate vision. The models by Sudderth are particularly interesting for their use of stochastic processes which are capable of adapting the complexity of the graphical models to accommodate the data. If you want a deeper understanding of the energy-based graphical models discussed in the work of Lee et al [22, 23], take a look at the LeCun et al tutorial [19]. You’ll also find a couple of recent papers on inferring scene layout which address some basic forms of visual inference that are critical in recognizing objects and establishing a context from which to draw further inference. A small worked example of a Bayesian network appears after these references:
Probabilistic Graphical Models: Principles and Techniques [17]
Decomposing a Scene into Geometric and Semantically Consistent Regions [10]
Describing Visual Scenes Using Transformed Objects and Parts [47] (PDF)
Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes [46] (PDF)
A Brief Introduction to Graphical Models and Bayesian Networks (PDF)
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference [37]
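To ground the idea, here is a small worked example of a Bayesian network and inference by enumeration, using the textbook rain/sprinkler/wet-grass network; the conditional probability tables are illustrative numbers, and real applications would use the message-passing algorithms developed by Pearl [37] rather than brute-force enumeration:

```python
import itertools

# A three-node Bayesian network: Rain -> Sprinkler, Rain -> WetGrass,
# Sprinkler -> WetGrass. CPT numbers are illustrative.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # given rain
               False: {True: 0.4, False: 0.6}}    # given no rain
p_wet = {(True, True): 0.99, (True, False): 0.9,  # keys: (rain, sprinkler)
         (False, True): 0.9, (False, False): 0.01}

def joint(rain, sprinkler, wet):
    """The joint probability factorizes along the network's edges."""
    p = p_rain[rain] * p_sprinkler[rain][sprinkler]
    return p * (p_wet[(rain, sprinkler)] if wet
                else 1 - p_wet[(rain, sprinkler)])

# Inference by enumeration: P(Rain = true | WetGrass = true) is the
# ratio of two sums over the unobserved variable settings.
numer = sum(joint(True, s, True) for s in (True, False))
denom = sum(joint(r, s, True)
            for r, s in itertools.product((True, False), repeat=2))
print(f"P(rain | wet grass) = {numer / denom:.3f}")
```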