Y. Karklin, C. Ekanadham, and E. P. Simoncelli,
Hierarchical spike coding of sound,
Adv in Neural Information Processing Systems (NIPS), 2012.
[pdf] [abstract] [bibtex]
Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.
@InProceedings{Karklin-Ekanadham-Simoncelli-NIPS2012,
author = "Karklin, Y. and Ekanadham, C. and Simoncelli, E. P.",
title = "Hierarchical spike coding of sound",
booktitle = "Advances in Neural Information Processing Systems (NIPS*12)",
volume = "25",
publisher = {{MIT} Press},
editor = {P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger},
pages = {3041--3049},
year = {2012},
}
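The first-layer representation described in the abstract above, kernels placed precisely in time with signed amplitudes, can be caricatured with greedy matching pursuit. This is an illustrative sketch only: the Gabor-like kernels, toy signal, and fixed spike count below are hypothetical stand-ins, not the paper's learned dictionary or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gabor-like kernels, stand-ins for the learned
# time-frequency kernels in the paper.
t = np.linspace(-1, 1, 65)
kernels = np.stack([
    np.exp(-t**2 / 0.1) * np.cos(2 * np.pi * 4 * t),
    np.exp(-t**2 / 0.1) * np.cos(2 * np.pi * 8 * t),
])
kernels /= np.linalg.norm(kernels, axis=1, keepdims=True)

# Toy signal: two kernel instances planted at known times, plus noise.
signal = np.zeros(512)
signal[100:165] += 1.5 * kernels[0]
signal[300:365] -= 0.8 * kernels[1]
signal += 0.01 * rng.standard_normal(512)

def matching_pursuit(x, kernels, n_spikes):
    """Greedy sparse coding: each 'spike' is (kernel id, time, amplitude)."""
    residual = x.copy()
    spikes = []
    for _ in range(n_spikes):
        # Correlate the residual with every kernel at every time shift.
        corr = np.stack([np.correlate(residual, k, mode='valid')
                         for k in kernels])
        ki, ti = np.unravel_index(np.abs(corr).argmax(), corr.shape)
        amp = corr[ki, ti]
        residual[ti:ti + kernels.shape[1]] -= amp * kernels[ki]
        spikes.append((ki, ti, amp))
    return spikes, residual

spikes, residual = matching_pursuit(signal, kernels, n_spikes=2)
print(spikes)  # recovers the two planted (kernel, time, amplitude) events
```

The point of the sketch is only the form of the code: each event is a (kernel, time, amplitude) triple rather than a spectrogram pixel, which is what makes the representation a "spike code".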
Y. Karklin and E. P. Simoncelli,
Efficient coding of natural images and movies with populations of noisy nonlinear neurons,
Computational and Systems Neuroscience (CoSyNe), 2012.
[abstract] [bibtex]
Efficient coding provides a powerful principle for explaining early sensory processing. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded localized oriented filters (e.g., Bell and Sejnowski, 1995). Although this is generally consistent with cortical representations, it fails to account for basic properties of early vision, such as the receptive field organization, temporal dynamics, and nonlinear behaviors in retinal ganglion cells (RGCs). Here we show that an efficient coding model that incorporates ingredients critical to biological computation -- input and output noise, nonlinear response functions, and a metabolic cost on the firing rate -- can predict several basic properties of retinal processing. Specifically, we develop numerical methods for simultaneously optimizing linear filters and response nonlinearities of a population of model neurons so as to maximize information transmission in the presence of noise and metabolic costs. We place no restrictions on the form of the linear filters, and assume only that the nonlinearities are monotonically increasing.
In the case of vanishing noise, our method reduces to a generalized version of independent component analysis; training on natural image patches produces localized oriented filters and smooth nonlinearities. When the model includes biologically realistic levels of noise, the predicted filters are center-surround and the nonlinearities are rectifying, consistent with properties of RGCs. The model yields two populations of neurons, with On- and Off-center responses, which independently tile the visual space. As observed in the primate retina, Off-center neurons are more numerous and have filters with smaller spatial extent. Applied to natural movies, the model yields filters that are approximately space-time separable, with a center-surround spatial profile, a biphasic temporal profile, and a surround response that is slightly delayed relative to the center, consistent with retinal processing.
@InProceedings{Karklin-Simoncelli-COSYNE2012,
author = "Y. Karklin and E. P. Simoncelli",
title = "Efficient coding of natural images and movies with populations of noisy nonlinear neurons",
booktitle = "Computational and Systems Neuroscience (CoSyNe)",
address = "Salt Lake City, Utah",
month = "February",
year = {2012},
}
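The optimization described in the abstract above can be caricatured in a few lines: one scalar "neuron", a monotone nonlinearity parameterized by positive increments, a histogram estimate of mutual information, and a naive greedy coordinate search on the info-minus-cost objective. All sizes, noise levels, the cost weight, and the search procedure are illustrative choices, not the authors' numerical methods.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen stimulus and noise samples, so the objective below is a
# deterministic function of the parameters.  All values are arbitrary
# illustrative choices.
s = rng.standard_normal(4000)            # scalar "filtered" input
n_in = 0.3 * rng.standard_normal(4000)   # input noise
n_out = 0.3 * rng.standard_normal(4000)  # output noise
lam = 0.05                               # metabolic cost per unit rate

def response(theta):
    """Monotone nonlinearity: cumulative positive increments over input
    bins, applied to the noisy input, plus output noise."""
    u = s + n_in
    edges = np.linspace(-3, 3, len(theta) + 1)
    g = np.concatenate([[0.0], np.cumsum(np.exp(theta))])  # increasing by construction
    idx = np.clip(np.digitize(u, edges), 0, len(theta))
    return g[idx] + n_out

def objective(theta):
    """Histogram estimate of I(s; r), minus a metabolic cost on the rate."""
    r = response(theta)
    joint, _, _ = np.histogram2d(s, r, bins=16)
    p = joint / joint.sum()
    ps, pr = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (ps @ pr)[nz]))
    return mi - lam * r.mean()

theta = np.zeros(8)
f0 = objective(theta)
# Naive greedy coordinate search: keep a step only if the objective improves.
for _ in range(30):
    i = rng.integers(len(theta))
    for step in (0.2, -0.2):
        trial = theta.copy()
        trial[i] += step
        if objective(trial) > objective(theta):
            theta = trial
            break
print(objective(theta) - f0)  # non-negative by construction
```

Freezing the noise samples makes the objective deterministic, so the greedy search can never decrease it; the paper's actual methods optimize filters and nonlinearities jointly over image ensembles.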
Y. Karklin and E. P. Simoncelli,
Efficient coding of natural images with a population of noisy linear-nonlinear neurons,
Adv in Neural Information Processing Systems (NIPS), 2011.
[pdf] [abstract] [bibtex]
Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients -- input and output noise, nonlinear response functions, and a metabolic cost on the firing rate -- predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear "contrast" function; in this case, the optimal filters are localized and oriented.
@InProceedings{Karklin-Simoncelli-NIPS11,
author = "Karklin, Yan and Simoncelli, Eero P.",
title = "Efficient coding of natural images with a population of noisy linear-nonlinear neurons",
booktitle = "Advances in Neural Information Processing Systems (NIPS*11)",
volume = "24",
editor = "J. Shawe-Taylor and R.S. Zemel and P. Bartlett and F. Pereira and K.Q. Weinberger",
publisher = {{MIT} Press},
year = {2011},
}
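The abstract notes that in the absence of noise the method reduces to a generalized version of independent components analysis. The classical natural-gradient ICA update, with a fixed tanh score function standing in for the learned nonlinearity, illustrates that limiting case on synthetic sparse sources; the mixing matrix and learning-rate schedule here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent Laplacian (sparse) sources, linearly mixed.
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Classical natural-gradient ICA with a fixed tanh score function (a
# generic choice for sparse sources, not the learned nonlinearity).
W = np.eye(2)
for _ in range(2000):
    Y = W @ X
    dW = (np.eye(2) - np.tanh(Y) @ Y.T / X.shape[1]) @ W
    W += 0.05 * dW

P = W @ A  # approaches a scaled permutation matrix when sources are recovered
print(np.round(P, 2))
```

When unmixing succeeds, each row of `W @ A` has one dominant entry, meaning each output recovers exactly one source up to scale and sign.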
Y. Karklin and E. P. Simoncelli,
Optimal information transfer in a noisy nonlinear neuron,
Computational and Systems Neuroscience (CoSyNe), 2011.
[poster pdf]
Y. Karklin and M. S. Lewicki,
Is early vision optimized for extracting higher-order dependencies?,
Adv in Neural Information Processing Systems (NIPS), 2006.
[pdf] [abstract] [bibtex]
Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures and other generative statistical models, can capture higher-order regularities in natural images and explain non-linear aspects of neural processing such as normalization and context effects. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.
@InProceedings{Karklin-Lewicki-NIPS05,
author = "Karklin, Yan and Lewicki, Michael S.",
title = "Is Early Vision Optimized for Extracting Higher-order Dependencies?",
booktitle = "Advances in Neural Information Processing Systems (NIPS*05)",
volume = "18",
editor = "Y. Weiss and B. Sch\"{o}lkopf and J. Platt",
publisher = {{MIT} Press},
year = {2005},
}
Y. Karklin and M. S. Lewicki,
A model for learning variance components of natural images,
Adv in Neural Information Processing Systems (NIPS), 2003.
[pdf] [abstract] [bibtex]
We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higher-order image structure and provides a way to learn coarse-coded, sparse-distributed representations of abstract image properties such as object location, scale, and texture.
@InProceedings{Karklin-Lewicki-NIPS03,
author = "Karklin, Yan and Lewicki, Michael S.",
title = "A Model for Learning Variance Components of Natural Images",
booktitle = "Advances in Neural Information Processing Systems 15",
publisher = {{MIT} Press},
pages = "1367-1374",
year = {2003},
}