Brain Informatics 2010

Toronto, Canada  August 28-30, 2010

 

Special Panel on Computational Neurolinguistics

Tentative Program – Subject to change

To be held Sunday Aug. 29, 16:10-18:10

 

 

 

 

Distributed Time Series Analysis for Studying Brain and Language in Context

Sarah Kenny (speaker), Michael Andric and Steve Small, University of Chicago

 

 

Collecting the human brain's functional responses to experimental stimuli generates enormous data sets for researchers to analyze.  Typical experiments expose a group of participants to between 20 and 40 minutes of stimuli, generating on the order of 70,000 time series per participant, each more than 1000 records long.  Because analysis of these data sets is constrained by the computational power available to a researcher, the types of questions that can be asked of the data, and the experimental designs under which they are collected, are constrained as well.  To free researchers from these constraints, we developed a computational framework that uses relational databases, Grid computing, and the Swift workflow system (Zhao et al., 2007) to manage, analyze, and share neuroimaging data (Small et al., 2009).  Here, we further demonstrate this infrastructure's computational power by describing its use in a time series analysis that would otherwise be infeasible and that, because we are not constrained to conventional experimental methods, allows us to pursue novel questions about the human brain.

Briefly, we are exploring how the human brain functions in natural communication, under conditions that better reflect typical experience than the highly constrained, experimentally controlled exposures conventionally used in neuroimaging studies.  We record participants' blood oxygen level dependent (BOLD) signals as they view continuous videos of a woman talking spontaneously about various topics, a setting closer to typical conversation.  We project all acquired time series from the 3-dimensional volume image to a 2-dimensional surface representation (Fischl et al., 1999; Saad et al., 2004), resulting in 392,004 time series across the brain of an individual participant, and enter these data into a relational database.  Using high performance computing (HPC) in a parallelized workflow, we then query each individual time series to find the positions of extrema ("turning points", either peaks or valleys; for a similar application, see Skipper and Zevin, 2009) across that time series.  At every surface vertex, the number of peaks and valleys corresponding to particular features of interest coded in the stimuli (e.g., specific words, syllables, gestures), as well as the statistical significance of their distribution, is entered into another set of relational tables.  We then query these tables to view the spatial layout of significance as a brain map depicting the areas that show sensitivity to the stimuli of interest.
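The turning-point step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `lag` and `window` parameters, and the toy signal are assumptions introduced here, and the abstract does not say how extrema are matched to stimulus features or which significance test is used.

```python
def turning_points(series):
    """Return (index, kind) for each strict local extremum of the series.

    kind is "peak" or "valley". Endpoints and flat plateaus are ignored.
    """
    points = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            points.append((i, "peak"))
        elif cur < prev and cur < nxt:
            points.append((i, "valley"))
    return points


def count_near_events(points, event_indices, lag=0, window=1):
    """Count peaks and valleys within `window` samples of each event.

    `lag` shifts every event index forward (hypothetical: for example, to
    allow for a delayed BOLD response). Both parameters are illustrative.
    """
    counts = {"peak": 0, "valley": 0}
    targets = [e + lag for e in event_indices]
    for idx, kind in points:
        if any(abs(idx - t) <= window for t in targets):
            counts[kind] += 1
    return counts


# Toy time series standing in for the signal at one surface vertex.
bold = [0.1, 0.4, 0.2, 0.0, 0.3, 0.9, 0.5, 0.6, 0.2]
points = turning_points(bold)
counts = count_near_events(points, event_indices=[1, 5], lag=0, window=0)
```

In the workflow described above, a function like `turning_points` would run once per vertex in parallel, with the per-vertex counts written to relational tables for later querying.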



Fischl, B., Sereno, M.I., and Dale, A.M. (1999). Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage, 9, 195-207.

Saad, Z.S., Reynolds, R.C., Argall, B.D., Japee, S., and Cox, R.W. (2004). SUMA: An interface for surface-based intra- and inter-subject analysis with AFNI. Arlington, VA, IEEE International Symposium on Biomedical Imaging, pp. 1510-1513.

Skipper, J.I., and Zevin, J.D. (2009). The Neurobiology of Communication in Natural Settings.  Paper presented at the Neurobiology of Language Conference, Chicago, IL.

Small, S.L., Wilde, M., Kenny, S., Andric, M., and Hasson, U. (2009).  Database-managed Grid-enabled analysis of neuroimaging data: The CNARI framework.  International Journal of Psychophysiology, 73, 62-72.

Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., and Wilde, M. (2007).  Swift: Fast, reliable, loosely coupled parallel computation.  IEEE Congress on Services, pp. 199-206.

 

 

Capturing Structure in Human Semantic Knowledge via Semantic Features Learned from Topic Models

Francisco Pereira, Princeton University

 

Over the last 15 years, functional magnetic resonance imaging (fMRI) has become the primary tool for identifying the neural correlates of mental activity. Traditionally, this has consisted of finding brain regions that are active during performance of a task. More recently, it has become increasingly clear that there is much more information in the data, though it is often present diffusely over the entire pattern of brain activation rather than in any specific location. The tools of choice for capturing this information have been machine learning classifiers.  Using them, it has been possible to predict which of several stimuli a subject is seeing, a subject's decisions or mistakes, whether a stimulus is recognized or will be remembered and even, controversially, subject deception or pre-conscious purpose. After these successes, interest expanded to discovering how the information present is encoded and to testing scientific hypotheses about that encoding. Early on, this took the shape of dissecting an existing classifier, with awareness of its inductive bias, in order to explain how it made a successful prediction. Of late, it has meant formulating forward models of early visual processing, or of how the meaning of a word is represented in the brain, and testing them by predicting the resulting fMRI activation.  Conversely, it has also been shown to be possible to reconstruct a complex scene from fMRI activation captured while a subject is viewing it.

 

The current research challenge is to extend this type of work to situations where there are no good forward models and little understanding of the computation being done by the brain. In this talk, I will describe the ongoing effort in our research group to use topic models on special text corpora to learn semantic features that capture the structure of human semantic knowledge. Given these models, it becomes possible to decompose the pattern of brain activation elicited by considering the meaning of a word into constituent patterns associated with the presence of each semantic feature. I will show that this approach allows us to predict subject performance in psychological tasks, to classify or predict brain activation in response to novel stimuli, and even to generate text about brain images.
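The decomposition idea can be sketched as a linear model in which the activation pattern for a word is a weighted sum of one basis pattern per semantic feature. This is an illustrative assumption, not a description of the speaker's method: the function names, the cosine-similarity matching rule, and all numbers below are hypothetical, and the basis patterns are given rather than learned.

```python
import math


def predict_pattern(feature_weights, basis_patterns):
    """Predict a voxel pattern as a weighted sum of per-feature basis patterns."""
    n_voxels = len(basis_patterns[0])
    return [
        sum(w * basis[v] for w, basis in zip(feature_weights, basis_patterns))
        for v in range(n_voxels)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(observed, candidates, basis_patterns):
    """Return the candidate word whose predicted pattern best matches `observed`."""
    return max(
        candidates,
        key=lambda word: cosine(observed, predict_pattern(candidates[word], basis_patterns)),
    )


# Hypothetical setup: 2 semantic features, 3 voxels.
basis_patterns = [[1, 0, 0], [0, 1, 1]]
# Hypothetical feature weights per word (e.g., topic proportions from a topic model).
candidates = {"hammer": [0.9, 0.1], "apple": [0.2, 0.8]}
observed = [0.1, 0.9, 0.7]
guess = best_match(observed, candidates, basis_patterns)
```

Because the prediction depends only on a word's feature weights, the same scheme applies to words that were never presented during scanning, which is what makes classification of novel stimuli possible in this kind of model.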

 

 

A Latent Feature Analysis of the Neural Representation of Object Knowledge

Kai-min Kevin Chang, Carnegie Mellon University

 

Computational neurolinguistics is an emerging research area which integrates recent advances in computational linguistics and cognitive neuroscience, with the objective of developing cognitively plausible models of language and gaining a better understanding of the human language system. Advances in computational neurolinguistics require close collaboration between computational linguists and neuroscientists. To assist researchers who are new to this topic, the Center for Cognitive Brain Imaging at Carnegie Mellon University is providing the data used in Mitchell et al. (2008). In an object-contemplation task, participants were presented with 60 line drawings and/or text labels of objects in 12 categories, and were instructed to think of the same properties of the stimulus object consistently during multiple presentations of each item. For each concept there are 6 instances of ~20k brain activity features (brain blood oxygenation levels).

 

In this talk, I will describe the CMU fMRI data set and a new analysis that uses a generative probabilistic model to describe how fMRI-measured brain activity is generated from a latent semantic representation. More specifically, a linear-Gaussian infinite latent feature model (ILFM) with an Indian Buffet Process (IBP) prior can be used to derive a binary feature representation of object knowledge from the brain activity. I show that the semantic features recovered by the ILFM are consistent with human ratings of the shelter, manipulation, and eating factors that are recovered by factor analysis. Furthermore, different areas of the brain encode different psycholinguistic features: the latent features discovered in different brain areas are consistent with existing conjectures regarding the roles of those areas in processing such features.
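The generative side of such a model can be sketched as follows: a binary object-by-feature matrix Z is drawn from the IBP, and activity is generated as X = Z A + Gaussian noise. This is a sketch of sampling from the prior only, not of the inference used in the talk; the function names, the concentration parameter `alpha`, the noise level, and the tiny voxel count are assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from a Poisson distribution with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_objects, alpha, rng):
    """Draw a binary object-by-feature matrix from the Indian Buffet Process."""
    rows = []
    column_counts = []  # number of earlier objects that own each feature
    for i in range(1, n_objects + 1):
        # Reuse an existing feature with probability (owners so far) / i.
        row = [1 if rng.random() < m / i else 0 for m in column_counts]
        # Then introduce Poisson(alpha / i) brand-new features.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        column_counts = [m + z for m, z in zip(column_counts, row)] + [1] * n_new
        rows.append(row)
    n_features = len(column_counts)
    # Pad earlier rows with zeros for features introduced later.
    return [r + [0] * (n_features - len(r)) for r in rows]


def generate_activity(Z, weights, n_voxels, noise_sd, rng):
    """Linear-Gaussian likelihood: each row is z A plus independent Gaussian noise."""
    return [
        [sum(z * a[v] for z, a in zip(z_row, weights)) + rng.gauss(0.0, noise_sd)
         for v in range(n_voxels)]
        for z_row in Z
    ]


rng = random.Random(0)
n_voxels = 5  # tiny stand-in for the ~20k features in the real data
Z = sample_ibp(n_objects=60, alpha=2.0, rng=rng)
A = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)] for _ in range(len(Z[0]))]
X = generate_activity(Z, A, n_voxels, noise_sd=0.1, rng=rng)
```

The number of columns in Z is not fixed in advance but grows with the data, which is what "infinite" refers to in the model's name.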

 

 

Development of Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials

Gwen Frishkoff, Georgia State University

 

 

We describe a first-generation ontology for representation and integration of event-related brain potentials (ERPs). The ontology is designed following OBO “best practices” and is augmented with tools to perform ontology-based labeling and annotation of ERP data, and a database that enables semantically based reasoning over these data. Because certain high-level concepts in the ERP domain are ill-defined, we have developed methods to support coordinated updates to each of these three components. This approach consists of “top-down” (knowledge-driven) design and implementation, followed by “bottom-up” (data-driven) validation and refinement. Our goal is to build an ERP ontology that is logically valid, empirically sound, robust in application, and transparent to users. This ontology will be used to support sharing and meta-analysis of EEG and MEG data collected within our Neural Electromagnetic Ontologies (NEMO) project.