Brain Informatics 2010
Toronto, Canada, August 28-30, 2010
Special Panel on Computational Neurolinguistics
Tentative Program (subject to change)
To be held Sunday Aug. 29, 16:10-18:10
Distributed Time Series Analysis for Studying Brain and Language in Context
Sarah Kenny (speaker), Michael Andric and Steve Small, University of Chicago
Collecting the human brain's functional responses to experimental stimuli
generates enormous data sets for researchers to analyze. A typical experiment
exposes a group of participants to between 20 and 40 minutes of stimuli,
generating on the order of 70,000 time series, each more than 1000 records long,
per participant. Because analysis of these data sets is constrained by the
computational power available to a researcher, so too are the types of questions
that can be asked of the data and the experimental designs under which they are
collected. To free researchers from these constraints, we developed a
computational framework that uses relational databases, Grid computing, and the
Swift workflow system (Zhao et al., 2007) to manage, analyze, and share
neuroimaging data (Small et al., 2009). Here, we further demonstrate this
infrastructure's computational power by describing its use in a time series
analysis that would otherwise be infeasible and that, because it is not bound to
conventional experimental methods, allows us to pursue novel questions about the
human brain.
Briefly, we are exploring how the human brain functions in natural
communication, under conditions that better reflect typical experience than
the highly constrained, experimentally controlled exposures conventionally used
in neuroimaging studies. We record participants' blood oxygen level dependent
(BOLD) signals as they view continuous videos of a woman talking spontaneously
about various topics, which more closely resembles typical conversation. We
project all acquired time series from the 3-dimensional volume image to a
2-dimensional surface representation (Fischl et al., 1999; Saad et al., 2004),
resulting in 392,004 time series across the brain of an individual participant,
and enter these data into a relational database. Using high performance
computing (HPC) in a parallelized workflow, we then query each individual time
series to find the positions of its extrema ("turning points", either peaks or
valleys; for a similar application, see Skipper et al., 2009). At every surface
vertex, the number of peaks and valleys corresponding to particular features of
interest coded in the stimuli (e.g., specific words, syllables, gestures) and
the statistical significance of their distribution are entered into another set
of relational tables. We then query these tables to view the spatial layout of
significance as a brain map depicting the areas that show sensitivity to the
stimuli of interest.
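The turning-point step described above can be illustrated with a minimal sketch. It is not the authors' Swift/Grid workflow: the in-memory SQLite database, the table name `extrema_counts`, the `lag` and `window` parameters, and the toy data are all assumptions made for illustration, and the significance test mentioned in the abstract is left out.

```python
import sqlite3
from typing import List, Sequence, Tuple


def turning_points(series: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return indices of strict local peaks and valleys of a 1-D series."""
    peaks: List[int] = []
    valleys: List[int] = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            valleys.append(i)
    return peaks, valleys


def count_near_events(extrema: Sequence[int], events: Sequence[int],
                      lag: int, window: int) -> int:
    """Count events with at least one extremum within `window` samples of event + lag."""
    return sum(
        any(abs(e - (t + lag)) <= window for e in extrema)
        for t in events
    )


# Toy data: two surface vertices, nine samples each (values are made up).
vertex_series = {
    0: [0.0, 1.0, 0.5, 0.2, 0.9, 1.4, 0.3, 0.1, 0.8],
    1: [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4, 0.5],
}
word_onsets = [0, 3]  # hypothetical sample indices where a coded word occurs

# Assumed storage: an in-memory SQLite table, one row per vertex.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE extrema_counts ("
    "vertex INTEGER PRIMARY KEY, n_peaks INTEGER, n_valleys INTEGER)"
)
for vertex, series in vertex_series.items():
    peaks, valleys = turning_points(series)
    conn.execute(
        "INSERT INTO extrema_counts VALUES (?, ?, ?)",
        (
            vertex,
            count_near_events(peaks, word_onsets, lag=1, window=1),
            count_near_events(valleys, word_onsets, lag=1, window=1),
        ),
    )

# Query the table for vertices where a peak follows every coded event.
rows = conn.execute(
    "SELECT vertex FROM extrema_counts WHERE n_peaks = ? ORDER BY vertex",
    (len(word_onsets),),
).fetchall()
responsive_vertices = [r[0] for r in rows]
```

Because each vertex is processed independently, the loop over vertices is the part that a parallel workflow would distribute across compute nodes.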
Fischl, B., Sereno, M.I., and Dale, A.M. (1999). Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage, 9, 195-207.
Saad, Z.S., Reynolds, R.C., Argall, B.D., Japee, S., and Cox, R.W. (2004). SUMA: An interface for surface-based intra- and inter-subject analysis with AFNI. IEEE International Symposium on Biomedical Imaging, Arlington, VA, pp. 1510-1513.
Skipper, J.I., and Zevin, J.D. (2009). The Neurobiology of Communication in Natural Settings. Paper presented at the Neurobiology of Language Conference, Chicago, IL.
Small, S.L., Wilde, M., Kenny, S., Andric, M., and Hasson, U. (2009). Database-managed Grid-enabled analysis of neuroimaging data: The CNARI framework. International Journal of Psychophysiology, 73, 62-72.
Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., and Wilde, M. (2007). Swift: Fast, reliable, loosely coupled parallel computation. IEEE Congress on Services, pp. 199-206.
Capturing Structure in Human Semantic Knowledge via Semantic Features Learned from Topic Models
Francisco Pereira, Princeton University
Over the last 15 years, functional magnetic resonance imaging (fMRI) has become the primary tool for identifying the neural correlates of mental activity. Traditionally, this consisted of finding brain regions active during performance of a task. More recently, it has become increasingly clear that there is much more information in the data, though it is often present diffusely over the entire pattern of brain activation rather than in any specific location. The tools of choice for capturing this information have been machine learning classifiers. Using them, it has been possible to predict which of several stimuli a subject is seeing, a subject's decisions or mistakes, whether a stimulus is recognized or will be remembered and even, controversially, subject deception or pre-conscious purpose. After these successes, interest expanded to discovering how the information present is encoded and to testing scientific hypotheses about that encoding. Early on, this took the shape of dissecting an existing classifier, with awareness of its inductive bias, in order to explain how it made a successful prediction. Of late, it has meant formulating forward models of early visual processing, or of how the meaning of a word is represented in the brain, and testing them by predicting the resulting fMRI activation. Conversely, it has also been shown to be possible to reconstruct a complex scene a subject is seeing from fMRI activation captured while she is viewing it.
The current research challenge is to extend this type of work to situations
where there are no good forward models and little understanding of the
computation being done by the brain. In this talk, I will describe the ongoing
effort in our research group to use topic models on special text corpora to
learn semantic features that capture the structure of human semantic knowledge.
Given these models, it becomes possible to decompose the pattern of brain
activation elicited by considering the meaning of a word into constituent
patterns, each associated with the presence of one semantic feature. I will show
that this approach allows us to predict subject performance in psychological
tasks, to classify or predict brain activation in response to novel stimuli,
and even to generate text about brain images.
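One way to picture the decomposition described above is as a linear model in which each word's activation pattern is a feature-weighted sum of per-feature "basis" patterns. The sketch below is only an illustration under that assumption: the topic proportions, voxel count, and synthetic activations are made up, and ordinary least squares stands in for whatever estimation procedure the speaker actually uses.

```python
import numpy as np

# Hypothetical topic proportions (rows: words, columns: semantic features).
# In practice these would come from a topic model fit to a text corpus.
features = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
    [0.5, 0.0, 0.5],
])

# Synthetic basis patterns: one activation pattern (5 voxels) per feature.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(3, 5))

# Synthetic activation for each training word: a feature-weighted mix of bases.
activation = features @ true_basis

# Decompose: solve activation ~= features @ basis in the least-squares sense.
basis, *_ = np.linalg.lstsq(features, activation, rcond=None)

# Predict activation for a novel word from its topic proportions alone.
novel_features = np.array([0.2, 0.5, 0.3])
predicted = novel_features @ basis
```

Because the synthetic data here are noise-free, the recovered `basis` matches `true_basis`; with real fMRI data the fit would be approximate and would typically need regularization.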
A Latent Feature Analysis of the Neural Representation of Object Knowledge
Kai-min Kevin Chang, Carnegie Mellon University
Computational neurolinguistics is an emerging research area which
integrates recent advances in computational linguistics and cognitive
neuroscience, with the objective of developing cognitively plausible models of
language and gaining a better understanding of the human language system.
Advances in computational neurolinguistics require
close collaboration between computational linguists and neuroscientists. To
assist researchers who are new to this topic, the Center for Cognitive Brain
Imaging at Carnegie Mellon University is providing the data used in Mitchell et
al. (2008). In an object-contemplation task, participants were presented with
60 line drawings and/or text labels of objects in 12 categories, and were
instructed to think of the same properties of the stimulus object consistently
during multiple presentations of each item. For each concept, there are 6
instances, each comprising roughly 20,000 brain activity features (blood
oxygenation levels).
In this talk, I will describe the CMU fMRI data set and a new analysis that
uses a generative probabilistic model to describe how fMRI-measured brain
activity is generated from a latent semantic representation. More specifically,
a linear-Gaussian infinite latent feature model (ILFM) with an Indian Buffet
Process (IBP) prior can be used to derive a binary feature representation of
object knowledge from the brain activity. I show that the semantic features
recovered by the ILFM are consistent with the human ratings of the shelter,
manipulation, and eating factors that are recovered by factor analysis.
Furthermore, different areas of the brain encode different psycholinguistic
features: the latent features discovered in different brain areas are
consistent with some existing conjectures regarding the roles of those areas in
processing different psycholinguistic features.
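The generative side of a linear-Gaussian latent feature model with an IBP prior can be sketched as follows. This shows only the forward process (sampling a binary feature matrix and generating activity from it), not the inference that recovers features from data. The concentration `alpha`, the noise scales, and the tiny voxel count are illustrative assumptions, not values from the talk.

```python
import numpy as np


def sample_ibp(n_objects: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a binary object-by-feature matrix from the Indian Buffet Process."""
    columns = []  # one 0/1 list of length n_objects per feature
    for i in range(n_objects):
        # Reuse each existing feature with probability
        # (number of earlier objects that have it) / (i + 1).
        for col in columns:
            if rng.random() < sum(col) / (i + 1):
                col[i] = 1
        # Add Poisson(alpha / (i + 1)) brand-new features for this object.
        for _ in range(rng.poisson(alpha / (i + 1))):
            new_col = [0] * n_objects
            new_col[i] = 1
            columns.append(new_col)
    if not columns:
        return np.zeros((n_objects, 0), dtype=int)
    return np.array(columns, dtype=int).T


rng = np.random.default_rng(0)
n_objects = 60   # matches the 60 stimulus objects in the data set
n_voxels = 20    # tiny stand-in for the ~20k brain activity features

Z = sample_ibp(n_objects, alpha=3.0, rng=rng)             # binary latent features
A = rng.normal(0.0, 1.0, size=(Z.shape[1], n_voxels))     # per-feature activity patterns
noise = rng.normal(0.0, 0.1, size=(n_objects, n_voxels))  # Gaussian observation noise
X = Z @ A + noise                                          # simulated brain activity
```

The number of columns in `Z` is not fixed in advance, which is the sense in which the model is "infinite": the prior lets the data determine how many binary features are needed.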
Development of Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials
Gwen Frishkoff, Georgia State University
We describe a first-generation ontology for
representation and integration of event-related brain potentials (ERPs). The
ontology is designed following OBO “best practices” and is augmented with tools
to perform ontology-based labeling and annotation of ERP data, and a database
that enables semantically based reasoning over these data. Because certain
high-level concepts in the ERP domain are ill-defined, we have developed
methods to support coordinated updates to each of these three components. This
approach consists of “top-down” (knowledge-driven) design and implementation,
followed by “bottom-up” (data-driven) validation and refinement. Our goal is to
build an ERP ontology that is logically valid, empirically sound, robust in
application, and transparent to users. This ontology will be used to support
sharing and meta-analysis of EEG and MEG data collected within our Neural
Electromagnetic Ontologies (NEMO) project.