By the NIPS 2006 Program Committee
With input from Andrew Ng, Peter Dayan, Daphne
Koller, Sebastian Thrun, Bruno Olshausen, Yair Weiss, and Bernhard Schölkopf
In this informal essay, we describe some of the
criteria that will be used to evaluate NIPS submissions. This document should
not be construed as an official NIPS policy statement; but through it, we also
hope to give advice for writing a good NIPS paper.
We’ll take for granted that your paper will be clearly
written, be technically sound and correct, and reference previous work. Thus,
we will not further dwell on the issues of clarity and soundness, despite their
importance. We will instead focus on how one might shape a paper’s content so
as to maximize its chance of being published and influencing others.
A few notes:
A significant fraction of NIPS papers either describe
or study artificial systems. This includes the majority of papers published
with the following keywords: Bioinformatics; Clustering; Control and
reinforcement learning; Dimensionality reduction and manifolds; Feature
selection; Gaussian processes; Graphical models; Kernels; Learning theory;
Machine vision; Margins and boosting; Monte Carlo methods; Neural networks;
Other algorithms; Semi-supervised learning; Speech and signal processing;
Text and language.
Examples of such papers may include: a paper proposing
a new learning algorithm; one that describes a solution to a difficult
application; or one that proves bounds on the error of some learning method.
Papers submitted with the keywords listed above are
expected to make a significant (i) algorithmic, (ii) application, or (iii)
theoretical contribution. NIPS seeks to publish papers that will have a high
impact in the world---both within our research community, and beyond. Whenever
appropriate, papers will therefore be evaluated on the basis of the following
five criteria[1]:
Not all papers are expected to address all of these
criteria, and a paper that is extremely strong on only one of them may well be
acceptable for publication. For example, a learning theory paper that studies
an existing algorithm may be reasonably expected to address only the last of
these criteria.
However, in some cases where the research can be
reasonably expected to address more than one of the criteria above, a paper may
have a better chance of acceptance if it does indeed address them. For example,
a paper that gives an elegant mathematical derivation of a new algorithm (Criterion
#1) may fare better if it is also demonstrated through rigorous empirical
evaluation to do well (Criterion #4), or demonstrated on a real/non-trivial
application (Criterion #3). This is because such experiments can help build a
significantly stronger case for the algorithm’s actual utility. Similarly, a
paper describing an impressive application of machine learning (Criterion #2 or
#3) may fare better if beyond reporting success, it further elucidates the
structure of the problem or algorithm that made the application work, and
thereby conveys insight (Criterion #5).
For empirical studies, a good result can lie along
many different axes, each measured against the best state-of-the-art
algorithm. These axes may include: better accuracy, better ROC performance,
faster running time, lower memory use, more general applicability, easier
out-of-the-box usage, or much simpler code. If an algorithm does not excel
along any of these axes, a reviewer may wonder why it is worth publishing at NIPS.
Although NIPS strongly encourages interdisciplinary
work that spans multiple keywords, we now also describe some evaluation
criteria that are more specialized and may apply only to individual keywords.
Algorithmic papers (e.g., Clustering, Dimensionality
reduction and manifolds, Feature selection, Gaussian processes, Graphical
models, Kernels, Margins and boosting, Monte Carlo methods, Neural networks,
Other algorithms, Semi-supervised learning): Authors of papers that propose new algorithms for well-established,
existing problems are encouraged to provide evidence for the practical
applicability of their methods, such as through rigorous empirical evaluation
of their methods on real data or on real problems. For example, a paper about a
new mathematical trick (or about a beautiful new mathematical derivation) would
be stronger if it is supported by empirical evidence that the resulting
algorithm really helps on a problem. We also encourage submission of papers
that describe algorithmic or implementation principles that may have a large impact
on applications or on practitioners of machine learning.
Control and Reinforcement learning: Authors of papers that propose new algorithms for
existing problems (such as solving MDPs) are encouraged to provide rigorous
empirical evaluation of their methods on real problems, and show their relevance
to real/difficult decision making or control tasks. For example, rather than
demonstrating your idea only on a grid-world or on mountain-car, also show whether
it works on a more challenging task. The other comments for AA papers also
apply here.
Learning Theory, which may have appropriate
algorithmic keywords also: Any
learning theory paper should have a theorem about learning and a proof. Leaving
out the proof is not an option in a double-blind setting! Several styles of
papers exist:
Technical difficulty or novelty is not the goal.
Impact on the process and practice of learning is the goal. Experimental
results are nice but not necessary in general.
Applications, such as text, bioinformatics, or other
applications: Application papers
should describe work on a “real” as opposed to “hypothetical” application;
specifically, they should describe work that has direct relevance to, and
addresses the full complexity of, solving a non-trivial problem. Authors are
also encouraged to convey insight about the problem, algorithms, and/or
application. For example, one might describe the more general lessons learned,
or elucidate (through an ablative analysis/lesion analysis, which removes one
component of an algorithm at a time) which were the key components of the
system needed to get the application to work. A NIPS application paper should
be comparable in quality to papers in the corresponding application domain
conference: for example, a text paper should be acceptable to SIGIR, EMNLP, or
another appropriate conference.
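The ablative (lesion) analysis mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function `ablation_study`, the component names, and the toy scoring function are all hypothetical, and in a real paper `evaluate` would retrain and test the actual system with the given components enabled.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def ablation_study(
    components: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[float, Dict[str, float]]:
    """Score the full system, then re-score it with each component removed in turn.

    Returns the full-system score and, for each component, the drop in score
    observed when that single component is left out.
    """
    full = frozenset(components)
    full_score = evaluate(full)
    drops = {}
    for name in sorted(full):
        # Remove exactly one component and measure how much the score falls.
        drops[name] = full_score - evaluate(full - {name})
    return full_score, drops


# Hypothetical stand-in for a real train-and-test run: each component adds a
# fixed number of accuracy points on top of a 70-point base system.
TOY_GAIN = {"parser": 10, "spelling_features": 2, "gazetteer": 1}


def toy_evaluate(enabled: FrozenSet[str]) -> float:
    return 70 + sum(TOY_GAIN[name] for name in enabled)


full_score, drops = ablation_study(TOY_GAIN, toy_evaluate)
for name, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"without {name}: score drops by {drop} points")
```

Reporting the per-component drops, largest first, is one simple way to show readers which parts of a system carried the result. In real systems components often interact, so removing one at a time may understate or overstate a component's importance.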
Application papers should not
only present concrete application results, but also contain at least one of the
elements below:
Machine vision:
Authors of vision papers are encouraged to provide rigorous empirical
evaluation of their methods to demonstrate value added not just for a few
selected images, but more broadly. Ideally, a NIPS paper proposes a machine
learning algorithm or system that can be used by a computer vision researcher
to help solve a difficult computer vision problem. NIPS papers in this area
should be comparable in quality to those accepted in the major computer vision
conferences, such as ICCV or CVPR.
Speech and signal processing: As with computer vision, a NIPS paper should solve
a difficult audio, speech, or other signal processing problem via machine
learning, and be useful for a signal processing practitioner. The quality bar
for NIPS is higher than that of a typical signal processing conference (such
as ICASSP or ICIP): NIPS papers are 30% longer, the reviews are more
detailed, and the acceptance rate is about half. Therefore, a NIPS signal
processing paper should be more significant than the average ICASSP paper.
Hardware technology: In addition to describing a successful implementation, a NIPS
hardware paper should also convey insight into the underlying principles behind
the implementation, which can serve as useful lessons for non-hardware
researchers, such as computer scientists or neurobiologists.
A significant fraction of NIPS papers, comprising
mainly ones from the Neuroscience, Biological vision, or Cognitive Science
keywords, either describe or study natural systems. Examples include a paper
proposing a new model of human decision making, a paper describing evidence for
a neural code, and so on.
Papers submitted with the keywords listed above should
make significant contributions to the computational, psychological and/or
neural understanding of an important biological and/or behavioral system or
function. Such papers will be evaluated on the basis of some or all of the
following seven criteria:
Neuroscience:
A good neuroscience model should make testable predictions - and they should be
interesting, too. An interesting prediction is something you may not have
thought about otherwise: a prediction that is non-obvious, or does not derive
directly from the limiting assumptions made in the model. A neuroscience
model should give you a new way of looking at the system which inspires new
experiments. NIPS neuroscience papers should be either neuroscientifically or
computationally well-grounded, ideally both. The paper should make a serious
attempt at connecting to state-of-the-art neurobiology, and/or provide a
rigorous mathematical treatment or comparison to state-of-the-art engineering
methods.
Brain imaging and brain computer interfaces: Papers with this keyword tend to fall between the
natural and artificial systems. A good brain imaging paper may lead to
neurobiological insight, or it may propose an experimental method for obtaining
new kinds of measurements. A good brain computer interface would either be
useful as a computer interface, or lead to neurobiological insight.
[1] These criteria were selected with the goals of encouraging good research, and of maximizing NIPS’ long-term impact. Note that this is not as simple as accepting papers with high expected impact. For example, a paper that makes ambitious but poorly substantiated claims may have high expected impact---largely on the off-chance that the claims turn out to be correct---but is still likely to be rejected. Some of these evaluation criteria address exactly this issue of providing evidence for the utility of one’s work.