By the NIPS 2006 Program Committee
With input from Andrew Ng, Peter Dayan, Daphne Koller, Sebastian Thrun, Bruno Olshausen, Yair Weiss, and Bernhard Schölkopf
In this informal essay, we describe some of the criteria that will be used to evaluate NIPS submissions. This document should not be construed as an official NIPS policy statement, but through it we also hope to offer advice on writing a good NIPS paper.
We’ll take for granted that your paper will be clearly written, technically sound and correct, and appropriately referenced to previous work. We will therefore not dwell further on clarity and soundness, despite their importance. Instead, we focus on how one might shape a paper’s content so as to maximize its chance of being published and of influencing others.
A few notes:
A significant fraction of NIPS papers either describe or study artificial systems. These include the majority of papers published under the following keywords: Bioinformatics; Clustering; Control and reinforcement learning; Dimensionality reduction and manifolds; Feature selection; Gaussian processes; Graphical models; Kernels; Learning theory; Machine vision; Margins and boosting; Monte Carlo methods; Neural networks; Other algorithms; Semi-supervised learning; Speech and signal processing; Text and language.
Examples of such papers may include: a paper proposing a new learning algorithm; one that describes a solution to a difficult application; or one that proves bounds on the error of some learning method.
Papers submitted with the keywords listed above are expected to make a significant (i) algorithmic, (ii) application, or (iii) theoretical contribution. NIPS seeks to publish papers that will have high impact, both within our research community and beyond. Whenever appropriate, papers will therefore be evaluated on the basis of the following five criteria:
Not all papers are expected to address all of these criteria, and a paper that is extremely strong on only one of them may well be acceptable for publication. For example, a learning theory paper that studies an existing algorithm may reasonably be expected to address only the last of these criteria.
However, in cases where the research can reasonably be expected to address more than one of the criteria above, a paper may have a better chance of acceptance if it does indeed address them. For example, a paper that gives an elegant mathematical derivation of a new algorithm (Criterion #1) may fare better if it is also shown through rigorous empirical evaluation to perform well (Criterion #4), or demonstrated on a real, non-trivial application (Criterion #3), because such experiments help build a significantly stronger case for the algorithm’s actual utility. Similarly, a paper describing an impressive application of machine learning (Criterion #2 or #3) may fare better if, beyond reporting success, it elucidates the structure of the problem or algorithm that made the application work, and thereby conveys insight (Criterion #5).
For empirical studies, a good result can lie along many different axes, each measured relative to the best state-of-the-art algorithm. These axes include higher accuracy, better ROC performance, faster running time, lower memory usage, broader applicability, easier out-of-the-box use, and a much simpler implementation. If an algorithm does not excel along any of these axes, a reviewer may wonder why it is worth publishing at NIPS.
Although NIPS strongly encourages interdisciplinary work that spans multiple keywords, we now also describe some evaluation criteria that are more specialized and may apply only to individual keywords.
Algorithmic papers (e.g., Clustering, Dimensionality reduction and manifolds, Feature selection, Gaussian processes, Graphical models, Kernels, Margins and boosting, Monte Carlo methods, Neural networks, Other algorithms, Semi-supervised learning): Authors of papers that propose new algorithms for well-established problems are encouraged to provide evidence for the practical applicability of their methods, for instance through rigorous empirical evaluation on real data or real problems. For example, a paper about a new mathematical trick (or a beautiful new mathematical derivation) would be stronger if it were supported by empirical evidence that the resulting algorithm really helps on a problem. We also encourage submission of papers that describe algorithmic or implementation principles that may have a large impact on applications or on practitioners of machine learning.
Control and reinforcement learning: Authors of papers that propose new algorithms for existing problems (such as solving MDPs) are encouraged to provide rigorous empirical evaluation of their methods on real problems and to show their relevance to real, difficult decision-making or control tasks. For example, rather than demonstrating your idea only on a grid-world or on mountain-car, also show whether it works on a more challenging task. The comments above for algorithmic papers also apply here.
Learning theory (such papers may also carry appropriate algorithmic keywords): Any learning theory paper should contain a theorem about learning and its proof. Leaving out the proof is not an option in a double-blind setting! Several styles of papers exist:
Technical difficulty or novelty is not the goal; impact on the process and practice of learning is. Experimental results are nice but, in general, not necessary.
Applications, such as text, bioinformatics, or other applications: Application papers should describe work on a “real” rather than a “hypothetical” application; specifically, they should describe work that is directly relevant to solving a non-trivial problem and that addresses its full complexity. Authors are also encouraged to convey insight about the problem, algorithms, and/or application. For example, one might describe the more general lessons learned, or elucidate which components of the system were key to making the application work, through an ablative (lesion) analysis that removes one component of the algorithm at a time. A NIPS application paper should be comparable in quality to papers in the corresponding application-domain conference: for example, a text paper should be acceptable to SIGIR, EMNLP, or another appropriate conference.
Application papers should not only present concrete application results but also contain at least one of the following elements:
Machine vision: Authors of vision papers are encouraged to provide rigorous empirical evaluation of their methods to demonstrate value added not just for a few selected images, but more broadly. Ideally, a NIPS paper proposes a machine learning algorithm or system that can be used by a computer vision researcher to help solve a difficult computer vision problem. NIPS papers in this area should be comparable in quality to those accepted in the major computer vision conferences, such as ICCV or CVPR.
Speech and signal processing: As with computer vision, a NIPS paper should solve a difficult audio, speech, or other signal processing problem via machine learning, and be useful to a signal processing practitioner. The quality bar for NIPS is higher than that of a typical signal processing conference (such as ICASSP or ICIP): NIPS papers are 30% longer, the reviews are more detailed, and the acceptance rate is about half as high. Therefore, a NIPS signal processing paper should be more significant than the average ICASSP paper.
Hardware technology: In addition to describing a successful implementation, a NIPS hardware paper should convey insight into the principles underlying the implementation, which can serve as useful lessons for non-hardware researchers such as computer scientists or neurobiologists.
A significant fraction of NIPS papers, comprising mainly ones from the Neuroscience, Biological vision, or Cognitive Science keywords, either describe or study natural systems. Examples include a paper proposing a new model of human decision making, a paper describing evidence for a neural code, and so on.
Papers submitted under the keywords listed above should make significant contributions to the computational, psychological, and/or neural understanding of an important biological and/or behavioral system or function. Such papers will be evaluated on the basis of some or all of the following seven criteria:
Neuroscience: A good neuroscience model should make testable predictions, and those predictions should be interesting. An interesting prediction is one you might not have thought of otherwise: a prediction that is non-obvious, or that does not follow directly from the limiting assumptions made in the model. A neuroscience model should give you a new way of looking at the system, one that inspires new experiments. NIPS neuroscience papers should be either neuroscientifically or computationally well grounded, ideally both. The paper should make a serious attempt to connect to state-of-the-art neurobiology, and/or provide a rigorous mathematical treatment or a comparison to state-of-the-art engineering methods.
Brain imaging and brain computer interfaces: Papers with this keyword tend to fall between natural and artificial systems. A good brain imaging paper may lead to neurobiological insight, or it may propose an experimental method for obtaining new kinds of measurements. A good brain computer interface should be useful as a computer interface, lead to neurobiological insight, or both.
These criteria were selected with the goals of encouraging good research and maximizing NIPS’ long-term impact. Note that this is not as simple as accepting papers with high expected impact. For example, a paper that makes ambitious but poorly substantiated claims may have high expected impact, largely on the off-chance that the claims turn out to be correct, but is still likely to be rejected. Several of these evaluation criteria address exactly this issue of providing evidence for the utility of one’s work.