Marc joined Microsoft Research Silicon Valley in October 2001. He is currently working on heuristics for detecting "spam" web pages, and on link-based ranking algorithms for web search results. Past projects at Microsoft include Boxwood, a distributed B-Tree system, and PageTurner, a large-scale study of the evolution of web pages.
Before joining MSR, Marc spent eight years at the Systems Research Center (SRC) of DEC, which later became part of Compaq and is now part of HP. Projects at SRC included Mercator, a high-performance distributed web crawler; JCAT, a web-based algorithm animation system; and Obliq-3D, a scripting system for 3D animations.
Marc served as program co-chair of WWW 2004 and conference chair of WSDM 2008. He is co-chairing the news section of CACM, and he is an associate editor of ACM TWEB and of JVLC.
Marc received a Ph.D. in Computer Science from UIUC for his work on Cube, a 3D visual programming language.
Publications
- Marc Najork, Web Crawler Architecture, in Encyclopedia of Database Systems, Springer-Verlag, September 2009
- Hugo Zaragoza and Marc Najork, Web Search Relevance Ranking, in Encyclopedia of Database Systems, Springer-Verlag, September 2009
- Marc Najork, Web Spam Detection, in Encyclopedia of Database Systems, Springer-Verlag, September 2009
- Marc Najork, The Scalable Hyperlink Store, in 20th ACM Conference on Hypertext and Hypermedia, Association for Computing Machinery, Inc., June 2009
- Marc Najork, Sreenivas Gollapudi, and Rina Panigrahy, Less is More: Sampling the Neighborhood Graph Makes SALSA Better and Faster, in 2nd ACM International Conference on Web Search and Data Mining (WSDM), Association for Computing Machinery, Inc., February 2009
- Marc Najork and Nick Craswell, Efficient and Effective Link Analysis with Precomputed SALSA Maps, in 17th ACM Conference on Information and Knowledge Management (CIKM), Association for Computing Machinery, Inc., October 2008
- Frank McSherry and Marc Najork, Computing Information Retrieval Performance Measures Efficiently in the Presence of Tied Scores, in Proceedings of the 30th European Conference on Information Retrieval (ECIR), Springer-Verlag, April 2008
- Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy, Using Bloom Filters to Speed Up HITS-like Ranking Algorithms, in 5th Workshop on Algorithms and Models for the Web Graph (WAW), Springer-Verlag, December 2007
- Marc Najork, Comparing the Effectiveness of HITS and SALSA, in 16th ACM Conference on Information and Knowledge Management (CIKM), Association for Computing Machinery, Inc., November 2007
- Marc Najork, Hugo Zaragoza, and Michael Taylor, HITS on the Web: How does it Compare?, in 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Association for Computing Machinery, Inc., Amsterdam, Netherlands, July 2007
- Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly, Detecting Spam Web Pages Through Content Analysis, in 15th International World Wide Web Conference (WWW), Association for Computing Machinery, Inc., Edinburgh, Scotland, May 2006
- Dennis Fetterly, Mark Manasse, and Marc Najork, Detecting Phrase-Level Duplication on the World Wide Web, in 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Association for Computing Machinery, Inc., Salvador, Brazil, August 2005
- John MacCormick, Nick Murphy, Marc Najork, Chandramohan A. Thekkath, and Lidong Zhou, Boxwood: Abstractions as the Foundation for Storage Infrastructure, in Symposium on Operating System Design and Implementation (OSDI), USENIX, December 2004
- Dennis Fetterly, Mark Manasse, and Marc Najork, On the Evolution of Clusters of Near-Duplicate Web Pages, in Journal of Web Engineering, vol. 2, no. 4, pp. 228-246, Institute of Electrical and Electronics Engineers, Inc., October 2004
- Dennis Fetterly, Mark Manasse, and Marc Najork, Spam, Damn Spam, and Statistics: Using statistical analysis to locate spam web pages, in 7th International Workshop on the Web and Databases (WebDB), Association for Computing Machinery, Inc., June 2004
- Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener, A Large-Scale Study of the Evolution of Web Pages, in Software: Practice & Experience, vol. 34, no. 2, pp. 213-237, Wiley, February 2004
- Dennis Fetterly, Mark Manasse, and Marc Najork, On the Evolution of Clusters of Near-Duplicate Web Pages, in Proceedings of the 1st Latin American Web Congress (LA-WEB), IEEE Computer Society, Washington, DC, USA, November 2003
- Andrei Z. Broder, Marc Najork, and Janet L. Wiener, Efficient URL caching for World Wide Web crawling, in Proceedings of the 12th International World Wide Web Conference (WWW), Budapest, Hungary, May 2003
- Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener, A large-scale study of the evolution of web pages, in Proceedings of the 12th International World Wide Web Conference (WWW), ACM, New York, NY, USA, May 2003
Issued Patents
- Marc A. Najork. System and method for maintaining a distributed database of hyperlinks. US Patent 7,340,467, issued 3/4/2008.
- Marc A. Najork. System and method for distributed web crawling. US Patent 7,139,747, issued 11/21/2006.
- Marc A. Najork and Chandramohan A. Thekkath. Algorithm for tree traversals using left links. US Patent 7,082,438, issued 7/25/2006.
- Marc A. Najork and Chandramohan A. Thekkath. Deletion and compaction using versioned nodes. US Patent 7,072,904, issued 7/4/2006.
- Marc A. Najork and Chandramohan A. Thekkath. Algorithm for tree traversals using left links. US Patent 7,007,027, issued 2/28/2006.
- Marc A. Najork and Clark A. Heydon. System and method for efficient filtering of data set addresses in a web crawler. US Patent 6,952,730, issued 10/4/2005.
- Marc A. Najork. System and method for identifying cloaked web servers. US Patent 6,910,077, issued 6/21/2005.
- Marc A. Najork, Clark A. Heydon, Michael Mitzenmacher, and Monika H. Henzinger. System and method for near-uniform sampling of web page addresses. US Patent 6,594,694, issued 7/15/2003.
- Marc A. Najork and Clark A. Heydon. Web crawler system using parallel queues for queuing data sets having common address and concurrently downloading data associated with data set in each queue. US Patent 6,377,984, issued 4/23/2002.
- Marc A. Najork and Clark A. Heydon. System and method for associating an extensible set of data with documents downloaded by a web crawler. US Patent 6,351,755, issued 2/26/2002.
- Marc A. Najork and Clark A. Heydon. System and method for enforcing politeness while scheduling downloads in a web crawler. US Patent 6,321,265, issued 11/20/2001.
- Marc A. Najork and Clark A. Heydon. System and method for efficient representation of data set addresses in a web crawler. US Patent 6,301,614, issued 10/9/2001.
- Marc A. Najork, Clark A. Heydon, and Janet L. Wiener. Web crawler system using plurality of parallel priority level queues having distinct associated download priority levels for prioritizing document downloading and maintaining document freshness. US Patent 6,263,364, issued 7/17/2001.



