Jiantao Sun, Benyu Zhang, Zheng Chen, Yuchang Lu, Cuiyi Shi, and Wei-Ying Ma
Most current research on Web page classification focuses on leveraging heterogeneous features, such as plain text, hyperlinks, and anchor text, effectively and efficiently. The composite kernel method is one topic of interest in this area. It first selects a set of initial kernels, each determined separately by one type of feature. A classifier is then trained on a linear combination of these kernels. In this paper, we propose an effective way to optimize the linear combination of kernels. We prove that this problem is equivalent to solving a generalized eigenvalue problem, and that the weight vector of the kernels is the eigenvector associated with the largest eigenvalue. A support vector machine (SVM) classifier is then trained on this optimized combination of kernels. Our experiment on the WebKB dataset shows the effectiveness of the proposed method.
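The abstract does not state which objective produces the generalized eigenvalue problem, so the following is only a minimal sketch under a stated assumption: the weights are chosen to maximize kernel-target alignment, i.e. the ratio (w·a)² / (wᵀMw), where a_i = ⟨K_i, yyᵀ⟩ and M_ij = ⟨K_i, K_j⟩ are Frobenius inner products. Under that assumption the maximizer satisfies (aaᵀ)w = λMw, and w is the eigenvector for the largest eigenvalue. The function names, the toy "text" and "link" feature views, and the small ridge term are all hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix of a linear kernel for one feature view."""
    return X @ X.T


def alignment_matrices(kernels, y, ridge=1e-8):
    """Build a_i = <K_i, y y^T>_F and M_ij = <K_i, K_j>_F.

    The ridge term is an assumption added here only to keep M
    positive definite when kernels are nearly collinear.
    """
    target = np.outer(y, y)
    a = np.array([np.sum(K * target) for K in kernels])
    M = np.array([[np.sum(Ki * Kj) for Kj in kernels] for Ki in kernels])
    M = M + ridge * np.trace(M) / len(kernels) * np.eye(len(kernels))
    return a, M


def solve_weights(a, M):
    """Solve (a a^T) w = lam * M w for the largest eigenvalue.

    With the Cholesky factor M = L L^T and w = L^{-T} v, the problem
    reduces to the standard symmetric problem L^{-1} (a a^T) L^{-T} v = lam v.
    """
    A = np.outer(a, a)
    L = np.linalg.cholesky(M)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    w = L_inv.T @ eigvecs[:, -1]          # map back to the original variables
    if w @ a < 0:                         # eigenvectors are defined up to sign
        w = -w
    return w / np.linalg.norm(w), eigvals[-1]


# Toy data: two hypothetical feature views of 40 pages with labels in {-1, +1}.
rng = np.random.default_rng(0)
y = np.where(rng.random(40) < 0.5, -1.0, 1.0)
X_text = y[:, None] * 1.0 + rng.normal(size=(40, 5))  # informative view
X_link = rng.normal(size=(40, 3))                     # pure-noise view

kernels = [linear_kernel(X_text), linear_kernel(X_link)]
a, M = alignment_matrices(kernels, y)
w, lam = solve_weights(a, M)

# Weighted linear combination of the initial kernels.
K_combined = sum(wi * K for wi, K in zip(w, kernels))
```

The combined Gram matrix could then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. In this sketch nothing constrains the weights to be non-negative, so the combination is not guaranteed to be positive semidefinite; a practical implementation would need to handle that case.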
Published in: IEEE/WIC International Conference on Web Intelligence 2004
Publisher: IEEE Computer Society
Copyright © 2004 IEEE. Reprinted from IEEE Computer Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to firstname.lastname@example.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.