Ming-Wei Chang, Wen-tau Yih, and Robert McCann
21 August 2008
Gray mail, messages that could reasonably be considered either spam or good by different email users, is a commonly observed issue in production spam filtering systems. In this paper we study this class of mail using a large real-world email corpus and signature-based campaign detection techniques. Our analysis shows that even an optimal filter will inevitably perform unsatisfactorily on gray mail, unless user preferences are taken into account. To overcome this difficulty we design a light-weight user model that is highly scalable and can be easily combined with a traditional global spam filter. Our approach is able to incorporate both partial and complete user feedback on message labels and catches up to 40% more spam from gray mail in the low false-positive region.
In Proceedings of The 5th Conference on Email and Anti-Spam