Resources for Research on Web Spam
We recommend that you subscribe to our mailing list. Datasets, challenges and conferences related to Web spam are announced on this low-volume, announcements-only list.
We host Web spam datasets developed through a collaborative effort by a team of volunteers. The goal of our dataset activity is to make available reference collections that should be:
- Large: the collections should include many examples of spam and non-spam content.
- Clean: the collections should contain few classification errors.
- Uniform: the collections should represent a uniform random sample over a set of pages or hosts.
- Broad: the collections should include as many different Web spam aspects as possible.
- Open: the collections should be freely available to researchers.
Currently we are hosting a set of collections for research on Web Spam. See datasets >>.
ECML/PKDD Discovery Challenge 2010 — competition to identify methods for assessing Web Quality including spam.
Web Spam Challenge 2007/2008 — competition to identify methods for detecting Web Spam.
AIRWeb — workshop on Adversarial Information Retrieval on the Web.
Source code (archived) — Truncated PageRank and Adaptive Estimation of Supporters, the algorithms proposed in a WebKDD'06 paper.
For inquiries, please contact Carlos Castillo.
Last updated: September 10, 2012.