I've had a variety of comment spam on this blog over the past few months.
For those who haven't had the pleasure, comment spam is spam delivered into a blog by inserting comments on topics. The intent of the comment spammers isn't necessarily to drive traffic from the blog to their site, but rather to increase their search score on the various search engines by appearing on multiple sites.
I explored a number of ways to eliminate this. I didn't want legitimate comments to be removed, but I was weary of checking the blog several times a day. I had a trigger that called my cell phone when a comment was added, but I wasn't always available to drop everything and remove an unwanted entry.
As my blog is hosted on .Text (thanks Scott!), all the content is stored in SQL Server. Specifically, the text of the posts, articles and comments is stored in a text field. That opens up some SQL-based options:
- Table triggers
- Maintenance jobs
Either of these could provide a means to inspect the content of an incoming comment and act accordingly.
Even more than avoiding the loss of legitimate comments, I wanted to keep my blog in a supported state for any upgrades .Text might issue. A trigger would have to be attached to one of .Text's own tables, so I eliminated the trigger option.
I set up a maintenance job modeled after the BizTalk maintenance jobs. These jobs run at scheduled intervals and execute Transact-SQL statements. My job runs once per minute and performs the following actions:
- Loops through a list of common keywords used by the current group of comment spammers (I store these in a table).
- Copies each matching record into a holding table with the same structure as the Content table (without the identity field).
- Deletes the offending row from the Content table.
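
The steps above can be sketched roughly as follows. This is a minimal, runnable illustration rather than the actual job: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every table and column name (content, spam_keywords, removed_content, post_type) is hypothetical, not the real .Text schema. In the real job the same steps would be Transact-SQL statements.

```python
import sqlite3

# Hypothetical schema: these names are illustrative, not the real .Text tables.
SCHEMA = """
CREATE TABLE content (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for the identity column
    post_type TEXT NOT NULL,                      -- 'post', 'article' or 'comment'
    author    TEXT,
    body      TEXT NOT NULL
);
CREATE TABLE spam_keywords (keyword TEXT PRIMARY KEY);
-- Same columns as content, but id is a plain value copied from the original row.
-- (Assumption: dropping the column entirely would also fit the description.)
CREATE TABLE removed_content (
    id        INTEGER,
    post_type TEXT NOT NULL,
    author    TEXT,
    body      TEXT NOT NULL
);
"""


def purge_spam_comments(conn: sqlite3.Connection) -> int:
    """Move comments containing any stored keyword into removed_content.

    Returns the number of comments removed from content.
    """
    removed = 0
    keywords = [row[0] for row in conn.execute("SELECT keyword FROM spam_keywords")]
    with conn:  # one transaction: the copy and the delete succeed or fail together
        for keyword in keywords:
            pattern = f"%{keyword}%"
            # Copy the matching comments to the holding table first...
            conn.execute(
                "INSERT INTO removed_content (id, post_type, author, body) "
                "SELECT id, post_type, author, body FROM content "
                "WHERE post_type = 'comment' AND body LIKE ?",
                (pattern,),
            )
            # ...then delete them from the live table.
            cur = conn.execute(
                "DELETE FROM content WHERE post_type = 'comment' AND body LIKE ?",
                (pattern,),
            )
            removed += cur.rowcount
    return removed
```

Copying before deleting, inside a single transaction, means a failure halfway through cannot lose a comment. Only rows marked as comments are examined, so a post that happens to mention a keyword is left alone.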
Now I only have to check the rows in the deleted-comments table every few days to make sure I'm not losing legitimate comments. Next up: write an easy way to restore a comment. This is trickier than it sounds: the challenge is to keep a restored comment from being removed again because of a keyword, without modifying its original text.
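
One possible way to approach the restore problem, offered only as a sketch and not something I have built: keep a separate list of approved comment ids and have the sweep skip them. The example below again uses Python's sqlite3 module in place of SQL Server, and all names in it (approved_content, restore_comment, purge_unapproved_spam, and the simplified two-column tables) are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not the real .Text tables).
SCHEMA = """
CREATE TABLE content (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL);
CREATE TABLE removed_content (id INTEGER, body TEXT NOT NULL);
CREATE TABLE spam_keywords (keyword TEXT PRIMARY KEY);
-- Assumed addition: ids of comments a human has reviewed and approved.
CREATE TABLE approved_content (id INTEGER PRIMARY KEY);
"""


def restore_comment(conn: sqlite3.Connection, comment_id: int) -> None:
    """Move one removed row back into content, text unchanged, and mark it approved."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO approved_content (id) VALUES (?)", (comment_id,)
        )
        conn.execute(
            "INSERT INTO content (id, body) "
            "SELECT id, body FROM removed_content WHERE id = ?",
            (comment_id,),
        )
        conn.execute("DELETE FROM removed_content WHERE id = ?", (comment_id,))


def purge_unapproved_spam(conn: sqlite3.Connection) -> int:
    """Keyword sweep that leaves approved ids alone. Returns rows removed."""
    where = "body LIKE ? AND id NOT IN (SELECT id FROM approved_content)"
    removed = 0
    keywords = [row[0] for row in conn.execute("SELECT keyword FROM spam_keywords")]
    with conn:
        for keyword in keywords:
            pattern = f"%{keyword}%"
            conn.execute(
                "INSERT INTO removed_content (id, body) "
                f"SELECT id, body FROM content WHERE {where}",
                (pattern,),
            )
            cur = conn.execute(f"DELETE FROM content WHERE {where}", (pattern,))
            removed += cur.rowcount
    return removed
```

The approval list lives in its own table, so the comment text and the original content table are both left as they were. The sketch restores the row under its original id; on SQL Server that would mean writing an explicit value into an identity column, which needs extra handling that is not shown here.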