pycs-devel archive weblog

2005-1-20

Phillip Pearson: Re: [PyCS-devel] Comment Spam in PyCS comments

OK, I've checked in my changes to comments, so now the next time you

update and restart your PyCS server, it will copy all your comments

from your MK DB into your PG DB.  Check your etc.log to make sure that

nothing bad happens during this process; I'm not entirely sure how

error-tolerant it is.  It seems to handle the utf-8 comments in my DB

OK, but watch out for them, because they seem to be the biggest causes

of errors.



So...if you want to start playing with the DB now and putting in new

features, go ahead!  Remember that if you want to change the structure

of the DB (add new tables etc), you should do it in pycs_db.py and

increment the DB version number after you do it, so everyone else will

get the changes too.
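
A minimal sketch of the "change the structure, then bump the version" idea, for illustration only. It is not the actual pycs_db.py code: the `UPGRADES` list, the `db_version` table, and the `upgrade` function are all hypothetical names, and it uses the standard-library sqlite3 module instead of PostgreSQL so that it runs on its own.

```python
import sqlite3

# Hypothetical upgrade steps: entry i takes the schema from version i to i + 1.
# Adding a new table or column means appending one more entry to this list.
UPGRADES = [
    ["CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)"],
    ["ALTER TABLE comments ADD COLUMN moderated INTEGER DEFAULT 0"],
]


def upgrade(conn):
    """Apply every upgrade step the database has not seen yet."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS db_version (version INTEGER)")
    row = cur.execute("SELECT version FROM db_version").fetchone()
    if row is None:
        # Fresh database: start counting from version 0.
        cur.execute("INSERT INTO db_version VALUES (0)")
        version = 0
    else:
        version = row[0]
    for step in UPGRADES[version:]:
        for statement in step:
            cur.execute(statement)
        version += 1
        cur.execute("UPDATE db_version SET version = ?", (version,))
    conn.commit()
    return version
```

Running `upgrade` at server start would bring every installation to the same schema, and running it again does nothing because the stored version already matches.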



Cheers,

Phil



On Wed, Jan 19, 2005 at 02:38:47PM +0100, Bauer, Georg wrote:

> Hi!

>

> Recently comment spam showed up in several blogs on pycs.net, so I think it

> might be time to do something against it. It's not a simple thing and there

> are no bulletproof ways, but I fear if we don't start to address this, it

> will only become even more ugly.

>

> A first take could make use of ideas from the OSA plugin for WordPress:

>

> - a comment form carries two hidden fields: the IP and the timestamp of the

> requester, both encoded.

> - a comment POST request must carry those two hidden fields with it

> - the IP of the POST request must be identical to the IP of the GET on the

> comment form

> - the POST must come in some predefined timeframe with regard to the GET (3

> sec min to 10 min max is the OSA default)

> - the POST must come with a referrer that points to the GET

>

> And some other checks that make sense, I think:

>

> - the comment mustn't contain more than 2 URLs

> - the IP shouldn't have more than 5 comments in a given timeframe (1

> minute?)

>

> If a comment doesn't match those requirements, it should be set to be

> moderated. Moderated comments should only show to the logged-in admin, who

> should be able to either accept or reject each comment. Maybe add a checkbox

> "accept/reject all from the same IP" to allow mass deletion. Comment feeds

> maybe can show the to-be-moderated comments, too - but only if the user

> gives the password in the URL with an additional parameter, so feedster and

> friends (who happily spider comment feeds) won't show them. So the user

> needs to fetch the full feed with the password to see which comments

> need to be moderated - maybe we can set up an alternative

> moderated-comment-feed, too.

>

> Additionally trackbacks should go into the moderation queue directly until

> someone figures out a better way to filter those that doesn't just use

> keywords.

>

> Another idea might be to make use of the fact that we run several blogs on

> the community server and keep a global storage of IPs whose comments went

> into the moderation queue and to check first against that to allow other

> blogs to benefit from each other.

>

> I don't think it would be a good idea to build this on the old metakit based

> version, so what is the status of the PostgreSQL stuff? Is the current

> version fully migrated?

>

> bye, Georg



Phillip Pearson: Re: [PyCS-devel] Comment Spam in PyCS comments

Hi Georg,



All good ideas, yes.  My thoughts so far have been along the lines of

your last point:



> Another idea might be to make use of the fact that we run several blogs on

> the community server and keep a global storage of IPs whose comments went

> into the moderation queue and to check first against that to allow other

> blogs to benefit from each other.

>

> I don't think it would be a good idea to build this on the old metakit based

> version, so what is the status of the PostgreSQL stuff? Is the current

> version fully migrated?



I haven't touched comments, but the code is all ready for anyone else

to do that, if you're interested :-)



There's a file somewhere (forgotten the name - but grep for CREATE

TABLE and you'll get it) that upgrades the Postgres DB to the latest

format and copies data over from the Metakit DB...



Cheers,

Phil


Bauer, Georg: [PyCS-devel] Comment Spam in PyCS comments

Hi!



Recently comment spam showed up in several blogs on pycs.net, so I think it

might be time to do something against it. It's not a simple thing and there

are no bulletproof ways, but I fear if we don't start to address this, it

will only become even more ugly.



A first take could make use of ideas from the OSA plugin for WordPress:



- a comment form carries two hidden fields: the IP and the timestamp of the

requester, both encoded.

- a comment POST request must carry those two hidden fields with it

- the IP of the POST request must be identical to the IP of the GET on the

comment form

- the POST must come in some predefined timeframe with regard to the GET (3

sec min to 10 min max is the OSA default)

- the POST must come with a referrer that points to the GET
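
The checks in the list above could be sketched roughly like this. It is only an illustration under assumptions: how OSA actually encodes the two fields is not described here, so the sketch packs the IP and the timestamp into one HMAC-signed token, and `SECRET_KEY`, `make_token`, and `check_post` are made-up names rather than anything in PyCS or OSA.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to clients
MIN_AGE = 3                # seconds, the minimum mentioned in the list above
MAX_AGE = 10 * 60          # seconds, the maximum mentioned in the list above


def make_token(ip, issued_at):
    """Build the hidden-field value handed out with the comment form (GET)."""
    payload = "%s|%d" % (ip, issued_at)
    sig = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload + "|" + sig


def check_post(token, post_ip, referrer, form_url, now=None):
    """Return True if the POST passes all checks, False if it should be moderated."""
    now = time.time() if now is None else now
    if not token:
        return False  # hidden field missing from the POST
    try:
        ip, issued_at, sig = token.split("|")
        issued_at = int(issued_at)
    except ValueError:
        return False  # malformed token
    expected = make_token(ip, issued_at).rsplit("|", 1)[1]
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return False  # token was tampered with or not issued by us
    if ip != post_ip:
        return False  # POST comes from a different IP than the GET
    if not (MIN_AGE <= now - issued_at <= MAX_AGE):
        return False  # posted too quickly or too late
    return referrer == form_url  # referrer must point to the comment form
```

The signature is what keeps a spammer from simply inventing a fresh IP and timestamp pair; without it the hidden fields would only stop the laziest bots.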



And some other checks that make sense, I think:



- the comment mustn't contain more than 2 URLs

- the IP shouldn't have more than 5 comments in a given timeframe (1

minute?)
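
These two checks are simple enough to sketch as well. Again this is just an assumption-laden illustration: the function names are invented, the URL test only counts "http://" and "https://" prefixes, and the per-IP history lives in memory, whereas a real server would presumably keep it in the database.

```python
import re
import time
from collections import defaultdict, deque

MAX_URLS = 2       # a comment may contain at most this many URLs
MAX_COMMENTS = 5   # comments allowed per IP within WINDOW
WINDOW = 60        # seconds (the "1 minute?" guess from the list above)

URL_RE = re.compile(r"https?://", re.IGNORECASE)

# Hypothetical in-memory history: IP -> timestamps of its recent comments.
_recent = defaultdict(deque)


def too_many_urls(text):
    """True if the comment text contains more than MAX_URLS links."""
    return len(URL_RE.findall(text)) > MAX_URLS


def too_many_comments(ip, now=None):
    """Record one comment from ip; True if that exceeds the per-window limit."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    # Forget comments that have fallen out of the time window.
    while stamps and now - stamps[0] >= WINDOW:
        stamps.popleft()
    stamps.append(now)
    return len(stamps) > MAX_COMMENTS


def needs_moderation(ip, text, now=None):
    """True if the comment should go to the moderation queue."""
    # Run the rate check first so that every comment is counted.
    over_rate = too_many_comments(ip, now)
    return over_rate or too_many_urls(text)
```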



If a comment doesn't match those requirements, it should be set to be

moderated. Moderated comments should only show to the logged-in admin, who

should be able to either accept or reject each comment. Maybe add a checkbox

"accept/reject all from the same IP" to allow mass deletion. Comment feeds

maybe can show the to-be-moderated comments, too - but only if the user

gives the password in the URL with an additional parameter, so feedster and

friends (who happily spider comment feeds) won't show them. So the user

needs to fetch the full feed with the password to see which comments

need to be moderated - maybe we can set up an alternative

moderated-comment-feed, too.



Additionally trackbacks should go into the moderation queue directly until

someone figures out a better way to filter those that doesn't just use

keywords.



Another idea might be to make use of the fact that we run several blogs on

the community server and keep a global storage of IPs whose comments went

into the moderation queue and to check first against that to allow other

blogs to benefit from each other.
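
As a rough sketch of that shared storage, assuming nothing about how PyCS would really store it: the class and method names below are invented, and the data sits in a plain dictionary where a real version would more likely be a table in the PostgreSQL DB.

```python
class SharedSuspectIPs:
    """Server-wide record of IPs whose comments were queued for moderation."""

    def __init__(self):
        # IP -> set of blog ids on which a comment from that IP was queued.
        self._seen = {}

    def report(self, ip, blog_id):
        """Called whenever any blog puts a comment from ip into moderation."""
        self._seen.setdefault(ip, set()).add(blog_id)

    def is_suspect(self, ip):
        """Checked first by every blog, before the other filters run."""
        return ip in self._seen

    def blogs_for(self, ip):
        """Blogs that have queued comments from ip (empty set if none)."""
        return set(self._seen.get(ip, ()))

    def clear(self, ip):
        """Forget an IP, e.g. after an admin accepted its comments."""
        self._seen.pop(ip, None)
```

A comment from an IP for which `is_suspect` returns True would go straight to the moderation queue on any blog, which is how one blog's spam would protect the others.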



I don't think it would be a good idea to build this on the old metakit based

version, so what is the status of the PostgreSQL stuff? Is the current

version fully migrated?



bye, Georg
