pycs-devel archive weblog

A blog for archiving the pycs-devel mailing list

2005-1-20

Phillip Pearson: Re: [PyCS-devel] Comment Spam in PyCS comments

OK, I've checked in my changes to comments, so the next time you
update and restart your PyCS server, it will copy all your comments
from your MK (Metakit) DB into your PG (PostgreSQL) DB. Check your etc.log
to make sure that nothing bad happens during this process; I'm not entirely
sure how error-tolerant it is. It seems to handle the UTF-8 comments in my DB
OK, but watch out for them, because they seem to be the biggest cause
of errors.

So...if you want to start playing with the DB now and putting in new
features, go ahead! Remember that if you want to change the structure
of the DB (add new tables etc), you should do it in pycs_db.py and
increment the DB version number after you do it, so everyone else will
get the changes too.
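
A minimal sketch of the version-number pattern described above. It is not
taken from pycs_db.py: the table names, the UPGRADES list and the upgrade()
helper are hypothetical, and sqlite3 stands in for PostgreSQL so the example
runs on its own.

```python
import sqlite3

# Hypothetical upgrade steps. Step N brings the schema to version N + 1,
# so changing the structure means appending a new step here.
UPGRADES = [
    ["CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)"],
    ["ALTER TABLE comments ADD COLUMN moderated INTEGER DEFAULT 0"],
]


def upgrade(conn):
    """Apply every step the database has not seen yet; return its version."""
    conn.execute("CREATE TABLE IF NOT EXISTS meta (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM meta").fetchone()
    if row is None:
        # Fresh database: start counting from version 0.
        conn.execute("INSERT INTO meta (version) VALUES (0)")
        current = 0
    else:
        current = row[0]
    for step in range(current, len(UPGRADES)):
        for statement in UPGRADES[step]:
            conn.execute(statement)
        current = step + 1
        conn.execute("UPDATE meta SET version = ?", (current,))
    conn.commit()
    return current
```

Running upgrade() a second time is a no-op, which is what lets every server
pick up a new step the next time it restarts.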

Cheers,
Phil

On Wed, Jan 19, 2005 at 02:38:47PM +0100, Bauer, Georg wrote:
> Hi!
>
> Recently comment spam showed up in several blogs on pycs.net, so I think it
> might be time to do something about it. It's not a simple thing and there
> are no bulletproof ways, but I fear if we don't start to address this, it
> will only get uglier.
>
> A first take could make use of ideas from the OSA plugin for WordPress:
>
> - a comment form carries two hidden fields: the IP and the timestamp of the
> requester, both encoded.
> - a comment POST request must carry those two hidden fields with it
> - the IP of the POST request must be identical to the IP of the GET on the
> comment form
> - the POST must come in some predefined timeframe with regard to the GET (3
> sec min to 10 min max is OSA default)
> - the POST must come with a referrer that points to the GET
>
> And some other checks that make sense, I think:
>
> - the comment mustn't contain more than 2 URLs
> - the IP shouldn't have more than 5 comments in a given timeframe (1
> minute?)
>
> If a comment doesn't meet those requirements, it should be held for
> moderation. Held comments should be visible only to the logged-in admin,
> who can then accept or reject each one. Maybe add a checkbox
> "accept/reject all from the same IP" to allow mass deletion. Comment feeds
> could show the to-be-moderated comments, too - but only if the user
> gives the password in the URL as an additional parameter, so Feedster and
> friends (who happily spider comment feeds) won't show them. So the user
> needs to fetch the full feed with the password to see which comments
> need to be moderated - maybe we can set up a separate
> moderated-comment-feed, too.
>
> Additionally trackbacks should go into the moderation queue directly until
> someone figures out a better way to filter those that doesn't just use
> keywords.
>
> Another idea might be to make use of the fact that we run several blogs on
> the community server: keep a global store of IPs whose comments went
> into the moderation queue, and check against that first so that the
> blogs can benefit from each other.
>
> I don't think it would be a good idea to build this on the old Metakit-based
> version, so what is the status of the PostgreSQL stuff? Is the current
> version fully migrated?
>
> bye, Georg


Phillip Pearson: Re: [PyCS-devel] Comment Spam in PyCS comments

Hi Georg,

All good ideas, yes. My thoughts so far have been along the lines of
your last point:

> Another idea might be to make use of the fact that we run several blogs on
> the community server: keep a global store of IPs whose comments went
> into the moderation queue, and check against that first so that the
> blogs can benefit from each other.
>
> I don't think it would be a good idea to build this on the old Metakit-based
> version, so what is the status of the PostgreSQL stuff? Is the current
> version fully migrated?

I haven't touched comments, but the code is all ready for anyone else
to do that, if you're interested :-)

There's a file somewhere (I've forgotten the name - but grep for CREATE
TABLE and you'll find it) that upgrades the Postgres DB to the latest
format and copies data over from the Metakit DB...

Cheers,
Phil


Bauer, Georg: [PyCS-devel] Comment Spam in PyCS comments

Hi!

Recently comment spam showed up in several blogs on pycs.net, so I think it
might be time to do something about it. It's not a simple thing and there
are no bulletproof ways, but I fear if we don't start to address this, it
will only get uglier.

A first take could make use of ideas from the OSA plugin for WordPress:

- a comment form carries two hidden fields: the IP and the timestamp of the
requester, both encoded.
- a comment POST request must carry those two hidden fields with it
- the IP of the POST request must be identical to the IP of the GET on the
comment form
- the POST must come in some predefined timeframe with regard to the GET (3
sec min to 10 min max is OSA default)
- the POST must come with a referrer that points to the GET
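
The IP and timing checks in the list above can be sketched as follows. This
is a sketch under assumptions, not OSA's or PyCS's code: instead of two
separately encoded hidden fields it uses a single hidden field holding the
timestamp plus an HMAC signature over IP and timestamp, and SECRET_KEY,
make_token() and check_token() are hypothetical names. The referrer check is
left out.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client
MIN_AGE = 3                # seconds; the OSA default minimum mentioned above
MAX_AGE = 10 * 60          # seconds; the OSA default maximum mentioned above


def _sign(ip, ts):
    """Signature binding the requester's IP to the time the form was served."""
    message = f"{ip}|{ts}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(ip, now=None):
    """Value for the hidden field, generated when the comment form is fetched."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(ip, ts)}"


def check_token(token, ip, now=None):
    """True if the token was issued to this IP between MIN_AGE and MAX_AGE ago."""
    now = time.time() if now is None else now
    try:
        ts, sig = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False  # missing or malformed field
    # A different IP produces a different signature, so this covers the IP check.
    if not hmac.compare_digest(sig.encode(), _sign(ip, ts).encode()):
        return False
    return MIN_AGE <= now - issued <= MAX_AGE
```

Signing the values means the server does not have to remember which forms it
handed out; a POST that fails check_token() would go to moderation rather
than being dropped.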

And some other checks that make sense, I think:

- the comment mustn't contain more than 2 URLs
- the IP shouldn't have more than 5 comments in a given timeframe (1
minute?)
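
These two checks could look roughly like this. It is a sketch only: the
function names are hypothetical, URLs are counted by a simple "http://" or
"https://" match, and the per-IP history is kept in memory rather than in the
database.

```python
import re
import time
from collections import defaultdict, deque

MAX_URLS = 2      # more than this many URLs sends the comment to moderation
MAX_COMMENTS = 5  # more than this many comments per window does the same
WINDOW = 60       # seconds; the "1 minute?" suggested above

URL_RE = re.compile(r"https?://", re.IGNORECASE)
_recent = defaultdict(deque)  # ip -> timestamps of that IP's recent comments


def too_many_urls(text):
    """True if the comment body contains more than MAX_URLS links."""
    return len(URL_RE.findall(text)) > MAX_URLS


def over_rate_limit(ip, now=None):
    """Record a comment from ip and report whether it exceeds the limit."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    # Forget comments that have fallen out of the window.
    while stamps and now - stamps[0] >= WINDOW:
        stamps.popleft()
    stamps.append(now)
    return len(stamps) > MAX_COMMENTS


def needs_moderation(ip, text, now=None):
    """Combine both checks; the rate check runs first so every comment is counted."""
    rate_exceeded = over_rate_limit(ip, now)
    return rate_exceeded or too_many_urls(text)
```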

If a comment doesn't meet those requirements, it should be held for
moderation. Held comments should be visible only to the logged-in admin,
who can then accept or reject each one. Maybe add a checkbox
"accept/reject all from the same IP" to allow mass deletion. Comment feeds
could show the to-be-moderated comments, too - but only if the user
gives the password in the URL as an additional parameter, so Feedster and
friends (who happily spider comment feeds) won't show them. So the user
needs to fetch the full feed with the password to see which comments
need to be moderated - maybe we can set up a separate
moderated-comment-feed, too.

Additionally trackbacks should go into the moderation queue directly until
someone figures out a better way to filter those that doesn't just use
keywords.

Another idea might be to make use of the fact that we run several blogs on
the community server: keep a global store of IPs whose comments went
into the moderation queue, and check against that first so that the
blogs can benefit from each other.
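
A minimal sketch of such a shared store, assuming one instance per community
server. The class and method names are hypothetical, the expiry time (ttl) is
an added assumption that is not part of the proposal, and a real version
would presumably live in a PostgreSQL table instead of a dict.

```python
import time


class SharedFlaggedIPs:
    """Server-wide record of IPs whose comments were sent to moderation."""

    def __init__(self, ttl=7 * 24 * 3600):
        self.ttl = ttl       # assumed: seconds a flag stays active
        self._flagged = {}   # ip -> (time of last flag, blog that flagged it)

    def flag(self, ip, blog_id, now=None):
        """Called by any blog when one of its comments goes to moderation."""
        now = time.time() if now is None else now
        self._flagged[ip] = (now, blog_id)

    def is_flagged(self, ip, now=None):
        """Checked first by every blog before running its own tests."""
        now = time.time() if now is None else now
        entry = self._flagged.get(ip)
        if entry is None:
            return False
        if now - entry[0] > self.ttl:
            del self._flagged[ip]  # flag has expired
            return False
        return True
```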

I don't think it would be a good idea to build this on the old Metakit-based
version, so what is the status of the PostgreSQL stuff? Is the current
version fully migrated?

bye, Georg
