pycs-devel archive weblog

A blog for archiving the pycs-devel mailing list

2003-3-13

Phillip Pearson: [PyCS-devel] did I mention?

http://www.pycs.net/allyourrss.html

a little mini-aggregator for pycs.net.

In CVS now, as /rss ... do what you will. If anyone feels like
hacking it to use a template (Cheetah or something) to generate the
output, that would be cool. Adding it to the Makefile so it's
installed with PyCS, and getting it to use pycs_paths.py would be
handy too ;)

Cheers,
Phil

Phillip Pearson: [PyCS-devel] forking inside a module script

Hi,

I've decided that the safest way to run htsearch is to fork inside the
module script, run htsearch in the child process, and let the OS clean
up after it.

(how to do this in Python, for ppl who're interested:
http://www.myelin.co.nz/post/2003/3/13/#200303135)
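
For readers without access to that link, here is a minimal sketch of the fork-and-let-the-OS-clean-up idea. It is not the PyCS code: `run_in_child` and the `job` callable are hypothetical names, and the real module script would run htsearch where the sketch calls `job()`. It assumes a POSIX system, since `os.fork` is not available elsewhere.

```python
import os

def run_in_child(job):
    """Run job() in a forked child and return the bytes it produced.

    Hypothetical helper, POSIX only. The child writes its result to a pipe
    and leaves via os._exit(), so anything it leaked dies with the process.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, report it, and exit immediately.
        status = 0
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                out.write(job())
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    # Parent process: read everything the child wrote, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("child process failed")
    return data
```

The sketch leaves the child with `os._exit()`, which does not raise SystemExit at all, so no surrounding handler gets a chance to swallow it. A child that calls `sys.exit()` instead depends on every enclosing `except` letting SystemExit through.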

However, I found that the module handler catches SystemExit
exceptions, meaning that the child processes weren't being allowed to
exit properly. I've changed pycs_module_handler to just re-raise if
it gets a SystemExit, but it looks like Medusa also catches it.
Here's a quick diff to get Medusa to re-raise too:

RCS file: /cvsroot/oedipus/medusa/http_server.py,v
retrieving revision 1.10
diff -u -r1.10 http_server.py
--- http_server.py 18 Dec 2002 14:55:44 -0000 1.10
+++ http_server.py 13 Mar 2003 09:58:20 -0000
@@ -495,6 +495,8 @@
                             # This isn't used anywhere.
                             # r.handler = h # CYCLE
                             h.handle_request (r)
+                        except SystemExit:
+                            raise
                         except:
                             self.server.exceptions.increment()
                             (file, fun, line), t, v, tbinfo = asyncore.compact_traceback()

I guess we should push this one over to the Medusa people too ...
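
To see why the extra clause matters, here is a small self-contained sketch. The function names are made up for illustration and stand in for the dispatch code in the diff above; they are not Medusa's API.

```python
import sys

def dispatch_swallowing(handler):
    """Catch-all dispatch: a bare except also traps SystemExit."""
    try:
        handler()
        return "handled"
    except:
        return "error logged"

def dispatch_reraising(handler):
    """Same dispatch, but SystemExit is passed through untouched."""
    try:
        handler()
        return "handled"
    except SystemExit:
        raise
    except:
        return "error logged"

def exiting_handler():
    # Stands in for a forked child that has finished its work.
    sys.exit(0)
```

With the first version, `dispatch_swallowing(exiting_handler)` returns normally and the process keeps running. With the second, the SystemExit propagates to the caller and the process can exit. In current Python versions, writing `except Exception:` instead of a bare `except:` has the same effect, because SystemExit derives from BaseException rather than Exception.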

(I haven't put any of the search stuff into CVS yet BTW, but will
soonish hopefully).

Cheers,
Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!

> Congratulations!

One more patch later (the timezone in the logging was wrong: it logged in
GMT instead of local time), and now it looks quite good:

http://muensterland.org/statistics/

Nice. Next thing would be to find a way to do that per user. Maybe I just
split the stuff by user path and run single instances of webalizer, or I
just put in some grouping for some of the users.

> In other news, we almost have another search engine backend available:
> http://www.myelin.co.nz/post/2003/3/13/#200303131

Fine! The context in search results is one thing I miss with swish++.

bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

> But it works. I now have a nice and shiny combined log with remote host
> IPs, referrers and user agent information, but it is created on the
> community server. And it uses all rewriting rules, so I get only
> normalized URLs (/users/xxxxxx/ stuff). This can be split by user, and
> so I could set up webalizer to just sum up stuff for one user. Or do other
> nice things with that :-)

Congratulations!

In other news, we almost have another search engine backend available:
http://www.myelin.co.nz/post/2003/3/13/#200303131

I just realised that ht://Dig has a number of classes using static member
variables that don't seem to be cleaned up properly, so I'm going to have to
change all that if we want to ever be able to do more than one search per
PyCS process (reloading _htsearch.so might help, but I bet I'd end up with
one hell of a memory leak). Ahh, CGI ...

Cheers,
Phil :)

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!

> I already have a hack working (not yet checked in, though) that will
> patch the http_request objects so that they log in the combined log
> format (with referrers and user-agent info). I am currently investigating
> how complicated it would be to get Apache to pass on the client address
> in a header, so I could use that in the logging to replace the Apache
> machine's address.

Ok, it is now working. I have added a new vhostfrom rule to
rewrite.conf.default and several patches to pycs.py and
pycs_rewrite_handler.py. The main problem is that Medusa doesn't give a
nice way to specify which class to use for HTTP requests. So to do all this
nicely, I would have to override the full hierarchy and make changes to
several methods and classes. To avoid that (as it would likely break
with newer releases where the inner workings change), I just patch some
class objects with setattr. This will break with newer versions too, if
some key components change. But only a small amount of code is added, and
there is really only one dependency on the inner workings: I assume that
http_request objects have a header and a _header_cache instance variable
like they do now.
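
A minimal sketch of the setattr approach, using an invented stand-in class. `HttpRequest`, `log_line` and `log_line_with_referrer` are hypothetical names and do not reflect Medusa's http_request or the actual patch; the point is only that a method on a class you do not own can be swapped without subclassing the whole hierarchy.

```python
class HttpRequest:
    """Hypothetical stand-in for a library request class we do not own."""

    def __init__(self, request_line, headers):
        self.request_line = request_line
        self.headers = headers  # dict keyed by lower-cased header name

    def log_line(self):
        # Original behaviour: request line only, no referrer or user agent.
        return '"%s"' % self.request_line


def log_line_with_referrer(self):
    """Replacement method that appends referrer and user agent fields."""
    referrer = self.headers.get("referer", "-")
    agent = self.headers.get("user-agent", "-")
    return '"%s" "%s" "%s"' % (self.request_line, referrer, agent)


# Patch the class object in place instead of subclassing the hierarchy.
setattr(HttpRequest, "log_line", log_line_with_referrer)
```

Every instance, including ones the library creates internally, now uses the replacement. The cost is the one described above: the patch silently depends on the attribute names staying the same in later releases.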

So if someone wants to dig into the code, be warned. It is butt ugly ;-)

But it works. I now have a nice and shiny combined log with remote host
IPs, referrers and user agent information, but it is created on the
community server. And it uses all rewriting rules, so I get only
normalized URLs (/users/xxxxxx/ stuff). This can be split by user, and
so I could set up webalizer to just sum up stuff for one user. Or do other
nice things with that :-)
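
Splitting such a log by user could look roughly like this. It is a sketch under the assumption that every rewritten request path starts with /users/<id>/; `split_by_user` and the regular expression are illustrative and not part of PyCS.

```python
import re
from collections import defaultdict

# Assumed URL layout after rewriting: /users/<id>/...
USER_PATH = re.compile(r'"[A-Z]+ /users/([^/\s]+)/')

def split_by_user(lines):
    """Group combined-log lines by the user id in the request path."""
    per_user = defaultdict(list)
    for line in lines:
        match = USER_PATH.search(line)
        if match:
            per_user[match.group(1)].append(line)
    return dict(per_user)
```

Each resulting list could be written to its own file and handed to a separate webalizer run. Lines that do not match the assumed layout (a /status request, for example) are simply dropped here.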

bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

> > I only analyse what comes in from Apache, because that gives me the
> > client IP address.
>
> I am currently working out how to solve that, too. :-)
>
> I already have a hack working (not yet checked in, though) that will patch
> the http_request objects so that they log in the combined log format
> (with referrers and user-agent info). I am currently investigating how
> complicated it would be to get Apache to pass on the client address in a
> header, so I could use that in the logging to replace the Apache
> machine's address.

You could always continue the ~~vhost~~ thing and turn it into
~~vhost~~/ip.address/server/path ...

BTW this may be useful:
http://httpd.apache.org/docs/mod/mod_headers.html

Now, can we get it to take input from mod_rewrite? :-)

Cheers,
Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!

> Nothing from my end. In fact I totally ignore the logs coming out of
> the PyCS process ;-)
>
> I only analyse what comes in from Apache, because that gives me the
> client IP address.

I am currently working out how to solve that, too. :-)

I already have a hack working (not yet checked in, though) that will patch
the http_request objects so that they log in the combined log format
(with referrers and user-agent info). I am currently investigating how
complicated it would be to get Apache to pass on the client address in a
header, so I could use that in the logging to replace the Apache
machine's address.

This would allow me to create full combined logs for the machine and then
split them up to produce statistics for user directories with all the
information that would be available on the Apache machine.
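
As an illustration of the idea, here is a sketch that builds a combined-format log line and takes the client address from a forwarded header when one is present. The header name `x-forwarded-for` is an assumption (the thread does not say which header is used), and both function names are hypothetical.

```python
import time

def clf_timestamp(epoch, utc_offset_minutes):
    """Format epoch seconds as a log timestamp in the given UTC offset."""
    shifted = time.gmtime(epoch + utc_offset_minutes * 60)
    sign = "+" if utc_offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(utc_offset_minutes), 60)
    stamp = time.strftime("%d/%b/%Y:%H:%M:%S", shifted)
    return "%s %s%02d%02d" % (stamp, sign, hours, minutes)

def combined_log_line(peer_addr, headers, request_line, status, size,
                      epoch, utc_offset_minutes=0):
    """Build one line in the Apache "combined" log format.

    headers is a dict keyed by lower-cased header name. The header name
    'x-forwarded-for' is an assumption made for this sketch.
    """
    forwarded = headers.get("x-forwarded-for", "")
    # Prefer the first forwarded address; fall back to the socket peer,
    # which behind a proxy is just the Apache machine.
    client = forwarded.split(",")[0].strip() or peer_addr
    return '%s - - [%s] "%s" %d %d "%s" "%s"' % (
        client,
        clf_timestamp(epoch, utc_offset_minutes),
        request_line,
        status,
        size,
        headers.get("referer", "-"),
        headers.get("user-agent", "-"),
    )
```

Passing the UTC offset explicitly keeps the timestamp independent of the server's timezone settings. A real handler would more likely take the offset from the local clock.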

Actually I don't like running webalizer on the Apache machine because
it doesn't have the rewritten addresses there. Since I use Manila-style
host names, I get a lot of hits on stuff like /weblog/index.html but
can't tell whether that's for hugo.muensterland.org, witch.muensterland.org
or pyds.muensterland.org :-/

bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

Hi,

> If you know of something that exists and might break with this change,
> notify me and I will have to make this logging behaviour configurable.

Nothing from my end. In fact I totally ignore the logs coming out of
the PyCS process ;-)

I only analyse what comes in from Apache, because that gives me the
client IP address.

> Another change in CVS is that /status is now activated in Medusa. It's
> only a simple status page and doesn't include too much information, but
> I think we should support it with our own handlers in the long run. It
> might be a nice place for a quick glance at how your server performs.

Good point. When I coded the server in the first place, I turned off
everything I didn't immediately need, because I was in a hurry and
didn't want to have to bother checking to make sure it was secure.

Then, I never went back to do the extra work and get it all going
again ... ;-)

So if you think /status is OK, I don't mind having that turned
back on again.

Cheers,
Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!

> I am unsatisfied with how PyCS currently does logging: it's all in one big
> file and _before_ rewriting takes place. This makes for very ugly
> URLs when PyCS runs behind an Apache. My idea is to provide common log
> file format per user, but _after_ rewriting takes place.

I think I found it. pycs_rewrite_handler.py doesn't change the
request.request field on rewriting. I changed this so that it now
constructs a new request and puts it in there. This should work out
nicely, as it doesn't change anything else in the system, just the field
and the code that depends on it (and that code should, in my opinion, get
the rewritten address).
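
A rough sketch of what rebuilding the logged request could involve, assuming the field holds a plain request line such as "GET /path HTTP/1.0". `rewrite_request_line` is a hypothetical helper, not the code that was checked in.

```python
def rewrite_request_line(request_line, new_uri):
    """Return request_line with its URI replaced by new_uri.

    Hypothetical helper; keeps the command and the HTTP version as they are.
    """
    parts = request_line.split()
    if len(parts) == 3:
        command, _old_uri, version = parts
        return "%s %s %s" % (command, new_uri, version)
    if len(parts) == 2:
        # HTTP/0.9 style request without a version field.
        return "%s %s" % (parts[0], new_uri)
    raise ValueError("malformed request line: %r" % request_line)
```

Anything that logs the request line after this point then sees the rewritten URI rather than the one the client originally sent.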

But this change (just checked into CVS) might break stuff that depends
on the access.log written by PyCS. So if you have a log analyzer working
on your PyCS-generated access.log, things have changed and you won't find
the original URIs in there. I checked Phil's make_referer.py script; it
reads the Apache log files and so isn't affected by my change. But there
might be other stuff out there.

If you know of something that exists and might break with this change,
notify me and I will have to make this logging behaviour configurable.

Another change in CVS is that /status is now activated in Medusa. It's
only a simple status page and doesn't include too much information, but
I think we should support it with our own handlers in the long run. It
might be a nice place for a quick glance at how your server performs.

bye, Georg