Tuesday, June 4, 2002

Brent Simmons mentions the wonder of ssh and, in particular, the incredibly useful scp command. He uses it to back up his web site with a single command.

Ssh is an incredibly useful tool and not just for security. The compression feature-- enabled with -C at the command line-- can vastly improve throughput if the data sent/received is primarily ASCII.
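To get a rough feel for how well ASCII compresses, here is a small local sketch. It uses gzip as a stand-in, since ssh's -C option uses the same compression algorithm as gzip; the generated sample text is just an assumption standing in for source code or diffs, so real-world ratios will differ.

```shell
# Build roughly 1 MB of repetitive ASCII text (a made-up stand-in for source or diffs).
tmp=$(mktemp)
i=0
while [ "$i" -lt 20000 ]; do
    printf 'line %d: the quick brown fox jumps over the lazy dog\n' "$i"
    i=$((i + 1))
done > "$tmp"

# Compare the raw size against the gzip-compressed size.
raw_bytes=$(wc -c < "$tmp" | tr -d ' ')
gz_bytes=$(gzip -c "$tmp" | wc -c | tr -d ' ')
echo "raw: $raw_bytes bytes, compressed: $gz_bytes bytes"

rm -f "$tmp"
```

Already-compressed data (JPEGs, tarballs that have been gzipped) will not shrink like this, which is why the benefit depends on the traffic being mostly text.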

For example, using ssh for all CVS operations typically yields a 70% to 90% increase in throughput because the CVS wire protocol generally compresses extremely well (it is mostly diffs, ASCII commands, and source files). In addition, cvs's pserver mode is horrendously insecure. Using ssh with cvs makes that problem disappear.
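A minimal sketch of pointing cvs at ssh, assuming the repository is reached through the :ext: access method. The host cvs.example.com, the repository path /cvsroot, and the module name mymodule are hypothetical placeholders, not the actual CodeFab setup.

```shell
# Tell cvs to run ssh (rather than rsh) for the :ext: access method.
export CVS_RSH=ssh

# Hypothetical host, repository path, and module name.
cvs -d :ext:bbum@cvs.example.com:/cvsroot checkout mymodule
```

Since cvs invokes ssh on its own, there is no spot on this command line for -C; one way to get compression is to set "Compression yes" for that host in ~/.ssh/config.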

ssh can also be used to preserve data in environments that want to corrupt it. The rcp/rsh commands under Windows have a great affinity for translating line endings. This can wreak havoc on, say, a GIF/JPEG image that happens to have a newline character in the middle of the binary data. However, if you use scp/ssh to move the data about, the problem goes away! Since we standardized on cvs+ssh for revision control on our Windows clients, end-of-line translation issues have been entirely resolved.
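To see why line-ending translation is fatal for binary data, here is a small local simulation. It does not involve rcp at all: the sed command is only an assumed stand-in for a transfer tool that rewrites every LF as CR LF, and the "image" is a handful of made-up bytes.

```shell
orig=$(mktemp)
mangled=$(mktemp)

# Stand-in for binary data: a few bytes that happen to include newlines (0x0A).
printf 'GIF89a\001\002\n\003\004\n' > "$orig"

# Simulate a tool that "helpfully" turns every LF into CR LF.
cr=$(printf '\r')
sed "s/\$/$cr/" "$orig" > "$mangled"

orig_bytes=$(wc -c < "$orig" | tr -d ' ')
mangled_bytes=$(wc -c < "$mangled" | tr -d ' ')
echo "original: $orig_bytes bytes, after translation: $mangled_bytes bytes"

# cmp exits non-zero when the files differ, i.e. the data was corrupted.
if cmp -s "$orig" "$mangled"; then
    echo "files are identical"
else
    echo "files differ: the binary data was corrupted"
fi

rm -f "$orig" "$mangled"
```

scp copies the bytes as-is, so the same comparison after an scp transfer should report identical files.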

ssh can also be a great boon to reading email. When accessing a terminal server to use a mail reader such as pine or mutt, the compression option greatly accelerates the mail reading experience.

However, ssh can also be used to accelerate and secure mail reading for client side mail readers. With OS X, I use the included Mail.app for all of my mail reading needs. Within Mail's preferences, I have configured it such that it connects to port 2143 on 'localhost' to grab my codefab.com email. The SMTP host is set to localhost (currently, Mail insists on connecting to port 25 for SMTP connections-- fixed in a 'future version', I hear).

Then, I use ssh's port forwarding feature to effectively mirror the local ports to the remote machines. The command to do so looks like the following:

sudo ssh -l bbum -L25:mail.codefab.com:25 -L2143:imapserver.codefab.com:143 -C -N loginhost.codefab.com

This causes ssh to transparently forward port 2143 of my localhost to port 143-- the standard IMAP port-- of imapserver.codefab.com, and port 25 of my localhost to port 25 of mail.codefab.com. (Binding the privileged local port 25 is why the command is run under sudo.) The only clear-text across-the-wire communication is between loginhost and the mail servers. And, of course, by using the -C option, the entire connection is compressed and all email related activities are greatly accelerated (as most email is ASCII).

Port forwarding is totally cool stuff: it basically causes all activity on a local port to be transparently and securely forwarded to any port on any machine you specify. When I worked on site at a large bank, they had an astoundingly slow proxy server through which all web traffic "had" to be directed. However, they had also opened up their firewall to pass ssh traffic. As such, I used ssh's port forwarding (with compression) to forward a port on my local machine to CodeFab's HTTP proxy server (a squid server). I then configured the browser to use the localhost's port as the proxy server. Not only could I browse the web without the restrictions caused by the company's proxy server (I wasn't even trying to browse anything questionable-- it wouldn't let me hit technical resource sites that helped me do my job more effectively! Something about 'freshmeat' offended it, for example), but I could browse web pages about 10x faster than folks going through the proxy!
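The proxy trick can be sketched roughly as follows. The proxy host name proxy.example.com and the local port 8080 are hypothetical; 3128 is squid's default port, and I am assuming the same login host and account as in the mail example.

```shell
# Forward local port 8080 to the squid proxy, compressed, with no remote shell (-N).
# proxy.example.com and port 8080 are placeholders, not the real setup.
ssh -l bbum -C -N -L 8080:proxy.example.com:3128 loginhost.codefab.com

# In another terminal: point command-line clients at the local end of the tunnel.
export http_proxy=http://localhost:8080/
curl -I http://www.example.com/
```

A browser is configured the same way: set its HTTP proxy to localhost and the chosen local port.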

Brent mentioned the use of scp to back up the contents of his web site. -C will work with scp, as well. Personally, I use rsync to move data between my development environment and my production web server. Actually, I use rsync for a tremendous number of different tasks.

For Friday.COM -- a site that I own (and whose main content resides behind the facade at that URL)-- I use rsync to synchronize new versions of the site from my development environment into the production server.

rsync --exclude=stats -e 'ssh -C' -a -v --delete www-friday/ www.friday.com:/var/www/

Trivial and, of course, through the use of 'ssh -C', I gain both security and acceleration through the use of compression. In and of itself, rsync is amazing stuff. As the name implies, it synchronizes filesystems. However, for large files, it can do a block level synchronization. That is, it effectively figures out what parts of the large file have changed and will only synchronize those bytes!

Given the incredibly cheap price of hard drives, we have found it cheaper to maintain live backups of critical chunks of filesystem by simply running the appropriate rsync command on a regular basis.

(When testing rsync, it is always wise to specify --dry-run. It will cause rsync to spew a list of all changes that will be made to the filesystem, but won't actually change anything. Pretty critical prior to cutting loose with a --delete.)
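For instance, a dry run of the Friday.COM synchronization is just the same command with --dry-run added:

```shell
# Lists the files rsync would transfer or delete on the production server,
# without actually changing anything there.
rsync --dry-run --exclude=stats -e 'ssh -C' -a -v --delete www-friday/ www.friday.com:/var/www/
```

Once the listed changes look right, drop --dry-run and run it for real.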

Very cool stuff.
7:36:52 PM