NOTE: The Tcl support described in this file is disabled.  The code is
all still there, but you have to define DO_TCL manually while compiling
to enable it.  Compiling in Tcl filtering was causing random innd
segfaults even if no Tcl filters were defined, so it's been turned off
to prevent confusion.

The Tcl code will be removed in the next major release of INN since no
one appears to be using it and the code is unmaintained and has no
champion.  If you want to resurrect it, it may be better to start from
scratch, since a lot has changed about INN since the filters were
originally written and the Perl and Python filters have far more
capabilities.

Note, you need tcl 7.4.  Rumour has it that 7.5 won't work.

---------------------------------------------------------------------------

Subject: TCL-based Filtering for INN 1.5
Date: Mon, 07 Feb 94 12:36:47 -0800
From: Bob Heiney

Several times in the past few months, a site or two has started posting
the same article over and over again, but with a different message id.
Usually this is caused by broken software (e.g. mail <-> news gateways,
which many have written, but few have written correctly).  Occasionally,
however, the reposting is intentional.  A recent example would be the
"Global Alert: Jesus Is Coming" message which was posted to over 2200
newsgroups (each copy with its own message id).  I expect this to happen
more often as the Internet continues its explosive growth.

Although my site (decwrl) usually has enough excess capacity to weather
these problems, many other sites cannot.  One problem on
comp.sys.sgi.misc several months ago spewed 40MB of duplicate articles
before the offending sites were fixed, and this overflowed the spool at
many sites.  Even for sites with lots of resources, there's still no
need to propagate erroneous or malicious duplicates.

I wanted a way to protect my site that was highly specific, flexible,
and quick.
Examination of duplicated articles showed that although the message ids
were different, it was usually easy for a news admin to come up with a
few rules based on the headers of the article that could be used to
differentiate the duplicates from other articles.  (E.g. "from
John.Doe@foo.com to comp.sys.sgi.misc with 'foobar' in the subject".)

I concluded that modifying innd to let me say "kill things that look
like _this_" would solve my problem.  I also wanted to allow enough
flexibility in the design that I could later work on automatic detection
and elimination of excessive duplicates (using a body checksum instead
of headers).

Since I needed a fairly powerful language to do all this, and since the
world doesn't need yet another special language, my solution was to add
TCL support to INN.  I then modified "ARTpost" to call a TCL procedure
which could then accept or reject the article.  The TCL code has access
to an associative array called "Headers", which contains all of the
article's headers.  The TCL code may also call a 32-bit article-body
checksum procedure (this is to aid in future automatic detection of
duplicates).

Here's what a sample TCL filter procedure looks like:

    proc filter_news {} {
      # o is a file handle opened elsewhere (not shown here)
      global o Headers
      set sum [checksum_article]
      puts $o "$Headers(Message-ID) $sum"
      set newsgroups [split $Headers(Newsgroups) ,]
      foreach i $newsgroups {
        if {$i=="alt.test" && [string match "*heiney@pa.dec.com*" $Headers(From)]} {
          return "dont like alt.test from heiney"
        }
      }
      return "accept"
    }

The above TCL code does a few things.  First it computes a 32-bit
checksum and writes it and the message ID to a file.  It then rejects
articles from me to alt.test.

The work I've done is totally integrated into the INN build and runtime
environments.
For example, to turn filtering off, you'd just type

    ctlinnd filter n

To reload the TCL code that does the filtering, you just say

    ctlinnd reload filter.tcl 'your comment here'

(You may specify TCL callbacks to be executed right before and/or right
after reloading, in case your filter is doing fancy stuff.)  See the
ctlinnd man page for more info.

Filtering capability that's this powerful can be used for many purposes,
some benign and useful (detection of excessive duplicates, on-the-fly
statistics), others abusive.  I would ask that news admins think
carefully about any filtering they do.

/Bob
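
The post mentions automatic detection of excessive duplicates by body
checksum as future work but gives no code for it.  The following is a
minimal sketch of how such a filter might look.  It is not part of the
original distribution.  It assumes that innd supplies checksum_article
as described above, and that any return value other than "accept" is
treated as a rejection reason, as in the sample filter.  The names
MaxCopies, Seen and note_checksum are hypothetical, and the threshold
of 3 is arbitrary.

```tcl
# Hypothetical tuning knob: how many articles with the same body
# checksum to accept before rejecting further copies.
set MaxCopies 3

# Record one sighting of a body checksum and return the running count.
# Seen is a hypothetical global array keyed by checksum.
proc note_checksum {sum} {
    global Seen
    if {[info exists Seen($sum)]} {
        incr Seen($sum)
    } else {
        set Seen($sum) 1
    }
    return $Seen($sum)
}

# Called by innd for each article.  checksum_article is assumed to be
# provided by innd; everything else here is illustrative.
proc filter_news {} {
    global MaxCopies
    set copies [note_checksum [checksum_article]]
    if {$copies > $MaxCopies} {
        return "too many copies of the same body"
    }
    return "accept"
}
```

Note that the Seen array in this sketch grows without bound and never
forgets a checksum, and that a 32-bit checksum can collide for unrelated
articles.  A real filter would probably need to expire old entries and
combine the checksum with header checks before rejecting anything.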