Hacking INN

This file is for people who are interested in making modifications to INN. Normal users can safely skip reading it. It is intended primarily as a guide, resource, and accumulation of tips for maintainers and contributors, and secondarily as documentation of some of INN's internals.

This is $Revision: 7736 $ dated $Date: 2006-04-15 04:52:06 +0200 (Sat, 15 Apr 2006) $.

First of all, if you plan on working on INN source, please start from the current development tree. There may be significant changes from the previous full release, so starting from development sources will make it considerably easier to integrate your work. You can get nightly snapshots of the current development source from ftp.isc.org in /isc/inn/snapshots (the snapshots named inn-CURRENT-*.tar.gz), or you can get the current CVS tree by using CVSup (see "Using CVSup").

Configuring and Portability

All INN code should be written expecting ANSI C and POSIX. There is no need to attempt to support pre-ANSI compilers, and ANSI-only features such as string concatenation, #elif, and token pasting may be used freely. So far as possible, INN is written to attempt to be portable to any system new enough that someone is likely to want to run a news server on it, but whenever possible this portability should be provided by checking for standard behavior in configure and supplying replacements for standard functions that are missing.

When there is a conflict between ANSI C and C99, INN code should be written expecting C99 and autoconf used to patch up the differences.

Try to avoid using #ifdef and the like in the middle of code as much as possible. Instead, try to isolate the necessary portability bits and include them in libinn or at least in conditional macros separate from the code. Trying to read code littered with conditional compilation directives is much more difficult.

The shell script configure at the top level of the source tree is generated by autoconf from configure.in, and include/config.h.in is generated by autoheader from configure.in and include/acconfig.h. At configure time, configure generates include/config.h and several other files based on options it was given and what it discovers about the target system.

All modifications to configure should instead be made to configure.in. Similarly, modifications to include/config.h.in should instead be made to include/acconfig.h. The autoconf manual (available using info autoconf if you have autoconf and the GNU info utilities installed on your system) is a valuable reference when making any modifications.

To regenerate configure, just run "autoconf". To regenerate include/config.h.in, run:

    autoheader -l include

to tell it where to find acconfig.h. Please don't include patches to either configure or include/config.h.in when sending patches to INN; instead, note in your patch that those files must be regenerated.

The generated files are checked into the CVS repository so that people working on INN don't have to have autoconf on their system, and to make packaging easier.

At the time of this writing, autoconf 2.13 is required.

The supporting files for autoconf are in the support subdirectory, including the files config.guess and config.sub to determine the system name and ltmain.sh for libtool support. The latter file comes from the libtool distribution; the canonical versions of the former two are available from ftp.gnu.org in /gnu/config. In addition, m4/libtool.m4 is just a copy of libtool.m4 from the libtool distribution. (Using libtool without using automake requires a few odd hacks.) These files used to be on a separate vendor branch so that we could make local modifications, but local modifications have not been necessary for some time. Now, new versions can just be checked in like any other file modifications.

INN should not compile with libtool by default, only when requested, since otherwise normal compilations are quite slow. (Using libtool is not without some cost.) Basic compilation with libtool works fine as of this writing, with both static and shared compiles, but the dependencies aren't quite right for make -j using libtool.

Documentation

INN's documentation is currently somewhat in a state of flux. The vast majority is still in the form of man pages written directly in nroff. Some parts of the documentation have been rewritten in POD; that documentation can be found in doc/pod. The canonical source for README, INSTALL, NEWS, doc/hook-perl, doc/hook-python, and this file are also in POD.

If you're modifying some part of INN's documentation and see that it has a POD version in doc/pod, it's preferred if you can make the modifications to the POD source and then regenerate the derived files. For a quick introduction to POD, see the perlpod(1) man page on your system (it should be installed if you have Perl installed).

When writing new documentation, write in whatever format you care to; if necessary, we can always convert it to POD or whatever else we want to use. Having the documentation exist in *some* form is more important than what language you write it in. If you really don't have any particular preference, there's a slight preference currently for POD.

If you use POD or regenerate POD documentation, please install something close to the latest versions of the POD processing utilities to avoid changes to the documentation depending on who generated it last. You can find the latest version on CPAN (ftp.perl.org or another mirror) in modules/by-module/Pod. You'll need PodParser (for versions of Perl before 5.6.1; 5.6.1 and later come with a recent enough version) and the latest version of podlators. For versions of Perl earlier than 5.005, you'll also need File::Spec in modules/by-module/File.
podlators 1.25 or later will build INN's documentation without significant changes from the versions that are checked into the repository.

There are Makefile rules in doc/pod/Makefile to build all of the documentation whose master form is POD; if you add additional documentation, please add a rule there as well. Documentation should be generated by cd'ing to doc/pod and typing "make file" where "file" is the relative path to the documentation file. This will get all of the various flags right for pod2text or pod2man.

Error Handling

INN has a set of generic error handling routines that should be used as much as possible so that the same syntax can be used for reporting errors everywhere in INN. The four basic functions are warn, syswarn, die, and sysdie; warn prints or logs a warning, and die does the same and then exits the current program. The sys* versions add a colon, a space, and the value of strerror(errno) to the end of the message, and should be used to report failing system calls.

All of the actual error reporting is done via error handlers, and a program can register its own handlers in addition to or instead of the default one. The default error handler (error_log_stderr) prints to stderr, prepending the value of error_program_name if it's set to something other than NULL. Three other error handlers are available, error_log_syslog_crit, error_log_syslog_err, and error_log_syslog_warning, which log the message to syslog at LOG_CRIT, LOG_ERR, or LOG_WARNING priority, respectively.

There is a different set of error handlers for warn/syswarn and die/sysdie.
To set them, make calls like:

    warn_set_handlers(2, error_log_stderr, error_log_syslog_warning);
    die_set_handlers(2, error_log_stderr, error_log_syslog_err);

The first argument is the number of handlers, and the remaining arguments are pointers to functions taking an int (the length of the formatted message), a const char * (the format), a va_list (the arguments), and an int that's 0 if warn or die was called and equal to the value of errno if syswarn or sysdie was called. The length of the formatted message is obtained by calling vsnprintf with the provided format and arguments, and therefore is reliable to use as the size of a buffer to malloc to hold the result of formatting the message provided that vsnprintf is used to format it (warning: the system vsprintf may produce more output under some circumstances, so always use vsnprintf).

The error handler can do anything it wishes; each error handler is called in the sequence given. Error handlers shouldn't call warn or die unless great caution is taken to prevent infinite recursion. Also be aware that sysdie is called if malloc fails in xmalloc, so if the error handler needs to allocate memory, it must not use xmalloc or a related function to do so and it shouldn't call die to report failure. The default syslog handlers report memory allocation failure to stderr and exit.

Finally, die and sysdie support an additional handler that's called immediately before exiting, takes no arguments, and returns an int which is used as the argument for exit. It can do any necessary global cleanup, call abort instead to generate a core dump, or the like.

The advantage of using this system everywhere in INN is that library code can use warn and die to report errors and each calling program can set up the error handlers as appropriate to make sure the errors go to the right place.
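
As a rough illustration of the handler contract described above, here is a minimal sketch of a custom handler. Every name beginning with example_ is hypothetical and not part of libinn: the typedef only mirrors the argument list given in the text, and example_report merely imitates how a caller might compute the length with vsnprintf before invoking a handler. The handler allocates with plain malloc (not xmalloc) and never calls warn or die, following the cautions above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type mirroring the handler arguments described in the text:
   formatted length, format, arguments, and errno (0 for warn or die). */
typedef void (*example_handler_func)(int, const char *, va_list, int);

/* Hypothetical place where the example handler leaves its result.  A real
   handler would write to a log file, syslog, or something similar. */
static char *example_last_message = NULL;

/* Format the message into a buffer sized from len.  Uses malloc directly
   and silently gives up on failure rather than calling die. */
static void
example_capture_handler(int len, const char *format, va_list args, int err)
{
    const char *suffix = (err != 0) ? strerror(err) : NULL;
    size_t size = (size_t) len + 1;
    char *buffer;

    free(example_last_message);
    example_last_message = NULL;
    if (suffix != NULL)
        size += 2 + strlen(suffix);
    buffer = malloc(size);
    if (buffer == NULL)
        return;
    vsnprintf(buffer, (size_t) len + 1, format, args);
    if (suffix != NULL)
        snprintf(buffer + len, size - (size_t) len, ": %s", suffix);
    example_last_message = buffer;
}

/* Hypothetical stand-in for a caller such as warn or syswarn: measure the
   formatted length with vsnprintf, then hand everything to the handler. */
static void
example_report(example_handler_func handler, int err, const char *format, ...)
{
    va_list args;
    int len;

    assert(handler != NULL);
    va_start(args, format);
    len = vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (len < 0)
        return;
    va_start(args, format);
    handler(len, format, args, err);
    va_end(args);
}
```

Calling example_report(example_capture_handler, 0, "cannot open %s", "active") leaves the formatted text in example_last_message; passing a non-zero error number appends a colon, a space, and the strerror text, as the sys* functions are described as doing.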

The default handler is fine for interactive programs; for programs that run from interactive scripts, adding something like:

    error_program_name = "program";

to the beginning of main (where program is the name of the program) will make it easier to figure out which program the script calls is failing. For programs that may also be called non-interactively, like inndstart, one may want to set up handlers like:

    warn_set_handlers(2, error_log_stderr, error_log_syslog_warning);
    die_set_handlers(2, error_log_stderr, error_log_syslog_err);

Finally, for daemons and other non-interactive programs, one may want to do:

    warn_set_handlers(1, error_log_syslog_warning);
    die_set_handlers(1, error_log_syslog_err);

to report errors only via syslog. (Note that if you use syslog error handlers, the program should call openlog first thing to make sure they are logged with the right facility.)

For historical reasons, error messages that are fatal to the news subsystem are logged at the LOG_CRIT priority, and therefore die in innd should use error_log_syslog_crit.

Test Suite

The test suite for INN is located in the tests directory and is just getting started. The test suite consists of a set of programs listed in tests/TESTS and the scaffolding in the runtests program.

Adding new tests is very straightforward and very flexible. Just write a program that tests some part of INN, put it in a directory under tests named after the part of INN it's testing (all the tests so far are in lib because they're testing libinn routines), and have it output first a line containing the count of test cases in that file, and then for each test a line saying "ok n" or "not ok n" where n is the test case number. (If a test is skipped for some reason, such as a test of an optional feature that wasn't compiled into INN, the test program should output "ok n # skip".)
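
The output convention above can be sketched in C roughly as follows. The helper names are hypothetical and are not part of INN's test scaffolding; they only format the lines a test program is expected to print.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the first line of output, which gives the
   number of test cases in this test program. */
static int
example_format_plan(char *buf, size_t size, int count)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%d\n", count);
}

/* Hypothetical helper: format one result line.  A skipped test is reported
   as "ok n # skip"; otherwise the line is "ok n" or "not ok n". */
static int
example_format_result(char *buf, size_t size, int n, int success, int skipped)
{
    assert(buf != NULL);
    if (skipped)
        return snprintf(buf, size, "ok %d # skip\n", n);
    return snprintf(buf, size, "%s %d\n", success ? "ok" : "not ok", n);
}

/* Hypothetical test case: succeeds when the two strings are identical. */
static int
example_strings_match(const char *wanted, const char *seen)
{
    return strcmp(wanted, seen) == 0;
}
```

A test program's main would write the plan line to stdout once and then one result line per test case, numbering the cases from 1.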
Add any rules necessary to build the test to tests/Makefile (note that for simplicity it doesn't recurse into subdirectories) and make sure it creates an executable ending in .t. Then add the name of the test to tests/TESTS, without the .t ending.

One naming convention: to distinguish more easily between e.g. lib/error.c (the implementation) and tests/lib/error-t.c (the test suite), we add -t to the end of the test file names. So tests/lib/error-t.c is the source that compiles into an executable tests/lib/error.t which is run by putting a line in tests/TESTS of just "lib/error".

Note that tests don't have to be written in C; in fact, lib/xmalloc.t is just a shell script (that calls a supporting C program). Tests can be written in shell or Perl (but other languages should be avoided because someone who wants to run the test suite may not have them installed) and just have to follow the above output conventions.

Additions to the test suite, no matter how simple, are very welcome.

Makefiles

All INN makefiles include Makefile.global at the top level, and only that makefile is a configure substitution target. This has the disadvantage that configure's normal support for building in a tree outside of the source tree doesn't work, but it has the significant advantage of making configure run much faster and allowing one to run make in any subdirectory and pick up all the definitions and settings from the top level configuration.

All INN makefiles should also set $(top) to be the path to the top of the build directory (usually relative). This path is used to find various programs like fixscript and libtool so that the same macros (set in Makefile.global) can be used all over INN.

The format of INN's makefiles is mostly standardized; the best examples of the format are probably frontends/Makefile and backends/Makefile, at least for directories with lots of separate programs.
The ALL variable holds all the files that should be generated, EXTRA those additional files that were generated by configure, and SOURCES the C source files for generating tag information.

There is a set of standard installation commands defined in make variables by Makefile.global, and these should be used for all file installations. See the comment blocks in Makefile.global.in for information on what commands are available and when they should be used. There are also variables set for each of the installation directories that INN uses, for use in building the list of installed paths to files.

Each subdirectory makefile should have the targets all (the default), clean, clobber, install, tags, and profiled. The tags target generates vi tags files, and the profiled target generates a profiling version of the programs (although this hasn't been tested much recently). These rules should be present and empty in those directories where they don't apply.

Be sure to test compiling with both static and dynamic libraries and make sure that all the libtool support works correctly. All linking steps, and the compile steps for all library source, should be done through $(LIBTOOL) (which will be set to empty in Makefile.global if libtool support isn't desired).

Scripts

INN comes with and installs a large number of different scripts, both Bourne shell and Perl, and also comes with support for Tcl scripts (although it doesn't come with any). Shell variables containing both configure-time information and configuration information from inn.conf are set by the innshellvars support libraries, so the only system-specific configuration that should have to be done is fixing the right path to the interpreter and adding a line to load the appropriate innshellvars.

support/fixscript, built by configure, does this.
It takes a .in file and generates the final script (removing the .in) by fixing the path to the interpreter on the first line and replacing the second line, whatever it is, with code to load the innshellvars appropriate for that interpreter. (If invoked with -i, it just fixes the interpreter path.)

Scripts should use innshellvars (via fixscript) to get the right path and the right variables whenever possible, rather than having configure substitute values in them. Any values needed at run-time should instead be available from all of the different innshellvars.

See the existing scripts for examples of how this is done.

Include Files

Include files relevant to all of INN, or relevant to the two libraries built as part of INN (the utility libinn library and the libstorage library that contains all storage and overview functions) are found in the include directory; other include files relevant only to a portion of INN are found in the relevant directory.

Practically all INN source files will start with:

    #include "config.h"
    #include "clibrary.h"

The first picks up all defines generated by autoconf and is necessary for types that may not be present on all systems (uid_t, pid_t, size_t, int32_t, and the like). It therefore should be included before any other headers that use those types, as well as to get general configuration information.

The second is portably equivalent to a series of #include directives for standard system headers, except that it doesn't include headers that are missing on a given system, replaces functions not found on the system with the INN equivalents, provides macros that INN assumes are available but which weren't found, and defines some additional portability things. Even if this is more headers than the source file actually needs, it's generally better to just include clibrary.h rather than trying to duplicate the autoconf-driven hackery that it does to do things portably.
The primary exception is for source files in lib that only define a single function and are used for portability; those may want to include only config.h so that they can be easily used in other projects that use autoconf. config.h is a fairly standard header name for this purpose. clibrary.h does also include config.h, but it's somewhat poor form to rely on this; it's better to explicitly list the header dependencies for the benefit of someone else reading the code. There are portable wrappers around several header files that have known portability traps or that need some fixing up on some platforms. Look in include/portable and familiarize yourself with them and use them where appropriate. Another frequently included header file is libinn.h, which among other things defines xmalloc(), xrealloc(), xstrdup(), and xcalloc(), which are checked versions of the standard memory allocation routines that terminate the program if the memory allocation fails. These should generally always be used instead of the regular C versions. libinn.h also provides various other utility functions that are frequently used. paths.h includes a wide variety of paths determined at configure time, both default paths to various parts of INN and paths to programs. Don't just use the default paths, though, if they're also configurable in inn.conf; instead, call ReadInnConf() and use the global innconf structure. Other files in include are interfaces to particular bits of INN library functionality or are used for other purposes; see the comments in each file. Eventually, the header files will be separated into installed header files and uninstalled header files; the latter are those headers that are used only for compiling INN and aren't useful for users of INN's libraries (such as clibrary.h). All of the installed headers will live in include/inn and be installed in a subdirectory named inn in the configured include directory. This conversion is still in progress. 
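
To make the idea of checked allocation routines mentioned above concrete, here is a minimal sketch of what such wrappers do. This is not INN's implementation: the example_ names are hypothetical, and the real xmalloc is described elsewhere in this document as reporting failure through sysdie, whereas this sketch simply prints to stderr and exits.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked malloc: never returns NULL.  On failure it reports
   the problem and terminates the program, so callers need no NULL check. */
static void *
example_checked_malloc(size_t size)
{
    void *p;

    /* malloc(0) may legitimately return NULL, so ask for at least one byte
       to keep "NULL means failure" unambiguous. */
    if (size == 0)
        size = 1;
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "failed to allocate %lu bytes\n",
                (unsigned long) size);
        exit(1);
    }
    return p;
}

/* Hypothetical checked strdup built on the checked malloc above. */
static char *
example_checked_strdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;
    copy = example_checked_malloc(len);
    memcpy(copy, s, len);
    return copy;
}
```

Code using wrappers of this kind can use the returned pointer immediately, which is why the checked versions are preferred over the regular C routines.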

When writing header files, remember that C reserves all identifiers beginning with two underscores and all identifiers beginning with an underscore and a capital letter for the use of the implementation; don't use any identifiers with names like that. Additionally, any identifier beginning with an underscore and a lower-case letter is reserved in file scope, which means that such identifiers can only be used by INN for the name of structure members or function arguments in function prototypes.

Try to pay attention to the impact of a header file on the program namespace, particularly for installed header files in include/inn. All symbols defined by a header file should ideally begin with INN_, inn_, or some other unique prefix indicating the subsystem that symbol is part of, to avoid accidental conflicts with symbols defined by the program that uses that header file.

Coding Style

INN has quite a variety of coding styles intermixed. As with all programs, it's preferable when making minor modifications to keep the coding style of the code you're modifying. In INN, that will vary by file. (Over time we're trying to standardize on one coding style, so changing the region you worked on to fit the general coding style is also acceptable.)

If you're writing a substantial new piece of code, the prevailing "standard" INN coding style appears to be something like the following:

* Write in regular ANSI C whenever possible. Use the normal ANSI and POSIX constructs and use autoconf or portability wrappers to fix things up beforehand so that the code itself can read like regular ANSI or POSIX code. Code should be written so that it works as expected on a modern platform and is fixed up with portability tricks for older platforms, not the other way around. You may assume an ANSI C compiler. Try to use const wherever appropriate. Don't use register; modern compilers will do as good a job as you will in choosing what to put into a register.
Don't bother with restrict (at least yet).

* Use string handling functions that take counts for the size of the buffer whenever possible. This means using snprintf in preference to sprintf and using strlcpy and strlcat in preference to strcpy and strcat. Also, use strlcpy and strlcat instead of strncpy and strncat unless the behavior of the latter is specifically required, as it is much easier to audit uses of the former than the latter. (strlcpy is like strncpy except that it always nul-terminates and doesn't fill the rest of the buffer with nuls, making it more efficient. strlcat is like strncat except that it always nul-terminates and it takes the total size of the buffer as its third argument rather than just the amount of space left.) All of these functions are guaranteed to be available; there are replacements in lib for systems that don't have them.

* Avoid #ifdef and friends whenever possible. Particularly avoid using them in the middle of code blocks. Try to hide all portability preprocessor magic in header files or in portability code in lib. When something just has to be done two completely different ways depending on the platform or compile options or the like, try to abstract that functionality out into a generic function and provide two separate implementations using #ifdef; then the main code can just call that function.

  If you do have to use preprocessor defines, note that if you always define them to either 0 or 1 (never use #define without a second argument), you can use the preprocessor define in a regular if statement rather than using #if or #ifdef. Make use of this instead of #ifdef when possible, since that way the compiler will still syntax-check the other branch for you and it makes it far easier to convert the code to use a run-time check if necessary. (Unfortunately, this trick can't be used if one branch may call functions unavailable on a particular platform.)

* Avoid uses of fixed-width buffers except in performance-critical code, as it's harder to be sure that such code is correct and it tends to be less flexible later on. If you need a reusable, resizable memory buffer, one is provided in lib/buffer.c.

* Avoid uses of static variables whenever possible, particularly in libraries, because it interferes with making the code re-entrant down the road and makes it harder to follow what's going on. Similarly, avoid using global variables whenever possible, and if they are required, try to wrap them into structures that could later be changed into arguments to the affected functions.

* Roughly BSD style but with four-space indents. This means no space before the parens around function arguments, open brace on the same line as if/while/for, and close and open brace on the same line as else.

* Introductory comments for functions or files are generally written as:

      /*
      **  Introductory comment.
      */

  Other multiline comments in the source are generally written as:

      /* This is a
         multiline comment. */

  Comments before functions saying what they do are nice to have. In general, the RCS/CVS Id tag is on the first line of each source file since it's useful to know when a file was last modified.

* Checks for NULL pointers are preferably written out explicitly; in other words, use:

      if (p != NULL)

  rather than:

      if (p)

  to make it clearer what the code is assuming.

* It's better to always put the body of an if statement on a separate line, even if it's only a single line. In other words, write:

      if (p != NULL)
          return p;

  and not:

      if (p != NULL) return p;

  This is in part for a practical reason: some code coverage analysis tools like purecov will count the second example above as a single line and won't notice if the condition always evaluates the same way.

* Plain structs make perfectly reasonable abstract data types; it's not necessary to typedef the struct to something else.
Structs are actually very useful for opaque data structures, since you can predeclare them and then manipulate pointers to them without ever having to know what the contents look like. Please try to avoid typedefs except for function pointers or other extremely confusing data types, or for data types where we really gain some significant data abstraction from hiding the underlying data type. Also avoid using the _t suffix for any type; all types ending in _t are reserved by POSIX. For typedefs of function pointer types, a suffix of _func usually works. This style point is currently widely violated inside of INN itself; INN originally made extensive use of typedefs.

* When noting something that should be improved later, add a comment containing "FIXME:" so that one can easily grep for such comments.

INN's indentation style roughly corresponds to that produced by GNU indent 2.2.6 with the following options:

    -bad -bap -nsob -fca -lc78 -cd41 -cp1 -br -ce -cdw -cli0 -ss -npcs
    -ncs -di1 -nbc -psl -brs -i4 -ci4 -lp -ts8 -nut -ip5 -lps -l78 -bbo
    -hnl

Unfortunately, indent currently doesn't get everything right (it has problems with spacing around struct pointer arguments in functions, wants to put in a space between a dereference of a function pointer and the arguments to the called function, misidentifies some macro calls as being type declarations, and fouls up long but simple case statements). It would be excellent if someday we could just run all of INN's code through indent routinely to enforce a consistent coding style, but indent isn't quite ready for that.

For users of emacs cc-mode, use the "bsd" style but with:

    (setq c-basic-offset 4)

Finally, if possible, please don't use tabs in source files, since they can expand differently in different environments. In particular, please try not to use the mix of tabs and spaces that is the default in emacs. If you use emacs to edit INN code, you may want to put:

    ; Use only spaces when indenting or centering, no tabs.
    (setq-default indent-tabs-mode nil)

in your ~/.emacs file.

Note that this is only a rough guideline and the maintainers aren't style nazis; we're more interested in your code contribution than in how you write it.

Using CVSup

If you want to get updated INN source more easily or more quickly than by downloading nightly snapshots, or if you want to see the full CVS history, you may want to use CVSup to download the source. CVSup is a client and server designed for replicating CVS repositories between sites.

Unfortunately, CVSup is written in Modula-3, so getting a working binary can be somewhat difficult. Binaries are available in the *BSD ports collection or (for a wide variety of different platforms) available from and its mirrors. Alternately, you can get a compiler from (this is more actively maintained than the DEC Modula-3 compiler) and the source from .

After you have the CVSup client, you need to have space to download the INN repository and space for CVSup to store its data files. You also need to write a configuration file (a supfile) for CVSup. The following supfile will download the latest versions from the mainline source:

    *default host=inn-cvs.isc.org
    *default base=
    *default prefix=
    *default release=cvs
    *default tag=.
    *default delete use-rel-suffix
    inn

where the value of base should be a directory where CVSup can put its data files and the value of prefix is where the downloaded source will go (it will be put into a subdirectory named inn). If you want to pull down the entire CVS repository instead (warning: this is much larger than just the latest versions of the source), delete the "*default tag=." line. The best way to download the CVS repository is to download it into a portion of a locally-created CVS repository, so that then you can perform standard CVS operations (like cvs log) against the downloaded repository. Creating your own local CVS repository is outside the scope of this document.

Note that only multiplexed mode is supported (this mode should be the default).

For more general information on using CVSup, see the FreeBSD page on it at .

Making a Release

This is a checklist that INN maintainers should go through when preparing a new release of INN.

1. If making a major release, branch the source tree and create a new STABLE branch tag. This branch will be used for minor releases based on that major release and can be done a little while before the .0 release of that major release. At the same time as the branch is cut, tag the trunk with a STABLE--branch marker tag so that it's easy to refer to the trunk at the time of the branch.

2. Update doc/pod/news.pod and regenerate NEWS. Be more detailed for a minor release than for a major release. For a major release, also add information on how to upgrade from the last major release, including anything special to be aware of. (Minor releases shouldn't require any special care when upgrading.)

3. Make sure that support/config.sub and support/config.guess are the latest versions (from ). See the instructions in "Configuring and Portability" for details on how to update these files.

4. Make sure that samples/control.ctl is in sync with the master version at .

5. Check out a copy of the release branch. It's currently necessary to run configure to generate Makefile.global. Then run "make check-manifest". The only differences should be files that are generated by configure; if there are any other differences, fix the MANIFEST.

6. Run "make release". Note that you need to have a copy of svn2cl from to do this; at least version 0.7 is required. Start the ChangeLog at the time of the previous release. (Eventually, the script will be smart enough to do this for you.)

7. Make the resulting tar file available for testing in a non-listable directory on ftp.isc.org and announce its availability on inn-workers. Install it on at least one system and make sure that system runs fine for at least a few days.
This is also a good time to send out a draft of the release announcement to inn-workers for proof-reading.

8. Generate a diff between this release and the previous release if feasible (always for minor releases, possibly not a good idea due to the length of the diff for major releases).

9. Move the release into the public area of the ftp site and update the inn.tar.gz link. Make an MD5 checksum of the release tarball and put it on the ftp site as well, and update the inn.tar.gz.md5 link. Put the diff up on the ftp site as well. Contact the ISC folks to get the release PGP-signed. Possibly move older releases off into the OLD directory.

10. Announce the new release on inn-announce and in news.software.nntp.

11. Tag the checked-out tree that was used for generating the release with a release tag (INN-).

12. Bump the revision number in Makefile.global.in.

References

Some additional references that may be hard to find and may be of use to people working on INN:

The home page for the IETF NNTP standardization effort, including links to the IETF NNTP working group archives and copies of the latest drafts of the new NNTP standard. The old archived mailing list traffic contains a lot of interesting discussion of why NNTP is the way it is.

The archives for the USEFOR IETF working group, the working group for the RFC 1036 replacement (the format of Usenet articles). Also contains a lot of references to other relevant work, such as the RFC 822 replacement work.

Forrest Cavalier provides several tools for following INN development at this page and elsewhere in the Usenet RKT. Under here is a web-accessible checked-out copy of the current INN source tree and pointers to how to use CVSup.

The standards for large file support on Unix that are being generally implemented by vendors. INN sort of partially uses these, but a good full audit of the code to check them should really be done and there are occasional problems.

A primer on IPv6 with pointers to the appropriate places for more technical details as needed, useful when working on IPv6 support in INN.