23.1. Some INN Internals

INN's core program is the innd daemon. innd's task is to handle all incoming articles, storing them locally, and to pass them on to any outgoing newsfeeds if required. It is started at boot time and runs continually as a background process. Running as a daemon improves performance because it has to read its status files only once when starting. Depending on the volume of your news feed, certain files such as history (which contains a list of all recently processed articles) may range from a few megabytes to tens of megabytes.

Another important feature of INN is that there is always only one instance of innd running at any time. This is also very beneficial to performance, because the daemon can process all articles without having to worry about synchronizing its internal states with other copies of innd rummaging around the news spool at the same time. However, this choice affects the overall design of the news system. Because it is so important that incoming news is processed as quickly as possible, it is unacceptable that the server be tied up with such mundane tasks as serving newsreaders accessing the news spool via NNTP, or decompressing newsbatches arriving via UUCP. Therefore, these tasks have been broken out of the main server and implemented in separate support programs. Figure 23-1 attempts to illustrate the relationships between innd, the other local tasks, and remote news servers and newsreaders.

Today, NNTP is the most common means of transporting news articles around, and innd doesn't directly support anything else. This means that innd listens on a TCP socket (port 119) for connections and accepts news articles using the “ihave” protocol.
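To make the “ihave” exchange a little more concrete, here is a minimal Python sketch of the receiving side. It is not INN's code; the function name handle_ihave, the history set, and the read_article callback are hypothetical names chosen for illustration. Only the numeric response codes are the standard NNTP ones for IHAVE.

```python
# Illustrative sketch of the server side of an IHAVE exchange.
# Not INN's implementation: names and data structures are assumptions.

def handle_ihave(message_id, history, read_article):
    """Return the response lines a server might send for one IHAVE offer."""
    if message_id in history:
        # We already have this article, so decline the offer.
        return ["435 Article not wanted"]

    responses = ["335 Send article"]
    article = read_article()  # hypothetical callback that reads the article text
    if article is None:
        # The transfer broke off; the peer may offer the article again later.
        responses.append("436 Transfer failed; try again later")
    else:
        history.add(message_id)
        responses.append("235 Article transferred OK")
    return responses
```

A peer that offers an article we have already seen gets a 435 reply straight away, so the article itself never crosses the wire.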

Articles arriving by transports other than NNTP are supported indirectly by having another process accept the articles and forward them to innd via NNTP. Newsbatches coming in over a UUCP link, for instance, are traditionally handled by the rnews program. INN's rnews decompresses the batch if necessary, and breaks it up into individual articles; it then offers them to innd one by one.
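The batch-splitting step can be sketched as follows. This is only an illustration, assuming an already decompressed batch in the traditional framing where each article is preceded by a line of the form "#! rnews <size>"; the function name split_batch is hypothetical, and the real rnews does considerably more (decompression, error handling, and the hand-off to innd).

```python
import io

BATCH_MARKER = b"#! rnews "  # assumed framing: "#! rnews <size in bytes>"

def split_batch(batch):
    """Yield the individual articles contained in an uncompressed batch."""
    stream = io.BytesIO(batch)
    while True:
        header = stream.readline()
        if not header:
            break  # end of batch
        if not header.startswith(BATCH_MARKER):
            raise ValueError("unexpected batch header: %r" % header)
        size = int(header[len(BATCH_MARKER):].strip())
        article = stream.read(size)
        if len(article) != size:
            raise ValueError("truncated article in batch")
        yield article
```

Each article yielded here would then be offered to innd individually.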

Newsreaders can deliver news when a user posts an article. Since the handling of newsreaders deserves special attention, we will come back to this a little later.

Figure 23-1. INN architecture (simplified for clarity)

When receiving an article, innd first looks up its message ID in the history file. Duplicate articles are dropped and the occurrences are optionally logged. The same goes for articles that are too old or lack some required header field, such as Subject:.[1] If innd finds that the article is acceptable, it looks at the Newsgroups: header line to find out what groups it has been posted to. If one or more of these groups are found in the active file, the article is filed to disk. Otherwise, it is filed to the special group junk.
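The decision sequence just described can be summarized in a short Python sketch. It is a simplification under stated assumptions: headers are passed in as a dictionary whose Date entry is already a datetime object, history and active are plain sets, and the list of required headers and the two-week limit are illustrative values rather than INN's actual configuration.

```python
from datetime import datetime, timedelta

# Assumed for illustration; the real set of required headers may differ.
REQUIRED_HEADERS = ("Subject", "Newsgroups", "Message-ID", "Date")
MAX_AGE = timedelta(days=14)  # "usually two weeks"

def classify(headers, history, active, now):
    """Return ("drop", reason) or ("file", [groups]) for one article."""
    msgid = headers.get("Message-ID")
    if msgid is not None and msgid in history:
        return ("drop", "duplicate")

    missing = [h for h in REQUIRED_HEADERS if h not in headers]
    if missing:
        return ("drop", "missing " + missing[0])

    if now - headers["Date"] > MAX_AGE:
        return ("drop", "too old")

    groups = [g.strip() for g in headers["Newsgroups"].split(",")]
    known = [g for g in groups if g in active]
    # Articles posted only to groups we do not carry end up in junk.
    return ("file", known if known else ["junk"])
```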

Individual articles are kept below /var/spool/news, also called the news spool. Each newsgroup has a separate directory, in which each article is stored in a separate file. The file names are consecutive numbers, so that an article in comp.risks may be filed as comp/risks/217, for instance. When innd finds that the directory it wants to store the article in does not exist, it creates it automatically.
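The mapping from newsgroup name to spool file is simple enough to show in a few lines of Python. The helper names article_path and store_article are hypothetical; the sketch only mirrors the layout described above, including the automatic creation of missing directories.

```python
import os

def article_path(spool, group, number):
    """Map a group name and article number to a path below the spool."""
    # comp.risks, article 217  ->  <spool>/comp/risks/217
    return os.path.join(spool, *group.split("."), str(number))

def store_article(spool, group, number, text):
    """Write one article to its spool file, creating the directory if needed."""
    path = article_path(spool, group, number)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)
    return path
```

With spool set to /var/spool/news, article 217 in comp.risks maps to /var/spool/news/comp/risks/217.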

Apart from storing articles locally, you may also want to pass them on to outgoing feeds. This is governed by the newsfeeds file that lists all downstream sites along with the newsgroups that should be fed to them.

Like innd's receiving end, the processing of outgoing news is handled by a single interface. Instead of doing all the transport-specific handling itself, innd relies on various backends to manage the transmission of articles to other news servers. Outgoing facilities are collectively dubbed channels. Depending on its purpose, a channel can have different attributes that determine exactly what information innd passes on to it.

For an outgoing NNTP feed, for instance, innd might fork the innxmit program at startup, and, for each article that should be sent across that feed, pass its message ID, size, and filename to innxmit's standard input. For an outgoing UUCP feed, on the other hand, it might write the article's size and filename to a special logfile, which is read by a different process at regular intervals in order to create batches and queue them to the UUCP subsystem.
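The idea that each channel receives one line per article, containing only the fields that channel asks for, can be illustrated as follows. This is a rough sketch: the function feed_entry, the dictionary keys, and the field order are assumptions made for the example, and in-memory buffers stand in for the child process's standard input and for the logfile.

```python
import io

def feed_entry(article, fields):
    """Format one line describing an article, using only the requested fields."""
    return " ".join(str(article[name]) for name in fields) + "\n"

# Hypothetical article record; the values are made up for the example.
article = {
    "message_id": "<1@example.org>",
    "size": 1234,
    "filename": "comp/risks/217",
}

nntp_channel = io.StringIO()  # stands in for the stdin of a transmit program
uucp_log = io.StringIO()      # stands in for the logfile read by the batcher

nntp_channel.write(feed_entry(article, ("message_id", "size", "filename")))
uucp_log.write(feed_entry(article, ("size", "filename")))
```

In both cases the writer stays ignorant of the transport; it only writes lines to whatever is attached to the channel.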

Besides these two examples, there are other types of channels that are not strictly outgoing feeds. These are used, for instance, when archiving certain newsgroups, or when generating overview information.

Overview information is intended to help newsreaders thread articles more efficiently. Old-style newsreaders had to scan all articles separately in order to obtain the header information required for threading. This would put an immense strain on the server machine, especially when using NNTP; furthermore, it was very slow.[2] The overview mechanism alleviates this problem by prerecording all relevant headers in a separate file (called .overview) for each newsgroup. This information can then be picked up by newsreaders either by reading it directly from the spool directory, or by using the XOVER command when connected via NNTP. INN has the innd daemon feed all articles to the overchan command, which is attached to the daemon through a channel. We'll see how this is done when we discuss configuring news feeds later.
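To give an idea of what prerecording the relevant headers amounts to, the following Python sketch builds one overview record per article. It assumes the commonly used layout of one tab-separated line per article (article number, Subject, From, Date, Message-ID, References, byte count, line count); the function name overview_line is hypothetical, and this is not the code overchan actually runs.

```python
# Assumed field order, following the commonly used overview layout.
OVERVIEW_FIELDS = ("Subject", "From", "Date", "Message-ID",
                   "References", "Bytes", "Lines")

def overview_line(number, headers):
    """Build a single tab-separated overview record for one article."""
    values = [str(number)]
    for name in OVERVIEW_FIELDS:
        value = str(headers.get(name, ""))
        # Collapse tabs and newlines so each article stays on one line.
        values.append(" ".join(value.split()))
    return "\t".join(values)
```

A newsreader can then thread a whole group by reading one short line per article instead of opening every article file.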

Notes

[1]

This is indicated by the Date: header field; the limit is usually two weeks.

[2]

Threading 1,000 articles when talking to a loaded server could easily take around five minutes, which only the most dedicated Usenet addict would find acceptable.