ISPadmin
July, 2001
Usenet News
Introduction
In this installment, I look at many die-hard Internet users' favorite application
and every service provider's headache: news. Usenet news is defined by RFC977
(NNTP proposed standard) and RFC1036 (Usenet message standard).
The problem of news is a difficult one for a service provider, due to the
following attributes:
Very high volume of both posts and news data itself
Very high (exponential?) rate of increase in both number of posts and MB
of data year over year
High number of end users
The distributed distribution model implemented by NNTP, making reliability
and "correctness" (i.e. posts containing all of the data they are supposed
to) problematic
Infrastructure costs, including bandwidth and disk space
Personnel costs to monitor spam and illegal articles originating on the provider's
network, as well as to manage the news infrastructure itself
Legal problems associated with spam and illegal pornography and warez
Small Provider Infrastructure
A small provider (and most enterprises who offer news to their employees
for that matter) will likely utilize a single machine for news. This single
machine performs all three functions typically required in a news infrastructure:
inbound news relay, outbound news relay, and serving news to clients.
Figure 1
Figure 1 contains a diagram of how a small provider might set up their news
infrastructure. The center box labeled "news server" handles all three basic
news functions: inbound news relay, outbound news relay and client news readers.
The inbound articles come into the machine from the various sources of news
(peers, commercial providers, upstream ISP(s), etc.). The outbound articles
leave the server through the outbound news connection(s) (usually the same sources
as the inbound news streams). All of the customers who want news point their
news clients to this same machine.
It is very likely that a small provider is not going to want to deal with
the hassles of news and will outsource news to a provider like Critical Path
(formerly Supernews/RemarQ). (The References section contains a pointer to
a listing of news providers at the Open Directory project home page.) While
some larger providers do utilize commercial news services providers, it is
usually more cost effective for a big provider to set up their own news infrastructure.
Large Provider Infrastructure
A large provider is likely to deploy separate machines (or groups of machines)
for each function: inbound news relay, outbound news relay and client news
serving. Of course, functions can be combined; for example, inbound and outbound
news relay can be the same machine if the inbound and outbound news volume
isn't too high.
Figure 2
Figure 2 outlines how a larger provider might set up their inbound news infrastructure.
The box marked "INR" illustrates an inbound news relay machine. This machine
takes all off-site incoming feeds and consolidates them into a single feed.
Only the very largest providers would need more than one INR machine for
performance reasons. (Of course, they may have multiple INR machines for
redundancy purposes.) A single machine to consolidate incoming feeds keeps
transit costs to a minimum.
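As a rough illustration of what "consolidating" several incoming feeds means, the sketch below merges two feeds into one and drops duplicates by Message-ID. It assumes de-duplication works along the lines of the IHAVE exchange in RFC977, where a server answers 335 (send the article) or 435 (article not wanted). The class name, feed names and sample articles are hypothetical, and this is not how any particular news server is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class InboundRelay:
    """Toy inbound news relay: merges several feeds into one outgoing feed."""
    seen_ids: set = field(default_factory=set)    # Message-IDs already accepted
    outgoing: list = field(default_factory=list)  # the consolidated single feed

    def offer(self, message_id: str) -> int:
        """Answer an IHAVE-style offer with an RFC977 response code."""
        # 435 = article not wanted, 335 = send article to be transferred
        return 435 if message_id in self.seen_ids else 335

    def accept(self, message_id: str, article: str) -> bool:
        """Store an article once; return False if it is a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.outgoing.append((message_id, article))
        return True


relay = InboundRelay()
# Hypothetical feeds: both peers carry article <2@example.com>.
feeds = {
    "peer-a": [("<1@example.com>", "first post"), ("<2@example.com>", "second post")],
    "peer-b": [("<2@example.com>", "second post"), ("<3@example.com>", "third post")],
}
for feed_name, articles in feeds.items():
    for message_id, body in articles:
        if relay.offer(message_id) == 335:
            relay.accept(message_id, body)

print([message_id for message_id, _ in relay.outgoing])
```

Because the duplicate is refused before the article body is sent, only one copy of each article crosses the provider's transit links and only one copy is passed downstream.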
Multiple inbound news relay machines accept feeds from the inbound news
relay machine(s). Each inbound news relay machine sends news articles to
multiple news reading servers (indicated by "NRS1" and "NRSx") which news
reading clients (not shown) attach to. Note that depending upon the news
server hardware and software running on the news reading servers, each server
can feed hundreds of news clients concurrently. The first class of machines
to require scaling is usually the news reading servers, followed by the inbound
news relay and finally the outbound news relay machine, which would be a
distant third.
Figure 3
Figure 3 shows how a big provider could set up their outbound news (posts
originating on their network) infrastructure. The boxes marked "ONR1" and
"ONRx" indicate outbound news relay machines, which take articles from the
news reading servers (labeled "NRS1" and "NRSx") and send them to the machine
labeled "ONR". The outbound news relay might be located on the same machine
that provides the inbound news relay function, depending upon the number
of articles originating on the provider's network. This outbound news relay
machine is tasked with sending articles originating on the provider's network
to the Internet at large through the outbound news feeds previously configured.
The outbound news relay machines (machines labeled "ONRx") are typically
the last part of the news infrastructure that requires scaling (after news
reading servers and inbound news relays), as news clients don't usually originate
many news articles.
Cyclical News File System (a.k.a. the Trash Can)
The vast majority of news implementations utilize either the Internet Software
Consortium's INN (originally written by Rich Salz), which is open source, or
Openwave's Typhoon/Cyclone series of commercial software. Before discussing
the applications in particular, a short history and discussion of the cyclical
news file system would be helpful.
Prior to the implementation of the Cyclical News File System (or CNFS) within
INN, many providers (including Time Warner Cable of Maine and Ziplink) who
had implemented INN 1.x switched to Typhoon and Cyclone because INN simply
could not handle the load or expire articles automatically. Typhoon/Cyclone
and, more recently, INN all feature an implementation of a cyclical news
file system that eliminates one of the biggest headaches of managing a news
infrastructure, namely article expiration.
Article expiration is the process by which articles are "cancelled" and deleted
from the list of available articles for download. As articles "age", they
are expired. Historically, article expiration was handled by setting parameters
within an INN configuration file. Every day at a certain time, a process ran
that deleted articles that met the criteria set in the configuration file,
and all of the news indexes were rebuilt. This process could take hours
for a large news system, frequently causing service interruptions.
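The nightly expiration pass described above can be sketched roughly as follows. The retention table, its first-match-wins rule and the sample spool are all hypothetical; this is not INN's configuration format or its expire program, only an illustration of why the job has to touch every stored article.

```python
from fnmatch import fnmatch

# Hypothetical retention rules: (group pattern, days to keep). First match wins.
RETENTION_DAYS = [
    ("alt.binaries.*", 1),
    ("comp.*", 14),
    ("*", 7),  # catch-all for groups nobody configured explicitly
]


def keep_days(group: str) -> int:
    """Look up how many days articles in this group are kept."""
    for pattern, days in RETENTION_DAYS:
        if fnmatch(group, pattern):
            return days
    raise ValueError(f"no retention rule matches {group}")


def expire(spool: list, today: int) -> list:
    """Return the articles that survive; every article is examined once."""
    return [
        (group, message_id, arrival_day)
        for group, message_id, arrival_day in spool
        if today - arrival_day < keep_days(group)
    ]


# (group, Message-ID, day the article arrived)
spool = [
    ("alt.binaries.test", "<a@example.com>", 8),
    ("comp.lang.c", "<b@example.com>", 0),
    ("rec.food", "<c@example.com>", 2),
]
print(expire(spool, today=10))
```

The cost grows with the number of articles on disk, and the indexes have to be rebuilt afterwards, which is where the hours-long runs came from.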
With the rapid growth of the size of newsfeeds, the partition(s) articles
were stored on frequently filled up if an administrator was not diligent
in keeping the expiration configuration file up to date with the added news
groups. Also, performing the expiration would often cause service interruptions
due to the load put on the news server while running the article expiration.
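A cyclical news file system has no concept of article expiration. Instead,
articles are written sequentially into fixed-size buffers; when a buffer fills
up, writing wraps around to the beginning and the newest articles overwrite
the oldest. Therefore, there is no need to perform the CPU-consuming expiration
process and its associated overhead. With the advent of the 2.0 release of
INN, INN includes CNFS support.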
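The idea behind a cyclical store can be shown with a small sketch: a fixed amount of space in which new articles push out the oldest ones, so no separate expiration job is needed. This is a simplification under stated assumptions; a real implementation writes at an offset inside a pre-allocated file and wraps around, whereas the hypothetical CyclicBuffer class below only models the resulting behavior.

```python
from collections import deque


class CyclicBuffer:
    """Fixed-size article store that discards the oldest data when full."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.articles = deque()  # oldest article on the left

    def store(self, message_id: str, body: bytes) -> None:
        size = len(body)
        if size > self.capacity:
            raise ValueError("article larger than the whole buffer")
        # Make room by dropping the oldest articles; this replaces expiration.
        while self.used + size > self.capacity:
            _, old_body = self.articles.popleft()
            self.used -= len(old_body)
        self.articles.append((message_id, body))
        self.used += size

    def message_ids(self) -> list:
        return [message_id for message_id, _ in self.articles]


buffer = CyclicBuffer(capacity_bytes=10)
buffer.store("<a@example.com>", b"aaaa")
buffer.store("<b@example.com>", b"bbbb")
buffer.store("<c@example.com>", b"cccc")  # does not fit, so the oldest is dropped
print(buffer.message_ids(), buffer.used)
```

Disk usage can never exceed the configured capacity, so the partition cannot fill up; the trade-off is that retention time is set by feed volume rather than by a per-group policy.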
Openwave's Twister and Cyclone
Twister and Cyclone are commercial service provider grade news server implementations.
Many service providers utilize these products for serving news to their customers.
Some of the features of Openwave's Twister and Cyclone products include:
Virtual server support
Customizable anti-spam filtering
Synchronized article numbering across entire news server infrastructure
Real-time statistics and logs from which bills can be generated
Post filtering
Automated moderator support
Feeds automatically adjusted without administrator intervention for optimal
throughput and efficiency
Openwave has a free version of their discussion software named Breeze. Of
course, there are limits as to how many feeds and how many readers can connect
to it, but it might be worth some investigation if you are in the market
for Usenet news server software.
Internet Software Consortium's INN
INN is freeware and doesn't have all of the bells and whistles that a commercial
application like Openwave's does. However, it is a fully functional news
server and perfectly capable of serving news. The advent of the CNFS in INN
makes it much more robust and usable in a service provider environment. Features
of INN 2.3.1 include:
Python, Perl and Tcl authentication and filtering plugin support
News reading over SSL
Email gateway to news
Exponential backoff for posting, enabling some level of anti-spam support
For many providers, large and small, INN is a fine solution to the problem
of Usenet news. If the added features of a commercial news product (like
article synchronization, virtual server support and real time statistics)
are required, then a provider would likely utilize a commercial grade server.
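To make the "exponential backoff for posting" feature listed above more concrete, here is a minimal sketch of the general technique: a client that posts in rapid succession is stalled for a delay that doubles with each fast post, and the delay resets once the client slows down. The PostThrottle class and all of its parameter names and default values are hypothetical; this is not INN's code or its configuration syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostThrottle:
    fast_gap: float = 10.0     # posts closer together than this count as "fast"
    base_delay: float = 1.0    # stall applied to the first fast post, in seconds
    factor: float = 2.0        # growth factor for each further fast post
    max_delay: float = 300.0   # upper bound on the stall
    fast_posts: int = 0
    last_post: Optional[float] = None

    def delay_for(self, now: float) -> float:
        """Return how many seconds the server should stall this post."""
        if self.last_post is not None and now - self.last_post < self.fast_gap:
            self.fast_posts += 1
        else:
            self.fast_posts = 0  # the client slowed down, so start over
        self.last_post = now
        if self.fast_posts == 0:
            return 0.0
        return min(self.base_delay * self.factor ** (self.fast_posts - 1),
                   self.max_delay)


throttle = PostThrottle()
# Four posts one second apart, then a pause before the fifth.
delays = [throttle.delay_for(t) for t in (0, 1, 2, 3, 100)]
print(delays)  # [0.0, 1.0, 2.0, 4.0, 0.0]
```

A normal reader who posts occasionally never notices the throttle, while a script posting hundreds of articles quickly runs into the maximum delay.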
News Client Software
News client software bears a brief mention. Both Microsoft and Netscape browsers
contain client news reading capability. While they both can read news, I
personally find them not nearly as functional as the Forte Agent news reading
client. If you have been around since before the GUI days, you
can still read news from the Unix command line utilizing tin, trn, pine or
a multitude of other character-based news readers. If you are interested
in finding out more about news clients, please check out the appropriate
web site in the References.
Storage Considerations
When designing news infrastructure, many details must be considered. In the
area of storage, single disk spindles (i.e., not RAID or other fault tolerant
storage technology) are usually utilized for storage as losing articles is
a tolerable event. Also, backups are almost never performed (except for those
news providers who archive such things) because once again, losing articles
is acceptable. Once the hardware failure is repaired, news will begin filling
the disks again very rapidly!
News articles can be stored and shared via NFS mounts. Historically, many
problems arose utilizing NFS for article storage, which accounts for the
limited use of NFS in news implementations. Issues with utilizing NFS with
news are:
file locking
performance
It is not recommended that NFS be utilized for news implementation, as there
are much better ways (e.g., Storage Area Networks, or SANs) to achieve similar
functionality and performance.
Other Considerations
As you are probably aware, Usenet news hosts thousands of newsgroups in a
multitude of languages. For providers with networks located solely in the
US, it is sufficient to carry the 50,000 or so English-only groups. International
service providers would likely carry a complete feed with all non-English
groups as well.
The end subscriber controls the groups to which they subscribe. When their
news client initially connects to the news server, the client asks the server
for the list of available groups (usually the full list is downloaded).
Once the group list is downloaded, the end user can subscribe (download
article headers within each group) to whatever set of groups interests them,
and then download individual article bodies that they wish to view.
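The sequence above maps onto a handful of RFC977 commands: LIST for the group list, GROUP to select a group, then HEAD and BODY for an article's headers and text. The sketch below parses LIST response lines, which RFC977 defines as "group last first p", and builds the command sequence for viewing one article. It does not open a network connection; the function names and the sample response data are hypothetical.

```python
from typing import NamedTuple


class GroupInfo(NamedTuple):
    name: str
    last: int     # highest article number currently in the group
    first: int    # lowest article number currently in the group
    posting: str  # "y" if posting is allowed, "n" if not


def parse_list_line(line: str) -> GroupInfo:
    """Parse one line of a LIST response: 'group last first p'."""
    name, last, first, posting = line.split()
    return GroupInfo(name, int(last), int(first), posting)


def reader_commands(group: str, article_number: int) -> list:
    """Commands a news reader sends, in order, to view one article."""
    return [
        "LIST",                     # download the list of groups
        f"GROUP {group}",           # select a subscribed group
        f"HEAD {article_number}",   # fetch the article's headers
        f"BODY {article_number}",   # fetch the article's text
    ]


# Hypothetical LIST response lines, as a server might return them.
SAMPLE_LIST_RESPONSE = [
    "comp.lang.c 0000014500 0000014211 y",
    "rec.food.restaurants 0000003010 0000002990 y",
    "news.announce.important 0000000042 0000000040 n",
]

groups = [parse_list_line(line) for line in SAMPLE_LIST_RESPONSE]
info = next(g for g in groups if g.name == "rec.food.restaurants")
print(info)
print(reader_commands(info.name, info.last))
```

With tens of thousands of groups, the LIST response alone is a sizable download, which is why clients normally fetch it once and cache it.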
Most providers carry local news groups (a group dedicated to restaurant reviews
in the provider's city, for example). In fact, the Openwave series of news
servers enables "virtual groups" to be located across servers and only visible
to certain classes of clients (for example, the customers of a particular
ISP in a wholesale ISP's news infrastructure). This is a very useful feature
for a wholesale service provider.
One might wonder how much effort and hardware it takes to run a news infrastructure
at a moderate-sized ISP. At Ziplink, we had a moderate-sized, English-language-only
(50,000 groups) news infrastructure (200 news clients). These clients
were served by two Sun Ultra 5 machines, each with 512MB of RAM and approximately
36GB of disk space. One machine ran Cyclone and was the inbound and outbound
news relay, while the other machine ran Typhoon and was the news reader machine.
The load on either machine was never higher than 1. The aggregate feeds were
on the order of 2 megabits/second, from a handful of UUNET news feeds and
several news peers.
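The figures above allow a back-of-the-envelope estimate of how long articles could stay on disk. The sketch assumes the feed ran at the full 2 megabits/second around the clock and that all 36GB of one machine held articles; both are simplifying assumptions rather than measurements from Ziplink, so the result is only a rough indication.

```python
def feed_gb_per_day(megabits_per_second: float) -> float:
    """Convert a sustained feed rate into gigabytes (10^9 bytes) per day."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return bytes_per_second * 86_400 / 1_000_000_000


def retention_days(disk_gb: float, megabits_per_second: float) -> float:
    """Days of articles that fit on disk at the given feed rate."""
    return disk_gb / feed_gb_per_day(megabits_per_second)


print(round(feed_gb_per_day(2), 1))     # 21.6 GB per day
print(round(retention_days(36, 2), 2))  # about 1.67 days
```

Under these assumptions the disks turn over in less than two days, which shows why article expiration (or a cyclical store) dominates news server administration.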
Occasionally a news spam complaint will arrive in the abuse mailbox of a
provider. Usually, it is very easy to track down the perpetrators of news
spam, as the logs and message headers themselves contain exactly when and
where the message originated. Forged message headers make this process
much more difficult, but the server logs still make it easy to determine positively
whether or not a message originated on a particular provider's network. Generally,
Usenet news spam is much less of a problem for an ISP than junk email. In
2.5 years at Ziplink, I handled one Usenet spam complaint but hundreds of
unsolicited commercial email complaints.
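A first pass at tracing a complaint usually starts with the article's own headers. The sketch below reads the Path header (defined in RFC1036, with each relay adding its name so that the originating system appears toward the right) and the NNTP-Posting-Host header. It assumes the last Path entry is a sender placeholder and that the injecting server adds NNTP-Posting-Host, which is common practice but not guaranteed. The sample article and host names are hypothetical, and because headers can be forged, the result should always be checked against the server's own logs.

```python
from email import message_from_string

# Hypothetical article, as it might arrive with a spam complaint.
SAMPLE_ARTICLE = """\
Path: news.example.net!feeder.example.org!news.isp.example!not-for-mail
From: someone@isp.example
Newsgroups: misc.test
Subject: Buy now
Message-ID: <abc123@news.isp.example>
NNTP-Posting-Host: dialup-42.isp.example

Spam body.
"""


def trace_origin(article_text: str) -> dict:
    """Extract the header fields most useful for tracking down a poster."""
    headers = message_from_string(article_text)
    hops = headers["Path"].split("!")
    # Assumption: the final entry is a sender placeholder, so the server
    # where the article was injected is the entry just before it.
    injecting_server = hops[-2] if len(hops) > 1 else hops[0]
    return {
        "message_id": headers["Message-ID"],
        "injecting_server": injecting_server,
        "posting_host": headers.get("NNTP-Posting-Host"),  # None if absent
        "relays": hops[:-2],
    }


trace = trace_origin(SAMPLE_ARTICLE)
print(trace)
```

The Message-ID and posting host extracted here are the values to search for in the news server's logs to confirm whether the article really entered the network there.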
Legal Aspects of News
I've asked John Nicholson (the lawyer who writes in ;login: about legal issues
surrounding computers) to cover ISP legal areas, as I'm not an attorney.
Usenet news would definitely be an important topic for any discussion around
service provider legal liability.
Most ISPs consider themselves a "common carrier". Having "common carrier"
legal status would exempt ISPs from liability for what is carried over their
infrastructure. How true this belief really is, I am not sure. If considered
a "common carrier", a service provider cannot be held liable for pornography
or illegal software (warez) originating or residing on their infrastructure.
(An analogy to this would be holding the US Post Office responsible for someone
sending illegal drugs through the mail.)
Most if not all providers perform no *content-based* censoring (moderating)
of what content flows through their network. Of course, decisions *not* based
on content but strictly on technical capacities and related areas (for example,
limits placed on news articles based upon the amount of disk space or network
bandwidth available) are an acceptable means of controlling one's destiny
as an ISP while not jeopardizing the potential for "common carrier" legal
status.
Conclusion
Providing Usenet news functionality can be a difficult task for any provider.
For a smaller provider, one machine can handle inbound, outbound and news
reading capability. For a larger provider, the work is split across machines
by function.
Openwave's discussion products are a good commercial news server choice, while INN
remains the open source stalwart. Spam is not too much of an issue when it
comes to news, and those who do spam are relatively easy to catch. Most ISPs
do not filter groups for fear of losing "carrier" status, a status it is unclear
whether ISPs have at this point under any circumstances.
Next time, I'll take a look at how service providers deploy their name service
infrastructure. In the meantime, please send your questions and comments
regarding ISPs, system administration or related topics to me.
References
NNTP proposed standard (RFC977): ftp://ftp.isi.edu/in-notes/rfc977.txt
Usenet message standard (RFC1036): ftp://ftp.isi.edu/in-notes/rfc1036.txt
RFC Editor: http://www.rfc-editor.org/
Critical Path Supernews/RemarQ: http://www.supernews.net/
Open Directory Project list of commercial news providers: http://dmoz.org/Computers/Usenet/Feed_Services/
ISC's INN: http://www.isc.org/products/INN/
Openwave's discussion server software: http://discussion.openwave.com/
Microsoft Internet Explorer: http://www.microsoft.com/windows/ie/
Netscape Navigator: http://home.netscape.com/browsers/index.html
Forte Inc., Agent/Free Agent: http://www.forteinc.com/
tin: http://www.tin.org/
pine: http://www.washington.edu/pine/
trn: http://trn.sourceforge.net/