DESIRE Information Gateways Handbook
1.4. System requirements overview

In this chapter...
 
  • reliability - making sure your gateway is always available
  • responsiveness - how will your gateway perform?
  • efficiency - making the best of available resources
  • scalability - coping with more users, more data and more services

Introduction
 

Subject gateway services need to be provided in such a way that they are:

  • reliable
  • responsive
  • efficient
  • scalable

A reliable service is one that is available all (well, almost all) of the time, is secure and does not lose all your data in the event of disk failure or security breaches. A responsive service is one that can be browsed, searched and maintained in a way that does not subject the end-user and cataloguer to undue delays. An efficient service makes the best use of the available hardware and network resources. A scalable service is one that can cope with demands placed on it by growing numbers of end-users, increasing database size and new service scenarios.


Background
 

Subject gateways operate in a Web environment. This means that they must be available all the time. End-users expect reasonable response times while they browse the gateway and fast and predictable performance when they search the database. Subject gateway cataloguers expect reasonable response times as they add resource descriptions to the database. Subject gateway managers want to be able to deliver all this at a reasonable cost - both in terms of the initial cost of establishing the gateway and in terms of ongoing hardware and software support costs.

You can achieve this through the use of appropriate:

  • network connectivity
  • hardware configuration (memory, CPU speed, disk space)
  • operating system software
  • subject gateway database and associated software
  • Web server software

Hardware and software requirements; issues for managers
 

Reliability

You want your subject gateway to be reliable. You want it to be available for use for as much of the time as possible - preferably 24 hours a day, 365 days a year. In order to achieve this, there are several issues you will need to think about when you are setting up and running the gateway.
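To make "as much of the time as possible" concrete, it can help to turn an availability target into the amount of downtime it allows each year. The Python sketch below does that arithmetic. The function name and the sample targets are illustrative assumptions, not figures from the DESIRE project.

```python
HOURS_PER_YEAR = 24 * 365  # 24 hours a day, 365 days a year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a service may be down at a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the budget shrinks
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime a year")
```

A target of 99% still permits more than three and a half days of downtime a year, which may help when deciding how much to spend on support contracts and spare hardware.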

Use reliable hardware

Use reliable hardware to run your subject gateway. This probably means using hardware with which you are familiar. Get a hardware support contract for your machine with an appropriate call-out time. If you are nervous, make sure that you can offer your service from some other hardware if your main kit is seriously broken. If you are really nervous, set aside a machine specifically for this purpose. As regards cost, you are likely to get a much better price/performance ratio by choosing Intel (PC) hardware. However, remember that you are likely to be accessing your disks heavily during subject gateway operation so choose an appropriate disk configuration and connection method.

Use reliable software

Remember that a subject gateway operates in a hostile networked environment and needs to support multiple users. Choose an operating system that can reliably handle this. Again, it may be sensible to choose an operating system with which you are familiar. However, UNIX-based operating systems have a much longer track record of providing Internet-based services, so think carefully before choosing anything else! Much of the software developed by the DESIRE project is aimed at (or will only run under) UNIX-based operating systems. If you have chosen Intel-based hardware, Linux is an obvious choice of operating system. Remember that you may need software support both for your operating system and for the subject gateway software that you are running. If you prefer to pay for such support, fine; but the free and fairly informal support that is usually available for Open Source software through mailing lists and Web sites can often be extremely good. Remember also that your subject gateway software is likely to rely on a separate Web server; the widely deployed, well maintained, well supported and freely available Apache Web server is a sensible choice.

Make sure your data is regularly backed up

What happens when something goes seriously wrong with your machine: a disk crashes, or you are hacked and your data is deleted? Make sure that all your software and data are backed up in such a way that you can quickly and easily recover your service. You may choose some sort of RAID architecture for your disks. You may choose to copy your data automatically to a second disk partition. In any case, you are advised to archive your data to tape regularly. You may even do all three of these things ... but do something! And don't forget your software and configuration files; in the event of a serious problem you may need to re-install absolutely everything!
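As one possible starting point for an automated copy, the Python sketch below writes a timestamped, compressed tar archive of a directory using the standard tarfile module. The function name and directory layout are assumptions made for illustration; it is not a DESIRE tool and does not replace tape archiving or RAID.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(source_dir: str, backup_dir: str) -> Path:
    """Write a timestamped, gzip-compressed tar archive of source_dir.

    backup_dir should sit outside source_dir (ideally on another disk),
    otherwise the archive would try to include itself.
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(source_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = target_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the directory name
        archive.add(source, arcname=source.name)
    return archive_path
```

A script like this could be run from a scheduler such as cron, once for the gateway data and once for the software and configuration files, so that both can be restored together.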

Make sure your server is secure

An insecure server is a disaster waiting to happen. Follow the advice in your operating system manuals concerning security. Apply all known security patches and get someone in your team on to the right mailing lists so that you find out about potential problems early. Only run the minimum number of network services that you have to. Position your machine behind a firewall if you can, with access to the Internet only on those ports that you actually need.
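One way to check that only the services you intend are reachable is to probe a list of TCP ports and compare the result with the ports you expect to be open. The Python sketch below does this with the standard socket module. The function names and the idea of an "allowed" set are illustrative assumptions, and a check like this complements, rather than replaces, the advice in your operating system manuals.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unexpected_ports(host, ports, allowed):
    """Return open ports that are not in the allowed set."""
    return [port for port in open_ports(host, ports) if port not in allowed]
```

For example, `unexpected_ports("127.0.0.1", range(1, 1025), allowed={22, 80})` would list any low-numbered ports other than SSH and HTTP that accept connections on the local machine. Only probe machines that you are responsible for.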

Coping with external problems

Your subject gateway will rely on various external services. If your network connection goes down, you can't offer a service. If your DNS entry isn't available for some time, people may be unable to access you. An off-site secondary for your DNS entries is a good idea; an off-continent secondary is even better! As your subject gateway grows, you might think about mirroring your service at another location. One way of achieving this is to have a reciprocal mirroring arrangement with another subject gateway.

Staffing issues

Unless you hand over the running and administration of your subject gateway server completely to a third party, you are highly likely to need one technically competent member of staff to run it. For DESIRE-developed software solutions, this will mean someone familiar with administering UNIX machines. Familiarity with the Perl programming language would be a distinct advantage as well. Other software solutions may not require UNIX or Perl experience; however, a technical understanding of the issues related to the operation of a networked service will be very helpful.

Responsiveness and efficiency

Hardware and software issues

More details concerning hardware and software issues are given in the Systems Requirements Specifics section. The main rules of thumb are:

  • hardware requirements will be software-specific - in particular, database-specific. Check your software manual!
  • more memory is likely to mean better performance
  • faster CPU speed is likely to mean better performance
  • Linux will give better performance than NT given the same hardware
  • NT and Perl may not mix well
  • more network bandwidth means better performance
  • multiple DNS secondaries will give better performance

Cross reference
System requirements specifics, hardware and software

Network and design issues

The design of the Web interface to your subject gateway will have an effect on the efficiency with which you use the available network bandwidth. Make as many of your pages as possible suitable for caching. For example, most of your browsable interface (assuming that you have one) can probably be designed so that it can be cached by remote Web caches and at the Web browser. Your user interface will be much more responsive because of this.
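Whether a page can be cached is largely signalled by the HTTP response headers sent with it. The Python sketch below builds the standard Cache-Control and Expires headers for a page, treating a lifetime of zero as "do not reuse without checking". The function name and the lifetimes are assumptions for illustration; how the headers are actually attached depends on your Web server and gateway software.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Build HTTP response headers for a page cacheable for max_age_seconds."""
    if now is None:
        now = time.time()
    if max_age_seconds <= 0:
        # Dynamic pages such as search results: caches must revalidate each time
        return {"Cache-Control": "no-cache"}
    return {
        # Shared caches and browsers may reuse the page for this many seconds
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires is the older HTTP/1.0 equivalent, given as an HTTP date in GMT
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

A browsable subject listing that changes weekly might be served with `cache_headers(24 * 60 * 60)`, while search results would use `cache_headers(0)`.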

Cross reference
User interface implementation

Scalability

Scalability is discussed in more detail in the Scalability section. As a general point it is worth noting that:

  • supporting more users may require more memory and more network bandwidth
  • having more records in the database may require more memory and more disk space
  • introducing new service scenarios may require more memory and more disk space

Cross reference
Scalability

Costs

Unless you are very lucky, the hardware on which you run your subject gateway is going to cost money. As mentioned above, Intel-based hardware is likely to give a much better price/performance ratio than other hardware. Software may well be free - all the software developed by the DESIRE project will be made available on an Open Source basis. Hardware and software support is likely to cost money; though again it is worth noting that the support you can get for free from the Internet community may well be good enough for your needs (and may even be better than that provided commercially). Technical staff will cost money.


Future proofing
 

Software and hardware systems need to be regularly reviewed to measure how far they are meeting business requirements. The gateway will want to choose software and hardware solutions which provide sufficient flexibility to accommodate change. Such products will probably:

  • offer regular upgrades
  • comply with open standards
  • respond to customer requests
  • impose no restrictions which tie you to that product, for example by ensuring that you have access to proprietary specifications of data structures which may be needed to convert to a new supplier's format

The gateway will want to ensure that decisions regarding the choice of products are informed by strategic objectives, for example:

  • use products that have a good reputation in areas which are important for the gateway (by being innovative, reliable, flexible, customisable ...)
  • use products that support inter-working with key collaborators
  • implement systems with potential audiences in mind (the technologies they use, the features they value)

Example

Scout/SOSIG mirroring

SOSIG, the Social Science Information Gateway, is a ROADS database of over 5500 Internet resource descriptions operated by ILRT at the University of Bristol in the UK. In order to make the database more accessible to end-users in North America, SOSIG has been working closely with staff from the Internet Scout Project, located at the University of Wisconsin-Madison (USA) and funded by the National Science Foundation. A mutual mirroring service has been set up so that users from North America can access a mirror of SOSIG, based on the Scout server, and European users can access a mirror of Scout from the SOSIG server. The SOSIG ROADS database is mirrored weekly using some locally developed scripts that make a 'tar' copy of the complete SOSIG ROADS installation (after making some site-specific changes).
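The SOSIG scripts themselves are not reproduced in this handbook. As a rough illustration of the general idea only, the Python sketch below builds a tar archive of an installation directory while applying string replacements (for example, swapping one hostname for another) to selected text files. The function name, file suffixes and hostnames are all hypothetical.

```python
import io
import tarfile
from pathlib import Path


def make_mirror_archive(install_dir, archive_path, replacements,
                        text_suffixes=(".cfg", ".conf", ".html")):
    """Tar install_dir, applying site-specific replacements to text files.

    archive_path should lie outside install_dir.
    """
    root = Path(install_dir).resolve()
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(root.rglob("*")):
            # This sketch skips symbolic links and empty directories
            if path.is_symlink() or not path.is_file():
                continue
            data = path.read_bytes()
            if path.suffix in text_suffixes:
                text = data.decode("utf-8")  # assumes UTF-8 text files
                for old, new in replacements.items():
                    text = text.replace(old, new)
                data = text.encode("utf-8")
            arcname = (Path(root.name) / path.relative_to(root)).as_posix()
            info = archive.gettarinfo(str(path), arcname=arcname)
            info.size = len(data)  # the size may change after replacement
            archive.addfile(info, io.BytesIO(data))
    return archive_path
```

The mirror site would then unpack the archive on its own server. Files whose suffix is not listed are copied byte for byte, so binary database files are left untouched.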

Cross reference
Co-operation between gateways


Glossary
 

DNS - Domain Name System. A general-purpose distributed, replicated, data query service chiefly used on the Internet for translating hostnames into Internet addresses.
Linux - Linux is a free Unix-type operating system originally created by Linus Torvalds with the assistance of developers around the world.
RAID - Redundant Arrays of Independent Disks
ROADS - Resource Organisation And Discovery in Subject-based Services

References
 

Apache, http://www.apache.org/

Internet Scout Project - SOSIG mirror, http://scout18.cs.wisc.edu/sosig_mirror/

Linux, http://www.linux.org/

SOSIG, http://www.sosig.ac.uk/

Æ. Frisch, Essential System Administration, 2nd ed. (ISBN: 1-56592-127-5). http://www.oreilly.com/catalog/esa2/

B. Laurie & P. Laurie, Apache: The Definitive Guide, 2nd ed. (ISBN: 1-56592-528-9). http://www.oreilly.com/catalog/apache2/

M. Loukides, System Performance Tuning (ISBN: 0-937175-60-9). http://www.oreilly.com/catalog/spt/

E. Siever, et al., Linux in a Nutshell: A Desktop Quick Reference (ISBN: 1-56592-585-8). http://www.oreilly.com/catalog/linuxnut2/


Credits
 

Chapter author: Andy Powell


Last updated : 20 April 00
© 1999-2000 DESIRE