CUC 2004 / New Frontiers / New Technologies for New Needs
MWP Project: Measuring the Growth of Croatian Web Space / G4
Authors: Miroslav Milinović; Dubravko Penezić; Hrvoje Stipetić; Nebojša Topolščak, University Computing Centre – Srce, Croatia

Abstract

In this paper we present the results of our research on measuring the Croatian Web space, which started in 2002. First we briefly present the architecture of the system built and used for the measurement. We then discuss in detail the results of the three measurements conducted so far in order to estimate the growth of the Croatian Web space. Finally, we examine the limitations and challenges caused by non-standard use of Web technologies and present our future plans.

Our measurement system, called MWP, was initially developed in 2002 and revised in 2003, and it remains under constant development. It has been developed to enable us:

  • to regularly measure the size and content types of the resources accessible through the HTTP or HTTPS protocol from sites in the .hr top-level Internet domain
  • to analyze available meta-data entries
  • to regularly publish measurement results and provide their further analysis and comparison.
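
The first two goals can be illustrated with a minimal Python sketch. It is not the MWP implementation; the function and class names (`in_scope`, `MetaCollector`, `extract_meta`, `has_dublin_core`) are hypothetical, and the `DC.` name prefix is assumed as the usual convention for Dublin Core elements embedded in HTML meta tags.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def in_scope(url: str) -> bool:
    """Return True for http/https URLs whose host lies in the .hr domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")  # hostname is already lowercased
    return parts.scheme in ("http", "https") and host.endswith(".hr")


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            self.entries.append((name, content))


def extract_meta(html: str) -> list[tuple[str, str]]:
    """Parse an HTML string and return its named meta entries in order."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.entries


def has_dublin_core(entries) -> bool:
    """Assumption: Dublin Core entries are named with a 'DC.' prefix."""
    return any(name.lower().startswith("dc.") for name, _ in entries)
```

A gatherer could call `in_scope` before fetching a URL and `extract_meta` on each retrieved text/html resource; the share of pages for which `has_dublin_core` is true would then be one of the published figures.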

We have conducted 3 measurements:

  • MWP-1 - the first ever measurement of Croatian Web space (from March 29 to May 7, 2002), carried out in the framework of cooperation with the National and University Library
  • MWP-2 - the second measurement (from May 14 to July 22, 2003) and MWP-3 - the third measurement (from September 8 to November 25, 2003), both carried out in the framework of a project financed by the Ministry of Science and Technology of the Republic of Croatia.

The results of MWP-3 show 41% growth in the estimated size of the Croatian Web space in comparison with MWP-1. We also found a higher percentage of Web pages with metadata (including the Dublin Core standard). Regarding content types, we got the expected result: the Web is actually simple, and most of the resources (over 90%) are covered by no more than 4 different content types (text/html, image/jpeg, image/gif, text/plain).
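
The two kinds of figures quoted above, relative growth between measurements and the share of resources covered by the most frequent content types, can be computed as in the following sketch. The helper names and all counts in the usage note are invented for illustration and are not MWP data.

```python
from collections import Counter


def growth_percent(old_size: float, new_size: float) -> float:
    """Relative growth from old_size to new_size, in percent."""
    return 100.0 * (new_size - old_size) / old_size


def top_k_share(counts: dict[str, int], k: int) -> float:
    """Fraction of all resources covered by the k most frequent content types."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in Counter(counts).most_common(k))
    return top / total
```

For example, with made-up counts of 600 text/html, 200 image/jpeg, 100 image/gif, 50 text/plain and 50 application/pdf resources, `top_k_share(counts, 4)` gives a share of 950 out of 1000 resources, and `growth_percent(100, 141)` corresponds to growth of 41%.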

This research relates only to the so-called surface Web, so it cannot include Web sites with protected access, Web pages with dynamically generated addresses or databases accessible through the Web. The analysis does not include the content itself, i.e. the context in which the Web sites appear. The MWP gatherer, the robot program we use in this research, follows the standards for robot exclusion.
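
As an illustration of what following the robot exclusion standard involves, the sketch below uses Python's standard `urllib.robotparser` module to decide whether a URL may be fetched. It is not the MWP gatherer's code; the user-agent string, the host name and the robots.txt rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site in the .hr domain might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical user-agent name; the real gatherer's identifier is not given here.
USER_AGENT = "example-gatherer"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the exclusion rules allow USER_AGENT to fetch url."""
    return parser.can_fetch(USER_AGENT, url)


rules = build_parser(ROBOTS_TXT)
allowed = may_fetch(rules, "http://www.example.hr/index.html")
blocked = may_fetch(rules, "http://www.example.hr/private/report.html")
```

A robot that honours these rules skips every URL for which `may_fetch` returns false, so such resources never enter the measurement.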

Detailed information about MWP and all conducted measurements is available at http://www.srce.hr/mwp/ .

 
 
Copyright © 1991-2004 CARNet. All rights reserved.