MWP Project: Measuring the Growth of Croatian Web Space / G4
In this paper we present the results of our research on measuring the Croatian Web space, which started in 2002. First, we briefly present the architecture of the system built and used for the measurement. We then discuss in detail the results of the three measurements conducted, in order to estimate the growth of the Croatian Web space. Finally, we examine the limitations and challenges caused by non-standard use of Web technologies and present our future plans.
Our measurement system, called MWP, was initially developed in 2002 and revised in 2003, and is under constant development. It has been developed to enable us:
We have conducted 3 measurements:
The results of MWP-3 show 41% growth in the estimated size of the Web space in comparison with MWP-1. We also found a higher percentage of Web pages with metadata (including the Dublin Core standard). Regarding content types, we obtained the expected results, showing that the Web is actually simple: most of the resources (over 90%) are covered by no more than 4 different content types (text/html, image/jpeg, image/gif, text/plain).
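As an illustration of how such a content-type share can be computed, the following minimal Python sketch tallies the Content-Type values of gathered resources and reports the fraction covered by the most frequent types. It is not the MWP implementation: the function name content_type_coverage and the sample values are hypothetical and chosen only for the example.

```python
from collections import Counter


def content_type_coverage(content_types, top_n=4):
    """Return the top_n most frequent content types and the share they cover.

    content_types is an iterable of raw Content-Type header values
    (hypothetical input; parameters such as charset are stripped).
    """
    counts = Counter(ct.split(";")[0].strip().lower() for ct in content_types)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(top_n)
    covered = sum(n for _, n in top)
    return top, covered / total


# Illustrative sample only; these are not measured MWP data.
sample = (
    ["text/html; charset=utf-8"] * 6
    + ["image/jpeg"] * 2
    + ["image/gif"]
    + ["application/pdf"]
)
top_types, share = content_type_coverage(sample, top_n=2)
print(top_types, share)
```

Normalizing the header value before counting keeps variants such as "text/html; charset=utf-8" and "text/html" in the same bucket.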
This research relates only to the so-called surface Web, so it cannot include Web sites with protected access, Web pages with dynamically generated addresses, or databases accessible through the Web. The analysis does not include content, i.e. the context in which the Web sites appear. The MWP gatherer, the robot program that we use in this research, follows the standards for robot exclusion.
Detailed information about MWP and all conducted measurements is available at http://www.srce.hr/mwp/ .
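To make the robot exclusion check concrete, the sketch below shows how a gatherer can consult a site's robots.txt rules before fetching a page, using Python's standard urllib.robotparser module. This is an assumption-laden illustration rather than the MWP gatherer's actual code: the user-agent string "mwp-gatherer", the host www.example.hr and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS_TXT)

# "mwp-gatherer" is an assumed user-agent name, not taken from the paper.
allowed = parser.can_fetch("mwp-gatherer", "http://www.example.hr/index.html")
blocked = parser.can_fetch("mwp-gatherer", "http://www.example.hr/private/data.html")
print(allowed, blocked)
```

A gatherer following this approach would skip every URL for which can_fetch returns False, which is one reason the measurement covers only part of each site.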