There is currently a huge amount of
data on the Web and almost no classification information. The
key problem is how to embed knowledge into information mining
algorithms. The authors analyze information retrieval techniques
and discuss their strengths and weaknesses. Although most Web documents
are text-oriented, many contain multimedia
elements, which are not easily accessible through common search
methods. Web information is dynamic, semi-structured, and interwoven
with hyperlinks. Several advanced methods for Web information
mining are analyzed: (1) syntax analysis (HTML tags), (2) knowledge
annotation using conceptual graphs, (3) KPS: Keyword, Pattern,
Sample search techniques, and (4) techniques for obtaining descriptions
by fuzzification and back-propagation. The problem of choosing
proper keywords is also stressed. The authors suggest using
already accepted standards for the classification hierarchy,
such as Dewey Decimal Classification (DDC).