Information Retrieval Techniques
- KPS: Keyword, Pattern, Sample techniques
- goal: to extract information from irregular pages automatically, or with minimal human effort, using KPS
Assumptions for using this algorithm:
- important information is always highlighted by keywords
- common patterns exist in many languages, e.g. M.Sc. or Ph.D.
- similar structures or patterns usually exist within the same organisation
Keyword-based mining: extracts the value associated with a keyword (e.g. publications, research interests, E-mail)
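A minimal sketch of keyword-based mining, assuming each value follows its keyword on the same line after a colon or dash. The function name `extract_by_keyword`, the sample page, and the separator convention are illustrative assumptions, not part of the slide.

```python
import re

def extract_by_keyword(text, keywords):
    """Return {keyword: value} for lines shaped like 'Keyword: value'.

    Assumption: the value sits on the same line as the keyword,
    separated by ':' or '-'.
    """
    results = {}
    for line in text.splitlines():
        for kw in keywords:
            # Case-insensitive keyword at the start of the line, then a separator.
            m = re.match(rf"\s*{re.escape(kw)}\s*[:\-]\s*(.+)", line, re.IGNORECASE)
            if m:
                results[kw] = m.group(1).strip()
    return results

# Hypothetical page text used only for illustration.
page = """Dr. Jane Doe
E-mail: jane.doe@example.org
Research interests: data mining, information retrieval
"""

info = extract_by_keyword(page, ["E-mail", "Research interests"])
```

Real pages would likely need extra handling for values that span several lines or sit in a neighbouring table cell.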
Pattern-based mining: performs string matching against user-specified patterns (e.g. [Dr. /Name received /Degree from * in /Year], telephone numbers, addresses)
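One way a user pattern such as `Dr. /Name received /Degree from * in /Year` might be turned into a matcher is to translate each slot into a regular-expression fragment. This is only a sketch: the slot definitions in `SLOTS`, the helper `compile_template`, and the example sentence are assumptions, since the slide does not say how slots are defined.

```python
import re

# Hypothetical slot definitions; the slide does not specify them.
SLOTS = {
    "/Name": r"(?P<Name>[A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "/Degree": r"(?P<Degree>Ph\.?D\.?|M\.?Sc\.?|B\.?Sc\.?)",
    "/Year": r"(?P<Year>\d{4})",
    "*": r".+?",  # wildcard: any text, matched as briefly as possible
}

def compile_template(template):
    """Translate a whitespace-separated template into a compiled regex."""
    parts = []
    for token in template.split():
        # Known slots become regex fragments; everything else is literal text.
        parts.append(SLOTS.get(token, re.escape(token)))
    return re.compile(r"\s+".join(parts))

pattern = compile_template("Dr. /Name received /Degree from * in /Year")

# Hypothetical sentence used only for illustration.
m = pattern.search("Dr. Jane Doe received Ph.D. from Example University in 1995.")
record = m.groupdict() if m else {}
```

Here `record` holds the captured `Name`, `Degree`, and `Year` fields, while the `*` wildcard is matched but not captured.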
Sample-based mining: extracts information based on pattern and style similarities defined by users.
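The slide does not describe how similarity to a sample is measured. As one possible reading, the sketch below generalises a user-supplied sample string into a character-class pattern (digits, letters, whitespace, literal punctuation) and then finds strings with the same shape. The names `char_class` and `pattern_from_sample` and the sample data are hypothetical, and style similarity (for example, HTML formatting) is not covered.

```python
import re
from itertools import groupby

# Character classes whose run length is allowed to vary between matches.
GENERAL = (r"\d", "[A-Za-z]", r"\s")

def char_class(ch):
    """Map a character to a regex fragment describing its class."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # punctuation is kept as a literal

def pattern_from_sample(sample):
    """Generalise a sample string into a regex with the same shape."""
    parts = []
    for cls, run in groupby(char_class(c) for c in sample):
        n = len(list(run))
        # Runs of digits/letters/spaces may vary in length; punctuation must match exactly.
        parts.append(cls + "+" if cls in GENERAL else cls * n)
    return re.compile("".join(parts))

# The user marks one telephone number as a sample (hypothetical data).
pattern = pattern_from_sample("(02) 9385-1234")
text = "Office: (03) 9214-8000, Fax: (03) 9214-8264. Room 12."
similar = pattern.findall(text)
```

With this input, `similar` contains the two bracketed telephone numbers, and "Room 12" is skipped because it lacks the parentheses and hyphen found in the sample.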