Metadata extraction and text categorization using Universal Resource Locator expansions

Date
2003-10-01T00:00:00Z
Abstract
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of URLs to yield categoric metadata about web resources via a three-phase pipeline of word segmentation, abbreviation expansion and classification. I apply this approach to the problem of subject metadata generation and quantify its performance relative to title- and document-based methods, both of which require the retrieval of the source document.
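The three phases named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the word list, abbreviation table, category lexicons, stop tokens and example URLs below are all hypothetical toy data, and the keyword-overlap scorer merely stands in for whatever classifier the paper actually uses.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy resources (assumptions, not taken from the paper).
VOCAB = {"computer", "science", "news", "sports", "football", "research", "papers"}
ABBREVIATIONS = {"cs": "computer science", "dept": "department", "univ": "university"}
CATEGORY_TERMS = {
    "computing": {"computer", "science", "research", "papers"},
    "sports": {"sports", "football"},
}
STOP_TOKENS = {"www", "http", "https", "com", "org", "edu", "html", "htm"}


def segment(chunk):
    """Phase 1: split a lowercase chunk into the fewest known words.

    Returns the chunk unchanged (as a one-item list) if no full
    segmentation into VOCAB words exists.
    """
    best = [None] * (len(chunk) + 1)  # best[i]: shortest word list covering chunk[:i]
    best[0] = []
    for end in range(1, len(chunk) + 1):
        for start in range(end):
            word = chunk[start:end]
            if best[start] is not None and word in VOCAB:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1] if best[-1] is not None else [chunk]


def expand(tokens):
    """Phase 2: replace known abbreviations with their expansions."""
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return expanded


def url_tokens(url):
    """Run phases 1 and 2 on the host and path of a URL."""
    parsed = urlparse(url)
    chunks = re.split(r"[^a-z]+", (parsed.netloc + parsed.path).lower())
    tokens = []
    for chunk in chunks:
        if chunk and chunk not in STOP_TOKENS:
            tokens.extend(segment(chunk))
    return expand(tokens)


def classify(tokens):
    """Phase 3: pick the category whose lexicon overlaps the tokens most."""
    scores = {cat: sum(t in terms for t in tokens)
              for cat, terms in CATEGORY_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    url = "http://www.cs.example.edu/researchpapers/index.html"
    print(url_tokens(url))
    print(classify(url_tokens(url)))
```

Only the URL string is consulted, so no document retrieval is needed, which is the contrast the abstract draws with title- and document-based methods. A realistic system would presumably replace the toy tables with learned or curated resources and the overlap count with a trained classifier.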