Metadata extraction and text categorization using Universal Resource Locator expansions
| dc.contributor.author | Min-Yen KAN | en_US |
| dc.date.accessioned | 2004-10-21T14:28:52Z | en_US |
| dc.date.accessioned | 2017-01-23T06:59:54Z | |
| dc.date.available | 2004-10-21T14:28:52Z | en_US |
| dc.date.available | 2017-01-23T06:59:54Z | |
| dc.date.issued | 2003-10-01T00:00:00Z | en_US |
| dc.description.abstract | Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of URLs to yield categoric metadata about web resources via a three-phase pipeline of word segmentation, abbreviation expansion and classification. I apply this approach to the problem of subject metadata generation and quantify its performance relative to title- and document-based methods, both of which require the retrieval of the source document. | en_US |
| dc.format.extent | 267515 bytes | en_US |
| dc.format.mimetype | application/pdf | en_US |
| dc.identifier.uri | https://dl.comp.nus.edu.sg/xmlui/handle/1900.100/1436 | en_US |
| dc.language.iso | en | en_US |
| dc.relation.ispartofseries | TR10/03 | en_US |
| dc.title | Metadata extraction and text categorization using Universal Resource Locator expansions | en_US |
| dc.type | Technical Report | en_US |
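
The three-phase pipeline named in the abstract (word segmentation, abbreviation expansion, classification) can be illustrated with a minimal sketch. This is not the report's method: the abbreviation table, the keyword rules, and every function name below are hypothetical, the segmenter only splits on non-letter characters rather than separating concatenated words, and a keyword lookup stands in for a trained classifier.

```python
import re
from urllib.parse import urlparse

# Hypothetical abbreviation table (assumption, not taken from the report).
ABBREVIATIONS = {"cs": "computer science", "dept": "department", "lib": "library"}

# Hypothetical keyword-to-subject rules standing in for a trained classifier.
SUBJECT_KEYWORDS = {"computer science": "Computing", "library": "Libraries"}


def segment(url):
    """Phase 1: split the URL's host and path into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = parsed.netloc + parsed.path
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def expand(tokens):
    """Phase 2: replace known abbreviations with their expansions."""
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def classify(tokens):
    """Phase 3: return the subject of the first token matching a keyword rule."""
    for token in tokens:
        if token in SUBJECT_KEYWORDS:
            return SUBJECT_KEYWORDS[token]
    return None


def url_to_subject(url):
    """Run all three phases without retrieving the document itself."""
    return classify(expand(segment(url)))


if __name__ == "__main__":
    print(url_to_subject("http://www.example.edu/cs/dept/index.html"))
```

The point of the sketch is that only the URL string is inspected, which is the property the abstract contrasts with title- and document-based methods.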