A Framework for Hierarchical Cost-sensitive Web Resource Acquisition

Date
2010-03-30T06:33:05Z
Abstract
Many record matching problems involve insufficient or incomplete information, so solutions that classify which pairs of records match often need to acquire additional information at some cost. For example, web resources impose extra query or download time. Because the number of resources that could be acquired is large, solutions invariably acquire only a subset, balancing acquisition cost against benefit. At the same time, resources often have hierarchical dependencies among themselves; e.g., the search engine results for two queries must be obtained before the TF-IDF cosine similarity between their snippets can be computed. We propose a framework for cost-sensitive acquisition of resources with hierarchical dependencies and apply it to the web resource context. Our framework is versatile and applicable to a large variety of problems. We show that many problems involving selective resource acquisition can be formulated using resource dependency graphs. We then solve the resource acquisition problem by casting it as a combinatorial search problem. As the support vector machine is commonly used to solve record matching problems effectively, we also propose a benefit function that works with this classifier. Finally, we demonstrate the effectiveness of our acquisition framework on record matching problems.
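The idea of a resource dependency graph searched combinatorially can be illustrated with a small sketch. It is not the framework's actual algorithm or its SVM-based benefit function, neither of which is specified in the abstract. The resource names, costs, and benefit values below are hypothetical, and the exhaustive subset search stands in for whatever search strategy the paper uses.

```python
from itertools import combinations

# Hypothetical resources: name -> (acquisition cost, prerequisite resources).
# The snippet similarity depends on both search results, as in the abstract's example.
RESOURCES = {
    "search_a": (2.0, []),
    "search_b": (2.0, []),
    "snippet_cosine": (0.5, ["search_a", "search_b"]),
    "result_count_a": (0.1, ["search_a"]),
}

# Hypothetical benefit of having each resource available to the classifier.
BENEFIT = {
    "search_a": 0.0,
    "search_b": 0.0,
    "snippet_cosine": 6.0,
    "result_count_a": 1.0,
}


def closure(targets):
    """Return the targets plus every resource they transitively depend on."""
    needed = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(RESOURCES[name][1])
    return needed


def net_value(targets):
    """Benefit minus cost of acquiring the targets and all their prerequisites."""
    acquired = closure(targets)
    cost = sum(RESOURCES[name][0] for name in acquired)
    benefit = sum(BENEFIT[name] for name in acquired)
    return benefit - cost


def best_plan():
    """Exhaustively search target subsets; acquiring nothing has net value 0."""
    names = sorted(RESOURCES)
    best_value, best_set = 0.0, frozenset()
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            value = net_value(combo)
            if value > best_value:
                best_value, best_set = value, frozenset(closure(combo))
    return best_value, best_set


if __name__ == "__main__":
    value, plan = best_plan()
    print(sorted(plan), round(value, 2))
```

Exhaustive enumeration is exponential in the number of resources, so it only suits toy graphs like this one. It does show why dependencies matter: a cheap derived resource is worth acquiring only once the cost of its prerequisites is counted.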