DSpace Angular :: Browsing by Author "KAN, Min-Yen"

Now showing 1 - 8 of 8

Cost-sensitive Attribute Value Acquisition for Support Vector Machines
(2010-03-30T06:40:59Z) TAN, Yee Fan; KAN, Min-Yen
We consider cost-sensitive attribute value acquisition in classification problems, where missing attribute values in test instances can be acquired at some cost. We examine this problem in the context of the support vector machine, employing a generic, iterative framework that aims to minimize both acquisition and misclassification costs. Under this framework, we propose an attribute value acquisition algorithm that is driven by the expected cost savings of acquisitions, and for this we propose a method for estimating the misclassification costs of a test instance before and after acquiring one or more missing attribute values. In contrast to previous solutions, we show that our proposed solutions generalize to support vector machines that use arbitrary kernels. We conclude with a set of experiments that show the effectiveness of our proposed algorithm.
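The acquisition rule described in this abstract, choosing acquisitions by their expected cost savings, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `choose_acquisition`, the input format, and the idea of supplying pre-computed cost estimates are all hypothetical, and the paper's own estimation method for support vector machines is not reproduced here.

```python
def choose_acquisition(current_cost, candidates):
    """Pick the attribute whose acquisition gives the largest expected saving.

    current_cost: estimated misclassification cost before any acquisition.
    candidates:   dict mapping attribute name to a tuple
                  (acquisition_cost, estimated_misclassification_cost_after).
    Both the input format and the estimates are assumptions of this sketch.
    Returns (attribute, saving), or (None, 0.0) if no acquisition pays off.
    """
    best_attr, best_saving = None, 0.0
    for attr, (acq_cost, cost_after) in candidates.items():
        # Expected saving: reduction in misclassification cost minus the price paid.
        saving = current_cost - cost_after - acq_cost
        if saving > best_saving:
            best_attr, best_saving = attr, saving
    return best_attr, best_saving
```

In an iterative framework, a rule like this would be applied repeatedly until no candidate yields a positive saving.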
Extending corpus-based identification of light verb constructions using a supervised learning framework
(2005-08-19T02:32:51Z) TAN, Yee Fan; KAN, Min-Yen; CUI, Hang
Light verb constructions (LVC) such as "make a call" and "give a presentation" pose challenges for natural language processing and understanding. We propose corpus-based methods to automatically identify LVCs. We extend existing corpus-based measures for identifying LVCs among verb-object pairs, using new features that use mutual information and assess the influence of other words in the context of a candidate verb-object pair, such as nouns and prepositions. To our knowledge, our work is the first to incorporate both existing and new LVC features into a unified machine learning approach. We experimentally demonstrate the superior performance of our framework and the effectiveness of the newly proposed features.
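As a rough illustration of a mutual-information feature over verb-object pairs, the sketch below computes pointwise mutual information (PMI) from raw pair counts. PMI is used here as an assumed stand-in; the abstract does not specify the exact measure, and the function name `pmi_scores` and the input format are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(pairs):
    """Return PMI (in bits) for each distinct (verb, noun) pair.

    pairs: list of (verb, noun) tuples, one per corpus occurrence.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)
    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_pair = count / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        # PMI compares the observed joint probability with the
        # probability expected if verb and noun were independent.
        scores[(verb, noun)] = math.log2(p_pair / (p_verb * p_noun))
    return scores
```

A high score suggests that the verb and noun co-occur more often than chance, which is one signal a classifier could combine with other features.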
Fast Webpage Classification Using URL Features
(2005-08-25T03:36:30Z) KAN, Min-Yen; NGUYEN THI, Hoanh Oanh
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classification, as the pages themselves do not have to be fetched and analyzed. Our approach segments the URL into meaningful chunks and adds component, sequential and orthographic features to model salient patterns. The resulting binary features are used in supervised maximum entropy modeling. We analyze our approach's effectiveness in binary, multi-class and hierarchical classification. Our results show that, in certain scenarios, URL-based methods approach and sometimes exceed the performance of full-text and link-based methods. We also use these features to predict the prestige of a webpage (as modeled by PageRank), and show that it can be predicted with an average error of less than one point (on a ten-point scale) in a topical set of web pages.
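The URL segmentation and feature extraction described above might look something like the following minimal sketch. The splitting rule (break on non-alphanumeric characters), the feature naming scheme, and the single orthographic test are all assumptions made for illustration, not the paper's actual feature set; the example URL is hypothetical.

```python
import re
from urllib.parse import urlparse

def segment_url(url):
    """Split a URL into lowercase tokens, grouped by URL component."""
    parsed = urlparse(url)
    parts = {"host": parsed.netloc, "path": parsed.path, "query": parsed.query}
    return {
        name: [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]
        for name, text in parts.items()
    }

def url_features(url):
    """Return a set of binary feature names derived from the URL alone."""
    segments = segment_url(url)
    features = set()
    all_tokens = []
    for component, tokens in segments.items():
        for tok in tokens:
            features.add(f"tok={tok}")               # plain token feature
            features.add(f"{component}:tok={tok}")   # component feature
            if any(ch.isdigit() for ch in tok):
                features.add("orth=has_digit")       # orthographic feature
            all_tokens.append(tok)
    for left, right in zip(all_tokens, all_tokens[1:]):
        features.add(f"bigram={left}_{right}")       # sequential feature
    return features

feats = url_features("http://www.example.edu/courses/cs101/index.html")
```

Each feature name present in the set corresponds to a binary feature with value 1, which is the form a maximum entropy classifier would typically consume.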
A Framework for Hierarchical Cost-sensitive Web Resource Acquisition
(2010-03-30T06:33:05Z) TAN, Yee Fan; KAN, Min-Yen
Many record matching problems involve information that is insufficient or incomplete, and thus solutions that classify which pairs of records are matches often involve acquiring additional information at some cost. For example, web resources impose extra query or download time. As the amount of resources that can be acquired is large, solutions invariably acquire only a subset of the resources to achieve a balance between acquisition cost and benefit. At the same time, resources often have hierarchical dependencies between themselves, e.g., the search engine results for two queries must be obtained before the TF-IDF cosine similarity between their snippets can be computed. We propose a framework for performing cost-sensitive acquisition of resources with hierarchical dependencies, and apply it to the web resource context. Our framework is versatile, applicable to a large variety of problems. We show that many problems involving selective resource acquisitions can be formulated using resource dependency graphs. We then solve the resource acquisition problem by casting it as a combinatorial search problem. As the support vector machine is commonly used to effectively solve record matching problems, we also propose a benefit function that works with this classifier. Finally, we demonstrate the effectiveness of our acquisition framework on record matching problems.
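The notion of a resource dependency graph can be illustrated with a small sketch that computes the cost of acquiring one resource together with any prerequisites not yet acquired. The resource names, the cost values, and the function `acquisition_cost` are hypothetical; the paper's search procedure and benefit function are not shown.

```python
def acquisition_cost(target, costs, deps, acquired=frozenset()):
    """Return (total_cost, needed) for acquiring `target`.

    costs:    dict mapping resource name to its acquisition cost.
    deps:     dict mapping resource name to the resources it depends on.
    acquired: resources already obtained, which cost nothing more.
    Assumes the dependency graph has no cycles.
    """
    needed = set()
    stack = [target]
    while stack:
        resource = stack.pop()
        if resource in acquired or resource in needed:
            continue  # already paid for, or already counted
        needed.add(resource)
        stack.extend(deps.get(resource, ()))
    return sum(costs[r] for r in needed), needed

# Hypothetical example mirroring the abstract: the TF-IDF cosine similarity
# requires the search results of both queries.
costs = {"search:q1": 5, "search:q2": 5, "tfidf_cosine": 1}
deps = {"tfidf_cosine": ["search:q1", "search:q2"]}
total, needed = acquisition_cost("tfidf_cosine", costs, deps)
```

Passing an `acquired` set models the incremental case, where resources fetched for an earlier decision reduce the cost of later ones.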
How Do People Organize Their Photos in Each Event and How Does It Affect Storytelling, Searching and Interpretation Tasks?
(2012-04-04) GOZALI, Jesse Prabawa; KAN, Min-Yen; SUNDARAM, Hari
This paper explores photo organization within an event photo stream, i.e. the chronological sequence of photos from a single event. The problem is important: with the advent of inexpensive, easy-to-use photo capture devices, people can take a large number of photos per event. A family trip, for example, may include hundreds of photos. In this work, we have developed a photo browser that uses automatically segmented groups of photos, referred to as chapters, to organize such photos. The photo browser also offers users a drag-and-drop interface to refine the chapter groupings. We conducted an exploratory study of 23 college students with their 8096 personal photos from 92 events, to understand the role of different spatial organization strategies in our chapter-based photo browser, in performing storytelling, photo search and photo set interpretation tasks. We also report novel insights on how the subjects organized their photos into chapters. We tested three layout strategies: bi-level, grid-stacking and space-filling, against a baseline plain grid layout. We found that subjects value the chronological order of the chapters more than maximizing screen space usage and that they value chapter consistency more than the chronological order of the photos. For automatic chapter groupings, having low chapter boundary misses is more important than having low chapter boundary false alarms; the choice of chapter criteria and granularity for chapter groupings are very subjective; and subjects found that chapter-based photo organization helps in all three tasks of the user study. Users preferred the chapter-based layout strategies to the baseline at a statistically significant level, with the grid-stacking strategy preferred the most.
Perspectives on Crowdsourcing Annotations for Natural Language Processing
(2010-07-27T01:46:23Z) WANG, Aobo; HOANG, Cong Duy Vu; KAN, Min-Yen
Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their method of motivating subjects to contribute and the scale of their applications. To date, however, there has yet to be a study that helps a practitioner to decide what form an annotation application should take to best reach its objectives within the constraints of a project. We first provide a faceted analysis of existing crowdsourcing annotation applications. We then use our analysis to discuss our recommendations on how practitioners can take advantage of crowdsourcing and discuss our view on potential opportunities in this area.
Product Review Summarization based on Facet Identification and Sentence Clustering
(2011-10-07T01:28:56Z) LY, Duy Khang; SUGIYAMA, Kazunari; LIN, Ziheng; KAN, Min-Yen
Product reviews have become an important source of information, not only for customers to find opinions about products easily and share their reviews with peers, but also for product manufacturers to get feedback on their products. As the number of product reviews grows, it becomes difficult for users to search and utilize these resources in an efficient way. In this work, we build a product review summarization system that can automatically process a large collection of reviews and aggregate them to generate a concise summary. More importantly, the drawback of existing product summarization systems is that they cannot provide the underlying reasons to justify users’ opinions. In our method, we solve this problem by applying clustering prior to selecting representative candidates for summarization.
Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces
(2007-08-14) GOZALI, Jesse Prabawa; KAN, Min-Yen
We redesign the user interface of an online library catalog, leveraging current web technologies that allow dynamic and fine-grained user interaction. Over the course of our iterative design and test cycle, we identified four key areas where such dynamic web technologies can be used to improve the support for typical information seeking strategies, namely: 1) the use of overview + details, 2) a tabular data display, 3) using tabs as a history mechanism and 4) embedding a suggestion bar. We believe that the revised affordances created by our changes in these four areas will inform the design of future search interfaces.
