A Behavioral Investigation of Dimensionality Reduction

2012 
Joshua M. Lewis (josh@cogsci.ucsd.edu), Department of Cognitive Science, University of California, San Diego
Laurens van der Maaten (lvdmaaten@gmail.com), Pattern Recognition & Bio-informatics Lab, Delft University of Technology
Virginia R. de Sa (desa@cogsci.ucsd.edu), Department of Cognitive Science, University of California, San Diego

Abstract

A cornucopia of dimensionality reduction techniques has emerged over the past decade, leaving data analysts with a wide variety of choices for reducing their data. Means of evaluating and comparing low-dimensional embeddings useful for visualization, however, are very limited. When proposing a new technique, it is common to simply show rival embeddings side by side and let human judgment determine which embedding is superior. This study investigates whether such human embedding evaluations are reliable, i.e., whether humans tend to agree on the quality of an embedding. We also investigate what types of embedding structures humans appreciate a priori. Our results reveal that, although experts are reasonably consistent in their evaluation of embeddings, novices generally disagree on the quality of an embedding. We discuss the impact of this result on the way dimensionality reduction researchers should present their results, and on the applicability of dimensionality reduction outside of machine learning.

Keywords: dimensionality reduction; unsupervised machine learning; psychophysics

Introduction

There is an evaluative vacuum in the dimensionality reduction literature. In many other unsupervised machine learning fields, such as density modeling, evaluation may be performed by measuring likelihoods of held-out test data. Alternatively, in domains such as topic modeling, human computation (Ahn, Maurer, McMillen, Abraham, & Blum, 2008) resources such as Amazon's Mechanical Turk may be employed to exploit the fact that humans are phenomenal at evaluating semantic structure (Chang, Boyd-Graber, Gerrish, Wang, & Blei, 2009). Human evaluations have also been used to assess image segmentation techniques (Martin, Fowlkes, Tal, & Malik, 2001). The field of dimensionality reduction, however, lacks a standard evaluation measure (Venna, Peltonen, Nybo, Aidos, & Kaski, 2010), and is not as obvious a target for human intuition. Two- or three-dimensional embeddings can be visualized as scatter plots, but on what intuitive basis can we judge a reduction from 200 dimensions to 2 to be good? In addition, Gestalt effects or simple rotations may bias human evaluations of scatter plots. Nevertheless, with no broadly agreed upon embedding quality measure (though a few have been proposed; see below), human judgment is often explicitly and implicitly solicited in the literature. The most common form of this solicitation consists of placing a scatter plot of the preferred embedding next to those of rival embeddings and inviting the reader to conclude that the preferred embedding is superior (e.g., Maaten & Hinton, 2008); a minimal sketch of this practice appears at the end of this section. If one is interested in applying a dimensionality reduction algorithm to visualize a dataset, is this a valid way to select from the wide range of techniques?¹

To answer this question, we need to evaluate whether humans are good at evaluating embeddings. As there is no external authority we can appeal to, this is a daunting task. However, it is relatively easy to find out whether human data analysts are at least consistent in their evaluations, which is the first aim of this study. Consistency, across individuals and across a wide range of inputs, is a reasonable prerequisite for evaluation.

Beyond investigating whether human data analysts are consistent when they evaluate embeddings, the second aim of this study is to investigate what humans are doing when they evaluate embeddings. Such information could be useful for determining whether humans are appropriate for an evaluation task with a known structure (e.g., if they naturally prefer embedding characteristics appropriate to the structure), or for developing techniques that are tailored towards producing results that humans will find helpful (e.g., algorithms that selectively emphasize informative data structure). We can to some extent infer human strategies from the algorithms humans prefer, but we can also investigate those strategies by correlating embedding characteristics with human evaluations.

Motivated by the two aims described above, we solicit embedding quality judgments from both novice and expert subjects in an effort to determine whether they are consistent in their ratings, and which embedding characteristics they find appealing. For the novice subjects, we manipulate dataset knowledge: half read a description and see samples from each dataset, and half do not. We hypothesize that providing dataset information will increase consistency, as it should if the evaluative process is principled. The study consists of two experiments. The first presents subjects with a selection of embeddings derived from nine distinct dimensionality reduction algorithms; the second uses embeddings from a single algorithm with several different parameter settings for a more controlled comparison between "clustered" and "gradual" embeddings.
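The side-by-side comparison practice described above can be made concrete with a short sketch. The following is a minimal, illustrative example (not taken from this study), assuming scikit-learn and matplotlib are available; the digits dataset and the choice of PCA and t-SNE as rival techniques are assumptions made purely for illustration, not the datasets or algorithm set used in our experiments.

# Minimal sketch: produce two rival 2-D embeddings of the same data and
# display them side by side for visual judgment. Assumes scikit-learn and
# matplotlib; the digits dataset, PCA, and t-SNE are illustrative choices.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# A 64-dimensional dataset of handwritten digit images.
X, y = load_digits(return_X_y=True)

# Two rival two-dimensional embeddings of the same data.
emb_pca = PCA(n_components=2).fit_transform(X)
emb_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

# Side-by-side scatter plots: the reader is implicitly asked to decide
# which embedding is "better" -- exactly the judgment studied here.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (emb_pca, emb_tsne), ("PCA", "t-SNE")):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, s=8, cmap="tab10")
    ax.set_title(title)
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()

Swapping in other datasets or techniques only requires changing the data-loading line and the two fit_transform calls; the side-by-side presentation, and the appeal to the viewer's judgment, stays the same.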
Dimensionality reduction techniques

Dimensionality reduction techniques can be subdivided into several categories: linear or non-linear, convex or non-convex, parametric or non-parametric, etc. (Lee & Verleysen, 2007). Whilst many new techniques have been proposed over the last decade, data analysts still often resort to linear, convex, parametric techniques such as PCA to visualize their data.

¹ Moreover, one should note that dimensionality reduction comprises only a small part of the "visualization zoo" (Heer, Bostock, & Ogievetsky, 2010).