Week 1: Getting Acquainted

September 11, 2017

The project is off to a good start. I met the rest of the students with whom I will be working this year. While they are developing the application itself, my task will be researching and preparing novel ranking tools.

Introduction

Rankings are a simple way to compare and evaluate things. They are everywhere: we see them in news articles, surveys, and media and technology reviews. Their power comes from their intuitiveness. For example, the New York Times Best Seller list ranks books by sales, yet we readily assume that a book at the top of the list is more worthwhile than one that didn't make the list at all. Unfortunately, the simplicity of rankings can also be misleading. A national ranking of US colleges might seem informative, but a student choosing a college would be badly misinformed unless they understood exactly which factors contributed to the ranking, and in what proportion.

The goal of the project is to help users understand rankings. A user will log in to an online application where they can manipulate an existing ranking or build one of their own.

Two Novel Ranking Tools

Learn an Existing Rank

Overview

Users input an existing ranking (or partial information about a ranking) and a dataset. The application learns which attributes of the dataset best explain the ranking.

Use Case

Imagine you are the president of a university that is currently ranked #34 in a national ranking of colleges. You want to improve your school's position. Specifically, you want it to be among the top 20, since those schools are the ones most often cited in reports and consequently receive the most student applications. You input the published ranking into our application and choose a dataset that contains characteristics of each ranked school (e.g. student/faculty ratio, cost of attendance, public or private, retention rate, geographic location). You run the program and find that the top 20 schools have low student/faculty ratios. You decide to investigate this further and end up hiring more faculty.
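The first step in this use case, checking whether the top-ranked schools differ from the rest on a single attribute, can be illustrated with a minimal Python sketch. The school names and numbers below are invented, a cutoff of 3 stands in for the top 20, and `compare_top_k` is a hypothetical helper rather than part of our application.

```python
from statistics import mean

# Hypothetical data: (school, published rank, student/faculty ratio).
# Names and numbers are made up for illustration only.
schools = [
    ("School A", 1, 6.0),
    ("School B", 2, 7.5),
    ("School C", 3, 8.0),
    ("School D", 4, 14.0),
    ("School E", 5, 16.5),
    ("School F", 6, 18.0),
]


def compare_top_k(rows, k):
    """Return the mean student/faculty ratio of the top-k schools and of the rest."""
    ranked = sorted(rows, key=lambda row: row[1])  # order by published rank
    top = [ratio for _, _, ratio in ranked[:k]]
    rest = [ratio for _, _, ratio in ranked[k:]]
    return mean(top), mean(rest)


top_mean, rest_mean = compare_top_k(schools, k=3)
print(f"top 3: {top_mean:.2f}, rest: {rest_mean:.2f}")
```

A gap like this only suggests an association; it says nothing about whether hiring faculty would actually move a school up the list.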

Key Features

This tool requires a way for users to input data: both the independent attributes (the characteristics) and the dependent outcome (the ranking).
There needs to be a way to process the data so that attributes are analyzed appropriately. For example, public vs private should be treated as a categorical variable, but student/faculty ratio is continuous.
The tool should tell the user how well the given attributes explain the ranking. For example, the tool might report that a model built on cost of attendance is 90% accurate, while a competing model built on student/faculty ratio and geographic location is 93% accurate.
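One possible way to approach these three features is sketched below under several assumptions: categorical attributes are one-hot encoded, continuous ones are min-max scaled, weights are learned with a simple ranking perceptron over pairs of schools, and "accuracy" is taken to mean the fraction of school pairs that the model orders the same way as the published ranking. The data and the helper names (`encode`, `learn_weights`, `pairwise_accuracy`) are hypothetical; this is an illustration, not the method the application will use.

```python
from itertools import combinations

# Hypothetical data: published rank (1 = best), one categorical attribute
# ("control": public or private) and one continuous attribute (student/faculty ratio).
schools = [
    {"name": "A", "rank": 1, "control": "private", "sf_ratio": 6.0},
    {"name": "B", "rank": 2, "control": "private", "sf_ratio": 9.0},
    {"name": "C", "rank": 3, "control": "public", "sf_ratio": 12.0},
    {"name": "D", "rank": 4, "control": "public", "sf_ratio": 15.0},
    {"name": "E", "rank": 5, "control": "private", "sf_ratio": 18.0},
]


def encode(rows, categorical, continuous):
    """One-hot encode categorical attributes and min-max scale continuous ones to [0, 1]."""
    vectors = [[] for _ in rows]
    for attr in categorical:
        levels = sorted({row[attr] for row in rows})
        for vec, row in zip(vectors, rows):
            vec.extend(1.0 if row[attr] == level else 0.0 for level in levels)
    for attr in continuous:
        lo = min(row[attr] for row in rows)
        hi = max(row[attr] for row in rows)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
        for vec, row in zip(vectors, rows):
            vec.append((row[attr] - lo) / span)
    return vectors


def better_worse(i, j, ranks):
    """Return (better-ranked index, worse-ranked index); assumes no tied ranks."""
    return (i, j) if ranks[i] < ranks[j] else (j, i)


def learn_weights(vectors, ranks, epochs=100, lr=0.1):
    """Ranking perceptron: whenever a pair is misordered or tied, nudge the
    weights toward the better-ranked item's features."""
    w = [0.0] * len(vectors[0])
    for _ in range(epochs):
        for i, j in combinations(range(len(vectors)), 2):
            top, bottom = better_worse(i, j, ranks)
            diff = [x - y for x, y in zip(vectors[top], vectors[bottom])]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def pairwise_accuracy(vectors, ranks, w):
    """Fraction of pairs in which the better-ranked item gets a strictly higher score."""
    scores = [sum(wk * xk for wk, xk in zip(w, vec)) for vec in vectors]
    pairs = list(combinations(range(len(vectors)), 2))
    correct = 0
    for i, j in pairs:
        top, bottom = better_worse(i, j, ranks)
        correct += scores[top] > scores[bottom]
    return correct / len(pairs)


ranks = [row["rank"] for row in schools]
candidate_models = {
    "control only": (["control"], []),
    "sf_ratio only": ([], ["sf_ratio"]),
    "control + sf_ratio": (["control"], ["sf_ratio"]),
}
results = {}
for label, (cat, cont) in candidate_models.items():
    X = encode(schools, cat, cont)
    results[label] = pairwise_accuracy(X, ranks, learn_weights(X, ranks))
    print(f"{label}: {results[label]:.0%} of pairs ordered correctly")
```

Scoring by pairs rather than by predicted rank position is a deliberate choice here: a ranking only cares about relative order, so a model can be useful even if it never predicts the exact rank number. In this toy data, public vs private alone cannot separate schools that share a value, so it scores poorly, while the student/faculty ratio orders every pair.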

Build a Rank

Overview

Users have a vague idea of how items should be ranked. This tool uses that partial input to create a complete ranking.

Use Case

Imagine you are a student trying to choose a college to attend. You know you want to go to school in the Northeastern United States, but you have a hard time comparing all the colleges there. You want a school with a low student/faculty ratio and a reasonable cost of attendance. You know WPI is a better school than MIT. You also think MIT is better than Boston University. You input this information into our application and use an existing dataset of college statistics. The tool asks whether public vs private is important to you. You say that private is better, but that this characteristic matters less to you than cost of attendance. The tool returns a ranking.
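The pairwise statements in this use case (WPI is better than MIT, MIT is better than Boston University) already form a partial order. The minimal sketch below assumes preferences arrive as (better, worse) pairs and turns them into one consistent ordering with Kahn's topological sort, rejecting contradictory input. `order_from_preferences` is a hypothetical name, "College X" is an invented school with no stated preferences, and the alphabetical tie-break is arbitrary, which is exactly the gap the dataset and the attribute preferences would need to fill.

```python
from collections import defaultdict


def order_from_preferences(items, preferences):
    """Return one ordering of items consistent with every (better, worse) pair.

    Uses Kahn's topological sort. Items with no constraint between them are
    ordered alphabetically, an arbitrary choice. Assumes every item named in
    a preference also appears in items. Raises ValueError if the preferences
    contradict each other (a cycle such as A > B > A).
    """
    beats = defaultdict(set)
    indegree = {item: 0 for item in items}
    for better, worse in preferences:
        if worse not in beats[better]:  # ignore duplicate statements
            beats[better].add(worse)
            indegree[worse] += 1
    ready = sorted(item for item, d in indegree.items() if d == 0)
    order = []
    while ready:
        item = ready.pop(0)
        order.append(item)
        for worse in sorted(beats[item]):
            indegree[worse] -= 1
            if indegree[worse] == 0:
                ready.append(worse)
        ready.sort()
    if len(order) != len(items):
        raise ValueError("preferences contain a cycle")
    return order


prefs = [("WPI", "MIT"), ("MIT", "Boston University")]
print(order_from_preferences(["Boston University", "MIT", "WPI", "College X"], prefs))
```

Here the sketch infers WPI > Boston University by transitivity, while College X is placed only by the tie-break, a sign that the tool would need more input before it could rank it meaningfully.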

Key Features

This tool allows users to input partial information related to the dataset. Additionally, the user can order the attributes in terms of importance (for example, cost of attendance matters more than public vs private).
The tool prompts for more input as necessary to generate a ranking. The visualization can show grey areas, such as where two colleges are tied or the difference between their positions is negligible.
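Grey areas could be detected by grouping colleges whose aggregate scores are close together. The sketch below assumes each college already has a numeric score and treats a hypothetical `tolerance` as the threshold for "negligible"; the scores and "College X" are invented for illustration.

```python
def tie_groups(scores, tolerance):
    """Group items into tiers: walking down the sorted scores, an item joins
    the current tier when it is within `tolerance` of the previous item."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    groups = []
    previous = None
    for name, score in ranked:
        if previous is not None and previous - score <= tolerance:
            groups[-1].append(name)
        else:
            groups.append([name])
        previous = score
    return groups


scores = {"WPI": 0.82, "MIT": 0.80, "Boston University": 0.55, "College X": 0.54}
tiers = tie_groups(scores, tolerance=0.05)
print(tiers)
```

Because each item is compared only with its neighbor, a chain of small gaps can pull items whose scores differ by more than `tolerance` into one tier; a stricter rule could compare against the first item of each tier instead.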

Existing Application

This week, I reviewed LineUp, an existing application that shows how attributes can be weighted and used to create an aggregate rank score.

LineUp visualization (from [1]).

This visualization does a great job of showing how a ranking can be built from multiple attributes. The user can widen a column to increase the weight of a particular attribute. However, how far a column is widened depends on the whims of the user and may be difficult to quantify. I want to investigate whether simply ordering the attributes (x is more important than y) can produce a consistent ranking.
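As one possible starting point for that investigation, here is a minimal sketch that converts an importance order into numeric weights using rank-order centroid (ROC) weights, a technique from multi-attribute decision analysis, and then computes a LineUp-style weighted sum of normalized attributes. ROC is a swapped-in choice, not something LineUp does (LineUp takes weights from the column widths). For n attributes, the i-th most important gets w_i = (1/n) * (1/i + 1/(i+1) + ... + 1/n). The school data and helper names are hypothetical.

```python
def roc_weights(n):
    """Rank-order centroid weights for n attributes listed from most to least important."""
    return [sum(1 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


def normalize(values, higher_is_better=True):
    """Min-max scale to [0, 1], flipping so that 1 is always the preferred end."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    scaled = [(v - lo) / span for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def aggregate_scores(rows, ordered_attrs, higher_is_better):
    """Weighted sum of normalized attributes, weighted only by their importance order."""
    weights = roc_weights(len(ordered_attrs))
    columns = [normalize([row[a] for row in rows], higher_is_better[a]) for a in ordered_attrs]
    return {
        row["name"]: sum(w * col[idx] for w, col in zip(weights, columns))
        for idx, row in enumerate(rows)
    }


# Hypothetical schools; "private" is 1 for private, 0 for public.
rows = [
    {"name": "School P", "cost": 50000, "sf_ratio": 10, "private": 1},
    {"name": "School Q", "cost": 30000, "sf_ratio": 15, "private": 0},
    {"name": "School R", "cost": 40000, "sf_ratio": 12, "private": 1},
]
importance = ["cost", "sf_ratio", "private"]  # cost matters most, public vs private least
higher_is_better = {"cost": False, "sf_ratio": False, "private": True}

scores = aggregate_scores(rows, importance, higher_is_better)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The user only states an order, and the weights follow mechanically from it, so the same order always yields the same ranking. Whether ROC weights match what users actually mean by "x is more important than y" is an open question.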

Summary

In general, I would like to better understand how rankings are created. How can I reverse engineer an existing ranking? Standard data science methods exist to predict continuous and categorical variables; how does ranking differ from linear or logistic regression? Assuming I am able to create a model that explains a ranking, how do I report that model back to a user in an informative way?

Stay tuned for next week, when I will tackle these and other questions.

[1] Gratzl, S., Lex, A., Gehlenborg, N., Pfister, H. and Streit, M. LineUp: Visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics, 19, 12 (2013), 2277-2286.


MaryAnn's Blog