MovieLens

MovieLens is a web-based recommender system and virtual community that recommends movies for its users to watch, based on their film preferences, using collaborative filtering of members' movie ratings and reviews. It contains about 11 million ratings for about 8,500 movies. [1] MovieLens was created in 1997 by GroupLens Research, a research lab in the Department of Computer Science and Engineering at the University of Minnesota, [2] in order to gather research data on personalized recommendations. [3]

History

MovieLens was not the first recommender system created by GroupLens. In May 1996, GroupLens formed a commercial venture called Net Perceptions, which served clients that included E! Online and Amazon.com. E! Online used Net Perceptions' services to create the recommendation system for Moviefinder.com, [3] while Amazon.com used the company's technology to form its early recommendation engine for consumer purchases. [4]

When another movie recommendation site, eachmovie.org, [5] closed in 1997, the researchers who built it publicly released the anonymized rating data they had collected for other researchers to use. The GroupLens Research team, led by Brent Dahlen and Jon Herlocker, used this data set to jumpstart a new movie recommendation site, which they named MovieLens. Since its inception, MovieLens has become a highly visible research platform: its findings have been discussed in detail in a New Yorker article by Malcolm Gladwell [6] and reported in a full episode of ABC Nightline. [7] MovieLens data has also been central to several research studies, including a collaborative study among Carnegie Mellon University, the University of Michigan, the University of Minnesota, and the University of Pittsburgh, "Using Social Psychology to Motivate Contributions to Online Communities". [8]

In the spring of 2015, a search for "movielens" produced 2,750 results in Google Books and 7,580 in Google Scholar. [9]

Recommendations

MovieLens bases its recommendations on input provided by users of the website, such as movie ratings. [2] The site uses a variety of recommendation algorithms, including collaborative filtering algorithms such as item-item, [10] user-user, and regularized SVD. [11] In addition, to address the cold-start problem for new users, MovieLens uses preference elicitation methods. [12] The system asks new users to rate how much they enjoy watching various groups of movies (for example, movies with dark humor, versus romantic comedies). The preferences recorded by this survey allow the system to make initial recommendations, even before the user has rated a large number of movies on the website.
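The item-item approach mentioned above scores each pair of movies by how similarly users have rated them. A minimal sketch of that idea using cosine similarity between item rating vectors (the matrix and its values are illustrative, not MovieLens data):

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = movies; 0 means "not rated".
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 4.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

def item_similarity(R):
    """Cosine similarity between item (column) rating vectors."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    unit = R / norms
    return unit.T @ unit

S = item_similarity(R)
# Movies 0 and 1 are rated similarly by the same users, so their
# similarity score is high; movies 0 and 3 are rated oppositely, so
# their score is low.
```

Production systems refine this basic scheme (for example by mean-centering each user's ratings or regularizing sparse items), but the core pairwise-similarity computation is the same.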

For each user, MovieLens predicts how the user will rate any given movie on the website. [13] Based on these predicted ratings, the system recommends movies that the user is likely to rate highly. The website encourages users to rate as many fully watched films as possible, so that the system has a better sample of each user's tastes and can make more accurate recommendations. [3] However, MovieLens' rating incentive approach is not always effective: researchers found that more than 20% of the movies listed in the system have so few ratings that the recommender algorithms cannot make accurate predictions about whether subscribers will like them. [8] The system's large body of movie ratings is also available for licensing as a "seed dataset" for new recommender systems. [1]
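In item-based schemes of this kind, a predicted rating is typically a similarity-weighted average of the ratings the user has already given. A self-contained sketch with hypothetical movie names and similarity scores (none of these values come from MovieLens):

```python
# One user's known ratings (hypothetical).
user_ratings = {"Movie A": 5.0, "Movie B": 4.0, "Movie C": 1.0}

# Precomputed item-item similarities from each unseen candidate movie
# to the movies the user has rated (hypothetical values).
similarity = {
    "Movie D": {"Movie A": 0.9, "Movie B": 0.7, "Movie C": 0.1},
    "Movie E": {"Movie A": 0.2, "Movie B": 0.1, "Movie C": 0.8},
}

def predict(ratings, sims):
    """Similarity-weighted average of the user's existing ratings."""
    num = sum(sims[movie] * r for movie, r in ratings.items())
    den = sum(abs(sims[movie]) for movie in ratings)
    return num / den if den else 0.0

predictions = {m: predict(user_ratings, s) for m, s in similarity.items()}
recommended = max(predictions, key=predictions.get)
# "Movie D" gets the higher predicted rating because the user liked
# the movies it most resembles; the system would recommend it first.
```

This also illustrates why sparsely rated movies are hard to recommend: with few overlapping raters, the similarity scores feeding the weighted average are unreliable.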

In addition to movie recommendations, MovieLens also provides information on individual films, such as the list of actors and directors of each film. Users may also submit and rate tags (a form of metadata, such as "based on a book", "too long", or "campy"), which may be used to increase the film recommendations system's accuracy. [3]

Reception

By September 1997, the website had reached over 50,000 users. [3] When the Akron Beacon Journal's Paula Schleis tried out the website, she was surprised at how accurately it recommended new films based on her tastes. [13]

Outside of the realm of movie recommendations, data from MovieLens has been used by Solution by Simulation to make Oscar predictions. [14]

Research

In 2004, a collaborative study with researchers from Carnegie Mellon University, the University of Michigan, the University of Minnesota, and the University of Pittsburgh designed and tested incentives, derived from the social psychology principles of social loafing and goal-setting, on MovieLens users. [8] The researchers observed that under-contribution was a problem for the community and set up a study to discern the most effective way to motivate users to rate and review more films. The study ran two field experiments: one involved email messages that reminded users of the uniqueness of their contributions and the benefits that follow from them; the other gave users a range of individual or group goals for contribution.

The first experiment, based on analysis of the MovieLens community's cumulative response, found that users were more likely to contribute when they were reminded of their uniqueness, leading them to think that their contributions were not duplicates of what other users could provide. Contrary to the researchers' hypothesis, users were less likely to contribute when the benefit they receive from rating, or the benefit others receive when they rate, was made salient to them. Lastly, the researchers found no support for a relationship between uniqueness and benefit.

The second experiment found that users were also more likely to contribute when they were given specific, challenging goals and were led to believe that their contributions were needed to accomplish the group's goal. In this context, giving users group-level goals actually increased contributions compared with individual goals, whereas the researchers had predicted the reverse due to the effects of social loafing. The relationship between goal difficulty and user contributions in both the group and individual conditions gave weak evidence that beyond a certain difficulty threshold, performance drops rather than plateauing, as previously hypothesized in Locke and Latham's goal-setting theory.

Datasets

GroupLens Research, a human-computer interaction research lab at the University of Minnesota, provides the rating data sets collected from the MovieLens website for research use. The full data set contains 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users, and includes tag genome data with 12 million relevance scores across 1,100 tags (last updated 8/2017). [15] Many types of research have been conducted using the MovieLens data sets; for example, Liu et al. used them to test the efficiency of an improved random walk algorithm that suppresses the influence of large-degree objects. [16] GroupLens publishes terms of use for the data sets and accepts download requests via the internet.
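The published data sets are distributed as plain text files of per-user, per-movie ratings. A minimal parsing sketch using a small inline sample that mimics the CSV layout of recent MovieLens releases (`userId,movieId,rating,timestamp` is an assumption here; the authoritative format is described in the README bundled with each data set, and the rows below are made up):

```python
import csv
import io

# Inline stand-in for a MovieLens-style ratings file.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
    "2,31,4.0,835355493\n"
)

# Parse each row into (user, movie, rating) tuples.
ratings = [
    (int(row["userId"]), int(row["movieId"]), float(row["rating"]))
    for row in csv.DictReader(sample)
]

# Average rating per movie -- a typical first aggregation step.
by_movie = {}
for _, movie, rating in ratings:
    by_movie.setdefault(movie, []).append(rating)
averages = {m: sum(v) / len(v) for m, v in by_movie.items()}
```

For the full 26-million-rating dump, the same logic would stream the file row by row rather than holding everything in memory.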


References

  1. "MovieLens Database available from Technology Commercialization".
  2. Schofield, Jack (2003-05-22). "Land of Gnod". The Guardian. London.
  3. Ojeda-Zapata, Julio (1997-09-15). "New Site Personalizes Movie Reviews". St. Paul Pioneer Press. p. 3E.
  4. Booth, Michael (2005-01-30). "How do computers know so much about us?". The Denver Post. p. F01.
  5. Lim, Myungeun; Kim, Juntae (2001). "Web Intelligence: Research and Development". Proceedings of the First Asia-Pacific Conference on Web Intelligence. Lecture Notes in Computer Science. Vol. 2198. Springer Berlin/Heidelberg. pp. 438–442. doi:10.1007/3-540-45490-X_56. ISBN 978-3-540-42730-8.
  6. Gladwell, Malcolm (October 4, 1999). "Annals of Marketing: The Science of the Sleeper: How the Information Age Could Blow Away the Blockbuster". The New Yorker. 75 (29): 48–55. Archived from the original on December 30, 2009. Retrieved 2009-12-29.
  7. Krulwich, Robert (December 10, 1999). "ABC Nightline: Soulmate". ABC.
  8. Beenen, Gerard; Ling, Kimberly; Wang, Xiaoqing; Chang, Klarissa; Frankowski, Dan; Resnick, Paul; Kraut, Robert E. (2004). "Using Social Psychology to Motivate Contributions to Online Communities". CommunityLab: 93–116. CiteSeerX 10.1.1.320.5540.
  9. Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens Datasets: History and Context". ACM Transactions on Interactive Intelligent Systems. http://files.grouplens.org/papers/harper-tiis2015.pdf
  10. Sarwar, Badrul; et al. (2001). "Item-based collaborative filtering recommendation algorithms". Proceedings of the 10th International Conference on World Wide Web. ACM.
  11. Ekstrand, Michael D. (2014). Towards Recommender Engineering: Tools and Experiments for Identifying Recommender Differences. PhD dissertation, University of Minnesota.
  12. Chang, Shuo; Harper, F. Maxwell; Terveen, Loren (2015). "Using Groups of Items to Bootstrap New Users in Recommender Systems". Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM.
  13. Schleis, Paula (2000-11-13). "Site Lets Everybody be a Critic". Akron Beacon Journal. p. D2.
  14. Hickey, Walt (2016-02-18). "Do Your Oscar Predictions Stack Up? Here's What The Data Says". FiveThirtyEight. http://fivethirtyeight.com/features/oscar-data-model-predictions-2015/
  15. "GroupLens".
  16. Liu, Chuang; Liu, Zhen; Zhang, Zi-Ke; Zhou, Jun-Lin; Fu, Yan; Nie, Da-Cheng (2014). "A personalized recommendation algorithm via biased random walk". 11th International Joint Conference on Computer Science and Software Engineering (JCSSE).