Usability testing

Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. It can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the system. [1] It is chiefly concerned with the intuitiveness of a product's design and is carried out with users who have no prior exposure to the product. Such testing is paramount to the success of an end product, since a fully functioning application that confuses its users will not last for long. [2] This is in contrast with usability inspection methods, in which experts use different techniques to evaluate a user interface without involving users.

Usability testing focuses on measuring a human-made product's capacity to meet its intended purposes. Examples of products that commonly benefit from usability testing are food, consumer products, websites or web applications, computer interfaces, documents, and devices. Usability testing measures the usability, or ease of use, of a specific object or set of objects, whereas general human–computer interaction studies attempt to formulate universal principles.

What it is not

Simply gathering opinions on an object or a document is market research or qualitative research rather than usability testing. Usability testing usually involves systematic observation under controlled conditions to determine how well people can use the product. [3] However, often both qualitative research and usability testing are used in combination, to better understand users' motivations/perceptions, in addition to their actions.

Rather than showing users a rough draft and asking, "Do you understand this?", usability testing involves watching people trying to use something for its intended purpose. For example, when testing instructions for assembling a toy, the test subjects should be given the instructions and a box of parts and, rather than being asked to comment on the parts and materials, they should be asked to put the toy together. Instruction phrasing, illustration quality, and the toy's design all affect the assembly process.

Methods

Setting up a usability test involves carefully creating a scenario, or realistic situation, in which the person performs a list of tasks using the product being tested while observers watch and take notes (dynamic verification). Several other test instruments, such as scripted instructions, paper prototypes, and pre- and post-test questionnaires, are also used to gather feedback on the product being tested (static verification). For example, to test the attachment function of an e-mail program, a scenario would describe a situation in which a person needs to send an e-mail attachment, and the participant would be asked to undertake this task. The aim is to observe how people function in a realistic manner so that developers can identify the problem areas and fix them. Techniques popularly used to gather data during a usability test include the think aloud protocol, co-discovery learning, and eye tracking.
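
To make the procedure concrete, the following minimal Python sketch (all names and values are illustrative, not taken from any particular testing tool) shows one way a task scenario and the observations collected against it might be represented:

    # Minimal sketch of a task scenario and the observations recorded against it.
    # All names and values are illustrative; they do not come from any specific tool.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Task:
        scenario: str          # realistic situation read to the participant
        success_criteria: str  # what counts as completing the task
        time_limit_s: int      # cut-off before the facilitator steps in

    @dataclass
    class Observation:
        task: Task
        completed: bool
        time_on_task_s: float
        notes: List[str] = field(default_factory=list)  # think-aloud remarks, errors seen

    # Example: testing the attachment function of an e-mail program
    attach_task = Task(
        scenario="Send last month's report to a colleague as an e-mail attachment.",
        success_criteria="Message sent with the correct file attached.",
        time_limit_s=300,
    )
    session_log = [
        Observation(attach_task, completed=True, time_on_task_s=142.0,
                    notes=["Hesitated over the paperclip icon"]),
    ]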

Hallway testing

Hallway testing, also known as guerrilla usability testing, is a quick and cheap method of usability testing in which people, such as those passing by in the hallway, are asked to try using the product or service. This can help designers identify "brick walls", problems so serious that users simply cannot advance, in the early stages of a new design. Anyone except the project's designers and engineers can be used (they tend to act as "expert reviewers" because they are too close to the project).

This type of testing is an example of convenience sampling and thus the results are potentially biased.

Remote usability testing

In a scenario where usability evaluators, developers and prospective users are located in different countries and time zones, conducting a traditional lab usability evaluation creates challenges both from the cost and logistical perspectives. These concerns led to research on remote usability evaluation, with the user and the evaluators separated over space and time. Remote testing, which facilitates evaluations being done in the context of the user's other tasks and technology, can be either synchronous or asynchronous. The former involves real time one-on-one communication between the evaluator and the user, while the latter involves the evaluator and user working separately. [4] Numerous tools are available to address the needs of both these approaches.

Synchronous usability testing methodologies involve video conferencing or employ remote application sharing tools such as WebEx. WebEx and GoToMeeting are the most commonly used technologies to conduct a synchronous remote usability test. [5] However, synchronous remote testing may lack the immediacy and sense of "presence" desired to support a collaborative testing process. Moreover, managing interpersonal dynamics across cultural and linguistic barriers may require approaches sensitive to the cultures involved. Other disadvantages include having reduced control over the testing environment and the distractions and interruptions experienced by the participants in their native environment. [6] One of the newer methods developed for conducting a synchronous remote usability test is by using virtual worlds. [7]

Asynchronous methodologies include automatic collection of users' click streams, user logs of critical incidents that occur while interacting with the application, and subjective feedback on the interface by users. [6] Similar to an in-lab study, an asynchronous remote usability test is task-based, and the platform allows researchers to capture clicks and task times. Hence, for many large companies, this allows researchers to better understand visitors' intents when visiting a website or mobile site. Additionally, this style of user testing provides an opportunity to segment feedback by demographic, attitudinal, and behavioral type. The tests are carried out in the user's own environment (rather than a lab), which helps further simulate real-life testing conditions. This approach also provides a vehicle to solicit feedback from users in remote areas easily and quickly, with lower organizational overheads. Asynchronous testing has become increasingly prevalent in recent years, as it also allows testers to provide feedback in their free time and from the comfort of their own home.
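
As a rough illustration of the kind of data such platforms capture, the hypothetical Python sketch below records a participant's click-stream events and derives time on task from the timestamps (the field names and values are assumptions for illustration, not any vendor's actual schema):

    # Hypothetical click-stream records from an asynchronous remote test,
    # and a helper that derives time on task from the event timestamps.
    # Field names and values are assumptions for illustration only.
    from datetime import datetime

    clickstream = [
        {"participant": "P01", "task": "find-pricing", "element": "#nav-products",
         "timestamp": datetime(2023, 5, 1, 10, 0, 3)},
        {"participant": "P01", "task": "find-pricing", "element": "#pricing-link",
         "timestamp": datetime(2023, 5, 1, 10, 0, 41)},
    ]

    def task_time_seconds(events):
        """Time on task: last recorded event minus first recorded event."""
        times = [e["timestamp"] for e in events]
        return (max(times) - min(times)).total_seconds()

    print(task_time_seconds(clickstream))  # 38.0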

Expert review

Expert review is another general method of usability testing. As the name suggests, this method relies on bringing in experts with experience in the field (possibly from companies that specialize in usability testing) to evaluate the usability of a product.

A heuristic evaluation or usability audit is an evaluation of an interface by one or more human factors experts. Evaluators measure the usability, efficiency, and effectiveness of the interface based on usability principles, such as the 10 usability heuristics originally defined by Jakob Nielsen in 1994. [8]

Nielsen's usability heuristics, which have continued to evolve in response to user research and new devices, include:

  1. Visibility of system status
  2. Match between the system and the real world
  3. User control and freedom
  4. Consistency and standards
  5. Error prevention
  6. Recognition rather than recall
  7. Flexibility and efficiency of use
  8. Aesthetic and minimalist design
  9. Help users recognize, diagnose, and recover from errors
  10. Help and documentation

Automated expert review

Similar to expert reviews, automated expert reviews provide usability testing through the use of programs given rules for good design and heuristics. Though an automated review might not provide as much detail and insight as a review by people, it can be finished more quickly and consistently. The idea of creating surrogate users for usability testing is an ambitious direction for the artificial intelligence community.
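
As a simple illustration of a programmatic rule check, the toy sketch below flags images that lack alternative text, one small, machine-checkable usability and accessibility rule; a real automated review tool would apply many such rules:

    # Toy automated check for one simple rule: every <img> should carry alt text.
    # Uses only the Python standard library; not a complete automated review tool.
    from html.parser import HTMLParser

    class MissingAltChecker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.violations = []

        def handle_starttag(self, tag, attrs):
            attributes = dict(attrs)
            if tag == "img" and not attributes.get("alt"):
                self.violations.append(attributes.get("src", "<unknown image>"))

    checker = MissingAltChecker()
    checker.feed('<img src="logo.png"><img src="chart.png" alt="Sales by quarter">')
    print(checker.violations)  # ['logo.png']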

A/B testing

In web development and marketing, A/B testing or split testing is an experimental approach to web design (especially user experience design), which aims to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). As the name implies, two versions (A and B) are compared, which are identical except for one variation that might impact a user's behavior. Version A might be the one currently used, while version B is modified in some respect. For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can be seen through testing elements like copy text, layouts, images and colors.

Multivariate testing or bucket testing is similar to A/B testing but tests more than two versions at the same time.
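
As an illustration of how the results of an A/B comparison are often evaluated, the hedged Python sketch below applies a two-proportion z-test to invented conversion counts for versions A and B; the figures are made up, and this statistical test is one common choice rather than the only valid approach:

    # Two-proportion z-test on invented A/B conversion counts.
    # The numbers below are made up; the test itself is one common way to judge
    # whether an observed difference between versions is likely due to chance.
    from math import sqrt, erf

    def two_proportion_z(conv_a, n_a, conv_b, n_b):
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal CDF
        return z, p_value

    # Version A: 120 purchases from 2,400 visitors; version B: 156 from 2,400
    z, p = two_proportion_z(120, 2400, 156, 2400)
    print(f"z = {z:.2f}, p = {p:.4f}")  # roughly z = 2.23, p = 0.026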

Number of participants

In the early 1990s, Jakob Nielsen, at that time a researcher at Sun Microsystems, popularized the concept of using numerous small usability tests—typically with only five participants each—at various stages of the development process. His argument is that, once it is found that two or three people are totally confused by the home page, little is gained by watching more people suffer through the same flawed design. "Elaborate usability tests are a waste of resources. The best results come from testing no more than five users and running as many small tests as you can afford." [9]

The claim of "Five users is enough" was later described by a mathematical model [10] which states for the proportion of uncovered problems U

where p is the probability of one subject identifying a specific problem and n the number of subjects (or test sessions). This model shows up as an asymptotic graph towards the number of real existing problems (see figure below).

[Figure: proportion of usability problems found as a function of the number of test subjects (Virzi's formula)]
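
As a quick worked example of the model, the Python sketch below evaluates U for several sample sizes using the often-quoted average problem-detection probability of about 0.31 from Nielsen's data; the exact value of p varies by study and product:

    # Nielsen/Landauer model: proportion of usability problems uncovered by n users,
    # assuming each user detects a given problem with probability p.
    # p = 0.31 is the often-quoted average from Nielsen's data; real values vary.
    def problems_found(n, p=0.31):
        return 1 - (1 - p) ** n

    for n in (1, 3, 5, 10, 15):
        print(n, round(problems_found(n), 2))
    # With p = 0.31, five users uncover roughly 85% of the problems, and the
    # curve flattens as n grows, matching the asymptotic shape described above.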

In later research Nielsen's claim has been questioned using both empirical evidence [11] and more advanced mathematical models. [12] Two key challenges to this assertion are:

  1. Since usability is related to the specific set of users, such a small sample size is unlikely to be representative of the total population, so the data from such a small sample is more likely to reflect the sample group than the population it is meant to represent.
  2. Not every usability problem is equally easy to detect. Intractable problems slow down the overall process, and under these circumstances progress is much shallower than predicted by the Nielsen/Landauer formula. [13]

It is worth noting that Nielsen does not advocate stopping after a single test with five users; his point is that testing with five users, fixing the problems they uncover, and then testing the revised site with five different users is a better use of limited resources than running a single usability test with 10 users. In practice, the tests are run once or twice per week during the entire development cycle, using three to five test subjects per round, and with the results delivered within 24 hours to the designers. The number of users actually tested over the course of the project can thus easily reach 50 to 100 people. Research shows that user testing conducted by organisations most commonly involves the recruitment of 5-10 participants. [14]

In the early stage, when users are most likely to immediately encounter problems that stop them in their tracks, almost anyone of normal intelligence can be used as a test subject. In stage two, testers will recruit test subjects across a broad spectrum of abilities. For example, in one study, experienced users showed no problem using any design, from the first to the last, while naive users and self-identified power users both failed repeatedly. [15] Later on, as the design smooths out, users should be recruited from the target population.

When the method is applied to a sufficient number of people over the course of a project, the objections raised above become addressed: The sample size ceases to be small and usability problems that arise with only occasional users are found. The value of the method lies in the fact that specific design problems, once encountered, are never seen again because they are immediately eliminated, while the parts that appear successful are tested over and over. While it's true that the initial problems in the design may be tested by only five users, when the method is properly applied, the parts of the design that worked in that initial test will go on to be tested by 50 to 100 people.

Example

A 1982 Apple Computer manual for developers advised on usability testing: [16]

  1. "Select the target audience. Begin your human interface design by identifying your target audience. Are you writing for businesspeople or children?"
  2. Determine how much target users know about Apple computers, and the subject matter of the software.
  3. Steps 1 and 2 permit designing the user interface to suit the target audience's needs. Tax-preparation software written for accountants might assume that its users know nothing about computers but are experts on the tax code, while such software written for consumers might assume that its users know nothing about taxes but are familiar with the basics of Apple computers.

Apple advised developers, "You should begin testing as soon as possible, using drafted friends, relatives, and new employees": [16]

Our testing method is as follows. We set up a room with five to six computer systems. We schedule two to three groups of five to six users at a time to try out the systems (often without their knowing that it is the software rather than the system that we are testing). We have two of the designers in the room. Any fewer, and they miss a lot of what is going on. Any more and the users feel as though there is always someone breathing down their necks.

Designers must watch people use the program in person, because [16]

Ninety-five percent of the stumbling blocks are found by watching the body language of the users. Watch for squinting eyes, hunched shoulders, shaking heads, and deep, heart-felt sighs. When a user hits a snag, he will assume it is "on account of he is not too bright": he will not report it; he will hide it ... Do not make assumptions about why a user became confused. Ask him. You will often be surprised to learn what the user thought the program was doing at the time he got lost.

Education

Usability testing has been a formal subject of academic instruction in different disciplines. [17] Usability testing is important to composition studies and online writing instruction (OWI). [18] Scholar Collin Bjork argues that usability testing is "necessary but insufficient for developing effective OWI, unless it is also coupled with the theories of digital rhetoric." [19]

Survey research

Survey products include paper and digital surveys, forms, and instruments that can be completed or used by the survey respondent alone or with a data collector. Usability testing is most often done on web surveys and focuses on how people interact with the survey, such as navigating it, entering responses, and finding help information. Usability testing complements traditional survey pretesting methods such as cognitive pretesting (how people understand the products), pilot testing (how the survey procedures will work), and expert review by a subject matter expert in survey methodology. [20]

In translated survey products, usability testing has shown that "cultural fitness" must be considered at the sentence and word levels and in the designs for data entry and navigation, [21] and that presenting translations and visual cues of common functionalities (tabs, hyperlinks, drop-down menus, and URLs) helps to improve the user experience. [22]

Related Research Articles

Prototype

A prototype is an early sample, model, or release of a product built to test a concept or process. It is a term used in a variety of contexts, including semantics, design, electronics, and software programming. A prototype is generally used to evaluate a new design to enhance precision by system analysts and users. Prototyping serves to provide specifications for a real, working system rather than a theoretical one. In some design workflow models, creating a prototype is the step between the formalization and the evaluation of an idea.

Usability engineering is a professional discipline that focuses on improving the usability of interactive systems. It draws on theories from computer science and psychology to define problems that occur during the use of such a system. Usability engineering involves the testing of designs at various stages of the development process, with users or with usability experts. The history of usability engineering in this context dates back to the 1980s. In 1988, authors John Whiteside and John Bennett—of Digital Equipment Corporation and IBM, respectively—published material on the subject, isolating the early setting of goals, iterative evaluation, and prototyping as key activities. The usability expert Jakob Nielsen is a leader in the field of usability engineering. In his 1993 book Usability Engineering, Nielsen describes methods to use throughout a product development process—so designers can ensure they take into account the most important barriers to learnability, efficiency, memorability, error-free use, and subjective satisfaction before implementing the product. Nielsen's work describes how to perform usability tests and how to use usability heuristics in the usability engineering lifecycle. Ensuring good usability via this process prevents problems in product adoption after release. Rather than focusing on finding solutions for usability problems—which is the focus of a UX or interaction designer—a usability engineer mainly concentrates on the research phase. In this sense, it is not strictly a design role, and many usability engineers have a background in computer science. Nevertheless, its connection to the design trade is crucial, since it provides the framework within which designers can ensure that their products connect properly with their target users.

Paper prototyping

In human–computer interaction, paper prototyping is a widely used method in the user-centered design process, a process that helps developers to create software that meets the user's expectations and needs – in this case, especially for designing and testing user interfaces. It is throwaway prototyping and involves creating rough, even hand-sketched, drawings of an interface to use as prototypes, or models, of a design. While paper prototyping seems simple, this method of usability testing can provide useful feedback to aid the design of easier-to-use products. This is supported by many usability professionals.

Usability

Usability can be described as the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience. In software engineering, usability is the degree to which a software can be used by specified consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.

Educational software is any computer software made for an educational purpose. It ranges from language learning software to classroom management software to reference software. The purpose of all such software is to make some part of education more effective and efficient.

Jakob Nielsen (usability consultant)

Jakob Nielsen is a Danish web usability consultant, human–computer interaction researcher, and co-founder of Nielsen Norman Group. He was named the “guru of Web page usability” in 1998 by The New York Times and the “king of usability” by Internet Magazine.

A heuristic evaluation is a usability inspection method for computer software that helps to identify usability problems in the user interface design. It specifically involves evaluators examining the interface and judging its compliance with recognized usability principles. These evaluation methods are now widely taught and practiced in the new media sector, where user interfaces are often designed in a short space of time on a budget that may restrict the amount of money available to provide for other types of interface testing.

Interaction design, often abbreviated as IxD, is "the practice of designing interactive digital products, environments, systems, and services." While interaction design has an interest in form, its main area of focus rests on behavior. Rather than analyzing how things are, interaction design synthesizes and imagines things as they could be. This element of interaction design is what characterizes IxD as a design field, as opposed to a science or engineering field.

The cognitive walkthrough method is a usability inspection method used to identify usability issues in interactive systems, focusing on how easy it is for new users to accomplish tasks with the system. A cognitive walkthrough is task-specific, whereas heuristic evaluation takes a holistic view to catch problems not caught by this and other usability inspection methods. The method is rooted in the notion that users typically prefer to learn a system by using it to accomplish tasks, rather than, for example, studying a manual. The method is prized for its ability to generate results quickly with low cost, especially when compared to usability testing, as well as the ability to apply the method early in the design phases before coding even begins.

Human-centered computing (HCC) studies the design, development, and deployment of mixed-initiative human-computer systems. It emerged from the convergence of multiple disciplines that are concerned both with understanding human beings and with the design of computational artifacts. Human-centered computing is closely related to human-computer interaction and information science. Human-centered computing is usually concerned with systems and practices of technology use, while human-computer interaction is more focused on ergonomics and the usability of computing artifacts, and information science is focused on practices surrounding the collection, manipulation, and use of information.

Iterative design is a design methodology based on a cyclic process of prototyping, testing, analyzing, and refining a product or process. Based on the results of testing the most recent iteration of a design, changes and refinements are made. This process is intended to ultimately improve the quality and functionality of a design. In iterative design, interaction with the designed system is used as a form of research for informing and evolving a project, as successive versions, or iterations of a design are implemented.

End-user development (EUD) or end-user programming (EUP) refers to activities and tools that allow end-users – people who are not professional software developers – to program computers. People who are not professional developers can use EUD tools to create or modify software artifacts and complex data objects without significant knowledge of a programming language. In 2005 it was estimated that by 2012 there would be more than 55 million end-user developers in the United States, compared with fewer than 3 million professional programmers. Various EUD approaches exist, and it is an active research topic within the field of computer science and human-computer interaction. Examples include natural language programming, spreadsheets, scripting languages, visual programming, trigger-action programming and programming by example.

User experience design defines the experience a user would go through when interacting with a company, its services, and its products. User experience design is a user centered design approach because it considers the user's experience when using a product or platform. Research, data analysis, and test results drive design decisions in UX design rather than aesthetic preferences and opinions. Unlike user interface design, which focuses solely on the design of a computer interface, UX design encompasses all aspects of a user's perceived experience with a product or website, such as its usability, usefulness, desirability, brand perception, and overall performance. UX design is also an element of the customer experience (CX), and encompasses all aspects and stages of a customer's experience and interaction with a company.

RITE Method, for Rapid Iterative Testing and Evaluation, typically referred to as "RITE" testing, is an iterative usability method. It was defined by Michael Medlock, Dennis Wixon, Bill Fulton, Mark Terrano and Ramon Romero. It has been publicly championed by Dennis Wixon while working in the games space for Microsoft.

The pluralistic walkthrough is a usability inspection method used to identify usability issues in a piece of software or website in an effort to create a maximally usable human-computer interface. The method centers on recruiting a group of users, developers and usability professionals to step through a task scenario, discussing usability issues associated with dialog elements involved in the scenario steps. The group of experts used is asked to assume the role of typical users in the testing. The method is prized for its ability to be utilized at the earliest design stages, enabling the resolution of usability issues quickly and early in the design process. The method also allows for the detection of a greater number of usability problems to be found at one time due to the interaction of multiple types of participants. This type of usability inspection method has the additional objective of increasing developers’ sensitivity to users’ concerns about the product design.

In computer programming and software development, debugging is the process of finding and resolving bugs within computer programs, software, or systems.

Adaptive learning, also known as adaptive teaching, is an educational method which uses computer algorithms as well as artificial intelligence to orchestrate the interaction with the learner and deliver customized resources and learning activities to address the unique needs of each learner. In professional learning contexts, individuals may "test out" of some training to ensure they engage with novel instruction. Computers adapt the presentation of educational material according to students' learning needs, as indicated by their responses to questions, tasks and experiences. The technology encompasses aspects derived from various fields of study including computer science, AI, psychometrics, education, psychology, and brain science.

API

An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build or use such a connection or interface is called an API specification. A computer system that meets this standard is said to implement or expose an API. The term API may refer either to the specification or to the implementation. Whereas a system's user interface dictates how its end-users interact with the system in question, its API dictates how to write code that takes advantage of that system's capabilities.

Agile usability engineering is a method created from a combination of agile software development and usability engineering practices. Agile usability engineering attempts to apply the principles of rapid and iterative development to the field of user interface design.

Feminist HCI is a subfield of human-computer interaction (HCI) that applies feminist theory, critical theory and philosophy to social topics in HCI, including scientific objectivity, ethical values, data collection, data interpretation, reflexivity, and unintended consequences of HCI software. The term was originally used in 2010 by Shaowen Bardzell, and although the concept and original publication are widely cited, as of 2020 Bardzell's proposed frameworks have been rarely used since.

References

  1. Nielsen, J. (1994). Usability Engineering, Academic Press Inc, p 165
  2. Mejs, Monika (2019-06-27). "Usability Testing: the Key to Design Validation". Mood Up team - software house. Retrieved 2019-09-11.
  3. Dennis G. Jerz (July 19, 2000). "Usability Testing: What Is It?". Jerz's Literacy Weblog. Retrieved June 29, 2016.
  4. Andreasen, Morten Sieker; Nielsen, Henrik Villemann; Schrøder, Simon Ormholt; Stage, Jan (2007). "What happened to remote usability testing?". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. p. 1405. doi:10.1145/1240624.1240838. ISBN   978-1-59593-593-9. S2CID   12388042.
  5. Dabney Gough; Holly Phillips (2003-06-09). "Remote Online Usability Testing: Why, How, and When to Use It". Archived from the original on December 15, 2005.
  6. Dray, Susan; Siegel, David (March 2004). "Remote possibilities?: international usability testing at a distance". Interactions. 11 (2): 10–17. doi:10.1145/971258.971264. S2CID   682010.
  7. Chalil Madathil, Kapil; Greenstein, Joel S. (2011). "Synchronous remote usability testing". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 2225–2234. doi:10.1145/1978942.1979267. ISBN   978-1-4503-0228-9. S2CID   14077658.
  8. "Heuristic Evaluation". Usability First. Retrieved April 9, 2013.
  9. "Usability Testing with 5 Users (Jakob Nielsen's Alertbox)". useit.com. 2000-03-13.; references Nielsen, Jakob; Landauer, Thomas K. (1993). "A mathematical model of the finding of usability problems". Proceedings of the SIGCHI conference on Human factors in computing systems. pp. 206–213. doi:10.1145/169059.169166. ISBN   978-0-89791-575-5. S2CID   207177537.
  10. Virzi, R. A. (1992). "Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough?". Human Factors. 34 (4): 457–468. doi:10.1177/001872089203400407. S2CID   59748299.
  11. Spool, Jared; Schroeder, Will (2001). Testing web sites: five users is nowhere near enough. CHI '01 extended abstracts on Human factors in computing systems. p. 285. doi:10.1145/634067.634236. S2CID   8038786.
  12. Caulton, D. A. (2001). "Relaxing the homogeneity assumption in usability testing". Behaviour & Information Technology. 20 (1): 1–7. doi:10.1080/01449290010020648. S2CID   62751921.
  13. Schmettow, Martin (1 September 2008). "Heterogeneity in the Usability Evaluation Process". Electronic Workshops in Computing. doi:10.14236/ewic/HCI2008.9.
  14. "Results of the 2020 User Testing Industry Report". www.userfountain.com. Retrieved 2020-06-04.
  15. Bruce Tognazzini. "Maximizing Windows".
  16. Meyers, Joe; Tognazzini, Bruce (1982). Apple IIe Design Guidelines (PDF). Apple Computer. pp. 11–13, 15.
  17. Breuch, Lee-Ann M. Kastman; Zachry, Mark; Spinuzzi, Clay (April 2001). "Usability Instruction in Technical Communication Programs: New Directions in Curriculum Development". Journal of Business and Technical Communication. 15 (2): 223–240. doi:10.1177/105065190101500204. S2CID   61365767.
  18. Miller-Cochran, Susan K.; Rodrigo, Rochelle L. (January 2006). "Determining effective distance learning designs through usability testing". Computers and Composition. 23 (1): 91–107. doi:10.1016/j.compcom.2005.12.002.
  19. Bjork, Collin (September 2018). "Integrating Usability Testing with Digital Rhetoric in OWI". Computers and Composition. 49: 4–13. doi:10.1016/j.compcom.2018.05.009. S2CID   196160668.
  20. Geisen, Emily; Bergstrom, Jennifer Romano (2017). Usability Testing for Survey Research. Cambridge: Elsevier MK Morgan Kaufmann Publishers. ISBN   978-0-12-803656-3.
  21. Wang, Lin; Sha, Mandy (2017-06-01). "Cultural Fitness in the Usability of U.S. Census Internet Survey in Chinese Language". Survey Practice. 10 (3). doi: 10.29115/SP-2017-0018 .
  22. Sha, Mandy; Hsieh, Y. Patrick; Goerman, Patricia L. (2018-07-25). "Translation and visual cues: Towards creating a road map for limited English speakers to access translated Internet surveys in the United States". Translation & Interpreting. 10 (2): 142–158. ISSN   1836-9324.