System usability scale

Standard version of the system usability scale: ten statements, each rated on a five-point scale from 1 (strongly disagree) to 5 (strongly agree).

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

In systems engineering, the system usability scale (SUS) is a simple, ten-item attitude Likert scale giving a global view of subjective assessments of usability. It was developed by John Brooke [1] at Digital Equipment Corporation in the UK in 1986 as a tool to be used in usability engineering of electronic office systems.


The usability of a system, as defined by the ISO standard ISO 9241 Part 11, can be measured only by taking into account the context of use of the system—i.e., who is using the system, what they are using it for, and the environment in which they are using it. Furthermore, measurements of usability have several different aspects: effectiveness (can users successfully achieve their objectives), efficiency (how much effort and resource is expended in achieving those objectives), and satisfaction (was the experience satisfactory).

Measures of effectiveness and efficiency are also context specific. Effectiveness in using a system for controlling a continuous industrial process would generally be measured in very different terms to, say, effectiveness in using a text editor. Thus, it can be difficult, if not impossible, to answer the question "is system A more usable than system B", because the measures of effectiveness and efficiency may be very different. However, it can be argued that given a sufficiently high-level definition of subjective assessments of usability, comparisons can be made between systems.

The formula for computing the final SUS score requires converting the raw scores, depending on whether an item is positively worded (odd-numbered) or negatively worded (even-numbered), and then utilizing the following equation [2]:

SUS score = (X + Y) × 2.5

where X is the sum of the points for all odd-numbered items minus 5, and Y is 25 minus the sum of the points for all even-numbered items. Equivalently, each odd-numbered item contributes its scale position minus 1, each even-numbered item contributes 5 minus its scale position, and the sum of these 0–4 contributions is multiplied by 2.5 to yield a score between 0 and 100.
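
As a concrete illustration, the scoring rule can be expressed in a few lines of Python. This is a minimal sketch only; the function name and the example responses are illustrative and not taken from the source.

def sus_score(responses):
    """Compute the overall SUS score from ten raw responses (each 1-5, item 1 first)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    contributions = []
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:                      # positively worded items 1, 3, 5, 7, 9
            contributions.append(r - 1)
        else:                               # negatively worded items 2, 4, 6, 8, 10
            contributions.append(5 - r)
    return 2.5 * sum(contributions)         # final score on the 0-100 scale

# Example with illustrative responses: prints 80.0
print(sus_score([5, 2, 4, 1, 4, 2, 5, 2, 4, 3]))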

SUS has generally been seen as providing this type of high-level subjective view of usability and is thus often used in carrying out comparisons of usability between systems. Because it yields a single score on a scale of 0–100, it can be used to compare even systems that are outwardly dissimilar. This one-dimensional aspect of the SUS is both a benefit and a drawback, because the questionnaire is necessarily quite general.

Recently, Lewis and Sauro [3] suggested a two-factor orthogonal structure, which practitioners may use to score the SUS on independent Usability and Learnability dimensions. In an independent analysis, Borsci, Federici and Lauriola [4] confirmed the two-factor structure of the SUS, while also showing that those factors (Usability and Learnability) are correlated.
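
A sketch of how such subscale scoring might be implemented is shown below, assuming (as reported by Lewis and Sauro) that items 4 and 10 form the Learnability factor and the remaining eight items form the Usability factor; the multipliers 12.5 and 3.125 simply rescale each subscale to a 0–100 range, and the function name is illustrative.

def sus_subscales(responses):
    """Return (usability, learnability) subscale scores, each on a 0-100 scale."""
    # Same per-item 0-4 contributions as for the overall SUS score.
    contrib = [(r - 1) if i % 2 == 1 else (5 - r)
               for i, r in enumerate(responses, start=1)]
    learnability = 12.5 * (contrib[3] + contrib[9])               # items 4 and 10
    usability = 3.125 * (sum(contrib) - contrib[3] - contrib[9])  # remaining eight items
    return usability, learnability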

The SUS has been widely used in the evaluation of a range of systems. Bangor, Kortum and Miller [5] have used the scale extensively over a ten-year period and have produced normative data that allow SUS ratings to be positioned relative to other systems. They propose an extension to SUS to provide an adjective rating that correlates with a given score. Based on a review of hundreds of usability studies, Sauro and Lewis [6] proposed a curved grading scale for mean SUS scores.

Related Research Articles

Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. This can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the system. It is more concerned with the design intuitiveness of the product and is tested with users who have no prior exposure to it. Such testing is paramount to the success of an end product, as a fully functioning application that creates confusion amongst its users will not last for long. This is in contrast with usability inspection methods, where experts use different methods to evaluate a user interface without involving users.

Questionnaire construction refers to the design of a questionnaire to gather statistically useful information about a given topic. When properly constructed and responsibly administered, questionnaires can provide valuable data about any given subject.

Usability: capacity of a system for its users to perform tasks

Usability can be described as the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience. In software engineering, usability is the degree to which software can be used by specified consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.

Likert scale: psychometric measurement scale

A Likert scale is a psychometric scale named after its inventor, American social psychologist Rensis Likert, which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term is often used interchangeably with rating scale, although there are other types of rating scales.

ISO 9241 is a multi-part standard from the International Organization for Standardization (ISO) covering ergonomics of human-system interaction and related, human-centered design processes. It is managed by the ISO Technical Committee 159. It was originally titled Ergonomic requirements for office work with visual display terminals (VDTs). From 2006 onwards, the standards were retitled to the more generic Ergonomics of Human System Interaction.

Personality test: method of assessing human personality constructs

A personality test is a method of assessing human personality constructs. Most personality assessment instruments are in fact introspective self-report questionnaire measures or reports from life records (L-data) such as rating scales. Attempts to construct actual performance tests of personality have been very limited, even though Raymond Cattell and his colleague Frank Warburton compiled a list of over 2000 separate objective tests that could be used in constructing objective personality tests. One exception, however, was the Objective-Analytic Test Battery, a performance test designed to quantitatively measure 10 factor-analytically discerned personality trait dimensions. A major problem with both L-data and Q-data methods is that, because of item transparency, rating scales and self-report questionnaires are highly susceptible to motivational and response distortion, ranging from lack of adequate self-insight to outright dissimulation, depending on the reason or motivation for the assessment being undertaken.

User experience (UX) is how a user interacts with and experiences a product, system or service. It includes a person's perceptions of utility, ease of use, and efficiency. Improving user experience is important to most companies, designers, and creators when creating and refining products because negative user experience can diminish the use of the product and, therefore, any desired positive impacts. Conversely, designing toward profitability as a main objective often conflicts with ethical user experience objectives and even causes harm. User experience is subjective. However, the attributes that make up the user experience are objective.

Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual "values on a predefined scale that a subject assigns to his opinion of the performance of a system quality". Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated.

Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinions and sentiments less explicitly.

The Vividness of Visual Imagery Questionnaire (VVIQ) was developed in 1973 by the British psychologist David Marks. The VVIQ consists of 16 items in four groups of 4 items in which the participant is invited to consider the mental image formed in thinking about specific scenes and situations. The vividness of the image is rated along a 5-point scale. The questionnaire has been widely used as a measure of individual differences in vividness of visual imagery. The large body of evidence confirms that the VVIQ is a valid and reliable psychometric measure of visual image vividness.

The self-perceived quality-of-life scale is a psychological assessment instrument which is based on a comprehensive theory of the self-perceived quality of life (SPQL) and provides a multi-faceted measurement of health-related and non-health-related aspects of well-being. The scale has become an instrument of choice for monitoring quality of life in some clinical populations, for example, it was adopted by the Positively Sound network for women living with HIV.

User experience evaluation (UXE) or user experience assessment (UXA) refers to a collection of methods, skills and tools utilized to uncover how a person perceives a system before, during and after interacting with it. It is non-trivial to assess user experience since user experience is subjective, context-dependent and dynamic over time. For a UXA study to be successful, the researcher has to select the right dimensions, constructs, and methods and target the research for the specific area of interest such as game, transportation, mobile, etc.

The NASA Task Load Index (NASA-TLX) is a widely used, subjective, multidimensional assessment tool that rates perceived workload in order to assess a task, system, or team's effectiveness or other aspects of performance. It was developed by the Human Performance Group at NASA's Ames Research Center over a three-year development cycle that included more than 40 laboratory simulations. It has been cited in over 4,400 studies, highlighting the influence the NASA-TLX has had in human factors research. It has been used in a variety of domains, including aviation, healthcare and other complex socio-technical domains. It is a subjective, self-reported set of scores and is not an objective measure of task load, which would instead be measured using objective metrics that examine the product of the speed and accuracy with which users perform a task.

The Health Dynamics Inventory (HDI) is a 50-item self-report questionnaire developed to evaluate mental health functioning and change over time and treatment. The HDI was written to evaluate the three aspects of mental disorders as described in the Diagnostic and Statistical Manual of Mental Disorders (DSM): "clinically significant behavioral or psychological syndrome or pattern...associated with present distress...or disability". This also corresponds to the phase model described by Howard and colleagues. Accordingly, the HDI assesses (1) the experience of emotional or behavioral symptoms that define mental illness, such as dysphoria, worry, angry outbursts, low self-esteem, or excessive drinking, (2) the level of emotional distress related to these symptoms, and (3) the impairment or problems fulfilling the major roles of one's life.

Component-based usability testing (CBUT) is a testing approach which aims at empirically testing the usability of an interaction component. The latter is defined as an elementary unit of an interactive system on which behavior-based evaluation is possible. For this, a component needs to have an independent state that the user can perceive and control, such as a radio button, a slider, or a whole word-processor application. The CBUT approach can be regarded as part of the component-based software engineering branch of software engineering.

The Questionnaire For User Interaction Satisfaction (QUIS) is a tool developed to assess users' subjective satisfaction with specific aspects of the human-computer interface. It was developed in 1987 by a multi-disciplinary team of researchers at the University of Maryland Human–Computer Interaction Lab. The QUIS is currently at Version 7.0, with a demographic questionnaire, a measure of overall system satisfaction along 6 scales, and measures of 9 specific interface factors. These 9 factors are: screen factors, terminology and system feedback, learning factors, system capabilities, technical manuals, on-line tutorials, multimedia, teleconferencing, and software installation. It is currently available in German, Italian, Portuguese, and Spanish.

The International Personality Item Pool (IPIP) is a public domain collection of items for use in personality tests. It is managed by the Oregon Research Institute.

Tools, devices, and software must be evaluated from different points of view, such as their technical properties or their usability, before their release on the market. Usability evaluation allows assessing whether the product under evaluation is efficient enough, effective enough, and sufficiently satisfactory for the users. For this assessment to be objective, there is a need for measurable goals that the system must achieve. Such goals are called usability goals; they are objective criteria against which the results of the usability evaluation are compared in order to assess the usability of the product under evaluation.

The six-factor model of psychological well-being is a theory developed by Carol Ryff that determines six factors that contribute to an individual's psychological well-being, contentment, and happiness. Psychological well-being consists of self-acceptance, positive relationships with others, autonomy, environmental mastery, a feeling of purpose and meaning in life, and personal growth and development. Psychological well-being is attained by achieving a state of balance affected by both challenging and rewarding life events.

The Pittsburgh Sleep Quality Index (PSQI) is a self-report questionnaire that assesses sleep quality over a 1-month time interval. The measure consists of 19 individual items, creating 7 components that produce one global score, and takes 5–10 minutes to complete. Developed by researchers at the University of Pittsburgh, the PSQI is intended to be a standardized sleep questionnaire for clinicians and researchers to use with ease and is used for multiple populations. The questionnaire has been used in many settings, including research and clinical activities, and has been used in the diagnosis of sleep disorders. Clinical studies have found the PSQI to be reliable and valid in the assessment of sleep problems to some degree, but more so with self-reported sleep problems and depression-related symptoms than actigraphic measures.

References

  1. Brooke, John (1996). "SUS: a 'quick and dirty' usability scale". In P. W. Jordan; B. Thomas; B. A. Weerdmeester; A. L. McClelland (eds.). Usability Evaluation in Industry. London: Taylor and Francis.
  2. Lewis, James R. (2018). "The System Usability Scale: Past, Present, and Future". International Journal of Human–Computer Interaction. 34 (7): 577–590. doi:10.1080/10447318.2018.1455307. ISSN 1044-7318.
  3. Lewis, J. R.; Sauro, J. (2009). "The Factor Structure of the System Usability Scale". Human Computer Interaction International Conference (HCII 2009). San Diego, California.
  4. Borsci, Simone; Federici, Stefano; Lauriola, Marco (2009). "On the dimensionality of the System Usability Scale: a test of alternative measurement models". Cognitive Processing. 10 (3): 193–197. doi:10.1007/s10339-009-0268-9. PMID 19565283. S2CID 1330990.
  5. Bangor, Aaron; Kortum, Philip T.; Miller, James T. (2008). "An Empirical Evaluation of the System Usability Scale". International Journal of Human–Computer Interaction. 24 (6): 574–594. doi:10.1080/10447310802205776. S2CID 29843973.
  6. Sauro, J.; Lewis, J. R. (2012). Quantifying the User Experience: Practical Statistics for User Research. Waltham, Massachusetts: Morgan Kaufmann. doi:10.1016/C2010-0-65192-3. ISBN 9780123849687.

Further reading