Design Tests Done Right
From user to usability testing
The design testing toolbox contains a range of specialised tools that answer different questions in a user-centred design process. Usability tests are one essential test type in this toolbox.
We need to get these testing tools right to realise their full value.
Here is how.
To get testing right, it is helpful to consider the overall objectives of a user-centred design process. At the highest level, we test to understand:
What unmet user needs should a new design (tool) address?
Does our design support the user in using the tool correctly and effortlessly?
These two objectives correspond to the double diamond model's two main phases (diamonds). To support each of these, several specific design testing tools are available.
In the following, I will go through these test types to outline how their objectives differ and which test tools are used to conduct them.
This article is based on the topics we cover in our 3-day Usability Testing - Data-driven UX Design course.
You can read more about the course here: →UX Campus: Usability Tests.
⚠️ One final note before we begin:
There are no standardised terms to distinguish the test types we conduct. We therefore need to be careful to detail the test objective and the specific label we assign to each kind of test. For practical reasons, I refer to tests in general as “UX Design Tests” and then distinguish between 1) Problem-focused tests and 2) Solution-focused tests.
Enjoy!
Problem space tests
In the first “design the right thing” diamond, we have tests that help us explore and validate the design problem space:
Diagnostic tests help us identify what the problem is with an existing design.
Benchmark tests help us understand how our design performs relative to competitors.
Design probe tests help us explore how concepts (based on user research) resonate with users and thereby validate user needs.
▶︎ These tools can be summed up as “user testing tools”.
⚠️ Please note that a test is different from more general user research. So while everything we do in the first problem-focused diamond is user research, we have specific user research tools that are user tests. You will likely find other resources where tests belonging to the second solution space diamond are also referred to as user tests. For instance, usability tests are often referred to as user tests. This is in no way wrong - it simply requires us to specify a term for tests in the first diamond.
Ok, no more precautionary disclaimers ;-)
Let us move on to UX Design Tests we conduct in the second diamond, where we find the basic usability test.
Solution space tests (where usability tests live)
In the second “design it right” diamond, we have tests that help us explore and validate designs in our design solution space:
Usability tests help us mature our design (also referred to as “formative usability tests” in human factors engineering).
Component tests help us validate the quality of a single design component. This could be to optimise the ergonomics of a handle.
User acceptance tests (UATs) help us verify that our design technically functions as intended (does not involve users — read more below).
Final validation usability tests ensure that our design performs as intended in the hands of real users (also referred to as “summative usability tests” in human factors engineering).
▶︎ These tools can be summed up as “usability testing tools”.
With software products like web pages, we can conduct a special kind of usability test after the design launches. In the production environment - meaning the actual live design people use - we can introduce variations of the same design to test the usability effect of small (or more significant) design tweaks. With two (or more) versions of our design, we can see how the A and B designs perform relative to each other - hence the term A-B testing. Companies with a LARGE user base can test many variations of the same design to make small, continuous usability optimisations.
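To make the A-B comparison concrete, here is a minimal sketch of how two variants' task success (or conversion) rates could be compared statistically. The function name and the numbers are illustrative assumptions, not data from a real test or from a specific A-B testing platform.

```python
# Minimal sketch of evaluating an A-B test, assuming we log how many users
# completed a key task with each design variant. All numbers are made up.
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-test: does variant B's success rate differ from A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return p_a, p_b, (p_b - p_a) / se

# Hypothetical example: 480 of 4,000 users completed checkout with design A,
# 540 of 4,000 with design B.
p_a, p_b, z = two_proportion_z(480, 4000, 540, 4000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}")  # |z| > 1.96 ~ significant at the 5% level
```

A-B testing platforms wrap this kind of calculation (and more robust variants of it) into their reporting; the point is simply that the comparison is quantitative and behavioural, not opinion-based.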
The two groups of design tests can be thought of as reducing project risk in two different ways:
The problem space design tests reduce the risk of not addressing a real user need in the right way.
The solution space design tests reduce the risk of designing solutions that users struggle to use (i.e., have poor usability).
With these two primary groups of tests established, we can start to dig into some of the myths, misconceptions and sometimes even malpractices of design testing in general and usability testing in particular.
Brace yourself!
Overview of misconceptions:
Distinguish between user and usability testing - they are not the same.
There are no users in a UAT - it is NOT a usability test.
The “think-aloud test” does not exist.
We need 5-8 users in a usability test - well, maybe…
There are three types of usability tests.
Don't be nice to users!
1) Distinguish between user and usability testing - they are not the same.
Sometimes usability tests are carried out in a way where they become more like a concept validation test.
Naturally, if the concept is the very first prototype that manifests a user need into something tangible, some product or project managers would like to understand if end-users endorse the design solution.
However, if you are maturing the design and need to explore what to improve to make it work well (i.e., its usability), you won’t learn anything substantial by engaging with users to hear what they think about the design concept.
Essentially, in a usability test, we want users NOT to reflect but to DO. We want users to carry out meaningful tasks to identify where our current design creates friction that needs to be removed.
Unfortunately, due to the widespread use of the “think-aloud protocol”, we risk staging a situation where users talk about our design rather than use it. Of course, the formal description of the think-aloud protocol clearly specifies that users should not reflect but merely narrate what they do. This correct use is, however, seldom achieved.
Our recommendation is therefore NOT to use the think-aloud protocol at all. This is further motivated by the biases it can introduce into usability insights.
→ This point is detailed in our article: Usability Dieselgate.
This topic is also treated in detail in the Usability Testing course, where we introduce and discuss Retrospective Think Aloud as an alternative to Concurrent Think Aloud.
2) There are no users in a UAT - it is not a usability test.
The User Acceptance Test (UAT) is often mistaken for a usability test. Likely due to the inclusion of "user" in the name.
However, the UAT has nothing to do with users. It is a purely technical review of the design to verify the intended functionality.
To help avoid this misconception of UATs as a usability test, we can think of it as:
"Is the design ready (acceptable) enough to pass onto users for usability testing?".
3) The "think-aloud test" does not exist... and we need to distinguish between "test types" and "data sources".
When we carry out tests, we aim to sample data we can analyse to answer our questions. To do so, we have different data gathering tools and data types.
When we run a specific type of test to answer a design question, we also need to decide which types of data we would like to sample during the test.
At a very general level, we can split these data types into two categories:
Objectively measured data (typically behavioural data).
Subjectively reported data (everything the users say or rate).
The think-aloud protocol is a data-gathering tool that helps us understand why users do as they do in a test - typically a usability test. It does so by sampling data about what goes on “in the head of the user” as they interact with our design, by having users verbalise their continuous stream of thoughts. Think-aloud data constitutes a subjective data source, despite the fact that we encourage users not to share opinions but merely to “let us in on their mental processes as they naturally occur”.
In that sense, a think-aloud test does not exist. Usability tests are tests, and they can be conducted with or without the think-aloud protocol.
Having users verbalise their thoughts makes it deceptively easy to shift from usability testing to concept validation, as addressed earlier. Here you have users “commenting on what they do”… so why not have them comment on the design as well?
Furthermore, since data analysis from usability tests is time-consuming, why not have the users comment on what they think should be fixed?
To both ideas: DON'T DO IT.
🎓 At our three-day course in Usability Testing, we dig deeper into how even correct use of the think-aloud protocol can give misleading results by shifting the user interaction from basic cognitive functions to higher-level intellectual processing.
4) We need 5-8 users in a usability test
... well, maybe!
It depends on what you are trying to achieve
Yep, that is the classical recommendation for running a usability test during design development as part of the design maturation.
However, the premise is that this is not the only usability test you will run.
Also, the logic is that we only seek to identify problems - not in any way try to quantify or prioritise these. All identified issues constitute equal problems that should be addressed.
Inherent to the number 5-8 is also that, for some types of design, it will reveal around 80% of the potential problems. That is, of course, a very broad generalisation. Specific types of design may have a completely different “user problem structure” - especially if they are complex user interfaces.
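For the curious, the 80% figure typically comes from a simple problem-discovery model (often attributed to Nielsen and Landauer). Here is a minimal sketch, assuming each problem is found by any single participant with probability p, where p ≈ 0.31 is a commonly quoted average across studies - not a property of your specific design.

```python
# Problem-discovery model: share of usability problems expected to be found
# with n participants, assuming each participant independently reveals a
# given problem with probability p. p = 0.31 is a commonly quoted average.
def share_found(n, p=0.31):
    return 1 - (1 - p) ** n

for n in (3, 5, 8, 15):
    print(f"{n} participants: ~{share_found(n):.0%} of problems")
# Roughly 67%, 84%, 95% and 100% - the origin of the "5-8 users find ~80%" heuristic.
```

Complex interfaces with many rare, context-dependent problems effectively have a much lower p, which is exactly why the heuristic breaks down for them.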
Another crucial point is that this type of usability test is aptly referred to as a formative usability test. This contrasts with a validation usability test (also called a summative test) that helps us document how well the design works at a more general population level. You can think of it as the final exam your design takes before being released to the world. This logic is wholly formalised when working in regulated industries such as HealthTech and pharma.
🎓 At our three-day course in Usability Testing, we discuss in detail how to scope usability tests with regard to the recruitment of participants. We also share our best practices for recruiting test participants.
5) There are three types of usability tests: Diagnostic, Formative and Summative. Guerrilla and hallway don't count.
Overall there are three types of usability tests - or versions of the usability test that most people think about when we talk about testing how user-friendly a design is.
These three usability tests all share a common logic or structure regarding how the tests are designed and conducted: we ask users to use our design to carry out specific tasks to reach an overall goal.
That is it!
Here are three examples:
Please administer an injection with this insulin pen.
Please order a meal using this food delivery app.
Please set up this connected air cleaner.
To do this correctly, we have to design a test that creates a realistic use of our design. As mentioned, the think-aloud protocol should be used VERY carefully and preferably avoided since it creates an unrealistic setting where usability problems can disappear.
In some usability tests, we see users either do things they would never do in real-life use or omit things they would typically have done. We refer to these test biases as “study artefacts”. How to design a realistic setting is thus a critical part of what we teach in our course → Usability Testing.
How the three usability tests differ
Diagnostic, formative and summative tests belong to three different places in the design process and product life-cycle. Here is how:
The diagnostic usability test can be used in design projects focusing on updating or refreshing an existing design. It is very common for companies to release updates to their designs or make so-called line extensions. In such cases, running a diagnostic usability test as part of the problem space exploration can be valuable to understand how the existing design performs. At our Design Psychology UX Lab, we also run diagnostic usability tests to help clients identify why a design generates a lot of customer support traffic. Results and insights from the diagnostic usability test can thereby provide input for the project “value proposition” that scopes what the design development process should focus on. A diagnostic usability test can be conducted in a qualitative style with 5-8 participants if you only want to identify potential sources of the problems. If you need data about how many users experience a problem and how often, you need to move to a more quantitative setup with more users (15+).
The formative usability test is used in the design development process in the second “design it right” diamond. Ideally, we conduct formative usability tests until we are sure we have identified and solved all design issues that cause usability problems. To carry out a formative usability test, you therefore need a precise understanding of all the tasks that users need to carry out AND how they should be carried out correctly. That is called a task analysis and is a central tool we introduce in our Usability Testing course.
The summative usability test (also called a validation test) helps validate that all intended user groups can carry out all the intended tasks in all intended contexts. This requires us to test with multiple user groups, AND we need to test with a quantitative setup. Naturally, this can be quite a costly test. However, the costs should be weighed against the consequences of releasing a design that is not fully validated. Worst case, it can cause harm to people if it is used incorrectly. More generally, we see that designs that are not validated cause a lot of user frustration, errors and excessive calls to support - or quite simply fail to sell.
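To illustrate why summative (and quantitative diagnostic) tests need larger samples than formative tests, here is a minimal sketch of the uncertainty around an observed task success rate. It uses a simple normal-approximation confidence interval, and the participant counts are purely illustrative assumptions.

```python
# Uncertainty around an observed task success rate for different sample sizes.
# Uses a normal-approximation (Wald) 95% confidence interval; numbers are made up.
from math import sqrt

def success_rate_ci(successes, n, z=1.96):
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

for successes, n in [(7, 8), (26, 30), (88, 100)]:
    p, low, high = success_rate_ci(successes, n)
    print(f"{successes}/{n} succeeded: {p:.0%} (95% CI roughly {low:.0%}-{high:.0%})")
# With only 8 participants the interval spans tens of percentage points,
# which is why validation claims need a quantitative setup.
```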
The Guerrilla and hallway tests do not count
Guerrilla and hallway tests are quick-and-dirty versions of a usability test - primarily of the formative usability test type.
A formative usability test often requires good preparation, especially if you want good data. Some design projects do not have the time and resources to do that; therefore, quick-and-dirty tests have been introduced. You can argue that a bit of testing is always better than no testing. That logic seems deceptively right. But consider this: if you were to weigh yourself on a scale, you would never settle for only placing one foot on the scale to do a “little weighing”. That weighing test would be inherently flawed to the point where the result cannot be used.
To give you an idea of what a fully qualified usability test often requires, here are some of the main aspects that need to be in place:
Before the usability test:
UX Requirements (what is the agreed-upon quality we should ensure is delivered?).
A task analysis that breaks down all the tasks and sub-tasks a user needs to be able to complete. The PCA adds additional detail to this analysis relative to what the user should be able to 1) Perceive, 2) Think and 3) Do.
Study design requires us to carefully consider and design the context and staging of the test.
Recruitment can be tricky if you need to test with users that have very particular qualities - like patient groups.
During the usability test:
The protocol is the agreed-upon purpose and objective of the usability test. It needs to be completed prior to the test itself but also serves as a supporting document during the test, as it specifies all the test details.
The moderator guide is the “cheat sheet” of the moderator to ensure the usability test is conducted exactly as specified in the protocol for every participant.
Data collection covers all the tools we use to collect data - from video recordings to rating scales to eye-tracking.
After the usability test:
Root causes. To ensure our usability tests provide value to the project, we need to deliver actionable insights. To do so, we need to analyse all the usability problems we identify for their root causes: what behavioural cause is at the root of the usability problem? Once we understand that, we also have qualified input for the project team on what needs to be fixed.
Reporting. All usability problems and their root causes are gathered in a report - for instance, a PowerPoint presentation. Note that in regulated industries the report constitutes a legal document that needs to be reviewed, approved and signed.
Mitigation workshop. Based on the report, it is very practical to meet with the design team to discuss how to make design changes (formally referred to as design mitigations). Ideas for mitigations can be discussed and qualified on the spot, which makes such a workshop very effective.
…as you have probably guessed, we will teach you all the practical details of these tools at our 3-day course in Usability Testing.
This means that minimum requirements need to be in place before we can meaningfully test. Beyond those minimum requirements, there are ways to scale a usability test up and down in complexity.
For instance, if what you are testing does not require a particular user group, then you can use a so-called “convenience sample”, which basically translates into testing with colleagues you pull from the hallway or canteen (hence the name “canteen test”).
At our UX Lab, we use “canteen tests” in setups where we test a component and need many people - for instance, having participants repeatedly attach connectors and cables to a device.
Component usability testing is also a topic we cover in our course ➞ Usability Testing.
6) Don't be nice to users!
If you pick up any textbook on how to conduct a usability test, there will likely be a paragraph about how to be nice to test users or test participants.
It could be something like:
It is important that you create a comfortable atmosphere and that you are not disturbed during testing. We should also tell participants that it is not them but the design we are testing. It can be a good idea to have the user settle down and maybe run a small warm-up session before the actual testing commences. Finally, we should let the test participants know that they are free to opt out at any point during the test without providing an excuse.
While these recommendations are great, they can also be misleading. All the above considerations will likely bring the test participant to an idealised, high-performing version of themselves. And that is problematic. We don’t need their best selves but their usual selves - how they would typically perform using our design.
Therefore, we need to consider ways to “reduce” test participants' performance to a realistic level.
Essentially, we need to look for ways to make our test participants less smart. How smart we are is also called the “intelligence quotient” or IQ. We therefore try to induce an IQ drop and ensure that users focus on what they would naturally focus on (or not focus on).
An excellent way to induce IQ drops is to speed up the tempo. You can ask test participants to complete a task as quickly as possible. Or set a time constraint to stress them. In some projects, we use outside stressors like noise. The IQ-drop stress toolbox is a topic we cover in detail in our ➞ Usability Testing 3-day course.
We can also work with staging the proper focus. If you want to test the usability of a door, don’t ask people to open the door. Instead, create a task that implicitly requires them to open the door. We call this “task embedding”. And it makes a difference. If asked to open a door, people will consciously focus on doing precisely that, and our conscious analytical skills are far better at overcoming poor design. Therefore, we should test against more basic skills to see how the door performs.
Your next UX design test
Hopefully, this article has provided inspiration for your next design test.
We are very happy to assist you in several ways:
The Design Psychology UX Lab team is ready to help you design and execute your next design test. See our UX Lab page for more details.
You can also grow your expertise and attend our UX Campus 3-day course in Usability Testing.
Finally, you are always welcome to reach out to us with any questions you may have.