10 Tips For Running Usability Benchmark Tests

With the emergence of tools like Loop11, conducting usability benchmark tests is now more common than ever. Benchmark studies are a great way to see how design or content changes affect website usability. Since benchmark tests are all about tracking usability over time, it’s important to be consistent with the process. With this in mind, Jeff Sauro at measuringusability.com has shared his “10 Tips For Benchmark Usability Test”. Here’s a quick rundown:

  1. Recruit for representativeness over randomness
  2. Triangulate using multiple metrics
  3. Estimate Sample Size using the desired margin of error
  4. Counterbalance tasks
  5. Collect both Post-Test and Post-Task Satisfaction
  6. Combine measures into a Single Usability Metric for reporting
  7. Use confidence intervals around all your metrics
  8. Conduct a pilot test
  9. Include some cheater/speeder detection for remote usability tests
  10. When you record task time, don’t throw away the failed task times
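
To make tip 3 a little more concrete, here is a minimal sketch of estimating sample size from a desired margin of error for a completion rate. It uses the standard normal-approximation formula n = z² · p(1 − p) / E², which is an assumption on our part and not necessarily the method the full article recommends. The function name and default values are made up for this example.

```python
import math

def sample_size_for_margin(margin, expected_rate=0.5, z=1.96):
    """Participants needed so a completion rate's ~95% interval is +/- margin.

    Normal-approximation sketch. expected_rate=0.5 is the most conservative
    guess because it maximises p * (1 - p).
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be a proportion between 0 and 1")
    n = (z ** 2) * expected_rate * (1 - expected_rate) / margin ** 2
    return math.ceil(n)

print(sample_size_for_margin(0.10))  # +/- 10 percentage points
print(sample_size_for_margin(0.05))  # +/- 5 percentage points
```

Halving the margin of error roughly quadruples the number of participants, which is why it pays to decide on the precision you need before recruiting.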

To view the full and detailed list, click here to read “10 Tips For Benchmark Usability Test” on the measuringusability.com blog.

Happy Testing

High Usability, Low Cost

A recent blog post by Jakob Nielsen describes how to achieve high UI usability through an in-depth user testing process involving three different methods:

  1. Iterative design
  2. Parallel design
  3. Competitive testing

It is a great and thorough guide to creating and testing designs for high website usability. However, the kind of “test early, test often” approach described here and advocated by usability professionals world-wide is prohibitively expensive, even for large corporate entities. For example, the Nielsen Norman Group currently charges $45,000 for most competitive testing, which is likely to be out of reach for most organisations. And that’s before you’ve even started your parallel and iterative design testing.

In this article we wanted to explain how you can run the same kind of testing described by Jakob Nielsen at a fraction of the cost by running online usability testing with Loop11.

Before we get started, it is important to know that there are benefits and drawbacks of both online and lab-based (or moderated) testing. This article is essential reading for anyone about to embark on any form of usability testing.

Iterative Testing with Loop11:

Iterative testing is about testing an initial design, making changes to it and testing it again. The process is repeated several times.

To do this with Loop11, simply create an initial website design, preferably beginning with lo-fidelity wireframes, and run your online study with Loop11. Analyse the data, make appropriate changes to the design and test them again with Loop11. This process should be repeated until you have a final design (not wireframes).

Cost: $350 per iteration.

Parallel Design Testing with Loop11:

In parallel design testing, users are asked to test several variations of the same design at the same time. The best areas of each design are then merged into a single design.

Doing this with Loop11 is easy. You’ll need to prepare at least two different designs, then run an online study in which each group of participants completes the same tasks on a different design.
In parallel design testing it is important that each participant only performs tasks on one design; otherwise there will be a learning bias.

Cost: $350 per design.

Competitive Testing:

Competitive testing involves comparing your design with your competitors’ designs. Doing this with Loop11 is extremely easy and powerful. Simply create a project and ask users to complete the same (or similar) tasks across several websites (yours and your competitors’). Since Loop11 requires no downloading or coding, you will be able to test your competitors’ live sites.

Cost: $350.

So why spend tens of thousands when you can do all of this testing for just a couple of thousand dollars?

Happy Testing!

Top 10 Usability Research Findings of 2010

With the evolution of usability testing and cutting edge tools like Loop11 creating a new wave of research techniques and methodologies, usability professionals are continuously discovering new ways of conducting research, thus finding new insights into usability testing.

So we set out to compile a list of the top ten usability research findings for 2010. Thankfully, Jeff Sauro at measuringusability.com saved us a lot of time by already putting together a list of Top 10 Research-Based Usability Findings of 2010. Some of the major findings of 2010 were based around remote and unmoderated testing as well as online surveys and self-reporting. There are a few interesting game-changing surprises on the list, especially at number one. What do you think made it to number one? Check out the top ten list here.

How To Catch A Cheater When Dealing With Large Sample Sizes.

In online user testing, the larger the sample, the higher the likelihood that some inaccurate data will be collected. The main culprits for bad data are participants who are just doing it for the money and not taking the test seriously.

How can we identify these types of cheaters, and what kind of quality control methods can be used to make sure the data is accurate?

Jeff Sauro at Measuring Usability has written a great piece on how to catch cheaters, with some fascinating insights and statistics about them. It also describes which measures can be taken to minimize the damage they cause. The full article can be found here. We highly recommend it.
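
One simple way to picture speeder detection is to flag anyone whose task time is implausibly short compared with the rest of the sample. The sketch below is only an illustration and is not the method from Jeff Sauro's article. The cut-off (a fraction of the median time), the function name and the sample times are all assumptions you would tune for your own study.

```python
from statistics import median

def flag_speeders(task_times, fraction_of_median=0.3):
    """Return participant IDs whose time is below a fraction of the median time.

    task_times: dict mapping participant ID -> seconds spent on the task.
    fraction_of_median: hypothetical cut-off, not a published standard.
    """
    cutoff = median(task_times.values()) * fraction_of_median
    return sorted(pid for pid, secs in task_times.items() if secs < cutoff)

# Made-up times for illustration only.
times = {"p1": 92, "p2": 105, "p3": 8, "p4": 77, "p5": 120, "p6": 12}
print(flag_speeders(times))
```

Flagged participants are candidates for a closer look rather than automatic exclusion, since a very fast time can also come from someone who already knows the site well.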

Usability Case Study: Wireframe Usability Testing

Think it’s too costly to integrate usability testing in the early stages of web development? Think it requires fully designed prototypes? Think again. Usability testing in the early stages of web development can be both efficient and cost-effective. With wireframes you can easily ensure that you’ve streamlined the user-experience before even completing your site.

The Media Department at a University in Sweden recently used Loop11 to run usability testing on wireframes. Two different prototypes of a tourism website were tested. Running this quick investigation, the researcher discovered how users would naturally navigate and interact with their site before the design phase commenced. Their testing yielded some interesting results.

The Test

In the early stages of the project—just after drafting the outline and organisation of a tourist website for a major city in Sweden—the team wanted feedback from users. Using two prototypes made of very low-fidelity wireframes, the team witnessed how user-experience differed across different versions of the same site.

Wireframes for Prototype 1 and Prototype 2.

These two sites were presented to 60 participants in total.

All participants completed a series of six tasks: find an events list, locate city maps, learn more about language courses, and other actions usually performed by visitors to a tourism website. The project recorded if tasks were completed successfully and how long each task took.

At first blush, with only minor tweaks on layout and information architecture, the two prototypes might not seem distinct enough to yield significant test results. But, as we know, even the smallest change can make a huge difference on overall web experience.

The Results

On the whole, Prototype 1 performed the best. Prototype 1 demonstrated a task completion rate of 58%, while only 51% of the tasks were completed successfully on Prototype 2. Looking closely at each prototype, however, there are some nuances across the board.

Four tasks on Prototype 1 benefited from higher rates of task completion. But most of those tasks actually took significantly longer to complete on Prototype 1. On the other hand, Prototype 2 only had two tasks with higher rates of task completion. And overall, there was only one task on Prototype 1 which was completed more quickly than on the second prototype. Looking for student accommodation took a full 24 seconds longer on Prototype 2.

The Interpretation

Generally speaking, the first prototype appears to be more usable. But it does take users longer on Prototype 1 to get to their destination. So instead of using Prototype 1 wholesale as the ultimate guiding draft for the final site, the researchers can take a closer look at their results and their website.

The researchers might ask why many tasks took longer on Prototype 1 even though they were completed more often there. To find out, they might take a critical look at layout, link naming, or site organisation on Prototype 1. Or, since each prototype demonstrated better completion rates on a few tasks, the team might explore ways to combine the best of both into one site.

With this newly gained insight, the team can revisit fundamental decisions on web design.

The Meaning of It All

The project demonstrates how you can gain significant insight on usability with only the barest of wireframes.

It allows teams to enhance page layout, navigation paths, and information architecture without investing large sums in fully designed prototypes. Easy and quick, a wireframe usability test helps flesh out foundational details before designers and developers commit to creating a polished, finalised site. Wireframe usability tests, despite their simplicity, can help teams seriously question, rethink and fine-tune a site’s overall experience.

When two great minds get together!

Wireframe design and testing just got a whole lot easier…

Testing the usability of wireframes has always been a great Loop11 feature. However, Loop11 is not a wireframe design tool. Justinmind, on the other hand, is a great wireframe design tool, but it didn’t offer comprehensive wireframe usability testing. So we put two and two together and integrated Justinmind with Loop11.

Now you can create your own wireframes with Justinmind and test the usability of them with Loop11. It’s as simple as a click of the mouse!

How does it work?

Justinmind is a wireframe creation and design tool. It also allows you to upload your wireframes into HTML and test them.

This is where Loop11 comes in.

Once you have created your wireframes with the Justinmind wireframe creation tool, simply use the Justinmind Usernote feature. Usernote is where you can upload your wireframes into HTML so that they can be tested.

This is where you will find the Loop11 integration… and we all know how simple Loop11 is to use.

So if you’re looking for a simple way to design and test wireframes, look no further.

Happy testing!

Airline Website Usability: British Airways Soars Ahead!

We thought we would have a look at how user friendly 10 of the world’s leading airline websites are. On a recent overseas trip, I was astonished to see how many people continue to take dangerous or banned items, such as scissors and cigarette lighters, through the check-in gates at airports. Since security has become radically tougher in recent years, we thought we’d explore how easy (or difficult!) it is to find information about the items you’re not supposed to have in your luggage… So we tested the usability of the following ten airline websites:

The following task was asked of 1,000 participants (100 per website):

“You are taking an overseas holiday next month. Before you go you want to check whether certain items are considered by the airline to be dangerous or banned.  Using the website how can you do this?”

Our participants were sourced from a number of resources, including our Twitter and Facebook accounts, but the vast majority came from Mechanical Turk where we paid the nominal sum of $30 for the bulk of the participants.  Thanks to all those who got involved.

Task Completion Rates:

In general, each website had one page dedicated to banned or restricted items, such as these pages from American Airlines and British Airways.

American Airlines Restricted Items Page.

British Airways Banned Items Page

If the participants found the appropriate page they were deemed to have completed the task successfully, otherwise they were considered to have failed it, or they abandoned the task if it all became too hard.

The results indicate that finding information on dangerous and banned items is rather difficult.  This perhaps provides some clues as to why so many people on my recent trip were still packing them in their luggage.

Chart showing the task completion rates

The British Airways website was the standout performer with 71% of participants completing the task successfully.  Of most concern were Virgin Atlantic and Malaysia Airlines where less than half of participants were able to locate the information.  In the case of Malaysia Airlines, just 31% of participants were able to complete the task.
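
With 100 participants per website, completion rates like these carry some sampling uncertainty. As a minimal sketch, the snippet below puts an approximate 95% confidence interval around a completion rate using the Wilson score interval. The choice of interval and the function name are assumptions made for illustration and are not part of the original analysis.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (Wilson score method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# 71% and 31% of 100 participants, as reported for these two websites.
for site, completed in [("British Airways", 71), ("Malaysia Airlines", 31)]:
    low, high = wilson_interval(completed, 100)
    print(f"{site}: {low:.0%} to {high:.0%}")
```

On these figures the two intervals do not overlap, which suggests the gap between the best and worst performer is unlikely to be down to chance alone.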

Additionally, a total of 39% of participants abandoned the task on the Malaysia Airlines website even though more than half went directly from the home page to the Baggage Information landing page where they should have easily found the information.  It would seem the call to action to “Download now” is not sufficient to indicate the PDF document on the Baggage Information landing page is the place to go for this information.

Malaysia Airlines Webpage.

Average Time to Complete Task:

The average time taken to complete the task on each of the ten websites again shows that the British Airways website was the standout performer, with participants completing the task in an average time of 87 seconds.  Malaysia Airlines and Virgin Atlantic once again performed poorly, with the average time for Virgin Atlantic (199 seconds) being more than twice the time taken for those using the British Airways website.

Chart showing the average time to complete the task.

The study also revealed that only Virgin Atlantic and Lufthansa did not have fly-out menus in their main navigation.  Fly-out menus, such as those shown on the American Airlines website below, often result in faster navigation since users can see at least the second-level navigation links without having to click.

American Airlines Webpage With Fly-out Menu.

A deeper look at the path analysis for Virgin Atlantic shows that about a quarter (24%) of participants went first to the correct section of the website, the Passenger Information landing page.  This is substantially lower than for British Airways and even Malaysia Airlines, where more than half navigated to the correct section first.

Ease of Use Rating:

One of the follow-up questions for participants after completing the task was to rate on a 5-point scale how easy it was to use the website.  There was much less variation in these results, which we don’t find surprising.  In face-to-face, lab-based user testing we frequently encounter participants who have a terrible time navigating a website but still comment on how easy the website was to use!  We always felt this was the moderator effect, but perhaps this extends to unmoderated user testing too!

Chart showing the ease of use rating

Overall Usability Score:

To directly compare the usability of one website to another we decided to follow the ISO definition of usability.  ISO 9241-11 defines usability as the “Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”  This gives us three areas to focus on: effectiveness, efficiency and satisfaction.

Combining the scores for the task completion rate (effectiveness), the average time taken to complete the task (efficiency) and the ease of use rating (satisfaction) we can establish an overall score for each of the ten airline websites, which are shown below.
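
The exact way the three scores were combined is not spelled out here, so the following is only one plausible sketch: rescale each measure to a 0-1 range and take an equal-weighted average. The weighting, the rescaling of time against the slowest site, the function name and the sample numbers are all hypothetical.

```python
def overall_usability_score(completion_rate, avg_seconds, ease_rating,
                            slowest_seconds, rating_scale_max=5):
    """Equal-weighted average of effectiveness, efficiency and satisfaction (0-1).

    All rescaling choices here are illustrative assumptions.
    """
    effectiveness = completion_rate                  # already a 0-1 proportion
    efficiency = 1 - avg_seconds / slowest_seconds   # faster sites score higher
    satisfaction = (ease_rating - 1) / (rating_scale_max - 1)  # 1-5 scale -> 0-1
    return (effectiveness + efficiency + satisfaction) / 3

# Hypothetical sites, not results from the study.
sites = {
    "Site A": {"completion_rate": 0.70, "avg_seconds": 90, "ease_rating": 4.0},
    "Site B": {"completion_rate": 0.40, "avg_seconds": 180, "ease_rating": 3.5},
}
slowest = max(s["avg_seconds"] for s in sites.values())
for name, s in sites.items():
    score = overall_usability_score(slowest_seconds=slowest, **s)
    print(f"{name}: {score:.2f}")
```

Scoring time relative to the slowest site means the slowest site always gets zero for efficiency. A different baseline, such as a target time, would shift the scores without changing the ranking on that measure.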

Chart showing the overall usability score.

Not surprisingly, British Airways was head and shoulders above the rest while Malaysia Airlines and Virgin Atlantic were well behind.  There was little difference between the remaining seven websites.  But clearly there’s a lot more work to be done by airline websites to help people avoid packing those banned and dangerous items.

Tested with Loop11

Usability Case Study: iPad vs PC

Wondering how your website performs for users on their latest gadget? Try split-testing your site on participants who use different web-worthy platforms to find out. Testing can help you uncover how user-experience on a desktop or laptop might differ from user-experience on a smartphone or iPad.

To demonstrate this, here’s a little test we ran on 100 participants.

Some participants completed our test using a PC and others with an iPad. Our results help us see how well the site facilitated their online experience and just how doable these tasks are on different platforms.

We didn’t have any pre-conceived ideas as to how browsing and completing tasks might differ on the two platforms. We did, however, think it’d be really interesting to see the results, so that in turn we could demonstrate a great way to test, analyse and understand user behaviour. All this, so that you can tune, tweak, and better the online experience you offer.

Our Study

We ran identical unmoderated online studies on the Apple website for participants using either an iPad or a PC. In each study there were three tasks and a few follow-up questions.

Loop11 also allows us to validate whether iPad participants were actually using iPads. Were there any who weren’t, you ask? Well, you might be surprised to hear that a handful of PC participants signed up for the iPad test! Yes, you know who you are.

Don’t worry – you won’t be finding tarnished results in this case study. We threw out the invalid tests.

You can run through the test yourself, here, before reviewing the results. But, of course, our study has since been closed, so your results won’t be counted.

In total, 100 participants completed the test, 50 in each group. Here are the results.

Task 1: Free Surf

“You are thinking about purchasing an iPad. You arrive on the Apple website and want to learn about what the iPad can do and whether it’s a product you might buy. You’re not interested in watching videos, but freely surf the Apple website.”

The first task simply required participants to freely surf the Apple website. Participants were allowed to check out what they could about the iPad. When they felt they’d found sufficient information, they simply had to let us know by marking the task as complete.

This kind of task helps us understand how a browsing experience might differ on these two devices.
The results are rather interesting: iPad users spent longer than PC users freely surfing the Apple website. On average, iPad participants clocked in at 98 seconds, with PC participants coming in ahead at 86 seconds.
Remember, iPad participants already own the iPad. The longer time spent on browsing for iPad users does hint at fairly significant browsing differences on the two devices.

Task 2: Shop!

“Fast forward 4 weeks and you’ve gone and bought an iPad! But you now realise you’re going to need a case to protect it. Find a case and put it in your shopping cart. You’re not going to buy it right now, though. Hit ‘Task Complete’ when it’s in your shopping cart.”

The second task required participants to achieve a defined goal: Find a protective case for an iPad and add it to the shopping cart. This task helps us understand the usability and presentation of functions on a website as it’s translated onto the different devices.

Task completion results for iPad and PC users were identical, with both groups boasting a 100% completion rate. A success! iPad participants, however, took 53% longer than PC participants to complete the task. At an average of 135 seconds, iPad users were outpaced by PC participants, who hit the mark at 89 seconds.
What’s more, PC users even had to visit one extra page on average – 5 pages on the PC versus 4 on the iPad – to complete the task.

Task 3: iPad battery life

“For future reference, you want to know how long the battery will last before you need to re-charge it. Where would you find this information?”

The final task also presented a clearly defined goal: Find details on the iPad’s battery life. We knew which pages held the information (there were at least two), and we wanted to see how users would go about finding it and how long it would take them.

Given that this task was a little more involved than previous tasks, completion rates are consequently lower. But we admit, the task completion rates are still impressive by any standards.

iPad users had greater success, with 92% completing the task correctly, while PC participants dropped down to a 90% completion rate. Once again, iPad users took significantly longer, averaging 56 seconds while PC users averaged 38 seconds. That’s 47% longer for the iPad users despite the same average number of page views.
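
For anyone wanting to reproduce the "percent longer" figures, the arithmetic is simply the time difference divided by the faster time. The short sketch below applies it to the average times quoted in this case study. The function name is made up for illustration. Note that the rounded averages give roughly 52% for Task 2, so the reported 53% was presumably worked out from unrounded times.

```python
def percent_longer(slower_seconds, faster_seconds):
    """How much longer the slower time is, as a percentage of the faster time."""
    return (slower_seconds - faster_seconds) / faster_seconds * 100

# Average task times (iPad, PC) in seconds, as quoted in this case study.
tasks = {"Task 1": (98, 86), "Task 2": (135, 89), "Task 3": (56, 38)}
for task, (ipad, pc) in tasks.items():
    print(f"{task}: iPad took {percent_longer(ipad, pc):.0f}% longer")
```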


Adapting websites to fit the needs of users who come from different machines can be a head-walloping task.

Our software aims to help you realise small steps towards great improvements.

To improve it, though, you must first document it. And Loop11 documents user-experience so that it can be improved.

Simplified designs, variations in page layout, consistencies of functionality across all elements, or adaptations on page flow and size—there are tons of ways web professionals work to iron out user-experience on all devices.

But don’t just guess how user-experience might differ. When transitioning your website for a gadget-friendly version, find out where and how users are stumbling so that you can understand why. In this process, usability testing is often the missing lynchpin.
