
Writing Resilient End-to-End Tests

Jake Marsh
April 20, 2020

What is end-to-end testing?

End-to-end testing is the process of testing your application by replicating the full flows and behaviors expected of an end user. For example, if you wanted to test that an application's signup flow was working, you would use code to automatically navigate and complete the signup flow in the browser, just like a user.

By testing the full functionality of your application in this manner, you're ensuring everything is working as expected, from the client to the API to the database. It eliminates the seams that occur when unit testing each part of your application in isolation.

If it's possible to gain more confidence with end-to-end tests, why don't people do more of it? Because it's hard. One of the biggest issues is resiliency: what happens if you change the text of a button that your test was looking for? What happens if you hit a network spike and something takes longer than expected? Without handling cases like these, your end-to-end tests are likely to start failing.

Why is resiliency important?

In a perfect world, end-to-end tests should be run any time a change is made to your application. This is because the smallest (perceived) change is always capable of causing unforeseen issues throughout the rest of your application. To achieve this, teams will typically execute their end-to-end tests within their CI/CD pipeline.

If your end-to-end tests are running inside your CI/CD pipeline, that means they're directly intertwined with your deployment process. Any delays or failures will impede (or prevent) the deployment from completing. Change the button text in your signup flow from "Next" to "Continue"? If your tests weren't updated for that change, you're about to hit a deployment failure and lose up to 30 minutes of precious time.

What is a flake, and what causes one?

A "flake" is a (usually) non-deterministic failure caused by something that is either intermittent or unexpected. Due to the nature of end-to-end tests and the many layers involved, a flake can be caused by any number of things. Some of the more common examples are:

  • Asynchronous delays. If submitting a form on your page suddenly takes 5 seconds instead of 2, the page will not be behaving as your tests may expect. Spinners may stick around, buttons may not re-enable.
  • Identifier changes. As we've mentioned, changing something like the text of a button could cause failures. That's also true of non-user-facing identifiers such as HTML classes or IDs, which can change at any time due to unrelated engineering updates.
  • Actual application/UX changes. In the case that your team has actually carried out a large redesign or refactor of your application, your existing tests are very likely to start failing.

Flakes and How to Avoid Them

Asynchronous Delays

Asynchronous delays, often resulting from network requests, are the largest and broadest category of test flakes. This is because asynchronous requests are happening quite often in an average web application, affecting many other parts of the app.

Submitting the login form? Waiting for a response from the API. Trying to view a list of data? Waiting for data to be fetched. Just clicked a button? Waiting for the request to finish before updating the UI. You may even have to wait on something like an email being sent and arriving in your test inbox.

These are all examples of things that most end-to-end tests will be doing often: navigating, filling out forms, clicking elements, waiting for things to happen. And they're all unpredictable!

What can you do?

There are a few things you can do in your end-to-end tests to better handle any form of asynchronous delay. None of these are foolproof, but they should get you most of the way to resiliency around these issues.

1. Explicitly wait on network requests when possible. Many testing libraries provide APIs to hook into the loading status of the browser page. This is useful when you're submitting a form that you know will result in a redirect, or when you're clicking a link. Puppeteer, for example, provides page.waitForNavigation(). Clicking a button to submit a form, then, might look something like this:

await Promise.all([
   page.waitForNavigation(),
   page.click('button[type="submit"]'),
]);
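
Not every request ends in a navigation. When a click triggers a background request instead, the same pattern should work with Puppeteer's page.waitForResponse(). The sketch below is one way to wrap it; clickAndWaitForResponse and the '/api/signup' path in the usage comment are hypothetical names used for illustration, not part of Puppeteer.

```javascript
// Hypothetical helper: click `selector` and wait for a response whose URL
// contains `urlFragment`. `page` is assumed to be a Puppeteer Page.
async function clickAndWaitForResponse(page, selector, urlFragment) {
  const [response] = await Promise.all([
    // Start waiting before the click so a fast response is not missed.
    page.waitForResponse((res) => res.url().includes(urlFragment)),
    page.click(selector),
  ]);
  return response;
}

// Usage (the URL fragment is a placeholder for your own endpoint):
// const response = await clickAndWaitForResponse(
//   page, 'button[type="submit"]', '/api/signup');
```

Returning the response lets the test assert on it (for example, its status) before moving on to the UI.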

2. Wait often to allow for arbitrary load times. Even when using a helper like waitForNavigation, there may be additional delay involved post-request. For example, the application may take a second or two to enable the button you'll be clicking next. In instances like these, it can be helpful to have a simple wait helper to wait a small amount of hardcoded time before proceeding:

// Helper
function wait(time) {
  return new Promise((resolve) => setTimeout(resolve, time));
}

// Usage
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]'),
]);

await wait(2000);
await page.click('.some-later-rendered-element');

3. Use polling. Many testing libraries provide a method that "polls" for an element on the screen (and if yours doesn't, it's easy to write your own). This makes your test more resilient to those millisecond differences when finding an element to operate on.

// Bad
await page.click('button');
    
// Good
await (await page.waitForSelector('button')).click();

If you're testing some process that can be particularly long and/or unpredictable such as querying a large set of data for a new entry, or receiving an email, it can be helpful to carry out your logic and assertions in a custom polling method. This means you're checking on some defined interval for the assertion (i.e. "my data entry did appear"), with a hardcoded timeout that determines a "failure".

await pollUntilTimeout(async () => {
   return !!(await page.$('.some-new-element'));
}, {
   intervalMs: 1000,
   timeoutMs: 30000,
   timeoutMessage: 'New element not found.',
});
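
pollUntilTimeout is not a Puppeteer API; it's a helper you would write yourself. Here is a minimal sketch, assuming the option names used in the call above. The default values are assumptions, so adjust them to your own needs.

```javascript
// Hypothetical helper: resolves once `check` returns a truthy value and
// rejects with `timeoutMessage` if that has not happened within `timeoutMs`.
async function pollUntilTimeout(
  check,
  { intervalMs = 1000, timeoutMs = 30000, timeoutMessage = 'Polling timed out.' } = {}
) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    // Check first so an already-satisfied condition returns without waiting.
    if (await check()) {
      return;
    }
    // Give up once the deadline has passed. The last attempt may finish up to
    // one interval (plus the duration of `check`) after `timeoutMs`.
    if (Date.now() >= deadline) {
      throw new Error(timeoutMessage);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Throwing an Error on timeout means the failure surfaces in your test runner with the message you chose, rather than as a generic timeout.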

Identifier Changes

Most identifiers or attributes used in testing exist for other purposes. Classes, for example, are primarily used for styling. Element tags define the DOM structure and are specific to each element's purpose. Any time one of these identifiers is changed for its real purpose (updating a button's styling, changing a button to a link), your tests can break.

One option is to define and enforce strict rules around naming behaviors and update processes for element identifiers. For example, you could add a new class to every element you'll be using in your test, prefixed with test-, and ensure those are always kept in sync. However, this can be a painful process to remember and enforce.

Ideally, you can write your tests to be better prepared for changes to identifiers.

What can you do?

1. Avoid relying on "internals". Things like class and id should generally be avoided and assumed to be unstable, as they may change at any time due to internal engineering efforts. Of course there may still be scenarios in which you have no other option, but be prepared for changes to affect your tests.

// Bad
await page.$('input.form--input.some-styling.text');
    
// Good
await page.$("input[type='text']");

2. Use selectors based on what the end user actually sees. Since the goal of your end-to-end test is to ensure that an end user is still able to complete some flow, it makes sense to write it the same way a user would look for something. A user of your application isn't going to be looking for the selector button.btn.btn--green when they want to create a new post; they'll look for a button that contains the words "Create Post".

// Bad
await page.click('button.btn.btn--green');
    
// Good
await (await page.waitForXPath("//button[contains(., 'Create Post')]")).click();

3. Be (reasonably) lenient with your matchers. In the example above, we're searching for a button with the text "Create Post". But what if a change is made and it now says "Create New Post"? Barely anything has changed, but your test is going to break. If there are no other conflicts on the page, it can be helpful to loosen your matcher. What if we just search for "Create"?

// Bad
await (await page.waitForXPath("//button[contains(., 'Create Post')]")).click();

// Good
await (await page.waitForXPath("//button[contains(., 'Create')]")).click();
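
If you end up writing these XPath expressions by hand in many tests, a small helper can keep them consistent. The sketch below is hypothetical: buttonContaining and xpathLiteral are illustrative names, not Puppeteer APIs. XPath 1.0 string literals have no escape character, so text containing both quote types has to be assembled with concat().

```javascript
// Hypothetical helper: quote `value` as an XPath 1.0 string literal.
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Both quote types present: split on single quotes and rejoin the pieces
  // with a double-quoted single quote inside concat().
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Hypothetical helper: XPath for a <button> whose text contains `text`.
function buttonContaining(text) {
  return `//button[contains(., ${xpathLiteral(text)})]`;
}

// Usage:
// await (await page.waitForXPath(buttonContaining('Create'))).click();
```

Centralizing the expression also gives you one place to loosen or tighten the matching rule later.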

Real Application/UX Changes

The last major cause of end-to-end test failures is a larger version of what we've already discussed: intentional changes to your application and/or UX. If your signup flow, for example, is completely redesigned from the ground up, your test logic and matchers will likely no longer be correct.

Although this class of change is the least common, it is by far the most expensive in terms of the time needed to update the affected end-to-end tests. There's not much you can do in your tests to prepare for large, sweeping changes, because matchers specific enough to avoid collisions on today's page cannot also anticipate an entirely different page.

What can you do?

walrus.ai is a new solution for end-to-end testing with ease. All you provide is a user story written in plain English instructions, and the test is then interpreted and executed by a human and automated for all future runs. Any changes or flakes are handled, and so your tests will only fail when there's a true failure.
