Toward OpenTelemetry Compatible Synthetic Tests
or, what an Open Source E2E synthetic browser test suite might look like
Prices are important not because money is considered paramount but because prices are a fast and effective conveyor of information through a vast society in which fragmented knowledge must be coordinated. -Thomas Sowell
Background Motivation
(tl;dr skip to the next section to see the demo and some code walkthrough)
An Observability signal I’ve wanted to explore more is “Synthetic Tests”. I’m not sure who or what company coined the term, but I first learned about it in 2019 after reading Tim Nolet (of ChecklyHQ)’s blogs about his SaaS startup, which offered these kinds of “Synthetic Browser Tests”, and also when Datadog acquired French Station F startup Madumbo that same year (I was working at Datadog at the time, #frenchtech).
So, what is a “Synthetic” test? Basically, it’s a recording of an actual user flow on a website, like filling out a signup form or adding an item to a cart. From that recording, a script is generated that simulates the user’s actions, which can then be run at a specified interval in a headless browser framework (like Playwright, Puppeteer, or Selenium). Assertions can be included in the script as well: “After I press *Signup*, I expect to see a big *Welcome!* pop-up in bright green text”. Each test run can measure whether those assertions pass or fail, how long each step takes, and emit various alerts. Some Synthetic test products offer premium features like actual screenshots, video recordings, and collection of browser performance statistics for each test run. While many of these concepts are not exactly new (Puppeteer and Selenium have been around a long time), packaging them into a cohesive Vendor product is cutting edge.
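To make that concrete, here’s a minimal sketch of what such a generated script could look like as a Playwright Test (the site, selectors, and expected text below are made up purely for illustration):
// example.spec.js - hypothetical recorded flow, for illustration only
const { test, expect } = require('@playwright/test');
test('signup flow', async ({ page }) => {
  await page.goto('https://example.com/signup');
  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'hunter2');
  await page.click('text=Signup');
  // Assertion: the welcome banner should appear after completing the flow
  await expect(page.locator('.welcome-banner')).toHaveText('Welcome!');
});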
This is all pretty fascinating to me. A lot of current standard Observability signals have a pretty low “Signal to Noise” ratio. A single user session on a website may generate dozens if not hundreds of traces, logs, and metrics. But being able to sift through all these individual data points and determine if “/checkout is broken” can require advanced querying tools, dashboards, and a deep intuitive understanding of how an application’s various components fit together. After all, when an alert triggers (“Alert, >1% 5xx Errors on /api/checkout in the last 5 minutes!”), the first thing a developer or SRE usually does is visit the website via browser and try to recreate the issue.
What’s equally interesting is just how expensive most Synthetic Browser Tests are. Datadog’s pricing page lists the sticker price at $12 per 1,000 Tests per month. A quick back of the envelope (12 runs an hour × 24 hours × 30 days ≈ 8,640 runs, at $12 per 1,000) means it costs just over $100 per month to run 1 Browser Test once every 5 minutes. It’s easy to see how that bill can quickly skyrocket, seemingly making Synthetic Browser Tests either a luxury reserved for only the largest enterprises with the deepest pockets, or impractical for the average user at anything besides functionally useless time intervals (I’m not sure how useful *any* observability signal is if it only gets collected once or twice an hour).
Despite this price tag, Synthetic Browser Tests seem to be exceedingly popular. Not only does Datadog continue to highlight them in their earnings calls, but most other major observability vendors have begun to offer them as well, like AWS (who also went so far as to unceremoniously fork parts of Checkly’s codebase), Splunk, Dynatrace, and New Relic.
Given this combination of popularity, usefulness, and cost, what stuck out to me is just how fragmented the ecosystem is, without any real sense of standards, despite the very large majority of these tools being partially or fully open source.
A smattering of headless browser frameworks, like Playwright (maintained by Microsoft), Puppeteer (maintained by Google, though the original authors have since moved on to Playwright iirc), and Selenium (a long-running community project, not tied to any single vendor).
A smattering of Test Runners for headless browsers, like Cypress, Jest-Puppeteer, and Playwright Test (now part of Playwright itself).
A smattering of Test Recorders, like Checkly’s Headless-Recorder, the Puppeteer Recorder (now baked directly into Chromium DevTools!), the Playwright Codegen CLI, and new projects like DeploySentinel Recorder.
The running theme with all these components is that they’re either overly coupled to a single Vendor product, or completely uncoupled from any Vendor product, exporting in arbitrary formats and not packaged in ways that are easy for the average user to try (for example, the Playwright Codegen CLI isn’t available as a browser extension). The latest craze for customers of Observability vendors seems to be avoiding the dreaded vendor lock-in (see: my last post on OpenTelemetry), so it’s strange to me that one of the “hottest” vendor products doesn’t appear to be stabilizing around any standard or promoting vendor interoperability.
And that brings us to the code.
Demo and Code Walkthrough
With everything above in mind, I tried to build my own Synthetic Browser E2E testing tool that could export into existing Open Source Observability signal standards, like OpenTelemetry Traces. I was able to get a brief demo up and running, which I’ll walk through below. After that, I’ll try to summarize where the rough spots are, and what’s still missing from Open Source before a unified standard for Synthetic Monitoring can emerge.
Here’s the demo (and a brief a11y friendly description of what’s happening at each step). Below the demo I’ll try to walk through the bits of code for each step.
Summary
A browser extension records a User journey through a website feature.
A basic assertion is made at the end of the user flow (expect specific text to appear on the page after completing the user flow).
The recording is exported to my Synthetic Testing web application.
Basic details about the recording are added and saved, which kicks off a test run.
The test run executes the script via test runner in a headless browser framework, generates artifacts, and measures the result of the test.
After the test completes, the results and artifacts (video recording, screenshot) are viewable in the Synthetic Test application, with options to export the data in an OpenTelemetry compatible signal (a Trace).
The Synthetic Test run is exported as a trace and viewed through an Open Source trace viewer.
Details and Code
A browser extension records a User journey through a website feature. A basic assertion is made at the end of the user flow.
I chose to fork the Checkly Headless-Recorder for this Demo. It has automatic support for Playwright, some useful features like easy input capture for form fields, and an existing feature request/WIP branch for injecting assertions that I built off of. Additionally, it has a useful feature that takes the outputted Playwright script, base64-encodes it, and attaches it as a query string when opening a new tab. All I had to do was make some small changes to format the Playwright code snippet into a template that fits the Playwright Test format, and update the `RUN_URL` to point to my local Synthetic Test app instead of the Checkly homepage.
// https://github.com/checkly/headless-recorder/blob/c78ef6cbc1fb9c4d6b4bdb9b7844e4af307ef7fa/src/services/browser.js#L58-L67
const script = encodeURIComponent(btoa(code))
const url = `${RUN_URL}?framework=${runner}&script=${script}`
chrome.tabs.create({ url })
It’d be nice to find the time for an upstream PR to allow some of this to be modifiable by Env Var or Configuration Option but, in the meantime, feel free to cherry-pick anything useful off a diff of my branch against main.
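For context, the “fit it into a Playwright Test template” change mentioned above mostly boils down to string templating; here’s a rough sketch of the idea (the helper name and template text are mine, not the exact code from the branch):
// Hypothetical sketch: wrap the recorded Playwright snippet in a Playwright Test template
function wrapInPlaywrightTest(recordedBody) {
  return [
    "const { test, expect } = require('@playwright/test');",
    '',
    "test('recorded user flow', async ({ page }) => {",
    recordedBody,
    '});',
  ].join('\n');
}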
The recording is exported to my Synthetic Testing web application. Basic details about the recording are added and saved, which kicks off a test run.
Nothing fancy here, just a Rails App with a few Controllers, Models, Views, and Jobs. Rails Gems like ActiveStorage with OOTB GCS support (for saving the generated Playwright Test script to my cloud storage service) and ActionText (for displaying the generated Playwright Test script) make a lot of this easier than it has any right to be. After the test is saved, an after-save hook enqueues a Job, which collects the relevant information about the test run and the GCS bucket where the script was saved, and publishes it to a specific GCP PubSub Topic.
# controllers/test_controller.rb
# some pseudo code below
def create
  encoded_script = params["script_string"]
  if !encoded_script.nil?
    script = Base64.decode64(encoded_script)
    file = StringIO.new(script)
    to_upload = test_params_no_script
    @test = Test.new(to_upload)
    @test.script.attach(io: file,
                        filename: 'example_test_string.test.js')
    if @test.save
      redirect_to @test
    else
      render :new, status: :unprocessable_entity
    end
  else
    render :new, status: :unprocessable_entity
  end
end
# models/test.rb
# some pseudo code below
class Test < ApplicationRecord
  after_save_commit :enqueue_test_run
  has_one_attached :script

  def enqueue_test_run
    # only enqueue a run once a script blob has actually been attached
    if !previous_changes.empty? && !script.key.nil?
      gcs_bucket = Rails.application.config.active_storage.service_configurations[Rails.application.config.active_storage.service.to_s]["bucket"]
      gcs_key = script.key
      EnqueueTestJob.perform_later(name: name, description: description, gcs_bucket: gcs_bucket, gcs_key: gcs_key)
    end
  end
end
# jobs/enqueue_test_job.rb
# some pseudo code below
require "google/cloud/pubsub"

class EnqueueTestJob < ApplicationJob
  queue_as :default

  def perform(name:, description:, gcs_bucket:, gcs_key:)
    message_attributes = { "name" => name, "description" => description, "gcs_bucket" => gcs_bucket, "gcs_key" => gcs_key }
    creds = Google::Cloud::PubSub::Credentials.new(Rails.root.join("<PATH_TO_CONFIG_FILE>"))
    topic_id = "<YOUR_TOPIC_ID>"
    pubsub = Google::Cloud::PubSub.new(project_id: "<YOUR_PROJECT_ID>", credentials: creds)
    topic = pubsub.topic(topic_id)
    # publish the metadata as message attributes, which is where the Cloud Run handler reads it from
    topic.publish("test run requested", message_attributes)
  end
end
At the moment this is just running locally but with a bit of cleanup it’d be straightforward to publish to Google App Engine (if some VC reading this wants to throw heaps of money at me to productize as an observability startup, feel free to slide into my DMs @ericmustin).
The test run executes the script via test runner in a headless browser framework, generates artifacts, and measures the result of the test.
The GCP PubSub Topic that I’ve published to has a subscriber via Google Cloud Run, which is a small NodeJS application. The Node app receives the message, downloads the script that was stored on GCS, saves it to the appropriate directory, and shells out to the test runner command with a pre-canned configuration file. I’ve chosen to rely on Playwright, its Test Runner Playwright Test, and one of its standard JSON reporters. Playwright generally seems like the most mature headless browser framework and has official support for a Test Runner, along with support for multiple browser types and lots of nice add-ons. It also generates a Video and Screenshot of each test run and, along with the results of the test run, saves them back to GCS under the appropriate bucket. This is all pretty brittle and simplistic at the moment, but for demonstration purposes it works just fine.
// index.js
const express = require('express')
const bodyParser = require('body-parser');
const shell = require('shelljs');
const fs = require('fs');
const {promisify} = require('util');
const path = require('path');
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const app = express();
app.use(bodyParser.json())
app.use(bodyParser.urlencoded({extended: false}))
app.post('/exec', async (req, res) => {
  try {
    // Pub/Sub push messages arrive as JSON; the test run metadata is carried in the message attributes
    const messageBodyAttributes = req.body.message.attributes;
    const file = storage.bucket(messageBodyAttributes.gcs_bucket).file(messageBodyAttributes.gcs_key);
    const tempLocalPath = `./tests/${path.parse(file.name).base}.spec.js`;
    // Clear tests folder from any previous runs
    try {
      const filesToDelete = await fs.promises.readdir('./tests');
      for (const fileToDelete of filesToDelete) {
        try {
          await fs.promises.unlink(`./tests/${fileToDelete}`);
        } catch (err) {
          console.log(`Unable to delete file tests/${fileToDelete}: ${err}`);
        }
      }
    } catch (err) {
      throw new Error(`File deletion failed: ${err}`);
    }
    // Download file from bucket.
    try {
      await file.download({destination: tempLocalPath});
      console.log(`Downloaded ${file.name} to ${tempLocalPath}.`);
    } catch (err) {
      throw new Error(`File download failed: ${err}`);
    }
    // Run playwright tests
    const command = "npx playwright test --config=docker.config.js --project=DesktopChromium";
    const shellResp = shell.exec(command, {
      "timeout": 20 * 1000
    });
    if (shellResp.stderr !== undefined && shellResp.stderr !== "") {
      console.log("stderr");
      console.log(shellResp.stderr);
    }
    // just reusing the same bucket for now
    const uploadBucket = storage.bucket(messageBodyAttributes.gcs_bucket);
    // Upload the json results; bail out early if the reporter never produced them
    try {
      await fs.promises.access(`./test-results.json`);
      const jsonResults = `testresults/scratch/${messageBodyAttributes.gcs_key}/test-results.json`;
      await uploadBucket.upload(`./test-results.json`, {destination: jsonResults});
    } catch (err) {
      console.log(`Unable to upload test-results.json: ${err}`);
      res.status(200).send();
      return;
    }
    // Upload screenshots and videos generated under the output directory
    const filesDirs = await fs.promises.readdir('./test-results');
    for (const fileDir of filesDirs) {
      const filesInDir = await fs.promises.readdir(`./test-results/${fileDir}`);
      for (const fileInDir of filesInDir) {
        try {
          const filepath = `testresults/scratch/${messageBodyAttributes.gcs_key}/screenshots/${fileDir}/${fileInDir}`;
          await uploadBucket.upload(`./test-results/${fileDir}/${fileInDir}`, {destination: filepath});
          console.log(`Uploaded to: ${filepath}`);
        } catch (err) {
          console.log(`Unable to upload image ${fileDir}/${fileInDir}: ${err}`);
        }
      }
    }
    // Acknowledge the message once all uploads have been attempted
    res.status(200).send();
  } catch (err) {
    console.error(`error in cloud run handler: ${err}`);
    res.status(200).send();
  }
});
const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log(`helloworld: listening on port ${port}`);
});
// playwright.config.js
module.exports = {
  timeout: 35000,
  retries: 0,
  reporter: [['json', { outputFile: 'test-results.json' }]],
  workers: 5,
  use: {
    trace: 'on-first-retry',
  },
  projects: [
    {
      // Desktop Chromium
      name: 'DesktopChromium',
      use: {
        browserName: 'chromium',
        headless: true,
        channel: 'chrome',
        screenshot: 'on',
        video: 'on',
        trace: 'off'
      },
    },
  ],
  outputDir: 'test-results/',
};
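For reference, the json reporter configured above produces a nested structure roughly along these lines (heavily abridged, with illustrative values), which is what the Rails export code later walks:
// test-results.json (abridged; approximate shape and example values)
{
  "suites": [
    {
      "specs": [
        {
          "tests": [
            {
              "projectName": "DesktopChromium",
              "results": [
                { "status": "passed", "duration": 4123 }
              ]
            }
          ]
        }
      ]
    }
  ]
}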
After the test completes, the results and artifacts (video recording, screenshot) are viewable in the Synthetic Test application, with options to export the data in an OpenTelemetry compatible signal (a Trace). The Synthetic Test run is exported as a trace viewed through an Open source trace viewer.
Back in my Synthetic Test Rails application, the app checks whether the results have been saved at the appropriate location in GCS; if so, it displays them for the user and links to an option to export the run as an OpenTelemetry trace. At the moment the trace isn’t particularly interesting: it’s just a single span with some attributes and events attached (those events include metadata like the script that was run, and potentially URLs of the artifacts, like the screenshots and video recording). The trace gets exported to an OpenTelemetry Collector, and then to Grafana Tempo, where it’s viewable via the Grafana Trace Integration UI. I’m lucky to work with some really bright folks who have made this OpenTelemetry Collector / Tempo / Grafana Trace UI setup very seamless, so all I have to do is configure my OpenTelemetry Tracing SDK to point to the right endpoint.
def export_as_otel_trace
  @test = Test.find(params[:id])
  key = @test.script.key
  hex_trace_id = nil
  if !key.nil?
    # storage is a Google::Cloud::Storage client
    bucket = storage.bucket "testrunstorage"
    file = bucket.file "testresults/scratch/#{key}/test-results.json"
    if file&.exists?
      trace_start_time = Time.parse(file.updated_at.to_s)
      results = JSON.parse(file.download.read)
      results["suites"][0]["specs"].each do |spec|
        spec["tests"].each do |test_run|
          span = Otel::Tracer.start_span(@test.name,
                                         attributes: {
                                           "browser" => test_run["projectName"],
                                           "playwright.test.name" => @test.name,
                                           "playwright.test.description" => @test.description
                                         },
                                         start_timestamp: trace_start_time,
                                         kind: :internal)
          # TODO: infer error status, exception msg and events like attachments
          passed_result = test_run["results"].find do |result|
            result["status"] == "passed"
          end
          if !passed_result.nil?
            duration_ms = passed_result["duration"].to_i
            hex_trace_id = span.context.trace_id&.unpack1('H*')
            end_timestamp = trace_start_time + (duration_ms / 1000.0)
            script_string = @test.script.blob.download
            if !script_string.nil?
              span.add_event("playwright_test", attributes: { 'script' => script_string })
            end
            span.finish(end_timestamp: end_timestamp)
          else
            # no passing result; still finish the span so it isn't left open
            span.finish
          end
        end
      end
    end
  end
  if !hex_trace_id.nil?
    url = "https://<TRACE_VIEWER_URL>?trace_id=#{hex_trace_id}"
    flash.now[:notice] = %Q[<a target="_blank" href="#{url}">View Trace In Grafana</a>]
  else
    flash.now[:notice] = "Unable to Generate Trace"
  end
  render :show
end
Making this all more OpenTelemetry Friendly
A lot of this is quite rough, of course, but I think the biggest thing missing right now to make this truly valuable via export as an OpenTelemetry Trace is being able to visualize the individual steps in the User Journey (click, input, redirect, etc) as individual Spans within the test, instead of the exported trace being just one big span representing the entire test execution. For that to happen, there would need to be a translation of the Playwright Test Trace format into OTLP (seems totally reasonable, but I’m not familiar with the format Playwright is using), or there would need to be a contribution to opentelemetry-js-contrib of a plugin that monkey-patches both playwright-core and playwright-test and captures timing information and spans for each individual action (something that’s been requested previously, but no one has prioritized it). That would allow for rich trace views in Grafana like the one below, which could also include span events with links to all the artifacts generated from a test run, like Screenshots and Videos.
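As a rough illustration of that second option, here’s a sketch of a helper that wraps individual page actions in child spans via the OpenTelemetry JS API (the helper name and structure are mine, not an existing opentelemetry-js-contrib package):
// Hypothetical sketch: emit one OpenTelemetry span per Playwright action
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('synthetic-tests');

// Run a single user-journey step (goto, click, fill, ...) inside its own span,
// so the exported trace shows one child span per action instead of one big span.
async function tracedStep(name, attributes, action) {
  return tracer.startActiveSpan(name, { attributes }, async (span) => {
    try {
      return await action();
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
}

// Usage inside a recorded test:
// await tracedStep('click', { selector: 'text=Signup' }, () => page.click('text=Signup'));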
Conclusions
I think there’s still some work to be done for Synthetic Tests to be a truly vendor agnostic Observability Signal, instead of just a premium product Vendors offer that inextricably locks you into relying on them for monitoring some of the most crucial portions of your application. But, I think the components are all floating around out there, and with a bit of elbow grease and a few upstream contributions, it seems possible to turn Synthetic Browser Tests into something accessible to all developers and organizations.
Would love to hear any feedback or comments. I plan to try to formalize and OSS some of the components above when I have more time, so would love to hear whether anyone out there found this useful.
Eric Mustin (@ericmustin)