Getting Started with webdriver-w3c

Hello, and welcome to the wonderful world of browser automation with WebDriver and Haskell! This module is a brief tutorial on how we can use the webdriver-w3c library to write Haskell programs that interact with web pages just like a person would. If you need to test a web application, or want to automate some web thing that curl and wget alone can't handle easily, you might find this mildly interesting, maybe.

(This text is a literate program, so we have to start with some compiler noises. Nothing to see here!)

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Web.Api.WebDriver
import Test.Tasty.WebDriver

import Test.Tasty
import Control.Monad.Trans.Class
import qualified System.Environment as SE
import Control.Monad
import System.IO

main :: IO ()
main = return ()

Prerequisites

To follow along, you're going to need a few things.

  1. Stack. Stack is a build tool for Haskell projects. It compiles our programs, runs tests, processes documentation, generates code coverage reports, and keeps project dependencies under control.
  2. A copy of this repository
  3. A web browser; this tutorial assumes you're using Firefox.
  4. A WebDriver proxy server for your browser. For Firefox this is geckodriver. Don't sweat it if you don't know what "WebDriver proxy server" means right now; we'll get to that.

Next, start your proxy server. For geckodriver on unix-like OSs, that is done with the geckodriver & command. You should see a line that looks something like this:

1521524046173   geckodriver INFO    Listening on 127.0.0.1:4444

Leave that program running. Just leave it alone.

Finally, in another shell window, navigate to the directory holding this repo and say

stack ghci webdriver-w3c:webdriver-w3c-intro

Well, don't say that out loud. Type it. :) This might take a while the first time, while stack downloads the compiler and libraries it needs. When it finishes, this command opens a Haskell interpreter with webdriver-w3c loaded so we can play with it. You'll know everything is okay if you see a line like

Ok, one module loaded.

followed by a λ: prompt. To be sure, try typing return and then hitting (enter). If you see this scary error message:

<interactive>:1:1: error:
    • No instance for (Show (a0 -> m0 a0))
        arising from a use of ‘print’
        (maybe you haven't applied a function to enough arguments?)
    • In a stmt of an interactive GHCi command: print it

then everything is working great!

My First Browser Automation

Ok! You've got your WebDriver proxy (geckodriver) running in one terminal window, and ghci running in another. Let's start with a simple example to illustrate what we can do, then explain how it works. Read this code block, even if the syntax is meaningless.

release_the_bats :: WebDriverT IO ()
release_the_bats = do
  fullscreenWindow
  navigateTo "https://www.google.com"
  performActions [typeString "bats"]
  performActions [press EnterKey]
  wait 5000000
  pure ()

Without running that code -- and maybe without being proficient in Haskell -- what do you think it does?

Now let's run it. In the interpreter, type

example1

followed by (enter). You should see a Firefox window open, go fullscreen, and search Google for "bats".

example1, by the way, is this:

example1 :: IO ()
example1 = do
  execWebDriverT defaultWebDriverConfig
    (runIsolated_ defaultFirefoxCapabilities release_the_bats)
  return ()

Let's break down what just happened.

  1. release_the_bats is a WebDriver session, expressed in the WebDriver DSL. It's a high-level description for a sequence of browser actions: in this case, "make the window full screen", "navigate to google.com", and so on.
  2. runIsolated_ takes a WebDriver session and runs it in a fresh browser instance. The parameters of this instance are specified in defaultFirefoxCapabilities.
  3. execWebDriverT takes a WebDriver session and carries out the steps, using some options specified in defaultWebDriverConfig.

You probably also noticed a bunch of noise got printed to your terminal starting with something like this:

λ: example1
2018-06-23 15:19:46 Request POST http://localhost:4444/session
{
    "capabilities": {
        "alwaysMatch": {
            "browserName": "firefox"
        }
    },
    "desiredCapabilities": {
        "browserName": "firefox"
    }
}
2018-06-23 15:19:48 Response
{
    "value": {
        "sessionId": "383edca7-3054-0544-8c1e-cc64099462de",
        "capabilities": {
            "moz:webdriverClick": true,
            "platformVersion": "17.4.0",
            "moz:headless": false,
            "moz:useNonSpecCompliantPointerOrigin": false,
            "browserVersion": "60.0.2",
            "rotatable": false,
            "pageLoadStrategy": "normal",
            "moz:profile": "/var/folders/td/sxyy9wl919740vddr49g8nth0000gn/T/rust_mozprofile.aleh5JscOwwI",
            "moz:accessibilityChecks": false,
            "moz:processID": 88470,
            "platformName": "darwin",
            "timeouts": {
                "implicit": 0,
                "script": 30000,
                "pageLoad": 300000
            },
            "acceptInsecureCerts": false,
            "browserName": "firefox"
        }
    }
}

This is the log. WebDriver sessions keep track of a bunch of info to help with debugging, like all requests and responses and all raised errors. By default the logs are printed to stderr but this is configurable.

So what can you do in a WebDriver session? Not much -- but this is by design. The library includes:

  • A binding for each endpoint in the WebDriver spec
  • Some basic functions for reading and writing files, reading and writing at the console, and making arbitrary HTTP requests

This plus Haskell's do notation make for a tidy EDSL for running browsers. Notably, a WebDriver session cannot do arbitrary IO by default, and WebDriver sessions are pure values. (There is an escape hatch for this restriction.)

Behind the Scenes

WebDriver is an HTTP API for controlling web browsers like a human user would. In principle a browser could implement this API directly. In practice the major browsers have their own internally maintained APIs for automation and use a proxy server to translate between WebDriver and their internal API.

This is the role geckodriver is playing in our examples so far: deep down, our code is making HTTP requests to geckodriver, and geckodriver is passing these requests on to Firefox.
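To make "HTTP API" a little more concrete, here is a minimal sketch of the kind of requests a session boils down to. It is a toy model, not code from webdriver-w3c: the Request type and the helper names are made up for illustration, the JSON bodies are built by naive string concatenation with no escaping, and the session id used in main is a placeholder. The endpoint paths follow the W3C WebDriver specification, and the host and port match the geckodriver log line shown earlier.

```haskell
module Main where

-- | An HTTP request reduced to method, URL, and JSON body (toy type).
data Request = Request
  { method :: String
  , url    :: String
  , body   :: String
  } deriving (Eq, Show)

-- | Where the proxy server listens; geckodriver reported port 4444 above.
remote :: String
remote = "http://localhost:4444"

-- | Ask the proxy to start a browser and open a session.
newSession :: String -> Request
newSession browser = Request "POST" (remote ++ "/session")
  ("{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" ++ browser ++ "\"}}}")

-- | Load a page in an existing session.
navigate :: String -> String -> Request
navigate sessionId target = Request "POST"
  (remote ++ "/session/" ++ sessionId ++ "/url")
  ("{\"url\":\"" ++ target ++ "\"}")

-- | Read the title of the current page.
pageTitle :: String -> Request
pageTitle sessionId =
  Request "GET" (remote ++ "/session/" ++ sessionId ++ "/title") ""

-- | End the session, which closes the browser.
deleteSession :: String -> Request
deleteSession sessionId =
  Request "DELETE" (remote ++ "/session/" ++ sessionId) ""

main :: IO ()
main = mapM_ print
  [ newSession "firefox"
  , navigate "SESSION-ID" "https://www.google.com"
  , pageTitle "SESSION-ID"
  , deleteSession "SESSION-ID"
  ]
```

The library hides all of this behind functions like navigateTo and getTitle, and the session log records each request and response as it happens.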

This library is also tested against Chrome via chromedriver. To do that, using chromedriver's default settings, we need to make a couple of adjustments to the examples: replace

defaultWebDriverConfig

by

defaultWebDriverConfig
  { _environment = defaultWebDriverEnvironment
    { _env = defaultWDEnv
      { _remotePort = 9515
      }
    }
  }

and replace

defaultFirefoxCapabilities

by

defaultChromeCapabilities

(By the way - defaultWebDriverConfig has type WebDriverConfig, and includes knobs for tweaking almost everything about how our sessions run.)
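Putting both replacements together, a Chrome version of the first example might look like the sketch below. It uses only names that appear in this tutorial; visit_google and example1Chrome are hypothetical names chosen for illustration, and the sketch assumes chromedriver is already running with its default settings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Web.Api.WebDriver

-- | A deliberately tiny session (hypothetical name).
visit_google :: WebDriverT IO ()
visit_google = navigateTo "https://www.google.com"

-- | Same wiring as example1, but aimed at chromedriver on port 9515.
example1Chrome :: IO ()
example1Chrome = do
  _ <- execWebDriverT chromeConfig
    (runIsolated_ defaultChromeCapabilities visit_google)
  return ()
  where
    -- The default config with only the remote port changed.
    chromeConfig = defaultWebDriverConfig
      { _environment = defaultWebDriverEnvironment
        { _env = defaultWDEnv
          { _remotePort = 9515
          }
        }
      }

main :: IO ()
main = example1Chrome
```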

Making Assertions

You're probably interested in using browser automation to run end-to-end tests on some web application, and webdriver-w3c has some extra bits built in to make this simpler.

In addition to the usual browser action commands, you can sprinkle your WebDriver sessions with assertions. Here's an example.

what_page_is_this :: (Monad eff) => WebDriverT eff ()
what_page_is_this = do
  navigateTo "https://www.google.com"
  title <- getTitle
  assertEqual title "Welcome to Lycos!" "Making sure we're at the lycos homepage"
  return ()

Note the signature: (Monad eff) => WebDriverT eff () instead of WebDriverT IO (). What's happening here is that WebDriverT is a transformer over a monad eff within which a restricted set of effects (like writing to files and making HTTP requests) take place. These effects are "run" by an explicit evaluator that, for the default configuration, happens to use IO, but both the effect monad and the evaluator are configurable. By swapping out IO for another type we can, for example, run our tests against a mock Internet, and by swapping out the evaluator we could use a "dry run" evaluator that doesn't actually do anything, but logs what it would have done. It's good practice to make our WebDriver code maximally flexible by using an effect parameter like eff instead of the concrete IO unless there's a good reason not to.
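The idea of an effect parameter plus a swappable evaluator can be sketched in a few lines of plain Haskell. This is a toy, not how webdriver-w3c is implemented: the Effect type, fetchAndSave, and both evaluators are invented for illustration. The point is only that a program written against an arbitrary monad can be run against a mock or a dry run without changing the program itself.

```haskell
module Main where

import Data.Functor.Identity (Identity, runIdentity)

-- | The restricted set of effects a toy session may request.
data Effect
  = HttpGet String
  | WriteFile FilePath String
  deriving (Eq, Show)

-- | A program written against any monad eff, given an evaluator for effects.
fetchAndSave :: Monad eff => (Effect -> eff String) -> eff String
fetchAndSave eval = do
  page <- eval (HttpGet "https://example.com")
  _ <- eval (WriteFile "page.html" page)
  return page

-- | Mock evaluator: pure, touches neither the network nor the disk.
mockEval :: Effect -> Identity String
mockEval (HttpGet _)     = return "<html>mock</html>"
mockEval (WriteFile _ _) = return ""

-- | Dry-run evaluator: only reports what it would have done.
dryRunEval :: Effect -> IO String
dryRunEval e = do
  putStrLn ("would run: " ++ show e)
  return ""

main :: IO ()
main = do
  putStrLn (runIdentity (fetchAndSave mockEval))
  _ <- fetchAndSave dryRunEval
  return ()
```

Because fetchAndSave never mentions IO, the mock run is an ordinary pure value that can be checked without a browser or a network.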

Anyway, back to the example. What do you think this code does? Let's try it: type

example2

in the interpreter. You should see a browser window open briefly to google.com, with a scary "Invalid Assertion" message in the interpreter. assertEqual is the assertion statement: it takes two things (strings in this case) and checks whether they are equal. Shocking, hm? The third argument to assertEqual is a comment, so we can include some human readable info as to why this assertion was made.

This is example2:

example2 :: IO ()
example2 = do
  (_, result) <- debugWebDriverT defaultWebDriverConfig
    (runIsolated_ defaultFirefoxCapabilities what_page_is_this)
  printSummary result
  return ()

Here's what happened:

  1. what_page_is_this is a WebDriver session, just like release_the_bats, this time including an assertion: that the title of some web page is "Welcome to Lycos!".
  2. runIsolated_ runs what_page_is_this in a fresh browser instance.
  3. debugWebDriverT works much like execWebDriverT, except that it collects the results of any assertion statements and summarizes them (this is result).
  4. printSummary takes the assertion results and prints them out all pretty like.
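As a rough mental model of "collect the assertion results, then summarize them", here is a toy sketch in plain Haskell. It is not the library's implementation: Assertion, checkEqual, tally, and report are hypothetical names, and the real summary type is richer than a pair of counts.

```haskell
module Main where

-- | Outcome of one assertion: did it hold, what was claimed, and why.
data Assertion = Assertion
  { holds     :: Bool
  , statement :: String
  , note      :: String
  } deriving (Eq, Show)

-- | Toy counterpart of assertEqual: records the result instead of aborting.
-- Argument order mirrors the tutorial: actual value, expected value, comment.
checkEqual :: (Eq a, Show a) => a -> a -> String -> Assertion
checkEqual actual expected why = Assertion
  { holds     = actual == expected
  , statement = show actual ++ " is equal to " ++ show expected
  , note      = why
  }

-- | Count (valid, invalid) assertions.
tally :: [Assertion] -> (Int, Int)
tally results =
  ( length (filter holds results)
  , length (filter (not . holds) results)
  )

-- | One line per failed assertion.
report :: [Assertion] -> String
report results = unlines
  [ "Invalid Assertion: " ++ statement a ++ " (" ++ note a ++ ")"
  | a <- results
  , not (holds a)
  ]

main :: IO ()
main = do
  let results =
        [ checkEqual "Google" "Google" "title after going back"
        , checkEqual "Google" "Welcome to Lycos!"
            "Making sure we're at the lycos homepage"
        ]
  print (tally results)
  putStr (report results)
```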

Documentation on assertions is on Hackage.

Suites of Tests

Alright. If you're writing e2e tests, you probably want to write a lot of e2e tests. In this case, we'd like our tests to be modular, isolated, and well-organized, so that when things go wrong we can quickly diagnose what happened. For this, webdriver-w3c integrates with the tasty test framework -- just import Test.Tasty.WebDriver.

Suppose we've got two WebDriver tests. These are pretty dweeby just for illustration's sake.

back_button :: (Monad eff) => WebDriverT eff ()
back_button = do
  navigateTo "https://www.google.com"
  navigateTo "https://wordpress.com"
  goBack
  title <- getTitle
  assertEqual title "Google" "Behavior of 'back' button from WordPress homepage"
  return ()

refresh_page :: (Monad eff) => WebDriverT eff ()
refresh_page = do
  navigateTo "https://www.mozilla.org"
  pageRefresh
  title <- getTitle
  assertEqual title "Mozilla's Epic HomePage on the Internets"
    "Refresh mozilla.org"
  return ()

We can organize them into a hierarchy of tests like so.

test_suite :: TestTree
test_suite = testGroup "All Tests"
  [ testCase "Back Button" back_button
  , testCase "Refresh" refresh_page
  ]

Try running the suite with

example3

in the interpreter. Here's what example3 looks like:

example3 :: IO ()
example3 = do
  SE.setEnv "TASTY_NUM_THREADS" "1"
  defaultWebDriverMain
    $ localOption (SilentLog)
    $ localOption (PrivateMode True)
    $ test_suite

Here's what happened:

  1. test_suite is a Tasty tree of individual WebDriverT test cases.
  2. defaultWebDriverMain is a Tasty function that runs test trees. Here we've also used localOption to tweak how the tests run: suppressing the usual session log output and running the browser in private mode.

Tasty gave us lots of nice things for free, like pretty printing test results and timings.

λ: example3
>>> Deployment environment is DEV
>>> Logging with colors
All Tests
  Back Button: OK (7.23s)
    1 assertion(s)
  Refresh:     FAIL (4.29s)
    Invalid Assertion 
    assertion: "Internet for people, not profit \8212 Mozilla" is equal to "Mozilla's Epic HomePage on the Internets" 
    comment: Refresh mozilla.org

1 out of 2 tests failed (11.53s)

Other test case constructors and test options are available; see Hackage for the details.

The test suite for webdriver-w3c itself uses the Tasty integration. There is also a function, checkWebDriver, that can be used to build tests with QuickCheck, if you don't find that idea abominable. :)

We need more power!

The vanilla WebDriverT is designed to help you control a browser with batteries included, but it has limitations. It can't possibly anticipate all the different ways you might want to control your tests, and it can't do arbitrary IO. But we have a powerful and very general escape hatch: the WebDriverT monad transformer is a special case of the WebDriverTT monad transformer transformer.

The actual definition of WebDriverT is

type WebDriverT eff a = WebDriverTT IdentityT eff a

where IdentityT is the inner monad transformer. By swapping out IdentityT for another transformer we can add more features specific to our application.

Here's a typical example. Say you're testing a site with two deployment tiers -- "test" and "production". For the most part the same test suite should run against both tiers, but there are minor differences. Say the base URLs are slightly different; maybe production lives at example.com while test lives at test.example.com. Also while developing a new feature some parts of the test suite should only run on the test tier, maybe controlled by a feature flag.

What we need is some extra read-only state to pass around. We can do this with a ReaderT transformer. To avoid adding a dependency on a whole transformer library, let's roll our own:

data ReaderT r eff a = ReaderT
  { runReaderT :: r -> eff a
  }

instance (Monad eff) => Monad (ReaderT r eff) where
  x >>= f = ReaderT $ \r -> do
    a <- runReaderT x r
    runReaderT (f a) r

instance (Monad eff) => Applicative (ReaderT r eff) where
  pure x = ReaderT $ \_ -> return x
  (<*>) = ap

instance (Monad eff) => Functor (ReaderT r eff) where
  fmap f x = x >>= (return . f)

instance MonadTrans (ReaderT r) where
  lift x = ReaderT $ \_ -> x

reader :: (Monad eff) => (r -> a) -> ReaderT r eff a
reader f = ReaderT $ \r -> return $ f r

Now our actual state might look something like this:

data MyEnv = MyEnv
  { tier :: Tier
  , featureFlag :: Bool
  }

data Tier = Test | Production

env :: Tier -> MyEnv
env t = MyEnv
  { tier = t
  , featureFlag = False
  }
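Before wiring this into a session, it may help to see how such an environment could drive the tier differences described above. The sketch below is plain Haskell with no browser code. The two types are repeated (with derived Eq and Show) so the block compiles on its own; baseUrl, urlFor, and newFeatureTestsEnabled are hypothetical helpers, the hostnames come from the scenario above, and the https scheme is an assumption.

```haskell
module Main where

data Tier = Test | Production
  deriving (Eq, Show)

data MyEnv = MyEnv
  { tier        :: Tier
  , featureFlag :: Bool
  }

-- | Base URL for each deployment tier (scheme assumed).
baseUrl :: Tier -> String
baseUrl Test       = "https://test.example.com"
baseUrl Production = "https://example.com"

-- | Absolute URL for a path on whichever tier the environment names.
urlFor :: MyEnv -> String -> String
urlFor e path = baseUrl (tier e) ++ path

-- | Hypothetical rule: tests for an unfinished feature run only on the
-- test tier, and only when the flag is switched on.
newFeatureTestsEnabled :: MyEnv -> Bool
newFeatureTestsEnabled e = tier e == Test && featureFlag e

main :: IO ()
main = do
  let testEnv = MyEnv { tier = Test, featureFlag = True }
      prodEnv = MyEnv { tier = Production, featureFlag = True }
  putStrLn (urlFor testEnv "/login")
  putStrLn (urlFor prodEnv "/login")
  print (newFeatureTestsEnabled testEnv, newFeatureTestsEnabled prodEnv)
```

Keeping these decisions in pure functions of MyEnv means a session only has to read the environment and call them.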

And we can augment WebDriverTT with our reader transformer.

type MyWebDriverT eff a = WebDriverTT (ReaderT MyEnv) eff a

Now we can build values in MyWebDriverT using the same API as before, and reach the extra features of the inner monad with liftWebDriverTT.

custom_environment :: (Monad eff) => MyWebDriverT eff ()
custom_environment = do
  theTier <- liftWebDriverTT $ reader tier
  case theTier of
    Test -> navigateTo "http://google.com"
    Production -> navigateTo "http://yahoo.com"

To actually run sessions using our custom monad stack we need to make a few adjustments. First, we use execWebDriverTT instead of execWebDriverT.

Second, we need to supply a function that "runs" the inner transformer (in this case ReaderT MyEnv) down to IO.

execReaderT :: r -> ReaderT r IO a -> IO a
execReaderT r x = runReaderT x r

Running our custom WebDriver monad is then straightforward.

example4 :: Tier -> IO ()
example4 t = do
  execReaderT (env t) $
    execWebDriverTT defaultWebDriverConfig
      (runIsolated_ defaultFirefoxCapabilities custom_environment)
  return ()

Try it out with

example4 Test
example4 Production

We can similarly use a custom inner monad transformer when checking assertions and with the tasty integration, via the analogous debugWebDriverTT and testCaseTT functions.

ReaderT is just one option for the inner monad transformer. We could put mutable state, delimited continuations, or even another HTTP API monad in there. Use your imagination!

Debugging

Running browser sessions is one thing, but writing and debugging them is another. webdriver-w3c has some tools for dealing with this as well. Besides the log, which gives a thorough account of what happened, we can include breakpoints in our code. When breakpoints are activated, they stop the session and give us a chance to poke around the browser before moving on.

Here's a simple example.

stop_and_smell_the_ajax :: (Monad eff) => WebDriverT eff ()
stop_and_smell_the_ajax = do
  breakpointsOn

  navigateTo "https://google.com"

  breakpoint "Just checking"

  navigateTo "https://mozilla.org"

  breakpoint "are we there yet"

We can run this with example5:

example5 :: IO ()
example5 = do
  execWebDriverT defaultWebDriverConfig
    (runIsolated_ defaultFirefoxCapabilities stop_and_smell_the_ajax)
  return ()

The basic breakpoint command gives the option to continue, throw an error, dump the current state and environment to stdout, and turn breakpoints off. A fancier version, breakpointWith, takes an additional argument letting us trigger a custom action.

Where to Learn More

For now the canonical documentation is the haddock annotations on Hackage.