
PhantomJS A Scriptable Headless Browser

October 19th, 2012 | Ken Payson

More and more, modern web applications are moving away from post-back driven pages and embracing Ajax-intensive sites that make use of client-side view models. While this leads to a great user experience, it raises challenges in writing tests that cover the complex functionality on the web page.

Web automation tools have been around for a long time to help automate web tasks. Selenium is one of the most popular web automation frameworks, and it has web drivers for all of the major browsers. By using these drivers we can automate tasks such as opening a browser, navigating to a page, filling out a form, submitting it and checking the results. This is a very useful thing, and it can be fun to watch a browser performing like a player piano, running through a set of tasks without you.

There is a major drawback to the current suite of web drivers though: they are slow. Loading and rendering every page takes too long when we have a whole suite of tests to run. Most of the time, the questions we are asking can be phrased as whether a certain element is in the DOM once some other action is complete. Actually seeing things on the screen isn't really necessary. This is doubly so when the tests are run on a build server where nobody will be watching. What we want is a "headless" browser: a browser that internally does the same things a standard browser does. It makes requests, parses HTML, builds a DOM, understands JavaScript, and handles cookies and sessions. In short, it behaves like a browser except that it doesn't actually display pages on a screen.

Enter PhantomJS. PhantomJS is a headless, JavaScript-scriptable web browser built on the WebKit engine. There have been other attempts at headless browsers in the past; the HTML Browser remote driver with Selenium is one example. However, earlier headless browsers did not have a proper JavaScript engine backing them and as a result were limited to use with very simple pages. Because PhantomJS is based on WebKit and uses the WebKit JavaScript engine, it does not have this problem.

To get started with PhantomJS, download the latest version from PhantomJS.org. Working with PhantomJS directly can be challenging because its API is rather low level. To make things easier, also download CasperJS from CasperJS.org. CasperJS is a navigation scripting and testing utility written to work with PhantomJS. It enhances the PhantomJS API so that the coding is easier.

Scripting with Phantom/Casper is very easy once you learn to avoid a few of the pitfalls. With Phantom/Casper you can write JavaScript that is injected into the web page you are testing. Casper has a utility class for selecting and modifying elements via CSS selectors.

If you need it, it is also possible to use jQuery. If jQuery is not already part of the page, it can be dynamically injected and used. However, it is usually easiest to use the document.querySelector method, which is natively part of the DOM API in modern browsers.

Phantom scripts run outside the page, much like server-side JavaScript. From the script we can send client-side JavaScript to the browser. We can also do things in the script that we cannot do in a browser. There is a File System module that lets us read and write files. There is a System module that lets us work with command line arguments and environment variables.
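As a small sketch of working with command line arguments: in PhantomJS, system.args[0] is the name of the script, so the values supplied by the user start at index 1. The helper below joins whatever terms were supplied into one query string. The name buildSearchQuery and the usage message are my own illustration, not part of PhantomJS or CasperJS, and the function is plain JavaScript, so it can be tried with any array.

```javascript
// Hypothetical helper (not a PhantomJS or CasperJS API): builds a search
// query from a PhantomJS-style argument array, where args[0] is the
// script name and the remaining entries are the user's search terms.
function buildSearchQuery(args) {
    // Skip the script name and drop empty or non-string entries.
    var terms = args.slice(1).filter(function (term) {
        return typeof term === 'string' && term.trim() !== '';
    });
    if (terms.length === 0) {
        throw new Error('Usage: phantomjs <script> <searchTerm> [moreTerms...]');
    }
    return terms.join(' ');
}
```

In a Phantom script this would be called as buildSearchQuery(system.args), which avoids hard-coding exactly two search terms.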

Here is a simple example that queries Google using the supplied command line arguments. A report on the results is written to a file.

Here is the PhantomJS script using Casper:

phantom.casperPath = 'C:\\CasperJs\\casperjs-1.0.0-RC1';
phantom.injectJs(phantom.casperPath + '\\bin\\bootstrap.js');

var casper = require('casper').create();

var system = require('system'); // command line arguments
var fs = require('fs');         // file system access

// Echo a message surrounded by blank lines so it stands out in the output.
var Debug = function(message) {
    casper.echo("\n" + message + "\n");
};

var googleHome = "http://www.google.com";

casper.start();

casper.userAgent('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89');
casper.thenOpen(googleHome, function() {
    casper.echo("url: " + this.getCurrentUrl());
});

casper.then(function() {
    Debug("SearchTerm1: " + system.args[1]);
    Debug("SearchTerm2: " + system.args[2]);
});


casper.then(function() {
    casper.evaluate(function(searchTerm1,searchTerm2) {
        
        document.querySelector('input[name="q"]').setAttribute('value', searchTerm1 + ' ' + searchTerm2);
        document.querySelector('form[action="/search"]').submit(); 
    }, {
        searchTerm1: system.args[1],
        searchTerm2: system.args[2]
    });
});


casper.then(function() {
    Debug("new url: " + this.getCurrentUrl());
});


casper.then(function() {

    var secondLink = this.evaluate(function() {
        return  __utils__.findAll('h3.r a')[1].href; 
    });
    
    var numResultsOnPage = this.evaluate(function() {
        return __utils__.findAll('h3.r a').length;
    });
    
    var fstream = fs.open('C:\\temp\\searchResults.txt', 'w');
    fstream.write("There are " + numResultsOnPage + " results on the page\r\n\r\n");
    fstream.write("The second link url is " + secondLink);
    fstream.close();
    
});

casper.run(function() {
    this.exit(); 
});
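One fragile spot in a script like this is the lookup of the second result: indexing element [1] of the matched links throws if fewer than two links match the selector. A hedged sketch of one way around that is to have the in-page code return a plain array of URLs and to build the report text in a small helper. The name buildReport and the fallback wording are hypothetical and only for illustration.

```javascript
// Hypothetical helper (not part of CasperJS): turns a list of result URLs
// into the report text, without assuming a second link exists.
function buildReport(hrefs) {
    var lines = ['There are ' + hrefs.length + ' results on the page', ''];
    if (hrefs.length >= 2) {
        lines.push('The second link url is ' + hrefs[1]);
    } else {
        lines.push('There is no second link on the page');
    }
    // Windows line endings, matching the report written by the script.
    return lines.join('\r\n');
}
```

Inside the Casper step, the array could come from a single this.evaluate call that returns the href of each element matched by __utils__.findAll('h3.r a'), and the resulting string could be written with one fstream.write call.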

PhantomJS is run from the command line: phantomjs …

In order to stay away from the DOS prompt, I usually create a one- or two-line batch file to run my PhantomJS script. Here is the batch file to run the example program:

cd c:\phantomjs\scripts
phantomjs GoogleSearch2.js stinky cheese

The future of PhantomJS with Selenium
One exciting development to keep an eye on is GhostDriver. GhostDriver is a Selenium WebDriver implementation for PhantomJS. It is still in development and Selenium doesn't fully support it yet, but when it is available (mostly in the next release, 2.26) it will be possible to write Selenium tests in C# and have them run against PhantomJS. Initial reports say that GhostDriver with PhantomJS could be twice as fast as the Selenium Chrome driver.

 

David Cooksey is a Senior .NET developer at LogicBoost, an agile software services and product development company based in Washington DC.
