Passing information to and from webpages in PhantomJS

27 September 2011 11:01 by Cameron in javascript, PhantomJS

Recently, as part of writing my screen scraper, I needed a way to pass dynamic content to and from webpages using PhantomJS. The scraper has to follow dynamic sets of links and scrape the data from each page, but a webpage's scope is currently sandboxed from the main PhantomJS script, so I had to find a way to move data across that boundary. With the addition of the new filesystem module in PhantomJS 1.3, it is now possible to pass data from the main scope into an individual page's scope. Save any data you want a particular page to see as JavaScript source in a .js file, then inject that file into the page with page.injectJs() from the page.onLoadFinished handler. After the injection, the data is accessible within the page's scope. Getting data back out is simpler: whatever the function passed to page.evaluate() returns is handed back to the main script, as long as it is a simple value such as a string, number, array or plain object. For example:

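var page = require('webpage').create(),
    fs = require('fs'),
    data = "var dataObject = { item: 'value' };",
    fullpath;

fullpath = fs.workingDirectory + fs.separator + 'data.js';
// open the file for writing and save the data as JavaScript source
var dataFile = fs.open(fullpath, 'w');
dataFile.write(data);
dataFile.close();

// relay console.log() calls made inside the page to PhantomJS' console
page.onConsoleMessage = function (msg) {
	console.log('page: ' + msg);
};

// register the handler before opening the page so it is in place when loading finishes
page.onLoadFinished = function (status) {
	if (status !== 'success') {
		console.log('Unable to load the page!');
		phantom.exit();
		return;
	}
	// inject the JavaScript data that we created earlier
	if (!page.injectJs(fullpath)) {
		console.log('Unable to inject ' + fullpath);
		phantom.exit();
		return;
	}
	// evaluate only after the injection, so dataObject exists in the page's scope
	var output = page.evaluate(function () {
		// print the data object's value from inside the page
		console.log(dataObject.item);
		return dataObject.item;
	});
	// output should be the same value as the page's dataObject.item
	console.log(output);
	phantom.exit();
};

// check that the file was successfully written
if (fs.size(fullpath) > 0) {
	console.log('File written successfully!');
	// page.open() is asynchronous; the work continues in page.onLoadFinished
	page.open('http://somesite.org/page.html');
}
else {
	console.log('Error in writing the file!');
	phantom.exit();
}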
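The example above hard-codes the data string. For a scraper that follows dynamic sets of links, one option (my own assumption, not something the approach requires) is to generate that string with JSON.stringify(), so quotes and backslashes inside scraped values cannot break the generated file. buildDataScript below is a hypothetical helper, not part of PhantomJS; it also escapes U+2028 and U+2029, which older JavaScript engines reject inside string literals even though JSON allows them.

```javascript
// Hypothetical helper (not part of PhantomJS): turns a JSON-serializable
// value into one line of JavaScript source that declares a global variable.
function buildDataScript(name, value) {
	// Only accept plain identifiers so the generated source stays valid.
	if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) {
		throw new Error('Invalid variable name: ' + name);
	}
	// JSON.stringify escapes quotes, backslashes and newlines in strings;
	// U+2028/U+2029 are escaped by hand for pre-ES2019 engines.
	var literal = JSON.stringify(value)
		.replace(/\u2028/g, '\\u2028')
		.replace(/\u2029/g, '\\u2029');
	return 'var ' + name + ' = ' + literal + ';';
}

var script = buildDataScript('dataObject', {
	item: 'value',
	links: ['http://somesite.org/a.html', "http://somesite.org/it's.html"]
});
console.log(script);
```

In the example above, you could then assign data = buildDataScript('dataObject', { item: 'value' }) before writing data.js, and the page would see the same dataObject.item.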

For more information about PhantomJS' File System module, please visit: http://code.google.com/p/phantomjs/wiki/Interface#Filesystem_Module

While this may not be the best long-term solution, it does provide a way to get data to and from your pages until PhantomJS officially supports passing data to a webpage object.