Passing information to and from webpages in PhantomJS

27. September 2011 11:01 by Cameron in javascript, PhantomJS

Recently, I needed a way to pass dynamic content to and from webpages using PhantomJS as part of writing my screen scraper. I need the scraper to follow dynamic sets of links and scrape the data from each page. Since a webpage's scope is currently sandboxed, the main script and the page cannot share variables directly, so I had to find a way to pass data between them. With the addition of the new filesystem module in PhantomJS 1.3, it is now possible to pass data from the main scope to an individual page's scope. Any data that you want passed to a particular page should be written as JavaScript source (for example, a variable declaration) to a JavaScript file. Then you can inject that file into the page in the page.onLoadFinished callback so that the data is accessible within the page's scope. Data travels back the other way as the return value of page.evaluate. For example:

var page = require('webpage').create(), 
     fs = require('fs'), 
     data = "var dataObject = { item: 'value' };", 
     fullpath;

fullpath = fs.workingDirectory + fs.separator + 'data.js';
// open file for writing
var dataFile = fs.open(fullpath, 'w');
dataFile.write(data);
dataFile.close();

// relay console.log calls made inside the page to the PhantomJS console
page.onConsoleMessage = function (msg) {
	console.log(msg);
};

// register the callback before opening the page, since page.open is asynchronous
page.onLoadFinished = function () {
	// inject the javascript data that we created earlier
	page.injectJs(fullpath);
	// put page data in a local variable
	var output = page.evaluate(function () {
		// print the output of the data object
		console.log(dataObject.item);
		return dataObject.item;
	});
	// output should be the same value as the page's dataObject.item
	console.log(output);
	phantom.exit();
};

// check that the file was successfully written
if (fs.size(fullpath) > 0) {
	console.log('File written successfully!');
	page.open('http://somesite.org/page.html');
}
else {
	console.log('Error in writing the file!');
	phantom.exit();
}
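
The data string in the example is written by hand. When the data is only known at run time, one option is to generate that string with JSON.stringify. The following is a minimal sketch, not part of PhantomJS: buildDataScript is a hypothetical helper name, and the sample object is made up for illustration.

```javascript
// Hypothetical helper (not a PhantomJS API): builds the source text of a
// script that declares a global variable holding the given data.
function buildDataScript(variableName, value) {
    // Only accept a plain identifier so the generated source stays valid.
    if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(variableName)) {
        throw new Error('Invalid variable name: ' + variableName);
    }
    // JSON.stringify handles the quoting and escaping of strings.
    return 'var ' + variableName + ' = ' + JSON.stringify(value) + ';';
}

// Illustrative data only; in a scraper this might be the list of links to follow.
var data = buildDataScript('dataObject', { item: 'value', links: ['/a', '/b'] });
console.log(data);
```

The resulting string can be written to data.js exactly as before. Only values that survive JSON serialization (strings, numbers, booleans, arrays, plain objects) can be passed this way; functions are dropped.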

For more information about PhantomJS' File System module, please visit: http://code.google.com/p/phantomjs/wiki/Interface#Filesystem_Module

While this may not be the best long-term solution, it does provide a way to get data to and from your pages until official support for passing data to a webpage object becomes available in PhantomJS.

Installation of Positive SSL wildcard certificate

26. September 2011 01:27 by Cameron in Security, Web

The other day, I splurged on a wildcard SSL certificate for my website, www.iga-home.net. I felt that it was important to secure content on my site, as sensitive data is sent when users log in or post content. I bought a wildcard SSL certificate because I wanted to secure all subdomains of iga-home.net and not be restricted to just iga-home.net or www.iga-home.net.

After I created the certificate request in IIS, I pasted the output into the SSL certificate request form on Comodo's website. It seemed fairly straightforward. About 10-15 minutes later, I received an email with my SSL certificate attached in a zip file. I opened the zip file and saw my certificate with a .cert extension. I fiddled around for a while trying to get this certificate installed through mmc and IIS Manager. When I tried to install my certificate through IIS, I kept receiving errors that IIS couldn't find my certificate request. I followed many tutorials and couldn't find a solution. I later found a page that said I should first install a set of additional certificates through mmc before installing my purchased SSL certificate in IIS. After I installed those certificates, IIS accepted my certificate and I proceeded to add SSL to my website. It's a shame that these certificates weren't bundled with the original zip file. It would have made life a lot easier.

My next task is to add a rewrite rule that redirects all HTTP requests to HTTPS. I also want to write a resource handler that caches remote resources on my server so that every resource on a page is served securely. Google Chrome warns the user when some of the content displayed on a page is insecure, and I want to remedy this problem.

Take Screenshot of all HTML documents in a folder using PhantomJS

26. September 2011 01:14 by Cameron in javascript, PhantomJS, Programming

Recently I came across a question on Stack Overflow that asked how to take screenshots of all HTML files in a local folder. I have been playing with PhantomJS quite a bit lately and felt comfortable answering the question. Here is the code for those interested:

var page = require('webpage').create(), loadInProgress = false, fs = require('fs');
var htmlFiles = new Array();
console.log('working directory: ' + fs.workingDirectory);
var curdir = fs.list(fs.workingDirectory);

// loop through files and folders
for(var i = 0; i< curdir.length; i++)
{
	var fullpath = fs.workingDirectory + fs.separator + curdir[i];
	// check if item is a file
	if(fs.isFile(fullpath))
	{
		// match only names that end in .html, not names that merely contain it
		if(/\.html$/i.test(fullpath))
		{
			// show full path of file
			console.log('File path: ' + fullpath);
			htmlFiles.push(fullpath);
		}
	}
}

console.log('Number of Html Files: ' + htmlFiles.length);

// output pages as PNG
var pageindex = 0;

var interval = setInterval(function() {
	if (!loadInProgress && pageindex < htmlFiles.length) {
		console.log("image " + (pageindex + 1));
		page.open(htmlFiles[pageindex]);
	}
	if (pageindex == htmlFiles.length) {
		console.log("image render complete!");
		phantom.exit();
	}
}, 250);

page.onLoadStarted = function() {
	loadInProgress = true;
	console.log('page ' + (pageindex + 1) + ' load started');
};

page.onLoadFinished = function() {
	loadInProgress = false;
	page.render("images/output" + (pageindex + 1) + ".png");
	console.log('page ' + (pageindex + 1) + ' load finished');
	pageindex++;
};

The process is quite simple. First, I loop through all objects in the current working directory and check whether each item is a file and whether it has the .html extension. Then I add each HTML file's path to an array that I later walk through to take the screenshots. A screenshot must be taken after the page is fully loaded so that it contains actual content and not a blank image. This is done by saving the image in the page.onLoadFinished callback. The application loop polls every 250ms and opens the next page only when no load is in progress, so each page is fully loaded and rendered before the script advances to the next one. Once every page has been rendered, the loop exits PhantomJS.
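
As an alternative to polling with setInterval, the next page can be started directly from the completion callback of the previous one. The sketch below is a different technique from the script above, and processSequentially and fakeRender are hypothetical names. It uses a stand-in for the page loading step so that it runs outside PhantomJS.

```javascript
// Hypothetical helper: runs an asynchronous handler over each item in turn,
// starting the next item only after the previous one calls next().
function processSequentially(items, handler, done) {
    var index = 0;
    function next() {
        if (index >= items.length) {
            done();
            return;
        }
        var current = index;
        index++;
        handler(items[current], current, next);
    }
    next();
}

// Stand-in for loading and rendering a page, so the sketch runs anywhere.
var rendered = [];
function fakeRender(file, i, next) {
    rendered.push('output' + (i + 1) + '.png <- ' + file);
    next();
}

var finished = false;
processSequentially(['a.html', 'b.html'], fakeRender, function () {
    finished = true;
});
console.log(rendered.join('\n'));
```

In a PhantomJS script, the handler would presumably call page.open(file, callback), render the image inside that callback, and then call next(), with phantom.exit() in the done function. This removes the need for the loadInProgress flag.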

Xbox Live Data

20. September 2011 14:32 by Cameron in Programming, Xbox Live

While my gaming social networking site, IGA: International Gamers' Alliance, is still in beta, I have been looking at ways to provide a richer experience for my users. Lately I've been working on a way to gather data from Xbox Live so that I can provide content to my users on IGA. I used to gather this data from a RESTful API, built on the official Xbox Live API, that Microsoft employee Duncan Mckenzie hosted on his website. However, his service is no longer available. While there is an official Xbox Live API, access to it is restricted to members of the Xbox Community Developer Program. Acceptance into the XBCDP is very limited at the moment, and it seems that only well-known companies with sponsors receive membership.

While it would be very nice to get official access to the Xbox Live API, it may be a while until I can get into the program. My social networking site, IGA, is still in beta and has much left on its roadmap. Currently I am the only developer on the project and I am also in school, so development is slow. Maybe once IGA is closer to completion, Microsoft will be more eager to accept me into the program. In the meantime, I have a solution for gathering data from Xbox Live.

There are a couple of places to get data from Xbox Live: the user's publicly available gamercard and the user's protected Xbox.com profile. Getting data from the public gamercard is very easy. One could write a parser in PHP, C#, or even jQuery to get the different values from the HTML elements on the page. Retrieving data from a user's Xbox.com profile requires a little more skill and resources. You cannot simply use cURL to log in to Xbox.com remotely, since the site has anti-bot mechanisms that check the browser's user agent, its cookies, and many other aspects that can't easily be manipulated with cURL. There is a remedy to this problem, however.

This past summer, I learned about a headless WebKit browser called PhantomJS from some co-workers while working on a project at work. We needed something that could run without a GUI on a server and manipulate the DOM of a webpage. PhantomJS gave us exactly what we needed. After working on that project, it occurred to me that I could use PhantomJS together with jQuery to manipulate the DOM and screen scrape data from Xbox.com.

I'm currently working on scripts to pull data from users' profiles, including the users' games, the achievements earned in each game, and other information not publicly available on users' gamercards. Please understand, though, that screen scraping should only be done as a last resort, and making numerous requests per day is taxing on both ends. I will implement some sort of data caching that pulls new data on a schedule to limit bandwidth usage. I plan to release this code to my Git hosting when it is finished.
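
The caching idea could take many forms. As a rough sketch under stated assumptions (this is not the code I plan to release), a wrapper can remember each result and only fetch again once the cached copy is older than a chosen maximum age. createCachedFetcher, the fetch function, and the gamertag 'ExampleGamer' are all hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```javascript
// Hypothetical helper: wraps a fetch function so repeated requests for the
// same key reuse the stored result until it is at least maxAgeMs old.
function createCachedFetcher(fetchFn, maxAgeMs, now) {
    now = now || Date.now;
    var cache = Object.create(null); // no inherited keys such as "constructor"
    return function (key) {
        var entry = cache[key];
        var time = now();
        if (!entry || time - entry.fetchedAt >= maxAgeMs) {
            entry = { value: fetchFn(key), fetchedAt: time };
            cache[key] = entry;
        }
        return entry.value;
    };
}

// Stand-in for the real scraping step; it only counts how often it is called.
var fetchCount = 0;
var fakeTime = 0;
var oneHour = 60 * 60 * 1000;
var getProfile = createCachedFetcher(function (gamertag) {
    fetchCount++;
    return { gamertag: gamertag, fetchNumber: fetchCount };
}, oneHour, function () { return fakeTime; });

var first = getProfile('ExampleGamer');  // triggers a fetch
var second = getProfile('ExampleGamer'); // served from the cache
fakeTime += oneHour;                     // simulate an hour passing
var third = getProfile('ExampleGamer');  // cache entry expired, fetches again
console.log('fetches made: ' + fetchCount);
```

A real implementation would also need to persist the cache between runs, for example in a file or a database.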

Thoughts on Windows 8 Developer Preview

20. September 2011 13:30 by Cameron in Windows 8

Last week I downloaded the Windows 8 Developer Preview in both the x64 and x86 editions. I first installed the x64 edition with the developer tools in a virtual machine in Oracle VirtualBox to get a feel for the operating system. After I had installed the DP, I immediately wanted to try out the new Metro UI applications. However, although the tiles were responding to my mouse clicks, they were not opening up in the virtual machine. I did some research to find out what might be the cause of this and learned that Metro UI apps require at least a 1024x768 screen resolution to run. I changed my screen resolution in my virtual machine to 1024x768 and voila, the Metro UI apps worked.

After getting a sense of the new Metro UI, I ventured into installing the 32-bit Windows 8 DP on my ASUS EEEPC T101MT Intel Atom netbook. The installation went smoothly, as expected, and I was brought into the Metro UI on login. My netbook has a 10.1 inch screen with a standard 1024x600 resolution, so I had to apply a registry hack to get support for a 1024x768 resolution. After applying the registry hack and loading the EEEPC resolution changer, I was able to get my netbook running at 1024x768. The higher resolution requires downscaling to fit the smaller screen, which makes things appear squished in the legacy UI. However, the Metro UI looks fairly nice at the higher resolution and most applications work well.

I'd say overall, I'm pretty pleased with the first Windows 8 Developer Preview public release. I'm looking forward to seeing if Microsoft will actually add support for my smaller screen resolution in the upcoming releases. It will be great to see new features unfold as updates for the new operating system arrive.
