Passing information to and from webpages in PhantomJS

27. September 2011 11:01 by Cameron in javascript, PhantomJS

Recently, I needed a way to pass dynamic content to and from webpages using PhantomJS as part of writing my screen scraper. I needed the scraper to follow dynamic sets of links and scrape the data from each page. Since a webpage's scope is currently sandboxed, I had to find a way to pass data to and from webpages. With the addition of the new filesystem module in PhantomJS 1.3, it is now possible to pass data from the main scope into an individual page's scope. Any data that you want to pass to a particular page should be saved as a JavaScript string to a JavaScript file. You can then inject that file into the page in the page.onLoadFinished callback so that the data becomes accessible within the page's scope. For example:

var page = require('webpage').create(),
    fs = require('fs'),
    data = "var dataObject = { item: 'value' };",
    fullpath;

fullpath = fs.workingDirectory + fs.separator + 'data.js';
// open the file for writing and save the javascript string
var dataFile = fs.open(fullpath, 'w');
dataFile.write(data);
dataFile.close();

// inject the javascript data file into the page as soon as it finishes loading
page.onLoadFinished = function() {
    page.injectJs(fullpath);
};

// check that the file was successfully written
if (fs.size(fullpath) > 0) {
    console.log('File written successfully!');
    page.open('http://somesite.org/page.html', function (status) {
        // put the page data in a local variable once the page has loaded
        // (and the data file has been injected by onLoadFinished)
        var output = page.evaluate(function () {
            // this console.log runs in the page's context and is only visible
            // if page.onConsoleMessage has been set
            console.log(dataObject.item);
            return dataObject.item;
        });
        // output should be the same value as the page's dataObject.item
        console.log(output);
        phantom.exit();
    });
}
else {
    console.log('Error in writing the file!');
    phantom.exit();
}

For more information about PhantomJS' File System module, please visit: http://code.google.com/p/phantomjs/wiki/Interface#Filesystem_Module

While this may not be the best long-term solution, it does provide a way to get data to and from your pages until official support for passing data to a webpage object becomes available in PhantomJS.

Installation of Positive SSL wildcard certificate

26. September 2011 01:27 by Cameron in Security, Web

The other day, I splurged on a wildcard SSL certificate for my website, www.iga-home.net. I felt it was important to secure content on my site for my users, since sensitive data is sent when users log in or post content. I bought a wildcard certificate because I wanted to be able to secure all subdomains of iga-home.net and not be restricted to just iga-home.net or www.iga-home.net.

After I created the certificate request in IIS, I copied the output into the request form on Comodo's website. It seemed fairly straightforward. About 10-15 minutes later, I received an email with my SSL certificate attached in a zip file. I opened the zip file and saw my certificate with a .cert extension. I then fiddled around for a while trying to get it installed through MMC and IIS Manager. When I tried to install the certificate through IIS, I kept receiving errors that IIS couldn't find my certificate request. I followed many tutorials and couldn't find a solution. I later found a page explaining that Comodo's intermediate certificates need to be installed through MMC before the purchased certificate can be installed in IIS. After I installed those certificates, IIS accepted mine and I proceeded to add SSL to my website. It's a shame those certificates weren't bundled in the original zip file; it would have made life a lot easier.

My next task is to add a rewrite rule that redirects all HTTP requests to HTTPS. I also want to write a resource handler that caches remote resources on my server so that all resources are served securely. Google Chrome notifies the user if some of the content displayed on a page is insecure, and I want to remedy this problem.

Take Screenshots of all HTML documents in a folder using PhantomJS

26. September 2011 01:14 by Cameron in javascript, PhantomJS, Programming

Recently I came across a question on Stack Overflow asking how to take screenshots of all HTML files in a local folder. I have been playing with PhantomJS quite a bit lately and felt comfortable answering the question. Here is the code for those interested:

var page = require('webpage').create(), loadInProgress = false, fs = require('fs');
var htmlFiles = new Array();
console.log('working directory: ' + fs.workingDirectory);
var curdir = fs.list(fs.workingDirectory);

// loop through files and folders
for(var i = 0; i< curdir.length; i++)
{
	var fullpath = fs.workingDirectory + fs.separator + curdir[i];
	// check if item is a file
	if(fs.isFile(fullpath))
	{
		if(fullpath.indexOf('.html') != -1)
		{
			// show full path of file
			console.log('File path: ' + fullpath);
			htmlFiles.push(fullpath);
		}
	}
}

console.log('Number of Html Files: ' + htmlFiles.length);

// output pages as PNG
var pageindex = 0;

var interval = setInterval(function() {
	if (!loadInProgress && pageindex < htmlFiles.length) {
		console.log("image " + (pageindex + 1));
		page.open(htmlFiles[pageindex]);
	}
	if (pageindex == htmlFiles.length) {
		console.log("image render complete!");
		phantom.exit();
	}
}, 250);

page.onLoadStarted = function() {
	loadInProgress = true;
	console.log('page ' + (pageindex + 1) + ' load started');
};

page.onLoadFinished = function() {
	loadInProgress = false;
	page.render("images/output" + (pageindex + 1) + ".png");
	console.log('page ' + (pageindex + 1) + ' load finished');
	pageindex++;
}

The process is quite simple. First, I loop through all items in the current working directory and check whether each one is a file with the .html extension. I add each HTML file's path to an array, which I then loop through to take the screenshots. A screenshot must be taken after the page has fully loaded so that it contains actual content rather than a blank image; this is done by rendering the image in the page.onLoadFinished callback. The application loop polls every 250ms and only opens the next page once the previous one has finished loading, so each page can load completely into the headless browser before the script moves on to the next one.
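To try it out, save the script in the folder containing your HTML files (as screenshots.js, say; the filename is up to you) and run it with the PhantomJS executable: phantomjs screenshots.js. The PNGs are written to images/output1.png, images/output2.png, and so on, relative to the working directory.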

Xbox Live Data

20. September 2011 14:32 by Cameron in Programming, Xbox Live

While my gaming social networking site, IGA: International Gamers' Alliance, is still in beta, I have been looking at ways to provide a richer experience for my users. Lately I've been working on a way to gather data from Xbox Live so that I can provide that content to my users on IGA. I used to gather data from a RESTful API, built on the official Xbox Live API, that Microsoft employee Duncan Mackenzie hosted on his website. However, his service is no longer available. While there is an official Xbox Live API, access to it is restricted to those who are in the Xbox Community Developer Program. Acceptance into the XBCDP is very limited at the moment, and it seems that only well-known companies with sponsors receive membership into the program.

While it would be very nice to get official access to the Xbox Live API, it may be a while before I can get into the program. My social networking site, IGA, is still in beta and has much to be done on the roadmap to completion. Currently I am the only developer on the project and I am also in school, so development is slow. Maybe once IGA is closer to completion, Microsoft will be more eager to accept me into the program. In the meantime, I have a solution for gathering data from Xbox Live.

There are a couple of places to get data from Xbox Live: the user's publicly available gamercard and the user's protected Xbox.com profile. Getting data from the public gamercard is very easy; one could write a parser in PHP, C#, or even jQuery to pull the different values out of the HTML elements on the page. Retrieving data from a user's Xbox.com profile requires a little more skill and resources. You cannot simply use cURL to log in to Xbox.com remotely, since it has anti-bot mechanisms that check the browser's user agent, cookies, and many other aspects that can't easily be manipulated with cURL. There is a remedy to this problem, however.

This past summer, while working on a project at work, I learned about a headless WebKit browser called PhantomJS from some co-workers. We needed something that could run without a GUI on a server and still manipulate the DOM of a webpage, and PhantomJS gave us exactly what we needed. After that project, it occurred to me that I could use PhantomJS together with jQuery to manipulate the DOM and screen scrape data from Xbox.com.
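As a rough sketch of the idea, a PhantomJS script can load a public gamercard page, inject jQuery, and read a few values out of the DOM. The gamercard URL, the local jquery.min.js path, and the selectors below are placeholders for illustration, not the real Xbox.com markup:

var page = require('webpage').create();

// placeholder gamercard URL -- substitute a real gamertag
page.open('http://gamercard.xbox.com/en-US/SomeGamertag.card', function (status) {
    if (status !== 'success') {
        console.log('Unable to load the gamercard page');
        phantom.exit(1);
        return;
    }
    // inject a local copy of jQuery so we can query the page's DOM easily
    page.injectJs('jquery.min.js');
    var info = page.evaluate(function () {
        // these selectors are made up for the example -- inspect the real
        // gamercard markup and adjust them accordingly
        return {
            gamertag: $('title').text(),
            gamerscore: $('.gamerscore').first().text()
        };
    });
    console.log(JSON.stringify(info));
    phantom.exit();
});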

I'm currently working on scripts to pull data from users' profiles, including their games, the achievements earned in each game, and other information not publicly available on their gamercards. Please understand, though, that screen scraping should only be done as a last resort, and making numerous requests per day is taxing on both ends. I will implement some sort of data caching that pulls new data on a schedule to limit bandwidth usage. I plan to release this code on my Git hosting when it is finished.

Thoughts on Windows 8 Developer Preview

20. September 2011 13:30 by Cameron in Windows 8

Last week I downloaded the Windows 8 Developer Preview in both the x64 and x86 editions. I first installed the x64 edition with the developer tools in a virtual machine in Oracle VirtualBox to get a feel for the operating system. After I had installed the DP, I immediately wanted to try out the new Metro UI applications. However, although the tiles were responding to my mouse clicks, they were not opening up in the virtual machine. I did some research to find out what might be the cause of this and learned that Metro UI apps require at least a 1024x768 screen resolution to run. I changed my screen resolution in my virtual machine to 1024x768 and voila, the Metro UI apps worked.

After getting a sense of the new Metro UI, I ventured into installing the 32-bit Windows 8 DP on my ASUS Eee PC T101MT Intel Atom netbook. The installation went smoothly as expected and I was brought into the Metro UI on login. My netbook has a 10.1 inch screen with a standard 1024x600 resolution, so I had to apply a registry hack to get support for a 1024x768 resolution. After applying the registry hack and loading the Eee PC resolution changer, I was able to get my netbook running at 1024x768. The higher resolution requires downscaling due to the smaller screen size, which makes things appear squished in the legacy UI. However, the Metro UI looks fairly nice at the higher resolution and most applications work well.

I'd say overall, I'm pretty pleased with the first Windows 8 Developer Preview public release. I'm looking forward to seeing if Microsoft will actually add support for my smaller screen resolution in the upcoming releases. It will be great to see new features unfold as updates for the new operating system arrive.

Hackintosh Computer Build

23. August 2011 16:27 by Cameron in Hackintosh

Last summer I built my first gaming rig with a somewhat substantial budget of about $1400. This was not my first computer build, but it was the first build I had spent my own money on. My goal was to build a powerful machine that was hardware compatible with Mac OS X Snow Leopard and later. Initially I had some issues with my graphics card, as the Fermi line wasn't supported by Apple until about halfway into Snow Leopard's life. In addition to OS X, I also boot Windows 7 Enterprise x64 and 64-bit Ubuntu 11.04 (Natty Narwhal). As a developer, I enjoy using different operating systems to code for various projects.

Here's my original build from last summer:
Mac OS X version 10.6 Snow Leopard
http://www.amazon.com/Mac-OS-Snow-Leopard-10-6/dp/B002KG02QO/ref=dp_cp_ob_sw_title_3
Intel Core i7 875K 2.93GHz Quad Core CPU
http://www.newegg.com/Product/Product.aspx?Item=N82E16819116368
MSI P55 GD65 Motherboard
http://www.newegg.com/Product/Product.aspx?Item=N82E16813130239
ADATA XPG Gaming Series 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 1600 (x 2)
http://www.newegg.com/Product/Product.aspx?Item=N82E16820211409
EVGA NVIDIA GTX 470 1280MB GDDR5
http://www.amazon.com/nVidia-GeForce-1280-PCI-Express-Video/dp/B003EM68MK (I originally bought this from Newegg. It has been discontinued there however)
1TB Samsung Spinpoint F3
http://www.newegg.com/Product/Product.aspx?Item=N82E16822152185
Seagate Momentus 5400.6 ST9500325AS 500GB 5400 RPM 8MB Cache 2.5" SATA 3.0Gb/s Internal Notebook Hard Drive -Bare Drive
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148371
80GB Excelstore  (already owned)

60GB Hitachi (already owned)

650W Thermaltake PSU
http://www.newegg.com/Product/Product.aspx?Item=N82E16817153116
Samsung Blu-Ray Combo
http://www.newegg.com/Product/Product.aspx?Item=N82E16827151199 (this drive has been discontinued on Newegg)
Cooler Master Hyper 212 Plus CPU Cooler
http://www.newegg.com/Product/Product.aspx?Item=N82E16835103065
Antec 200 Mid Tower case
http://www.newegg.com/Product/Product.aspx?Item=N82E16811129074

This summer, I made a few upgrades:
G.SKILL 16GB DDR3 1333 RAM
http://www.newegg.com/Product/Product.aspx?Item=N82E16820231417
850W Thermaltake PSU
Bought this in CompUSA (Tiger Direct) store
2 x OCZ 60GB SSD
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227542
Mac OS X Lion
Mac App Store

Installing Snow Leopard
I used iBoot Supported (10.3 kernel) to boot the retail Snow Leopard 10.6.0 installer and MultiBeast to install the DSDT. I installed the ALC889 kext from MultiBeast and the legacy AppleHDA kext to get audio. I had to use my 9500 GT to install Snow Leopard because I was getting kernel panics with my GTX 470 without the proper enabler installed. After Snow Leopard was installed, I installed the modified Fermi Chameleon bootloader to support my GTX 470. I also installed the JMicron 36xxx kext from kexts.com to get PATA support.

Installing Lion

Once Lion was released, I bought it from the Mac App Store and used tonymacx86's xMove to create a USB installer. I did a clean install of Lion on one of my OCZ SSDs: I no longer needed my Snow Leopard installation, I wanted to start fresh, and I wanted the performance boost of the new SSD. My GTX 470 also works without a hitch in Lion, as Apple has been supporting Fermi cards since about Snow Leopard 10.6.4.

Here are some older photos of my computer when I first built the system. I'll upload some newer photos soon.

Gitting started with Git

15. August 2011 00:25 by Cameron in Git

Git, created by Linus Torvalds, is a very high-quality version control system. It was originally written to manage the source tree of the Linux kernel. Torvalds didn't believe that pre-existing version control systems could do justice to the Linux kernel's source code, given its massive size and number of collaborators, so he created Git. If you are using another version control system for your projects, consider reading this: http://whygitisbetterthanx.com/

That website explains in full why Git is better than the other version control systems available.

Git is free and open source and is available for all platforms: Linux, Mac, Windows, Solaris, you name it.

First, install Git for your platform so you can start playing around with the different commands. Once you've installed Git, here are a few references to get you started:

http://book.git-scm.com/  

http://www.kernel.org/pub/software/scm/git/docs/  

Setting Up Git

In order to set up your environment for using a remote Git repository, be sure to run these commands:

$ ssh-keygen -t rsa -C "youremail@site.com"

This command creates a public/private key pair for SSH, which Git uses to secure the connection to remote servers. When asked where to save the key, press Enter to accept the default location. Then, when asked for a passphrase, leave it empty. Your screen should look like this:

Generating public/private rsa key pair.

Enter file in which to save the key (/home/cameron/.ssh/id_rsa): 

Created directory '/home/cameron/.ssh'.

Enter passphrase (empty for no passphrase): 

Enter same passphrase again: 

Your identification has been saved in /home/cameron/.ssh/id_rsa.

Your public key has been saved in /home/cameron/.ssh/id_rsa.pub.

After your public/private key pair has been set up, add your global user information:

$ git config --global user.name "Firstname Lastname"

$ git config --global user.email "your_email@youremail.com"

Now you are ready to clone a repository. If you run:

$ git clone git@git.tinksoft.net:test.git

a new directory named test will be created for the Git repository, and all of the files in the remote repository will be downloaded into it.

Git Command Basics

A few common Git operations are cloning repositories, committing, pushing, and pulling. If you've worked with Subversion before, "git clone" is like a Subversion checkout: it literally clones the remote repository in its current state into your local repository. However, "git commit" is not like a Subversion commit. When you commit to a Git repository, you are only committing to your local repository; nothing reaches the remote repository until you push. "git push" is the closest equivalent of a Subversion commit and sends your changes to the remote repository. On the first push, you need to run "git push origin <branch name>", which tells Git to push the specified branch to the remote named origin. After that first push, you can simply run "git push". If you switch branches later on, you just run the full command again with the new branch name. Similarly, "git pull" behaves like a Subversion update and pulls changes from the remote repository down into your locally cloned repository. The same applies to the first "git pull" as to the first "git push": Git needs to know which remote and branch to pull from.
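For example, assuming the branch is named master (an assumption; substitute your own branch name), the first round trip with the remote looks like this, after which a plain "git push" or "git pull" is enough:

$ git push origin master

$ git pull origin master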

One thing about pushing and pulling: if you are working in a team and multiple people are pushing to the remote repository, you may be required to pull before you can push out your changes. Don't worry, though; if your changes conflict with someone else's, your code will not be overwritten. Git has conflict resolution tools that let you choose which changes to accept. It is also good practice to always run "git status" before committing and pushing, so you can confirm that you are only committing files that should be committed. Remember that every local commit that hasn't been pushed yet will be sent to the remote repository when you push out your changes, so be sure to only push working code and not break the build for your team.

A few more advanced commands include "git branch <branch name>" (creates a branch from the repository's current state), "git merge -s ours <branch>" (records a merge with <branch> while keeping the current branch's content), and "git checkout <branch name>" (switches the working branch). Please be sure to read up on these commands so that you know how to use them correctly; in a project repository, you don't want to create unnecessary branches, merge branches incorrectly, or lose changes when switching branches. Another useful practice is to create a .gitignore file for your repository and list everything that Git should ignore in it, one entry per line. This is helpful if you don't want files such as database configurations to be pushed to your remote repository.
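A minimal .gitignore might look like the following (the entries are purely illustrative; list whatever your own project needs to keep out of the repository):

config/database.yml
*.log
build/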

For more information about git, be sure to read the references I listed above and also check out some books on git for a more in depth discussion.

Automatic builds in Jenkins from Git

11. August 2011 03:05 by Cameron in Continuous Integration, Git, Indefero, Jenkins

Today I discovered how to run automated builds from Git post-receive hooks. Git has a number of hooks that fire at various stages of the commit/push cycle. A full list of Git hooks can be found here: http://www.kernel.org/pub/software/scm/git/docs/githooks.html

I found a very nice ruby script that does the trick of triggering an automatic build in Jenkins here: http://lostechies.com/jasonmeridth/2009/03/24/adding-a-git-post-receive-hook-to-fire-off-hudson-ci-server/

Here is the script:

#!/usr/bin/env ruby
#
while (input = STDIN.read) != ''
    rev_old, rev_new, ref = input.split(" ")
    if ref == "refs/heads/master"

        url = "http://yourhudsondomain.com/job/job_name_here/build?delay=0sec"

        puts "Run Hudson build for job_name_here application"
        `wget #{url} > /dev/null 2>&1`
    end
end

I'm sure you could write a bash script to do the same thing if you wanted to, but the original author preferred to use Ruby.
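For instance, a rough bash version of the same hook (an untested sketch, assuming the same Jenkins/Hudson build-trigger URL as the Ruby script) might look like this:

#!/bin/bash
# post-receive reads one line per pushed ref from stdin: <old-rev> <new-rev> <ref-name>
while read rev_old rev_new ref; do
    if [ "$ref" = "refs/heads/master" ]; then
        echo "Run Hudson build for job_name_here application"
        wget "http://yourhudsondomain.com/job/job_name_here/build?delay=0sec" > /dev/null 2>&1
    fi
done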

I'm glad that automatic builds finally work; I struggled with this issue for quite some time because I was looking in the wrong place. The web interface that I use for Git, Indefero, has a place for post-commit hook web URLs, but post-commit hooks don't behave the same way in Git as they do in Subversion. I didn't want to trigger builds on each local commit, but rather when someone pushes their commits to the server. Also, if you had SCM polling enabled for your Jenkins job, you no longer need it once the post-receive hook is configured.

Separation of Concerns

9. August 2011 17:57 by Cameron in Programming

A good programmer makes sure to maintain proper separation of concerns while coding an application. This makes the source code much more manageable to maintain and prevents the application from becoming one large function that does everything. Back in the days before object-oriented, and even procedural, programming, it was difficult to separate the functionality of one part of an application from another.

With procedural programming in a language like BASIC, many of you might remember the GOTO statement, quite possibly the worst programming language mechanism ever conceived. Using GOTO statements made application maintenance quite a challenge; people should never have to manage program flow manually that way. GOTO behaves essentially like a JUMP instruction in assembly. In assembly, however, jump instructions are required, as there is no other way to control program flow.

Languages such as C, and later versions of Microsoft QuickBASIC and QBasic, provide the ability to call functions from a main function, which was a huge improvement in programming history. This made it possible to separate business logic from database and filesystem logic, and thus was the beginning of better code.

With the continuing popularity of object-oriented programming, separation of concerns has improved dramatically beyond what procedural programming achieved. Programmers can separate an application's functionality into objects that represent its various parts. For instance, as part of a user authentication system, one might create a user class that is instantiated and passed to the user data access object, the object that handles all of the low-level database interactions.
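As a small JavaScript sketch of that idea (the db object, its insert method, and the field names are placeholders, not a real database API):

// A plain data object representing a user. It knows nothing about storage.
function User(username, passwordHash) {
    this.username = username;
    this.passwordHash = passwordHash;
}

// The data access object handles all of the low-level persistence details.
function UserDataAccess(db) {
    this.db = db;
}

UserDataAccess.prototype.save = function (user) {
    // only this object knows how and where users are stored
    this.db.insert('users', { username: user.username, passwordHash: user.passwordHash });
};

// Business logic just creates a User and hands it off, without caring about storage.
var dao = new UserDataAccess(someDatabaseConnection); // someDatabaseConnection is assumed to exist
dao.save(new User('cameron', 'hashed-password'));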

Using object-oriented design, applications are clearly divided into objects that each serve their own purpose while contributing to the same end goal: a finished product. While better approaches than object-oriented design may become evident in the future, it is clearly one of the best ways of modelling the real world in a virtual environment. People think in terms of tangible items and enjoy representing parts of an application as objects. It will be interesting to see how the industry develops in the next 10 years and how design paradigms shift.

My thoughts on continuous integration

8. August 2011 17:29 by Cameron in Continuous Integration

Whether the choice is SVN, Git, or another version control system, I believe it is vital for software development groups to have a central source repository. My preferred version control system is Git, as it provides huge improvements over Subversion and lets you work with a local repository without affecting the remote one. With each commit or push to a team's central repository, it is important to check that the latest change doesn't break the main build. This is where continuous integration comes into play.

Continuous integration servers attempt to build the source that is committed or pushed to the central repository, and if the build passes, the changes can be integrated into the main development branch. Some continuous integration servers will even push the code to production once it passes a series of unit tests and builds with the rest of the main development branch. If the build fails, however, the code is not integrated and the failed build is logged for review. That logging helps immensely with finding and fixing bugs quickly so that the main branch can eventually accept the original developer's changes, saving developers the time and frustration of sifting through thousands of lines of code. Why should anyone manually hunt for a bug if a computer can analyze the source code and find it for you?

With various continuous integration servers, project maintainers can view statistics such as commit/build success rates and code redundancy. In general practice, developers should never just copy and paste code. This is not coding; it's laziness. Usually, if you are copying and pasting code, you can refactor so that only one copy of the logic you were about to duplicate exists. There's no point having duplicate code in a project if you can help it. One reason copy-and-paste is a bad idea is that the code base becomes harder to maintain. Another is that copied code doesn't necessarily work everywhere you paste it; just because it works in one place doesn't mean it will work as expected in another. A simple example of this kind of refactoring is shown below.
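For instance (a made-up JavaScript example; the function names and the 10% discount are invented purely for illustration), instead of pasting the same summing loop into two functions, the shared logic can live in one helper:

// Shared helper: the summing-and-discount logic lives in exactly one place.
function discountedTotal(prices) {
    var total = 0;
    for (var i = 0; i < prices.length; i++) {
        total += prices[i];
    }
    return total * 0.9; // apply a 10% discount
}

// Both callers reuse the helper instead of carrying their own pasted copy of the loop.
function printInvoiceTotal(prices) {
    console.log('Invoice total: ' + discountedTotal(prices));
}

function printQuoteTotal(prices) {
    console.log('Quote total: ' + discountedTotal(prices));
}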

With all of the benefits of continuous integration, I believe that every software development team should have some form of it to track their projects, whether they are an open source shop or a Microsoft shop. Continuous integration really supports the agile software development cycle, and everyone should enjoy the advantages it provides. I can definitely say that for any collaborative personal software project I work on, I will make sure continuous integration is a key part of the development process.

 
