How to download a file using Capybara and Poltergeist - Ruby


In this tutorial, I explain how I managed to use Capybara and Poltergeist to download any file, provided you can identify the URL of the file you want to download.



In one of my recent projects, I was working on a scraper that needed to log in to a website and download a file, which I would then save for later use. Since the website depended entirely on JavaScript, I decided to write the scraper using Capybara with Poltergeist as the JavaScript driver.

I quickly ran into an obstacle: PhantomJS does not support file downloads, and since Poltergeist is a wrapper around PhantomJS, this functionality was not available out of the box. As I came to discover, support for downloads was also unlikely to be added to PhantomJS.

So, as always when there is no ready-made solution and switching to another library is not an option, you have to hack your way around the problem. I figured that if I knew the download link, I could request that URL via JavaScript, retrieve the binary file, base64-encode it, and then decode it in Ruby so I could save it for later.
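To see why the base64 step matters, here is a small stdlib-only Ruby illustration (my own example, not part of the original scraper). The first eight bytes of a PNG file are not valid UTF-8, so they cannot safely travel as an ordinary text string. Their base64 form, however, is plain ASCII and decodes back to exactly the same bytes. It uses Array#pack("m0") and String#unpack1("m"), the core methods behind Base64.strict_encode64 and Base64.decode64.

```ruby
# Illustration only: why binary data is base64-encoded before it is handed
# from the browser to Ruby.

# The 8-byte PNG signature; 0x89 is not a valid first byte in UTF-8.
png_header = "\x89PNG\r\n\x1A\n".b

as_text = png_header.dup.force_encoding(Encoding::UTF_8)
puts as_text.valid_encoding?   # prints false: not safe to treat as text

encoded = [png_header].pack("m0")  # strict base64, like Base64.strict_encode64
puts encoded                       # prints iVBORw0KGgo=
puts encoded.ascii_only?           # prints true: safe to pass as a string

decoded = encoded.unpack1("m")     # like Base64.decode64
puts decoded == png_header         # prints true: every byte survived
```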

I achieved this in the following steps (credit to StackOverflow):

Step 1

The first thing I did was write a JavaScript function that downloads the binary file:

getBinary.js

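The getBinary.js gist itself is not reproduced on this page. The following is a hedged reconstruction of the approach commonly cited on StackOverflow, wrapped in a Ruby constant so it can be injected with page.execute_script. The constant name GET_BINARY_JS is hypothetical. Defining the function on window is my own assumption, meant to keep it global in case the driver wraps injected code in a function. The key trick is overrideMimeType with the x-user-defined charset, which stops the browser from decoding the response bytes as text.

```ruby
# Hypothetical reconstruction of Step 1 (not the original gist).
# Inject into the page with: page.execute_script(GET_BINARY_JS)
GET_BINARY_JS = <<~'JS'
  // Assigned to window so it stays global even if the injected code
  // is wrapped in a function by the driver.
  window.getBinary = function (url) {
    var xhr = new XMLHttpRequest();
    // false = synchronous, so the data is returned directly to the caller.
    // Like any XHR from the page, this is subject to the same-origin policy.
    xhr.open("GET", url, false);
    // With the x-user-defined charset each byte becomes one character whose
    // low 8 bits equal the original byte, so nothing is lost in decoding.
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send(null);
    if (xhr.status !== 200) {
      throw new Error("download failed with HTTP status " + xhr.status);
    }
    return xhr.responseText;
  };
JS
```

Injected globals disappear when the page navigates, so the script would need to be injected again after each page load, before getBinary is called.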

Step 2

In order to save the file in Ruby, I needed a reliable way to pass the binary data to Ruby so that it stayed intact during transport and the file was not corrupted. I achieved this with the following JavaScript function:

base64encode.js

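The base64encode.js gist is missing as well. Below is a sketch of what such an encoder can look like, assuming it receives the x-user-defined string returned by getBinary. The JavaScript, stored in the hypothetical constant BASE64_ENCODE_JS, keeps only the low 8 bits of each character with `& 0xff`, which recovers the original byte, and then applies standard base64. Because the JavaScript cannot run here, the block also mirrors the same loop in Ruby and checks it against Ruby's own encoder. simulate_x_user_defined (hypothetical) models how that charset maps bytes 0x80 to 0xFF onto U+F780 to U+F7FF.

```ruby
# Hypothetical sketch of Step 2 (not the original gist).
# Inject with: page.execute_script(BASE64_ENCODE_JS)
BASE64_ENCODE_JS = <<~'JS'
  // Assumes `str` comes from getBinary(): each character's low 8 bits
  // hold one byte of the file.
  window.base64Encode = function (str) {
    var CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var out = "";
    for (var i = 0; i < str.length; i += 3) {
      var has2 = i + 1 < str.length, has3 = i + 2 < str.length;
      // & 0xff drops the high bits added by the charset, leaving the raw byte.
      var b1 = str.charCodeAt(i) & 0xff;
      var b2 = has2 ? str.charCodeAt(i + 1) & 0xff : 0;
      var b3 = has3 ? str.charCodeAt(i + 2) & 0xff : 0;
      out += CHARS.charAt(b1 >> 2);
      out += CHARS.charAt(((b1 & 0x03) << 4) | (b2 >> 4));
      out += has2 ? CHARS.charAt(((b2 & 0x0f) << 2) | (b3 >> 6)) : "=";
      out += has3 ? CHARS.charAt(b3 & 0x3f) : "=";
    }
    return out;
  };
JS

# Ruby mirror of the JavaScript loop, used only to check the algorithm.
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_encode_like_js(str)
  out = +""
  str.each_char.map(&:ord).each_slice(3) do |slice|
    b1, b2, b3 = slice.map { |code| code & 0xff }  # nil when past the end
    out << BASE64_CHARS[b1 >> 2]
    out << BASE64_CHARS[((b1 & 0x03) << 4) | ((b2 || 0) >> 4)]
    out << (b2 ? BASE64_CHARS[((b2 & 0x0f) << 2) | ((b3 || 0) >> 6)] : "=")
    out << (b3 ? BASE64_CHARS[b3 & 0x3f] : "=")
  end
  out
end

# Models x-user-defined decoding: bytes 0x00-0x7F stay as-is,
# bytes 0x80-0xFF become code points U+F780-U+F7FF.
def simulate_x_user_defined(bytes)
  bytes.each_byte.map { |b| b < 0x80 ? b : 0xF700 + b }.pack("U*")
end
```

A hand-written loop is used here instead of window.btoa because btoa rejects characters above U+00FF, which is exactly what x-user-defined produces for bytes 0x80 and up. Masking each character first and then calling btoa would be an alternative.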

Step 3

The final step was to put everything together in a Ruby method, which I called download. It expects a valid URL as its argument: the URL of the file you wish to download. How you obtain this URL is up to you and depends on the requirements of your case.

Capybara gives you two different methods for executing JavaScript:

page.evaluate_script("$('input').focus()")
page.execute_script("$('input').focus()")

The difference between the two is that evaluate_script returns the result of the script, converted back into Ruby objects. This is exactly what I needed. execute_script, by contrast, always returns nil and is useful when you don't care about the return value.

download.rb

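Here is a minimal sketch of such a download method (not the original download.rb). Unlike the one-argument version described above, it takes page and a destination path as explicit parameters, so it does not depend on Capybara::DSL being included. Both extra parameters are my assumption. It also assumes that functions named getBinary and base64Encode (as in Steps 1 and 2; the second name is a guess based on the gist's file name) are already defined in the browser. It uses evaluate_script so the base64 string comes back to Ruby, and it writes the decoded bytes in binary mode.

```ruby
require "json"

# Hypothetical sketch of Step 3 (not the original gist).
# Assumes `page` responds to evaluate_script (e.g. Capybara's `page`) and
# that getBinary()/base64Encode() were injected earlier with execute_script.
def download(page, url, path)
  # to_json yields a properly quoted JavaScript string literal for the URL.
  script = "base64Encode(getBinary(#{url.to_json}))"
  # evaluate_script, not execute_script, because we need the return value.
  encoded = page.evaluate_script(script)
  raise "nothing returned for #{url}" if encoded.nil?

  # unpack1("m") decodes base64 (like Base64.decode64); binwrite keeps the
  # bytes exactly as they are, with no newline or encoding conversion.
  File.binwrite(path, encoded.unpack1("m"))
  path
end
```

In a Capybara session this might be called as `download(page, file_url, "report.pdf")`, where the file name is only an example.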

That's it! You should be able to integrate this into your Ruby project to download any file using Capybara and Poltergeist. Note, however, that you may need to adapt the download method to fit your case.

Pro Tip

In my case, the download URL was provided dynamically through an Ajax call on the page I was crawling, so I needed a way to capture it. Luckily, Poltergeist lets you inspect network traffic, just as you would in a normal browser. I used Ruby's loop method to wait until the URL became available:

sniff_file_url.rb

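The sniff_file_url.rb gist is also missing. Here is a minimal sketch of the idea. It relies on Poltergeist's page.driver.network_traffic, which returns the requests the page has made, each exposing a url. The method name wait_for_file_url, the pattern argument, and the timeout and polling interval are all illustrative. I added a timeout so the loop cannot spin forever if the Ajax call never happens.

```ruby
# Hypothetical sketch of the Pro Tip (not the original gist).
# `driver` is assumed to behave like Poltergeist's page.driver, whose
# network_traffic returns request objects that respond to #url.
def wait_for_file_url(driver, pattern, timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # Look through every request recorded so far for the file URL.
    request = driver.network_traffic.find { |req| pattern.match?(req.url) }
    return request.url if request

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "no request matching #{pattern.inspect} within #{timeout}s"
    end
    sleep interval  # give the Ajax call time to fire
  end
end
```

In a Poltergeist session this would be called as `wait_for_file_url(page.driver, /\.pdf/)` after triggering the Ajax call. The regular expression is only an example, and the returned URL is what the download method needs.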

That's it, guys! I hope this tutorial helps anyone looking to handle downloads with the Capybara Poltergeist driver. Have any questions? Feel free to ask me in the comments section below. Happy coding!



