20 Jun 18
In this tutorial, I explain how I used Capybara and Poltergeist to download a file, provided you can identify the URL of the file you wish to download.
In one of my recent projects, I was working on a scraper that needed to log in to a website and download a file, which I would then save for later use. Since the website was totally dependent on JavaScript being active, I decided to write the scraper using Capybara with Poltergeist as the JavaScript driver.
I quickly ran into an obstacle: PhantomJS does not support file downloads, and since Poltergeist is a wrapper around PhantomJS, I was out of luck getting this functionality out of the box. As I came to discover, support for it was also unlikely to be added to PhantomJS.
So, like always, when there is no outright solution and using another library is not an option, you have to hack your way around it. I figured that if I knew the download link, I'd be able to request the URL via JavaScript, retrieve the binary file, Base64-encode it, and then decode it in Ruby so I could save it for later.
I achieved this in the following steps. Credit goes to Stack Overflow.
The first thing I did was write a JavaScript function to download the binary file.
In order to save the file in Ruby, I needed a way to pass the binary data to Ruby that kept it intact during transport, so the file would not be corrupted. I achieved this by Base64-encoding the data in a JavaScript function.
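To illustrate why Base64 is a safe way to move the bytes across, here is a minimal Ruby sketch of the round trip. The sample bytes are made up for the example; the point is only that an ASCII-only encoded string decodes back to exactly the same binary data.

```ruby
require "base64"

# Made-up sample bytes that a plain-text transport could easily mangle
# (a NUL byte, high bytes, CR/LF).
original = [0, 255, 137, 80, 78, 71, 13, 10, 26, 10].pack("C*")

# Base64 output only contains ASCII characters, so it can be passed
# around as an ordinary string.
encoded = Base64.strict_encode64(original)

# Decoding restores the exact original bytes.
decoded = Base64.decode64(encoded)

puts encoded
puts decoded == original
```

On the Ruby side, only the decoding half is needed; the encoding happens in the browser.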
The final step was to put everything together in a Ruby method. I called it download. It expects a valid URL as an argument: the URL of the file you wish to download. How you get this URL is up to you, depending on the requirements of your case.
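Here is a minimal sketch of what such a download method might look like, not necessarily the exact original. It makes a few assumptions: the Capybara session is passed in as page, the destination argument is a hypothetical path to write to, the JavaScript uses a synchronous XMLHttpRequest, and window.btoa is available in the browser. The URL also has to be reachable from the page currently loaded (same origin, or allowed by the server).

```ruby
require "base64"
require "json"

# Downloads `url` through the browser session and writes it to `destination`.
# `page` is expected to respond to evaluate_script (e.g. a Capybara session).
def download(page, url, destination)
  # JavaScript executed in the page: fetch the file synchronously, keep every
  # byte intact via the x-user-defined charset, and return it Base64-encoded.
  script = <<~JS.strip
    (function (url) {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url, false);
      xhr.overrideMimeType('text/plain; charset=x-user-defined');
      xhr.send(null);
      if (xhr.status !== 200) { return null; }
      var raw = xhr.responseText, bytes = '';
      for (var i = 0; i < raw.length; i++) {
        bytes += String.fromCharCode(raw.charCodeAt(i) & 0xff);
      }
      return window.btoa(bytes);
    })(#{url.to_json})
  JS

  encoded = page.evaluate_script(script)
  raise "Download failed for #{url}" if encoded.nil?

  # Decode back to the original bytes and write them in binary mode.
  File.binwrite(destination, Base64.decode64(encoded))
  destination
end

# Usage inside a Capybara/Poltergeist session (file name is illustrative):
#   download(page, file_url, "report.pdf")
```

Passing the URL through to_json keeps quotes and other special characters from breaking the generated JavaScript.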
Capybara gives you two different methods for executing JavaScript:
page.evaluate_script("$('input').focus()")
page.execute_script("$('input').focus()")
The difference between the two is that evaluate_script will always return a result, and the return value is converted back to Ruby objects. This is exactly what I needed. By contrast, execute_script always returns nil and is useful in cases where you don't care about the return value.
That's it! You should be able to integrate this into your Ruby project to download any file using Capybara and Poltergeist. Note, however, that you may need to modify a few things in the download method to make it work for your case.
In my case, the download URL was provided dynamically through an Ajax call on the page I was crawling, so I needed a way to get hold of it. Luckily, Poltergeist has a method to inspect network traffic, just as you would in a normal browser. I used Ruby's loop method to keep checking until the URL became available.
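A minimal sketch of such a waiting loop is below. It relies on Poltergeist's page.driver.network_traffic, which returns the requests recorded so far. The method name wait_for_download_url, the pattern argument, and the timeout and interval values are assumptions made for this example; the timeout is there so the loop cannot spin forever if the request never shows up.

```ruby
# Polls the recorded network traffic until a request URL matches `pattern`.
# Raises if nothing matches within `timeout` seconds.
def wait_for_download_url(page, pattern, timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    # Each recorded request exposes the URL it was sent to.
    request = page.driver.network_traffic.find { |r| pattern.match?(r.url) }
    return request.url if request

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "No request matching #{pattern.inspect} within #{timeout}s"
    end

    sleep interval
  end
end

# Usage inside a Capybara/Poltergeist session (pattern is illustrative):
#   file_url = wait_for_download_url(page, /export/)
```

The returned URL can then be handed to the download method.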
That's it, guys! I hope this tutorial was helpful to anyone looking to download files using Capybara's Poltergeist driver. Have any questions? Feel free to ask me in the comments section below. Happy coding!