Thursday, 26 September 2013

Scraping Amazon.com with Screen Scraper

Let’s look how to use Screen Scraper for scraping Amazon products having a list of asins in external database.

Screen Scraper is designed to be interoperable with all sorts of databases and web-languages. There is even a data-manager that allows one to make a connection to a database (MySQL, Amazon RDS, MS SQL, MariaDB, PostgreSQL, etc), and then the scripting in screen-scraper is agnostic to the type of database.

Let’s go through a sample scrape project you can see it at work. I don’t know how well you know Screen Scraper, but I assume you have it installed, and a MySQL database you can use. You need to:

    Make sure screen-scraper is not running as workbench or server
    Put the Amazon (Scraping Session).sss file in the “screen-scraper enterprise edition/import” directory.
    Put the mysql-connector-java-5.1.22-bin.jar file in the “screen-scraper enterprise edition/lib/ext” directory.
    Create a MySQL database for the scrape to use, and import the amazon.sql file.
    Put the amazon.db.config file in the “screen-scraper enterprise edition/input” directory and edit it to contain proper settings to connect to your database.
    Start the screen scraper workbench

Since this is a very simple scrape, you just want to run it in the workbench (most of the time you want to run scrapes in server mode). Start the workbench, and you will see the Amazon scrape in there, and you can just click the “play” button.

Note that a breakpoint comes up for each item. It would be easy to save the scraped details to a database table or file if you want. Also see in the database the “id_status” changes as each item is scraped.

When the scrape is run, it looks in the database for products marked “not scraped”, so when you want to re-run the scrapes, you need to:

UPDATE asin
SET `id_status` = 0

Have a nice scraping! ))

P.S. We thank Jason Bellows from Ekiwi, LLC for such a great tutorial.


Source: http://extract-web-data.com/scraping-amazon-com-with-screen-scraper/

No comments:

Post a Comment