How to Scrape Prices from Online Stores with Netpeak Spider


Data scraping is labor-intensive and time-consuming, but extensive competitive analysis is impossible without it. You can simplify the process by automating data scraping and export with the ‘Custom Search’ function in Netpeak Spider.

In this post we will show you how to collect, filter and export prices from competitors' online stores.

However, custom search is not limited to scraping prices. You can extract many other types of data, except data that is protected in the website code.

1. How Custom Search Works

‘Custom Search’ is a Netpeak Spider feature that lets you search a website and extract the data you need from its pages. There are four types of search:

  • ‘Includes’ (‘only search’)
  • ‘RegExp’ (‘search and extraction’)
  • ‘CSS Selector’ (‘search and extraction’)
  • ‘XPath’ (‘search and extraction’)

We will use the ‘XPath’ type, which is the one most often used for scraping data from online stores. However, the right type of search depends on the website structure.

You can find detailed information about each type of search in the function overview.
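
To make the difference between the search types more tangible, here is a minimal Python sketch that imitates three of them on a made-up page fragment. It is only an illustration, not how Netpeak Spider works internally: the markup, class names, and prices are assumptions, and ‘CSS Selector’ is left out because the Python standard library has no CSS selector engine.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page fragment (not from a real store).
PAGE = """
<html><body>
  <h1>Air Jordan 1</h1>
  <div id="product">
    <span class="price">$170.00</span>
    <span class="old-price">$200.00</span>
  </div>
</body></html>
"""

# 'Includes' (only search): counts how often a substring occurs in the source.
includes_count = PAGE.count("price")

# 'RegExp' (search and extraction): returns every substring matching a pattern.
regexp_matches = re.findall(r"\$\d+\.\d{2}", PAGE)

# 'XPath' (search and extraction): selects elements by their place in the tree.
# ElementTree supports only a subset of XPath, so the path is written relatively.
root = ET.fromstring(PAGE)
xpath_matches = [el.text for el in root.findall(".//span[@class='price']")]

print(includes_count, regexp_matches, xpath_matches)
```

Note how the regular expression picks up both the current and the old price, while the XPath expression targets only the element with the `price` class. This is one reason why XPath is convenient for store templates.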

2. How to Find the Element for Search

Before parsing, define the data you need. In our case, it’s the price.

To find this element in the source code and copy its XPath, you need to:

  1. Open a product page.
  2. Find the price and hover over it.
  3. Right-click on it and choose ‘Inspect’ in the context menu.
  4. Find the price element in the panel that opens (the page will highlight the element your cursor is hovering over).
  5. Right-click on the element and select ‘Copy’ → ‘Copy XPath’.

How to copy XPath to parse prices in online stores

Usually, it’s enough to copy the XPath from a single product page to scrape prices from the entire website or a category. Be aware that this works only if all product pages share the same template.
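
The template caveat can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the page fragments, the paths, and the XPath itself, which is written in the relative form that Python's `xml.etree.ElementTree` accepts rather than in the exact form the browser copies.

```python
import xml.etree.ElementTree as ET

# Assumed XPath, similar in spirit to one copied from a product page.
PRICE_XPATH = ".//div[@id='product']/span[@class='price']"

# Hypothetical pages: the first two share a template, the third does not.
PAGES = {
    "/collections/air-jordan/a":
        "<html><body><div id='product'>"
        "<span class='price'>$170.00</span></div></body></html>",
    "/collections/air-jordan/b":
        "<html><body><div id='product'>"
        "<span class='price'>$190.00</span></div></body></html>",
    "/pages/sale-item":
        "<html><body><p class='cost'>$99.00</p></body></html>",
}


def extract_price(html):
    """Return the price text, or None when the page uses another template."""
    node = ET.fromstring(html).find(PRICE_XPATH)
    return node.text if node is not None else None


results = {url: extract_price(html) for url, html in PAGES.items()}
for url, price in results.items():
    print(url, price)
```

The page with a different template yields no price at all, which is what you should expect in the crawl results when the copied XPath does not fit every product page.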

3. Setting and Launching the Crawling

3.1. Setting Custom Search

You can perform the custom search in a few simple steps:

  1. Choose the ‘Entire Website’ mode in the crawling mode settings.
  2. Go to the ‘Custom search’ tab in the ‘Crawling Settings’ window.
  3. Turn on the ‘Use custom search’ option.
  4. Choose the ‘XPath’ type of search and enter the expression you’ve copied in the ‘Search expressions’ box. Then choose the ‘Data extraction’ mode → ‘Inner text’. If you need to scrape several types of data in addition to the main price (old price, discount, quantity, etc.), you can add more custom search flows, up to 15 simultaneous searches.
  5. Name each search flow so that you don’t get lost in the final search results. In our case, there is only one search flow, which extracts the price.
  6. Go to the ‘Rules’ tab.

How to scrape prices from online stores: Crawling settings of 'Custom Search and Extraction' in Netpeak Spider
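
As a rough illustration of steps 4 and 5, the sketch below runs two named search flows over a made-up page and returns the text of each match. It assumes that ‘Inner text’ means the text content of the matched element with nested tags stripped. The flow names, markup, and XPath expressions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product page where the currency sign sits in a nested tag.
PAGE = """<html><body><div id="product">
  <span class="price"><b>$</b>170.00</span>
  <span class="old-price"><b>$</b>200.00</span>
</div></body></html>"""

# Named search flows: name -> XPath expression (both are assumptions).
SEARCH_FLOWS = {
    "Price": ".//span[@class='price']",
    "Old price": ".//span[@class='old-price']",
}


def inner_text(element):
    """Join all text inside the element, ignoring the nested tags."""
    return "".join(element.itertext()).strip()


root = ET.fromstring(PAGE)
extracted = {
    name: [inner_text(el) for el in root.findall(xpath)]
    for name, xpath in SEARCH_FLOWS.items()
}
print(extracted)
```

Giving each flow its own name keeps the main price and the old price apart in the output, which is the same reason for naming search flows in the settings.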

3.2. Rules Setting and Scraping Launch

If you need information about all products on the website, or you can’t single out one category (the URLs have no feature or element in common), use the custom search parameters from section 3.1 and crawl the entire website with the default Netpeak Spider settings.

General Netpeak Spider crawling settings to scrape prices from online stores

If the website’s URLs look like site.com/category/product or site.com/category-product, you can set custom rules.

In our case, we are interested in Air Jordan brand products with URLs beginning with shop.bdgastore.com/collections/air-jordan/.

URL example to set crawling rules in Netpeak Spider for price scraping

‘Rules’ in ‘Crawling settings’ let you limit the crawling results to specific pages. It means that you will see only the pages that match your custom rules (in our case, all URLs must begin with shop.bdgastore.com/collections/air-jordan/).

There are two ways to do it: include pages of a particular type or exclude the categories you’re not interested in. This is how you can do it:

  1. Choose the ‘Include’ or ‘Exclude’ option.
  2. Choose the matching type (in our case, it’s ‘Begins with’).
  3. Enter the common element of all URLs you are interested in (or have absolutely no interest in) into the field below. If you want to set several rules at the same time, you can choose the ‘and’/‘or’ logic for combining them. It will determine how the crawler filters pages.
  4. Apply the settings and launch crawling.

Setting of crawling rules in Netpeak Spider to scrape prices from online stores

As a result of the crawling rules, Netpeak Spider will show you only the pages that meet your requirements.
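
The filtering logic described above can be sketched in a few lines of Python. This is an approximation under stated assumptions, not Netpeak Spider's actual implementation: the `Rule` class and `filter_urls` function are hypothetical, the `https://` scheme and the product paths are made up, and the way ‘and’/‘or’ combine several rules is our reading of the setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str  # "include" or "exclude"
    prefix: str  # value for the 'Begins with' matching type

    def passes(self, url):
        """True if the URL is allowed through by this single rule."""
        matched = url.startswith(self.prefix)
        return matched if self.action == "include" else not matched


def filter_urls(urls, rules, logic="and"):
    """Keep URLs that pass all rules ('and') or at least one rule ('or')."""
    combine = all if logic == "and" else any
    return [url for url in urls if combine(rule.passes(url) for rule in rules)]


BASE = "https://shop.bdgastore.com"
urls = [
    BASE + "/collections/air-jordan/products/example-sneaker",
    BASE + "/collections/nike/products/example-runner",
    BASE + "/pages/about",
]

# Way 1: include only the category we are interested in.
included = filter_urls(urls, [Rule("include", BASE + "/collections/air-jordan/")])

# Way 2: exclude a section we have no interest in.
not_pages = filter_urls(urls, [Rule("exclude", BASE + "/pages/")])

print(included)
print(not_pages)
```

The include rule leaves only the Air Jordan URL, while the exclude rule drops the service page and keeps both product URLs.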

In the rightmost column of the main results table (by default, it’s called ‘XPath’, but we’ve named it ‘Price’) you will find the custom search results. It shows the number of matches of the expression on each URL (1 product = 1 match). You can find the results of the custom search in the right panel of the main window, on the ‘Search’ tab.

Custom search results of scraping prices on a 'Search' tab in Netpeak Spider

Note that it’s better to turn off image analysis on the ‘General’ tab of the crawling settings. Product URLs and image URLs sometimes look almost identical, so images can also end up in the list of crawled pages.

4. Data Exporting

To get a summary table that contains only the URLs and the price information you need, follow these steps:

  1. Open the ‘Search’ tab in the right panel of the main window.
  2. Click the ‘All’ button to show all custom search results.
  3. Filter the obtained data if necessary.
  4. Click the ‘Export’ button to save the search results.

'All' button to show all custom search results of price scraping in Netpeak Spider

Exporting search results of price scraping from Netpeak Spider
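
If you later want to process such a table outside the tool, the filter-and-export step could look roughly like this in Python. The rows, the column names, and the `export_prices` helper are assumptions made for the example, and the actual layout of a Netpeak Spider export file may differ.

```python
import csv
import io

# Hypothetical search results: (URL, extracted price). An empty price stands
# for a page where the expression found nothing.
rows = [
    ("https://shop.bdgastore.com/collections/air-jordan/products/example-a", "$170.00"),
    ("https://shop.bdgastore.com/collections/air-jordan/products/example-b", ""),
    ("https://shop.bdgastore.com/collections/air-jordan/products/example-c", "$190.00"),
]


def export_prices(rows, out):
    """Write a URL/price table as CSV, skipping rows without a price."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["URL", "Price"])
    for url, price in rows:
        if price:  # the 'filter' step: keep only pages with a price
            writer.writerow([url, price])


buffer = io.StringIO()  # swap for open("prices.csv", "w", newline="") to save a file
export_prices(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```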

Summary

To scrape data from competitors’ websites, perform the following steps in order:

  1. Determine the element for custom search (in our case, it’s the price).
  2. Copy its XPath.
  3. Set custom search parameters.
  4. Set custom crawling rules.
  5. Launch crawling.
  6. Export obtained data.

Using this method, you can scrape not only prices but many other types of data. Therefore, the ‘Custom search’ function can be useful for digital and content marketers, SEO specialists, webmasters, sales managers, etc.

Do you use this function in your work? If so, for what purpose?

Share your experience in the comments below and don’t hesitate to ask any questions about the custom search. We’ll be glad to answer them! :)
