How to Scrape Prices from Online Stores with Netpeak Spider
Data scraping is a labor-intensive and time-consuming procedure, but you can’t do any extensive competitive analysis without it. You can simplify the process by automating data scraping and export with the Netpeak Spider ‘Custom Search’ function.
In this post, we will show you how to collect, filter, and export prices from competitors' online stores.
However, custom search is not limited to scraping prices. You can extract other types of data as well, except for data that is protected in the website code.
1. How Custom Search Works
‘Custom Search’ is a Netpeak Spider feature that lets you search a website and extract the data you need from its pages. There are four types of search:
- ‘Includes’ (‘only search’)
- ‘RegExp’ (‘search and extraction’)
- ‘CSS Selector’ (‘search and extraction’)
- ‘XPath’ (‘search and extraction’)
We will use the ‘XPath’ type, which is the one most often used for scraping data from online stores. However, the right type of search depends on the website structure.
You can find detailed information about each type of search in the function overview.
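To make the difference between the search types more concrete, here is a rough Python sketch of what each one does to a page's source code. It is not how Netpeak Spider is implemented: the HTML fragment, the function names, and the expressions are made up for illustration. Only the standard library is used, so the ‘CSS Selector’ type is left out (Python has no built-in CSS selector engine), and the XPath expression is limited to the subset that `xml.etree.ElementTree` understands.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page fragment (not taken from a real store).
PAGE = """
<html><body>
  <h1>Sample Sneaker</h1>
  <span class="price">$190.00</span>
</body></html>
""".strip()

def includes(source, text):
    """'Includes': only counts occurrences of the text, extracts nothing."""
    return source.count(text)

def regexp(source, pattern):
    """'RegExp': returns every substring that matches the pattern."""
    return re.findall(pattern, source)

def xpath(source, expression):
    """'XPath': returns the text of every element the expression selects."""
    root = ET.fromstring(source)
    return [(el.text or "").strip() for el in root.findall(expression)]

print(includes(PAGE, "price"))
print(regexp(PAGE, r"\$\d+\.\d{2}"))
print(xpath(PAGE, ".//span[@class='price']"))
```

The first function only tells you that the text is there, while the other two return the value itself, which is why the list above labels them ‘search and extraction’.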
2. How to Find the Element for Search
Before scraping, decide which data you need. In our case, it’s the price.
To find this element in the page source code, you need to:
- Open a product page.
- Find the price and hover over it.
- Right-click on it and choose ‘Inspect’ in the context menu.
- Inspect the price element in the window that opens (the page highlights the element your cursor is hovering over).
- Right-click on it and select ‘Copy’ → ‘Copy XPath’.
Usually, it’s enough to copy the XPath from a single product page to scrape prices from the entire website or a category. Be aware, though, that this works only if all product pages share the same template.
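The caveat about templates can be illustrated with a small sketch. The three page fragments and the `ProductPrice` id below are hypothetical, and the XPath you copy from your browser will look different. Python's `xml.etree.ElementTree` supports only a subset of XPath and needs relative paths, so the expression here starts with `.//`.

```python
import xml.etree.ElementTree as ET

# XPath as it might be copied from one product page (hypothetical id),
# written with a leading "." because ElementTree needs relative paths.
PRICE_XPATH = ".//*[@id='ProductPrice']"

# Two pages built from the same template, and one built from a different one.
SAME_TEMPLATE_A = "<html><body><div><span id='ProductPrice'>$110.00</span></div></body></html>"
SAME_TEMPLATE_B = "<html><body><div><span id='ProductPrice'>$225.00</span></div></body></html>"
OTHER_TEMPLATE = "<html><body><p class='cost'>$80.00</p></body></html>"

def extract_price(page_source, expression=PRICE_XPATH):
    """Return the text of the first matching element, or None if nothing matches."""
    element = ET.fromstring(page_source).find(expression)
    if element is None or not element.text:
        return None
    return element.text.strip()

for page in (SAME_TEMPLATE_A, SAME_TEMPLATE_B, OTHER_TEMPLATE):
    print(extract_price(page))
```

The expression finds a price on both pages that share a template and nothing on the third one, so a page with a different layout would need its own XPath.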
3. Setting and Launching the Crawling
3.1. Setting Custom Search
You can perform the custom search in a few simple steps:
- Choose the ‘Entire Website’ mode in the crawling mode settings.
- Go to the ‘Custom search’ tab in the ‘Crawling Settings’ window.
- Turn on the ‘Use custom search’ option.
- Choose the ‘XPath’ type of search and paste the XPath you’ve copied into the ‘Search expressions’ box. Then choose ‘Data extraction’ mode → ‘Inner text’. If you need to scrape other types of data in addition to the main price (old price, discount, quantity, etc.), you can add more custom search flows. Up to 15 searches can run simultaneously.
- Name each search flow so that you can tell them apart in the final search results. In our case, there is only one search flow, which extracts the price.
- Go to the ‘Rules’ tab.
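As a rough model of several named search flows with ‘Inner text’ extraction, the sketch below maps flow names to XPath expressions and collects the text inside each matched element, including the text of nested tags. The flow names, expressions, and markup are illustrative assumptions, not Netpeak Spider internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical product markup with a current price and a crossed-out old price.
PAGE = """
<html><body>
  <div class="product">
    <span class="price"><b>$150</b>.00</span>
    <span class="old-price">$190.00</span>
  </div>
</body></html>
""".strip()

# One entry per search flow: a readable name and its XPath expression.
SEARCH_FLOWS = {
    "Price": ".//span[@class='price']",
    "Old price": ".//span[@class='old-price']",
}

def inner_text(element):
    """Join all text inside the element, ignoring the tags themselves."""
    return "".join(element.itertext()).strip()

def run_flows(page_source, flows):
    """Return {flow name: [inner text of every match]} for one page."""
    root = ET.fromstring(page_source)
    return {
        name: [inner_text(el) for el in root.findall(expression)]
        for name, expression in flows.items()
    }

results = run_flows(PAGE, SEARCH_FLOWS)
print(results)
```

Keying the results by flow name is what keeps the values apart once more than one expression is running, which is the reason for naming each flow.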
3.2. Rules Setting and Scraping Launch
If you need information about all products on the website, or you can’t single out one category (the URLs have no feature or element in common), use the custom search parameters from section 3.1 and crawl the entire website with the default Netpeak Spider settings.
If the URLs of the website look like site.com/category/product or site.com/category-product, you can set custom rules.
In our case, we are interested in Air Jordan brand products with URLs beginning with shop.bdgastore.com/collections/air-jordan/.
‘Rules’ in ‘Crawling settings’ let you limit the crawling results to specific pages. It means that you will see only pages that match your custom rules (in our case, all URLs must begin with shop.bdgastore.com/collections/air-jordan/).
There are two ways to do it: include pages of a particular type or exclude the categories you’re not interested in. This is how you can do it:
- Choose the ‘Include’ or ‘Exclude’ option.
- Choose the matching type (in our case it’s ‘Begins with’).
- Enter the part common to all URLs you are interested in (or want to leave out) into the field below.
- Apply the settings and launch crawling.
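The logic of a single ‘Begins with’ rule can be sketched in a few lines of Python. Only the prefix comes from this post; the URLs in the list are hypothetical, and ignoring the `https://` scheme before comparing is an assumption of this sketch, not a documented Netpeak Spider behavior.

```python
PREFIX = "shop.bdgastore.com/collections/air-jordan/"

def strip_scheme(url):
    """Drop 'http://' or 'https://' so URLs can be compared with a bare prefix."""
    return url.split("://", 1)[-1]

def passes_rule(url, prefix, mode="include"):
    """Apply one 'Begins with' rule in 'include' or 'exclude' mode."""
    matches = strip_scheme(url).startswith(prefix)
    if mode == "include":
        return matches
    if mode == "exclude":
        return not matches
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical list of discovered URLs.
urls = [
    "https://shop.bdgastore.com/collections/air-jordan/products/sample-shoe",
    "https://shop.bdgastore.com/collections/nike/products/other-shoe",
    "https://shop.bdgastore.com/pages/about",
]

included = [u for u in urls if passes_rule(u, PREFIX, "include")]
excluded = [u for u in urls if passes_rule(u, PREFIX, "exclude")]
print(included)
print(excluded)
```

With the same prefix, ‘Include’ keeps only the Air Jordan page, while ‘Exclude’ keeps everything else, so the two options are mirror images of each other.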
Once the crawling rules are set, Netpeak Spider will show you only the pages that meet your requirements.
In the rightmost column of the main results table (by default, it’s called XPath, but we’ve named it ‘Price’) you will find the custom search results. It shows how many times the expression was found on each URL (1 product = 1 entry). You can also find the custom search results in the right panel of the main window, on the ‘Search’ tab.
Please note that it is better to turn off image analysis on the ‘General’ tab of the crawling settings. Sometimes product URLs and image names look almost identical, so the images would also end up in the list of crawled pages.
4. Data Exporting
To get a summary table that contains only the URLs and the price information you need, do the following:
- Open the ‘Search’ tab in the right panel of the main window.
- Click the ‘All’ button to show all custom search results.
- Filter the obtained data if necessary.
- Click the ‘Export’ button to save the search results.
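If you later want to post-process the results in a script, the same "filter, then export" idea can be sketched in Python. The rows, the example.com URLs, and the "URL" and "Price" column names below are hypothetical and do not reflect the actual layout of a Netpeak Spider export.

```python
import csv
import io

# Hypothetical search results: (URL, extracted price text).
results = [
    ("https://example.com/collections/air-jordan/products/shoe-a", "$110.00"),
    ("https://example.com/collections/air-jordan/products/shoe-b", "$225.00"),
    ("https://example.com/collections/air-jordan/products/shoe-c", ""),
]

def to_number(price_text):
    """Turn '$1,110.00' into 1110.0; return None for empty or non-numeric text."""
    cleaned = price_text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

def export_csv(rows):
    """Write URL/price pairs as CSV text, skipping rows without a usable price."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["URL", "Price"])
    for url, price_text in rows:
        price = to_number(price_text)
        if price is not None:
            writer.writerow([url, f"{price:.2f}"])
    return buffer.getvalue()

csv_text = export_csv(results)
print(csv_text)
```

Converting the price text to a number before writing it makes the exported column easier to sort and compare in a spreadsheet.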
To scrape data from competitors' websites, you need to perform several steps in sequence:
- Determine the element for custom search (in our case it’s the price).
- Copy XPath.
- Set custom search parameters.
- Set custom crawling rules.
- Launch crawling.
- Export obtained data.
Using this method, you can scrape not only prices but many other types of data. Therefore, the ‘Custom search’ function can be useful for digital and content marketers, SEO specialists, webmasters, sales managers, etc.
Do you use this function in your work? If so, for what purpose?
Share your experience in the comments below and don’t hesitate to ask any questions about custom search. We’ll be glad to answer them! :)