Thin Product Listing Pages: How to Find and Fix Them
Many Ecommerce website owners are stunned to discover that 90% of their website is missing from the search engine's index. This usually happens when search engines mark lots of pages as low-quality, either because they are duplicates or because they don't have enough content. One of the most common causes is product listing pages that contain only a few products. Search engines treat them as thin content pages, which hampers proper indexing and hurts overall website ranking. Let's see how such pages appear and how to detect and fix them.
1. Sorting Out Products on Your Website
Big Ecommerce websites have hundreds or even thousands of different products. That's why various taxonomies are created to sort and categorize them. The most common ones are:
- Categories – these are broad groups of products according to their type or some general characteristics. They have a hierarchical structure, so you can create subcategories (e.g. Shoes → Working boots → Men's).
- Tags – an effective method to group products according to their specific characteristics. Tags are not hierarchical and they can include products from different categories and subcategories. For instance, the 'Special offer' tag may contain products that match its condition from all categories.
- Filters – another great way to manage products on your website. Filters offer the deepest sorting based on specific attributes (e.g. size, color, price, etc.). It's worth mentioning that there are several ways to implement filtering and some of them won't even change the initial URL, thus they won't create new pages.
These techniques might be implemented in multiple ways but they all have the same purposes:
- to ease user navigation,
- to create more landing pages targeting a larger number of high-competition (categories/subcategories), medium-competition (subcategories/tags/filters), and low-competition (tags/filters) keywords,
- to increase conversion rate (if a user searching for red dresses lands on a specific page where red dresses are listed, the conversion rate is much higher).
While the first one is straightforward, the last two require creating lots of new pages, which can cause serious issues if handled incorrectly.
2. Thin Product Listing Pages and Why They Are Harmful
Bulk creation of pages for certain tags and filters can produce lots of low-quality pages (so-called 'zombie pages') that contain only a few products (1-3) and negatively affect your website. There is usually no search demand for these pages because no one will search for 'green men suede boots under 100 dollars', so there is no need to index such a page.
As an example, that's what you get if you use the 'Summer shop' tag within the 'Coats & Jackets' category. Certainly not the best page to be indexed.
The two main reasons why you should get rid of such pages:
- They are marked as thin content pages and lower the overall quality of your website.
- They waste your crawl budget, making important pages wait to get indexed.
3. How to Detect Thin Product Listing Pages
You can easily find such pages with a scraping tool. In our example, we'll show you how to do it with Netpeak Spider.
- You need to conduct some basic analysis in order to find strings that define product listing pages and products. In other words, you have to find 2 strings in source code: one that defines a product list and the other that defines a product itself. Just right-click, choose 'Inspect' and hover over the needed element.
For example, this is the one for a product list:
And this is for products:
- Check that the indicator of a product list is present only on product listing pages. Search the source code of individual product pages to make sure it's not there.
- Also, check that the number of products you see on the page equals the number of times the parameter you've chosen for products appears in the source code of a product listing page.
- Launch Netpeak Spider.
- Go to the 'Settings' tab → 'Scraping' and set the following parameters:
  - Custom name
  - Type of scraping → 'Contains'
  - Fingerprint parameter you've chosen
  - Search space → 'All source code'
- Now you have to exclude paginated pages. Check how pagination is implemented in the URL and copy it.
- Go to the 'Settings' tab → 'Rules' and exclude crawling of paginated pages.
- Select the 'Minimum' template in the 'Parameters' tab in a sidebar and start crawling.
- When crawling is complete, go to the 'Reports' tab → 'Scraping', and set all product listing pages as a segment.
- In the 'Reports' tab → 'Scraping' select pages where products were found and click 'Show selected'.
- Sort them by the number of products found and look at those that contain only a few products. Double-check to make sure that everything is correct.
- Export the list of URLs for further actions.
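If you don't have a crawler at hand, the same check can be sketched in a few lines of Python. This is a minimal sketch and not how Netpeak Spider works internally: the two fingerprint strings, the `?page=N` and `/page/N/` pagination patterns, and all function names are assumptions for illustration, and it expects you to have already fetched each page's source into a URL-to-HTML dictionary.

```python
import re
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical fingerprints: replace them with the strings found in your own source code.
LIST_MARKER = 'class="product-list"'
PRODUCT_MARKER = 'class="product-item"'
THIN_THRESHOLD = 3  # the article treats listings with 1-3 products as thin


def is_paginated(url: str) -> bool:
    """Return True if the URL looks like a paginated page (assumed ?page=N or /page/N/)."""
    parsed = urlparse(url)
    if "page" in parse_qs(parsed.query):
        return True
    return re.search(r"/page/\d+/?$", parsed.path) is not None


def count_products(html: str) -> Optional[int]:
    """Return how often the product marker occurs, or None if the page is not a listing."""
    if LIST_MARKER not in html:
        return None
    return html.count(PRODUCT_MARKER)


def find_thin_listings(pages: Dict[str, str]) -> List[Tuple[str, int]]:
    """Given {url: html}, return (url, product_count) for thin listings, fewest first."""
    thin = []
    for url, html in pages.items():
        if is_paginated(url):
            continue  # paginated pages are excluded, as in the crawl rules above
        count = count_products(html)
        # Empty listings (0 products) are flagged as well.
        if count is not None and count <= THIN_THRESHOLD:
            thin.append((url, count))
    return sorted(thin, key=lambda item: item[1])
```

The result is a list of URLs sorted by product count, which mirrors the sort-and-review step above. As with the crawler, double-check a few flagged pages by hand before acting on the list.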
4. How to Fix Thin Product Listing Pages
Once you've found all these zombie pages, it's time to get rid of them. Don't get me wrong, I'm not saying you have to 404 them, but you should tell search engines there's no need to index them.
First of all, check whether the hierarchy of your categories is logical and make sure no category has only a small number of products. If you find some, you should either add more products to them or merge them with other relevant categories. For instance, if you have two different categories called 'Bicycles' and 'Electric Bicycles' and the latter contains only 4 products, you'd better rework your structure and create a filter that shows electric bicycles instead.
Set noindex, nofollow for all thin product listing pages. You can't really hide links to such pages, so crawl budget will still be spent on them, but you will keep them out of the index. It's recommended to set noindex, nofollow for all combinations of more than 3 filters (as I said earlier, no one makes such long queries) and for multiple filters chosen from one section (queries like 'red green white dress' are not popular either). The same goes for tags: if you allow combining several tags, noindex all pages created by a combination of 2 or more tags.
The directive goes in the HEAD section of the HTML page as a robots meta tag and looks like this: <meta name="robots" content="noindex, nofollow">
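The noindex rules above can also be expressed as a small decision function that your templates call when rendering a listing page. This is only a sketch under stated assumptions: the function names, the shape of the `filters` dictionary (section name mapped to the selected values), and the reading of 'more than 3 filters' as more than 3 selected filter values in total are mine, not part of any platform's API.

```python
from typing import Dict, List

MAX_INDEXABLE_FILTERS = 3  # more than 3 selected filter values -> noindex
MAX_INDEXABLE_TAGS = 1     # a combination of 2 or more tags -> noindex


def should_noindex(
    filters: Dict[str, List[str]],
    tags: List[str],
    product_count: int,
    thin_threshold: int = 3,
) -> bool:
    """Decide whether a listing page should carry noindex, nofollow."""
    # Thin listing: only a few products on the page.
    if product_count <= thin_threshold:
        return True
    # Too many filters combined (assumed to mean selected values in total).
    if sum(len(values) for values in filters.values()) > MAX_INDEXABLE_FILTERS:
        return True
    # Several values chosen from one filter section, e.g. red + green + white.
    if any(len(values) > 1 for values in filters.values()):
        return True
    # Several tags combined.
    return len(tags) > MAX_INDEXABLE_TAGS


def robots_meta(noindex: bool) -> str:
    """Return the robots meta tag to place in the HEAD section of the page."""
    if noindex:
        return '<meta name="robots" content="noindex, nofollow">'
    return '<meta name="robots" content="index, follow">'
```

For example, `robots_meta(should_noindex({"color": ["red", "green"]}, [], 40))` yields the noindex tag, because two values were picked from the same filter section even though the page lists plenty of products.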
Some SEOs set rel=canonical to the parent category in order to tell search engines which pages should not be indexed. This means every page created by further filtering gets a rel=canonical pointing to the main category. However, Google treats rel=canonical as a hint and may ignore it if it finds the content on those pages not duplicate enough. So I'd still recommend setting noindex, nofollow.
Thin content is a major concern for webmasters who work with big Ecommerce websites. While you might do your best to serve your customers by offering various ways to navigate your site, it's easy to miss technical details that will undermine your optimization. That's why it's essential to audit your site regularly and fix such issues as soon as possible.
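For completeness, here is what the rel=canonical alternative could look like in code. It is a sketch that assumes filters are implemented as URL query parameters, so that dropping the query string gives the parent category URL. The function name is hypothetical, and sites that encode filters in the path would need a different rule.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(filtered_url: str) -> str:
    """Build a rel=canonical tag pointing at the unfiltered parent category."""
    scheme, netloc, path, _query, _fragment = urlsplit(filtered_url)
    # Assumption: everything after '?' is filter state, so it is dropped.
    parent_url = urlunsplit((scheme, netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(parent_url, quote=True)}">'
```

Like the robots meta tag, this link element belongs in the HEAD section of the filtered page.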
And how do you detect thin content on your website and fix it? Don't hesitate to share your experience in the comments below.