How to Boost Your SEO Audit with Data Enrichment

If you use data sources outside your desktop crawler to perform SEO audits, you may be surprised to learn how easily they can be combined with your crawl results. Netpeak Spider lets you enrich a crawled project with data from external sources. For example, you can add page indexing status or the number of backlinks.

We’ll explain how to use this feature below, but let me start out by saying that it’s very simple. You’ll need a project that’s already crawled and a CSV file with the data you want to add — and that’s it.

1. Why Is It Important to Enrich Crawling Data If You Want to Perform a Good SEO Audit?

A full-fledged audit needs more than just website crawling data. Specialists typically use the following data:

  1. Data from Google services:
    • Google PageSpeed Insights
    • Google Safe Browsing
    • Google Mobile-Friendly Test
    • Google Search Console (already built into Netpeak Spider)
    • Google Analytics (already built into Netpeak Spider)
  2. External link data, such as:
    • Data from Ahrefs (UR, number of external links, number of dofollow/nofollow, etc.)
    • Majestic data (citation flow, trust flow, number of external links, etc.)
  3. Data about positions:
    • Number of keywords the page ranks for in search results
    • Position of the main high-frequency keywords for the page
  4. Data on the types of pages on a website:
    • Data by page type (especially if it cannot be determined from URL nesting)
    • Data indicating the most important pages for promotion
  5. Page indexing status

Most of this data can be obtained using Netpeak Checker:

Data from various sources in Netpeak Checker

If you use Netpeak Checker to retrieve data, it will already be in the required format and no additional processing will be required when importing into Netpeak Spider.

So why is it important to use third-party data?

When you use third-party data in a project, you can segment pages by more metrics, and you can choose the metrics that are important to your project. This means that you can focus on a segment of pages with poor Core Web Vitals or pages that have a Mobile Score in the red zone on Google PageSpeed Insights:

Filter by Mobile Score on Google PageSpeed Insights in Netpeak Spider

As a result, you will see filtered data:

Filtered results by Mobile Score on Google PageSpeed Insights in Netpeak Spider

How to use the data enrichment feature:

  1. Crawl a website or URL list in Netpeak Spider.
  2. Save the project.
  3. Collect the necessary data for all the URLs you are interested in. You do not have to do this for all the pages crawled.
    1. The data should be in a CSV file.
    2. The first row should contain column headers. These headers will be used as parameter names in the main table in the program.
    3. The file must contain a column called 'URL' or 'url'. This column will be used as a key field for combining data in the program.
    4. Each line must contain a URL and its data.
    5. More detailed requirements for the file are described in the documentation.
  4. Load the prepared files into the program. You can upload several files at once. To load the data, go to 'Project' → 'Load data enrichment parameters...' in the main menu.
  5. After the files have been uploaded, the parameters will appear in the panel on the right.

    Data Enrichment parameters in the right-hand panel in Netpeak Spider

    Uploaded parameters can be disabled and enabled like other parameters in the program.

  6. Now the necessary parameters are added to the project, and you can use them to filter or segment.
    1. Filters and segments apply only to URLs that have been crawled. If the enrichment file contains URLs that have not been crawled, the enrichment data will be displayed for them, but you will not be able to filter on them. We recommend crawling every URL that appears in your enrichment files.
  7. Save the project to store the added data with it.
  8. If you want to delete this data, select 'Project' → 'Delete data enrichment parameters' from the main menu.

    Option to delete Data Enrichment parameters in Project menu of Netpeak Spider
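
The file requirements from step 3 can be checked before loading. The following is a minimal Python sketch, not part of Netpeak Spider: the function name `check_enrichment_csv` and the sample data are hypothetical, and it covers only the requirements listed above (a header row, a 'URL' or 'url' column, and a URL on every line). Delimiter, encoding, and other details are left to the documentation.

```python
import csv
import io


def check_enrichment_csv(text):
    """Return a list of problems found in CSV text meant for data enrichment."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        return ["file is empty: the first row must contain column headers"]
    # The key column must be called 'URL' or 'url'.
    key_names = [name for name in header if name in ("URL", "url")]
    if not key_names:
        return ["no column called 'URL' or 'url'"]
    key_index = header.index(key_names[0])
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) <= key_index or not row[key_index].strip():
            problems.append(f"line {line_no}: missing URL")
    return problems


# Made-up sample: the second data row has no URL.
sample = (
    "URL,Page type\n"
    "https://example.com/,home\n"
    ",article\n"
)
print(check_enrichment_csv(sample))
```

An empty result means the file passed these basic checks. It does not guarantee that the program will accept the file.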

2. Use Cases for Enriching Crawling Results

2.1. Using Netpeak Checker to Get API Data

When it comes to standard technical audits, you can’t do without data from Google PageSpeed Insights and the Google Mobile-Friendly Test. The easiest way to get data from these services is with Netpeak Checker. If you haven’t done this before, you will need to connect to the Google API. For more information, see this help page.

Add the list of URLs to Netpeak Checker and select the appropriate options on the right-hand panel:

List of URLs and Google PageSpeed Insights parameters in Netpeak Checker

Press start and wait for the data to be collected.

Results from Google PageSpeed Insights for list of URLs in Netpeak Checker

To export data, use the export menu button:

Netpeak Checker export menu button

You won’t need to make any additional changes; the file will be in the format you need to move forward.

Use this function to upload the file to Netpeak Spider:

Function for uploading parameters to enrich data in Project menu of Netpeak Spider

After the file is uploaded, the results will appear in both the general table and the parameter panel:

Result of uploading parameters for data enrichment in Netpeak Spider

What you do next depends on your objectives or tasks, but you’ll likely want to filter out problem pages with low scores:

Example of filter by Google PageSpeed Insights Mobile Score in Netpeak Spider

Note that although you may be used to seeing this score on a scale of 0 to 100, once it’s uploaded it’s displayed as a number between zero and one (for example, 0.45 rather than 45).

What we end up with is a list of pages with low scores, without leaving the program:

The result of filtering by Google PageSpeed Insights Mobile Score in Netpeak Spider
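
If you ever need to reproduce this filter outside the program, for example on an exported file, the zero-to-one scale is the main thing to watch. The sketch below is illustrative only: the column name `Mobile Score`, the sample rows, and the 0.5 cut-off are assumptions, not values taken from Netpeak Spider or Netpeak Checker.

```python
import csv
import io

# Hypothetical export: the column name "Mobile Score" is an assumption.
export = (
    "URL,Mobile Score\n"
    "https://example.com/,0.92\n"
    "https://example.com/blog/,0.45\n"
    "https://example.com/shop/,0.31\n"
)

THRESHOLD = 0.5  # assumed cut-off; scores are on a 0-1 scale, not 0-100


def low_score_pages(csv_text, column="Mobile Score", threshold=THRESHOLD):
    """Return (url, percent) pairs for pages scoring below the threshold."""
    rows = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in rows:
        score = float(row[column])
        if score < threshold:
            # Convert to the familiar 0-100 form for reporting only.
            result.append((row["URL"], round(score * 100)))
    return result


for url, percent in low_score_pages(export):
    print(f"{url}: {percent}%")
```

The comparison is made on the raw zero-to-one value, and the conversion to a percentage happens only when the result is reported.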

You can also use data from your CMS. Sometimes you can’t use nesting to determine what type of page you’re working with (article, product, category, etc.) because you don’t have descriptive words in URL paths. If this is the case, you can take this data from your CMS and add it to Netpeak Spider.

In this example, we don't have an easy way to filter out blog articles, so we uploaded a marker separately:

The results of filtering by types of pages in Netpeak Spider
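
A marker file like this can be produced from whatever your CMS is able to export. Here is a minimal sketch, assuming the CMS gives you URL and page type pairs. The `cms_pages` data, the function name, and the `Page type` parameter name are all hypothetical.

```python
import csv
import io

# Hypothetical CMS export: (URL, page type) pairs. All values are made up.
cms_pages = [
    ("https://example.com/p/1042", "product"),
    ("https://example.com/p/2210", "article"),
    ("https://example.com/p/3177", "category"),
]


def build_enrichment_csv(pages, parameter_name="Page type"):
    """Return CSV text with a 'URL' key column and one custom parameter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The header row supplies the parameter names shown in the main table.
    writer.writerow(["URL", parameter_name])
    writer.writerows(pages)
    return buffer.getvalue()


print(build_enrichment_csv(cms_pages))
```

Writing the returned text to a `.csv` file gives you something to load through the 'Project' menu. Check the documentation for delimiter and encoding requirements before relying on it.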

2.2. Check Internal PageRank with Data Enrichment

You can also check the internal PageRank of pages you have marked. Let's say you have a list of products that generate the most revenue and have high margins. Your task is to check whether these pages have a high enough internal link weight.

Here, you’ll want to take the following steps:

  1. Crawl a website with link parameters.

    Links parameters in the right-hand panel in Netpeak Spider

  2. Add page importance labels using Data Enrichment.
  3. Run the internal PageRank calculation tool.

    Internal PageRank calculation function in Netpeak Spider menu

  4. Add this data to the main table.

    Internal PageRank calculation function in Netpeak Spider

  5. Analyze PageRank values for the pages in question.

    Main table of Netpeak Spider with Internal PageRank data
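
What counts as "high enough" is up to you. One possible benchmark, used here purely as an assumption, is the site-wide median: important pages that fall below it are candidates for more internal links. The rows, field names, and values in this sketch are hypothetical stand-ins for the main table.

```python
from statistics import median

# Hypothetical rows from the main table: URL, internal PageRank, importance label.
pages = [
    {"url": "https://example.com/", "pagerank": 0.40, "important": False},
    {"url": "https://example.com/top-product", "pagerank": 0.05, "important": True},
    {"url": "https://example.com/bestseller", "pagerank": 0.30, "important": True},
    {"url": "https://example.com/about", "pagerank": 0.10, "important": False},
    {"url": "https://example.com/blog", "pagerank": 0.15, "important": False},
]


def underlinked(rows):
    """Return important URLs whose internal PageRank is below the site median."""
    site_median = median(row["pagerank"] for row in rows)
    return [
        row["url"]
        for row in rows
        if row["important"] and row["pagerank"] < site_median
    ]


print(underlinked(pages))
```

With the sample data, only the page with a PageRank of 0.05 is flagged, because the site median is 0.15.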

2.3. Correlation Between PageRank and URL Rating by Ahrefs

URL Rating (UR), the popular Ahrefs metric that SEO specialists use to assess the strength of page-level link profiles, also depends in part on PageRank. The Ahrefs help section doesn't explain it in depth, but it does point out that: 'Both internal and external links are taken into account.'

Using the method above, you can see the correlation between the PR value, UR, and the number of internal inbound links to your top pages.

But things get tricky if your website has thousands of pages. In that case, we recommend getting the UR value for one page per unique PR value. For example, a website with 20,000+ pages may have only 3,000 unique PR values. This means you won’t need to add data for 20,000 pages, just for 3,000 or fewer. Even with this smaller sample, you can trace the correlation and draw conclusions about the impact of PR on UR.
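
The sampling idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than a prescribed workflow: the URLs, PR values, and UR values are made up, the helper names are hypothetical, and Pearson correlation is only one possible way to measure the relationship.

```python
from math import sqrt

# Hypothetical crawl results: (URL, internal PageRank). Values are made up.
crawled = [
    ("https://example.com/", 0.40),
    ("https://example.com/a", 0.10),
    ("https://example.com/b", 0.10),
    ("https://example.com/c", 0.25),
    ("https://example.com/d", 0.25),
]


def one_url_per_pr(rows):
    """Keep the first URL seen for each unique PageRank value."""
    first_seen = {}
    for url, pr in rows:
        first_seen.setdefault(pr, url)
    return {url: pr for pr, url in first_seen.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


sample = one_url_per_pr(crawled)
# Hypothetical UR values, looked up only for the sampled URLs.
ur = {
    "https://example.com/": 35,
    "https://example.com/a": 12,
    "https://example.com/c": 20,
}
urls = list(sample)
print(len(sample), "URLs to check instead of", len(crawled))
print(round(pearson([sample[u] for u in urls], [ur[u] for u in urls]), 2))
```

A coefficient close to 1 suggests that UR rises with PR in the sample. The `pearson` helper raises a division error if either series has no variation, so it needs at least two distinct values in each.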

3. Conclusion

Using the features described here, you can expand the scope of your audit by supplementing it with third-party data. This will allow for a more systematic analysis of a website without leaving Netpeak Spider. Data from Google Analytics and Google Search Console can already be found in the standard Netpeak Spider parameters, since these sources are used quite often. There’s no need to use this function to retrieve data from them, so you can keep things simple :)

What’s most important is that this function lets you segment data in ways that best suit your needs.