Netpeak Spider 3.2: JavaScript Rendering and Express Audit in PDF

Обновления
17Нравится
3Комментарии
Поделиться
Netpeak Spider 3.2: JavaScript Rendering and Express Audit in PDF

Folks, in this post you will find out what we've prepared in the long-awaited Netpeak Spider 3.2 update :) We can't wait to share all details but first, let's watch a short video teaser.

Now let's get on with the details. If you're so eager to read a certain part, save your scroll wheel and use the clickable content:

  • 1. JavaScript Rendering
  • 2. Express Audit of the Optimization Quality (PDF)
  • 3. Enhanced Issue Description
  • 4. Other Changes
  • 5. In a Nutshell
  • (!) Breaking News for Our Users

1. JavaScript Rendering

Before we talk about this long-desired feature in our crawler, let's go over how Google indexes content created dynamically using JavaScript.

Can't wait to try JavaScript rendering? Hide details and go to the descripton of the rendering settings in Netpeak Spider.

1.1. What Is JavaScript Rendering

JavaScript rendering is the final view of the webpage's HTML code with the changes made by JS-scripts.

Initially, search robots crawled and indexed only content in the static HTML source code. Now website developing includes JS frameworks to add content partially or fully using JavaScript.

Here is a clear example of HTML code before and after JavaScript execution from our own website:

  • the view of Netpeak Spider price on one of our pages – before JavaScript execution:
<h4>12 months</h4> <div>$[vm.amountByMonthMain[vm.urlParameter__spider]]/mo.</div>
  • the view of the same block after JS-scripts execution:
<h4>12 months</h4> <div>$9,80/mo.</div>

Google had to adapt to the evolution of web technologies, and now it processes and renders content that is added to a page using JavaScript. Their web rendering service (WRS) is based on Chrome browser.

Googlebot crawls sites in 2 waves:

  1. During the first one, it (as before) works only with HTML: it requests source code, crawls and indexes content, and also adds found links to the crawl queue.
  2. The second wave involves content rendering. At this stage, search bot needs resources to execute JavaScript. Therefore, if there are insufficient crawling resources, rendering requires an extra time.

Google separates indexing and rendering to quickly index content that is available without JavaScript, and then go back and add content for which it is required.

Waves of crawling websites by Googlebot with JavaScript execution

If Google did not receive important page content at the first wave, then adding of this content to the index will occur at the second wave with possible delays.

Modern web applications use 2 ways of content rendering:

  • Client-side rendering (CSR) – the search engine gets an empty HTML page (with almost no content), and the main content is added to the page using JavaScript scripts. In this case, you have to wait for the second wave of crawling, and the content may be indexed with a delay.
  • Server-side rendering (SSR) – in this case, the search engine receives the full HTML with all important webpage content. Therefore, Googlebot adds content to the index at the first wave of crawling, and you don't have to worry about possible delays of the second wave.

An advanced SEO specialist has to understand these nuances. Essentially, JavaScript SEO is ensuring that content added to a page with JavaScript is successfully processed by a search robot, indexed, and ranked.

Now we will explain why JavaScript rendering is needed in modern SEO tools, and how to use it.

1.2. Why Does Crawler Need JavaScript Rendering

If you try to crawl a site that uses CSR in a traditional way (analyzing only HTML), the crawler will not be able to detect the data added with JavaScript (links, descriptions, images, etc.), thus it will not find issues on such site.

That's why, to process such sites as a Googlebot, the crawler needs a browser to execute JavaScript → to download all the content added by the scripts and only then analyze it.

1.3. How to Set up JavaScript Rendering in Netpeak Spider

We've implemented JavaScript rendering in Netpeak Spider using integrated Chromium browser. The world's most popular browser Chrome was created on the basis of that browser. We use one of the latest versions of Chromium, which makes crawling in Netpeak Spider as advanced and close to Googlebot as possible. Though it's not the same, because Google uses the old version of Chrome 41 browser, which does not support some modern JavaScript features.

To start crawling pages with enabled JS rendering, go to the 'Settings' → 'General' tab → tick the 'Enable JavaScript rendering and set AJAX timeout, s' checkbox:

Setting of the JavaScript rendering and AJAX timeout in Netpeak Spider

The 'AJAX timeout' sets the timeout for JavaScript execution after the page and resource files (JS / CSS) are loaded. It is necessary for all the scripts to have enough time to work.

Let's look at the example of crawling a website where the whole content is shown using JavaScript. In this case, you will not be able to crawl such site without enabled JS rendering.

Example of crawling JS-site without JavaScript rendering in Netpeak Spider

Example of crawling JS-site without JavaScript rendering in Netpeak Spider

Example of crawling JS-site with enabled JavaScript rendering in Netpeak Spider

Example of crawling JS-site with enabled JavaScript rendering in Netpeak Spider

1.4. Peculiarities of JS Rendering in Netpeak Spider

Netpeak Spider executes JS scripts only for HTML pages with a 200 OK response code. It speeds up the program and allows you not to waste resources on the pages that don't need the browser for processing.

If you want to crawl a site with enabled JavaScript rendering, remember that it increases crawling time. When the crawler works without rendering, there is only one request to get the HTML code of a page. When you enable the rendering, there is an additional request to Chromium to get the HTML code, load the JS and CSS files, and execute the JavaScript during the time specified in the AJAX timeout setting. Accordingly, crawling will take more time.

Also, while JavaScript rendering is enabled, Netpeak Spider:

  • uses the User Agent chosen by a user in the crawling settings;
  • supports basic authentication;
  • supports list of proxies;
  • blocks requests to analytics services (Google Analytics, Yandex.Metrica, etc.) in order not to distort site analytics;
  • considers cookies regardless of the settings on the 'Advanced' tab;
  • doesn't load iframes and images;
  • is limited to 25 threads. So if you set 100 threads in the crawling settings, Netpeak Spider will simultaneously crawl 100 documents in the usual way, however, only 25 compliant HTML pages will be rendered.


Now let's dive into the second new feature in our crawler. We reckon that visual learners will definitely like it ;)

2. Express Audit of the Optimization Quality (PDF)

In the new Netpeak Spider version, you can export a report in PDF format in one click with an express audit of the optimization quality based on the crawling results.

This feature allows you to get the best of 'two worlds' in Netpeak Spider: the deep analysis and customization of the desktop tool and the results visualization like in the most advanced online products.

This report is available on the 'Reports' tab:

Export of the express audit of the optimization quality in Netpeak Spider

We set a goal to achieve the highest quality data visualization adding completely new metrics to the program logic, and took into account the following nuances:

  • The report will only show data found on a site during crawling. Sections and diagrams without data are not displayed. It means that you will not see empty sheets and tables because some issue / parameter was not found during the crawling.
  • The report displays illustrative examples for data analysis, rather than a complete list of relevant URLs. So, when analyzing medium and large sites, you will not have to scroll through dozens of the report pages with hundreds of hard-to-read links.

2.1. Who Will Find This Report Useful?

First of all, PDF reports will be useful for SEO specialists. This is an improved and expanded (20+ pages) version of the dashboard, which is convenient for assessing the optimization quality of a project or a list of URLs. The key information for site audit is clearly shown here. It is enough to add your own recommendations and you can send it to a client or colleagues for implementation.

If you are a sales specialist, the report contains a brief summary of the site status. It can help you quickly evaluate the project and discuss the scope of work with the client.

2.2. Structure and Peculiarities of PDF Reports

The report is based on the data from the 'All Results' table, which is directly affected by the settings, parameters, and segmentation. You can test the report by changing different settings.

Each section of the report is dedicated to a certain optimization aspect. Let's take a closer look at each of them.

2.2.1. Title Page + Contents

If you crawl a website in Netpeak Spider, then the first page will show a screenshot of the initial page and the domain of the crawled resource. So it is easy to distinguish reports for different sites.

Netpeak Spider: title page of the crawled website in express audit

If you crawl a list of URLs, you'll see the corresponding image instead of a screenshot:

Netpeak Spider: title page of the crawled list of URLs in express audit

To ease navigation through the document, there is an anchor link for each section of the content.

2.2.2. Overview

This section contains a brief overview of crawling results. This is the most succinct report. It allows you to quickly understand what data is being analyzed, and what issues were found.

‘Overview’ section in the express audit of the optimization quality

It shows the following data:

  • Crawling mode → crawling of a website or a list of URLs.
  • Initial URL. If Netpeak Spider crawls a list of URLs, you'll see the first URL from the list.
  • The number of parameters selected during report generation.
  • The number and types of URLs the current report is based on.
  • The number of URLs with major issues (errors and warnings).
  • The main types of issues → fresh grouping for even faster understanding of audit results.
  • Content type → separate diagrams for external and internal pages.
  • The main hosts.
2.2.2.1. Content Type (of Internal and External URLs)

These reports help evaluate and compare the number of different document types found during crawling. You'll see separate charts for internal and external URLs (here and in other report sections they will be marked in blue and violet accordingly).

Netpeak Spider: ‘Overview’ section in the express audit of the optimization quality – content type

2.2.2.2. Top Hosts

This table contains main hosts analyzed in the express audit. It may be very handy if your website has a lot of hosts or if you crawl a list of URLs.

Netpeak Spider: ‘Overview’ section in the express audit of the optimization quality – top hosts

2.2.3. URL Structure

This report helps to visualize the structure of crawled URLs. It shows the most popular hosts and second-tier segments (e.g. site.com/category/).

Netpeak Spider: ‘URL structure’ section in the express audit of the optimization quality

Also, we've added data about [root documents] – the number of documents in the site root (they do not contain segments).

2.2.4. Status Codes

This section shows status codes of internal and external URLs.

Netpeak Spider: diagram of ‘Status codes of internal URLs’ in the express audit of the optimization quality

Netpeak Spider:diagram of ‘Status codes of external URLs’ in the express audit of the optimization quality

2.2.5. Crawling and Indexing

This section shows data on instructions and server settings affecting website crawling and indexing (only internal URLs are analyzed here). Note that non-compliant documents usually don't drive search traffic and even waste crawl budget.

Netpeak Spider: ‘Crawling and Indexing’ section in the express audit of the optimization quality

You might be already familiar with the charts on reasons for URL compliance and non-compliance (they are also available on the Netpeak Spider dashboard). Meta Robots, X-Robots-Tag, and canonical overviews are the new reports that are currently available only in express audit.

Netpeak Spider: Meta Robots, X-Robots-Tag, and canonical overviews in the express audit of the optimization quality

2.2.6. Click and URL Depth

This section shows click depth (the number of clicks from the initial page) and URL depth (the number of segments in a URL).

Netpeak Spider: ‘Click and URL Depth’ section in the express audit of the optimization quality

More about possible issues:

  • Pages with click depth over '4' may have indexing issues, thus not drive organic traffic.
  • Page URLs with depth over '4' and/or too long address may be hard to perceive for site visitors.

2.2.7. Load Speed

This report helps to evaluate the server response time of internal and external documents. This parameter is a major ranking factor – fast sites usually drive more traffic, have lower bounce rate, and higher conversion rate. Note that only URLs returning 2xx status code are analyzed in this report.

Typically, a server spends more time generating and processing HTML pages, while static files are usually cached and returned faster. That's why we have separated the reports for HTML and non-HTML pages to analyze pages of similar types.

Netpeak Spider: ‘Load speed’ section in the express audit of the optimization quality

Also, internal and external documents are shown separately. First of all, you need to pay attention to internal ones and do everything in your power to make them load quickly. But do not forget about external. If the site refers to an external document that takes a long time to open, it will worsen the user experience. And if external resources (JS, CSS, images, fonts) are loaded to the HTML page, then their load speed directly affects the load speed of an entire page.

Additional maximum, minimum, and median values are displayed for internal documents. This will help to quickly assess the variation in load speed. If the server is stable, then the spread should be small.

2.2.8. HTTP/HTTPS Protocols

This section contains document protocols: secure (HTTPS), and insecure (HTTP). If websites with HTTPS protocol have HTML pages, images, or resources with HTTP protocol, it may cause the 'Mixed content' issue. In this case users may see a corresponding warning in browser and search engines will consider the site insecure.

Netpeak Spider: ‘HTTP/HTTPS Protocols’ section in the express audit of the optimization quality

2.2.9. Content Optimization

This section analyzes only the internal compliant pages, because the content-related SEO issues should be fixed first of all for the internal pages that potentially drive traffic.

We've focused on data that can be a signal for wasted optimization opportunities that can negatively affect ranking and website visibility in SERP.

  • Duplicate and Missing Title, Description, H1 → for both users and search engines, issues with these parameters can indicate poor quality of the site.
  • The number of characters and words on a page → often pages with more text have higher rankings because they cover the topic better.
  • Image size → pictures with a big size load slower and reduce the page load speed. If a page loads for a long time (especially on mobile devices, where the Internet speed is lower), the user may find the site low-quality and leave it.

Netpeak Spider: ‘Content optimization’ section in the express audit of the optimization quality

2.2.10. Issues

This section contains reports on issues detected during crawling:

  • the number of URLs with issues of different severity
  • top major issues that may have the most noticeable effect when fixed
  • the issue list with links to documentation and examples of the URLs they've been detected on

Netpeak Spider: ‘Issues’ section in the express audit of the optimization quality

2.2.11. Terms

At the end of the express audit you'll find the meaning of some important terms.

  • Crawled URLs – documents that have any data in the 'All Results' table. Note that segmentation affects this parameter.
  • Major issues – errors and warnings only.
  • Internal URLs – documents with the same domain name as the initial URL or the first URL in a list: usually, they are the internal pages of the analyzed website. They are highlighted in blue in all sections of the report.
  • External URLs – documents with the domain name different from the initial URL: you have to enable the 'Crawl external links' option on the 'General' tab of crawling settings to crawl them. They are highlighted in violet in all sections of the report.
  • Resources – documents that are not HTML nor images and return a 2xx status code. They include JavaScript, CSS, PDF, XML, JSON, PlainText, GZIP, etc.
  • Non-compliant – HTML documents returning not a 2xx status code or disallowed from crawling and indexing.

2.2.12. Settings

This section contains parameters and crawling settings that were used when creating this express audit. We have thought about possible situations beforehand that's why we show this data. It allows you to understand how the crawling was set and why some reports are missing.

Netpeak Spider: ‘Settings’ section in the express audit of the optimization quality

Now it's up to you → try to generate a report on already saved projects, or run crawling to collect data and explore all details in the report ;)


3. Enhanced Issue Description

During crawling Netpeak Spider shows detected, undetected, and disabled issues on a website. They can be errors, warnings, and notices, and are indicated by the corresponding color.

If you click on any issue on the 'Issues' tab in a sidebar, a detailed description appears in the 'Information' panel at the bottom of the interface.

We have significantly expanded the description of each issue with the following points:

  • the threat the issue can cause
  • how to fix it
  • useful links for deeper understanding of the problem

‘Information’ panel with enhanced issue description in Netpeak Spider

Now, if you have any questions about issues, look at the 'Information' panel. We tried to answer most of them there. By the way, try changing the height of this panel → it will be more convenient to read detailed issue descriptions, where we've got carried away ;)

3.1. Export of the Report with Enhanced Issue Descriptions

We will surprise you with another useful feature that will allow you to apply enhanced issue descriptions directly to your work.

Now you can export a combination from a brief summary of all found issues and their enhanced descriptions. This will allow you to quickly set a technical task and fix them.

Export of the report with enhanced issue descriptions in Netpeak Spider

This report is called 'Issue overview + descriptions'. You can find it in the 'Export' → 'Issue Reports' tab, and in the following bulk exports:

  • 'Main reports set'
  • 'All issues'
  • 'All available reports (main + XL)'

3.2. Issues Changes: Issue Severity, Names, Sorting

We also made a 'spring cleaning' among our parameters and issues. Here you'll find out what has changed.

3.2.1. Changes in Issue Severity

  • Warningserrors:

    • Duplicate H1
    • Canonical Chain
    • 5xx Error Pages: Server Error
    • Bad AMP HTML Format
  • Errorsnotices:

    • Bad Base Tag Format
    • Max URL Length
  • Warningsnotices:

    • Multiple H1

3.2.2. Changes in Issues' and Parameters' Names

  • Broken Links → Broken Pages (because you'll see pages in this report and you can look at links in the separate interface or even export them in the 'Export' menu),
  • Duplicate Canonicals → Identical Canonical URLs.

3.2.3. Changes in the Logic of Determining Issues and Parameters

  • 'Bad Base Tag Format' issue: previously, the relative URL in this tag was considered an issue. Now the issue is determined if the URL with the wrong format is specified in the href attribute.
  • 'Canonical URL' parameter: now only the absolute URL is taken into account in the canonical instruction, as required by Google. If relative is specified → the value in the table will be (NULL).

3.2.4. Changes in the Issues Sorting

We improved the sorting of issues by moving the most significant and common ones to the prominent places.


4. Other Changes

Let's check out a brief description of other changes in Netpeak Spider 3.2:

  • We've changed the logic of determining the internal addresses for a URL list. To determine whether a link is external or internal, Netpeak Spider considers the 'Initial URL'. If the domain matches, the link is considered internal, if it doesn't match, then it's external. Previously, when crawling a list of URLs without an initial one, all links were considered external. Now they are compared with each other: if all the hosts of the addresses belong to the same domain, the URL will go to the internal links report; if at least one URL belongs to another domain, all addresses will be considered external. These changes affect results in reports.
  • We've improved the 'Default' parameter template. This was necessary for proper use of the new feature exporting express audit of the optimization quality in PDF.
  • The results sorting in the table is saved only for a single session (until the program is restarted). In a new session, the sorting will be standard by the URLs' sequence number. We did it to make users' experience with results more clear and convenient.
  • During crawling you can see the new 'JavaScript rendering' and 'Ajax Timeout' settings on the dashboard.
  • We've optimized the work with robots.txt. Now, when you start / continue crawling, only one request for each host is sent to robots.txt (previously, if you set many threads several requests could be sent for the same robots.txt).
  • The project's and export files' name is created based on the initial URL of the crawling. Now, when saving a project or report, the host name from the initial URL (if a certain site is crawled) or the first URL in the table (if a custom URL list is checked) will be indicated in the file name.
  • We've improved alerts in the lower right corner: to make it easier to click on it and go to the file location folder, a successful table or report export is now displayed within 60 seconds (it used to be 7 seconds). Alerts about the end of crawling and others will no longer disturb you from other programs. Earlier, when they occurred, Netpeak Spider window became active.
  • Setting 'Allow cookies' is enabled by default on the advanced settings tab. This is one of the frequent reasons why the site is not crawled or is being crawled incorrectly, so we have enabled cookies by default. Remember that this setting does not affect the rendering in browser: when crawling in Chromium is enabled, cookies are always taken into account.
  • We've changed a lot of texts within the program to explain more nuances and tips on how a tool works. If you notice any inaccuracies, please contact our support team.


5. In a Nutshell

In Netpeak Spider 3.2, we have implemented long-awaited features. Now you can use our program more comprehensively with the following changes:

  • JavaScript rendering;
  • Express audit of the optimization quality in PDF with the following reports: 'Overview', 'URL Structure', 'Crawling and Indexing', 'Status Codes', 'Click and URL Depth', 'Load Speed', 'HTTP/HTTPS Protocols', 'Content Optimization', and 'Issues';
  • Enhanced issue description: the threat the issue can cause, how to fix it, useful links for deeper understading of the problem;
  • New 'Issue overview + descriptions' export;
  • And 50+ other improvements

(!) Breaking News for Our Users

New Prices for Netpeak Spider and Checker

Our company is about to turn three years old. It means that we have been regularly improving Netpeak Spider and Checker for three years now. Moreover, we strive not only to advance our programs with the most popular features, but also to release completely unique ones. At the same time, we have never raised the prices for our tools.

Talking about Netpeak Spider, we have released 41 versions of the program from the 2.0 to 3.2 inclusive in these three years. You can check them in the changelog of our crawler. It's plain to see, just look at our Spider version 2.0 three years ago:

Netpeak Spider 2.0 interface

And there is how Netpeak Spider 3.2 looks now:

Netpeak Spider 3.2 interface

So since now you can use our tools more comprehensively and with more options, starting from April 24, 2019 the prices will change:

  • Monthly license for Netpeak Spider will cost $19.
  • Discounts for long-term subscriptions for both Netpeak Spider and Checker will change as follows:
    • 3 months → 10%,
    • 6 months → 15%,
    • 12 months → 20%.

If you have any questions, feel free to ask our Support team in online chat, they will be happy to help you.


Friends, thank you for your attention! We hope that now your work with Netpeak Spider will become even more effective ;) We'd love to know your feedback, so feel free to leave a comment under this post, while we are working on one of the most requested features within Netpeak Checker!

Digging This Update? Let's Discuss Netpeak Spider Perks in Person

Book a personal demo