Netpeak Spider 3.2: Latest Updates Unboxing Video

Hello everyone, today I want to tell you about our updated crawler – Netpeak Spider 3.2. It was a long-awaited release: a PDF report, tons of data about issues, the ability to crawl websites that use JavaScript, and a whole lot more.

1. Crawling and Rendering Websites with JavaScript

Watch this part on YouTube

JavaScript is no longer a rare case: the number of websites built with this technology, or planning to implement it, is continuously growing. Unfortunately, hi-tech solutions not only give us new opportunities but also cause problems. I’m happy to say that Netpeak Spider can now help you fix them.

To activate the JS rendering feature, tick the ‘Enable JavaScript rendering and set AJAX timeout, s’ checkbox.

Enabling AJAX timeout for JS rendering in Netpeak Spider

By default, the program will execute scripts for 2 seconds, which suits the majority of cases. But if it’s not enough for the website you’re crawling, you can set a longer rendering period. Please keep in mind that this option works only in the crawler: real website visitors are usually not so patient. If your scripts take too long to execute, it’s worth optimizing them for a better experience for both your visitors and search robots.

Netpeak Spider executes JavaScript only for 200 OK HTML pages so as not to waste your valuable resources where it is not necessary. When you enable JS rendering, keep in mind that it will make crawling much slower: the tool sends an additional request to Chromium to retrieve the HTML code, downloads JS and CSS files, and, of course, executes the JavaScript. So don’t enable it every time; a lot of websites can be crawled without JS rendering. I want to highlight several useful things:

  • During page rendering, the crawler blocks requests to analytics services (Google Analytics, Yandex.Metrica, etc.) so as not to hurt your data accuracy.
  • Cookies will be used regardless of the corresponding checkbox in the ‘Advanced’ tab of the crawling settings.
  • Iframe content and images will not be loaded.
  • JavaScript rendering uses at most 25 threads. If you set 100 threads in the settings, the crawler will simultaneously scan 100 documents in the regular way, but JS rendering will lag behind, working with only 25 of them at once.
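
The analytics-blocking behavior described above can be sketched as a simple hostname filter. This is an illustrative Python sketch, not Netpeak Spider’s actual implementation; the blocklist contents and the function name are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of analytics hosts; the crawler's real list is not public.
ANALYTICS_HOSTS = {
    "www.google-analytics.com",
    "mc.yandex.ru",  # Yandex.Metrica
}

def should_block(request_url: str) -> bool:
    """Return True if the request targets a known analytics endpoint."""
    host = urlparse(request_url).hostname or ""
    return host in ANALYTICS_HOSTS

print(should_block("https://www.google-analytics.com/collect?v=1"))  # True
print(should_block("https://example.com/app.js"))                    # False
```

Filtering by hostname rather than by full URL keeps the check cheap, which matters when it runs for every subrequest of every rendered page.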

As Morpheus told us in The Matrix, choose wisely: decide whether you need this feature for your next crawl or not.

2. Express Audit of the Optimization Quality (PDF)

Watch this part on YouTube

We created a PDF report that helps SEO specialists quickly determine website optimization quality without any additional effort. All the key information for a website audit is gathered in this report and waits for you to enrich it with conclusions and recommendations! We tried hard to make it not only informative but also pleasing to the eyes of your customers or colleagues when you send it as an implementation task.

Before moving on to the report, I also want to say that it will be useful for sales teams offering SEO services, because it gives a brief overview of a website’s current problems. After reading the report, you can evaluate the project much faster and then discuss further actions with a client.

If you haven’t seen our uber-beautiful PDF report yet, I have 2 things to say:

  1. You missed a lot.
  2. Don’t worry, I’m going to run through it with you and take a deep look at all the details.

2.1 Title Page + Contents

The report’s title page greets you with a screenshot of the initial page if you crawled a website, or a nice image if you crawled a list of URLs. Right after that, you will see navigation links to the main parts of the report so you can quickly move through the file.

Netpeak Spider 3.2: Express Audit of the Optimization Quality (PDF)

2.2 Overview

Have you seen the 'Overview' tab in the sidebar? Here is its counterpart on steroids. You will see:

  • Crawling mode → crawling of a website or a list of URLs.
  • Initial URL. If Netpeak Spider crawls a list of URLs, you'll see the first URL from the list.
  • The number of parameters selected during report generation.
  • The number and types of URLs the current report is based on.
  • The number of URLs with major issues (errors and warnings).
  • The most frequent issues on the crawled website.
  • Content type → separate diagrams for external and internal pages for easy understanding.
  • The main hosts. For example, it can be useful to see the number of pages across different subdomains, or when crawling a list of URLs, you can see the most frequent hosts.

2.3 URL Structure

This is a brief overview of the 'Site structure' report: it shows the most popular URL segments, the number of pages containing each segment, and their percentage of all crawled URLs. If you want a more detailed site structure report, you can find it in the 'Export' menu in the crawler.
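
The segment statistics described here boil down to counting first-level path segments across crawled URLs. Here is a minimal Python sketch of that idea; the function name and sample URLs are illustrative, not part of the tool.

```python
from collections import Counter
from urllib.parse import urlparse

def segment_counts(urls):
    """Count how often each first-level URL path segment appears."""
    counter = Counter()
    for url in urls:
        path = urlparse(url).path.strip("/")
        if path:
            counter[path.split("/")[0]] += 1
    return counter

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/shop/item",
]
counts = segment_counts(urls)
total = len(urls)
for segment, n in counts.most_common():
    print(f"/{segment}: {n} pages ({n / total:.0%})")
```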

2.4 Status Codes

In this section you will find all status codes received during crawling. As you can see, we separate the diagrams into internal and external for a better understanding of where issues come from. Pay the most attention to redirects and 4xx and 5xx codes, and try your best to get these codes out of the report.

2.5 Crawling and Indexing

If you want to dig deeper into crawling and indexing issues, here is the list you are looking for. Non-compliant documents usually do not drive search traffic and can even waste crawl budget. Note that only internal URLs are analyzed in this section. We use different colors to explain the meaning of the instructions:

  • Instructions not restricting anything are highlighted in green.
  • Instructions that affect crawling and SERP results are highlighted in yellow.
  • Instructions that affect indexing are highlighted in red.
  • Other found instructions are highlighted in black.

By the way, there is also a useful table with canonical tag contents. For example, you can use it to quickly understand how many pages on a website are canonicalized.

2.6 Click and URL Depth

Here you will find how many clicks it takes from the initial page to reach the deepest part of a website, and how many URL segments your pages usually have. Note that only internal compliant HTML pages are analyzed in this report. If you see many pages with a click depth over 4, they may have indexing issues and thus not drive search traffic. And if a page has more than 4 URL segments and/or an overly long address, it may be hard for visitors to perceive. So think again: do you really need such a long URL?
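
Click depth is simply the minimum number of clicks from the initial page, which can be computed with a breadth-first search over the link graph. A minimal sketch, assuming a plain adjacency map of pages to their outgoing links (the data structure and function name are illustrative):

```python
from collections import deque

def click_depths(start, links):
    """BFS from the start page; depth = minimum clicks to reach each page.
    `links` maps a URL to the list of URLs it links to."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

site = {
    "/": ["/category"],
    "/category": ["/category/item"],
}
print(click_depths("/", site))  # {'/': 0, '/category': 1, '/category/item': 2}
```

Because BFS visits pages in order of increasing distance, the first time a page is reached is guaranteed to be via a shortest click path.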

2.7 Load Speed

This whole section is dedicated to speed! You can quickly check the highest and lowest values, as well as the median of server response times for internal and external HTML pages and resources. Isn’t it awesome to see all these graphs divided by type and source in one place? Since load speed is a ranking factor, it gets a lot of attention from SEO specialists, but I want to remind you that it’s not only about SEO: if I want to buy a laptop and a website takes longer than 5 seconds to load, I switch to another website. Let’s make the internet faster and better together ;)
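
For clarity, the three values shown per group are just the minimum, maximum, and median of the measured response times. A quick sketch with Python’s standard library (the sample numbers are made up for illustration):

```python
from statistics import median

# Hypothetical server response times collected during a crawl, in milliseconds.
response_times_ms = [120, 95, 340, 210, 88]

print("min:", min(response_times_ms), "ms")
print("max:", max(response_times_ms), "ms")
print("median:", median(response_times_ms), "ms")
```

The median is often more telling than the average here, since a handful of very slow pages won’t skew it.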

2.8 HTTP/HTTPS Protocols

Here we gather document protocols: secure (HTTPS) and insecure (HTTP). If a website on the HTTPS protocol has HTML pages, images, or resources served over HTTP, it may cause the 'Mixed content' issue. In this case, users may see a corresponding warning in the browser, and search engines will consider the site insecure. To prevent this issue, do your best to get rid of all links pointing to documents served over HTTP. Come on, it’s the 21st century; you can even get a free SSL certificate. If you still use the HTTP protocol, I recommend starting the migration process as soon as possible.
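
A simplified way to spot such insecure references is to scan a page’s `src` and `href` attributes for `http://` URLs. This sketch uses Python’s standard `html.parser`; it is an illustration of the idea, not how the crawler itself works (and strictly speaking, browsers flag only subresources, not plain links, as mixed content):

```python
from html.parser import HTMLParser

class MixedContentFinder(HTMLParser):
    """Collect http:// resource URLs referenced from a page."""
    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http://"):
                self.insecure.append(value)

html = '<img src="http://example.com/logo.png"><a href="https://example.com/">ok</a>'
finder = MixedContentFinder()
finder.feed(html)
print(finder.insecure)  # ['http://example.com/logo.png']
```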

2.9 Content Optimization

If you have compliant pages with too short, too long, or even duplicate titles, descriptions, or H1 headings, now is the time to start fixing these issues. They are slowing you down on your way to the top of the search results page. Additionally, we’ve included diagrams showing the number of characters and words on each page. The last diagram in this section shows the sizes of the images the crawler found on the website.

By the way, we have one interesting feature for customization fans: if you don’t agree with the default length limits for title, description, or H1, you can set your preferred ranges in the 'Restrictions' tab of the crawling settings.

Restrictions setting in Netpeak Spider
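
The length checks above amount to comparing a field’s character count against a configured range. Here is a minimal sketch; the limits below are illustrative placeholders, not Netpeak Spider’s defaults, so adjust them just as you would in the 'Restrictions' tab.

```python
# Illustrative length ranges (min, max) in characters; not the tool's defaults.
LIMITS = {
    "title": (30, 70),
    "description": (100, 320),
    "h1": (3, 70),
}

def check_length(field, text):
    """Return 'too short', 'too long', or 'ok' for a given field's text."""
    low, high = LIMITS[field]
    if len(text) < low:
        return "too short"
    if len(text) > high:
        return "too long"
    return "ok"

print(check_length("title", "Buy laptops online"))  # 18 characters -> 'too short'
```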

2.10 Issues

It’s my favorite section of the whole report because you can see how problematic the website really is. Usually people think their website is not so bad, but in reality the situation is often pretty sad. These aren’t just words: we tested this last summer during an internet marketing conference with more than a thousand attendees. Let’s quickly put my words to the test:

  • How many issues (errors and warnings) do you think the crawler will find? Make a guess.
  • Crawl your website and compare your guess with the real situation in the sidebar.
  • Write your results in the comments below this YouTube video or our blog post. I’m curious how many people will find more issues than they guessed.

Ok, let’s return to the report. We show you:

  • How many pages contain notices, warnings and errors.
  • Top major issues by the number of URLs affected. It’s useful to see the most important problems first: fixing 10,000 redirects may bring you much more profit than fixing 3 broken links.
  • Tables with all the issues found by our crawler. We also added useful links to read more about each issue, an example page with the corresponding issue, and the number of pages with this problem.

We deliberately left out the complete list of URLs containing each issue: such a list is more comfortable to work with in a spreadsheet than in a PDF, and it could take up many pages of the report. If you want a report on any particular issue, export it from the crawler using the 'Export' menu; it will save time for you and your colleagues.

And the last table of this section contains issues that haven’t been found on your website during crawling. I wish you guys to see more and more issues here each time you recrawl the website :)

2.11 Terms and Settings

If you don’t understand some of the terms we used in the report, or want to double-check the parameters and crawling settings that were used, we briefly cover all of that here.

The last page of the report contains brief information about Netpeak Software and useful links for you to know more about us and our software. Do not skip them, it’s a great pleasure for us to see you reading our blog or Help Center.

Like this video or article if you enjoyed the PDF report, or leave a comment with your ideas on how to improve it – give us an opportunity to ease your day-to-day analysis tasks ;)

3. Extended Issue Descriptions

Watch this part on YouTube

As I said before, most people think their site is doing better than it actually is. When you see the huge number of issues our crawler found on your website, you want to dig deeper and understand:

  • What do these issues mean?
  • How can they hurt your website?
  • How to fix them? It’s the main question.

We received hundreds of questions about these issues and how to fix them. Fortunately, starting from now, all the answers are built right in and can help even an SEO beginner.

Extended issue descriptions are displayed in the 'Information' panel of the tool. To read one, click on any issue in the sidebar. By the way, we also published these descriptions in our Help Center, so you can read them on your smartphone or tablet, or even print them.

You can also export a special report with descriptions of the issues found during crawling. To do so, go to 'Export' → 'Issue reports' → 'Issue overview + descriptions'. It’s also been added to the following bulk exports to make further work with the reports easier for your colleagues:

  • Main reports set
  • All issues
  • All available reports (main + XL)

4. Other Changes

Watch this part on YouTube

Let’s briefly run through all the other changes in Netpeak Spider:

  • We changed the issue severity for the following issues:
    • From warnings to errors:
      • Duplicate H1. We consider it a high-severity issue because identical H1s on different pages can result in so-called 'keyword cannibalization': different pages of the same website competing to be shown in the SERP for the same keywords. It can confuse search engines because they will not be sure which page is more relevant for the target keyword.
      • Canonical Chain. In this case, a canonical URL may be ignored by search robots. As a result, you will usually see duplicates on the website and possible traffic losses.
      • 5xx Error Pages: Server Error. These pages are signs of technical problems with your server, and users may consider the whole website low-quality. Sometimes a website responds with a 5xx status code because of maintenance, so if you see 5xx codes after crawling, it’s better to double-check the real reason for the error.
      • Bad AMP HTML Format. If you have already spent time and resources creating an AMP version of a page, I’m sure you don’t want to see any problems with these hi-tech website rockets. That’s why we try to attract as much of your attention as we can by placing this issue among the errors.
    • We moved to notices:
      • Bad Base Tag Format. This tag is rarely used, and if there are any issues with it, Google will simply ignore it.
      • Multiple H1 and Max URL Length. As John Mueller told us, there’s no need to worry about these: multiple H1s are a common thing in HTML5 and CMSs, and there is no problem with understanding long URLs as long as they are user-friendly. The only thing you should care about is keeping them shorter than 2,000 characters, please.
  • Several issues and parameters received new names to ease understanding of their meanings:
    • 'Broken Links' became 'Broken Pages', because when you click on the corresponding issue in the sidebar, you see which pages are broken. If you want the complete list of links pointing to these broken pages, click the 'Issue report' button or use the 'Export' menu.
    • Duplicate Canonicals → Identical Canonical URLs. We changed the name of this issue to reduce the negative perception. It’s far from being as bad as duplicate titles or H1s.
  • We changed the logic for the following issues and parameters:
    • Bad Base Tag Format. Previously, the relative URL in this tag was considered an error. Now, this issue means only that the href attribute of this tag has a bad format.
    • By default, the 'Canonical URL' parameter shows only absolute URLs, as recommended by Google. Thus, if a canonical tag contains a relative URL, you will see the (NULL) value in the table. If you are still using relative URLs in canonical tags on your website, you can tick the 'Crawl relative canonical URLs' checkbox in the 'Advanced' tab of the crawling settings. In this case, the crawler will transform the relative URL into an absolute one and add it to the table. We already have a plan to make working with canonicals much easier and more efficient; hold on a little, and we will add new canonical reports in upcoming versions of Netpeak Spider.
  • We’ve changed the sorting in the ‘Issues’ tab of the sidebar: the most popular and important issues are now at the top.
  • We've changed the logic of determining internal addresses for a list of URLs. When crawling a website, Netpeak Spider looks at the 'Initial URL' to determine whether a link is internal or external: if the domain matches, the link is considered internal; otherwise, it's external. When crawling a list of URLs, if at least one URL belongs to another domain, all addresses are considered external.
  • Along with introducing JavaScript rendering support, we’ve stopped supporting Windows versions older than 7 SP1. We couldn’t avoid it: older versions do not work with the latest versions of the .NET Framework, which we used for many improvements in the crawler. If you use an old Windows version, I recommend updating your OS, because a lot of useful programs rely on new technologies to achieve the best performance and rich functionality.
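
The relative-to-absolute canonical transformation mentioned in the list above is standard URL resolution. A minimal sketch with Python’s standard library (the URLs are made-up examples):

```python
from urllib.parse import urljoin

page_url = "https://example.com/category/item"
relative_canonical = "/category/item"

# Resolve the relative canonical against the page's own URL, the way a
# crawler with 'Crawl relative canonical URLs' enabled would.
absolute = urljoin(page_url, relative_canonical)
print(absolute)  # https://example.com/category/item
```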
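
The internal/external decision described in the list above can be sketched as a hostname comparison against the initial URL. This is an illustrative simplification (it treats subdomains as different hosts; the function name is an assumption, not the tool’s API):

```python
from urllib.parse import urlparse

def is_internal(link, initial_url):
    """A link counts as internal when its host matches the initial URL's host."""
    return urlparse(link).hostname == urlparse(initial_url).hostname

initial = "https://example.com/"
print(is_internal("https://example.com/about", initial))  # True
print(is_internal("https://other.com/page", initial))     # False
```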

5. Brief Summary

Watch this part on YouTube

In Netpeak Spider 3.2:

  1. We’ve implemented JavaScript rendering, and now you can work with this fast-growing technology using our crawler.
  2. A nice-looking and useful PDF express audit of optimization quality, available right after crawling is done. It will help SEO specialists, project managers, and sales managers quickly evaluate a website.
  3. If you have any doubts about the issues in the sidebar, click on any of them to get detailed information on:
    • What is the reason for this issue?
    • How can it hurt your website SEO?
    • How to fix it?
    • Useful links pointing to information that can help you understand any issue even better.
  4. More than 50 other changes to improve your experience with our crawler. By the way, many improvements come from your ideas: our customers are real quality assurance specialists for our products all over the web, facing situations that could never be reproduced in any testing environment.

Thanks a lot for your attention! It’s a great pleasure for me to see you watching this video until the end! If you enjoyed the Netpeak Spider 3.2 update, thumbs up and subscribe ;) If you have any ideas on how to improve the software to meet your real tasks, please write a comment below on our blog or on YouTube; it’s always a pleasure reading your messages. I like to send you some wishes at the end of my videos, and today I want to wish you high rankings and zero-bug vibes. See you soon, bye-bye ;)