Caring Digest from Netpeak Software Support Team #1
The Netpeak Software customer support team receives tons of user requests every day, from routine questions to complex tasks involving our tools. If you are reading this post, you’ve probably dropped us a line before ;)
We are launching a customer support digest to show how your tasks can be solved quickly and easily with Netpeak Spider and Netpeak Checker. We are also going to reveal some secrets of our work that have been kept behind closed doors for a long time. Fasten your seat belts, because here we go!
- We handled 781 requests from users in March.
- You wrote 2803 messages during the same period of time.
- You rated our work 38 times, and 100% of the ratings were positive.
- On average, we send the first reply to your requests within 1 minute and 23 seconds.
Alright, enough boring numbers. Let’s take a look at the questions our users ask most often and how to solve them.
2.1. What Is the PageRank For?
PageRank is the relative weight of a page, calculated for each page on the web by a long and complex formula from Google (thanks, Google!). We took this formula and implemented it in Netpeak Spider. The tool calculates PageRank too, but only within a single website. The internal PageRank calculation shows you:
- How link weight is distributed throughout your website, which pages lose it and which ones have too much of it.
- Which pages are dead ends (they receive link weight but do not pass it further), orphan pages (they do not have incoming links that pass link weight to them), or pages that steal your PageRank and your crawl budget.
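To make the idea more tangible, here is a minimal Python sketch of an internal PageRank calculation on a tiny made-up website. It uses the classic published PageRank iteration with a damping factor of 0.85 and spreads the weight of dead ends evenly across all pages. These are assumptions for illustration only: the site map, the function names, and the parameters are hypothetical, and the exact formula inside Netpeak Spider may differ.

```python
# Hypothetical internal link graph: page -> pages it links to.
SITE = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/", "/post"],
    "/pricing": ["/"],
    "/post": [],             # receives links but passes nothing further
    "/old-landing": ["/"],   # nobody links to this page
}


def internal_pagerank(links, damping=0.85, iterations=50):
    """Classic iterative PageRank restricted to one site's link graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Assumption: a page without outgoing links spreads its
            # weight evenly over every page of the site.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def dead_ends(links):
    """Pages that receive link weight but do not pass it further."""
    linked_to = {t for targets in links.values() for t in targets}
    return sorted(page for page in linked_to if not links.get(page))


def orphans(links):
    """Pages that no other page links to."""
    linked_to = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(page for page in links if page not in linked_to)


if __name__ == "__main__":
    for page, value in sorted(internal_pagerank(SITE).items(), key=lambda kv: -kv[1]):
        print(f"{page:15} {value:.3f}")
    print("Dead ends:", dead_ends(SITE))
    print("Orphans:", orphans(SITE))
```

In this toy example the home page collects the most weight, while the dead end and the orphan page are reported separately, which mirrors the two kinds of insight listed above.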
By the way, I will let you in on a secret – a new article about PageRank is going to be published soon, so stay tuned!
2.2. How to Scrape Data From My Website?
Many users consider scraping an extremely difficult task, but I assure you it is not. And now I will prove it!
Let’s take a popular type of scraping as an example: scraping by CSS selectors. Originally, these are patterns used to select the elements you want to style, but we use the same patterns to extract content from HTML elements.
If you didn’t skip your HTML classes, you certainly know that most HTML elements have attributes, and their values may be unique to an element, which is why they can be used as scraping conditions. The rules for CSS selectors are simple. The most common ones are:
- [attribute="value"] – for any attribute
- .className – only for values of the ‘class’ attribute
- #idName – only for values of the ‘id’ attribute
You can find the full list of CSS selectors here → CSS Selector Reference
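If you would like to see these three rules in action outside of any tool, here is a rough Python sketch built only on the standard library's html.parser. It supports just the three selector forms listed above and nothing more. The HTML snippet, the ‘price-main’ id, the itemprop value, and all function names are made up for illustration, and this is not how Netpeak Spider works internally.

```python
from html.parser import HTMLParser

# Hypothetical product markup, used only for this illustration.
SAMPLE_HTML = """
<div class="product">
  <span class="notranslate" id="price-main">$19.99</span>
  <span class="notranslate">USD</span>
  <img itemprop="image" src="/photos/item.jpg">
</div>
"""

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


def parse_selector(selector):
    """Return (attribute, value, match_single_class_token) for one simple selector."""
    if selector.startswith("#"):
        return "id", selector[1:], False
    if selector.startswith("."):
        return "class", selector[1:], True
    if selector.startswith("[") and selector.endswith("]"):
        name, _, value = selector[1:-1].partition("=")
        return name, value.strip("\"'"), False
    raise ValueError(f"Unsupported selector: {selector}")


def matches(attrs, name, value, token):
    actual = attrs.get(name)
    if actual is None:
        return False
    # .className matches one of several space-separated class names.
    return value in actual.split() if token else actual == value


class Scraper(HTMLParser):
    def __init__(self, selector):
        super().__init__()
        self.rule = parse_selector(selector)
        self.current = None   # element being captured, if any
        self.open_tags = 0    # nesting depth inside the captured element
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.current is None:
            if matches(attrs, *self.rule):
                self.current = {"tag": tag, "attrs": attrs, "text": ""}
                if tag in VOID_TAGS:
                    self._finish()
                else:
                    self.open_tags = 1
        elif tag not in VOID_TAGS:
            self.open_tags += 1

    def handle_endtag(self, tag):
        if self.current is not None and tag not in VOID_TAGS:
            self.open_tags -= 1
            if self.open_tags == 0:
                self._finish()

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data

    def _finish(self):
        self.current["text"] = self.current["text"].strip()
        self.results.append(self.current)
        self.current = None


def scrape(html, selector):
    scraper = Scraper(selector)
    scraper.feed(html)
    return scraper.results


if __name__ == "__main__":
    for selector in ("#price-main", '[id="price-main"]', ".notranslate", '[itemprop="image"]'):
        print(selector, "->", scrape(SAMPLE_HTML, selector))
```

Note that the id selector and the attribute selector return the same single element, while the class selector returns both ‘notranslate’ elements, which is exactly why a unique value makes a better scraping condition.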
Alright, let’s take a specific website as an example. Suppose we need to scrape the price and the link to a photo. First, select the required element and inspect its source code by pressing F12. In the image below, the price is located inside a ‘span’ tag with two attributes: ‘class’ and ‘id’.
Since several elements have the ‘notranslate’ value in the ‘class’ attribute, it is better to use the value of the ‘id’ attribute because it is unique. So there are two possible ways to scrape the price: the general attribute condition or the identifier condition. In the program, these conditions will look like this:
To scrape images, we can use many attributes, including ‘id’, ‘class’, or ‘itemprop’. By the way, ‘itemprop’ is a microdata attribute. Microdata is a common type of markup that most search engines take into account, which is why it is the first thing to look for when you need to scrape data. It tells robots that a certain text or other element on a page is important and belongs to a particular type of data: the ‘itemprop’ value names the property that the element’s content represents.
The condition in Netpeak Spider will look like this:
After scraping, you will see the following result:
You can view the scraping results in whichever way is most convenient for you:
- Go to ‘Reports’ → ‘Scraping’ → ‘All results’ in a sidebar;
- Go to ‘Database’ → ‘Scraping overview’;
- Export the report using ‘Export’ → ‘XL (Extra Large) reports from database’ → ‘Scraping summary in single file’ or ‘Data for scraping conditions in separate files’.
There are also two more ways to scrape data: XPath and regular expressions. Both are more advanced and will be covered in the next digest.
By the way, check out how SEOs and digital marketers can use regular expressions.
2.3. Why Doesn’t Netpeak Spider Crawl My Website?
The most common reason why Netpeak Spider doesn’t crawl a website is what we call ‘Canonical + Redirect’. Let’s find out what it means and what to do about it.
If a page has a rel="canonical" link pointing to a page that redirects back to the original page, Spider gets stuck in this loop and won’t go deeper into your website. Sounds interesting, doesn’t it?
Using the ‘Info’ panel, you can find the target URL of a canonical tag: just look at the ‘Canonical URL’ parameter of the canonicalized page. You should also pay attention to the ‘Target Redirect URL’ parameter of the redirecting page.
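For those who prefer to see the logic spelled out, the loop can be sketched in a few lines of Python. The sketch assumes you already have two lookups: each page's canonical URL and each redirecting page's target. The example.com URLs, the dictionaries, and the function names are all hypothetical, and this is only an illustration of the check, not the crawler's actual code.

```python
# Hypothetical crawl data: page -> URL from its rel="canonical" link.
CANONICALS = {
    "https://example.com/catalog/": "https://example.com/catalog",
    "https://example.com/about/": "https://example.com/about/",
    "https://example.com/shoes?sort=price": "https://example.com/shoes",
}

# Hypothetical crawl data: redirecting page -> its redirect target.
REDIRECTS = {
    "https://example.com/catalog": "https://example.com/catalog/",
}


def resolve_redirects(url, redirects, max_hops=10):
    """Follow redirects from url; return the final URL, or None if they never settle."""
    seen = {url}
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
        if url in seen:
            return None  # the redirects themselves form a loop
        seen.add(url)
    return None  # too many hops


def find_canonical_redirect_loops(canonicals, redirects):
    """Return (page, canonical) pairs where the canonical URL redirects back to the page."""
    loops = []
    for page, canonical in canonicals.items():
        if canonical == page:
            continue  # a self-referencing canonical is fine
        if resolve_redirects(canonical, redirects) == page:
            loops.append((page, canonical))
    return loops


if __name__ == "__main__":
    for page, canonical in find_canonical_redirect_loops(CANONICALS, REDIRECTS):
        print(f"{page} -> canonical {canonical} -> redirects back to {page}")
```

Here only the catalog page is reported: its canonical URL has no trailing slash, and that URL redirects straight back to the version with the slash.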
To get past the loop, simply disable consideration of the ‘Canonical’ tag (it is enabled by default in Netpeak Spider) and hit crawl!
However, we recommend that you fix this issue, because search engine robots will probably face difficulties when crawling your website. This can lead to indexing issues, and you may lose your rankings in the SERP.
In the article ‘Why Netpeak Spider doesn’t crawl my website?’ in our FAQ section, you will find all the reasons why Netpeak Spider may not crawl your website properly and what to do in each situation.
3. Funny Stories
Oh hi, Mark!
My name is always associated with the character from that masterpiece of the film industry, ‘The Room’. My colleagues and I are big fans of it, so they always say ‘Oh hi, Mark’ to me instead of ‘Hi’.
Even users make fun of that and I find it hilarious :)
Netpeak Spider is the best anti-SEO tool!
But sometimes users have bad intentions.
4. Users’ Feedback
And as the icing on the cake, I would like to thank our customers. Your positive feedback motivates and inspires us to keep providing great customer service!
By the way, we’d love it if you left your feedback about our tools on G2Crowd :)
I would also like to thank my colleagues David and Richard, who helped me with this digest!
With love from Mark
Junior+ Customer Support Manager