How to Check Website for Mixed Content with Netpeak Spider

When you migrate a site to the secure HTTPS protocol, or launch a new site on HTTPS, the browser may warn that mixed content on a page has been blocked. Quite often, browsers block such pages because of insecure scripts, links, images, videos, and so on. Mixed content is the cause of these warnings.

Browser warns about an insecure connection

In this blog post, I'm going to tell you what mixed content is and how to get a handle on this problem.

1. What is Mixed Content?

Mixed content is partially unencrypted content. It occurs when the initial HTML is loaded over an HTTPS connection, but other resources (such as images, videos, stylesheets, or scripts) are loaded over an insecure HTTP connection.
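The definition can be expressed as a small check. The Python sketch below is illustrative only: the function name `is_mixed_content` and the example.com URLs are assumptions made for this example, not part of Netpeak Spider or any browser API.

```python
from urllib.parse import urljoin, urlparse


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True when an HTTPS page references a resource over plain HTTP."""
    # Resolve relative and protocol-relative references against the page URL.
    absolute = urljoin(page_url, resource_url)
    page_scheme = urlparse(page_url).scheme.lower()
    resource_scheme = urlparse(absolute).scheme.lower()
    return page_scheme == "https" and resource_scheme == "http"


page = "https://example.com/catalog/"
print(is_mixed_content(page, "http://example.com/logo.png"))   # True
print(is_mixed_content(page, "https://cdn.example.com/a.js"))  # False
print(is_mixed_content(page, "/static/style.css"))             # False
```

Relative paths inherit the page's scheme, which is why only absolute http:// references count as mixed content on an HTTPS page.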

This is what a warning about insecure mixed content looks like in the Google Chrome browser:

And this is the notification shown when a page has been blocked:

Resources loaded over HTTP are not encrypted, so they can be altered in transit. If an attacker tampers with an insecure script, for example, they can change the page and intercept the user's credentials.

Naturally, this also hampers the website's promotion.

2. How to Detect Mixed Content on Your Website

The problem can be detected with Chrome DevTools, but checking page by page is time-consuming when you audit a big website, especially when you have hundreds of such websites.

Regular expressions and the 'Scraping' feature in Netpeak Spider offer a faster solution.

To detect mixed content scripts, we'll use the following expression:

(?i)(?:<script[^<>]*http:[^<>]*>(?:(?!<\/script>)[\s\S])*(?:(?!<\/script>)[\s\S])*<\/script>)|(?i)(?:<script[^<>]*>(?:(?!<\/script>)[\s\S])*http:(?:(?!<\/script>)[\s\S])*<\/script>)
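The expression covers two cases: a script tag with http: in its attributes, and a script whose body mentions http:. For readers who want to try the same idea outside the crawler, here is a rough Python sketch. It uses the standard library's `html.parser` instead of the regular expression, so it is a different technique aimed at the same two cases. The class name `InsecureScriptFinder` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class InsecureScriptFinder(HTMLParser):
    """Flag <script> tags that mention http: in an attribute or in inline code."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        self._in_script = True
        # Case 1: http: appears in an attribute of the script tag itself.
        for name, value in attrs:
            if value and "http:" in value.lower():
                self.findings.append(f"attribute {name}: {value}")

    def handle_data(self, data):
        # Case 2: http: appears inside the inline script body.
        if self._in_script and "http:" in data.lower():
            self.findings.append("inline script references http:")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False


sample = (
    '<script src="http://example.com/app.js"></script>'
    '<script>fetch("http://example.com/api")</script>'
    '<script src="https://example.com/ok.js"></script>'
)
finder = InsecureScriptFinder()
finder.feed(sample)
finder.close()
print(finder.findings)
```

Like the regular expression, this flags any mention of http: inside a script, so a harmless string or comment in inline code can produce a false positive.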

To find href links:

[href^="http://"]

To detect other types of mixed content in elements that reference a URL, location, DOCTYPE, and so on:

(?i)<(?!a )[^<>]*(?:src|href|content|location|url|origin)[="\(]{0,2}http:(?:(?!§)[^<>])*>
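To get a feel for what this last expression catches, you can run it over a snippet of markup. The sketch below uses Python's `re` module, which may differ from Netpeak Spider's regex engine in edge cases. The constant name `MIXED_TAG` and the sample HTML are made up for illustration.

```python
import re

# Pattern copied from the article as-is.
MIXED_TAG = re.compile(
    r'(?i)<(?!a )[^<>]*(?:src|href|content|location|url|origin)[="\(]{0,2}http:(?:(?!§)[^<>])*>'
)

sample = (
    '<img src="http://example.com/banner.png">\n'
    '<img src="https://example.com/logo.png">\n'
    '<a href="http://example.com/page">link</a>\n'
    '<iframe src="http://example.com/widget"></iframe>\n'
)

# Print every tag the pattern considers insecure.
for match in MIXED_TAG.finditer(sample):
    print(match.group(0))
```

With this sample, the pattern reports the first img tag and the opening iframe tag. It skips the https:// image and the plain a link, which the `(?!a )` lookahead excludes.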

Launch Netpeak Spider and follow the steps:

  1. Open 'Settings' → 'Scraping'. Copy the expressions described above and give each one a name.
  2. For the two regular expressions (mixed content scripts and the other mixed content types), choose the 'RegExp' search type and the 'All source code' search space.
  3. For href links, choose the 'CSS selector' search type and the 'Attribute' search space with the href attribute.

    Scraping settings in Netpeak Spider

  4. On the sidebar, tick only the minimum set of parameters you need and check that the 'Scraping' parameter is on.

    Parameters on the sidebar of Netpeak Spider

  5. Enter the domain of the website you want to check for mixed content into the 'Initial URL' field and press the 'Start' button.
  6. When the analysis is completed, go to the 'Reports' tab → 'Scraping' and choose 'All results'. There you can see which pages contain scripts or other resources loaded over http://.

We've already pulled our socks up and written a comprehensive guide to scraping with ecommerce examples: 'Comprehensive Guide: How to Scrape Data from Online Stores With a Crawler.'

3. How to Remove Mixed Content

To remove mixed content, you need to change all URL protocol prefixes from http:// to https:// and set up a 301 redirect on the server. To perform the redirect correctly, add the directives below to the .htaccess file in the website root directory. They rely on mod_rewrite, so RewriteEngine On must be set before them:

For websites with www (replace snhd with your own domain name):

RewriteCond %{HTTP_HOST} ^www\.snhd\.(.*)$ [NC]
RewriteRule ^(.*)$ https://snhd.%1/$1 [R=301,L]

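For websites without www (the %{HTTPS} off condition keeps requests that already arrive over HTTPS from being redirected in a loop):

RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^snhd\.(.*)$ [NC]
RewriteRule ^(.*)$ https://snhd.%1/$1 [R=301,L]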
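The other half of the fix, changing http:// prefixes to https:// in the page markup, can be sketched in Python as follows. This is a minimal illustration rather than a production migration script. The function name `upgrade_to_https` is hypothetical, and the sketch assumes that every rewritten resource is actually available over HTTPS, which you should verify before applying a change like this in bulk.

```python
import re

# Matches src/href attribute values that begin with http:// (either quote style).
INSECURE_ATTR = re.compile(r'(?i)\b(src|href)(\s*=\s*["\'])http://')


def upgrade_to_https(html: str) -> str:
    """Rewrite http:// to https:// at the start of src and href values."""
    # Assumption: the same resource is served at the https:// address.
    return INSECURE_ATTR.sub(r"\1\2https://", html)


before = '<img src="http://example.com/logo.png"><a href="https://example.com/">Home</a>'
print(upgrade_to_https(before))
```

Only attribute values that start with http:// are touched, so links that are already secure and relative paths stay as they are.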

We've already written an extensive guide on how to switch to the secure HTTPS protocol. Check it out in this blog post: 'WordPress SEO: Security Connection With HTTPS.'

Let's Wrap It Up

Mixed content poses a threat to the website and leaves it open to attacks. It also gets in the way of promotion, so fix it as soon as you can.

To quickly check a website for mixed content, use regular expressions and the scraping function in the Netpeak Spider crawler. Then remove the mixed content by changing http:// to https://.

Have you ever dealt with a mixed content problem? Share your experience in the comment section below :)