How to Check Website for Mixed Content with Netpeak Spider

When you migrate to the secure HTTPS protocol or launch a site on HTTPS, a warning about blocked mixed content may appear on the page. Browsers often block such pages, fully or in part, because of insecure scripts, links, images, videos, and other resources. Mixed content is the cause of these warnings.

Browser warns about an insecure connection

In this blog post, I'm going to tell you what mixed content is and how to get a handle on this problem.

1. What is Mixed Content?

Mixed content is partially unencrypted content. It occurs when the initial HTML is loaded over an HTTPS connection, but other resources (such as images, videos, stylesheets, and scripts) are loaded over an insecure HTTP connection.
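To make the definition concrete, here is a minimal Python sketch that lists the resources a page requests over http://. It relies only on the standard library; the attribute list, the sample page, and the example.com URLs are my own assumptions for illustration, and the check is far from exhaustive.

```python
from html.parser import HTMLParser

# Attributes that commonly trigger a resource request (illustrative, not exhaustive).
RESOURCE_ATTRS = {"src", "href", "poster", "data"}
# <a href> is navigation rather than a loaded resource, so it is skipped.
SKIPPED_TAGS = {"a"}


class MixedContentFinder(HTMLParser):
    """Collects (tag, attribute, url) triples for resources requested over http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            return
        for name, value in attrs:
            if name in RESOURCE_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, name, value))


def find_mixed_content(html):
    """Return every insecure resource reference found in the given HTML string."""
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


# Hypothetical page that would be served over HTTPS.
sample_page = """
<html><head>
  <link rel="stylesheet" href="https://example.com/style.css">
  <script src="http://example.com/app.js"></script>
</head><body>
  <img src="http://example.com/logo.png">
  <a href="http://example.com/about">About</a>
</body></html>
"""

for tag, attr, url in find_mixed_content(sample_page):
    print(f"<{tag}> {attr}={url}")
```

On the sample page, the script and the image are reported, while the HTTPS stylesheet and the plain link are not.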

This is what a warning about insecure mixed content looks like in the Google Chrome browser:

This is the notification shown when the page has been blocked:

Resources loaded over HTTP can be modified in transit, for example by injecting a script, which is how attackers can intercept a user's credentials.

Naturally, this also hobbles the website's promotion, since browsers warn visitors away from such pages.

2. How to Detect Mixed Content on Your Website

The problem can be detected with Chrome DevTools, but that is time-consuming when you audit a large website, especially when you have hundreds of such websites.

Netpeak Spider detects a mixed content issue in the blink of an eye. You just have to choose the ‘Outgoing Links’ parameter in the ‘Links’ group in the sidebar, enter the initial URL, and start crawling.

Choose the ‘Outgoing Links’ parameter in Netpeak Spider to detect the mixed content issue

When the crawling is completed, you’ll see the results in the main table. If you checked many parameters as I did, you can filter the results by the ‘Mixed Content’ issue only. To do so, go to the ‘Issue’ report, find this nasty issue (in the ‘Warning’ block), and click on it. The table will then filter out the unnecessary results.

The main table in Netpeak Spider with filtered results

Regular expressions and the 'Scraping' feature in Netpeak Spider are another way to find mixed content on website pages, for those who don't look for easy solutions.

To detect mixed content scripts, we'll use the following expression:

(?i)(?:<script[^<>]*http:[^<>]*>(?:(?!<\/script>)[\s\S])*(?:(?!<\/script>)[\s\S])*<\/script>)|(?i)(?:<script[^<>]*>(?:(?!<\/script>)[\s\S])*http:(?:(?!<\/script>)[\s\S])*<\/script>)
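If you want to try the idea outside the crawler, here is a minimal Python sketch. It assumes the expression is meant to match a script element whose opening tag or body contains http:, and the pattern is my simplified restatement rather than the exact expression above. The inline (?i) flags are replaced with re.IGNORECASE, because recent Python versions reject a global flag in the middle of a pattern. The sample markup and example.com URLs are made up.

```python
import re

# Alternative 1: "http:" appears inside the opening <script ...> tag (e.g. in src).
# Alternative 2: "http:" appears in the script body, before the closing </script>.
# (?:(?!</script>)[\s\S])* consumes any characters without crossing </script>.
SCRIPT_WITH_HTTP = re.compile(
    r"<script[^<>]*http:[^<>]*>(?:(?!</script>)[\s\S])*</script>"
    r"|<script[^<>]*>(?:(?!</script>)[\s\S])*http:(?:(?!</script>)[\s\S])*</script>",
    re.IGNORECASE,
)

# Hypothetical page fragment with two secure and two insecure scripts.
sample_html = """<script src="https://example.com/safe.js"></script>
<script src="http://example.com/insecure.js"></script>
<script>var img = "http://example.com/pixel.gif";</script>
<script>console.log("ok");</script>"""

insecure_scripts = SCRIPT_WITH_HTTP.findall(sample_html)

for script in insecure_scripts:
    print(script)
```

Only the second and third scripts are printed: one loads its source over http://, the other mentions an http:// URL in its body.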

To find href links:

[href^="http://"]

To detect other types of mixed content, found in URL, location, DOCTYPE, and other elements:

(?i)<(?!a )[^<>]*(?:src|href|content|location|url|origin)[="\(]{0,2}http:(?:(?!§)[^<>])*>
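To see what this expression catches, you can run it against a small HTML fragment. The Python sketch below uses the expression unchanged; the sample markup and example.com URLs are my own assumptions for illustration.

```python
import re

# The expression from the article, unchanged. It matches any tag other than <a ...>
# in which src, href, content, location, url, or origin is followed by "http:".
MIXED_ATTR = re.compile(
    r'(?i)<(?!a )[^<>]*(?:src|href|content|location|url|origin)[="\(]{0,2}http:(?:(?!§)[^<>])*>'
)

# Hypothetical page fragment.
sample_html = """<a href="http://example.com/page">link</a>
<img src="http://example.com/logo.png">
<link rel="stylesheet" href="https://example.com/style.css">
<meta property="og:image" content="http://example.com/og.png">"""

insecure_tags = [match.group(0) for match in MIXED_ATTR.finditer(sample_html)]

for tag in insecure_tags:
    print(tag)
```

The image and the meta tag are reported. The plain link is skipped because of the (?!a ) lookahead, and the HTTPS stylesheet is skipped because its URL does not contain http: followed directly by the colon.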

Launch Netpeak Spider and follow the steps:

  1. Open 'Settings' → 'Scraping'. Copy the expressions described above and give each one a name.
  2. For the mixed content script expression and the expression covering URL, location, DOCTYPE, and other elements, choose the ‘RegExp’ search type and the 'All source code' search space.
  3. For href links, choose the ‘CSS selector’ search type and the ‘Attribute’ search space with the href attribute.

    Scraping settings in Netpeak Spider

  4. On the sidebar, tick a minimum number of parameters and make sure the 'Scraping' parameter is on.

    Parameters on the sidebar of Netpeak Spider

  5. Enter the domain of the website you want to check for mixed content into the ‘Initial URL’ field and press the 'Start' button.
  6. When the analysis is completed, go to the 'Reports' tab → 'Scraping' and choose 'All results.' There you can see which pages contain scripts or resources loaded over the http:// protocol.

We’ve already pulled our socks up and written a comprehensive guide with ecommerce examples: 'Comprehensive Guide: How to Scrape Data from Online Stores With a Crawler.'

3. How to Remove Mixed Content

To remove mixed content, you need to change all URL protocol prefixes from http:// to https:// and set up a 301 redirect on the server. Here are some directives that you’ll need in the .htaccess file in the website root directory to perform the redirect correctly:

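For websites with the www prefix:

RewriteCond %{HTTP_HOST} ^www\.snhd\.(.*)$ [NC]
RewriteRule ^(.*)$ https://snhd.%1/$1 [R=301,L]

For websites without the www prefix:

# Redirect only plain HTTP requests; otherwise the rule would redirect HTTPS requests to themselves in a loop.
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^snhd\.(.*)$ [NC]
RewriteRule ^(.*)$ https://snhd.%1/$1 [R=301,L]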
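The other half of the fix, changing the http:// prefixes in the page source, can be sketched as a search-and-replace. The Python snippet below is only an illustration under my own assumptions: the function name upgrade_to_https is hypothetical, it touches only src and href attributes, and it makes sense only when the referenced resources are actually available over HTTPS.

```python
import re

# Matches src= or href= (with optional quotes) followed by http://
# and captures everything before the scheme so it can be kept as-is.
INSECURE_ATTR = re.compile(r"""(\b(?:src|href)\s*=\s*["']?)http://""", re.IGNORECASE)


def upgrade_to_https(html):
    """Return html with http:// in src/href attributes replaced by https://."""
    return INSECURE_ATTR.sub(r"\1https://", html)


# Hypothetical fragment: one insecure image and one link that is already secure.
before = '<img src="http://example.com/logo.png"> <a href="https://example.com/">home</a>'
after = upgrade_to_https(before)
print(after)
```

URLs that already use https:// are left untouched, so the function can be run over the same markup more than once.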

We’ve already written an extensive guide on how to switch to the secure HTTPS protocol. Check it out in this blog post: ‘WordPress SEO: Security Connection With HTTPS.’

Let's Wrap It Up

A mixed content issue poses a threat to the website, leaving it open to attacks. It also gets in the way of promotion, so you have to fix it immediately.

To quickly check a website for mixed content, use the Netpeak Spider crawler. Then remove the issue by changing http:// to https://.

Have you ever dealt with a mixed content dilemma? Share your experience in the comment section below :)