RegExp Bigrams Accuracy?

Devin Poole shared this question 7 months ago
Answered

Maybe I'm not understanding how to use this feature properly. When I analyzed https://www.trajectorywebdesign.com/blog/2016-07-13-what-should-i-look-for-when-hiring-a-web-designer/, it's showing me a count of 2 for 'web design'.

I expected the RegExp to count every instance of 'web design' in the visible on-page text, which is significantly more. Am I wrong?

Comments (4)

Hey Devin,

Thanks for your message.

Could you please tell me whether you used the regular expression from our post for finding bigrams (\w+\s+\w+)? The thing is, regular expression matches don't overlap: the pattern consumes the text two words at a time, so it counts consecutive pairs of words rather than every adjacent pair. That makes it unsuitable for this task. Sorry for our mistake; we've corrected the data in the post.
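To illustrate why the count comes out lower than expected, here is a minimal Python sketch. The sample sentence is made up, and Python's `re` module is only assumed to behave like the engine used in the tool:

```python
import re

# Hypothetical sample text, not taken from the analysed page.
text = "good web design needs a web design plan"

# Each match consumes two whole words, so the next match can only
# start at the following word. Matches never overlap.
pairs = re.findall(r"\w+\s+\w+", text)
print(pairs)

# "web design" occurs twice in the text, but both times it straddles
# two matches, so it never shows up as a pair of its own.
print("web design" in pairs)
```

Whether a phrase is counted at all depends on where it happens to fall relative to the start of the text, which explains counts that look arbitrary.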

If you need to find exactly the phrase web design, you can use the following expression → web\s+design

If you use web\s+design\w* instead, you'll get every expression that begins with the phrase web design, such as web design, web designer, web designers, etc.
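As a rough check of how these phrase patterns behave, the sketch below compares the plain pattern with variants that add a trailing \w or \w*. The sample text is invented, and Python's `re` module is an assumption about the engine:

```python
import re

# Hypothetical sample text containing several variants of the phrase.
text = "A web designer sells web design; web designers compare web designs."

# Plain pattern: matches the phrase wherever it appears, including
# as the beginning of longer words such as "designer".
plain = re.findall(r"web\s+design", text)

# A single trailing \w requires exactly one more word character,
# so the bare phrase followed by punctuation is skipped.
one_char = re.findall(r"web\s+design\w", text)

# \w* allows zero or more extra word characters, so it returns the
# bare phrase and every longer variant in full.
any_suffix = re.findall(r"web\s+design\w*", text)

print(len(plain), one_char, any_suffix)
```

If only the exact phrase should be counted, adding a word boundary (web\s+design\b) excludes the longer variants.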

Feel free to contact us in case you have any other questions. And again our apologies for this mistake.

Hi Amber,

Thanks for the thorough explanation. I need to dig deeper into RegEx.

Regards

But that would be a killer feature though. Especially since Google Search Console removed Content Keywords.

Hey Devin,

Hope you had a great Christmas!

I've just pitched our discussion about finding bigrams to the team, and we do think it's a really useful feature. We'll look into how to implement it in Netpeak Spider, either for analyzing a whole website or at least for a single page. The algorithm is more complicated than a RegExp, but we'll do our best to make it happen.
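For anyone who wants to try this outside the tool in the meantime, one common way to count every adjacent pair of words is a sliding window over the word list. This is only a sketch in Python, not how Netpeak Spider does or will do it; the function name `count_bigrams` and the sample sentence are made up for illustration:

```python
import re
from collections import Counter

def count_bigrams(text):
    """Count every adjacent word pair (hypothetical helper)."""
    # Lowercase and split into words; punctuation is dropped, so pairs
    # can span sentence boundaries in this simple version.
    words = re.findall(r"\w+", text.lower())
    # Pair each word with its successor: a window of size two that
    # moves one word at a time, so pairs overlap.
    return Counter(zip(words, words[1:]))

counts = count_bigrams("Good web design needs a web design plan")
print(counts[("web", "design")])
print(counts.most_common(3))
```

Unlike the \w+\s+\w+ pattern, every word here takes part in two pairs (except the first and last), so no occurrence of a phrase is skipped.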

Also, we'd like to thank you for taking part in improving our tools; if you have any other suggestions, we'd be glad to hear from you.

Happy New Year to you and your loved ones!