A Step-by-Step Guide to Finding Orphan Pages on Your Website

October 5, 2020

Orphan pages are bad news for SEO. An orphan page is a page that no other page on the site links to, so crawlers and users can’t reach it while navigating the website.

In a site audit, an orphan page is usually reported as a ‘Notice’ rather than an ‘Error’, but if you run a site you should still know which of your pages are orphaned. Orphan pages usually fall into one of the following cases:

  • They may serve old content that you no longer want on your website and that should also be removed from your sitemap file.
  • They may still be useful because people keep reaching them via backlinks. With no internal links pointing to them, however, they miss out on traffic and link juice.
  • They may have been orphaned accidentally, for example during a website migration, so it is worth checking that existing pages are still receiving internal links.

A site audit can find orphan pages for you. The simple steps below show how to find them yourself by combining a crawl with Google Analytics and Google Search Console data.

You need to identify your crawlable pages

The first step is to identify your crawlable pages. For this, you will need a list of all the URLs that can be reached by crawling your site links.

Many crawlers are available for this; one of the best is the Screaming Frog SEO Spider, which is used in the steps below. Whichever crawler you use, make sure it is set to crawl only pages that are indexable by search engines.

Make sure to start from the canonical version of your site’s URL, with the correct protocol (HTTP or HTTPS) and hostname (www or non-www). Once the crawler has finished crawling the site, export the URLs to a spreadsheet.
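If the exported list mixes HTTP/HTTPS or www/non-www variants, it can help to rewrite every URL to one form before comparing lists. The snippet below is a minimal sketch of that idea; `normalize_url` is a hypothetical helper, the example.com URLs are made up, and it assumes every URL is absolute and that the preferred form is HTTPS with www.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url, scheme="https", use_www=True):
    """Rewrite an absolute URL to one preferred scheme and host form.

    Hypothetical helper: the defaults (https + www) are assumptions,
    so adjust them to match your site's canonical version.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    # Strip an existing "www." so the prefix is never doubled.
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare
    # Treat an empty path as the site root and drop any #fragment.
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


urls = [
    "http://example.com/about",
    "https://www.example.com/about",
    "https://WWW.example.com",
]
# A set removes the duplicates that only differed by protocol or hostname.
canonical = sorted({normalize_url(u) for u in urls})
print(canonical)
```

Running the same function over every list you compare later (crawl export, sitemap, analytics data) keeps a page from being counted twice just because its protocol or hostname differs.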

Connect to Google Analytics under ‘Configuration > API Access’

With the Google Analytics API, you can connect to a specific account, property, view, and segment and pull its data directly during a crawl.

You can also set the date range to be analyzed, which should ideally be at least a month. If you want to find orphan pages via other sources, the segment can be changed to ‘All Users’ or ‘Paid Traffic’.

Connecting Google Analytics this way is straightforward and takes little effort.

Choose ‘Crawl New URLs Discovered in Google Analytics’

This option is found under the ‘General’ tab of the Google Analytics configuration window. Once it is enabled, new URLs discovered via Google Analytics are added to the crawl queue and crawled.

If it is not enabled, these URLs will only be available to view in the ‘Orphan Pages’ report: they won’t be added to the crawl queue, so they won’t be viewable within the user interface or appear under the respective tabs and filters.

Connect to Google Search Console under ‘Configuration > API Access’

During a crawl, you can connect to the Search Analytics API and pull in data such as impressions, clicks, CTR, and position metrics directly.

To find orphan pages that are receiving impressions in search but aren’t linked to internally, make sure you choose the correct property. As with Google Analytics, you can set the date range for the data to be analyzed, which should ideally be at least a month. Google Search Console is a good data source for this.
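The idea behind this step is a simple comparison: any URL that shows impressions in search data but was never reached by following internal links is an orphan candidate. The snippet below is a rough sketch of that comparison and not how the SEO Spider does it internally; `orphan_candidates`, the sample URLs, and the impression counts are all invented for illustration.

```python
def orphan_candidates(crawled_urls, search_impressions):
    """Return (url, impressions) pairs for URLs seen in search data
    but missing from the crawl, highest impressions first.

    Hypothetical helper; `search_impressions` maps URL -> impressions.
    """
    crawled = set(crawled_urls)
    orphans = {
        url: count
        for url, count in search_impressions.items()
        if url not in crawled and count > 0
    }
    return sorted(orphans.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data standing in for a crawl export and a search export.
crawled = ["https://www.example.com/", "https://www.example.com/about"]
impressions = {
    "https://www.example.com/": 900,
    "https://www.example.com/landing-2019": 45,
    "https://www.example.com/old-offer": 120,
}
for url, count in orphan_candidates(crawled, impressions):
    print(url, count)
```

Sorting by impressions puts the orphan pages that search users actually see at the top, which is usually where you want to start.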

Select ‘Crawl New URLs Discovered in Google Search Console’

Still in the Google Search Console configuration, tick ‘Crawl New URLs Discovered in Google Search Console’, which can be found under the ‘General’ tab.

This option needs to be enabled. If it is not, new URLs discovered via Google Search Console will only be available to view in the ‘Orphan Pages’ report; they won’t be added to the crawl queue, be viewable within the user interface, or appear under the respective tabs and filters.

Time to crawl the website

Now it’s time to crawl the website. Open the SEO Spider, type or paste the website you want to crawl into the ‘Enter URL to spider’ box, and press ‘Start’. Once the crawl is running, the progress bars let you monitor the progress of both the crawl and the API requests.

Click ‘Crawl Analysis > Start’ to populate the Orphan URLs filters

Most filters in the SEO Spider can be viewed in real time during a crawl. However, the three ‘Orphan URLs’ filters under the ‘Sitemaps’, ‘Analytics’, and ‘Search Console’ tabs can only be viewed once the crawl has ended, because they require ‘Crawl Analysis’ to be populated with data.

The right-hand ‘Overview’ panel displays:

  • URLs in Sitemap
  • URLs not in Sitemap
  • Orphan URLs
  • Non-indexable URLs in Sitemap
  • URLs in Multiple Sitemaps

The SEO Spider only knows which URLs are missing from an XML sitemap once the entire crawl has completed. To populate the orphan filters, click ‘Start’ under the ‘Crawl Analysis’ menu.

Under ‘Crawl Analysis > Configure’, you can double-check that ‘Sitemaps’, ‘Analytics’, and ‘Search Console’ are ticked.

Once the crawl analysis is done, the progress bar will be at 100%, and the filters will no longer show the ‘Crawl Analysis Required’ message.
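To see why this comparison can only happen after the crawl, it may help to picture it as a check of the sitemap against the finished list of crawled URLs. The snippet below is an illustrative sketch using Python’s standard XML parser; the sitemap content, the `sitemap_urls` helper, and the crawled list are assumptions made up for the example, not output from the SEO Spider.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Collect every <loc> value from an XML sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Invented sample sitemap and crawl result.
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
</urlset>"""
crawled = {"https://www.example.com/", "https://www.example.com/blog"}

in_sitemap = sitemap_urls(sitemap_xml)
orphans = sorted(in_sitemap - crawled)         # in the sitemap, never crawled
not_in_sitemap = sorted(crawled - in_sitemap)  # crawled, missing from sitemap
print(orphans)
print(not_in_sitemap)
```

Both differences need the complete set of crawled URLs, which is why they cannot be filled in while the crawl is still running.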

Analyze the ‘Orphan URLs’ filter under the Sitemaps, Analytics, and Search Console tabs

Once the Orphan URLs filters have been populated, you can browse each tab and its ‘Orphan URLs’ filter to view the orphan pages found. Each URL is listed along with its status code, status, indexability, and so on.

These are the URLs that aren’t linked to internally on the website. Under the Sitemaps tab, for example, they may be old URLs that should have been removed from the XML sitemap.

Export Combined Orphan URLs through ‘Reports > Orphan Pages’

Finally, it is time to use the ‘Orphan Pages’ report. You can find it under the ‘Reports’ menu, from where you can export a combined list of all orphan pages discovered. A ‘Source’ column next to each orphan URL shows where it was discovered.

If you integrated Google Analytics and Search Console in a crawl but forgot to tick ‘Crawl New URLs Discovered In GA/GSC’, this report will still contain data for those URLs, even though they won’t have been crawled and won’t appear under the respective filters and tabs.
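If you ever need to build a similar combined list by hand, for example from separate exports, the structure is one row per URL with its sources alongside. The snippet below is a small sketch of that; `combine_orphans` is a hypothetical helper, and the source labels and URLs are placeholders rather than the exact values the report uses.

```python
def combine_orphans(orphans_by_source):
    """Merge per-source orphan lists into (url, sources) rows.

    Hypothetical helper: a URL found by several sources appears once,
    with every source that discovered it joined into one string.
    """
    combined = {}
    for source, urls in orphans_by_source.items():
        for url in urls:
            combined.setdefault(url, []).append(source)
    return [(url, ", ".join(sources)) for url, sources in sorted(combined.items())]


# Placeholder labels and URLs, invented for the example.
rows = combine_orphans({
    "Sitemap": ["https://www.example.com/a", "https://www.example.com/b"],
    "Analytics": ["https://www.example.com/b"],
    "Search Console": ["https://www.example.com/c"],
})
for url, source in rows:
    print(url, "|", source)
```

A URL that several sources agree on is often a good one to review first.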

The final tip

You can also identify orphan pages in the ‘Internal’ tab by looking for a blank crawl depth. The ‘Internal’ tab includes every URL found in a crawl, orphan URLs included, and orphan URLs have no crawl depth because they were not reached by following links.

Conclusion: The steps above will help you find the orphan pages on your website. Dealing with them can help you recover the traffic your website was losing because of orphan pages.
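If you export the ‘Internal’ tab to a CSV file, the same check can be scripted. The snippet below is a minimal sketch; the column names ‘Address’ and ‘Crawl Depth’ are assumptions about the export, so check them against the header row of your own file, and the sample rows are made up.

```python
import csv
import io


def urls_with_blank_depth(csv_file, url_col="Address", depth_col="Crawl Depth"):
    """Return URLs whose crawl depth cell is empty.

    The default column names are assumptions about the export format;
    pass different names if your file uses other headers.
    """
    reader = csv.DictReader(csv_file)
    return [
        row[url_col]
        for row in reader
        # A depth of "0" (the start page) is a real value, so only truly
        # empty or whitespace-only cells count as blank.
        if not (row.get(depth_col) or "").strip()
    ]


# In-memory sample standing in for an exported file.
sample = io.StringIO(
    "Address,Crawl Depth\n"
    "https://www.example.com/,0\n"
    "https://www.example.com/about,1\n"
    "https://www.example.com/old-page,\n"
)
print(urls_with_blank_depth(sample))
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the in-memory sample.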