Making Sure You Are Being Found by Your Target Audience – Part II (Search Engines)

By Andrew Rufener

As discussed in our first blog of this series, search engines play an important role in ensuring that your site is found by your target audience. Search engines exist to help us find, organize and understand content on the Internet. In order to be found by your audience you must ensure that your site can be found by the search engines (naturally!) and that the engines rank your content as high as possible.

But before drilling down into the details, let’s quickly review what a search engine is and does.


How Do Search Engines Work?

Search engines perform a number of key functions. Most people are unaware of many of them, but you need to understand them in order to improve your site's visibility. The functions these engines provide are:

  1. Crawling; the initial key task that these systems perform is to “crawl” the internet, i.e. following every URL they can find, and loading the content of the pages if the publisher allows that.
  2. Indexing; once loaded, the content is indexed or prepared for search queries. The content we see as a result of a search is provided as a result of this process.
  3. Ranking; this step attempts to rank content to help ensure that results provided are most relevant to the search performed.
  4. Search Responses; the step we are all familiar with as users, providing responses to queries.


You can think of the crawling process as search engines deploying spiders, or bots, that follow every URL in search of new and revised content. By literally following all the links in each page, the bots map out the internet and ingest the associated content (see figure below).



Once ingested, the content needs to be processed, tagged and indexed so that other parts of the system can use the data. That way the engine can recognize the structure of documents, index images separately, etc.
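
The crawl and index steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine is built: the four-page site in PAGES is made up and held in memory (so nothing is fetched over the network), and the names LinkAndTextParser and crawl_and_index are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical four-page site held in memory, so the sketch needs no network.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a> Welcome home',
    "/about": '<a href="/">Home</a> About our team',
    "/blog": '<a href="/about">About</a> Search engines crawl pages',
    "/orphan": "Nobody links to this page",
}


class LinkAndTextParser(HTMLParser):
    """Collects link targets and lower-cased words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(word.lower() for word in data.split())


def crawl_and_index(start):
    """Breadth-first crawl from `start`; returns (visit order, inverted index)."""
    visited = []
    seen = {start}
    queue = deque([start])
    index = {}  # word -> set of URLs whose text contains that word
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # broken link: there is nothing to load
            continue
        visited.append(url)
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:  # follow each discovered URL only once
                seen.add(link)
                queue.append(link)
    return visited, index


visited, index = crawl_and_index("/")
print(visited)                  # pages reached by following links
print(sorted(index["about"]))   # pages whose text contains "about"
```

Note that "/orphan" is never visited: no other page links to it, so a crawler that only follows links has no way to discover it.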


The biggest challenge in any web search is to provide the user with “relevant” answers, i.e. answers that match the query but are also of high “quality”, where the term “quality” needs to be understood in a broad sense. As an example, content that has been referenced a lot by other sites will be considered more relevant than content that isn't. In order to perform this task, the ranking engine processes and tags the content accordingly.
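
To make the "referenced a lot by other sites" idea concrete, here is a deliberately simplified sketch that orders matching pages by how many other pages link to them. Real ranking engines combine many more signals, and this is not any particular engine's algorithm; the link graph in LINKS and the function names are assumptions made for illustration.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/guide": ["b.example/tips"],
    "b.example/tips": ["a.example/guide"],
    "c.example/news": ["a.example/guide", "b.example/tips"],
    "d.example/blog": ["a.example/guide"],
}


def count_inbound_links(links):
    """Count how many distinct other pages link to each page."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):  # a referring page counts only once
            if target != source:     # ignore links from a page to itself
                counts[target] = counts.get(target, 0) + 1
    return counts


def rank(matching_pages, links):
    """Order pages by inbound-link count (highest first), then by name."""
    counts = count_inbound_links(links)
    return sorted(matching_pages, key=lambda page: (-counts.get(page, 0), page))


results = rank(["b.example/tips", "d.example/blog", "a.example/guide"], LINKS)
print(results)
```

In this made-up graph, "a.example/guide" is referenced by three pages and "b.example/tips" by two, so they are listed ahead of "d.example/blog", which nobody links to.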


Search Engine Optimization (SEO)

Much has been said and written about SEO and different search engines use different means to rank content. The key to basic SEO, however, is to ensure that the search engines can find your site, index it and that the key parameters affecting ranking are taken into consideration. With this blog we do not intend to provide a detailed description of how to perform SEO, but to give you basic guidelines of what to do and what to avoid.


Step 1: Ensure Search Engines Can Index Your Site

You can actively register your site for indexing with most search engine providers, and for platforms such as WordPress you will also find a range of plugins. If you wish to guide robots on what to crawl and what not, for example to have them focus on important rather than old or irrelevant content, you can create a robots.txt file. This file tells the robots which parts of the site to crawl and which to leave alone. Not all robots follow these instructions, but as an example, this is the way Googlebot will act:

  • If it does not find a robots.txt file it will crawl the entire site
  • If it finds a robots.txt file it will crawl the portions you allow it to
  • If it finds a robots.txt file but encounters an error while trying to access it, it will not crawl the site
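
If you want to check what a robots.txt file allows before you publish it, Python's standard library ships a parser for the format. The sketch below is only an illustration: the rules in ROBOTS_TXT and the example.com URLs are made up, and a real crawler such as Googlebot may interpret some directives differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot may crawl everything
# except the /archive/ and /cart/ sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blog_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/blog/seo-basics"
)
archive_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/archive/2009/old-post"
)

print(blog_allowed)     # True: no rule matches /blog/
print(archive_allowed)  # False: /archive/ is disallowed
```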


Step 2: Make Sure the Engine Stays Away from Unimportant Content

As outlined above, the robots.txt file is a means to guide the crawler to the content that you want it to index while leaving the rest. If you have an e-commerce site and have the same products in different categories that you don’t want a search engine to index, Google for example offers a feature in the Google Search Console (GSC) that allows you to tell Google not to index pages with a given URL parameter. Other systems have similar capabilities.
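
Before you tell a search engine to ignore a URL parameter, it helps to see which of your URLs differ only by that parameter. The following sketch groups such URLs; the shop URLs and the parameter name "category" are hypothetical, and the helper names are made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_parameter(url, name):
    """Return `url` with the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls, name):
    """Group URLs that become identical once parameter `name` is ignored."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_parameter(url, name), []).append(url)
    # Keep only groups with more than one URL: those are the duplicates.
    return {base: found for base, found in groups.items() if len(found) > 1}


urls = [
    "https://shop.example.com/product?id=42&category=garden",
    "https://shop.example.com/product?id=42&category=gifts",
    "https://shop.example.com/product?id=7&category=garden",
]
duplicates = group_duplicates(urls, "category")
for base, found in duplicates.items():
    print(base, "<-", found)
```

Here the first two URLs show the same product in two categories, so they end up in one group, while the third URL points to a different product and is left out.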


Step 3: Make Sure the Crawler Does Find the Important Stuff

Now that we know how to instruct the engine not to index content, let's look at what we need to ensure is in place so it can index. Common issues that lead to a poorly indexed site are the following:

  • Content behind login forms; if you have content that is behind login forms, the engine cannot access and index it.
  • Content behind search forms; similarly, content that is behind a search form cannot be accessed by the search engine to index.
  • Non-text content; content in images and other non-text formats cannot be indexed. Metadata can help add context.
  • Navigation the engines cannot follow; crawlers move through your site the same way a human would, following menus and links. Therefore, pages that are not linked from a menu or from another page will not be discovered and indexed.
  • 4xx errors; i.e. client errors, where the request contains bad syntax or the page cannot be found. A common example is a 404 error, meaning the page is not found. Redirecting such URLs to a relevant page is good practice, so that the crawler does not hit a dead end.
  • 5xx errors; i.e. server errors, where the server failed to fulfil the crawler's request for your content. Refer to Google's documentation for more detail.
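
The two error families in the list above are easy to tell apart by the first digit of the HTTP status code. As a small sketch, assuming you already have a report of URLs and the status codes they returned (the crawl_report data below is made up, as are the function names):

```python
def classify_status(code):
    """Map an HTTP status code to the family it belongs to."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error (4xx)"
    if 500 <= code < 600:
        return "server error (5xx)"
    return "other"


def crawl_problems(report):
    """Return only the URLs that answered with a 4xx or 5xx status."""
    return {
        url: classify_status(code)
        for url, code in report.items()
        if 400 <= code < 600
    }


# Hypothetical crawl report: URL -> status code the server returned.
crawl_report = {
    "/": 200,
    "/old-page": 404,
    "/moved": 301,
    "/checkout": 503,
}
problems = crawl_problems(crawl_report)
print(problems)
```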


A very helpful and easy-to-implement feature is the sitemap, which is essentially just that: a map of your site. A sitemap helps the crawlers access your site effectively and index it, and therefore exposes your content.
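
Many platforms and plugins generate a sitemap for you, but the file itself is simple XML. The sketch below builds a minimal one following the sitemaps.org protocol; the example.com URLs and dates are invented, and build_sitemap is a hypothetical helper rather than part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # address of the page
        ET.SubElement(url, "lastmod").text = lastmod  # date of last change
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages and dates.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/blog/seo-basics", "2024-02-01"),
])
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml at the root of the site and submitted to the search engines you care about.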


Step 4: Ensure the Search Engine Can Index Your Content

You may be surprised to find that the fact that the crawler can crawl your site doesn’t mean that the search engine can also index your site. The index is where your discovered pages are stored after they are rendered.

Search engines such as Google re-index pages periodically rather than continuously. You can use the search engine to check when it last indexed your site and what it captured; this helps you understand whether there are any problems. Pages can also be removed from the index if they are no longer available or are flagged to no longer be indexed. Google provides a URL inspection tool that you can use to uncover such issues.

Finally, for more advanced users, you can use robots meta tags to give robots specific, page-level instructions as well as additional metadata. Google publishes specifications for the meta tags it supports. WordPress users, make sure that the “Discourage search engines from indexing this site” box under Search Engine Visibility is NOT checked, so that search engines can find you (Dashboard -> Settings -> Reading).
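
A robots meta tag lives in the head of a page, for example <meta name="robots" content="noindex, nofollow">. If you want to double-check that a page has not been excluded from indexing by accident, a small script can look for the tag. This is a sketch only: the HTML snippets are made up, RobotsMetaParser and is_indexable are hypothetical names, and it looks only at the generic "robots" tag, not at engine-specific variants or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_indexable(html):
    """True unless the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


# Hypothetical pages: one without the tag, one that opts out of indexing.
public_page = "<html><head><title>Our services</title></head></html>"
hidden_page = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head></html>"
)
print(is_indexable(public_page))  # True
print(is_indexable(hidden_page))  # False
```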


Step 5: Ensuring Good Ranking

Again, this is a subject a whole book could be dedicated to, but just ask yourself what search engines want. Yes, relevant content! Relevancy in this context has a range of dimensions but what you can influence is the following:

  • Quality and freshness of content; make sure you have good quality content on your site and update it regularly. Stale content will rank lower.
  • Referrals; referrals from other sites, i.e. other sites pointing to specific content on your site, increase its relevance, so make sure that other sites refer to you where possible.
  • Engagement metrics; i.e. the number of clicks, the time on page, the bounce rate and other metrics all affect the ranking. If people get to your page but bounce straight off again, that is a good indication that the content is not relevant, so keeping an eye on these metrics is important.
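
Analytics tools report engagement metrics for you, but it helps to know what the numbers mean. The sketch below computes two of them from a handful of made-up visits. It assumes the common definition of a bounce as a visit that viewed exactly one page; your analytics tool may define it differently, and the Session record is a hypothetical structure.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One visit to the site (hypothetical record structure)."""
    pages_viewed: int
    seconds_on_site: float


def bounce_rate(sessions):
    """Share of visits that viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for session in sessions if session.pages_viewed == 1)
    return bounces / len(sessions)


def average_time_on_site(sessions):
    """Mean visit duration in seconds."""
    if not sessions:
        return 0.0
    return sum(session.seconds_on_site for session in sessions) / len(sessions)


# Made-up sample of four visits.
sessions = [
    Session(pages_viewed=1, seconds_on_site=5),
    Session(pages_viewed=3, seconds_on_site=120),
    Session(pages_viewed=1, seconds_on_site=8),
    Session(pages_viewed=2, seconds_on_site=67),
]
print(bounce_rate(sessions))           # 0.5
print(average_time_on_site(sessions))  # 50.0
```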


There are many more factors that affect rankings, but the most important ones were discussed here. If you make sure that your site is reachable, that relevant content can be crawled and indexed, and that the content is “fresh” and gets many referrals and good engagement, your site’s ranking will improve.

In this part of our series we covered search engine optimization; in upcoming blogs we will cover more important topics. Please follow us on LinkedIn, Facebook or Twitter (@websplash1) so we can alert you when we post new content.