What Content Can Search Engines “See” on a Web Page?
Search engine crawlers and indexing programs are software programs, and extraordinarily
powerful ones. They crawl hundreds of billions of web pages, analyze the content of all these
pages, and analyze the way all these pages link to each other. Then they organize this
information into a series of databases that can respond to a user search query with a highly
tuned set of results in a few tenths of a second.
This is an amazing accomplishment, but it has its limitations. Software is very mechanical, and
it can understand only portions of most web pages. The search engine crawler analyzes the
raw HTML form of a web page. If you want to see what this looks like, you can do so by using
your browser to view the source.
The two screenshots in Figure 2-15 show how to do that in Firefox (top) and Internet Explorer
(bottom).
FIGURE 2-15. Viewing source in your browser
Once you view the source, you will be presented with the exact code for the web page that the
web server sent to your browser. This is what the search engine crawler sees (the search engine
also sees the HTTP headers for the page). The crawler will ignore a lot of what is in the code.
For example, search engines largely ignore code such as that shown in Figure 2-16, as it has
nothing to do with the content of the web page.
The information the search engine crawler is most interested in is in the HTML text on the
page. Figure 2-17 is an example of HTML text for a web page (using the SEOmoz.org home
page).
Although Figure 2-17 still shows some HTML encoding, you can see the “regular” text clearly
in the code. This is the unique content that the crawler is looking to find.
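To make the distinction between code and "regular" text concrete, here is a minimal sketch in Python of pulling the readable text out of raw HTML while skipping script and style blocks. It uses only the standard library's html.parser module. The class name, function name, and sample page are hypothetical, and real crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Returns the page's text content joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, not taken from any real site.
SAMPLE = """
<html><head><style>p { color: red; }</style></head>
<body><script>var x = 1;</script>
<h1>Welcome</h1><p>Learn <b>SEO</b> basics.</p></body></html>
"""

print(visible_text(SAMPLE))
```

The style rule and the script body are dropped, and only the heading and paragraph text remain.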
In addition, search engines read a few other elements. One of these is the page title. The page
title is one of the most important factors in ranking a given web page. It is the text that shows
in the browser’s title bar (the blue line above the browser menu and the address bar).
Figure 2-18 shows the code that the crawler sees, using Trip Advisor as an example.
The first red circle in Figure 2-18 is for the title tag. The title tag is also often (but not always)
used as the title of your listing in search engine results. Exceptions to this can occur when you
obtain Yahoo! or DMOZ directory listings for your site. Sometimes the search engines may
choose to use a title for your page that was used in your listings in these directories, instead of
the title tag on the page. There are also meta tags that allow you to block this from happening,
such as the NOODP tag, which tells the search engine not to use DMOZ titles, and the NOYDIR tag,
which tells Yahoo! not to use the Yahoo! directory listing. In any event, Figure 2-19 shows
what happens when you search on stone temple consulting (the Stone Temple Consulting home
page at http://www.stonetemple.com). Notice how the title of the search listing matches the title
of the Stone Temple Consulting home page.
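As a rough illustration of how a program can read the title tag and check for directives such as NOODP and NOYDIR, here is a short Python sketch using the standard library's html.parser module. It assumes the directives appear in the conventional robots meta tag form (a meta tag with name="robots" and a comma-separated content value). The class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TitleAndRobotsParser(HTMLParser):
    """Captures the <title> text and any robots meta directives."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                content = attr.get("content") or ""
                # Directives are comma separated and case insensitive.
                self.robots.update(
                    part.strip().lower()
                    for part in content.split(",")
                    if part.strip()
                )

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_title_and_robots(html):
    """Returns (title text, set of lowercased robots directives)."""
    parser = TitleAndRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.robots


# Hypothetical page head, for illustration only.
SAMPLE = """
<html><head>
<title>Example Travel Reviews</title>
<meta name="robots" content="NOODP, NOYDIR">
</head><body><p>Hello</p></body></html>
"""

title, robots = read_title_and_robots(SAMPLE)
print(title)
print("noodp" in robots, "noydir" in robots)
```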
In addition to page titles, the search engines also read the “meta keywords” tag. This is a list of
keywords that you wish to have associated with the page. Spammers (people who attempt to
manipulate search engine results in violation of the search engine guidelines) ruined the SEO
value of this tag many years ago, so its value is now negligible. Google does not use this tag for
ranking at all, but Yahoo! and Bing seem to make reference to it (you can read about this in
detail at http://searchengineland.com/meta-keywords-tag-101-how-to-legally-hide-words-on-your-pages-for-search-engines-12099).
Spending a lot of time on meta keywords is not recommended because of the lack of SEO benefit.
The second red circle in Figure 2-18 shows an example of a meta keywords tag.
Search engines also read the meta description tag (the third red circle in the HTML in
Figure 2-18). However, the meta description tag has no influence on search engine
rankings (http://searchengineland.com/21-essential-seo-tips-techniques-11580).
Nonetheless, the meta description tag plays a key role, as search engines often use it as the
description for your page in search results. Therefore, a well-written meta description can have
a significant influence on how many clicks you get on your search listing. Time spent on meta
descriptions is quite valuable as a result. Figure 2-20 uses a search on trip advisor to show an
example of the meta description being used as a description in the search results.
NOTE
The user’s keywords are typically shown in boldface when they appear in the
search results (sometimes close synonyms are shown in boldface as well). As an
example of this, in Figure 2-20, TripAdvisor is in boldface at the beginning of the
description.
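The meta keywords and meta description tags discussed above can be read with a few lines of Python. The sketch below uses the standard library's html.parser module to collect them, then applies a toy highlighter that wraps query terms in bold tags, loosely mimicking the boldface behavior described in the note. The function names and sample markup are hypothetical, and the highlighter is only an illustration, not how any search engine actually builds its snippets.

```python
import re
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Records the content of named <meta> tags, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name:
            self.meta[name] = attr.get("content") or ""


def read_meta(html):
    """Returns a dict mapping meta tag names to their content values."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def bold_query_terms(text, query):
    """Toy highlighter: wraps whole-word matches of each query term in <b>."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = r"\b(" + "|".join(terms) + r")\b"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)


# Hypothetical page head, for illustration only.
SAMPLE = """
<head>
<meta name="keywords" content="hotels, flights, reviews">
<meta name="description" content="Example Travel has reviews of hotels and flights.">
</head>
"""

meta = read_meta(SAMPLE)
keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
snippet = bold_query_terms(meta.get("description", ""), "example hotels")
print(keywords)
print(snippet)
```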
A fourth element that search engines read is the alt attribute for images. The alt attribute was
originally intended to provide text that can be rendered when the image itself cannot be
viewed. There were two basic audiences for this:
• Vision-impaired people who do not have the option of viewing the images.
• People who turn off images for faster surfing. This is generally an issue only for those who
do not have a broadband connection.
Support for the vision-impaired remains a major reason for using the alt attribute. You can
read more about this by visiting the Web Accessibility Initiative page on the W3C website.
Search engines also read the text contained in the alt attribute of an image tag. An image tag
is an element that is used to tell a web page to display an image. Here is an example of an image
tag from the Alchemist Media site:
<img src="http://www.alchemistmedia.com/img/btob2009.jpg" alt="BtoB Interactive Marketing Guide" border="0" />
The src= part of the tag gives the location of the image to be displayed. The part that starts
with alt= and is followed by “BtoB Interactive Marketing Guide” is the alt attribute.
The search engines use the text in the alt attribute to help them determine what the image is
about and to get a better sense of what the page is about.
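A program can list each image on a page together with its alt text in much the same way. The following Python sketch, again built on the standard library's html.parser module, also flags images that have no alt attribute at all. The function name and the second image in the sample are hypothetical.

```python
from html.parser import HTMLParser


class ImageAltParser(HTMLParser):
    """Collects (src, alt) pairs for every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here.
        if tag == "img":
            attr = dict(attrs)
            self.images.append((attr.get("src"), attr.get("alt")))


def find_images(html):
    """Returns a list of (src, alt) tuples; alt is None when missing."""
    parser = ImageAltParser()
    parser.feed(html)
    parser.close()
    return parser.images


# The first tag is the example from the text; the second is hypothetical.
SAMPLE = """
<img src="http://www.alchemistmedia.com/img/btob2009.jpg" alt="BtoB Interactive Marketing Guide" border="0" />
<img src="/img/spacer.gif">
"""

images = find_images(SAMPLE)
missing_alt = [src for src, alt in images if alt is None]
print(images)
print(missing_alt)
```

A check like this can be a quick way to spot images that give neither screen readers nor search engines any descriptive text.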
A final element that search engines read is the noscript tag. In general, search engines do not
try to interpret JavaScript that may be present on a web page (though this is already changing).
However, a small percentage of users do not allow JavaScript to run when they load a web
page (the authors’ experience is that it is about 2%). For those users, nothing would be shown
where the JavaScript is on the web page, unless the page contains a noscript tag.
Here is a very simple JavaScript example that demonstrates this:
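<script>
  // Written into the page only when JavaScript runs
  document.write("It is a Small World After All!");
</script>
<!-- Fallback shown when JavaScript is unavailable or disabled -->
<noscript>Your browser does not support JavaScript!</noscript>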
The noscript portion of this is the text “Your browser does not support JavaScript!”. The search
engines will read this text and treat it as information about the web page. In this example, you
could instead make the noscript tag contain the text “It is a Small World After All!”, which
matches what the JavaScript writes. The noscript tag should be used only to represent the
content of the JavaScript. (Placing other content or links in this tag could be interpreted as
spammy behavior by the search engines.) In addition, a browser warning like the one in this
example could end up as your search snippet, which would be a bad thing.
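To see what a program that does not run JavaScript would pick up from such a page, the Python sketch below collects the text inside noscript tags using the standard library's html.parser module. The class and function names are hypothetical, and the sketch assumes the parser's default behavior of treating noscript content as ordinary markup.

```python
from html.parser import HTMLParser


class NoscriptParser(HTMLParser):
    """Collects the text found inside each <noscript> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._current = []
        self.fallbacks = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._inside = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "noscript" and self._inside:
            self._inside = False
            # Collapse runs of whitespace into single spaces.
            self.fallbacks.append(" ".join("".join(self._current).split()))

    def handle_data(self, data):
        if self._inside:
            self._current.append(data)


def noscript_text(html):
    """Returns a list with the text of every <noscript> block."""
    parser = NoscriptParser()
    parser.feed(html)
    parser.close()
    return parser.fallbacks


# The JavaScript example from the text.
SAMPLE = """
<script>
document.write("It is a Small World After All!");
</script>
<noscript>Your browser does not support JavaScript!</noscript>
"""

print(noscript_text(SAMPLE))
```

The script body is ignored and only the fallback text is returned, which is why a browser warning placed there can surface as page content.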