Search Quality

3 08 2008

A few weeks back Udi Manber introduced the search quality group, and the previous posts in this series talked about the ranking of documents. While the ranking of web documents forms the core of what makes search at Google work so well, your search experience consists of much more than that. In this post, I’ll describe the principles that guide our development of the overall search experience and how they are applied to the key aspects of search. I will also describe how we make sure we are on the right track through rigorous experimentation. And the next post in this series will describe some of the experiments currently underway.

Let me introduce myself. I’m Ben Gomes, and I’ve been working on search at Google since 1999, mostly on search quality. I’ve had the good fortune to contribute to most aspects of the search engine, from crawling the web to ranking. More recently, I’ve been responsible for the engineering of the interface for search and search features.

A common reaction from friends when I say that I now work on Google’s search user interface is “What do you do? It never changes.” Then they look at me suspiciously and tell me not to mess with a good thing: “Google is fine just the way it is, a plain, fast, simple web page. That’s great, but how hard can that be?”

To help answer that question, let me start with our main goal in web search: to get you to the web pages you want as quickly as possible. Search is not an end in itself; it is merely a conduit. This goal may seem obvious, but it makes a search engine radically different from most other sites on the web, which measure their success by how long their users stay. We measure our web search success partly by how quickly you leave (happily, we hope!). There are several principles we use in getting you to the information you need as quickly as possible:

  • A small page. A small page is quick to download and generally faster for your browser to display. This results in a minimalist design aesthetic; extra fanciness in the interface slows down the page without giving you much benefit.
  • Complex algorithms with a simple presentation. Many search features require a great deal of algorithmic complexity and a vast amount of data analysis to make them work well. The trick is to hide all that complexity behind a clean, intuitive user interface. Spelling correction, snippets, sitelinks and query refinements are examples of features that require sophisticated algorithms and are constantly improving. From the user’s point of view, search just works better, almost invisibly.
  • Features that work everywhere. Features must be designed such that the algorithms and presentation can be adapted to work in all languages and countries. Consider the problem of spell correction in Chinese, where user queries are often not broken up into words, or in Hebrew and Arabic, where text is written right to left. (Interestingly, this is believed to be an example of first-mover disadvantage: when chiseling on stone, it is easier to hold the hammer in your right hand!)
  • Data-driven decisions: experiment, experiment, experiment. We try to verify that we’ve done the right thing by running experiments. Designs that seem promising may end up testing poorly.

There are inherent tensions here. For instance, showing you more text (or images) for every result may enable you to better pick out the best result. But a result page that has too much information takes longer to download and longer to visually process. So every piece of information that we add to the result page has to be carefully considered to ensure that the benefit to the user outweighs the cost of dealing with that additional information. This is true of every part of the search experience, from typing in a query, to scanning results, to further exploration.

The start of your search is typing in a query. A common cause of frustration is not knowing the correct spelling of a word! Spell correction, which seems like a simple and obvious feature, hides many technical challenges. No common English dictionary would ever include the correct spelling of Britney Spears, for instance (who, probably completely unbeknownst to her, has become the poster child for this feature). We do a huge amount of analysis of the billions of pages on the web and of our query logs to determine which are “real words” on the web and which are likely to be misspellings. The system that gives you the spell correction has to, in a fraction of a second, consider a huge number of possible words you might have meant (vastly greater than any dictionary ever manually constructed) and determine if there is a more likely query you meant to type. When we are confident that you actually meant to type something else, we take a rare liberty with our search results: we try to distract you from looking at the top result on the page. The spelling correction is in your line of sight and colored a bright must-see red. Furthermore, we now make sure that nothing else on the page is red, unless it is as important to you as spelling! (So far, nothing is.) The algorithms involved in spell correction are constantly getting better. They now work in a large number of languages and are even better at detecting when you have made a spelling mistake. Getting the spelling of your query right is so important that we are considering showing you the results of the spell-corrected query in the middle of the page (just in case you missed our bright red text at the top and bottom!).
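To make the idea concrete, here is a minimal sketch of frequency-based spell correction. It is not Google's system, only a well-known toy approach: generate every word one edit away from the typed word and keep the candidate that is most common in the data. The `WORD_COUNTS` table and the function names are made up for illustration; a real system would derive its counts from web pages and query logs, and would consider far more than single-edit candidates.

```python
from collections import Counter

# Hypothetical counts standing in for statistics mined from pages and query logs.
WORD_COUNTS = Counter({"britney": 900, "spears": 850, "brittany": 300, "spares": 40})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, counts=WORD_COUNTS):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing plausible found, leave the word alone
    return max(candidates, key=counts.__getitem__)


def suggest(query, counts=WORD_COUNTS):
    """Return a corrected query, or None when no correction seems warranted."""
    words = query.lower().split()
    corrected = [correct_word(w, counts) for w in words]
    return " ".join(corrected) if corrected != words else None


print(suggest("britny speers"))
```

Returning `None` when nothing changes mirrors the point made above: a suggestion should only be shown when there is reason to believe the user meant something else.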

Having formulated your query correctly, the next task is to pick a page from the result list. For each result, we present the title and URL, and a brief two-line snippet. Pages that don’t have a proper title are often ignored by users. One of the bigger recent changes has been to extract titles for pages that don’t specify an HTML title, even though a title is clearly right there on the page, staring at you. To “see” the title that the author of the page intended, we analyze the HTML of the page to determine the title that the author probably meant. This makes it far less likely that you will ignore a page for want of a good title. Below the title comes the snippet, and a key early innovation was in what Google showed for the snippet. At the time, search engines showed you the first two lines of the web page; Google, instead, showed you parts of the page where your actual search keywords showed up (information retrieval experts call this “keywords-in-context”). Showing keywords-in-context is visually simple and virtually indistinguishable from the simpler style of snippets, but vastly more useful in helping you decide which page to visit. This simplicity belies underlying complexity: when we create a snippet we have to go through the actual text of each result to find the most relevant part (which contains your keywords) rather than just giving you the first few lines.
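The difference between a “first lines” snippet and a keywords-in-context snippet can be sketched in a few lines. This is an illustrative toy, not the production algorithm: it slides a fixed-size window over the page text and keeps the window that contains the most distinct query terms. The function name `kwic_snippet` and the `window` parameter are assumptions made for the example.

```python
import re


def kwic_snippet(text, query, window=12):
    """Return the `window`-word stretch of `text` with the most distinct query terms."""
    terms = {t.lower() for t in query.split()}
    words = text.split()
    if len(words) <= window:
        return " ".join(words)

    def normalize(word):
        # Strip punctuation so that "orbit." still matches the term "orbit".
        return re.sub(r"\W+", "", word).lower()

    best_start, best_score = 0, -1
    for start in range(len(words) - window + 1):
        hits = {normalize(w) for w in words[start:start + window]} & terms
        if len(hits) > best_score:  # strict ">" keeps the earliest best window
            best_start, best_score = start, len(hits)

    snippet = " ".join(words[best_start:best_start + window])
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + window < len(words) else ""
    return prefix + snippet + suffix


page = ("Welcome to my home page. Here you will find links and news. "
        "Rockets use chemical propulsion to reach orbit around the Earth.")
print(kwic_snippet(page, "propulsion orbit", window=8))
```

When no query term appears anywhere, the sketch falls back to the opening words of the page, which is exactly the older style of snippet described above.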

We have been making improvements to our snippets over time with algorithms for determining the relevance of portions of the page. The changes range from the subtle (we highlight synonyms of your query terms in the results) to the more obvious. For example, when a user searches for “arod”, Alex and Rodriguez are bolded in the search result snippet, based on our analysis that you might plausibly be referring to him.

As a more obvious example, we now extract and show you the byline date from pages that have one. These byline dates are expressed in myriad formats, which we extract and present uniformly so that you can scan them easily.
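As a rough illustration of presenting dates uniformly, the sketch below tries a short list of known formats and prints whatever matches in one consistent form. The format list and the function name are assumptions for the example (real pages use many more formats, and month names depend on language), and ambiguous forms such as “3 08 2008” are read day-first here purely by convention.

```python
from datetime import datetime

# Hypothetical list of byline formats, tried in order.
BYLINE_FORMATS = [
    "%d %m %Y",    # 3 08 2008 (assumed day-first)
    "%B %d, %Y",   # August 3, 2008
    "%b %d, %Y",   # Aug 3, 2008
    "%d %B %Y",    # 3 August 2008
    "%Y-%m-%d",    # 2008-08-03
    "%m/%d/%Y",    # 08/03/2008 (assumed month-first)
]


def normalize_byline_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    for fmt in BYLINE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.date().isoformat()
    return None


for sample in ["3 08 2008", "August 3, 2008", "2008-08-03", "sometime last week"]:
    print(sample, "->", normalize_byline_date(sample))
```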

For one of the most common types of user needs, navigational queries — where you type in the name of a web site you know — we have introduced shortcuts (we refer to them as sitelinks). These sitelinks allow you to get to the key parts of the site and illustrate many of the same principles alluded to above; they are a simple addition to the top search result that adds a small amount of extra text to the page.

For instance, the home page of Hewlett-Packard has almost 60 links, in a two-level menu system. Our algorithms, using a combination of different signals, pick the top ones among these that we think you are most likely to want to visit.

What if you did not find what you were looking for among the top results? In that case, you probably need to try another query. We help you in this process by providing a set of query refinements at the bottom of the results page — even if they don’t give you the query that you need, they provide hints for different (likely more successful) directions in which you could refine your query. By placing the query refinements at the bottom of the page, the refinements don’t distract users, but are there to help if the rest of the search results didn’t serve a user’s information need.

I’ve described several key aspects of the search experience, including where we have made many changes over time, some subtle, some more obvious. In making these changes to the search experience, how do we know we’ve succeeded, that we’ve not messed it up? We constantly evaluate our changes by sharing them with you! We launch proposed changes to a tiny fraction of our users and evaluate whether they seem to be helping or hurting the search experience. There are many metrics we use to determine if we’ve succeeded or failed. The process of measuring these improvements is a science in itself, with many potential pitfalls. Our experimental methodology allows us to explore a range of possibilities and launch the ones that work the best. For every feature that we launch, we have frequently run a large number of experiments that did not see the light of day.
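One common way to show a change to “a tiny fraction” of users is to hash a stable identifier into a bucket and compare it with the rollout fraction. The sketch below shows that general technique only; it makes no claim about how Google assigns users to experiments, and the function name, the identifiers and the 1% default are all made up for the example.

```python
import hashlib


def in_experiment(user_id, experiment, fraction=0.01):
    """Deterministically place roughly `fraction` of users in `experiment`."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < fraction


# The same user always gets the same answer for the same experiment.
treated = sum(in_experiment(f"user-{i}", "bigger-snippets") for i in range(10000))
print(f"{treated} of 10000 users see the experimental page")
```

Including the experiment name in the hash means that different experiments pick different slices of users, so one group is not stuck with every trial at once.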

So let me answer the question I started with: We’re actually constantly changing Google’s result page and have been doing so for a long time. And no, we won’t mess with a good thing. You won’t let us.





What Is SEO and Why Do I Care?

3 08 2008

SEO is yet another techie acronym to add to your arsenal. It stands for Search Engine Optimization and is all about optimizing your website’s position in search results. Many books have been written about SEO and websites are dedicated to it. It’s big business—and it should be.

Showing up higher in search results (like when you go to Google and type in a phrase or keyword) usually means more traffic and more business.

SEO is part science and part art. The SEO experts figure out what search engines look at when they rank websites. Then they figure out what can be tweaked to improve a site’s rank in the results. Some SEO tactics are of questionable value and others are borderline unethical, but the mainstream of SEO thought is simple, basic improvements that will boost your rank in search engine results.

It’s usually simple things like figuring out keywords for your site and then putting those keywords in the right places. Those right places include the title bar (the text that shows up at the top of your browser), any headers, headlines or bold text on a page, and the metadata (helpful information in a page’s code that’s not visible in a browser).

It’s really not rocket science. What search engines do is look at a page and try to figure out what’s the most important thing on that page. For a page to be the number one search result for something like rocket science, it had better have the words “rocket science” and other important keywords in key locations on the page. Otherwise it’s probably not really about rocket science.

Good SEO is about recognizing the same thing for your site. So if you’re in the rocket science business, you’d probably have keywords like rocket science, rockets, propulsion, physics, NASA, booster, missile, etc. And you’d want to use those words in your headlines and other important places on your site.
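If you want to check your own pages, a small script can report which of those key places actually contain a keyword. The sketch below uses Python’s standard `html.parser` module; the class and function names, and the exact set of tags treated as “key places”, are assumptions for illustration and say nothing about how any search engine weighs them.

```python
from html.parser import HTMLParser


class KeyPlacesParser(HTMLParser):
    """Collect text from the title, headings, bold text and meta tags."""

    TEXT_TAGS = {"title", "h1", "h2", "h3", "b", "strong"}

    def __init__(self):
        super().__init__()
        self.places = {}   # place name -> text found there
        self._open = []    # stack of currently open tags we care about

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("description", "keywords"):
            key = "meta " + name
            self.places[key] = self.places.get(key, "") + " " + (attrs.get("content") or "")
        elif tag in self.TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Close this tag and anything left open inside it.
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        for tag in self._open:
            self.places[tag] = self.places.get(tag, "") + " " + data


def keyword_places(html, keyword):
    """Return the sorted names of the key places whose text contains `keyword`."""
    parser = KeyPlacesParser()
    parser.feed(html)
    parser.close()
    needle = keyword.lower()
    return sorted(place for place, text in parser.places.items() if needle in text.lower())


page = (
    "<html><head><title>Rocket Science Basics</title>"
    '<meta name="description" content="An intro to rocket science and propulsion.">'
    "</head><body><h1>Why rocket science matters</h1>"
    "<p>Some text about <b>boosters</b>.</p></body></html>"
)
print(keyword_places(page, "rocket science"))
```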





Online Social Marketing and the Call to Action – Just Do It

3 08 2008

We firmly believe that the most important part of using social media (Web 2.0) sites is the content you submit. Without good content you will not get your articles ranking in the search engines, and even if you do get the rankings you will not get any conversions.

There are different types of online social marketing sites available to use. Some allow you to bookmark, some let you leave short comments or posts, and others allow you to create your own little page, such as Squidoo.com, Hubpages.com, Ning.com, Google Notebook and Ezinearticles.com. It is this last group of sites that the content we talk about in this post is meant for.

The majority of people trying to earn money with these types of sites are too focused on quantity rather than quality. They will outsource the content or spin it without taking the time to ensure it reads well and is informative. Your first priority needs to be the human visitor. You can then go back and tweak the content slightly to better optimize it for the search engines, which we will explain more about later on.

The simplest and most effective method that we use for creating good “human focused” content is to follow the simple formula A.I.D.A., which stands for…

A – Attention: Get the person’s attention right away. This is where your title (headline) comes into play; you want it to grab the reader immediately. For example, say we were promoting a page on our site that was a weight loss pre-sales affiliate page for “Supplement X” and our keyword was “supplement x reviews”; our headline (content title) might be…

“Most Supplement X Reviews Are Full Of Crap, I Stopped Using It and I Will Tell You Why!”

Now, this might seem aggressive to some, but if you were looking for a supplement x review, would this not get your attention?

I – Interest: Now that you have their attention, it is time to get them interested in what you are talking about. Following the above example, if our main purpose is to promote this product, this is where we have to pull the reader in and get them interested in learning more.

We have taken the stance that we will be slamming the product, but in reality we will not be; the title was just to get their attention. You may want to start by explaining why other reviews are so fake and how hard it is to know what is real out there. Pique their interest in what you have to say so they keep reading. It is very important that you relate to them right away as well. Remember, this is just an example of a strategy to try… you can write this however you want.

D – Desire: You got their attention and piqued their interest in learning more, so now it is time to spark a desire. In this example we would now need to come clean and explain what was meant about quitting Supplement X. Maybe take the stance that we quit early on because we kept forgetting to take the supplement, started up again a month later, and soon realized that quitting was the worst mistake we made because the product worked.

Make sure to explain the benefits of the product and how it will help. Do not just say something like “lose weight”; instead, highlight exactly what the benefit is. So you might say something like…

“Helped me to fit into my size 32 pants from a size 38 in only 30 days”

A – Action: This is where you tell them how they can quench that desire. This is also where many people fail, because they make a weak call to action and just leave the reader hanging. It is so important that you tell the reader exactly what to do. Do not just put something like a keyword linked to your page or use “Click Here”, because this does not tell the reader why they need to take that action. Instead, tell them why to “Click Here”.

A couple of examples might be…

“Click Here Now to Learn How Easy it is to Get Started”

“Get complete access to this incredible system by Clicking Here Now”

You will want to reemphasize the benefits and reassure the reader that this is one of the smartest things they will ever do.

IMPORTANT: You will want to ensure you have linked to your page with the keyword, but we also recommend you use a call-to-action phrase such as “Click Here”; these get much better clickthroughs than the keyword alone. You can add a nofollow to the “Click Here” link if you like.

It is also important to understand that this type of content is not for every social site; it is meant more for the sites that allow you to create a page that you own. Some sites, however, frown upon this type of content, do not like calls to action in the content and get uptight if there are more than two links in it. Do not be discouraged if your content gets taken down from some sites; learn, tweak and try again. Remember, though, that the whole purpose is to get the reader to take an action and visit your blog, so use the content to convince them that they need to do that.

TIPS: Grab a notepad and a pencil and rough out the outline of the content. It is easier to start this way than on a computer. Also, create your headline first and then the article… this helps guide you as to what to write about.

Once you have your content ready, you are going to want to optimize the content (we will talk about this tomorrow) and we will also show you how to make a few variations of one article for a few different sites ;)

So, did you get anything worthwhile from this post?





The Simple Skill That is an SEO Secret Weapon!

2 08 2008

There is one key skill that is often overlooked in the quest for higher search engine rankings. Yet almost everyone who uses the web will at least have heard of it. Good web designers use it every day and real SEO experts use it too. So what is this skill? Let’s see what the well-known and respected SEO expert Jim Westergren has to say about it. Recently he listed what he sees as the essential skills that any SEO expert or good Internet marketeer needs to know. Top of that list was HTML:

“As an SEO, knowing HTML is a must. You need to see and understand how the search engines are reading the source code of the web pages.”

It’s an excellent and often overlooked point, hence the little-known secret! If you know HTML (which is easy to learn) you can really get under the skin of any web page and start to see how the search engine reads the page. So it seems that a little time spent brushing up on your HTML skills will be time well spent.

You can use your updated or new knowledge of HTML to design or alter your pages so that the search engines can read through them more easily. If you stop to think about it for a moment you will see that this makes perfect sense. I mean, who likes a hard read? I remember having to study Dickens’s classic book “Hard Times” during my school days, and boy, that was a hard read. It wasn’t just the subject matter but also the tiny font and the immense word density of the pages.

In order to put this key skill into practice you don’t have to be a professional code wrangler, but you must at least know how a web page is structured, what the main tags are, and how they affect the readability of the page. With this knowledge you can take your current SEO skills to the next level and fine-tune your pages for even more exposure.
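A quick way to see a page’s structure the way a program would is to strip it down to its title and headings. The sketch below does that with Python’s standard `html.parser` module. The names `OutlineParser` and `page_outline` are made up for the example, and it is only a rough stand-in for whatever a real search engine does when it reads your source code.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record the title and headings of a page in document order."""

    HEADINGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []      # list of (tag, text) pairs
        self._current = None   # heading tag currently open, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((tag, text))
            self._current = None


def page_outline(html):
    """Return the page's title and headings as (tag, text) pairs."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


sample = "<title>Hard Times</title><h1>Book  One</h1><p>text</p><h2>Chapter <em>1</em></h2>"
for tag, text in page_outline(sample):
    print(tag, "|", text)
```

If the outline that comes back is empty or makes no sense on its own, that is a hint the page’s main tags are not doing much to describe its content.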

So the Secret SEO Weapon is not so secret after all. Yet my guess is that many Internet marketeers and many self-proclaimed SEO experts have probably overlooked this simple and yet essential skill. So be good to your search engine robot friends and make it easy for them to read your pages. I am sure they will thank you for it by promoting your pages up the search engine rankings.