The Question Filter

Improving the Usability of FAQs and Search Engines

Introduction

FAQs and Search Engines

Frequently Asked Question lists (FAQs) are common throughout the Internet. They originated from newsgroups, to alleviate the workload of veteran readers from answering the same question over and over as new users entered the group. The form of a frequently asked questions list is typically a hyperlinked list of question sentences, which link to the answers to the questions.

Search engines are also commonly used throughout the Internet. These take the form of a text box where the query is entered. The query is run against an automatically generated index of the site's content. In the simplest form, every page that contains the query string is returned to the user. In a more advanced form, pages are ranked using different weighting factors, such as the number of links to the page (Google), the occurrence count of the query, or other formulaic measures of closeness.
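The two forms described above can be sketched in a few lines. This is only an illustration, not how any particular search engine is built: the page names and text in PAGES are hypothetical, and the ranking uses the occurrence count alone.

```python
def simple_search(pages, query):
    """Return the name of every page whose text contains the query string."""
    q = query.lower()
    return [name for name, text in pages.items() if q in text.lower()]


def ranked_search(pages, query):
    """Return matching pages, ordered by how often the query occurs in each."""
    q = query.lower()
    counts = {name: text.lower().count(q) for name, text in pages.items()}
    matches = [name for name, n in counts.items() if n > 0]
    return sorted(matches, key=lambda name: counts[name], reverse=True)


# Hypothetical site content, used only for illustration.
PAGES = {
    "install.html": "How to install the software. Install notes follow.",
    "fees.html": "A listing of account fees.",
    "faq.html": "Install questions and fees questions.",
}

print(simple_search(PAGES, "install"))
print(ranked_search(PAGES, "install"))
```

Neither function knows what the user actually wants; both only measure how closely the text matches the query.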

The Question Filter

The question filter is a technique that improves upon the FAQ and the Search Engine by making the simple observation that most common questions are asked repeatedly, have the same key words, and have well-defined answers. For most questions, a veteran of the subject will have no trouble matching the entered key words to the single right answer or possible set of answers.

The technique consists of a text input box, where the user types the query or key words for his or her question. When the submit button is pressed, the query is stored in a database. Then, instead of returning results based on data from an automatically generated index, the user is directed to a pre-defined page or set of pages based on the keywords in the query.

The set of keyword-to-page mappings is stored in a database or in a script. In the beginning, perhaps this database or script will be mostly empty. However, the veteran of the subject can easily add mappings as new questions are entered, because all the queries are stored in the database. Furthermore, if new questions are asked, the expert can go through the recorded queries and add more answers and content to the site. If no keyword mapping exists, then the user can be directed to either the standard FAQ index or a Search Engine result page.
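The following is a minimal sketch of the technique, under some assumptions: the class name QuestionFilter, its method names, and the page names are all made up for illustration, and an in-memory list stands in for the query database.

```python
import re


class QuestionFilter:
    """Maps query keywords to pre-defined pages and records every query."""

    def __init__(self, mappings, fallback_page):
        self.mappings = dict(mappings)      # keyword -> list of pages
        self.fallback_page = fallback_page  # e.g. the FAQ index or search page
        self.query_log = []                 # stands in for the query database

    @staticmethod
    def _words(query):
        # Split the query into lowercase words, dropping punctuation.
        return re.findall(r"[a-z0-9]+", query.lower())

    def answer(self, query):
        """Store the query, then return the pages mapped to its keywords."""
        self.query_log.append(query)
        pages = []
        for word in self._words(query):
            for page in self.mappings.get(word, []):
                if page not in pages:
                    pages.append(page)
        return pages or [self.fallback_page]

    def unmatched_queries(self):
        """Recorded queries that contain no mapped keyword, for expert review."""
        return [q for q in self.query_log
                if not any(w in self.mappings for w in self._words(q))]

    def add_mapping(self, keyword, pages):
        """Called by the expert to map a new keyword to one or more pages."""
        self.mappings[keyword.lower()] = list(pages)


qf = QuestionFilter({"fees": ["fees.html"]}, fallback_page="faq.html")
print(qf.answer("What are the fees?"))  # a mapped keyword
print(qf.answer("download"))            # no mapping yet, so the fallback
print(qf.unmatched_queries())           # what the expert should look at next
```

The expert's routine is then to read unmatched_queries(), write or locate the answer page, and call add_mapping() so that the next user with the same keyword is sent straight to it.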

Benefits and Drawbacks

The benefit of this system over the FAQ is that the user is freed from having to scan through a list of questions that may be dozens of pages long. This saves time for the user, and if there are many users, the aggregate time savings could be dramatic. Each word that a person must read on a page consumes that person's time, so minimizing the amount of reading is simply good manners. A user who arrives at a site with a question usually knows what he or she is looking for. Typing in the keyword for the question takes about 5-10 seconds, whereas navigating and scanning a FAQ index of question sentences can take dozens of minutes.

The benefit of this system over the Search Engine is that the user is immediately directed to the expert's answer to the question. For common queries, there is no wading through possibly hundreds of pages of search results. The search results often show junk text from the page rather than a meaningful description, which makes the hunt even more time-consuming because the user must open each page to find out what is on it.

The drawback of this system is that keywords must be manually mapped to answers. (We assume that the answers to the questions have already been developed.) However, we argue that the same domain experts that voluntarily answer questions in newsgroups would be just as willing to perform the keyword-to-page mappings. In fact, if the system helped users to find their answers better, the experts would be more than willing because that would cut down on the redundant traffic in the newsgroup.

This system is not suited for queries where the domain is too general, for example, a gateway page to the entire Internet. For this type of query, Google seems to be the current state-of-the-art system to help users get started in their hunt. On the other hand, if multiple people created domain specific question filters, then it would be a simple matter to allow the user to choose a domain from the available set of domains, and then type in the query.

Case Studies

Examples of FAQs

Here is an example of an obnoxiously long FAQ: The Debian GNU/Linux FAQ. To the casual user looking for a quick answer to a question, this FAQ is a nightmare. It contains fifteen categories of questions that fill five letter-size pages of text. To find an answer, the user must either scan through the entire list, or use the browser's "find" function to locate the keyword. The categorization does not help the user who is not a domain expert, because the user doesn't understand the categories a priori. The user would have to expend time and mental effort to understand the categorization, so this would not be on the path of least resistance.

Consider the user who knows that Debian is a free operating system, and wishes to know where to download it. Upon arriving at the page, the user might first try the browser's "find" function with the keyword "download". This fails to find anything. The next logical keyword might be "obtain". This also fails. A third attempt might be made with the keyword "ftp". This query succeeds in moving the browser to the section titled "Debian FTP Archives". Now the user must read through the list of questions in that section. None directly answers the question, "How do I download Debian?"

With a question filter, an expert could quickly map the keyword "download" directly to a page which describes the key points a user needs to know to download Debian. These could be the addresses of the download mirrors, the required files, and links to additional information on installation, other files, etc.
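In the Debian example, the user tried three different keywords for the same question. One way the expert might handle this, sketched here as an assumption rather than a description of any real system, is to list the keywords for each answer page and invert that list into a keyword lookup. The page name getting-debian.html is hypothetical.

```python
def build_keyword_index(page_keywords):
    """Invert {page: [keywords]} into {keyword: [pages]}."""
    index = {}
    for page, keywords in page_keywords.items():
        for keyword in keywords:
            index.setdefault(keyword.lower(), []).append(page)
    return index


# Hypothetical answer page, listed with the keywords users are likely to try.
PAGE_KEYWORDS = {
    "getting-debian.html": ["download", "obtain", "ftp"],
}

index = build_keyword_index(PAGE_KEYWORDS)
print(index["obtain"])
```

With this arrangement, all three of the user's attempts lead to the same page, and adding a newly observed keyword means editing one list.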

Examples of Search Engines

An extremely obnoxious search engine has been implemented on the IRS home page. (The IRS, Internal Revenue Service, is the tax-collection agency of the US government.) First of all, the query is split into two query forms. This makes it incumbent upon the user to figure out what the IRS means by having two query forms. It would be like walking into a building and finding not one, but two receptionists, and then having to figure out which receptionist to ask based on the title posted above each desk. The titles, of course, use company language that is difficult for an outsider to understand.

Setting that aside, suppose I am looking for the latest tax regulations on travel expenses. Suppose that I know there is an IRS publication for travel expenses, but I do not know its title or its number. Let's say I take the path of least resistance and enter my query, "travel", into the first text box I see, the one labelled "Search IRS Site For:". This query produces 10 results, and 462 more on other pages. Now, I must scan through the results. Luckily, the page titles are bold and underlined, so I can scan the titles. I ignore the titles that say "tips", because I am looking for a publication. Once I've scrolled down to the 9th and 10th results, I see the titles "Publications", but the descriptions are not meaningful. If I click on the 9th result, it takes me to what looks like the publication on Travel Expenses, but there is no mention of the publication title or number on the page, so I cannot be sure. I go back and click on the 10th entry. Scanning this page, I find two prominent hyperlinks in the text to "Publication 463". I guess that this may be it, and I click on the link, and I have found my answer. The title is now at the top of the page, "Publication 463: Travel, Entertainment, Gift, and Car Expenses."

Now the IRS is a large agency, but when a member of the public queries "travel" in the main web site, an IRS expert would know that that query can mean only a very, very small subset of things. These would include Publication 463 at the top, and maybe a few links to tip pages and other related pages. The user would save a tremendous amount of time versus using the automated search engine.

Hybrid Search Engine and FAQ

Paypal has implemented a hybrid automatic search engine combined with a hierarchical FAQ. This is accessed via the "Help" link that shows in the top right corner of every page on the site. Suppose I am a user who is comparing online-payment systems, and am shopping by the cost of the service. I would want to know, as fast as possible, what the fees are for Paypal's service. Paypal is a very elegant site due to its simplicity. The path of least resistance would be for me to click on the "Help" link at the top of the page. This is because nothing else relevant to fees looks clickable on the main page. At the very bottom of the page, in small type, is a link called "Fees," but let's assume I don't see this because it would require scrolling my browser and reading more of the page than I care to.

On the help page, the first thing I see is a text input box labeled "Search: Have a question? Find your answer fast", which makes me happy. I ignore the categories on the left side of the page and the FAQ-style question listing below the search box. Remember, I am on the path of least resistance and wish to read as few words as possible. So, I type "fees" in the search box and hit Search. The first result that shows up is "How do I view or edit my account information, including my email address, street address, phone number, credit card, and bank account?" This is a very long sentence to read. The first result that even contains the word "fees" is the fourth result. I scan through the rest of the results and do not find what I'm looking for, which is a consolidated listing of the "fees" for accounts. However, once I get to the bottom of the page, I notice the "fees" link, which I click on. This takes me to exactly what I want.

Again, if the search were using a question filter rather than an automatic search engine, the answer to my question would have been instantaneous, not requiring me to read a single word. All I would have to do is visually locate the prominent and centered query input box, type in "fees", and press enter. A Paypal public-relations expert would know the exact page to which my query should be directed.

Patching the Search Engine

Microsoft probably has the largest, best organized, and best maintained on-line library of content in the world. Its MSDN library is renowned amongst Windows developers for its thoroughness and accuracy. However, because the library is so large, finding something in it is extremely difficult for someone who does not know the techniques of search engine querying or the organization of the MSDN content.

In order to improve the ability of searchers to find content, Microsoft has implemented two strategies. (We ignore training, which is expensive.) The first is a keyword system where the keywords begin with the letters "kb". For example, to restrict my query to problems only, I can include "kbPRB" in my query. Typing "kbPRB Internet Explorer" brings up all articles on Internet Explorer problems. The drawback of this system is that the keywords must be learned in advance. Also, the documentation for the keyword searching is not consolidated in one, easy-to-find place.

Another method Microsoft has implemented is to collect user feedback on the search results. They have tried various methods, including placing an input box on each page soliciting feedback about whether the results answered the user's question, and displaying a popup request for a survey to be taken by the user on the quality of the search experience. Currently, they have a simple envelope icon at the top right of each page, which extends the bottom of the page so the user can submit a 1 through 5 rating of the page, or go to another page to send more feedback. (This feature is currently broken under Mozilla/Netscape 7 because Microsoft's web page code is not cross-browser compatible.)

The failing of this method lies in the disconnect between what Microsoft needs to know from the user, a detailed description of their question, and what Microsoft is implicitly asking. By soliciting feedback in this way, Microsoft is asking, "How well does this search result answer your question?". Due to time constraints of the searcher and the nature of this question, the only answer can be a feedback rating like 1-5, which is nearly useless. Microsoft really needs to know, "What is your full, detailed question?", because only then can the answer be correctly given or improved.

Live support

The web hosting company rackspace.com implements an on-line live sales system. The function of a sales team is to provide the customer with accurate answers to his or her questions, and as a byproduct, figure out what the customer is looking for. In order to maintain a 24/7 live support system, a person must man the terminals at all times. As is well known, human time is extremely expensive compared to machine time. It is also inefficient because the sales people may have to wait around when no one is visiting the site, or become inundated when too many people come to the site. It is also a boring job, because the sales people must answer the same questions over and over again. To alleviate this, the sales people may develop pre-generated boilerplate answers, and feed these to the users. However, this negates the point of having a live person with whom to chat. A customer who receives canned responses will feel unimportant to the company.

Combination Systems

E*Trade Financial Group implements a combination help system of live support, telephone support, email support, search engine, indexes, and FAQs. A major expense of the company is maintaining this vast system whose most time-consuming function is simply to answer common user questions.

Implementation

An Example of a Question Filter

An example system, created by the author, is located at the Simcity 3000 page. The system has two user interface components, the main question page, and a no-match result page. The system also has two administrative components, a question history and a keyword mapping script.

Question history. (link out of date)
Keyword mapping script: the code to match keywords to pages, and to save the question. (link out of date)

Note that if the query does not match any keyword, the user is directed to the site map. This is because my site is fairly small and does not have a search engine. On the site map page, the user is also given the opportunity to enter his or her full question. The user has an incentive to do so, because he or she can receive a personalized notice when and if the question is answered. Having the full question helps the expert doing the keyword mapping, because the expert can see how users phrase their keywords in their own language. It is a simple matter to send an email to the user when the question is answered.
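The notification step can be sketched as follows. This is not the code behind the site described above; the class names, the example questions, and the addresses are hypothetical, and the send argument stands in for whatever mail function the site would actually use.

```python
from dataclasses import dataclass, field


@dataclass
class PendingQuestion:
    question: str  # the full question as the user typed it
    email: str     # where to send the notice once it is answered


@dataclass
class QuestionInbox:
    pending: list = field(default_factory=list)

    def submit(self, question, email):
        """Record a full question entered on the no-match page."""
        self.pending.append(PendingQuestion(question, email))

    def mark_answered(self, keyword, page, send):
        """Notify users whose pending question mentions the keyword.

        `send` is any callable taking (address, message). Returns the
        number of users notified.
        """
        still_pending, notified = [], 0
        for item in self.pending:
            if keyword.lower() in item.question.lower():
                send(item.email, f"Your question has been answered: {page}")
                notified += 1
            else:
                still_pending.append(item)
        self.pending = still_pending
        return notified


inbox = QuestionInbox()
inbox.submit("How do I build a subway?", "a@example.com")
inbox.submit("Where are the cheat codes?", "b@example.com")
inbox.mark_answered("subway", "subway.html", send=print)
```

Passing the mail function in as an argument keeps the sketch runnable without a mail server; here print is used in its place.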

Conclusion

A Much Needed Addition

The question filter is another addition to the arsenal of systems that help people find information. It is very simple and elegant, both in implementation and in concept. As the first line of interaction, it will save the user valuable time in obtaining answers to common questions. More arcane questions can easily and naturally fall back to existing systems.


Created 2004-12-16, Last Modified 2018-01-25, © Shailesh N. Humbad
Disclaimer: This content is provided as-is. The information may be incorrect.