Friday, October 30, 2009

Section 16.3. Googling a Site


Google is perhaps the most popular search engine on the Web today, and it has developed some of the most innovative ideas using Ajax. One of the secrets to Google's success has been giving developers access to its solutions through an API (http://code.google.com/apis/ajaxsearch/). With this API, developers can add a search box to a site that ties directly to Google's search engine, which encourages more people to use Google for their searching needs. And the cost for this ability? Nothing.

16.3.1. Google's AJAX Search API

Beyond the raw searching capabilities we saw earlier in this chapter, Google's API provides many ways to conduct Ajax searches. Each of Google's search categories is reached through its own predefined object. Depending on what you want your search to do, you can use a generic search control or one of the specialized Searchers. Table 16-1 lists these Searchers.

Table 16-1. The Searchers available with the Google AJAX Search API
Searcher | Description
GSearch | The GSearch object provides the ability to execute searches and receive results from a specific search service. This is the base class that the service-specific Searchers inherit from.
GwebSearch | The GwebSearch object implements the GSearch interface for the Google Web Search service. It returns a collection of GwebResult objects upon search completion.
GlocalSearch | The GlocalSearch object implements the GSearch interface for the Google Local Search service. It returns a collection of GlocalResult objects upon search completion.
GvideoSearch | The GvideoSearch object implements the GSearch interface for the Google Video Search service. It returns a collection of GvideoResult objects upon search completion.
GblogSearch | The GblogSearch object implements the GSearch interface for the Google Blog Search service. It returns a collection of GblogResult objects upon search completion.
GnewsSearch | The GnewsSearch object implements the GSearch interface for the Google News service. It returns a collection of GnewsResult objects upon search completion.
GbookSearch | The GbookSearch object implements the GSearch interface for the Google Book Search service. It returns a collection of GbookResult objects upon search completion.


In this section, I will concentrate on a few objects: GSearchControl, GSearchForm, and GwebSearch.

16.3.1.1. GSearchControl

The GSearchControl object is a single search control on a page. It acts as a container for a set of Searcher objects that can be manipulated or used on the client. The control is not functional until it has at least one Searcher child, and it is bound to an XHTML container using its draw( ) method.

There are three steps to making this object functional, and they must be completed in this order:

  1. Create a new instance of the GSearchControl object using sc = new GSearchControl( ).

  2. Add one or more Searchers to the object using sc.addSearcher( ).

  3. Draw the control into an XHTML block element using sc.draw(element), which makes it ready for use.

When these steps have been executed, the search control is ready for use.

Searcher objects may not be added to a search control once its draw( ) method has been called.
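A small stand-in object can model this ordering rule. MockSearchControl is a hypothetical class written only for illustration. It is not Google's GSearchControl. It throws errors in the two cases the text describes: drawing with no Searchers, and adding a Searcher after draw( ). The text does not say how the real control reacts in those cases, so the errors are a modeling choice.

```javascript
// Hypothetical stand-in for GSearchControl, used only to illustrate the
// create -> addSearcher -> draw ordering. This is NOT Google's code.
class MockSearchControl {
  constructor() {
    this.searchers = [];
    this.drawn = false;
  }

  // Models the rule that Searchers cannot be added once draw() has run.
  addSearcher(searcher) {
    if (this.drawn) {
      throw new Error('addSearcher() called after draw()');
    }
    this.searchers.push(searcher);
  }

  // Models the rule that the control needs at least one Searcher child.
  draw(container) {
    if (this.searchers.length === 0) {
      throw new Error('draw() needs at least one searcher');
    }
    this.container = container;
    this.drawn = true;
  }
}

// Usage: the three steps, in the required order.
const sc = new MockSearchControl();
sc.addSearcher({ name: 'web' });
sc.draw({ id: 'searchResults' });
```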


Table 16-2 shows the methods that are available to the GSearchControl object.

Table 16-2. Methods available to the GSearchControl object

addSearcher(searcher[, options])
    The method addSearcher( ) adds a Searcher object to the control. The optional options parameter supplies configuration options for the passed Searcher.

cancelSearch( )
    The method cancelSearch( ) tells the search control to ignore all incoming search result completions, and it resets the control's internal state.

clearAllResults( )
    The method clearAllResults( ) clears all of the search results from the search control.

draw(element[, options])
    The method draw( ) activates the control by creating the user interface and a search result container for each configured Searcher. The element must be an XHTML block element. The optional options argument supplies a GdrawOptions object that specifies either linear or tabbed drawing mode.

execute([query])
    The method execute( ) makes the search control start a sequence of parallel searches across all configured Searchers. When the optional query argument is supplied, its value is placed in the search control's input text box and becomes the search expression. Calling this method clears all previous search results.

inlineCurrentStyle(node[, deep])
    The method inlineCurrentStyle( ) is a static utility method. It clones the current computed style of the specified node (or of the whole tree of nodes, when the optional deep argument is set) and inlines that style into the node.

setLinkTarget(target)
    The method setLinkTarget( ) sets the target used for links embedded in the search results. Valid values are:

      • GSearch.LINK_TARGET_BLANK: Links open in a new window. This is the default value for the control.

      • GSearch.LINK_TARGET_SELF: Links open in the same window and frame.

      • GSearch.LINK_TARGET_TOP: Links open in the topmost frame.

      • GSearch.LINK_TARGET_PARENT: Links open in the parent of the current frame, or replace the current frame if it has no parent.

      • Anything else: Links open in the specified frame or window.

setOnKeepCallback(object, method[, keepLabel])
    The method setOnKeepCallback( ) tells the search control that the caller wants to be notified when a user selects a search result to keep (copy). After this method is called, each search result is annotated with a text link underneath it. Clicking that link calls the method and passes it a GResult object for that search result. The object argument defines the context in which the method is called, and the optional keepLabel argument supplies the text of the link. Valid values include:

      • GSearchControl.KEEP_LABEL_SAVE: A label value of "save".

      • GSearchControl.KEEP_LABEL_KEEP: A label value of "keep".

      • GSearchControl.KEEP_LABEL_INCLUDE: A label value of "include".

      • GSearchControl.KEEP_LABEL_COPY: A label value of "copy". This is the default value for the label.

      • GSearchControl.KEEP_LABEL_BLANK: A blank label value is used. This works well when all you want is the copy graphic (obtained using CSS).

      • Any other value: The value passed becomes the label.

setResultSetSize(switchTo)
    The method setResultSetSize( ) selects the number of results returned by each Searcher. The switchTo value is an enumeration that indicates either a small or a large number of results. Valid values for the argument are:

      • GSearch.LARGE_RESULTSET: Request a large number of results (typically eight).

      • GSearch.SMALL_RESULTSET: Request a small number of results (typically four).

setSearchCompleteCallback(object, method)
    The method setSearchCompleteCallback( ) tells the search control that the caller wants to be notified when a search completes. The callback method is called once for each attached Searcher, as that Searcher's search completes. The object argument is an application-level object that defines the context in which the method is called.

setSearchStartingCallback(object, method)
    The method setSearchStartingCallback( ) tells the search control that the caller wants to be notified right before a search begins. The callback method is called once for each attached Searcher, as that Searcher's search starts. The object argument is an application-level object that defines the context in which the method is called.

setTimeoutInterval(timeout)
    The method setTimeoutInterval( ) sets the delay used to start a search from keystrokes. It applies when the application provides its own input control and asks the search control to use it. Valid values are:

      • GSearchControl.TIMEOUT_SHORT: A very short delay (~350 ms).

      • GSearchControl.TIMEOUT_MEDIUM: A medium delay (~500 ms). This is the default value for the control.

      • GSearchControl.TIMEOUT_LONG: A long delay (~700 ms).

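The keystroke timeout behind setTimeoutInterval( ) is an instance of a common technique called debouncing: wait until the user pauses typing, then search once. The following is a generic debounce sketch, not Google's implementation. The names makeKeystrokeSearch and KEYSTROKE_DELAYS_MS are hypothetical, and the delays are the approximate values listed for the TIMEOUT constants.

```javascript
// Approximate delays from the TIMEOUT_* descriptions (~350/~500/~700 ms).
// The object name is hypothetical; the numbers are illustrative only.
const KEYSTROKE_DELAYS_MS = { short: 350, medium: 500, long: 700 };

// Returns a function to call on every keystroke. Only the last keystroke
// in a burst triggers runSearch, after delayMs passes with no new input.
// schedule/cancel default to setTimeout/clearTimeout but can be replaced
// with a fake clock for testing.
function makeKeystrokeSearch(delayMs, runSearch,
                             schedule = setTimeout, cancel = clearTimeout) {
  let pending = null;
  return function onKeystroke(query) {
    if (pending !== null) {
      cancel(pending);            // a newer keystroke supersedes the old timer
    }
    pending = schedule(() => {
      pending = null;
      runSearch(query);
    }, delayMs);
  };
}
```

In a page, the function returned by makeKeystrokeSearch would be called from the input element's keyup handler with the input's current value. A burst of keystrokes then produces a single search.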


Example 16-7 shows how to use this control.

Example 16-7. Using the GSearchControl control


<script type="text/javascript">
//<![CDATA[
/* Example 16-7. Using the GSearchControl control. */

/**
 * This function, body_onload, is called when the page finishes loading. It
 * creates and draws a GSearchControl, adds searchers to it, and executes a
 * search for "The Matrix".
 */
function body_onload( ) {
    /* Create a search control */
    var searchControl = new GSearchControl( );

    /*
     * Create a draw options object so that we can position the search
     * form root
     */
    var options = new GdrawOptions( );
    options.setSearchFormRoot(document.getElementById('searchForm'));

    /* Populate with searchers */
    searchControl.addSearcher(new GwebSearch( ));
    searchControl.addSearcher(new GvideoSearch( ));
    searchControl.addSearcher(new GblogSearch( ));

    /* Draw into the searchResults block element, then run a search */
    searchControl.draw(document.getElementById('searchResults'), options);
    searchControl.execute('The Matrix');
}
GSearch.setOnLoadCallback(body_onload);
//]]>
</script>




Figure 16-4 shows the results of a search using this control.

Figure 16-4. The results of using the GSearchControl control


16.3.1.2. GSearchForm

When applications use Searcher objects on their own, rather than under the control of a GSearchControl object, they often need to capture and process user-generated search requests. The GSearchForm object was designed with this in mind. It provides applications with a text input element, a Search button, an optional Clear button, and the standard Google branding.

The three steps involved in creating a GSearchForm object are:

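  1. Create a new instance of the GSearchForm object using sf = new GSearchForm(true/false, container). The Boolean controls whether the optional Clear button is shown, and container is the element that will hold the form.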

  2. Set an onsubmit callback using sf.setOnSubmitCallback(object, method).

  3. Optionally, set an onclear callback using sf.setOnClearCallback(object, method).

When these steps have been executed, the search form is active and ready to begin receiving and processing input. Table 16-3 shows all of the methods available to the GSearchForm object.

Table 16-3. Methods available to the GSearchForm object
Method | Description
execute([query]) | The method execute( ) submits the search form. When the optional query argument is supplied, its value is placed in the form's input text box and becomes the search expression. When this method is called, all previous search results are cleared.
setOnClearCallback(object, method) | The method setOnClearCallback( ) registers a method to be called when the Clear button is clicked. The object argument supplies an application-level object that defines the context in which the method is called.
setOnSubmitCallback(object, method) | The method setOnSubmitCallback( ) registers a method to be called when the form is submitted with the Search button. The object argument supplies an application-level object that defines the context in which the method is called.


In addition to the methods listed in Table 16-3, GSearchForm also has two public properties: input and userDefinedCell.

The input property is the form's text input element, and the application can both read and write it. The userDefinedCell property is the DOM node of a table cell reserved for application-specific content. An application can use it to place information next to the search form.

Here is a simple example of using this object:


var sf = new GSearchForm(false, document.getElementById('searchForm'));
/* CaptureForm is the application's own submit handler, defined elsewhere */
sf.setOnSubmitCallback(null, CaptureForm);
sf.input.focus( );
sf.execute('The Matrix');
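The snippet above leaves CaptureForm undefined. Here is a minimal sketch of such a handler. It assumes the callback receives the GSearchForm object as its first argument, which this section does not state, so treat that as an assumption. The only property it relies on is the documented input property. The function and its return value are hypothetical.

```javascript
// Hypothetical submit handler for the GSearchForm example. It ASSUMES the
// form object is passed in; it uses only the form's documented .input
// property (the text input element).
function CaptureForm(form) {
  const query = form.input.value.trim();
  if (query === '') {
    return null;            // nothing to search for
  }
  // An application would hand the query to its own Searcher here.
  return query;
}
```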


16.3.1.3. GwebSearch

The GwebSearch object implements the GSearch interface for the Google Web Search service. When a search completes, it returns a collection of GwebResult objects. It has access to all of the methods available to GSearch (see Appendix C), plus the method setSiteRestriction( ). This method takes a site as an argument and restricts the Searcher to results from that site. The site argument can take the following forms:

  • Partial URL (www.amazon.com, google.com, etc.)

  • Custom search engine ID (000455696194071821846:reviews, 000455696194071821846:shopping, etc.)

The setSiteRestriction( ) method also has two optional parameters, refinement and moreResultsTemplate. Both apply only when the site argument refers to a Custom Search Engine. The refinement argument specifies a Custom Search Engine Refinement. The moreResultsTemplate argument specifies a URL template used to construct the "More results" link that appears under a set of search results in the search control.

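The two forms of the site argument can be told apart by shape. The helper below is hypothetical. It assumes a Custom Search Engine ID always looks like the examples above: a run of digits, a colon, and a label. The API itself may accept other shapes.

```javascript
// Hypothetical helper: classify a setSiteRestriction() argument.
// ASSUMPTION: CSE IDs look like "000455696194071821846:reviews".
function describeSiteRestriction(site) {
  if (/^\d+:[A-Za-z0-9_-]+$/.test(site)) {
    const [numericPart, label] = site.split(':');
    return { kind: 'cse', numericPart, label };
  }
  // Anything else is treated as a partial URL such as "www.amazon.com".
  return { kind: 'site', host: site };
}
```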
16.3.2. Using Google's AJAX Search API

I have shown examples of how to create a search object using Google's AJAX Search API, but this does no good if you have no idea what you will be getting back from Google in the search results. Google passes results using Result objects that depend on the Searchers that were added to the search control. Table 16-4 gives a list of the possible Result objects as of this writing. For the most up-to-date information on these and any other objects that are a part of the Google AJAX Search API, check out Google's Class Reference at http://code.google.com/apis/ajaxsearch/documentation/reference.html.

Table 16-4. The Result objects available with the Google AJAX Search API
Result object | Description
GwebResult | The GwebResult object is produced by the GwebSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).
GlocalResult | The GlocalResult object is produced by the GlocalSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).
GvideoResult | The GvideoResult object is produced by the GvideoSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).
GblogResult | The GblogResult object is produced by the GblogSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).
GnewsResult | The GnewsResult object is produced by the GnewsSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).
GbookResult | The GbookResult object is produced by the GbookSearch object when a search is executed and is available in that object's .results[] array. It may also be passed as an argument to a search control's keep callback (set with setOnKeepCallback( )).


All of the Result objects provide the same basic functionality, though their public properties will differ based on their type. For a primer, I will introduce the GwebResult object.

16.3.2.1. GwebResult

All Result objects have two common properties available to them: .GsearchResultClass and .html. The .GsearchResultClass property indicates the type of result that has been returned, which is one of the following:


GwebSearch.RESULT_CLASS

Indicates GwebResult


GlocalSearch.RESULT_CLASS

Indicates GlocalResult


GvideoSearch.RESULT_CLASS

Indicates GvideoResult


GblogSearch.RESULT_CLASS

Indicates GblogResult


GnewsSearch.RESULT_CLASS

Indicates GnewsResult


GbookSearch.RESULT_CLASS

Indicates GbookResult
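Because every Result object carries .GsearchResultClass, a single handler can branch on it. The sketch below maps each class to the name of the Result object it indicates. The RESULT_CLASS values are placeholders, not the library's actual strings. In a real page, the keys would be the library's own constants, such as GwebSearch.RESULT_CLASS. A Map is used so that unexpected strings like "toString" fall through to "unknown".

```javascript
// Placeholder stand-ins for GwebSearch.RESULT_CLASS and friends. The string
// values here are hypothetical; the real ones come from Google's script.
const RESULT_CLASS = {
  web: 'webResult',
  local: 'localResult',
  video: 'videoResult',
  blog: 'blogResult',
  news: 'newsResult',
  book: 'bookResult',
};

// Which Result object each result class indicates (per the list above).
const RESULT_TYPE_BY_CLASS = new Map([
  [RESULT_CLASS.web, 'GwebResult'],
  [RESULT_CLASS.local, 'GlocalResult'],
  [RESULT_CLASS.video, 'GvideoResult'],
  [RESULT_CLASS.blog, 'GblogResult'],
  [RESULT_CLASS.news, 'GnewsResult'],
  [RESULT_CLASS.book, 'GbookResult'],
]);

// Returns the Result object name for a result, or 'unknown'.
function resultTypeOf(result) {
  return RESULT_TYPE_BY_CLASS.get(result.GsearchResultClass) || 'unknown';
}
```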

The .html property supplies the root of an HTML element that may be cloned and attached somewhere into the application's DOM hierarchy. For example:


/* Clone the .html node from the result object */
var node = result.html.cloneNode(true);

/* Attach the node into the document's DOM */
container.appendChild(node);


In addition to the common properties available to all Result objects, GwebResult also has the following:


.cacheUrl

This property supplies a URL to Google's cached version of the page that produced the result. When the property is null, no cached version of the result is available. Do not persist this property, because the cached copy it points to can go stale.


.content

This property supplies a brief snippet of information from the page associated with the search result.


.title

This property supplies the title value of the result.


.titleNoFormatting

This property supplies the title, but unlike .title, it is stripped of any HTML markup (e.g., <b>, <i>, etc.).


.unescapedUrl

This property supplies the raw URL of the result.


.url

This property supplies the escaped version of the URL of the result.
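As a small illustration of these properties, the following hypothetical helper builds a one-line text summary of a GwebResult-like object. It uses .titleNoFormatting so that no HTML markup leaks into plain text. It mentions .cacheUrl only when that property is not null, and it uses the value immediately instead of storing it. The sample objects in the test are made up.

```javascript
// Hypothetical helper: summarize a GwebResult-like object as plain text.
function summarizeWebResult(result) {
  const parts = [result.titleNoFormatting, result.unescapedUrl];
  if (result.cacheUrl) {
    // cacheUrl is null when no cached copy exists; use it right away
    // rather than persisting it, since the cache can go stale.
    parts.push('cached: ' + result.cacheUrl);
  }
  return parts.join(' | ');
}
```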

An example of the GwebResult object in action follows:


<script type="text/javascript">
//<![CDATA[
/**
 * This function, body_onload, is called once the page has loaded. It
 * creates a new search control with a GwebSearch Searcher attached to it.
 */
function body_onload( ) {
    /* Create a new search control */
    var searchControl = new GSearchControl( );

    /* Create a web search restricted to a Custom Search Engine */
    var siteSearch = new GwebSearch( );

    /* Give this searcher a custom label */
    siteSearch.setUserDefinedLabel('Product Reviews');
    siteSearch.setSiteRestriction('000455696194071821846:reviews');
    searchControl.addSearcher(siteSearch);

    /* Define the keep callback */
    searchControl.setOnKeepCallback(null, DummySearchResult);

    /* Draw the control in the searchControl block element */
    searchControl.draw(document.getElementById('searchControl'));
}

/**
 * This function, DummySearchResult, is the keep callback. It is called when
 * the user clicks the keep link under a search result.
 *
 * @param {object} result The Result object for the kept search result.
 */
function DummySearchResult(result) {
    // do something here
}

GSearch.setOnLoadCallback(body_onload);
//]]>
</script>




16.3.3. Displaying Results

Displaying results to the user is very important in the overall scheme of searching. Things to consider are what to show with each result, how many results to display, and where the results should be placed in a page. All of these are formatting concerns in one way or another. The other important part of the result set is how it is being delivered to the client in the first place.

16.3.3.1. The response

The response to any search query is what we are really concerned about. After all, this is what the user asked for, and we must present it in as clear and useful a manner as possible. With Google, as with many of the web services available, the results are returned in an easy-to-manage way for the developer to present and manipulate. For a custom search engine, it is good to have the following key pieces of data on hand when creating the response:

  • The URL of the page the result is for

  • The title of the page the result is for

  • A snippet of content from the page the result is for

  • The last modified date of the page the result is for

The URL of the page really needs to be two different URLs. One is for the user to see, so it should be readable and omit the protocol. The other is for the application to use under the hood. The title is also just for the user to see, as the main "clickable" part of the search result. A snippet of content is not strictly necessary, but it can make it easier for the user to find the most pertinent result. The last modified date is also more of a nicety, since it tells the user whether the result is still current.

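If the server sends only the full URL, the client can derive the user-facing version from it. For example, http://www.oreilly.com/ would be shown as www.oreilly.com/. toVisibleUrl is a hypothetical helper. It relies on the standard WHATWG URL class, which is available in modern browsers and Node.js but not in the browsers of this book's era.

```javascript
// Hypothetical helper: drop the scheme so the URL is friendlier to read,
// keeping host (with any non-default port), path, query, and fragment.
function toVisibleUrl(encodedUrl) {
  const u = new URL(encodedUrl);
  return u.host + u.pathname + u.search + u.hash;
}
```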
This data could easily be passed as XML from the search engine (on the server) to the client:


<?xml version="1.0" encoding="utf-8"?>
<results>
<result>
<title>oreilly.com - Welcome to O'Reilly Media, Inc.</title>
<url>
<visible>www.oreilly.com/</visible>
<encoded>http://www.oreilly.com/</encoded>
</url>
<snippet>
O'Reilly Media spreads the knowledge of innovators through its
books, online services, magazines, and conferences. Since 1978,
O'Reilly has been a ...
</snippet>
<last_mod>2007/02/28</last_mod>
</result>
<result>
.
.
.
</result>
</results>




JavaScript Object Notation (JSON) might be the better choice here. It takes fewer bytes to transmit to the client, and it parses directly into JavaScript objects, which makes the results easier to format. That matters whenever the results do not arrive already formatted, and results from a web service usually do not. The JSON for the previous XML would look like this:


{
    "result": [
        {
            "title": "oreilly.com - Welcome to O'Reilly Media, Inc.",
            "url": [
                "www.oreilly.com/",
                "http://www.oreilly.com/"
            ],
            "snippet": "O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly has been a ...",
            "last_mod": "2007/02/28"
        },
        {
            .
            .
            .
        }
    ]
}

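To show how little client code such a response needs, here is a sketch that parses it and returns display data for each result. parseResults is a hypothetical helper. The field names (result, title, url, last_mod) follow the JSON example above, with url[0] as the visible URL and url[1] as the link target. The elided entries are not part of the test data.

```javascript
// Hypothetical helper: turn the JSON response into display-ready records.
function parseResults(jsonText) {
  const data = JSON.parse(jsonText);
  return data.result.map((r) => ({
    label: `${r.title} (${r.url[0]})`,  // what the user reads
    href: r.url[1],                      // what the link actually uses
    lastModified: r.last_mod,
  }));
}
```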



16.3.3.2. Site formatting

Now comes the last part: what the user sees. With Ajax driving the search, you could use visual effects to draw the user's attention to the results when they are returned. For example, using the Prototype and script.aculo.us libraries:


/**
 * This is our dummy function from before, reworked as a results handler. It
 * assumes its argument is an object (such as a Searcher) whose .results
 * array holds the Result objects. $( ), hide( ), and update( ) come from
 * Prototype; Effect.Appear and Effect.Highlight come from script.aculo.us.
 */
function DummySearchResult(result) {
    /* Always clear out the old results first, then hide the container */
    $('myResults').update('');
    $('myResults').hide( );
    /* Did the function get results back? */
    if (result.results && result.results.length > 0) {
        /* Loop through the results and format them */
        for (var i = 0; i < result.results.length; i++) {
            /* For example, attach a copy of each result's .html node */
            $('myResults').appendChild(result.results[i].html.cloneNode(true));
        }
        /* Make the results appear and make the user aware of them */
        Effect.Appear('myResults', { duration: 3.0 });
        Effect.Highlight('myResults');
    }
}




Beyond giving the results a jolt of Web 2.0, you also need to style them with CSS. Google's AJAX Search API provides CSS classes for each Result object with the developer in mind. Each Result object is sent with an .html property containing markup that follows a fixed template for its type, and that markup carries these classes. The GwebResult CSS structure, according to Google's API Class Reference, looks like the following:


<div class="gs-result gs-webResult">

<!-- Note, a.gs-title can have embedded HTML
// so make sure to account for this in your rules.
// For instance, to change the title color to red,
// use a rule like this:
// a.gs-title, a.gs-title * { color : red; }
-->
<div class="gs-title">
<a class="gs-title"></a>
</div>
<div class="gs-snippet"></div>

<!-- The default CSS rule has the -short URL visible and
// the -long URL hidden.
//
// If you want to reverse this, use a rule like:
// #mycontrol .gs-webResult .gs-visibleUrl-short { display:none; }
// #mycontrol .gs-webResult .gs-visibleUrl-long { display:block; }
-->
<div class="gs-visibleUrl gs-visibleUrl-short"></div>
<div class="gs-visibleUrl gs-visibleUrl-long"></div>
</div>
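An application that renders its own results can reuse the same gs-* class names so that one set of CSS rules styles both. The sketch below builds markup with this structure from an application-defined object. It is hypothetical: its input shape ({ title, href, snippet, shortUrl, longUrl }) is not a GwebResult, and it escapes every value before inserting it into HTML.

```javascript
// Escape text before placing it inside HTML content or a quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical: build markup mirroring the GwebResult CSS structure above.
function webResultMarkup(item) {
  return [
    '<div class="gs-result gs-webResult">',
    '  <div class="gs-title">',
    '    <a class="gs-title" href="' + escapeHtml(item.href) + '">' +
      escapeHtml(item.title) + '</a>',
    '  </div>',
    '  <div class="gs-snippet">' + escapeHtml(item.snippet) + '</div>',
    '  <div class="gs-visibleUrl gs-visibleUrl-short">' + escapeHtml(item.shortUrl) + '</div>',
    '  <div class="gs-visibleUrl gs-visibleUrl-long">' + escapeHtml(item.longUrl) + '</div>',
    '</div>',
  ].join('\n');
}
```

Because the class names match, rules such as a.gs-title, a.gs-title * { color : red; } apply to this markup too.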


Table 16-5 lists the available Result styling structures.

Table 16-5. The Result styling structures available with the Google AJAX Search API
Result styling | Description
GwebResult CSS structure | The GwebResult CSS structure is used to format the results from a GwebResult object.
GlocalResult CSS structure | The GlocalResult CSS structure is used to format the results from a GlocalResult object.
GvideoResult CSS structure | The GvideoResult CSS structure is used to format the results from a GvideoResult object.
GblogResult CSS structure | The GblogResult CSS structure is used to format the results from a GblogResult object.
GnewsResult CSS structure | The GnewsResult CSS structure is used to format the results from a GnewsResult object.
GbookResult CSS structure | The GbookResult CSS structure is used to format the results from a GbookResult object.


Every API has its own way of styling result content, as well as its own methods for developer interaction and manipulation. Search results do not have to be flashy, but they should be functional. Google's AJAX Search API supports this kind of searching, as do the APIs of other search engines; refer to Appendix C for information on them. Searching should be helpful, intuitive, and fast; otherwise, it gets in the user's way. Adding Ajax to search functionality, whether through a web service or a search engine of your own, should make searching faster. The rest is up to you.







