5.6 Accessing External Resources
This section discusses a Web application, shown in Figure 5-3, that illustrates how you can use JSTL URL actions to access external resources by scraping book information from Amazon.com. The application consists of two JSP pages that use the <c:import>, <c:url>, and <c:param> actions.
The top picture in Figure 5-3 shows the JSP page that serves as the application's welcome page. That page contains four links, each an HTML anchor element whose URL is created by a <c:url> action with a nested <c:param> action. The rest of the pictures in Figure 5-3 show information for each of the books listed on the welcome page. That information is scraped from Amazon.com with a combination of <c:import> actions and the <str:nestedString> action from the Jakarta String Tag Library.
The JSP page shown in the top picture in Figure 5-3 is listed in Listing 5.4.
The JSP page in Listing 5.4 uses four <c:url> actions with nested <c:param> actions to create four URLs that all point to show_book.jsp. That JSP page is specified with the <c:url> action's value attribute as a page-relative URL. Each of the four URLs also has a request parameter named bookUrl whose value is an external URL that points to the respective book's page on Amazon.com. Each <c:url> action stores its processed URL in a page-scoped variable whose name corresponds to the book it represents. Subsequently, four HTML anchor elements reference the values stored in those scoped variables. When a user clicks on one of those anchors, control is transferred to show_book.jsp, which is listed in Listing 5.5.
Listing 5.4 Creating the Book URLs
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Book Selection</title>
</head>
<body>
<%@ taglib uri='http://java.sun.com/jstl/core' prefix='c' %>
<font size='5'>
Select a book:
</font><p>
<%-- Create URLs for each book with a page-relative path
to show_book.jsp and a request parameter named bookUrl
whose value represents the book's URL on Amazon.com.
Those URLs are stored in page-scoped variables that
are used to create HTML links below --%>
<c:url var='theFutureOfSpacetime' value='show_book.jsp'>
<c:param name='bookUrl' value='http://www.amazon.com/exec/obidos/ASIN/0393020223/ref=pd_sim_books_1/102-5303437-2118551'/>
</c:url>
<c:url var='whatEvolutionIs' value='show_book.jsp'>
<c:param name='bookUrl' value='http://www.amazon.com/exec/obidos/ASIN/0465044255/ref=pd_sim_books_4/102-5303437-2118551'/>
</c:url>
<c:url var='goneForGood' value='show_book.jsp'>
<c:param name='bookUrl' value='http://www.amazon.com/exec/obidos/ASIN/038533558X/ref=pd_sim_books_3/102-5303437-2118551'/>
</c:url>
<c:url var='tellNoOne' value='show_book.jsp'>
<c:param name='bookUrl' value='http://www.amazon.com/exec/obidos/ASIN/0440236703/qid=1023935482/sr=8-1/ref=sr_8_1/104-6556245-7867920'/>
</c:url>
<%-- Create HTML links for each book using the URLs stored
in page-scoped variables that were created above by
<c:url> --%>
<a href='<c:out value="${theFutureOfSpacetime}"/>'>
The Future of Spacetime
</a><p>
<a href='<c:out value="${whatEvolutionIs}"/>'>
What Evolution Is
</a><p>
<a href='<c:out value="${goneForGood}"/>'>
Gone for Good
</a><p>
<a href='<c:out value="${tellNoOne}"/>'>
Tell No One
</a><p>
</body>
</html>
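To make the effect of <c:url> with a nested <c:param> more concrete, the following Java sketch builds a comparable URL by hand. It is an illustration only, not the JSTL implementation: the class BookUrlSketch and the helper withParam are hypothetical names, and the sketch leaves out the session-tracking URL rewriting that <c:url> applies when it is needed. What it does show is that the parameter value, itself a complete URL, must be URL-encoded before it is appended to the path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BookUrlSketch {
    // Hypothetical helper: appends one URL-encoded request parameter to a
    // path, roughly what <c:url> with a nested <c:param> produces.
    // Assumption: session-based URL rewriting is deliberately left out.
    static String withParam(String path, String name, String value) {
        // Use '?' for the first parameter and '&' for any later one.
        String separator = path.contains("?") ? "&" : "?";
        return path + separator
                + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String amazonUrl = "http://www.amazon.com/exec/obidos/ASIN/0393020223/"
                + "ref=pd_sim_books_1/102-5303437-2118551";
        // Prints show_book.jsp followed by the encoded bookUrl parameter.
        System.out.println(withParam("show_book.jsp", "bookUrl", amazonUrl));
    }
}
```

Because characters such as ':', '/', and '=' in the Amazon.com URL are encoded, the whole address travels as a single request parameter, and show_book.jsp can read it back with ${param.bookUrl}.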
Listing 5.5 show_book.jsp
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Book Information</title>
</head>
<body>
<%@ taglib uri='http://java.sun.com/jstl/core' prefix='c' %>
<%-- Declare the Jakarta Strings tag library --%>
<%@ taglib uri='WEB-INF/string.tld' prefix='str'%>
<%-- Import the page from Amazon.com using the bookUrl
request parameter --%>
<c:import var='book' url='${param.bookUrl}'/>
<%-- Store today's date and time in page scope --%>
<jsp:useBean id='now' class='java.util.Date'/>
<table>
<tr>
<td>Book:</td>
<td><i>
<%-- Show the book title --%>
<str:nestedString open='buying info: '
close='</title>'>
<c:out value='${book}'/>
</str:nestedString>
</i></td>
</tr>
<tr>
<td>Rank:</td>
<td><i>
<%-- Show the book rank --%>
<str:nestedString open='Sales Rank: </b> '
close='</span>'>
<c:out value='${book}'/>
</str:nestedString>
</i></td>
</tr>
<tr>
<td>Average Review:</td>
<td><i>
<%-- Show the average review --%>
<str:replace replace='-' with='.'>
<str:nestedString open='stars-' close='.gif'>
<c:out value='${book}'/>
</str:nestedString> stars
</str:replace>
</i></td>
</tr>
<tr>
<td>Date and Time:</td>
<td><i>
<c:out value='${now}'/>
</i></td>
</tr>
</table>
</body>
</html>
The preceding JSP page uses <c:import> to import content from Amazon.com with the URL specified by the bookUrl request parameter. The var attribute is specified for the <c:import> action so that the imported content is stored in a string that is referenced by a page-scoped variable named book. Subsequently, the page uses <jsp:useBean> to create a date representing the current date and time. Finally, the page uses the <str:nestedString> action from the Jakarta String Tag Library (which extracts a substring delimited by the strings that precede and follow it) to extract the book's title, sales rank, and average review from the string stored in the book page-scoped variable. For the average review, <str:replace> also converts the hyphen in the extracted text to a decimal point. The page also displays the current date and time with the scoped variable created by the <jsp:useBean> action near the top of the page.
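The extraction step described above can be sketched in plain Java. This is a rough equivalent for illustration, not the String Tag Library's code: the class NestedStringSketch, the helper between, and the sample page fragment are all made up, and returning an empty string when a marker is missing is a choice made for this sketch rather than documented library behavior.

```java
public class NestedStringSketch {
    // Hypothetical helper: returns the text between the first occurrence of
    // open and the next occurrence of close. Assumption: an empty string is
    // returned when either marker is missing.
    static String between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return "";
        }
        start += open.length();
        int end = text.indexOf(close, start);
        if (end < 0) {
            return "";
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for the page stored in the book variable.
        String book = "<title>Amazon.com: buying info: Tell No One</title>"
                + "<img src='stars-4-5.gif'>";

        // Same open and close markers as the title row in Listing 5.5.
        String title = between(book, "buying info: ", "</title>");

        // Same markers as the review row, followed by the '-' to '.' replacement.
        String stars = between(book, "stars-", ".gif").replace('-', '.');

        System.out.println(title);
        System.out.println(stars + " stars");
    }
}
```

The sketch makes the dependency on the page's markup explicit: if either marker string no longer appears in the imported HTML, the lookup finds nothing to extract.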
Disclaimer:
Scraping information from webpages is an inherently risky business, because it relies on the absolute position of static text in a webpage's HTML; if the HTML is modified, you may have to change the code that scrapes information. As this book went to press, the example discussed in this section worked as advertised, but if Amazon.com modifies its webpage format, that example may break.