| Sales Rank: | 685576 (lower is better) |
| Price Used: | $52.00 |
| Label: | Sybex |
| UPC: | 025211440407 |
| Pages: | 512 |
| Binding: | Paperback |
| Publication Date: | 2002-02 |
| Published By: | Sybex |
| ASIN: | 0782140408 |
| Category: | Book |
Aside from this, Heaton is not a great writer. Attempting to be particularly organized and structured, he comes off as excessively stiff; I stopped counting the number of times he wrote "I will now show how to..."
I purchased this book expecting the process of constructing a spider or bot to draw on a range of specialized skills, but it appears to be quite simple: basic knowledge of Java network programming (i.e. sockets), HTTP, HTML and XML parsing would appear to suffice. I'm sure there is all sorts of complex stuff Heaton does not talk about, but I wish he had!
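To give a sense of how little is involved at the socket level, here is a minimal sketch using only the JDK. It is not code from the book or from Heaton's bot package; the class and method names (`RawHttpSketch`, `buildGetRequest`, `parseStatusCode`, `fetch`) are made up for illustration, and it assumes plain HTTP with no redirects, cookies, or chunked-encoding handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the book's bot package.
class RawHttpSketch {

    // Build a minimal HTTP/1.1 GET request. "Connection: close" asks the
    // server to close the socket after the response, so we can read to EOF.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Extract the numeric status code from a line such as "HTTP/1.1 200 OK".
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Send the request over a plain socket and return the raw response
    // (status line, headers, and body) as one string.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }
}
```

Everything a real bot needs beyond this (cookies, authentication, form posts) is layered on top of the same request and response text.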
At the moment I'm wondering whether this book deserves a space on my finite bookshelf.
As some of the other reviewers point out, this book does center around the creation of a "bot package". However, I see this as one of the book's greatest strengths. The author explains step by step how to take basic concepts, continually build upon them, progressing onward to more complex spiders and bots. Specifically:
1. Create an advanced HTTP object that overcomes many of the shortcomings of the one built into Java by adding cookie support, referrer support, HTTP authentication, and more.
2. Add forms/page processing on top of the HTTP object. You are shown step by step how to process the data you collect from step 1.
3. Create a bot that wields the page/form processing created in step 2.
4. Create a spider that, using steps 1-3, can access pages across an entire site.
5. Expand the spider to support thread pooling and a JDBC database.
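The overall shape of steps 4 and 5 can be sketched roughly as follows. This is my own illustration using only the JDK, not the book's bot package: the names `SpiderSketch`, `extractLinks`, and `crawl` are hypothetical, the link extraction is a naive regular expression rather than a real HTML parser, and the page fetcher is passed in as a function so the crawl logic can be tried without a network. The JDBC storage from step 5 is left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the book's bot package.
class SpiderSketch {

    // Naive href matcher; a real spider would use a proper HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Resolve every href found on a page against that page's own URL.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not syntactically valid URIs.
            }
        }
        return links;
    }

    // Breadth-first crawl restricted to the start URL's host. All pages at
    // one depth are fetched in parallel on a fixed-size thread pool.
    // maxPages caps the number of URLs recorded, including the start URL.
    static Set<String> crawl(String startUrl, Function<String, String> fetcher,
                             int threads, int maxPages) throws Exception {
        String host = URI.create(startUrl).getHost();
        Set<String> visited = new LinkedHashSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(startUrl);
        visited.add(startUrl);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            while (!frontier.isEmpty()) {
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : frontier) {
                    tasks.add(() -> extractLinks(url, fetcher.apply(url)));
                }
                List<String> next = new ArrayList<>();
                // invokeAll blocks until every page at this depth is done.
                for (Future<List<String>> result : pool.invokeAll(tasks)) {
                    for (String link : result.get()) {
                        String linkHost = URI.create(link).getHost();
                        if (host != null && host.equals(linkHost)
                                && visited.size() < maxPages
                                && visited.add(link)) {
                            next.add(link);
                        }
                    }
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
        return visited;
    }
}
```

Passing the fetcher in as a `Function<String, String>` is a design choice for the sketch only; it keeps the HTTP layer from step 1 separate from the crawl loop, which is roughly the layering the list above describes.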
Rather than providing a bunch of disjoint code samples, as many books do, the author guides you step by step through the above path, revealing the techniques at every step. Readers who do not care about the intricate nature of bot programming (sadly, this includes some of my students) can skip to the API documentation and get right on to creating their own bots. You can also download updated versions of the "bot package" from the author's site. I actually did this before buying the book.
The main downside of the book is the example programs' use of GUIs. I would rather every example had been a straight console program; in a book targeting bot programming, the GUI only gets in the way. Also, the author very annoyingly puts an underscore in front of every class instance variable, which gives some of the code something of a C++ look, I suppose.
If you are already programming bots and spiders of your own, I don't think this book will teach you much beyond what you are likely already doing.
But for someone who wants to get started in this exciting area, there is nothing else like it, and I highly recommend it.
When beginning to program with HTTP protocols, it's easy to enter incorrect methods and parameters that lead to dead ends and frustration. As I learn about and use the Heaton API, I am pleasantly surprised by the methods available, by how easily they're implemented, and by the fact that they lead to success.
The source code is included on the CD, with updated versions available at the Heaton website.