Amazon.com Customer Reviews
good but not great - Review written on June 12, 2008
Rating: 3 out of 5
4 customers found this review helpful, 1 did not.
Most people have shared their thoughts on the good parts of this book. I'd like to point out some of the bad I noticed as I read through:
- First, too many typos. Both the author and O'Reilly should do a better job of proofreading the material. There are so many typos that they can easily wreck otherwise good material.
- Second, arcane solutions and coding style. Often the first step toward solving a machine learning problem is to represent the problem at hand well. The author's brain is apparently wired differently from mine, so this opinion is personal. For example, in chapter 5 on "optimization for preference", he chose to represent a solution in vector form, like [0,0,0,0,0,0,0,0,0,0]. There is no way I can relate this solution to its real meaning (you want to allocate 10 students into 5 rooms, each with two slots). If there is an easy explanation, the book didn't give it.
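For readers with the same question, here is one plausible reading of such a vector. It is a minimal sketch, assuming each entry is an index into the list of slots that are still free when that student is placed. The function name decode_assignment and the room names are hypothetical, and the book's own code may differ.

```python
def decode_assignment(vec, rooms, slots_per_room=2):
    """Turn a solution vector into (student_index, room) pairs.

    Assumption: each entry picks an index into the list of slots that are
    still free, so the same number can mean a different room at each step.
    """
    # One list entry per free slot, e.g. ["A", "A", "B", "B", ...]
    free_slots = [room for room in rooms for _ in range(slots_per_room)]
    assignment = []
    for student, choice in enumerate(vec):
        # Taking a slot removes it, so later students choose among fewer slots
        room = free_slots.pop(choice)
        assignment.append((student, room))
    return assignment


rooms = ["A", "B", "C", "D", "E"]  # hypothetical room names
pairs = decode_assignment([0] * 10, rooms)
```

Under this reading, an all-zero vector simply gives every student the first slot still open, so students 0 and 1 share the first room, students 2 and 3 the second, and so on. Every vector whose i-th entry is smaller than the number of remaining slots decodes to a valid allocation, which is convenient for an optimizer.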
Thus the 3 stars. I believe a second edition is warranted and should be much better.
Just my 2c.
Great, simple presentation of some powerful techniques - Review written on June 10, 2008
Rating: 5 out of 5
5 customers found this review helpful.
Programming Collective Intelligence is a book about applying data mining techniques to analyse collections of data. There is submerged information in eBay prices, in Facebook profile networks, in collections of movie reviews, in news sites, in the stock market; this book by Toby Segaran shows ways to extract, visualise, understand, and predict that information.
Each chapter explains and explores a different data mining algorithm, and builds up a working example in Python, while presenting different methods and parameters of the implementation. I hadn't really worked with Python before, but found the code easy to follow, and picked up some interesting Python idioms that I haven't seen in other languages before. Chapters end with a set of exercises to follow that build your understanding.
As you follow the examples you build up a reasonably generic code base that allows you to swap in and out different implementations, and reuse previous code to add to new applications.
The examples use live data from the web: sites like eBay, Facebook, and Yahoo Finance, and this makes the book more interesting and the results more visceral than some other books on the subject which use more contrived or obscure examples. Even though there is a strong web (or web 2.0) focus on the examples, the methods and the understanding are useful for a whole range of applications.
Some of the topics covered:
* Bayesian classifiers to detect spam, or to file news articles into site sections
* Hierarchical and k-means clustering to discover groups of similar items in massive sets
* Euclidean distance, Pearson Correlation Coefficient, Tanimoto Coefficient: ways to measure the distance (or difference) between items
* Neural networks to predict user behaviour and improve search result ordering
* Optimisation methods like hill climbing, simulated annealing, and genetic algorithms
* Non-negative matrix factorization
* Support vector machines and kernel methods to go where linear regression can't
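To give a flavour of the distance measures in that list, here is a minimal sketch written from their textbook definitions. The function names are my own and this is not the book's code.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation: 1.0 means perfectly linearly related, 0.0 unrelated."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = sqrt(var_a * var_b)
    # Return 0.0 when one input is constant, to avoid dividing by zero
    return cov / denom if denom else 0.0


def tanimoto_coefficient(a, b):
    """Set overlap: size of the intersection divided by size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

Euclidean distance grows as items differ, while the other two grow as items become more alike, so a program that mixes them usually converts one kind into the other first.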
I found it exciting to read -- it's one of those books that give you a whole bunch of new ideas for things to build as you read it. The presentation is very good: no background is assumed, and it doesn't talk down to those more experienced.
Recommended.
Nice introduction to exciting topics but lacks depth - Review written on May 05, 2008
Rating: 2 out of 5
9 customers found this review helpful, 2 did not.
I think this is a good, easy-to-read intro to several interesting data-centric software technologies, but it is superficial.
For example, their collaborative filtering (ratings + recommendations) section illustrates only the simplest of algorithms and completely skips over more advanced techniques (improved normalization, matrix factorization, and others). It skips over even basic benchmarking of the rec system (IMO, if you aren't doing objective benchmarks and tuning it off of those metrics, your rec system is useless), and it doesn't address any of the common pitfalls and problems (sparsity, overfitting, normalization problems, scalability issues).
I guess that is expected. If you want a book that's easy to read that can get you excited about some cool areas in software development, this book is great. If you want information beyond the introductory casual reading level, look elsewhere.
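As an illustration of what a basic objective benchmark could look like, here is a minimal sketch that scores a predictor by root-mean-square error (RMSE) on ratings held out from training. The data, the names, and the global-mean baseline are all made up for the example; none of it comes from the book.

```python
from math import sqrt


def rmse(predict, held_out):
    """Root-mean-square error of predict(user, item) over held-out ratings."""
    squared_errors = [
        (predict(user, item) - rating) ** 2 for user, item, rating in held_out
    ]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (user, item, rating) triples, split into training and held-out sets
train = [("ann", "book1", 4.0), ("bob", "book1", 2.0), ("ann", "book2", 3.0)]
held_out = [("bob", "book2", 4.0), ("cat", "book1", 2.0)]

# Simplest possible baseline: always predict the mean training rating
global_mean = sum(rating for _, _, rating in train) / len(train)


def baseline(user, item):
    return global_mean


baseline_error = rmse(baseline, held_out)
```

Any real recommender would be passed to rmse in place of baseline, and it is only worth keeping if its error on the held-out ratings is lower.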
A fantastic book full of ideas & examples for anyone developing against websites with large user bases - Review written on April 28, 2008
Rating: 5 out of 5
6 customers found this review helpful.
As a long time O'Reilly reader & fan, I have to say this is the best O'Reilly book I've read in the past several years, and is now among my favorite programming books in general. This is really an applied Artificial Intelligence book in disguise, as it covers most of the core topics found amongst the top AI textbooks. I've recently read a few of the standard AI books, such as Norvig, Duda & Hart, which are thorough, but in a bad way, because they miss the forest for the trees. Your average working software developer is not going to be able to use these textbooks to create any code without investing a lot of time, or stopping along the way to get a PhD.
And this is precisely where this book shines, unlike similar books out there--Toby Segaran has managed to explain the core AI algorithms in plain language, with very readable code examples that implement a fully working example to get you started. Reading this book made me realize most of the AI that I've studied is not hard in itself, but rather the standard way AI algorithms are presented in textbooks is just terrible and obfuscated.
For example, Toby describes a fully working backpropagation neural network, with code(!) in about 9 pages. I've never seen a NN presentation better than this. There were several chapters where I couldn't help laughing at how conceptually easy a given algorithm ends up being if only you stop and explain it as simply as possible, and throw out most of the mathematical notation. That sounds obvious, but for some reason few authors think brevity helps get the point across, especially when dealing with a mathematical topic. So kudos to Toby for this, which is a major accomplishment in itself, as it's going to really help the book appeal to a much wider audience.
I also thought it was a great idea to connect every topic in the book to large data sets which anyone can get off the web. This led me to think of many other kinds of datasets to try this code on, so it's not the kind of book that you read and put away; rather, you keep tweaking the example code (available on the book's website), adding to it and experimenting.
In all, a great book, highly recommended!
Highly Recommend - Review written on January 31, 2008
Rating: 5 out of 5
3 customers found this review helpful.
I've always been interested in algorithm development, and was curious to see these creative techniques applied to data mining technologies. The author does a great job presenting complex material in a format that encourages hands-on experimentation in addition to providing an introductory understanding of the subject. The book is divided into chapters, each of which focuses on a specific problem, and the author walks you through techniques to solve it, from high level theory to concrete examples, often using data retrieved from online sources. The code samples were easy to follow (even without knowing Python), and numerous instructions and links were provided for libraries, data sources, etc. to assist interested readers with creating their own programs or researching topics further. He even includes a summary chapter at the end, reviewing the highlights of each algorithm, along with pros and cons of the method. If you're interested in the subject, this is a good book for your shelf.
Putting Theory into Practice - Review written on December 18, 2007
Rating: 4 out of 5
72 customers found this review helpful, 3 did not.
This book is probably best for those of you who have read the theory, but are not quite sure how to turn that theory into something useful. Or for those who simply hunger for a survey of how machine learning can be applied to the web, and need a non-mathematical introduction.
My area of strength happens to be neural networks (my MS thesis topic was in the subject), so I will focus on that. In a few pages of the book, the author describes how the most popular of all neural networks, backpropagation, can be used to map a set of search terms to a URL. One might do this, for example, to try and find the page best matching the search terms. Rather than doing what nearly all other authors do, which is to prove the math behind the backprop training algorithm, he mentions what it does and goes on to present Python code that implements the stated goal.
The upside of the approach is clear -- if you know the theory of neural networks, and are not sure how to apply it (or want to see an example of how it can be applied), then this book is great for that. His example of adaptively training a backprop net using only a subset of the nodes in the network was interesting, and I learned from it. Given all the reading I have done over the years on the subject, that was a bit of a surprise for me.
However, don't take this book as being the "end all, be all" for understanding neural networks and their applications. If you need that, you will want to augment this book with writings that cover some of the other network architectures (SOM, Hopfield, etc.) that are out there. The same goes for the other topics that it covers.
In the end, this book is a great introduction to what is available for those new to machine learning, and shows better than any other book how it applies to Web 2.0. Major strengths of this book are its broad coverage, and the practicality of its contents. It is a great book for those who are struggling with the theory, and/or those who need to see an example of how the theory can be applied in a concise, practical way.
To the author: I expect this book will get a second edition, as the premise behind the book is such a good one. If that happens, perhaps beef up the equations a bit in the appendix, and cite some references or a bibliography for those readers interested in some more in-depth reading about the theory behind all these wonderful techniques. (The lack of a bibliography is why I gave it 4 stars out of 5; I really think that those who are new to the subject would benefit greatly from knowing what sits on your bookshelf.)
Understanding the logic behind sites like Amazon and Google... - Review written on October 20, 2007
Rating: 5 out of 5
31 customers found this review helpful, 1 did not.
Have you ever wondered how some of those "collective intelligence" sites work? How Amazon can suggest books that you'll like based on your browsing history? How a search engine can rank and filter results? Toby Segaran does a very good job in revealing and teaching those types of algorithms in his book Programming Collective Intelligence: Building Smart Web 2.0 Applications. While I'm not ready to run out and build my own version of Facebook now, at least I can start to understand how sites like that are designed.
Contents:
Introduction to Collective Intelligence; Making Recommendations; Discovering Groups; Searching and Ranking; Optimization; Document Filtering; Modeling with Decision Trees; Building Price Models; Advanced Classification - Kernel Methods and SVMs; Finding Independent Features; Evolving Intelligence; Algorithm Summary; Third-Party Libraries; Mathematical Formulas; Index
In each of the chapters, Segaran takes a type of capability, be it decision-making or filtering, and shows how a programming language can be used to build that feature. His examples are all in Python, so it helps if you are already familiar with that language if you want to actually work with the code. But even if you don't know Python, the examples are clear and detailed enough that you can follow along and get the gist of what's happening. I personally think that it would help immensely if you had a background in mathematics and statistics. You can use the code here without having a detailed understanding of math, but I'm sure much of this would be more deeply appreciated if you already know about such things as Tanimoto similarity scores, Euclidean distances, or Pearson coefficients.
From my perspective (a non-Python programmer *without* the math background), I was more interested in understanding the overall picture about things like how ranking systems work or how recommendation engines are structured. While there was more detail than I needed (or understood), I still felt as if I accomplished my goal. I have a much greater appreciation for what companies like Google and Amazon have done in building web applications that allow the knowledge and wisdom of groups to be gathered and applied to my own preferences.
Statistical programmers will probably find years of entertainment here. :) "Normal" programmers will expand their horizons, too.
The most accessible book on machine learning I've found - Review written on September 05, 2007
Rating: 5 out of 5
17 customers found this review helpful.
I first learned of this book just a few weeks ago, shortly before it was available. I immediately read the sample chapter on the publisher's website and was certain I had to get a hold of a copy.
I was not in the least bit disappointed with what I found. It has been quite a while since I've looked at any Python code (I'm more of a Ruby fan, personally), but the code is easy to follow and it's a simple matter to extract the basic concepts into any language.
I have spent quite a few years now watching the field of machine intelligence from the sidelines, occasionally reading the odd technical write-up or Wikipedia article, trying to wrap my brain around the basic ideas. The thing is, it's now clear to me that in some regards, it's not that complex. It's just that most of the existing books and articles are written for those immersed in the field. This book is not like that. It explains things in clear language that is easy to follow, using simplified examples and making excellent use of graphics to "show" you how it works.
If you really want to dig in deep, Segaran provides exercises at the end of each chapter and gives you an appendix full of mathematical formulas (the "pure" representation of the algorithms).
Finally, I should mention that the last chapter does what so many other technical books should but don't: it clearly summarizes everything he has shown you. He does this in a straightforward way so that you won't have to go searching through the book, rereading everything again, to put these techniques into practice.
A "hands-on" approach to an otherwise abstract topic - Review written on August 16, 2007
Rating: 5 out of 5
17 customers found this review helpful, 2 did not.
"Programming Collective Intelligence" is a great book. I took a college course on data mining and this book really would have come in handy.
From a "hands-on" programming perspective, the information on the useful libraries in Python for crawling, parsing RSS feeds, drawing, and accessing popular RESTful APIs is really valuable. The code samples are well documented and rather timely. I think Toby has done an amazingly cogent job of demonstrating the nuts and bolts of implementing the plethora of data mining and AI-related concepts pertinent to the field of Collective Intelligence. Additionally, I was new to Python and this book was a real eye opener.
In fact, more than just a book on Collective Intelligence, this is a really useful Python book. I learned a lot about Python reading through the examples and trying to get them to work on my laptop. (I was new to Python before this book, but have since started using Python at my work).
The author has demystified the abstract idea of Collective Intelligence and presented the concepts in an excellent programming language choice in Python. Most of the topics covered are things most developers just hear about. Taking a college course on Data Mining or Artificial Intelligence may expose one to the ideas, but I have never encountered a book that introduced the topics covered in "Programming Collective Intelligence" in a way so intuitive and familiar to the programmer. Distilling all of the topics into a set of very useful Python scripts really illustrated how practical and available these concepts really are in one's daily work. I will definitely make use of Toby's book.