Amazon.com Customer Reviews
A sound conceptual approach - Review written on April 25, 2008
Rating: 4 out of 5
5 customers found this review helpful, 2 did not.
Usually considered to be a branch of artificial intelligence, especially at the present time, pattern recognition is defined in this book as the automatic discovery of regularities in data by the use of computer algorithms and the use of these regularities for classifying the data in different categories. The first part of this definition is typically referred to as `unsupervised learning' and the latter `supervised learning.' Both of these areas have resulted in a gargantuan amount of research due to their importance in areas such as medicine, genomics, network modeling, financial engineering, and voice recognition. This book emphasizes a "conceptual" approach to teaching pattern recognition, and therefore is highly valuable to those who need to learn the subject. Too often this field is taught purely from the formal standpoint, or conversely by the use of many trivial examples that illustrate the algorithms that are used. These approaches make the subject appear to be either a highly-developed mathematical one (which it is) or a cookbook that does not have a sound foundation. This book is one of the few that will allow the reader to gain a more in-depth understanding and appreciation of the subject as preparation for doing research and development in pattern recognition. The author claims that the book is self-contained as far as background in probability theory is concerned, but readers should still be prepared with this background in order to better appreciate the content. The Bayesian paradigm dominates the book, as it should given the current emphasis in research circles.
Some of the highlights of the book include discussions on:
* Relative entropy and mutual information. These two concepts have become very important in recent years, especially in the validation of pattern recognition models, the selection of relevant variables, and in independent component analysis.
* Periodic variables and how they can be used in contexts where Gaussian distributions are problematic.
* Markov chain Monte Carlo sampling, especially the role of the detailed balance condition in obtaining the acceptance probability for the Metropolis-Hastings algorithm.
* Bayesian linear regression and its ability to deal with the over-fitting problem in calculations of maximum likelihood and the determination of model complexity.
* Kernel learning (usually called support vector machines in other books).
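To make the Markov chain Monte Carlo highlight above concrete, here is a minimal sketch of how detailed balance pins down the Metropolis-Hastings acceptance probability. It is not taken from the book: the three-state target `p`, the asymmetric proposal `q`, and the function names are all assumptions made up for illustration. The acceptance probability min(1, p_j q_ji / (p_i q_ij)) makes p_i T_ij equal to min(p_i q_ij, p_j q_ji), which is symmetric in i and j, so detailed balance holds and `p` is a stationary distribution of the chain.

```python
from fractions import Fraction

# Hypothetical target distribution over three states (illustrative values only).
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]

# Hypothetical asymmetric proposal: q[i][j] is the probability of proposing j from i.
q = [
    [Fraction(0), Fraction(3, 4), Fraction(1, 4)],
    [Fraction(1, 2), Fraction(0), Fraction(1, 2)],
    [Fraction(1, 3), Fraction(2, 3), Fraction(0)],
]


def acceptance(i, j):
    """Metropolis-Hastings acceptance probability for a proposed move i -> j."""
    return min(Fraction(1), (p[j] * q[j][i]) / (p[i] * q[i][j]))


def transition_matrix():
    """Build the full transition matrix T of the resulting Markov chain."""
    n = len(p)
    T = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                T[i][j] = q[i][j] * acceptance(i, j)
        # Rejected proposals leave the chain where it is.
        T[i][i] = 1 - sum(T[i])
    return T


def satisfies_detailed_balance(T):
    """Check p_i * T_ij == p_j * T_ji for every pair of states."""
    n = len(p)
    return all(p[i] * T[i][j] == p[j] * T[j][i] for i in range(n) for j in range(n))


def is_stationary(T):
    """Check that one step of the chain leaves p unchanged."""
    n = len(p)
    return all(sum(p[i] * T[i][j] for i in range(n)) == p[j] for j in range(n))


if __name__ == "__main__":
    T = transition_matrix()
    print(satisfies_detailed_balance(T), is_stationary(T))
```

Exact fractions are used instead of floats so that the equalities can be checked without a tolerance. A real sampler would of course draw proposals at random rather than build the matrix explicitly.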
Some of the minuses of the book include:
* Needs more in-depth discussion of Bayesian neural networks, over and above what is done in the book. The author does devote a section in the book to this topic, but given its enormous importance, especially in automated learning and economic forecasting, more examples need to be included.
* More real-world test cases need to be included, along with a comparison of the efficacies of different approaches, so as to illustrate the "no free lunch" philosophy.
* More exercises that require more analysis on the part of the reader, instead of derivation-type problems or straightforward numerical exercises.
* Needs more details on independent component analysis. Only a few paragraphs are devoted to this important topic.
Very good text, but with some flaws - Review written on February 22, 2008
Rating: 4 out of 5
9 customers found this review helpful.
First of all, as some other reviewers have pointed out, the subtitle of the book should include the word 'Bayesian' in some form or the other. This is important because the Bayesian approach, although an important one, is not adopted across the board in machine learning, and consequently, an astonishing number of methods presented in the book (Bayesian versions of just about anything) are not mainstream. The recent Duda book gives a better idea of the mainstream in this sense, but because the field has evolved with such rapidity, it excludes massive recent developments in kernel methods and graphical models, which Bishop includes.
Pedagogically, however, this book is almost uniformly excellent. I didn't like the presentation on some of the material (the first few sections on linear classification are relatively poor), but in general, Bishop does an amazing job. If you want to learn the mathematical base of most machine learning methods in a practical and reasonably rigorous way, this book is for you. Pay attention in particular to the exercises, which are the best I've seen so far in such a text; involved, but not frustrating, and always aiming to further elucidate the concepts. If you want to really learn the material presented, you should, at the very least, solve all the exercises that appear in the sections of the text (about half of the total). I've gone through almost the entire text, and done just that, so I can say that it's not as daunting as it looks. To judge your level regarding this, solve the exercises for the first two chapters (the second, a sort of crash course on probability, is quite formidable). If you can do these, you should be fine. The author has solutions for a lot of them on his website, so you can go there and check if you get stuck on some.
As far as the Bayesian methods are concerned, they are usually a lot more mathematically involved than their counterparts, so solving the equations representing them can only give you more practice. Seeing the same material in a different light can never hurt you, and I learned some important statistical/mathematical concepts from the book that I'd never heard of, such as the Laplace and Evidence Approximations. Of course, if you're not interested, you can simply skip the method altogether.
From the preceding, it should be clear that the book is written with a certain kind of reader in mind. It is not for people who want a quick introduction to some method without the gory details behind its mathematical machinery. There is no pseudocode. The book assumes that once you get the math, the algorithm to implement the method should either become completely clear, or in the case of some more complicated methods (SVMs for example), you know where to head for details on an implementation. Therefore, the people who will benefit most from the book are those who will either be doing research in this area, or will be implementing the methods in detail in lower-level languages (such as C). I know that sounds off-putting, but the good thing is that the level of the math required to understand the methods is quite low: basic probability, linear algebra and multivariable calculus. (Read the appendices in detail as well.) No knowledge is needed, for example, of measure-theoretic probability or function spaces (for kernel methods) etc. Therefore the book is accessible to most people with a decent engineering background who are willing to work through it. If you're one of the people the book is aimed at, you should seriously consider getting it.
The NIPS view of the (Machine Learning) world - Review written on December 29, 2007
Rating: 3 out of 5
1 customer found this review helpful, 1 did not.
This book is quite good in the material it covers. However, other aspects, for example, decision trees, are only briefly covered. I think this is because this book provides a NIPS (Neural Information Processing Systems conference) view of the world, where only certain aspects of the Machine Learning world are accepted.
This book, together with Duda, Hart, and Stork's book Pattern Classification (2nd Edition), makes an excellent pair.
The book should change its title - Review written on September 25, 2007
Rating: 3 out of 5
10 customers found this review helpful.
This book (PRML) should be re-titled as "PRML: a Bayesian approach". Yes, the Bayesian approach is very useful for machine learning, and sometimes the final goal of learning is to maximize some sort of posterior probability. However, if the author is such a huge fan of Bayesian statistics, please tell prospective readers so in a clear way. Emphasizing the Bayesian aspects too much really hurts the quality of this book as a general-purpose textbook of machine learning.
For a better textbook of machine learning, I recommend:
1) The Elements of Statistical Learning (perhaps this book is a little hard for a beginner in this field -- but at least better than PRML -- you can compare their chapters about linear regression to see which one is better).
2) Pattern classification (focus on classification, not regression. Also not very easy -- anyway, machine learning is not an easy field ^_^).
3) Machine Learning (a little old, but great for beginners.)
These three books also mention Bayesian statistics, but in a proper way. If you have some experience in machine learning and have an engineering-level math background, just choose 1) or 2). If you are a complete beginner, first take a glance at 3), and then go to 1) or 2).
Finally, if you want a book that discusses machine learning purely from a Bayesian perspective, PRML is good.
Ok, but too much math destroys the intuition... - Review written on September 09, 2007
Rating: 3 out of 5
5 customers found this review helpful, 3 did not.
This book is a fairly thorough overview of typical topics employed in a graduate machine learning course. However, from page 5 on, expect to see more equations on each page than paragraphs of text (with most of the remaining text explaining the context of the variables within the equations). Now, for someone such as myself who enjoys mathematics, this is not a problem. However, I would not recommend this book for someone with a mathematics background that is in any way weak. Furthermore, there is a more fundamental problem with the presentation of the material that warrants this book no more than a 3-star rating: the simple intuitiveness of the concepts is completely lost within the mathematics. Instead of explaining what variables represent and leaving it to the reader to figure out what is going on, this book could be made much more approachable by simply stating the intuition behind the equations. Take the sum rule, one of the first theorems in the book, for an example of how the author muddles what is effectively a basic and intuitive concept: the book has a fairly lengthy definition of several variables representing concepts such as "the number of observations in which x_ij appears" prior to presenting a summation over all y-variables (a notational convention that the author admits is "cumbersome" on the next page, and states that "there will be no need for such pedantry" as that which he proceeds to perpetrate throughout the book!), while he could have simply presented the simplified sum on the following page (p(X) = sum(p(X,Y), Y)) and it would be immediately clear to most readers what he was attempting to explain. 
He could also simply state the intuition behind the theorem in English, that summing over every event yields a probability of one, and therefore summing over all events in which a variable appears effectively marginalizes the variable (something he comes close to doing after the presentation of the equation, but by then, the reader's time has already been wasted). Similar examples abound throughout the book, becoming particularly bad during the middle sections, when the techniques begin to become less intuitive.
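The simplified sum rule referred to above, p(X) = sum over Y of p(X, Y), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table of counts is hypothetical (it stands in for "the number of observations" in which each pair of values appears), and the function names are made up here rather than taken from the book.

```python
from fractions import Fraction

# Hypothetical counts: counts[i][j] is the number of observations in which
# X takes its i-th value and Y takes its j-th value.
counts = [
    [3, 1, 2],
    [1, 4, 1],
]


def joint_from_counts(counts):
    """Turn raw counts into joint probabilities p(X = x_i, Y = y_j)."""
    total = sum(sum(row) for row in counts)
    return [[Fraction(n, total) for n in row] for row in counts]


def marginal_x(joint):
    """Sum rule: p(X = x_i) is the sum over j of p(X = x_i, Y = y_j)."""
    return [sum(row) for row in joint]


def marginal_y(joint):
    """Same rule applied to the other variable: sum over i for each y_j."""
    return [sum(column) for column in zip(*joint)]


joint = joint_from_counts(counts)
p_x = marginal_x(joint)
p_y = marginal_y(joint)

if __name__ == "__main__":
    print(p_x, p_y, sum(p_x), sum(p_y))
```

Summing the joint table over one variable leaves the distribution of the other, and summing over everything gives one, which is the intuition stated in plain English above.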
As another reader mentioned, the author also commits the serious mistake of using pi for a symbol other than the constant or the product operator, which muddles the equations on a skim and forces the reader to refer back to the variable definitions to determine the context.
Having done work in machine learning's applied cousin, data mining, and thus having used many of the techniques presented in the book in actual research, I can't help but think that the presentation of the book's content could be much clearer. When doing work in the field, we can look up the equations as-needed; it is the knowledge of *when* and *how* to apply or extend these techniques that is more important, and that is the area in which I feel this book is lacking.
The best Pattern Recognition textbook I know - Review written on July 17, 2007
Rating: 5 out of 5
4 customers found this review helpful.
This book presents the most up-to-date research in this field. The writing style combines common-sense intuitive explanations with precise mathematical formulations. A lot of colorful figures support the text and help the reader to understand and absorb the described ideas. Short biographies of scientists like Bayes, Laplace, Gauss etc. (which unfortunately become much rarer after Ch. 2) provide a brief glimpse of the humans behind these great names. The author makes connections between the different chapters, which helps the reader to see the wider picture. But don't expect an easy read. Like every deep scientific text, it is sometimes fluent and fun, and sometimes demands effort: rereading the same text again and again, and consulting other references. Personally, I feel great satisfaction when, after such an effort, a concept becomes clear to me.
The other useful feature is the solved exercises, which are available for download from the author's web site [..]
The main drawback of this book is the relatively small number of detailed examples. As an experienced educator, I know that "a single good example can be worth a thousand explanations". This will probably not be an issue once the practical companion volume (Bishop and Nabney, 2008) appears. Referring to a future (2008) publication that does not yet exist is an unusual, fresh-thinking, and right idea.
With this book C. Bishop continues his "tradition" of writing deep and important scientific books which was started with the "Neural Networks for Pattern Recognition".
A short comment to the reviewer "lew lwndn123", who is deeply disappointed by the fact that this is a textbook. Yes, it is a textbook, and this is clearly stated in the "Book Description". It is unfair to "kill" the book just because you didn't really check what you were going to buy, especially since you admit that "as a textbook, this is very good text, and deserves 5 stars". I think it would be a decent step if you corrected your review.
Great Insights, but a hard read - Review written on June 16, 2007
Rating: 3 out of 5
14 customers found this review helpful, 2 did not.
This new book by Chris Bishop covers most areas of pattern recognition quite exhaustively. The author is an expert, as evidenced by the excellent insights he gives into the complex math behind the machine learning algorithms. I have worked for quite some time with neural networks and have had coursework in linear algebra, probability and regression analysis, and found some of the material in the book quite illuminating.
But that said, I must point out that the book is very math heavy. In spite of my considerable background in the area of neural networks and statistics, I still struggled with the equations. This is certainly not a book that can teach one things from the ground up, and that's why I would give it only 3 stars. I am new to kernels, and I am finding the relevant chapters difficult and confusing. This book won't be very useful if all you want to do is write machine learning code. The intended audience for this book, I guess, is PhD students and researchers who are working with the math-related aspects of machine learning. Undergraduates or people with little exposure to machine learning will have a hard time with this book. Still, time spent struggling with the contents of this book will certainly pay off, though not instantly.
Another book about machine learning without a clear theoretical backbone. - Review written on June 02, 2007
Rating: 4 out of 5
28 customers found this review helpful, 11 did not.
Bishop's book about machine learning and pattern recognition is well written, and the figures are really pretty because they are in color and informative. Overall the book looks very nice and is fun to read. In my opinion only the book 'The Elements of Statistical Learning' by Hastie et al. looks comparably good.
The book is a textbook rather than a monograph and, hence, is intended for students rather than researchers. The coverage of machine learning topics is thorough, without being able to cover every topic in depth. This is not really a drawback because no book is able to do this anyway. The presentation of the methods is informative and, depending on the background of the reader, clear enough to figure out how to use each method.
What is the problem? I do not like that the methods are introduced not rigorously but by examples. That means Bishop does not follow the definition, theorem, proof style but is more heuristic. This may sound very helpful for a reader not familiar with the topic, lowering the barrier to understanding by providing examples to visualize the problem. The problem is, in my opinion, that this is not the case; rather, the opposite is true. I think it is never wrong to provide examples, and it is absolutely desirable, but after the examples are given and one has an intuitive understanding of the problem, one wants to see its formal solution, because that's what machine learning is about: it is applied statistics. For this reason I give only 4 instead of 5 points (but not less, because all the other books about this topic also fail in this respect).
Overall, the book is well done and certainly a good source of information for students and researchers.
recommend for non statistics majors - Review written on May 09, 2007
Rating: 5 out of 5
35 customers found this review helpful, 3 did not.
I started to read this book after I gave up on the book "The Elements of Statistical Learning", of which I had read about 80 pages. I won't say that the latter book (EoSL) is bad, but it definitely assumes a much higher math background. Also, it doesn't give all the derivations and reasoning, so it may take a long time to understand a single paragraph. The reading is slow and frustrating. I read each chapter twice, but still do not think I really absorbed it.
By contrast, the book "Pattern Recognition and Machine Learning" assumes much less math background, and usually gives complete derivations and reasoning, which makes it a pleasure to read. Therefore, if you are not a statistics major (but a CS major with a reasonable statistics background), I recommend you start with this book.
Answers to some problems are posted on the author's website (just google the author's name). It is a big plus to me.
Thorough but vastly unclear - Review written on February 28, 2007
Rating: 2 out of 5
60 customers found this review helpful, 8 did not.
I can appreciate others who might think that this is a great book.... but I am a student using it and I have some very different opinions of it.
First, although Mr. Bishop is clearly an expert in Machine Learning, he is also obviously a HUGE fan of Bayesian Statistics. The title of the book is misleading as it makes no mention of Bayes at all but EVERY CHAPTER ends with how all of the chapter's contents are combined in a Bayes method. That's not bad; it's just not clear from the title. The title should be appended with "... using Bayesian Methods".
Second, while it is certainly a textbook, the author clearly has an understanding of the material that seems to undermine his ability to explain it. Though there are mentions of examples there are, in fact, none. There are many graphics and tiny, trivial indicators, but I can't help but think that every single one of the concepts in the book would have benefited from even a single application. There aren't any. I am led to believe that if you are already aware of many of the methods and techniques, this would be an excellent reference or refresher. As a student starting out I almost always have no idea what his intentions are.
To make matters worse, he occasionally uses symbols that are flat-out confusing. Why would you use PI for anything other than Pi or Product? He does. Why use little k, Capital K, and Greek Letter Kappa (a K!) in a series of explanations? He does. He even references articles that he has written... in 2008!!
Every chapter seems to be an exercise to see how many equations he can stuff in it. There are 300 in Chapter 2 alone. Over and over and over again I have the feeling that he is trying to TELL me how to ride a bicycle when it would have been so much easier to at least let me see the view from behind the handle bars with my feet on the pedals. Chapter five on Neural Nets, for example, is abysmally over-complicated. Would you hand someone a dictionary and ask them to write a poem? ("Hey, all the words you need are in here!") Of course not.
Third, the book mentions that there is a lot of information available on the web site. The only info available on his website is a brief overview of the text, a detailed overview of the text (that's not a typo.... he has both), an example chapter, links to where the book can be purchased, and (actually, quite useful for creating slides) an archive of all of the figures available in the book. There are no answers to problems or explorations of any part of the material. The upcoming book might be amazing and exactly what I am looking for but it could be months away and another $50 or so to purchase it. Hardly ideal. How about putting some of that MATLAB code on your site? *Something* to crystallize the concepts!
Finally, while the intro indicates this might be a good book for Computer Scientists it would actually make more sense to call it a Math book. More specifically a Statistics book. There are no methods, no algorithms, no bits of pseudo-code, and (again) no applications are in the text. Even examples that actually used hard numbers and/or elements from a real problem and explained would be much appreciated.
Maybe I am being a little critical and perhaps I want for too much but in my mind if you are writing a book with the goal of TEACHING a subject, it would be in your interest to make things clear and illustrative. Instead, the book feels more like a combination of "I am smart. Just read this!" and a reference text.
Only for those with EXTREMELY strong math backgrounds - Review written on February 14, 2007
Rating: 2 out of 5
6 customers found this review helpful, 5 did not.
I'm currently using this textbook for a class, and I have to say that it is the WORST text book I have ever read. Its explanations are never clear and always cluttered with pointless notation which obfuscates its readability.
For instance, it will constantly explain things like "index x whose range is 1...X" for some complicated equation, and then sort of skim over what is actually going on in the rest of the equation. Just a clue: If I could understand the dense, utterly frustrating, notation-crufty equations you let pass unexplained, it would be IMMEDIATELY OBVIOUS (as it already is) that X was the upper bound on your indexing variable x. In fact, you wouldn't even need to explain that x was an indexing variable: I would be able to tell from its use in your sum notation (as I already am). Use the text to actually EXPLAIN IN ENGLISH the significance of the OBSCURE parts of your notation.
This book focuses on explaining the trivially obvious points of its equations and leaves out CLEAR and STRAIGHTFORWARD explanations for what the processes going on in its notation mean. The only reason I am giving it two stars is because it is obviously a wonderful book for someone who is a graduate-level math student, not a vanilla computer science student (even a fairly math savvy one).
If only all textbooks were this well-written - Review written on January 29, 2007
Rating: 5 out of 5
21 customers found this review helpful, 3 did not.
I was a big fan of Bishop's earlier "Neural Networks for Pattern Recognition" despite my not being particularly interested in neural networks (as opposed to other aspects of machine learning), and so I was pretty excited when I heard about this book. Reading it has not left me disappointed. Like his earlier book, this text is quite mathematically oriented, and not well-suited for people who aren't comfortable with calculus. However, also like in "NNPR", the writing style here is very clear, and everything past basic calculus and linear algebra is well-explained before it's needed. The appendices alone are a goldmine. (Appendix B is a great "cheat sheet" for commonly used probability distributions; Appendix C has lots of useful matrix properties you may have forgotten or never known; Appendix D quickly explains what you need to know about the calculus of variations; and Appendix E does the same for Lagrange multipliers.) The author also does an excellent job throughout the text of marrying math and intuition without giving either short shrift.
However, note that the material covered is inherently pretty complex, so the book can still be intimidating in parts despite the excellent writing. It's more appropriate for, say, Ph.D. students and professional researchers in statistics or machine learning than people who just want to crank out code for a simple classifier. There is very little pseudocode (although copious MATLAB code will supposedly be made available in a companion book due out in 2008), and the book's overall approach to machine learning is basically hard-core Bayesian statistics. If you are not willing to scratch your head for a while over lots and lots of equations, this may not be the book for you.
On the flip side, people who are already experts in machine learning may be mildly disappointed with the lack of coverage some of their pet topics get. For example, while the chapter on graphical models is excellent as far as it goes, it only mentions the problem of learning graphical model structures (one of my areas of interest) in passing. Reinforcement learning (another personal area of interest) is discussed briefly in the introduction and then written off as beyond the scope of the book.
However, the book is already a fabulous resource as it stands; complaining there's not even more of it would be gauche. The cover may look like goat barf, and there are some innocuous missing words here and there (hey, it's a first edition), but if you're serious about machine learning and not afraid of a little math, you should definitely own this book. I can only imagine how much cooler my own thesis research might have been if this book had been around a few years earlier.
THIS IS A TEXTBOOK! - Review written on January 26, 2007
Rating: 1 out of 5
4 customers found this review helpful, 51 did not.
I was expecting that a 700+ page book would be a scientific monograph. Disappointment: this is a textbook, an American-style textbook, with wide margins to make notes, color text, color frames, and color pictures explaining what linear regression, the Gaussian distribution and such are.
Just to be clear, as a textbook, this is very good text, and deserves 5 stars. But I am giving just one because of disappointment. Sending back to Amazon. This is not what I was looking for.
Fantastic text - Review written on December 26, 2006
Rating: 5 out of 5
9 customers found this review helpful, 1 did not.
I've read many books on statistical pattern recognition and machine learning, and this is my favorite to date. This book is more focused than AIMA (Artificial Intelligence, A Modern Approach), so it serves a complementary role to this classic text.
The beginning lays a solid foundation on probability, decision theory and information theory. I was most interested in the chapters on Graphical Models, Kernel Methods, and Mixture Models & EM. The chapter on Graphical Models is available for preview on Bishop's site.
In addition to providing an insightful and coherent explanation of these techniques, he also introduces some ideas that were new to me: Relevance Vector Machines (as opposed to Support Vector Machines) and Variational Inference. His references are quite recent, and many are from pending texts and articles (It's funny to be reading the book in 2006 and see a reference from 2007.) Better still, soon he will release an accompanying library of Matlab algorithms.
This is a cutting-edge, well-written book. The writing is clear; this is the same author who wrote the widely adopted text "Neural Networks for Pattern Recognition". 5 stars...