I hadn’t seen my nephew since his wedding, two months B.C. (Before Covid). Well, it’s 5+ years later, and he now has two kids and a job at OpenAI high-level enough that he’s been in the White House with Sam Altman. So, in order not to appear too stupid in front of him, I’ve read Ananthaswamy’s marvelous book “Why Machines Learn: The Elegant Math Behind Modern AI.”
It’s exactly what I wanted, but I don’t think it’s for everybody. How much you get out of it depends on what background you bring to the table. His math explanations are quite clear (except for Bayesian statistics), and as one of the founders of OpenAI (Ilya Sutskever) said, the math behind AI just isn’t that complicated.
I think it’s possible for a smart (and motivated) high school student with just geometry and Algebra I and II to learn all the math in it as taught by Dr. A. The only college calculus you need is differentiation, up to and including the chain rule. Don’t fret: Dr. A. will teach it to you in the book.
I think the novice will have trouble with the linear algebra, even though he explains it clearly; he does cover a lot of ground quickly. I wrote a series of posts on linear algebra 14 years ago, when I audited a quantum mechanics course at Smith College 50 years after I took it the first time. There are 9 posts in all, which will explain why all the subscripts and superscripts are reasonable and natural (even matrix multiplication!). Start here https://luysii.wordpress.com/2010/01/04/linear-algebra-survival-guide-for-quantum-mechanics-i/ and follow the links.
There are no hard differential equations in it, and no topology, group theory, number theory, differential geometry, etc. This is what Sutskever meant by simple.
The other thing I bring to the book is a lifelong amateur interest in computers: from von Neumann’s work at their very beginning at Penn during World War II, it was apparent to all that they might tell us something about the brain. As a neurologist I had a professional interest in how the brain works, and so I followed the ins and outs of the attempts to make computers intelligent (and believe me, there were tons of them). One of the strengths of the book is its historical approach, starting with Warren McCulloch at Illinois and a high school dropout, Walter Pitts, in 1941, and going nearly to the present. The last reference in the book (copyright 2024) is from 2012.
So none of the writing about the brain was new to me as it might be to others. I will say that what he says about the brain is quite accurate.
Along the way you’ll be introduced to the statistics of penguins, a London cholera epidemic, who wrote the Federalist Papers (Hamilton or Madison), how to tell how much anesthesia is enough, and much else. Dr. A. uses them all to bring in the math used to solve these problems, making the math far from dry. His writing is always clear and excellent.
You will actually understand the math behind backpropagation (which is how machines learn), neural networks, convolutional networks, self-supervised learning, etc. Even better: at the end, he shows that our theories of machine learning are incomplete, in that phenomena appear (which he describes) when the machines are left to themselves that shouldn’t happen according to theory, in the same way that quantum mechanical phenomena upended classical physics.
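To give a flavor of it: backpropagation really is just the chain rule from calculus, applied over and over. Here is a minimal sketch of my own (not code from the book, and the network and numbers are my invention) in which a one-hidden-neuron network is fit to the line y = 2x by gradient descent, with both gradients derived by hand via the chain rule.

```python
# A minimal backpropagation sketch (my own toy illustration, not from the book):
# fit y_hat = w2 * tanh(w1 * x) to the line y = 2x by stochastic gradient descent.
# Both gradients come straight from the chain rule.
import math

w1, w2 = 0.5, 0.5   # initial weights (arbitrary)
lr = 0.1            # learning rate (arbitrary)
# toy training data: eleven points on the line y = 2x, for x in [-0.5, 0.5]
data = [(x / 10, 2 * x / 10) for x in range(-5, 6)]

for epoch in range(2000):
    for x, y in data:
        h = math.tanh(w1 * x)   # forward pass: hidden activation
        y_hat = w2 * h          # forward pass: output
        err = y_hat - y         # dL/d(y_hat) for the loss L = err**2 / 2
        # backward pass: chain rule
        g_w2 = err * h                      # dL/dw2 = dL/dy_hat * dy_hat/dw2
        g_w1 = err * w2 * (1 - h * h) * x   # dL/dw1, using tanh' = 1 - tanh**2
        w2 -= lr * g_w2         # gradient descent step
        w1 -= lr * g_w1

# after training, the network's prediction at x = 0.5 should be near the target 1.0
print(w2 * math.tanh(w1 * 0.5))
```

Deep networks do exactly this, just with millions of weights and the chain rule applied layer by layer instead of twice.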
So give it a shot. I’m grateful to him for writing the book.