A young robot's primer on the origins of her learning and intelligence.
Data Science mediates between evidence and knowledge. It is as old as humanity.
Plato explains the theory of forms through the allegory of the cave, wherein prisoners can only see
the shadows of puppets on a wall, cast by the light from a fire behind them. The prisoners attempt
to infer what is really going on based on what they see.
The analogy beautifully captures the separation between the imperfect nature of
evidence and the complexity of the model one might form based on that evidence.
Data (from the Latin for "given") is often flawed. Our models (forms) are usually limited.
We have developed natural philosophy, and the scientific method, to arrive at the best models we can -
e.g. by parsimony (Occam's Razor). However, for millennia, the job of sifting evidence and creating
models was combined: experts carried out (at least) two tasks.
Statistics emerged as a way to summarise and model the data itself, specialising in the business of
understanding the evidence, so that the form of our models is abstracted away (to some degree) from
the details of the data.
Now, we have moved to a world where the formation of models is itself, to some extent, part of this
new science. To varying degrees, the mechanisation of this process of model formation has advanced.
Machine learning packages up the business of spotting
which puppet is which, and even what the plot is.
Artificial Intelligence attempts to move this on to
figuring out that it is a puppet show, inferring that there are puppet masters,
and deciding what they intend the shadow play to mean.
In the realm of science fiction (The Matrix, Dark City, etc.), there are actual puppet masters. In
the realm of superstition, these masters may even be adversarial. Post-Enlightenment, we usually
assume the puppet masters have no intent, but merely represent a more complex world, which the
puppets and their shadows give us a chance to understand. Popper's model of objective knowledge
suggests that the scientific method allows us to ever improve our (approximate) understanding.
The mechanisation of this process starts with the problem of industrial-scale data gathering, and
hence with the industrial-scale organisation of society. Statistics for things like insurance
(actuarial tables), and many other planning tools, therefore had to be computed (by people, initially).
With the emergence of computers & the digital age, we gather more data, store it and can process it
more readily. Excitement over "thinking machines" led to the notion that human intelligence could
be understood through computational models. Early on, Artificial Intelligence researchers realised
that a number of sub-challenges exist, starting from understanding more complex evidence from the
senses - hence image processing and the understanding of objects and scenes were important from the
get-go, so that a robot, for example, could safely navigate its environment. Social intelligence requires communication,
and humans use (largely) speech, so natural language processing was a requirement. Other, grander
challenges concerning consciousness and imagination were envisaged, but there was enough to be
getting on with. Indeed, too much at the time. It wasn't until recent decades that the tools and
techniques scaled up to tackle these problems with comparable efficacy to humans.
Between the 19th & 20th century advances in statistics, and the discipline of Machine Learning,
there are a great many steps. We are not just concerned with quantities of data ("big data") but
with the quality and structure of those givens. There is a huge range in the effort put in to keeping
consistent records, even in the simplest of systems. What those records represent is
also drawn from a wide range of complex structures (including no structure!). The advances in
mathematics and statistics are now coupled with advances in computing platforms to allow
exploration and comprehension of these properties of the data. As we go further down this road, we
start to infer what the data tells us about the world it comes from. This is not easy, as Plato's
allegory illustrates. This is also where we move from statistics to data science - we now have a
discipline which makes (falsifiable) predictions about the shadows and the puppets. Others can see
if our predictions remain accurate, or deviate from what they observe. We can use these predictions
to help look for explanations of what is going on, and choose between different explanations. That
process itself is to an increasing extent also being mechanised (automated).
However, the biggest advances are really often still the old techniques running on affordable,
very fast hardware with a lot of storage. Rarely do these techniques surprise with something that
resembles general intelligence. That said, the tools are clearly increasingly useful.
Domains where data science has the potential for huge impact are added every day, and as we gain the
capability to apply data science to new combinations of evidence, the unpredicted or unexpected
will no doubt emerge. While we started this note with a discussion of the emergence of a science
from the shadows, many of the key applications of the science concern data about us - humans. As we
make more observations, and automate processing and decision making concerning humans, we mustn't
lose sight of that humanity. After all, we are not the puppets.
As Conan Doyle said through his memorable character Sherlock Holmes:
"Once you eliminate the impossible, whatever remains,
no matter how improbable, must be the truth."
---------part two--------
Statistics aren't what they used to be... before and after Bayes... and before and after Hinton...
Two advances in statistics combine with lots of fast computers and data to yield much of the machine learning and AI excitement: the way we update our knowledge when we learn new evidence, due to the Reverend Bayes, and the way we avoid building very much explicit domain knowledge into our machine for learning, due to artificial neural networks and "deep" learning.
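To make the Bayesian half concrete, here is a minimal sketch of Bayes' rule as a belief update, in Python. All the numbers (the prior, the detector's accuracy) are invented purely for illustration; the point is only the mechanics of revising a belief when new evidence arrives.

    # A minimal sketch of Bayes' rule: updating a belief when new evidence arrives.
    # All numbers below are made up purely for illustration.
    prior = 0.1                 # P(cat): belief an image contains a cat, before looking
    p_fires_if_cat = 0.9        # P(detector fires | cat)    -- assumed true positive rate
    p_fires_if_no_cat = 0.2     # P(detector fires | no cat) -- assumed false positive rate

    # P(detector fires), by the law of total probability
    p_fires = p_fires_if_cat * prior + p_fires_if_no_cat * (1 - prior)

    # Bayes' rule: P(cat | detector fires)
    posterior = p_fires_if_cat * prior / p_fires
    print(f"belief after evidence: {posterior:.2f}")   # roughly 0.33, up from 0.10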
First off, we need a bunch of data - and typically, we need some clever people to label that data. So now we need to talk about domain knowledge and expertise, and how machine learning and AI have so often been parasitically dependent on human intelligence. This is where we "crowdsource" (or already have lying around) a bunch of data that has already been classified, in some sense, as "A" or "not A".
So let's work by example: let's say we want to learn a classifier for images with cats, and we have many, many images (e.g. from a Google search or from some other available data) that have been tagged with whether or not there are cats in the image. So now we "fire up" our very simple neural net on this data, to train it in what "cat-ness" is - the input at each step is an image and its label (cat or not cat). The neural net is, effectively, a big table where we write things in on the left-hand columns, and copy values through to the next columns if they fit some value so far (are near enough), and not if they don't. When we get all the way out the far side, we have a column full of numbers and one box that says "is cat" or "isn't cat". We compare that to the label and, if we're right, we increase the values in the boxes in the table, and if we're wrong, we subtract from those values. [N.b. this is a gross simplification of how neural nets work, missing out hidden layers, convolution, and piles of other important tricks that make things work "better".]

Another way of thinking of the "layers" of the neural net might be as a series of combs, each with different-length (adjustable) teeth, which let things flow through to the next comb; when the answer pops out the end and is right, we mostly leave the combs' teeth lengths alone, and when it is wrong, we adjust their lengths more. Sort of :-) The adjustment process uses a technique called stochastic gradient descent - one way to picture that part of the process is like snowboarding many times down a complicated slope, adjusting which way you go on each trip, until you get the best ride. Or skateboarding.
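For the curious, here is a deliberately tiny sketch of that "table plus adjustment" loop in Python (assuming numpy is available): a single layer of weights, nudged by stochastic gradient descent after each labelled example. The features and labels are synthetic stand-ins - nothing here looks at actual pixels - so treat it as the comb analogy in code, not a real image classifier.

    # A toy "table": one layer of weights, nudged after each labelled example.
    # Real classifiers add hidden layers, convolutions, and many other tricks.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Pretend each "image" is already summarised as 4 numbers (hypothetical features),
    # with label 1 for "cat" and 0 for "not cat" (synthetic ground truth).
    images = rng.normal(size=(200, 4))
    labels = (images @ np.array([1.5, -2.0, 0.5, 0.0]) > 0).astype(float)

    weights = np.zeros(4)       # the "table" we fill in as we learn
    learning_rate = 0.1

    for epoch in range(20):
        for x, y in zip(images, labels):
            prediction = sigmoid(x @ weights)      # push the example through the table
            error = prediction - y                 # how wrong were we?
            weights -= learning_rate * error * x   # adjust the "teeth lengths" a little

    # Once trained, the table can label a new image on its own.
    new_image = rng.normal(size=4)
    print("cat" if sigmoid(new_image @ weights) > 0.5 else "not cat")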
We do that for millions and millions of images. Whether the cats are black or pink, furry or bald, the result is a table/box that is really quite good at spotting cats (a trained classifier). We can then put that table in (say) a camera, and now the camera will be able to say "cat" or "not cat" and tag pictures itself, without a human.
We could do the same with (say) machine translation. Again, totally dependent on work already done by people: say we have a lot of text that has been translated (e.g. copies of Dickens' novels or travel guides or web pages, hand-translated between French and Latin and Greek and Basque and Korean). We can take the text and pull it into small pieces (words, phrases, sentences) and learn a classifier for these which "recognises" input phrase x in language A and outputs it in B... again, reinforced over more and more examples, but once built, just a "table" to run things on your laptop...
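As a toy of the "just a table" idea (emphatically not how modern translation systems work - those use large neural sequence models), here is the crudest possible version: remember whole phrases that humans have already translated, and look them up later. The phrase pairs are invented for illustration.

    # The crudest possible "translation table": remember phrases people already
    # translated, then look them up. Real systems break text into smaller pieces,
    # score many candidate translations, and generalise to unseen sentences.
    parallel = [                                   # hypothetical hand-translated pairs
        ("le chat dort", "the cat sleeps"),
        ("le chien mange", "the dog eats"),
    ]

    phrase_table = {src: tgt for src, tgt in parallel}

    def translate(phrase):
        # a real system would back off to smaller pieces it has seen before
        return phrase_table.get(phrase, "<no translation learned>")

    print(translate("le chat dort"))               # -> "the cat sleeps"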
What happens if we can't get labelled data? There are at least two reasons we might not - we don't have time, or people just aren't good at this particular task - for example, finding stuff in the asteroid belt, or explaining why a set of moves in Go were good (something people can explain in chess :-). So one trick is to generate synthetic input to the neural net, where we "know" in the generator whether we put a pink panther in the picture or not. Another might be that there's something implicit in the output of the trained neural net that tells us if it is on the right track or going down the garden path - e.g. it wins a game or not... So this is where Generative Adversarial Nets come in - there are now two nets, one is like Socrates or Yoda, and the other is the "student"...
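Here is a minimal sketch of that two-network dance (assuming PyTorch is available; the tiny networks and the 1-D Gaussian target are chosen purely so the example fits in a few lines): a generator, the "student", learns to fake samples from the target distribution, while a discriminator, the Socrates/Yoda figure, learns to tell real from fake - each one's mistakes are the other's training signal.

    # A toy GAN: generator G fakes samples, discriminator D judges them.
    import torch
    import torch.nn as nn

    def real_data(n):                 # "real" samples: a Gaussian around 2.0
        return torch.randn(n, 1) * 0.5 + 2.0

    def noise(n):                     # the generator's raw material
        return torch.randn(n, 1)

    G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))                  # student
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())    # critic
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        # 1) teach the critic to label real samples 1 and fakes 0
        real, fake = real_data(64), G(noise(64)).detach()
        loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # 2) teach the student to fool the critic
        fake = G(noise(64))
        loss_G = bce(D(fake), torch.ones(64, 1))
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    print(G(noise(1000)).mean().item())   # should drift towards the real mean, ~2.0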
These techniques are very effective nowadays, although one interesting problem is that we're not entirely sure, in all cases, what the "features" are that they react to. Entirely different approaches to machine learning involve taking a more explicit collection of rules (perhaps based on a model which we put in, in the first place) and just learning which are more or less important in which combinations - it isn't clear, but some research seems to suggest that there are ways to bring these approaches towards each other somehow...
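To illustrate what "learning which rules matter" might look like - a hypothetical sketch, not any particular system - here a few hand-written rules each vote on "cat or not", and a simple perceptron-style update learns how much to trust each rule, rather than learning features from raw pixels.

    # Hand-written rules (hypothetical); we only learn how much to trust each one.
    rules = [
        lambda x: x["has_whiskers"],
        lambda x: x["has_four_legs"],
        lambda x: x["barks"],                    # should end up distrusted
    ]

    # Tiny made-up training set: attribute dictionaries plus a "cat" label (1 = cat).
    examples = [
        ({"has_whiskers": 1, "has_four_legs": 1, "barks": 0}, 1),
        ({"has_whiskers": 0, "has_four_legs": 1, "barks": 1}, 0),
        ({"has_whiskers": 1, "has_four_legs": 1, "barks": 1}, 0),
        ({"has_whiskers": 1, "has_four_legs": 0, "barks": 0}, 1),
    ]

    weights = [0.0] * len(rules)
    for _ in range(20):                          # perceptron-style updates
        for attrs, label in examples:
            votes = [rule(attrs) for rule in rules]
            prediction = 1 if sum(w * v for w, v in zip(weights, votes)) > 0 else 0
            for i, v in enumerate(votes):        # nudge weights only when we are wrong
                weights[i] += (label - prediction) * v

    print(weights)   # -> [1.0, 0.0, -1.0]: trust whiskers, ignore legs, distrust barking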
There's much more to be said about other (sometimes simpler, sometimes more complex) machine learning approaches, and about more foundational work on AI in the sense of trying to do something utterly different from all the above: actually understanding how humans (and other smart species) do so much without supervised learning. Some of the old and new theory for the latter concerns higher-level goals & models & intentions... which maybe we just evolved to have, and which sit at an entirely different level from the neurones and synapses in the grey matter - which may be why it won't just be something emergent from more silicon...