The Rise of Artificial Intelligence: From Dartmouth to Deep Learning

In the summer of 1956, a group of brilliant minds gathered at Dartmouth College in New Hampshire. Among them were Claude Shannon, the father of information theory, and Herb Simon, a remarkable figure who would go on to win both the Nobel Memorial Prize in Economic Sciences and the Turing Award. Their purpose? To explore the potential of “making machines use language, form abstractions and concepts” and “solve kinds of problems now reserved for humans.” This gathering, spearheaded by the young researcher John McCarthy, marked the birth of “artificial intelligence,” a term he coined to encompass this ambitious endeavor.

The Dartmouth meeting, however, did not signify the absolute beginning of the quest for thinking machines. Pioneers like Alan Turing, for whom the Turing Award is named, and John von Neumann, a source of inspiration for McCarthy, had already pondered this question. By 1956, several approaches to AI existed. McCarthy’s creation of the term “artificial intelligence” aimed to unify these diverse approaches, leaving open the question of which path might ultimately prove most fruitful. Some researchers favored systems based on combining factual knowledge with axioms from fields like geometry and symbolic logic, aiming to deduce appropriate responses. Others preferred systems where the probability of an event depended on the continuously updated probabilities of numerous other factors.

The decades that followed witnessed intense intellectual activity and debate around AI. By the 1980s, a consensus emerged favoring “expert systems” that utilized symbolic logic to capture and apply human expertise. The Japanese government, in particular, invested heavily in these systems and the hardware they required. However, these systems often proved too rigid to handle the complexities of the real world. By the late 1980s, AI had fallen into disrepute, becoming synonymous with overblown promises and underwhelming results. Many researchers distanced themselves from the term.

Yet, amidst this disillusionment, the seeds of a future boom had already been sown. As scientists delved into the workings of brain cells, or neurons, in the 1940s, the idea of creating machines that mimic these biological processes had begun to take root. In the biological brain, connections between neurons allow activity in one neuron to trigger or inhibit activity in others. Marvin Minsky, a participant in the Dartmouth conference, made an early attempt to model this in the lab, using hardware to represent networks of neurons. This paved the way for artificial neural networks simulated in software. Such networks learn by being exposed to countless examples rather than being explicitly programmed with rules: the strengths of the connections between neurons, called “weights,” are repeatedly adjusted during training until the network generates appropriate outputs for given inputs.
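
To make that weight-adjustment idea concrete, here is a minimal sketch, assuming Python and NumPy, of a single artificial neuron trained by gradient descent. The toy inputs, targets, and learning rate are invented purely for illustration and do not correspond to any real system.

```python
import numpy as np

# Toy training examples (hypothetical): two inputs per example and a desired output.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])  # the neuron should behave like a logical OR

rng = np.random.default_rng(0)
weights = rng.normal(size=2)  # connection strengths, adjusted during training
bias = 0.0
learning_rate = 0.5

def neuron(inputs):
    # Weighted sum of the inputs, squashed into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-(inputs @ weights + bias)))

for step in range(2000):
    prediction = neuron(X)
    error = prediction - y                              # how far off each output is
    weights -= learning_rate * X.T @ error / len(y)     # nudge each weight to shrink the error
    bias -= learning_rate * error.mean()

print(np.round(neuron(X), 2))  # the outputs now approach the targets [0, 1, 1, 1]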

While Minsky abandoned the approach, others persevered. By the early 1990s, neural networks had learned to perform tasks like sorting mail by recognizing handwritten digits. Researchers aspired to build more sophisticated networks by adding further layers of neurons, but the larger networks took far longer to train. Fortunately, a different kind of computer hardware offered a way forward. In 2009, researchers at Stanford University sped up a neural network 70-fold by running it on a gaming PC, thanks to its graphics processing unit (GPU). This was a turning point: GPUs, built to render graphics, turned out to be well suited to the matrix arithmetic at the heart of neural networks.

This hardware boost, coupled with more efficient training algorithms, made it possible to build and train networks with millions of connections in a reasonable time. These “deeper” networks proved far more capable. The power of the new approach, known as “deep learning,” became evident in the ImageNet Challenge of 2012. Competing image-recognition systems were given a database of more than a million labeled images, trained on those examples to associate each image with a one-word description, and then asked to label previously unseen images. That year a team led by Geoff Hinton achieved a remarkable 85% accuracy using deep learning, instantly establishing it as a breakthrough. By 2015, deep learning had become the dominant approach in image recognition, with winning systems reaching 96% accuracy and surpassing average human performance.
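
As an illustration of that training recipe (not the actual 2012 system), here is a minimal sketch, assuming PyTorch, of a small convolutional network learning to map images to labels and then classifying an unseen image. The random tensors stand in for a real labeled dataset such as ImageNet, and the layer sizes and hyperparameters are arbitrary placeholders.

```python
import torch
from torch import nn

# Hypothetical stand-ins for a labeled image dataset: 64 random "images"
# (3 channels, 32x32 pixels), each with one of 10 made-up class labels.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

# A small convolutional network that maps an image to a score for each label.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: show the network labeled examples and adjust its weights so that
# the score for the correct label rises.
for step in range(100):
    optimizer.zero_grad()
    scores = model(images)
    loss = loss_fn(scores, labels)
    loss.backward()
    optimizer.step()

# Inference: ask for the most likely label of a previously unseen image.
new_image = torch.randn(1, 3, 32, 32)
predicted_class = model(new_image).argmax(dim=1)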

Deep learning found applications in a wide array of tasks traditionally considered “human-only,” which could be reduced to mapping one type of data onto another. These included speech recognition (sound to text), face recognition (faces to names), and translation. The internet played a crucial role in this success, providing access to massive datasets and hinting at the potential for large markets. The performance of deep learning models continued to improve as their size (number of connections) and training data increased.

Deep learning soon began to power a range of products and services. Voice-activated devices like Amazon’s Alexa emerged. Online transcription services became more helpful. Web browsers offered automatic translation. Where the term “AI” had once been a byword for disappointment, it now conveyed coolness and excitement. In practice, though, nearly everything marketed as AI relied on deep learning as its core mechanism.

In 2017, a new architecture called the “transformer” emerged, bringing a qualitative shift on top of the quantitative gains from more computing power and data. Transformers let neural networks “attend” to specific features of their input, tracking patterns even when the elements of a pattern are spread far apart. This grasp of context made them well suited to “self-supervised learning,” in which words are randomly masked during training and the model learns to predict the most likely word to fill each gap. Because this approach needs no pre-labeled data, models could be trained on billions of words of raw text from the internet.
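
Here is a minimal sketch, assuming PyTorch, of that masked-word idea: some words in a sentence are hidden and the model is scored only on how well it predicts them. The tiny vocabulary, the single encoder layer, and every size here are invented for illustration; real models train on billions of words with vastly larger networks.

```python
import torch
from torch import nn

# A toy vocabulary and one training sentence (hypothetical); real training data
# is billions of words of raw text from the internet.
vocab = ["[MASK]", "the", "cat", "sat", "on", "mat"]
token_id = {word: i for i, word in enumerate(vocab)}
sentence = torch.tensor([[token_id[w] for w in ["the", "cat", "sat", "on", "the", "mat"]]])

# Randomly hide roughly 15% of the words; the hidden words become the targets.
is_masked = torch.rand(sentence.shape) < 0.15
is_masked[0, 2] = True  # guarantee at least one masked position in this tiny example
inputs = sentence.clone()
inputs[is_masked] = token_id["[MASK]"]

# A miniature stand-in for a transformer: embeddings plus one self-attention layer.
embed = nn.Embedding(len(vocab), 32)
encoder = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
to_vocab = nn.Linear(32, len(vocab))

# The model predicts a word at every position, but it is penalized only where
# words were hidden, so no human-written labels are needed.
logits = to_vocab(encoder(embed(inputs)))
loss = nn.functional.cross_entropy(logits[is_masked], sentence[is_masked])
loss.backward()  # the gradients from this loss are what adjust the model's weights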

Transformer-based large language models (LLMs), like GPT-2 released by OpenAI in 2019 (GPT stands for generative pre-trained transformer), started to gain wider attention. These LLMs exhibited “emergent” behaviors that went beyond their explicit training. They were not only surprisingly adept at linguistic tasks like summarization and translation, but also at tasks like simple arithmetic and software writing, which were implicitly present in the data they consumed. However, this also meant that they reflected the biases present in the training data, leading to the emergence of societal prejudices in their output.

In November 2022, OpenAI unveiled a larger model, GPT-3.5, to the public in the form of a chatbot called ChatGPT. Anyone with internet access could interact with it, providing a prompt and receiving a response. This consumer product took off with unprecedented speed. Within weeks, ChatGPT was generating everything from college essays to computer code, marking another significant leap forward in AI. While the first generation of AI products focused on recognition, this second generation emphasized generation.

Models like Stable Diffusion and DALL-E, which debuted around the same time, used a technique called “diffusion” to generate images from text prompts. Other models could produce remarkably realistic video, speech, and music. The leap was not just technological; these systems created something new rather than merely recognizing what already existed. ChatGPT and its rivals, Gemini (from Google) and Claude (from Anthropic), produced their output through calculation, just like other deep-learning systems. Yet their ability to respond to requests with novel creations gave them a distinctly different feel from face-recognition, dictation, or translation software. They genuinely seemed to “use language” and “form abstractions,” fulfilling McCarthy’s original vision. This series of briefs will delve into the inner workings of these models, explore their potential for growth and new applications, and discuss their limitations and the ethical considerations around their use.
