The developers of ChatGPT meticulously trained the model to generate responses that were aligned with human preferences. This iterative process, guided by human feedback, played a crucial role in refining the technology. As Jan Leike, the leader of OpenAI’s alignment team, explains, the model’s responses consistently included the phrase ‘As a language model trained by OpenAI.’ This pattern emerged organically during training, reflecting the human raters’ preference for responses that acknowledged the model’s limitations.
Sandhini Agarwal, an AI Policy Researcher at OpenAI, sheds further light on the human raters’ criteria. They not only evaluated the model’s responses for accuracy and relevance but also considered ethical factors such as authenticity and transparency. By incorporating these preferences into the training process, the ChatGPT model was able to adapt and improve, ultimately earning the trust and admiration of its users.