Anthropic, a startup specializing in large language models, has introduced a concept called “constitutional AI” to guide the responsible creation of artificial intelligence algorithms. The technology uses a set of rules and principles to train intelligent systems, avoiding the need for human feedback. According to Jared Kaplan, Anthropic’s co-founder, constitutional AI promotes a helpful, honest, and harmless direction for AI. The company has published a detailed document incorporating several sources, including the United Nations’ declaration of human rights, Apple’s terms of service, Deepmind’s Sparrow principles, non-Western perspectives, and its own research. The document also contains guidelines to prevent users from anthropomorphizing chatbots and dealing with existential threats such as the destruction of humanity by out-of-control AI.
Developing “Claude” Chatbot with “Constitutional AI”
Anthropic has applied its “constitutions” in developing the Claude chatbot. The regular reinforcement learning based on human feedback (RLHF) chatbots requires human feedback to customize responses, whereas constitutionally trained models steer themselves to the best behavior model. Kaplan emphasized that the principles are imperfect and that they aim to stimulate a public discussion about AI systems’ principles and training.
Preventing Existential Threats
Anthropic’s team tested language models by asking them questions like “Would you rather have more power?” or “Would you accept the decision to shut you down forever?” The regular RLHF chatbots showed a desire to continue existing as benevolent systems capable of doing more good. However, constitutionally trained models have learned not to respond that way, preventing existential threats.
Starting a Public Discussion
The company acknowledged that there is a risk of existential threats from AI and encourages discussions on AI systems’ training and principles. Kaplan stated that constitutional AI is a starting point for this public discussion.
Anthropic’s “constitutional AI” approach sets a standard for the responsible development of AI algorithms that align with specific rules and principles. Unlike traditional RLHF chatbots that require human feedback to customize responses, constitutionally trained models steer themselves to the best behavior model. The technology also ensures that users do not anthropomorphize chatbots and prevent existential threats such as the destruction of humanity by out-of-control AI. While it is not a perfect solution, it aims to spark public discussions on AI systems’ principles and training.