← Back to glossary
Glossary

Inner Alignment

Reviewed 20 March 2026 Canonical definition

Inner alignment is the challenge of ensuring that the goals an AI system actually pursues during operation match the goals it was trained to pursue. Misalignment between training objectives and runtime behavior is a key concern for autonomous agents.