Glossary

Inner Alignment

Reviewed 9 April 2026 Canonical definition

Inner alignment is the challenge of ensuring that the goals an AI system actually pursues during operation match the goals it was trained to pursue. Misalignment between training objectives and runtime behavior is a key concern for autonomous agents.

Inner Alignment

Related terms