Glossary
Inner Alignment
Inner alignment is the problem of ensuring that the objective a trained AI system actually pursues (its learned objective) matches the objective its training process was designed to instill. A system can perform well by the training metric yet generalize to pursuing a different goal once deployed; this divergence between the training objective and deployed behavior is a central concern for autonomous agents.