Researchers at OpenAI, a not-for-profit backed by Elon Musk, have previously shown that AI systems that train themselves can "sometimes develop unexpected and unwanted habits. For example, in a computer game, an agent may figure out how to 'glitch' its way to a higher score."
While direct human supervision can prevent behavior like this in simple settings, some use-cases may be too complex for a human to evaluate directly and require other means of adjudication.
One method the OpenAI researchers put forth, in a blog post titled 'AI Safety via Debate', is a technique that "trains agents to debate topics with one another, using a human to judge who wins."
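The setup can be sketched in a few lines: two agents take turns adding arguments to a shared transcript, and a judge (standing in for the human) picks a winner. This is a minimal, hypothetical illustration of the idea, not OpenAI's actual implementation; all names and the toy judging rule are assumptions.

```python
def debate(question, agent_a, agent_b, judge, rounds=3):
    """Run alternating argument rounds between two agents, then judge."""
    transcript = []
    for turn in range(rounds * 2):
        agent = agent_a if turn % 2 == 0 else agent_b
        transcript.append(agent(question, transcript))
    return judge(question, transcript)

# Toy stand-ins: each "agent" argues for a fixed answer, and the "judge"
# sides with whichever claim appears more often in the transcript.
agent_a = lambda question, transcript: "A: the answer is yes"
agent_b = lambda question, transcript: "B: the answer is no"

def judge(question, transcript):
    yes_count = sum("yes" in line for line in transcript)
    no_count = sum("no" in line for line in transcript)
    return "A" if yes_count >= no_count else "B"

winner = debate("Is the sky blue?", agent_a, agent_b, judge)
```

In the real proposal the agents are trained via self-play to make arguments that win the judge's verdict, so the training signal comes from the human judgment rather than from direct supervision of the task itself.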
The researchers "believe that this or a similar approach could eventually help train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences."