Case File
d-29905 · House Oversight · Other

Essay on AI alignment and robot preference learning


Date
November 11, 2025
Source
House Oversight
Reference
House Oversight #016838
Pages
1
Persons
0
Integrity
No Hash Available

Summary

The passage is a theoretical discussion of AI control and robot design with no specific individuals, transactions, or actionable allegations. It provides no new leads, actors, or controversial claims.

- Discusses building robots to infer human preferences
- Highlights challenges of irrational human behavior for AI training
- Calls for redefining AI as provably beneficial to humans

Tags

robotics · ai-alignment · technology-policy · ethics · house-oversight



Extracted Text (OCR)

EFTA Disclosure
Text extracted via OCR from the original document. May contain errors from the scanning process.
designs and provides at least one case of a provably beneficial system in the sense introduced above. The overall approach resembles mechanism-design problems in economics, wherein one incentivizes other agents to behave in ways beneficial to the designer. The key difference here is that we are building one of the agents in order to benefit the other.

There are reasons to think this approach may work in practice. First, there is abundant written and filmed information about humans doing things (and other humans reacting). Technology to build models of human preferences from this storehouse will presumably be available long before superintelligent AI systems are created. Second, there are strong, near-term economic incentives for robots to understand human preferences: If one poorly designed domestic robot cooks the cat for dinner, not realizing that its sentimental value outweighs its nutritional value, the domestic-robot industry will be out of business.

There are obvious difficulties, however, with an approach that expects a robot to learn underlying preferences from human behavior. Humans are irrational, inconsistent, weak-willed, and computationally limited, so their actions don't always reflect their true preferences. (Consider, for example, two humans playing chess. Usually, one of them loses, but not on purpose!) So robots can learn from nonrational human behavior only with the aid of much better cognitive models of humans. Furthermore, practical and social constraints will prevent all preferences from being maximally satisfied simultaneously, which means that robots must mediate among conflicting preferences—something that philosophers and social scientists have struggled with for millennia. And what should robots learn from humans who enjoy the suffering of others? It may be best to zero out such preferences in the robots' calculations.
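The learning-from-behavior idea described above can be sketched as Bayesian inference over candidate utility functions, where the human is modeled as noisily rational (a softmax, or Boltzmann, choice model) rather than perfectly optimal. This is a minimal illustration only, not the essay's own method; the scenario, hypothesis names, utility numbers, and rationality parameter below are all invented assumptions:

```python
import math

# Hypothetical scenario: a robot watches its owner repeatedly choose
# between two options (0: order dinner, 1: cook the cat). Each candidate
# utility function assigns values to the options; the owner is modeled
# as Boltzmann-rational, so observed choices are noisy evidence about
# the underlying preferences rather than a perfect readout of them.

def choice_likelihood(utilities, choice, beta=2.0):
    """P(choice | utilities) under a softmax (Boltzmann) choice model.

    beta is a crude stand-in for a cognitive model: higher beta means
    the human more reliably picks the option they actually prefer.
    """
    z = sum(math.exp(beta * u) for u in utilities)
    return math.exp(beta * utilities[choice]) / z

def infer_preferences(hypotheses, observations, beta=2.0):
    """Posterior over candidate utility functions given observed choices."""
    # Start from a uniform prior over the hypotheses.
    posterior = {name: 1.0 / len(hypotheses) for name in hypotheses}
    for choice in observations:
        # Bayes update: weight each hypothesis by how well it predicts
        # the observed choice, then renormalize.
        for name, utils in hypotheses.items():
            posterior[name] *= choice_likelihood(utils, choice, beta)
        total = sum(posterior.values())
        posterior = {n: p / total for n, p in posterior.items()}
    return posterior

# Two competing (invented) hypotheses about the owner's utilities for
# [option 0: order dinner, option 1: cook the cat]:
hypotheses = {
    "cat_is_food":   [0.0, 1.0],    # nutritional value dominates
    "cat_is_family": [1.0, -10.0],  # sentimental value dominates
}

# The owner is observed choosing option 0 five times in a row; the
# posterior concentrates on the sentimental-value hypothesis.
posterior = infer_preferences(hypotheses, [0, 0, 0, 0, 0])
```

The softmax parameter is where the essay's caveat bites: because humans sometimes act against their own preferences (the chess player who loses, but not on purpose), a single noise parameter is far too crude in practice, and richer cognitive models of human error would be needed in its place.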
Finding a solution to the AI control problem is an important task; it may be, in Bostrom’s words, “the essential task of our age.” Up to now, AI research has focused on systems that are better at making decisions, but this is not the same as making better decisions. No matter how excellently an algorithm maximizes, and no matter how accurate its model of the world, a machine’s decisions may be ineffably stupid in the eyes of an ordinary human if its utility function is not well aligned with human values. This problem requires a change in the definition of AI itself—from a field concerned with pure intelligence, independent of the objective, to a field concerned with systems that are provably beneficial for humans. Taking the problem seriously seems likely to yield new ways of thinking about AI, its purpose, and our relationship to it.

