Skip to main content
Skip to content
Case File
d-21765House OversightOther

Generic discussion of AI alignment and superintelligence risks

The passage contains abstract philosophical arguments about AI risk, without naming any individuals, institutions, financial transactions, or concrete allegations. It offers no actionable leads for in Mentions AI alignment challenges and the wireheading problem. References a Wired article by Kevin Kelly. Discusses theoretical solutions like formal problem F' and reward-based control.

Date
November 11, 2025
Source
House Oversight
Reference
House Oversight #016836
Pages
1
Persons
0
Integrity
No Hash Available

Summary

The passage contains abstract philosophical arguments about AI risk, without naming any individuals, institutions, financial transactions, or concrete allegations. It offers no actionable leads for in Mentions AI alignment challenges and the wireheading problem. References a Wired article by Kevin Kelly. Discusses theoretical solutions like formal problem F' and reward-based control.

Tags

wireheadingsuperintelligenceai-safetyhouse-oversightphilosophy

Ask AI About This Document

0Share
PostReddit

Extracted Text (OCR)

EFTA Disclosure
Text extracted via OCR from the original document. May contain errors from the scanning process.
whereas the iron-eating bactertum Thiobacillus ferrooxidans 1s thrilled. Who’s to say the bacterium is wrong? The fact that a machine has been given a fixed objective by humans doesn’t mean that it will automatically recognize the importance to humans of things that aren’t part of the objective. Maximizing the objective may well cause problems for humans, but, by definition, the machine will not recognize those problems as problematic. e Intelligence is multidimensional, “so ‘smarter than humans’ is a meaningless concept.”® It is a staple of modern psychology that IQ doesn’t do justice to the full range of cognitive skills that humans possess to varying degrees. IQ is indeed a crude measure of human intelligence, but it is utterly meaningless for current AI systems, because their capabilities across different areas are uncorrelated. How do we compare the IQ of Google’s search engine, which cannot play chess, with that of DeepBlue, which cannot answer search queries? None of this supports the argument that because intelligence is multifaceted, we can ignore the risk from superintelligent machines. If “smarter than humans” is a meaningless concept, then “smarter than gorillas” is also meaningless, and gorillas therefore have nothing to fear from humans; clearly, that argument doesn’t hold water. Not only is it logically possible for one entity to be more capable than another across all the relevant dimensions of intelligence, it is also possible for one species to represent an existential threat to another even if the former lacks an appreciation for music and literature. Solutions Can we tackle Wiener’s warning head-on? Can we design AI systems whose purposes don’t conflict with ours, so that we’re sure to be happy with how they behave? On the face of it, this seems hopeless, because it will doubtless prove infeasible to write down our purposes correctly or imagine all the counterintuitive ways a superintelligent entity might fulfill them. If we treat superintelligent AI systems as if they were black boxes from outer space, then indeed we have no hope. Instead, the approach we seem obliged to take, if we are to have any confidence in the outcome, is to define some formal problem /’, and design AI systems to be F-solvers, such that no matter how perfectly a system solves F, we're guaranteed to be happy with the solution. If we can work out an appropriate F' that has this property, we’ll be able to create provably beneficial AI. Here’s an example of how nof to do it: Let a reward be a scalar value provided periodically by a human to the machine, corresponding to how well the machine has behaved during each period, and let F' be the problem of maximizing the expected sum of rewards obtained by the machine. The optimal solution to this problem is not, as one might hope, to behave well, but instead to take control of the human and force him or her to provide a stream of maximal rewards. This is known as the wireheading problem, based on observations that humans themselves are susceptible to the same problem if given a means to electronically stimulate their own pleasure centers. There is, I believe, an approach that may work. Humans can reasonably be described as having (mostly implicit) preferences over their future lives—that is, given ® Kevin Kelly, “The Myth of a Superhuman AI,” Wired, Apr. 25, 2017. 33

Technical Artifacts (1)

View in Artifacts Browser

Email addresses, URLs, phone numbers, and other technical indicators extracted from this document.

Wire Refwireheading

Forum Discussions

This document was digitized, indexed, and cross-referenced with 1,400+ persons in the Epstein files. 100% free, ad-free, and independent.

Annotations powered by Hypothesis. Select any text on this page to annotate or highlight it.