memoir: the graded bluebook; outline Eliezerfic conclusion

author M. Taylor Saotome-Westlake <[email protected]>

Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)

committer M. Taylor Saotome-Westlake <[email protected]>

Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)
author M. Taylor Saotome-Westlake <[email protected]>
Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)
committer M. Taylor Saotome-Westlake <[email protected]>
Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)
diff --git a/content/drafts/standing-under-the-same-sky.md b/content/drafts/standing-under-the-same-sky.md

index 48f7ae4..78485ad 100644 (file)
--- a/content/drafts/standing-under-the-same-sky.md
+++ b/content/drafts/standing-under-the-same-sky.md
@@ -504,7 +504,7 @@ I found the comment reassuring regarding the extent or lack thereof of my own co
  
  In January 2022, in an attempt to deal with my personality-cultist writing block, I sent him one last email asking if he particularly _cared_ if I published a couple blog posts that said some negative things about him. If he actually _cared_ about potential reputational damage to him from my writing things that I thought I had a legitimate interest in writing about, I would be _willing_ to let him pre-read the drafts before publishing and give him the chance to object to anything he thought was unfair ... but I'd rather agree that that wasn't necessary. I explained the privacy norms that I intended to follow—that I could explain _my_ actions, but had to Glomarize about the content of any private conversations that may or may not have occurred.
  
-It had taken me a while (with apologies for my atrocious [sample efficiency](https://ai.stackexchange.com/a/5247)), but I was finally ready to give up on him; I thought the efficient outcome was that I should just tell my Whole Dumb Story on my blog and never bother him again. Since he probably _didn't_ particularly care (because it's not AGI alignment and therefore unimportant) and it would be psychologically easier on me if I knew he diidn't hold it against me, could I please have his advance blessing to just write and publish what I was thinking so I can get it all out of my system and move on with my life?
+It had taken me a while (with apologies for my atrocious [sample efficiency](https://ai.stackexchange.com/a/5247)), but I was finally ready to give up on him; I thought the efficient outcome was that I should just tell my Whole Dumb Story on my blog and never bother him again. Since he probably _didn't_ particularly care (because it's not AGI alignment and therefore unimportant) and it would be psychologically easier on me if I knew he didn't hold it against me, could I please have his advance blessing to just write and publish what I was thinking so I can get it all out of my system and move on with my life?
  
  If it helped—as far as _I_ could tell, I was only doing what _he_ taught me to do in 2007–2009: [carve reality at the joints](https://www.lesswrong.com/posts/esRZaPXSHgWzyB2NL/where-to-draw-the-boundaries), [speak the truth even if your voice trembles](https://www.lesswrong.com/posts/pZSpbxPrftSndTdSf/honesty-beyond-internal-truth), and [make an extraordinary effort](https://www.lesswrong.com/posts/GuEsfTpSDSbXFiseH/make-an-extraordinary-effort) when you've got [Something to Protect](https://www.lesswrong.com/posts/SGR4GxFK7KmW7ckCB/something-to-protect) (Subject: "blessing to speak freely, and privacy norms?").
  
@@ -930,7 +930,7 @@ Yudkowsky replied:
  
  > only half the battle even if you could do it. you're also not reporting any facts/arguments on the other side, which is a much larger and visible gap to me, and has a lot to do with why I'm not presently considering this criticism from a peer despite your spoken adherence to virtues I value. **QUESTION FOR ZACK ONLY, NOBODY ELSE ANSWER OR SAY ANYTHING ABOUT IT IN THIS MAIN CHANNEL:** What are some of the ways that Planecrash valorizes truth, as you, yourself, see that virtue?
  
-I didn't ask why it was relevant whether or not I was a "peer." If we're measuring IQ (143 _vs._ [131](/images/wisc-iii_result.jpg)), or fiction-writing ability (several [highly-acclaimed](https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8) [stories](https://www.yudkowsky.net/other/fiction/the-sword-of-good) [including the world's most popular _Harry Potter_ fanfiction](https://www.hpmor.com/) _vs._ a [_My Life as a Teenage Robot_ fanfiction](https://archive.ph/WdydM) with double-digit favorites and a [few](/2018/Jan/blame-me-for-trying/) [blog](http://zackmdavis.net/blog/2016/05/living-well-is-the-best-revenge/) [vignettes](https://www.lesswrong.com/posts/dYspinGtiba5oDCcv/feature-selection) here and there), or contributions to AI alignment (founder of the field _vs._ author of some dubiously relevant blog comments), I'm obviously _not_ his peer. It didn't seem like that was necessary when one could just [evaluate my arguments about dath ilan on their own merits](https://www.lesswrong.com/posts/5yFRd3cjLpm3Nd6Di/argument-screens-off-authority). But I wasn't going to be so impertinent to point that out when the master was testing me (!) and I was eager to pass the test.
+I didn't ask why it was relevant whether or not I was a "peer." If we're measuring IQ (143 _vs._ [131](/images/wisc-iii_result.jpg)), or fiction-writing ability (several [highly-acclaimed](https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8) [stories](https://www.yudkowsky.net/other/fiction/the-sword-of-good) [including the world's most popular _Harry Potter_ fanfiction](https://www.hpmor.com/) _vs._ a [few](/2018/Jan/blame-me-for-trying/) [blog](http://zackmdavis.net/blog/2016/05/living-well-is-the-best-revenge/) [vignettes](https://www.lesswrong.com/posts/dYspinGtiba5oDCcv/feature-selection) and a [_My Life as a Teenage Robot_ fanfiction](https://archive.ph/WdydM) with double-digit Favorites on _fanfiction.net_), or contributions to AI alignment (founder of the field _vs._ author of some dubiously relevant blog comments), I'm obviously _not_ his peer. It didn't seem like that was necessary when one could just [evaluate my arguments about dath ilan on their own merits](https://www.lesswrong.com/posts/5yFRd3cjLpm3Nd6Di/argument-screens-off-authority). But I wasn't going to be so impertinent to point that out when the master was testing me (!) and I was eager to pass the test.
  
  I said that I'd like to take an hour to compose a _good_ answer. (It was 10:26 _p.m._) If I tried to type something off-the-cuff on the timescale of five minutes, it wasn't going to be of similar quality as my criticisms, because, as I had just admitted, I had _totally_ been running a biased search for criticisms—or did the fact that I had to ask that mean I had already failed the test?
  
@@ -983,9 +983,9 @@ Yudkowsky replied:
  
  > I think if you asked Oli about the state of reality with respect to this whole affair, he'd have a very different take from your take, _if you're still able to hear differences instead of only those similarities you demand._
  
-That sounded like an easy enough experimental test! I wrote Habryka an email explaining the context, and asking him what "very different take" he might have, if any. (I resisted the temptation to start a [Manifold market](https://manifold.markets/) first.) As I mentioned in the email, I didn't expect to have a very different take from Habryka _about the state of reality_. ("Zack is (still?!) very upset about this, but Oli mostly doesn't care" isn't a disagreement about the state of reality.) I didn't think I disagreed with _Yudkowsky_ much about the state of reality.
+That sounded like an easy enough experimental test! I wrote Habryka an email explaining the context, and asking him what "very different take" he might have, if any. (I resisted the temptation to start a [Manifold market](https://manifold.markets/) first.) As I mentioned in the email, I didn't expect to have a very different take from him _about the state of reality_. ("Zack is (still?!) very upset about this, but Oli mostly doesn't care" is a values-difference, not a disagreement about the state of reality.) I didn't think I disagreed with _Yudkowsky_ much about the state of reality! (In his own account, he thought it was "sometimes personally prudent [...] to post your agreement with Stalin about things you actually agree with Stalin about", and I believed him; I was just unhappy about some of the side-effects of his _prudence_.)
  
-He didn't reply. (I might have guessed the wrong email address, out of the two I had on file for him?) I don't blame him. It might have been timelessly ungrateful of me to ask. (The reason people are reluctant to make on-the-record statements in politically charged contexts is because they're afraid the statements will be _used_ to drag them into more political fights later. He had already done me a huge favor by being brave enough to state the obvious in March; I had no right to demand anything more of him.)
+Oliver didn't reply. (I might have guessed the wrong email address, out of the two I had on file for him?) I don't blame him; it might have been timelessly ungrateful of me to ask. (The reason people are reluctant to make on-the-record statements in politically charged contexts is because they're afraid the statements will be _used_ to drag them into more political fights later. He had already done me a huge favor by being brave enough to state the obvious in March; I had no right to demand anything more of him.)
  
  Regarding my quick reply about Cheliax's structural disadvantage, Yudkowsky said it was "okay as one element", but complained that the characters had already observed it out loud, and that I "didn't name any concrete sequence of events that bore it out or falsified it." He continued:
  
@@ -993,17 +993,19 @@ Regarding my quick reply about Cheliax's structural disadvantage, Yudkowsky said
  
  At this point, I was a bit suspicious that _any_ answer that wasn't exactly whatever he was thinking of would be dismissed as too social or too inferentially close to something one of the characters had said. What did it mean for the _universe_ to say something about valorizing truth?
  
-The original prompt ("What are some of the ways _Planecrash_ valorizes truth") had put me into 11th-grade English class mode; the revision "if I ask you, not what any _character_ says [...]" made me think the 11th-grade English teacher expected a different answer. The revised–revised prompt "what does the underlying reality of _Planecrash_ think about your Most Important Issues?", with the previous rebukes in my context window, made me think I should be reaching for an act of philosophical [Original Seeing](https://www.lesswrong.com/posts/SA79JMXKWke32A3hG/original-seeing), rather than trying to be a diligent schoolstudent playing the 11th-grade English class game. I pondered the matter ... and I _thought of something_.
+The original prompt ("What are some of the ways _Planecrash_ valorizes truth") had put me into 11th-grade English class mode; the revision "if I ask you, not what any _character_ says [...]" made me think the 11th-grade English teacher expected a different answer. Now the revised–revised prompt "what does the underlying reality of _Planecrash_ think about your Most Important Issues?", with the previous rebukes in my context window, was making me think I should be reaching for an act of philosophical [Original Seeing](https://www.lesswrong.com/posts/SA79JMXKWke32A3hG/original-seeing), rather than trying to be a diligent schoolstudent playing the 11th-grade English class game. I thought about it ... and I _saw something_.
  
  _Thesis_: the universe of _Planecrash_ is saying that virtue ethics—including, as a special case, my virtue ethics about it being good to tell the truth and reveal information—are somewhat unnatural.
  
-In the story, the god Adabar values trading fairly, even with those who can't verify their partners are keeping up their end of the deal, and also wants to promote fair trading _elsewhere_ in Reality (as contrasted to just being fair Himself).
+In the story, the god Adabar values trading fairly, even with those who can't verify that their partners are keeping up their end of the deal,[^trade-verification] and also wants to promote fair trading _elsewhere_ in Reality (as contrasted to just being fair Himself).
  
-Adabar is kind of a weirdo. He's not a vanishly rare freak (whose specification would require lots of uncompressible information); there _is_ a basin of attraction in the space of pre-gods, where creatures who develop a computationally efficient "fairness" heuristic in their ancestral environment, reify that into their utilityfunction when they ascend, but it's not a _huge_ basin of attraction; most gods aren't like Adabar.
+[^trade-verification]: Significantly, this is somewhat "unnatural" behavior according to Yudkowsky's view of decision theory. Ideal agents are expected to cooperate with agents whose cooperation is _conditional_ on their own cooperation, not simply those that cooperate with them: you "should" defect against a rock with the word "COOPERATE" painted on it, and you "shouldn't" trade for what you could just as easily steal. See §6 of ["Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic"](https://arxiv.org/abs/1401.5577).
+
+Adabar is kind of a weirdo. He's not a vanishly rare freak (whose specification would require lots of uncompressible information); there _is_ a basin of attraction in the space of pre-gods, where creatures who develop a computationally efficient "fairness" heuristic in their ancestral environment and reify that into their utilityfunction when they ascend to divinity, but it's not a _huge_ basin of attraction; most gods aren't like Adabar.
  
  It's the same thing with honesty. Generic consequentialists have no reason to "tell the truth" to agents with different utility functions when they're not under compact and being compensated for the service. Why _would_ you emit signals that other agents can interpret as a map that reflects the territory? [You can't get more paperclips that way!](https://arbital.com/p/not_more_paperclips/)
  
-I had previously written about this in ["Commucation Requires Common Interests or Differential Signal Costs"](https://www.lesswrong.com/posts/ybG3WWLdxeTTL3Gpd/communication-requires-common-interests-or-differential); you needed some common interests in order for flexible, "digital" language to exist at all. ("Digital" language where the relationship between signals and meaning can be arbitrary, in contrast to costly signaling, where me expending resources at least tell you that I could afford those resources.)
+I had previously written about this in ["Commucation Requires Common Interests or Differential Signal Costs"](https://www.lesswrong.com/posts/ybG3WWLdxeTTL3Gpd/communication-requires-common-interests-or-differential); you needed some common interests in order for flexible, "digital" language to exist at all. ("Digital" language being that for which the relationship between signals and meaning can be arbitrary, in contrast to costly signaling, where me expending resources at least tell you that I could afford those resources.)
  
  It's _possible_ for imperfectly deceptive social organisms to develop a taste for "honesty" as a computationally efficient heuristic for navigating to Pareto improvements in the ancestral environment, which _might_ get reified into the utilityfunction as they ascend—but that's an Adabar-class weird outcome, not the default outcome.
  
@@ -1017,15 +1019,53 @@ I realized, of course, that this certainly wasn't the answer Yudkowsky was looki
  
  So, after sleeping on it first, I posted the explanation of what I saw to the channel (including the parts about how the original prompts steered me, and that I realized that this wasn't the answer he was looking for).
  
-[TODO:
- * third bluebook didn't get a response
- * I was feeling kind of glum about my poor performance; I had reasons that made sense to me for why I said the things I said, but that _wasn't what I was being tested on_; I definitely _didn't_ say "Nothing, because Big Yud and Linta are lying liars who hate truth"; I moped in Alicorner #drama
- * Big Yud saw it (I know this because he left an emoji react in the relevant section of messages) and took pity on me
-]
+The outcome was—silence. No response from Yudkowsky in several days. Maybe I shouldn't have ran with my Original Seeing answer? I showed the transcripts to a friend, who compared my answer about consequentialist gods to including a list of your country's war crimes in a high school essay assignment about patriotism; I had done a terrible job of emitting symbols that made me a good monkey, and a mediocre-at-best job of flipping the table (rejecting Yudkowsky's "pass my test before I recognize your criticism as legitimate" game) and picking a fight instead.
+
+("Don't look at me," he added, "I would've flipped the table at the beginning.")
+
+I tried to explain that my third answer wasn't _just_ doubling down on my previous thesis: "my virtue ethics run against the grain of the hidden Bayesian structure of reality" wasn't an argument _in favor of_ my virtue ethics. My friend wasn't buying it; I still hadn't been fulfilling the original prompt.
+
+He had me there. I had no more excuses after that: I had apparently failed the test. I was feeling pretty glum about this, and lamented my poor performance in the `#drama` channel of another Discord server (that Yudkowsky was also a member of). I had thought I was doing okay—I definitely _didn't_ say, "That's impossible because Big Yud and Linta are lying liars who hate Truth", and there were reasons why my Original Seeing answer made sense _to me_ as a thing to say, but _that wasn't what I was being tested on_. It _genuinely_ looked bad in context. I had failed in [my ambition to know how it looks](/2022/context-is-for-queens/#knowing-how-that-looks).
+
+I think Yudkowsky saw the #drama messages (he left an emoji-reaction in the relevant timespan of messages) and took pity on me.
+
+[TODO: summarize teacher feedback]
+
+[TODO: summarize my admitting that it did have something to do with my state of mind; I would have done better by giving the 11th grade English class algorithm more compute; Peranza's username was 'not-looking-there'!; proposed revision]
+
+[TODO: C.S. Lewis speech]
+
+I said that I thought it was significant that the particular problem to which my Art had been shaped (in some ways) and misshaped (in others) wasn't just a matter of people being imperfect. Someone at the 2021 Event Horizon Independence Day party had told me that people couldn't respond to my arguments because of the obvious political incentives. And I guessed—the angry question I wanted to ask, since I didn't immediately know how to rephrase it to not be doing the angry monkey thing was, was I supposed to _take that lying down?_
+
+**Eliezer** — 12/17/2022 5:50 PM
+you sure are supposed to not get angry at the people who didn't create those political punishments  
+that's insane  
+they're living in Cheliax and you want them to behave like they're not in Cheliax and get arrested by the Church  
+your issue is with Asmodeus. take it to Him, and if you can't take Him down then don't blame others who can't do that either.  
  
-[TODO: derail with Lintamande]
+Admirably explicit.
+
+[TODO: Yudkowsky's story: the story is about Keltham trusting Cheliax wrongly; leaving that part out is politicized; other commenters pick up on "But you're still saying to trust awesome institutions"]
+
+[TODO: I think there's a bit of question-substitution going on; the reason the virtue of evenness is important is because if you only count arguments for and not against the hypothesis, you mess up your beliefs about the hypothesis; if you substitute a different question "Is Yudkowsky bad?"/"Am I a good coder?", that's a bucket error—or was he "correctly" sensing that the real question was "Is Yudkowsky bad?"]
+
+[TODO: I express my fully-updated grievance (this doesn't seem to be in the transcript I saved??); I hadn't consciously steered the conversation this way, but the conversation _bounced_ in a way that made it on-topic; that's technically not my fault, even if the elephant in my brain was optimizing for this outcome.
+
+The fact that Yudkowsky had been replying to me at length—explaining why my literary criticism was nuts, but in a way that respected my humanity and expected me to be able to hear it—implied that I was apparently in his "I can cheaply save him (from crazy people like Michael)" bucket, rather than the "AI timelines and therefore life is too short" bucket.]
+
+[TODO: I think it's weird that Yudkowsky's reaction is "that's insane"; he should be able to understand why someone might consider this a betrayal, even if he didn't think he was bound to that level of service; the story of a grant-making scientist]
+
+[TODO: I bait Lintamande into engagement]
+
+[TODO: Linta says I'm impossible to talk to and the anticipation of my pouncing stiffles discussion. (I almost wonder if this is a good thing, from a _realpolitik_ perspective? I'd prefer to argue people out of bad ideas, but if the threat of an argument disincentivizes them from spreading ...? Game theory goes both ways—I've been self-censoring to.)]
+
+[TODO: I agreed that this was good feedback about my social behavior; I don't intellectually disagree that different cultures are different; I'm super-fighty because I'm super-traumatized; the thing I'm trying to keep on Society's shared map is, Biological Sex Actually Exists and Is Sometimes Decision-Relevant; Biological Sex Actually Exists and is Sometimes Decision-Relevant Even When It Makes People Sad; Biological Sex Actually Exists Even When a Prediction Market Says It Will Make People Sad; Linta agrees; Eliezer responds with a +1 emoji]
+
+[TODO: "like, if you just went and found Eliezer!2004 and were like 'hey, weird sci fi hypothetical'
+_speaking of the year 2004_; the thing I'm at war with is that I don't think he would _dare_ publish the same essay today
+]
  
-[TODO: knives, and showing myself out]
+[TODO: someone said "the word in their language doesn't match the word in yours"; and got a +1 emoji;  I resisted the temptation to say "So ... I can define a word any way I want"; I made a few more comments about kitchen knife deception (and let my friends talk me down from making more). I'm not worried about what he thinks about me anymore.]
  
  ------
author	M. Taylor Saotome-Westlake <[email protected]>
	Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)
committer	M. Taylor Saotome-Westlake <[email protected]>
	Sun, 9 Apr 2023 02:07:16 +0000 (19:07 -0700)