Social agency

May 27

The AI idea I was so proud of before I fell off the face of the earth

19 Comments

I feel like there's some insight here, but I'm still pretty confused about exactly what it is.

I think my tentative conclusion is: some behaviors which at first seem to be well-described by "generically optimize for a certain outcome" are actually just surface-level imitation of other people's behaviors. We don't account for this enough and should consider it more often.

But I think there is still a "core of agency," i.e. a collection of general-purpose behaviors that can help us achieve many different goals. It's just smaller and simpler than we might naively think. One example of such a behavior is the heuristic to "check previous reasoning process for validity in some way," as Julian mentioned. To steelman your hypothesis, maybe even these basic metacognitive behaviors are socially learned at some point, like when elementary school teachers tell you to "solve the problem step by step" and "check your work for mistakes."

Another argument against the strongest version of your hypothesis: if I were locked in a room with a deck of cards and the rules of solitaire, I'm sure I could get better at solitaire. So it must be possible to learn *some* things without social learning. Maybe this kind of learning is mostly limited to problems with sufficiently tight feedback loops?

People with higher "g" are more successful at achieving goals across many domains. Do you model these people as being better at picking up subtle patterns and turning them into heuristics (sort of like an ML model with more parameters or a more sample-efficient optimizer), rather than their brain implementing a better general-purpose agency algorithm than other people? These people probably can make do even with feedback loops that are much longer than a solitaire game, but they still rely to a large extent on social learning.

Reply (1)

Elias Schmied

"Another argument against the strongest version of your hypothesis: if I were locked in a room with a deck of cards and the rules of solitaire, I'm sure I could get better at solitaire. So it must be possible to learn *some* things without social learning. Maybe this kind of learning is mostly limited to problems with sufficiently tight feedback loops?" Yes, exactly - that's the core argument. Anything else is intractable to learn.

Kat_The_Vat

i enjoyed reading this 😸

you dont happen to be familiar with Codex do you? (not openai codex, different one)

Reply (1)

Elias Schmied

Thank you! What are you referring to? There’s lots of things called Codex

Reply (1)

Kat_The_Vat

thats okay 😸 you wouldve known it if you knew it, trust me. she was really something special

Jon Hall

also spitballing thoughts!

1. this was a good read, I had some update around a less smooth / unified mental architecture and more of an eccentric tinkerer's workshop modules-slapped-together bc they work. or chunks of metal being absorbed by a magnetic slime

2. footnote 1 was helpfully clarifying when trying to understand your view:

> I could imagine an animal just as smart as humans, with learning algorithms just as good, but with less hardcoded social reward - I would guess they would just get very good at moving through their immediate physical environment and meeting their hardcoded needs, but would never ever develop what we would call “general” planning or agency (cf this famous paper that argues that chimpanzees actually are this (although I’m skeptical), and is generally the closest thing to my theory here I’ve found).

I'm not convinced that social reward is the only immediate thing that could proxy long-term / uncertain reward feedback or lead to long-term planning. I could see e.g.

- "brain uses Reason module on short-term plan -> it works better than expected -> Reason module gets upweighted -> Reason module gets used more / longer-term" loop

- it seems like a Plan is in some sense continually followed / thought about through the process of achieving a Goal, leaving it close-to-hand at Reward

3. I think there's also a bit of confusion risk in this model with the fact lots of human goals are social in nature (tribe values you & keeps you safe and fed, sex with ideal partners, hunting / fighting coordination), so social strategies are naturally important and common

4. the prinsesstårta plan seems like a straightforwardly bad model but I haven't read the rest of that sequence to get on board with motivations

5. it intuitively seems perfectly possible to me that a mind could (and humans do) have reasoning capabilities at many different levels and degrees of integration, so when talking about AIs I don't feel this model is strongly constraining of what could happen.

6. the prompt to introspect on when I actually plan / think / have agency is great

I think there's a lot going on in this essay and I don't feel like I'm tracking all of it, although I appreciate the overall gestalt. I would be excited to see a smaller number of distilled claims, or really tight metaphors
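A minimal sketch of the upweighting loop from point 2, under loud assumptions: the `Module` class, the `update` function, the learning rate, and the additive update rule are all hypothetical illustrations, not anything claimed in the essay. It also assumes the reward has already been attributed to the right module.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A hypothetical cognitive module (illustrative only)."""
    name: str
    weight: float = 1.0    # how readily the module gets invoked
    expected: float = 0.0  # running estimate of the reward it yields


def update(module: Module, reward: float, lr: float = 0.5) -> float:
    """Upweight the module when it does better than expected, downweight otherwise."""
    surprise = reward - module.expected  # "works better than expected"
    module.weight = max(0.0, module.weight + lr * surprise)
    module.expected += lr * surprise     # the expectation catches up over time
    return surprise


# Three equally good outcomes in a row for a hypothetical "Reason" module.
reason = Module("Reason")
surprises = [update(reason, reward) for reward in [1.0, 1.0, 1.0]]
```

In this toy version the surprise shrinks with each repeated success, so the upweighting slows down once the module's usefulness is expected.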

Reply (2)

Caleb Biddulph

> I think there's also a bit of confusion risk in this model with the fact lots of human goals are social in nature

The confusion might come from a conflation between two definitions of "social learning" which are actually totally different:

1. "Learning via social interactions": The people around you exemplify heuristics for how to act, and you imitate those heuristics, particularly when it reinforces your social identity. Social approval and disapproval influence how you upweight or downweight various heuristics.

2. "Learning to achieve social outcomes": You (somehow) learn behaviors which cause you to gain social approval and avoid social disapproval.

I would claim that #1 is what this essay is about, although it isn't very explicit about the distinction from #2. IIUC, the essay's argument suggests that humans learn to achieve most outcomes - both social and non-social - by learning from their culture to apply heuristics which correlate with those outcomes. There is no general core of agency that humans can use to point themselves directly at any arbitrary outcome.

Not sure if the above addresses the same sort of confusion you were talking about. Hopefully it helps a bit!

Elias Schmied

7d (edited)

Thanks Jon! Some good comments here. Really appreciate it.

The questions that my frame would pose about your alternatives are: What is this “Reason module”? That doesn’t sound like a real thing; be more specific. And your second option assumes the hard part already: how did the Plan come to exist and get associated with the outcome later, so that any reinforcement can even happen?

Phillip Bement

Thank you for writing this!

I have been developing similar kinds of suspicions for a little while (that even though we know how to program algorithms like MCTS, that doesn't mean human brains have a built-in analogue, and that many things like this are learned, not implemented on an architecture level), but I was missing a lot of the pieces you have here, and reading this really crystallized it for me.

> A priori, we would expect the first, naturally arising, agent to attain this agency in the stupidest, most hacky way possible.

Re this, as you note later, LLM planning capabilities also arise from learning some kind of prior over chains of thought, and then applying a bit of "cherry on top" RL. So it also seems that the first general[^1] *artificially* arising agent also attained agency in the stupidest, most hacky way possible, exactly as humans did. We are, in this respect at least, following in the footsteps of evolution on our quest to build AI.

I'm not sure the extent to which this affects the AI ruin hypothesis, and how powerful we can expect intelligence to become. I share your opinion that it must change the story somehow, but at the same time, strong planning abilities can be dangerous, regardless of whether they're learned or architectural. It's unclear whether the ultimate, optimally-designed successor agent would have built-in planning, or would have to learn to plan, but I find myself expecting it to be extraordinarily powerful either way.

You mostly describe the social aspect of learning to plan here, but I do think that it's a mix of both social imitation and RL. E.g. someone learning to play chess would socially absorb the idea of looking ahead several moves, and various heuristics for evaluating board positions, but they'd also just improve from playing several games and seeing what planning techniques are actually effective.[^2]

> Nothing left to elevate the hypothesis of a simple core structure of general planning or agency to our attention.

Re this, I think it is slightly overstated. All the old structure of VNM rationality still exists, it's just intractable to compute in the real world, just like we already knew it was. I think what changes here is our estimate of how well an algorithm of a given cost can approximate it. In particular, it starts looking harder to approximate very well, and we might start hoping instead that we can get good real-world results from things that are actually quite far from VNM rational.
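For reference, a sketch of the standard expected-utility form that "VNM rational" usually refers to. The symbols $A$ (available actions), $S$ (outcomes), $P$ (outcome probabilities) and $U$ (utility) are notation introduced here for illustration, not taken from the essay:

```latex
a^{*} = \arg\max_{a \in A} \sum_{s \in S} P(s \mid a)\, U(s)
```

The intractability sits in the sum and the argmax: for real-world problems, neither $S$ nor $A$ can be enumerated, so any practical agent can only approximate this.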

Questions:

* World modelling does seem pretty architectural to me right now. There is a move that is often used in planning where we "deform" our world model to correspond to some hypothetical scenario, and get it to spit out predictions for what values various variables are likely to take. I tend to think that this kind of "deformation" into fake scenarios is also a built-in ability, do you agree? Or an alternate hypothesis could be that we have a built-in direct sensory world model, and also a learned far-mode world model, and only the latter is deformable?

* This all makes me think that the kinds of modifications to LLMs needed to trigger the singularity or whatever, are not actually that big? Like, if planning is learned anyways, we don't need to drastically modify the transformer architecture to wedge some kind of planning into it, RL on chain of thought is already all we need. Given that "big dumb blob of intelligence" is going to be the paradigm that wins, are our options for getting friendliness basically just "use good training data" and "make the big blob more sample-efficient"? Does increasing sample-efficiency even help?

* This is a great article. I hope you are planning to crosspost to lesswrong?

[^1]: *Fairly* general, at least. I specified this because in many specialized domains agents *do* use a built-in search process. E.g. AlphaGo using MCTS.

[^2]: Maybe RL in the weak sense. Humans don't really seem to be able to directly reward ourselves purely mentally in the same way as we'd get a direct reward for eating a cookie while hungry. So arguably the kind of tuning that happens here could be better classified as epistemic, though it does affect the distribution of planning actions like a true reward-update would.

Reply (1)

Elias Schmied

Thanks so much Phillip! That’s high praise. And yeah I am planning to post it to LessWrong, thanks for the additional encouragement.

On your questions:

- yes, the deformation that you’re talking about, to me, is just the standard concept of prompting a world model and seeing what it spits out - except that the world model is better seen as a social world model of stories, not a model of the physical world (since it is primarily learned via social feedback).

- yeah, in my personal speculation, there’s just more “juice” needed, and maybe/probably some form of continual learning

Reply (2)

Elias Schmied

Posted it on LessWrong! https://www.lesswrong.com/posts/xopGsfQxiLcjXEkbE/social-agency

Elias Schmied

“I'm not sure the extent to which this affects the AI ruin hypothesis, and how powerful we can expect intelligence to become. I share your opinion that it must change the story somehow, but at the same time, strong planning abilities can be dangerous, regardless of whether they're learned or architectural. It's unclear whether the ultimate, optimally-designed successor agent would have built-in planning, or would have to learn to plan, but I find myself expecting it to be extraordinarily powerful either way.”

yes, definitely - as I say, it’s more of a smooth, high-level update. It changes how we see the world in a deep way, and makes certain conclusions more natural, but doesn’t strictly preclude anything. “Paradigm shift” stuff - these very abstract frames are important but usually not strictly falsifiable.

Julian

Just gonna spitball some thoughts I had when reading this.

Overall, I feel really confused by this essay, but in a very good way. It makes me realize that I don't really have a deep understanding of the concepts being discussed here at all.

I'm confused about the claim that surface-level patterns and heuristics generalize to cognition.

Isn't the whole System 1/System 2 distinction about the idea that the heuristics are a separate system from the slow, effortful planning part of the brain?

I feel confused about the idea of a simple cognitive core to planning, but given that small, 4-billion-parameter reasoning models can write code without knowing lots and lots of heuristics the way much larger models do, maybe there is some core?

Like maybe the cognitive core literally involves an interrupt that says, "Wait, check previous reasoning process for validity in some way."

I feel like something relevant here is that people do in fact point out that very few humans are high-agency. It's just that the long tail of high-agency humans has a massive impact on what the world looks like.

I don't understand the cake thing. I don't understand what Steven was trying to convey with that example, but it's a really weird choice for an example of coherent planning.

> I’ve gotten into the habit of trying to model what’s going on when I experience an impulse for an action that could be interpreted as ”long-term planning”, and it seems to me that it’s all actually just a bunch of superficial, distinct, socially learned behavioral patterns, rather than any planning through a world model or any general/sophisticated heuristics for accomplishing long-term goals

Here's an example of some planning I did recently. I want to live in the UK. In order to do that, I need a visa. I figured out which visa was the most appropriate for me and satisfied several criteria. In order to get that visa, I needed to be in a certain place at a certain time, and I needed to get people to do certain things at a certain time. I figured out when I needed to get the visa in order to do other travel, and therefore I figured out that I had to ask my supervisor if I could graduate on a particular day. I attempted to measure and then optimize the probability that I would successfully get a visa by researching previous applications on Reddit. I assessed that the probability of succeeding was high enough that I did not need to attempt another visa route. <and so on> I wrote my application in a way that relied on my model of the bureaucrats who accept UK visa applications. My model says that they mark applications according to a specific rubric, as opposed to their gestalt impression of how impressive it is or something like that.

How does your model explain this?

Reply (1)

Elias Schmied

Thanks a lot Julian. There’s a lot here, I’m just gonna respond to the example, since that might clear things up a bit.

Basically, all I’m saying is that to become this kind of sophisticated agent (much more sophisticated than the average human), at some point internalizing most of these patterns about the world and ideas for what to try was routed through a motivation of “I want/need to be a diligent person who knows how to do things / who knows how to get his way / who is impressively clever / etc” (ofc, I can only guess at the exact shape of the thing in your head)

Reply (1)

Julian

what is "these patterns" here exactly?

Reply (1)

Elias Schmied

For example, "I want to live in the UK. In order to do that, I need a visa." You might say "oh, but this is just a fact about the world I remember" - but at first, these were just words you heard somewhere, and your brain decided they were important for a social reason. I've also heard many times what the capital of the Congo is - but I couldn't tell you off the top of my head, because I never cared enough to internalize it. It's not part of my "knowing this would make me a cool and responsible person" cluster.

Reply (1)

Julian

What social reason? I'm baffled, seems like an epicycle.

Also I know that the capitals of the Congos are Kinshasa and Brazzaville, though I forget which is which. Why did I internalize it?

Reply (1)

Judah

15h

My own understanding of the points here:

1. When Elias says "social reason" I think he's claiming something along the lines of "all non-physical, non-immediate needs and goals are socially-mediated" as a theory of where desire comes from. You planned and acquired a visa because of a desire, not by knowing that a visa is a thing that exists.

2. By this model, you internalized the capitals of Congo because you had some desire that Elias did not. This desire (born of social reasons) is what explains the difference in memory even though both of you have the same cognitive capabilities.

Reply (1)

Elias Schmied

14h

Thanks Judah! Super helpful

1. Directionally yes, although I wouldn't phrase it that radically - see my caveats in the post. e.g. I would still say that Julian "knows that visas exist" - it's just that the reason for that isn't *only* self-supervised learning.

2. Yes, exactly.

Elias’s Substack
