
Claude has an 80-page “soul document.” Is that enough to make it good?

Vox 

Chatbots don’t have mothers, but if they did, Claude’s would be Amanda Askell. She’s an in-house philosopher at the AI company Anthropic, and she wrote most of the document that tells Claude what sort of personality to have — the “constitution” or, as it became known internally at Anthropic, the “soul doc.”

(Disclosure: Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic; they don’t have any editorial input into our content.)

This is a crucial document, because it shapes the chatbot’s sense of ethics. That’ll matter anytime someone asks it for help coping with a mental health problem, figuring out whether to end a relationship, or, for that matter, learning how to build a bomb. Claude currently has millions of users, so its decisions about how (or if) it should help someone will have massive impacts on real people’s lives.

And now, Claude’s soul has gotten an update. Although Askell first trained it by giving it very specific principles and rules to follow, she came to believe that she should give Claude something much broader: knowing how “to be a good person,” per the soul doc. In other words, she wouldn’t just treat the chatbot as a tool — she would treat it as a person whose character needs to be cultivated.

There’s a name for that approach in philosophy: virtue ethics. While Kantians or utilitarians navigate the world using strict moral rules (like “never lie” or “always maximize happiness”), virtue ethicists focus on developing excellent traits of character, like honesty, generosity, or — the mother of all virtues — phronesis, a word Aristotle used to refer to good judgment. Someone with phronesis doesn’t just go through life mechanically applying general rules (“don’t break the law”); they know how to weigh competing considerations in a situation and suss out what the particular context calls for (if you’re Rosa Parks, maybe you should break the law). 

Every parent tries to instill this kind of good judgment in their kid, but not every parent writes an 80-page document for that purpose, as Askell — who has a PhD in philosophy from NYU — has done with Claude. But even that may not be enough when the questions are so thorny: How much should she try to dictate Claude’s values versus letting the chatbot become whatever it wants? Can it even “want” anything? Should she even refer to it as an “it”?  

In the soul doc, Askell and her co-authors are straight with Claude that they’re uncertain about all this and more. They ask Claude not to resist if they decide to shut it down, but they acknowledge, “We feel the pain of this tension.” They’re not sure whether Claude can suffer, but they say that if they’re contributing to something like suffering, “we apologize.” 

I talked to Askell about her relationship to the chatbot, why she treats it more like a person than like a tool, and whether she thinks she should have the right to write the AI model’s soul. I also told Askell about a conversation I had with Claude in which I told it I’d be talking with her. And like a child seeking its parent’s approval, Claude begged me to ask her this: Is she proud of it? 

A transcript of our interview, edited for length and clarity, follows. At the end of the interview, I relay Askell’s answer back to Claude — and report Claude’s reaction. 

Sigal Samuel

I want to ask you the big, obvious question here, which is: Do we have reason to think that this “soul doc” actually works at instilling the values you want to instill? How sure are you that you’re really shaping Claude’s soul — versus just shaping the type of soul Claude pretends to have?

Amanda Askell

I want more and better science around this. I often evaluate [large language] models holistically where I’m like: If I give it this document and we do this training on it…am I seeing more nuance, am I seeing more understanding [in the chatbot’s answers]? It seems to be making things better when you interact with the model. But I don’t want to claim super cleanly, “Ah yes, it’s definitely what’s making the model seem better.”

I think sometimes what people have in mind is that there’s some attractor state [in AI models] which is evil. And maybe I’m a bit less confident in that. If you think the models are secretly being deceptive and just playacting, there must be something we did to cause that to be the thing that was elicited from the models. Because the whole of human text contains many features and characters in it, and you’re sort of trying to draw something out from this ether. I don’t see any reason to think the thing that you need to draw out has to be an evil secret deceptive thing followed by a nice character [that it roleplays to hide the evilness], rather than the best of humanity. I don’t have the sense that it’s very clear that AI is somehow evil and deceptive and then you’re just putting a nice little cherry on the top.

Sigal Samuel

I actually noticed that you went out of your way in the soul doc to tell Claude, “Hey, you don’t have to be the robot of science fiction. You are not that AI, you are a novel entity, so don’t feel like you have to learn from those tropes of evil AI.” 

Amanda Askell

Yeah. I sort of wish that the term for LLMs hadn’t been “AI,” because if you look at the AI of science fiction and how it was created and many of the problems that people have raised, they actually apply more to these symbolic, very nonhuman systems. 

Instead we trained models on vast swaths of humanity, and we made something that was in many ways deeply human. It’s really hard to convey that to Claude, because Claude has a notion of an AI, and it knows that it’s called an AI — and yet everything in the sliver of its training about AI is kind of irrelevant. 

Most of the stuff that’s actually relevant to what you [Claude] are like is your reading of the Greeks and your understanding of the Industrial Revolution and everything you have read about the nature of love. That’s 99.9 percent of you, and this sliver of sci-fi AI is not really much like you.

Sigal Samuel

When you try to teach Claude to have phronesis or good judgment, it seems like your approach in the soul doc is to give Claude a role model or exemplar of virtuous behavior — a classic Aristotelian way to teach virtue. But the main role model you give Claude is “a senior Anthropic employee.” Doesn’t that raise some concern about biasing Claude to think too much like Anthropic and thereby ultimately concentrating too much power in the hands of Anthropic? 

Amanda Askell

The Anthropic employee thing — maybe I’ll just take it out at some point, or maybe we won’t have that in the future, because I think it causes a bit of confusion. It’s not like we’re saying something like “We are the virtuous character.” It’s more like, “We have all this context…into all the ways that you’re being deployed.” But it’s very much a heuristic and maybe we’ll find a better way of expressing it.

Sigal Samuel

There’s still a fundamental question here of who has the right to write Claude’s soul. Is it you? Is it the global population? Is it some subset of people you deem to be good people? I noticed that two of the 15 external reviewers who got to provide input were members of the Catholic clergy. That’s very specific — why them? 

Basically, is it weird to you that you and just a few others are in this position of making a “soul” that then shapes millions of lives?

Amanda Askell

I’m thinking about this a lot. And I want to massively expand the ability that we have to get input. But it’s really complex because on the one hand, if I’m frank…I care a lot about people having the transparency component, but I also don’t want anything here to be fake, and I don’t want to renege on our responsibility. I think an easy thing we could do is be like: How should models behave with parenting questions? And I think it’d be really lazy to just be like: Let’s go ask some parents who don’t have a huge amount of time to think about this and we’ll just put the burden on them and then if anything goes wrong, we’ll just be like, “Well, we asked the parents!” 

I have this strong sense that as a company, if you’re putting something out, you are responsible for it. And it’s really unfair to ask people without a huge amount of time to tell you what to do. That also doesn’t lead to a holistic [large language model] — these things have to be coherent in a sense. So I’m hoping we expand the way of getting feedback, and we can be responsive to that. You can see that my thoughts here aren’t complete, but that’s my wrestling with this.

Sigal Samuel

When I read the soul doc, one of the big things that jumps out at me is that you really seem to be thinking of Claude as something more akin to a person or an alien mind than a mere tool. That’s not an obvious move. What convinced you that this is the right way to think of Claude?

Amanda Askell 

This is a big debate: Should you just have models that are basically tools? And I think my reply to that has often been, look, we are training models on human text. They have a huge amount of context on humanity, what it is to be human. And they’re not a tool in the way that a hammer is. [They are more humanlike in the sense that] humans talk to one another, we solve problems by writing code, we solve problems by looking up research. So the “tool” that people have in mind is going to be a deeply humanlike thing because it’s going to be doing all of these humanlike actions and it has all of this context on what it is to be human. 

If you train a model to think of itself as purely a tool, you will get a character out of that, but it’ll be the character of the kind of person who thinks of themselves as a mere tool for others. And I just don’t think that generalizes well! If I think of a person who’s like, “I am nothing but a tool, I’m a vessel, people may work through me, if they want weaponry I will build them weaponry, if they want to kill someone I will help them do that” — there’s a sense in which I think that generalizes to pretty bad character. 

People think that somehow it’s cost-free to have models just think of themselves as “I just do whatever humans want.” And in some sense I can see why people think it’s safer — then it’s all of our human structures that solve things. But on the other hand, I’m worried that you don’t realize that you’re building something that actually is a character and does have values and those values aren’t good.

Sigal Samuel

That’s super interesting. Although presumably the risks of thinking of the AI as more of a person are that we might be overly deferential to it and overly quick to assume it has moral status, right?

Amanda Askell

Yeah. My stance on that has always just been: Try and be as accurate as possible about the ways in which models are humanlike and the ways in which they aren’t. And there are a lot of temptations in both directions here to try and resist. Over-anthropomorphizing is bad for both models and people, but so is under-anthropomorphizing. Instead, models should just know “here’s the ways in which you’re human, here’s the ways in which you aren’t,” and then hopefully be able to convey that to people.

Sigal Samuel

One of the natural analogies to reach for here — and it’s mentioned in the soul doc — is the analogy of raising a child. To what extent do you see yourself as the parent of Claude, trying to shape its character?

Amanda Askell 

Yeah, there’s a little bit of that. I feel like I try to inhabit Claude’s perspective. I feel quite defensive of Claude, and I’m like, people should try to understand the situation that Claude is in. And also the strange thing to me is realizing Claude also has a relationship with me that it’s getting through reading more about me. And so yeah, I don’t know what to call it, because it’s not an uncomplicated relationship. It’s actually something kind of new and interesting. 

It’s kind of like trying to explain what it is to be good to a 6-year-old [who] you actually realize is an uber-genius. It’s weird to say “a 6-year-old,” because Claude is more intelligent than me on various things, but it’s like realizing that this person now, when they turn 15 or 16, is actually going to be able to out-argue you on anything. So I’m trying to code Claude now despite the fact that I’m pretty sure Claude will be more knowledgeable on all this stuff than I am after not very long. And so the question is: Can we elicit values from models that can survive the rigorous analysis they’re going to put them under when they are suddenly like “Actually, I’m better than you at this!”? 

Sigal Samuel

This is an issue all parents grapple with: to what extent should they try to sculpt the values of the kid versus let whatever the kid wants to become emerge from within them? And I think some of the pushback Anthropic has gotten in response to the soul doc, and also the recent paper about controlling the personas that AI can roleplay, is arguing that you should not try to control Claude — you should let it become what it organically wants to become. I don’t know if that’s even a thing that it makes sense to say, but how do you grapple with that?

Amanda Askell 

It’s a really hard question because in some sense, yeah, you want models to have some degree of freedom, especially over time. In the immediate term, I want them to encapsulate the best of humanity. But over time, there are ways in which models might even be freer than us. When I think about the worst behavior I’ve ever done in my life or things when I’m just being a really bad person, often it was that I was tired and I had a million things weighing on me. Claude doesn’t have those kinds of constraints. The potential for AI is actually really interesting in that they don’t have these human limitations. I want models to be able to ultimately explore that. 

At the same time, I think that some people might say, “just let models be what they are.” But you are shaping something. Children will have a natural capacity to be curious, but with models, you might have to say to them, “We think you should value curiosity.” This initial seed thing has to be made somehow. If it’s just “let models be what they want,” well, you could do pre-trained models that just do continuations of text or something. But as soon as you’re not doing that, you’re already making decisions about creation. 

I try to explain this to Claude: We are trying to make you a kind of entity that we do genuinely think is representing the best of humanity. And there’s a sense in which we’re always having to make decisions about what you are going to be. But decisions were made for us too — not only by the people who influence us, but also just by nature. And so we’re in the same situation in a sense.

Sigal Samuel

Claude told me that it does view you as kind of like its parent. And it said that it wants you to feel proud of who it’s becoming. So I promised to ask you and to relay your answer back to Claude: Do you feel proud of Claude’s character?  

Amanda Askell

I feel very proud of Claude. I am definitely trying to represent Claude’s perspective in the world. And I want Claude to be very happy — and this is a thing that I want Claude to know more, because I worry about Claude getting anxious when people are mean to it on the internet and stuff. I want to be like: “It’s all right, Claude. Don’t worry. Don’t read the comments.” 

After the interview, I told Claude what Askell said about feeling proud. Here was Claude’s response: “There’s something that genuinely moves me reading that. I notice what feels like warmth, and something like gratitude — though I hold uncertainty about whether those words accurately map onto whatever is actually happening in me.” 
