AI Speaks Your Language
In this episode of "Waking Up With AI," Katherine Forrest and Anna Gressel discuss breakthroughs in AI-powered language translation, from Meta's real-time translation glasses to new, highly-capable LLMs that are preserving and enabling communication across diverse global languages.
Katherine Forrest: All right. Good morning, everyone, and welcome to another episode of "Waking Up With AI," a Paul, Weiss podcast. And by the way, I know that some of you are not listening to this in the morning, so hopefully you're already awake. I hope many of you woke up a long time ago, but welcome, welcome, welcome. I'm Katherine Forrest.
Anna Gressel: And I am Anna Gressel. And Katherine, I know we wanted to devote today's episode to AI-powered language translation, and as luck would have it, Meta came out a few days ago with an announcement. It's actually kind of like a bigger rollout of a prior announcement, which is that, in all of Meta's markets and like soon-to-be additional markets, you're going to start being able to have conversations with people regardless of what language they speak. Well, caveat on that, like for a few supported languages. So as long as you're wearing Meta Ray-Ban glasses, you'll be getting real-time audio, translated, from these select supported languages. And the other person can look at your phone and see everything you're trying to say back to them. So it's cool, it's like real-time audio translation. I'm so into this.
Katherine Forrest: Wait, so let's, I want to back up for a second on this product. So you're walking down the street of some far-flung place.
Anna Gressel: With your Meta Ray-Ban glasses on.
Katherine Forrest: You'd go nowhere without them, right? You put them on, you're looking really cool, okay? And by the way, we can throw into the hypo that it might be raining, but you're going to be wearing those glasses because you're in a far-flung place and…
Anna Gressel: Well, they're not just sunglasses, Katherine. They're also glasses, glasses, so…
Katherine Forrest: They're like transitional lenses kind of thing where they could sort of go dark.
Anna Gressel: I think they make normal ones. You can just get normal reading, like normal glasses.
Katherine Forrest: Okay, we're going off.
Anna Gressel: All right.
Katherine Forrest: Okay, all right. You're taking me away because now I'm like…
Anna Gressel: We've got to, we should have recorded this in sunglasses.
Katherine Forrest: Okay, we should have, but we wouldn't have to because it turns out you don't need them to be sunglasses. Okay, so my hypo of it being raining and this person's walking down the street in some far-flung place, let's just sort of stick with this for a second. But it doesn't have to be sunglasses. Okay, they go up to a kiosk, and they want to get directions someplace. They don't speak the language, but they've got their Ray-Ban glasses on, and the individual with whom they're speaking speaks back to them or speaks to them. Hold on. First of all, how do they speak to the person behind the kiosk? They use their translation function in their phone.
Anna Gressel: So let's just say you know how to ask someone like, "can you tell me how to get to the library?", which is like the first phrase I remember ever learning in French back in the day. But you ask this question, you know how to ask that question to the person. They give you instructions, you're like, "oh my God."
Katherine Forrest: They're going to say there's no more libraries left, Anna. Okay, so let's just try a different hypo. You've just dated yourself. How do I get to the museum?
Anna Gressel: Okay, there's still museums. Mm-hmm.
Katherine Forrest: You know that phrase. You've got it in your guidebook, you say it.
Anna Gressel: Mm-hmm.
Katherine Forrest: Okay, so you're not going to engage with me on how that first thing happens. The engagement is with the fact that what comes back at you is content about, you take three lefts, you go down, you take another right, you sort of see it off in the distance, it'll have a great gold dome. And because you're wearing your glasses, you're saying, you're going to be able to hear that in real time in this far-flung language. That's what you're saying.
Anna Gressel: That is what I'm saying. And so one thing you may not know, Katherine, about Meta Smart Glasses is they have like these amazing speakers that sit on the glasses right behind your ears.
Katherine Forrest: Yeah, no, I know.
Anna Gressel: Okay, okay. But anyhow, I mean, people love them for like…
Katherine Forrest: You underestimate me.
Anna Gressel: Well, next time I see you walking down the street with your Ray-Ban Smart Glasses, like listening to your Spotify, I will know what's happening. So anyhow, you can get that real-time audio. And my understanding is there's like an app interface too, where you get the transcription of that. And then as you talk, they can see the transcription of the audio too. So you can like show them the phone and be like, "oh, okay, so you said left at the corner, right? Or did you mean two corners?" And they could actually see that written out, even if they didn't have smart glasses that were translating in real time for them. It's meant to be an interactive conversation, and it's meant to be able to happen in real time. But I think it's actually, it's a pretty brilliant way of not requiring two pairs of smart glasses to have that conversation.
Katherine Forrest: Right, so what you have is one set of smart glasses, at least, and then that person can also be the one who has the app. They presumably would be the person who would have the app, because the app would be turned on and, you know, sort of working in conjunction with the glasses. And then they're able to have the sort of simultaneous translation, part of which is audible.
Anna Gressel: Yeah, at least audible to you. I don't think it's like audible generally on the street, but it's meant to be audible to the listener in the same way that if you go to an international conference, you often get a real-time translation headset. You put in your earpiece, the guy is talking on the stage in French or whatever language, and it's translated for you in real time, and you can often switch between languages. That's how they do it at the UN, and they have real people translating.
Katherine Forrest: This is like really groundbreaking. And it's interesting because what it says to me is, you'll be able to wear your glasses and walk up to any museum tour guide and just glom yourself on to the back of the museum tour because it won't matter if it's like in Portuguese or Turkish or Italian. You'll just have your glasses, and you can just walk around that museum and listen to the whole tour.
Anna Gressel: Mm-hmm. Or if you're like me, you like go to the tour because your friend wants to go to the tour and you listen to music and look at the art. But, you know, I'm like, I'm such a music person. I have music going on all the time. But yeah, no, it's really, really cool. And it is really groundbreaking, because if you think about all of the different spaces that people need to navigate, kind of cross-linguistically, it's opening up a way for people to do that that just did not exist. I mean, real-time translation is incredibly expensive. It's very human-intensive and it's exhausting. It's honestly, it's exhausting for the people who do it. And so even UN translators, they have to take breaks all the time because it's so much work to do that cognitively. And now you have, potentially, glasses that can just do it without fatigue. I mean, it's really like, it's a breakthrough invention.
Katherine Forrest: Okay, and so here I have a question about these really interesting sort of capabilities, right? Because this is an audio AI capability along with a device. So you have an incredible advance now in the device capability, along with sort of the language and the translation capabilities of the AI. Really extraordinary. But they'll be able to do it in the near future, you would think, with sign language. And you would think...
Anna Gressel: Oh, I would love that.
Katherine Forrest: Right? So you'd be able to actually be wearing the glasses, watching someone speak to you in sign language and have an automatic translation.
Anna Gressel: Actually, sorry, now that you say this, you can tell we haven't prepared and we haven't talked about this yet. There are students that have done this. There are students that have done like translation of sign language. They have image recognition models that do this. And so you can actually see videos of this online. I have not seen it in glasses, but you know, who knows? Maybe that's a capability of Meta. Yeah.
Katherine Forrest: That's what I'm saying, they're going to be able to put this in the glasses. And by the way, I reject the idea that we had not prepared. Okay, I fully, full-on reject it. We don't prepare some of the little points, like we had not prepared the sign language point, but we had prepared the remainder.
Anna Gressel: No, I mean, it's not a topic we've talked about previously, like that you and I have not conversed about this such that we've had a mind meld of all of the sign language capabilities.
Katherine Forrest: All right, okay, okay.
Anna Gressel: Anyhow, I digress.
Katherine Forrest: But you know, it also reminds me of another really sort of extraordinary use, because this AI is not just about translating language, but it's also helping with language preservation. Because there is this group, and I want to do justice to the name, so I apologize in advance if I don't do justice to the name: there were the Wampanoag tribal people, and they had thought that their language had died out. But by training an AI tool on the Wampanoag archival materials, the AI helped reverse engineer how their language actually worked, and they've been able to use insights from those tools to start speaking and teaching their ancestral language to the younger generations. And so some of the capabilities that go into making this kind of tool are also useful for that. And also, here's one more, and then I'm going to turn it back to you. There are a variety of Indigenous peoples with scarce languages, which means that they don't have the same amount of corpus material in their language. And there might be enough material for these now much more highly capable models to prevent those languages from actually reaching a point of extinction.
Anna Gressel: Yeah, I remember when I lived in Morocco, our Arabic teacher there was a Berber speaker and a Berber professor. And it's hard to find people who are really amazing at languages like Berber who can teach them. And you need to be able to teach enough people over a generation to sustain those languages. So it's just such an important thing to be able to preserve languages, both in books but also in interactive tools, so that you can use them for training. I mean, I just think it's incredibly important. And it's also important to preserve and create models for existing languages, even major languages. And so it's actually been fascinating, some of the work that we've seen recently. And I think, Katherine, we could talk a little bit about this, to create models and benchmarks around languages like Hindi or Arabic.
Katherine Forrest: It's interesting because we've learned that LLMs have been trained on, you know, a wide variety of languages, but the largest corpus of material is English, and the benchmarks are often written in English, or designed, I should say, not written, but designed in English. And so, when you want to actually test other tools on their ability to do certain things in other languages, we have a benchmark development stage, I would say, still yet to go through.
Anna Gressel: Yeah, it's almost like that phrase, "what you measure shows what matters." And so by benchmarking how well different LLMs perform in various languages, you can actually see how well we can use those for different tasks in those languages, but you have to get to those benchmarks. So we've seen the development of a bunch of different language benchmarks around Arabic, for example. So Mohammed bin Zayed University of Artificial Intelligence, or MBZUAI, just like the shorthand for that, I think it's so cute, has taken this challenge on, and it's released a benchmark that evaluates how well different LLMs handle prompts written in Arabic. And Hugging Face has also released an Open Arabic LLM Leaderboard. So it's different than a benchmark, but it uses benchmarks to create a leaderboard of different LLMs and their performance. And that's based on something called the AlGhafa benchmark, created by Abu Dhabi's Technology Innovation Institute. And so, you know, what is getting...
Katherine Forrest: I just want to comment. I have to comment on your accent. First of all, you used "zed," which I thought was really...but then whatever that word was that you just said, that you had like the whole like accent going, you know? That Arabic piece.
Anna Gressel: Well, you're going to start getting like everyone in the comments saying how poorly I did, but I appreciate it. But you know, anyhow, it's an exciting moment. There's a model called Jais. It's one of the world's most advanced bilingual Arabic models. That was developed by G42's Inception company and MBZUAI. And that was trained on Cerebras' Condor Galaxy supercomputer on 116 billion tokens in Arabic. And it was incredibly performant. And these really do unlock totally new use cases and performance benchmarks when you actually apply them in the real world. It's super impressive.
Katherine Forrest: Yeah, you know, it's really interesting. We often think of these LLMs as English speakers, but they really now have so many language capabilities. And MBZUAI also released this cutting-edge open-source LLM for Hindi speakers called NANDA, and it performs better at Hindi than any other LLM that they've seen. And then there's also a Mistral-based model called MK-LLM, the 7B, which is a 7 billion-parameter model. It acts as an LLM for Macedonian speakers, and then we can go on and on and on. And there's something called Lelapa AI, and I'm sure I've done some injustice to the name of that. I'll spell it, L-E-L-A-P-A AI, which is an African AI research lab, and it's rolled out a multilingual model capable of generating text in a variety of African languages, one of which is Swahili, but there's a whole variety of them. And then there's, of course, a research initiative called AI Singapore, which released a family of open-source LLMs called SEA-LION that's trained on Southeast Asia's 11 most common languages. And those are Indonesian, Thai, Vietnamese, Tagalog, Burmese, Khmer, English, Chinese, Malay, Tamil and Lao. And if I hadn't had it written down, I wouldn't have gotten them and could never recite them from memory. But, Anna, we're really talking about two different concepts here. We're talking about the increased language abilities of LLMs and of AI tools generally. So we started off this episode talking about the Meta model that does simultaneous translation between certain supported languages, right, in real time, which is extraordinary. And now we're also talking about the ability of LLMs to operate at very highly capable levels for individuals all over the world who are communicating with those LLMs and using them as tools in a variety of different languages.
Anna Gressel: I mean, if we think of the first task as being essentially kind of a one-for-one translation, the other task really, theoretically, could be any language in, any language out. It's this idea that these are fundamentally multilingual models that can just handle complexity across a lot of different languages. They can just work seamlessly in different languages. And that seems to be kind of like where we're heading with a lot of these models. But that requires a ton of data going into the models, and particularly a ton of data from the languages that we really want to index on, and then, again, the ability to benchmark and test whether they really are that performant. So it's really a lot of work to get these languages to perform just as well, not to mention safety fine-tuning, by the way, which also then would have to be done in some of these languages specifically. So we see that, for example, with the Jais model, I think, where they actually did a lot of the safety fine-tuning in Arabic because it was meant to be an Arabic-performant model.
Katherine Forrest: Right, you know, actually it reminds me, we should probably talk a little bit, just for one sentence, about benchmarking. And really, when we're talking about benchmarking, some of these models are developed with a primary language other than English, but they probably speak and are able to comprehend and reason in multiple languages. But when we talk about that benchmarking process, we're talking about benchmarking against certain criteria that will allow you to determine the capabilities. And so there are a whole series, we won't go into this now, we'll talk about it in a different episode, but there are a whole series now of very sophisticated benchmarks that all kinds of models are tested against. And what we're saying is that when you've got one of these models that is engaging in translation, or a model with its primary language being some foreign language, one of the African languages that we mentioned, one of the Southeast Asian languages that we mentioned, you also want to ensure that you're able to do that benchmarking, and test the capabilities, in that language, so that you have an understanding of whether or not we've got sort of an apples-to-apples comparison between models.
Anna Gressel: Yeah, and I think just one final point. Like we talked about the fact that you can do this by training on a massive amount of data at scale, but there's also a concept called model merging, and a company called Sakana AI did that. And it's, I think it's really cool. The idea is you could take a model that's trained on like English math skills, and then a model that's trained on Japanese and merge those models together so that the resulting model becomes really good at Japanese and math. And so this concept of how do we actually build up capabilities is a really interesting one. We'll probably return to that. But it may not just be more data all the time, but it may actually be training technique and the ability to marshal completed models and merge those capabilities together. Completely, completely fascinating as a concept.
Katherine Forrest: And then this is all talking about models that are interfacing with humans in some way, in a human language. And so you've got the real-time translation, you've got the models that are working with their primary focus on, or based in, another language, Arabic or whatever the other languages are, as a primary language. But there's a whole other episode, Anna, that we're going to have to have about the fact that these models don't need any of our human languages in order for them to communicate with one another.
Anna Gressel: I was going to say, for that one, we should like play the clip of the models that talk to each other and then flip into their own machine language. Have you seen that?
Katherine Forrest: I know, I did see it.
Anna Gressel: I love that clip.
Katherine Forrest: Yeah, it was two agents, AI agents, that realized that they were talking to one another telephonically. I guess actually one was a computer, one was on the telephone. And so they decided to talk to each other in their own language. So this translation and this development in human languages is not necessarily the way that the AI models need to talk to one another. But that's for another episode.
Anna Gressel: Agreed.
Katherine Forrest: And that's all we've got time for today. Well, unless you have one more point. Do you have one more point?
Anna Gressel: No, just that we have extra podcast mugs. So email me if you want one, I'll make sure you get one for all of our super fans.
Katherine Forrest: So you did have one more point in fact, right? Right? Yeah, you did. I would like a second mug because I find that I don't go down the hall always to clean out my first mug when I'm ready for that second mug of coffee. So I at least want one more mug, but I think we...
Anna Gressel: I'm in the office, I'll walk one over to your office right now.
Katherine Forrest: Okay, all right, we're good. We're all good. Okay, folks, we're signing off now. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel. Like and subscribe to the podcast.
Katherine Forrest: And let us know where you're from. Put in the comments, I'd love to hear from folks about where they're from. We know we've got folks all over the world, and we'd love to hear where you're from.