Rosa Sala - Nubart

Dr. Rosa Sala

CEO of Nubart

What to Expect (and What Not to Expect) from AI vs. Human Interpretation


The End of Benevolence: Why We Demand Perfection from AI That We Never Asked of Humans. From the perspective of a professional interpreter who has worked in simultaneous translation booths for decades and now leads the development of AI interpretation systems.




The Perspective of a Human Interpreter

For family reasons, I am bilingual in German and Spanish. Before founding Nubart, I dedicated my life to other activities, one of the most successful and—why deny it—most lucrative being that of an interpreter. I started at age 16 while working as a secretary for a real estate company: I had to accompany and linguistically support my Spanish boss as he showed coastal apartments to German clients. I soon realized that I could finish my day and go to lunch much earlier if, instead of translating every sentence consecutively, I did it simultaneously, whispering the translation into my boss's or the client's ear. It took me years to find out that this technique I thought I had invented is called "chuchotage."

Shortly after, I professionalized this side activity. I worked in corporate meetings, at international congresses, and offered my services in every imaginable modality of interpretation: simultaneous in a booth, consecutive, relay for other translators...

The author waiting for the presentation to start in a soundproof translation booth

I often received context documents from the client to prepare: glossaries or even—rarely—the full manuscript of the speaker's text. But even with conscientious preparation days before the event, an interpreter can never achieve a perfect interpretation.

But how can we measure the quality of an interpretation performed in real time, at high speed, and without the possibility of consulting a dictionary? Perhaps we can approximate it by comparing the result of a simultaneous interpretation with the quality that the same interpreter would have achieved translating that same speech in writing, at home, with time and resources, and without real-time pressure. If we take that written translation as a theoretical benchmark (100%), the accuracy of professional real-time interpretation is necessarily lower. An experienced interpreter can get remarkably close to that ideal reference (in the range of 90%), but under real conditions some margin of loss is inevitable.

That small gap that separated me and my best colleagues from perfection didn't seem to bother anyone. My clients admired the human ability to jump at breakneck speed between languages and were always satisfied with the result. They paid my fee of €500 a day without a murmur and added warm congratulations on top. It was exhausting work, but highly satisfying.

The Failures of Human vs. AI Interpretation

Today, I work full time as the CEO and co-founder of Nubart. In the summer of 2025, we launched Nubart TRANSLATE, an AI-powered simultaneous interpretation system (with voice and text). In essence, it is the technological equivalent of the work I did as an interpreter for decades. My experience has contributed positively to the development of this product. However, the situation has changed radically.

I don't feel that an AI interpretation system like Nubart TRANSLATE makes more or fewer errors than a good human interpreter. Simply put, AI errors are of a different nature.

AI language models, for example, have superhuman memory and are capable of translating technical terminology much better than any flesh-and-blood interpreter. They are tremendously effective when a speaker rattles off a list of figures (a classic nightmare for a human interpreter).

But sometimes they fail where a human would have no problem: in distinguishing the speaker's voice from another voice sounding in the background that should not be translated, for example. Or they struggle to place periods and commas correctly. Or they are "too precise," replicating potential speaker errors that a human interpreter would have identified and corrected immediately, almost without thinking. In fact, excessive precision is often the Achilles' heel of an AI translation. The human interpreter, by contrast, smooths the speech, makes it more idiomatic, and often summarizes it by removing unnecessary redundancies.

AI interpretation costs a small fraction of what it would cost to deploy a team of human interpreters, both in purely economic terms and in logistical and coordination effort. In any other product or service, price determines the level of expectations. Drivers expect higher performance from a Porsche Carrera than from a Volkswagen Polo, and clients expect more from a senior McKinsey consultant than from an MBA student.

The Incongruent Expectations Generated by AI Simultaneous Translation

And this is where we get to the heart of this article: that logic does not apply to simultaneous translation. The expectations of Nubart TRANSLATE clients far exceed those I experienced in my career as a human interpreter.

Both my clients and my professional colleagues would have considered it rude to confront me with a recording of my work and point out interpretation errors one by one or present me with a list of improvements to keep in mind for future services. Curiously, behaviors that were unthinkable with human interpreters have become commonplace when the interpretation is performed by an AI system.

Some Nubart TRANSLATE clients send us long, detailed lists of how certain terms should be translated, including "preferred expressions" for translations that are objectively correct. They record snippets of translation obtained through our free 30-minute test and compare them with a corrected version, "for next time."

Why are expectations so radically different when it comes to AI translation? It is a fascinating question that reveals much about our relationship with technology.

First Fallacy: Mathematical Precision

Humans unconsciously assume that if it is a machine, it must be exact. We can forgive a student who makes a mistake adding in their head, but we won't forgive a calculator. But working with human language is much more subjective and complex than working with numbers. Two and two will always be four, but how many different ways are there to translate a verse? What is the "correct" way to do it?

Second Fallacy: Software is Not Subject to Conditions

Human interpreters are valued like stage performers. Their performance is assessed based on various conditions: complexity, stress, exhaustion, cognitive load, and the need for concentration.

With interpretation software, we tend to think that external conditions do not influence it, but that is only partly true. AI also needs the speaker to speak at the right pace, for their sentences to be intelligible, for the microphone to be of good quality, and for the audio to be well isolated, without echoes or interference.

Third Fallacy: The Unsympathetic, Perfect Machine

We are willing to forgive a human. But no one feels empathy or compassion for a machine. Even less so if that machine is powered by AI, perceived by many—not without reason—as a threat to traditionally human skills. There is a certain malicious joy and "species pride" when we detect errors in a "competitor" like artificial intelligence. When the system makes a mistake, what is seen is a product that must be "debugged."

Why Overly Long Glossaries Make Translation Worse

But then, can't the AI's result be perfected through a glossary?

Integrating a glossary can steer the AI toward the client's preferred terminology in most cases, much as a glossary would guide a human interpreter. These lists of terms are born out of a genuine interest in improving quality, but sometimes also out of a desire for control by the organizer and the fantasy that events are perfectly predictable.

As a human interpreter, I have spent days memorizing entire vocabularies sent by the client that were ultimately barely used. When those glossaries were very long (200 specific terms about a robotic priming application, for example), it all felt like cognitive noise and significantly increased my stress level, making the work feel constrained and rigid rather than fluid. A short list of high-impact terms with an explanatory text that would have helped me understand the technological context would have led to much better interpretation. No professional interpreter uses more than a limited number of active terms at any given time. Beyond that, preparation becomes counterproductive.

Neural translation systems work by calculating probabilities between words: which word usually appears next to which other word. Every time a glossary forces the translation of a specific term, it not only changes that term—it changes the translation probabilities of all the other words in the sentence. This is confirmed by OneWord, a German company specializing in machine translation optimization, which found that “an overloaded glossary can even lead to more errors” due to this cascade effect on the neural system's probabilities.
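The cascade effect can be pictured with a deliberately tiny model. The sketch below is not how Nubart TRANSLATE or any real neural system works internally: it is a toy next-word table with invented probabilities, and the names `NEXT_WORD` and `continue_greedily` are hypothetical. It only shows that once one word is forced, the most probable following words shift with it (here, the Spanish participle changes gender). It does not show whether the shifted result is better or worse.

```python
# Toy next-word table P(next | previous). Every probability here is invented
# for illustration; real neural systems condition on far more context.
NEXT_WORD = {
    "estrategia": {"impulsada": 0.9, "impulsado": 0.1},  # feminine noun
    "plan": {"impulsado": 0.9, "impulsada": 0.1},        # masculine noun
    "impulsada": {"por": 1.0},
    "impulsado": {"por": 1.0},
    "por": {"el": 1.0},
    "el": {"mercado": 1.0},
}


def continue_greedily(first_word, max_words=5):
    """Extend a phrase by always picking the most probable next word."""
    words = [first_word]
    while len(words) < max_words and words[-1] in NEXT_WORD:
        options = NEXT_WORD[words[-1]]
        words.append(max(options, key=options.get))
    return words


# The model's own choice versus a term forced by a (hypothetical) glossary entry.
free_choice = continue_greedily("estrategia")
forced_term = continue_greedily("plan")

print(" ".join(free_choice))
print(" ".join(forced_term))
```

In this toy case the shift is harmless grammatical agreement. The point is only that a forced term is never an isolated substitution: the words after it are chosen again in its light.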

A language does not work like a mathematical equation that always obeys the same rules. A client may want us to translate “market-driven sustainability strategy” as “estrategia de sostenibilidad impulsada por el mercado.” But what the speaker ends up saying is “the tactic we use to be sustainable is always based on the impulses that the market has provided us.” The same idea, expressed differently.

Now the AI has to consult the glossary, identify that it contains a similar, but not identical, phrase, and decide whether to translate it according to the instructions or to follow the speaker's wording literally. But here's the technical problem: when the AI forces a glossary translation, it alters the probability model for the rest of the sentence. It's like tipping one domino in a row: the other pieces also move.
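Why the paraphrase is hard to catch can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any real glossary engine: the one-entry `GLOSSARY` and the helper names `normalize`, `exact_hits` and `word_overlap` are hypothetical, and the two sentences reuse the example above. A verbatim lookup finds the term in a scripted sentence but not in what the speaker actually says, and even a crude word-overlap score stays low.

```python
import re

# Hypothetical one-entry glossary; the term pair comes from the example above.
GLOSSARY = {
    "market-driven sustainability strategy":
        "estrategia de sostenibilidad impulsada por el mercado",
}


def normalize(text):
    """Lowercase the text and keep only runs of letters and digits, space-separated."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def exact_hits(sentence, glossary):
    """Return the glossary terms that occur verbatim (after normalization)."""
    padded = f" {normalize(sentence)} "
    return [term for term in glossary if f" {normalize(term)} " in padded]


def word_overlap(sentence, term):
    """Fraction of the term's distinct words that also appear in the sentence."""
    sentence_words = set(normalize(sentence).split())
    term_words = set(normalize(term).split())
    return len(sentence_words & term_words) / len(term_words)


scripted = "Our market-driven sustainability strategy is working."
spoken = ("the tactic we use to be sustainable is always based on "
          "the impulses that the market has provided us.")
term = "market-driven sustainability strategy"

print(exact_hits(scripted, GLOSSARY))  # the term is found verbatim
print(exact_hits(spoken, GLOSSARY))    # nothing is found
print(word_overlap(spoken, term))      # only "market" is shared
```

Of the four words in the glossary term, only "market" survives in the spoken sentence. Anything looser than exact matching has to guess whether the speaker meant the glossary concept at all.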

In addition, long glossaries inevitably contain ambiguous terms: words that should be translated differently depending on the context. But most glossary systems cannot handle this contextual ambiguity. They simply apply the first translation in the list, or the last, or ignore the term altogether. The result is unpredictable: forced translations that are wrong half the time.
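The "first in the list, or the last" behavior is easy to reproduce. The sketch below is an assumption-laden illustration, not the logic of any particular product: the rows in `ROWS` are invented, and `first_wins`, `last_wins` and `ambiguous_terms` are hypothetical helper names. It shows how the same glossary gives different answers depending on an arbitrary loading rule, and how ambiguous entries could at least be flagged before an event.

```python
from collections import defaultdict

# Hypothetical glossary rows (English -> Spanish). "priming" is ambiguous on
# purpose: a first coat of paint versus priming a pump.
ROWS = [
    ("priming", "imprimación"),
    ("priming", "cebado"),
    ("booth", "cabina"),
]


def first_wins(rows):
    """Keep the first translation listed for each source term."""
    table = {}
    for source, target in rows:
        table.setdefault(source, target)
    return table


def last_wins(rows):
    """Keep the last translation listed; later rows overwrite earlier ones."""
    return dict(rows)


def ambiguous_terms(rows):
    """Return the source terms that map to more than one distinct translation."""
    targets = defaultdict(set)
    for source, target in rows:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


print(first_wins(ROWS)["priming"])
print(last_wins(ROWS)["priming"])
print(ambiguous_terms(ROWS))
```

Neither rule looks at the sentence being translated, so whichever translation wins is right only when the context happens to match. Flagging such entries for the client to resolve is one plausible way to keep a glossary short and unambiguous.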

The paradox is clear: a concise glossary with 15-20 key terms improves translation. A comprehensive glossary with 200 terms degrades it.

The Real Challenge is Not Technological. It is Educational.

This expectation gap will not be solved by a more advanced AI. It will be solved by a better understanding of what interpretation—human or artificial—really is: an extraordinarily complex real-time translation exercise that works with highly fluid material like language and functions within margins of professional excellence, not perfection.

AI simultaneous interpretation systems like Nubart TRANSLATE tend to offer a quality level comparable to that of the best human interpreters, with the transformative advantages of AI: accessible cost, immediate availability, and unlimited scalability.

Is it perfect? No. Is it professional, effective, and revolutionary? Absolutely. Is it improvable? Undoubtedly, yes. And at Nubart, we work tirelessly on that.

Technology has democratized something that was previously a luxury reserved for organizations with large budgets. Let us celebrate that revolution, but with realistic expectations of what interpretation—human or artificial—can really achieve.

If you are considering offering AI interpretation at your next event, we would be happy to help you design it with realistic expectations and maximum impact for your audience.

