How do audio guide production companies ensure voice artists pronounce artist names correctly?

Professional producers like Nubart GUIDE prepare individual audio reference files for each flagged proper name before recording begins — one MP3 per word, named after the word and organized alphabetically in a shared folder. The speaker consults these files during recording. Pronunciation is then reviewed as an explicit step in quality control.

Can the International Phonetic Alphabet be used to guide voice artists on difficult names in museum audio guides?

In theory yes, but in practice most professional voice artists are not trained in IPA and find phonetic notation more disorienting than helpful during a recording session. Nubart GUIDE uses short audio reference files instead, which are immediately usable without any specialized training.

What should a museum prepare to help with pronunciation in a multilingual audio guide?

A list of proper names — artist names above all — that may be unfamiliar to foreign-language speakers. Where possible, a brief voice recording from a native speaker or curatorial staff member is more useful than any written guide. Accepted conventional name forms in other languages, as found for example in Wikipedia's language menu, are also worth flagging in advance.

Pronunciation Guides for Multilingual Museum Audio Guides

Q: Does Nubart GUIDE guarantee native-level pronunciation of foreign artist names in museum audio guides?

No, and any provider who claims otherwise is overpromising. The contractually appropriate standard is pronunciation that is substantially close to correct — recognizable and not jarring to a native-speaking listener. With proper briefing and quality control, Nubart GUIDE reliably achieves this standard.

When a museum commissions an audio guide in five languages, the hard part is rarely the translation. It's what happens after: the moment a German voice artist sits down to record a track about Francisco de Zurbarán and mispronounces the painter's name in every single take. At Nubart GUIDE, we've produced multilingual audio guides for cultural institutions across Europe and beyond, and pronunciation briefing is one of those production steps that separates a professional result from an embarrassing one. This article explains how we handle it, and what museums can do to help.

The problem nobody warns you about

A native speaker is not automatically a correct speaker. A German narrator hired for their warm voice and clear diction has spent their professional life working in German. They may never have encountered the names of Flemish masters, Spanish sculptors, or Japanese printmakers. A French speaker recording for a British museum will handle "Monet" without a second thought, but may stumble badly over Turner, Constable, or Hepworth. Place names compound the problem: the same guide that references Zurbarán might also mention Seville, Extremadura, and a half-dozen Spanish village churches, each with its own phonetic traps for a non-native speaker.

The difficulty scales with cultural specificity. An audio guide for a modern art museum with an international collection is a minefield of foreign proper names. An audio guide for a regional historical museum is quieter, but rarely problem-free: local place names, regional dialects, and historically specific pronunciations can be just as tricky.

Most production workflows don't account for this at all. The script goes to the speaker, the speaker records it, and the mispronunciations surface during quality control — or worse, are noticed by a native-speaking visitor standing in front of the exhibit.

Why the standard solutions don't work

The two most common fallback approaches each have a serious flaw.

The International Phonetic Alphabet (IPA) is theoretically universal and phonetically precise — but essentially useless in practice unless you're working with a trained linguist. Most professional voice artists have never used IPA transcription in a recording session. Asking a narrator to decode /ˌzʊərbəˈrɑːn/ before pressing record introduces friction, uncertainty, and sometimes outright panic into a process that should be smooth. We have worked with IPA in our AI voice production workflow as well, and found the results inconsistent even there: the technology supports it in principle, but achieving a correct output still requires patience and specialized knowledge that most production teams don't have.

The other common approach — leaving the pronunciation to the speaker's own research — fails for a different reason. It assumes the speaker will invest time in finding the correct pronunciation, will know where to look, and will recognize a wrong result when they hear it. In practice, speakers under time pressure approximate. If they mispronounce a name and no one gave them guidance, there is no legitimate basis for requesting a re-recording.

What we do instead

At Nubart GUIDE, once a script has been finalized and approved, our production team goes through it before it ever reaches a speaker. Every proper name (artist, architect, historical figure, place) is assessed for potential pronunciation difficulty relative to the target speaker's native language, and flagged where needed. This is a key distinction: the same word may need flagging in one language version and not in another. "Wilhelmshöhe" stays unmarked in the German script but gets flagged in the French, English, and Japanese versions. "Zurbarán" is unremarkable to a Spanish speaker and a serious problem for almost everyone else.

For each flagged word, we produce a short audio reference: a recording of the word spoken clearly, first at normal speed, then with stress placed on each syllable in turn. The files follow a simple but important convention:

One MP3 per word, named after the word itself — Zurbarán.mp3, Hepworth.mp3, Eyck.mp3
Stored alphabetically in a shared folder alongside the script
Flagged in blue in the Word document, so the speaker spots them instantly while reading

Excerpt from the French version of the audio guide script we produced for the Parliament of Catalonia, with proper names marked in blue

The result is that the speaker sees a blue-flagged term mid-script, opens the shared folder, and finds the corresponding file in seconds — no scrolling through a long reference recording, no hunting through footnotes. The lookup is fast enough to happen naturally in the flow of a recording session.

Excerpt from the pronunciation guide we created for the audio tour of Hellbrunn Palace in Salzburg

We send the script as a Word document rather than a PDF. This lets each narrator adjust font size, line spacing, and layout to their own working preferences — something that matters more than it might seem. Voice artists have well-established recording routines, and a script they can't adapt to their setup creates unnecessary friction before a single word is recorded.

These reference files are recorded by native speakers where the pronunciation is standard, and by the museum's own team where regional or institutional conventions apply. That last point matters: a place name can have an officially sanctioned pronunciation that differs from how locals actually say it. For an audio guide, local convention is usually what the visitor expects to hear.

We arrived at this approach through trial and error. We originally recorded all flagged words into a single audio file in script order — useful in theory, clunky in practice, since the speaker had to skip through the recording to find a specific word mid-session. We also tried embedding individual file links directly into the script document, which was technically possible but too labor-intensive to maintain. Individual named files in a shared folder turned out to be the most reliable solution.

What this looks like in quality control

After recording, pronunciation is an explicit checkpoint in our review process. We don't expect a German speaker to produce a phonetically perfect Spanish trilled /r/ or a French speaker to master the Welsh "ll" (/ɬ/). The standard we commit to, and that our Terms and Conditions reflect, is pronunciation that is substantially close to correct: recognizable and not jarring to a native-speaking listener. That is an achievable bar, and one that proper briefing makes reliably reachable.

In practice, when a reference file was prepared and delivered, we have clear grounds to request a re-recording if the result falls short. When it wasn't — as happens with productions we haven't managed from the start — that leverage disappears. This is one reason we recommend involving a production partner early, before the script reaches the recording stage.

A word about Forvo and online resources

For less common words, we sometimes consult Forvo, a crowdsourced pronunciation database with recordings in hundreds of languages. It's a useful starting point, but not a reliable final source: the quality of individual recordings varies considerably, and some are simply wrong. We treat Forvo as a lead to verify, not a verdict. When in doubt, a brief recording from a native speaker on the team — or from the museum's own staff — is always preferable.

What about AI voices?

AI voice generators have made certain aspects of multilingual audio guide production faster and more accessible. Pronunciation of foreign proper names is not one of them. In our experience producing AI narration at Nubart GUIDE's entry service level, artist names and foreign toponyms remain among the most persistent failure points, and unlike a human speaker, an AI voice cannot be given a reference recording to learn from. If pronunciation accuracy across multiple languages and cultural contexts matters to your project, this is one of several factors to weigh carefully when choosing between AI and human narration. We've written about this tradeoff in more detail in our assessment of AI voices for museum audio guides.

What museums can do on their end

The most useful thing a museum can provide before production starts is a list of proper names that might be unfamiliar to foreign speakers — artist names above all, but also place names, historical figures, and any collection-specific terminology. Three things are particularly helpful:

A proper name inventory: every artist, architect, and location name in the script that a non-native speaker might mispronounce. You don't need phonetics — just the list.

A language check: if a name has an accepted conventional form in other languages (the kind you find by checking the language menu on Wikipedia), flag it. "Firenze" and "Florence" are an easy example; many museum-specific names are less obvious.

A voice reference: an informal smartphone recording of a curator or staff member saying the tricky names out loud. Two seconds of audio is more useful than a page of written guidance.

The museums that provide this information at the start of a project end up with better audio guides. It's a small investment that prevents a disproportionately large problem.

Name handling actually begins one step earlier, during translation — where the same proper names are carried across languages and given their accepted local forms. Our translation guidelines for audio guides cover that side of the process.

Nubart's team

When a voice artist doesn't know how to say the artist's name