Monday, June 2, 2025

My Thoughts on Amazon's Virtual Voice Technology After a Year of Using It as an Author and Publisher

As many of you may already know, I've been participating in the beta program for Amazon's Virtual Voice technology via its Kindle Direct Publishing (KDP) platform for well over a year now. In that time, I've published seven novel-length books and four "story singles" using the tech, and have gained a familiarity with the tool and its nuances that goes a fair bit beyond what a casual user might experience.

As with any new, in-development software, there have definitely been some growing pains with Virtual Voice from a user perspective. Lost data, poorly implemented or undesirable "features," and general struggles when attempting to report and resolve issues with the system have all reared their ugly heads and been par for the course throughout the beta. Thus far, though, I'd have to say that my overall experience with Virtual Voice has been fairly positive.

A bit of context: I have a BA in journalism with an emphasis on publication production. I have been actively working as an author and publisher of speculative fiction for a little over eight years. Before that, and indeed alongside some of it, I worked as a software engineer for well over twenty years, using a variety of tech stacks in support of a variety of industries. I also published one traditional, self-narrated audiobook via Amazon's ACX platform just prior to my acceptance into the beta, which I suspect may have factored into my early inclusion in the program. All that to say that this is not my first rodeo when it comes to writing, publishing, and a plethora of technologies.

That said, let's address the elephant in the room, as I did in my previous post on this subject. In my estimation, no, Amazon's Virtual Voice technology is not "AI," either in the proper, traditional sense of the concept or in the highly questionable way the term is thrown around in the tech industry at large these days. To put it simply, Virtual Voice is a natural evolution of the same synthetic voice technologies that have existed for decades. It is not a "push a button, get an audiobook" sort of thing, and in point of fact requires a significant amount of time and effort from anyone using it to produce valid and acceptable results.

"How much time and effort?" I hear you asking. It can vary somewhat, but in my experience, about three to six times as long as the finished audio runs. So, a five-hour audiobook, something in the neighborhood of 40,000-50,000 words, will take between 15 and 30 hours of fairly diligent, detail-oriented work to complete. That's certainly no trivial task. However, compare it to the time and effort associated with traditional audiobook recording, editing, and post-production, and the potential advantages of Virtual Voice will quickly become apparent to anyone with sufficient experience to make such judgements.
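For anyone who wants to run those numbers on their own projects, here is a minimal Python sketch of that arithmetic. The function name and parameters are purely illustrative (they are not part of KDP or Virtual Voice Studio), and the 3X and 6X defaults are just my own rule of thumb from above, so adjust them to match your experience.

```python
def estimate_production_hours(audio_hours, low_factor=3.0, high_factor=6.0):
    """Estimate the range of studio work hours for a Virtual Voice audiobook.

    audio_hours: expected length of the finished audio, in hours.
    low_factor / high_factor: assumed multipliers (my 3X-6X rule of thumb).
    Returns a (low, high) tuple of estimated work hours.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must not be negative")
    return audio_hours * low_factor, audio_hours * high_factor


# Example: the five-hour audiobook mentioned above.
low, high = estimate_production_hours(5)
print(f"A 5-hour audiobook: roughly {low:.0f} to {high:.0f} hours of work")
```

For the five-hour example, this gives the same 15 to 30 hours quoted above.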

But wait! Before you run off and fire up ye ol' KDP Dashboard, there are a few significant caveats to consider:

  • Virtual Voice is still very much an "in-development" technology. That means changes to it, which are at times quite significant, occur on a regular basis. Among the most potentially devastating are updates to the virtual voices themselves. While those generally improve the voices' potential to sound "better" in a variety of circumstances, they almost always require a full review of the published material and copious editing of it in the Virtual Voice Studio, not just to achieve those at times marginally better results, but to correct any issues that may have been caused by the update. This has happened to me several times throughout the beta with various books, requiring me to effectively redo all of the work on them to keep them sounding their best. That can be particularly brutal with longer books featuring 10 hours of audio or more, as the 3-6X production time estimate applies to them as well. For example, my longest book, which is a bit over 134,000 words, had to be completely reworked twice in the past year: once from scratch as a result of data loss triggered by a voice update, and again when a subsequent voice update required significant tweaks to that voice-training data.

A screenshot of Virtual Voice Studio
Virtual Voice Studio allows authors and publishers to "teach" virtual voices to properly read their works. This involves adding data points that represent pauses, pronunciation and punctuation changes, voice speed alterations, and more. The process of achieving the best possible result for a given publication is generally quite extensive and time-consuming.

  • No matter how much effort you put into it, Virtual Voice is never going to sound like a top-tier human narrator. It can sound very good if proper care is taken to get the most out of the system, but at best, it can deliver a solid, dry reading of the text, with occasional bits of emphasis to prevent things from sounding too monotonous. In fairness, certain types of listeners may enjoy, or even prefer, the consistent, predictable delivery Virtual Voice tends to provide once the rough edges are smoothed out via editing in the studio. Still, I think it's highly unlikely that it will ever approach the level of a dramatic reading from an author, an experienced and talented narrator, or a cast.
  • Any audiobook work you do with Virtual Voice is tied to that system, since it produces the audio for you. This means that if Amazon decides to pull the plug on the project at some point in the future, as it recently did with Kindle Vella for example, you may find yourself scrambling to salvage your investment, or simply out whatever work you put into it, depending on how that hypothetical scenario might unfold.
  • As alluded to previously, there is a lot of confusion and misinformation floating around the internet about what Virtual Voice is and what's involved in using it properly. That isn't helped by the fact that, in its typical fashion, Amazon has been fairly quiet about the tech throughout its development. It has relied on the authors and publishers using it, such as myself, to "take the lumps" and communicate the realities of it to potential readers, without much in the way of assistance in that regard.

Still, even with all those considerations, I wouldn't discourage any interested author or publisher from investigating Virtual Voice for their own projects. For all its potential pitfalls, I think it offers as many or more opportunities for benefit and success, assuming that it's used with caution and care, just like any other tool.

My audiobooks - both virtually voiced and traditionally narrated.
