AI Video Generation Tests

As soon as the OpenAI API started to offer voice and image generation (or at least as soon as I discovered it), I wrote a little Python script to make videos.

The script was simple: given a script, it generates spoken audio from the text, then reads the script again as a whole to produce a set of image prompts. Finally, it generates the images from those prompts.
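A minimal sketch of that kind of pipeline, using the current openai Python SDK, might look like the code below. The model names, the voice, and the prompt-extraction step are illustrative assumptions rather than the original script.

```python
# Sketch of a script-to-video pipeline: narrate the text, derive image
# prompts from the whole script, then generate images for each prompt.
# Model names and voice are illustrative choices, not the original script's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def narrate(script_text: str, out_path: str = "narration.mp3") -> str:
    """Generate spoken audio for the whole script."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=script_text,
    )
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path


def image_prompts(script_text: str, n: int = 5) -> list[str]:
    """Ask a chat model to read the script as a whole and suggest image prompts."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You write image-generation prompts."},
            {
                "role": "user",
                "content": f"Suggest {n} image prompts, one per line, "
                           f"to illustrate this script:\n\n{script_text}",
            },
        ],
    )
    lines = completion.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()][:n]


def render_images(prompts: list[str]) -> list[str]:
    """Generate one image per prompt and return the image URLs."""
    urls = []
    for prompt in prompts:
        result = client.images.generate(
            model="dall-e-3", prompt=prompt, size="1024x1024"
        )
        urls.append(result.data[0].url)
    return urls
```

From there the audio and images can be stitched into a video with something like ffmpeg or moviepy.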

The first example (above) was the Organisational Reasoning teaser video for a new YouTube channel. There is an irony here, in that the whole point of “Organisational Reasoning” is to move beyond the fad of AI content generation and focus on how AI advances will impact the design of organisations.

I was still interested in the more modest aim of being able to “express” pre-existing writings as video and audio.  Giving audiences multiple ways to consume content, or being able to quickly publish written words on podcasting platforms, feels like a useful capability for solo creators.  

I used the preface of my old draft ManageWithoutThem book for the example above. It worked quite well.

Then recently, while attending the virtual Board of Innovation Autonomous Summit, the avatar-based approach of Synthesia.io caught my eye. I was very disappointed with what it did with the script as it tried to turn prose into… whatever this is, but the avatars are very impressive (and I think you can use a script directly if you want).

These tools are flooding in now, and Descript have just emailed me today about a ChatGPT-enabled tool that pushes the result into their already full-featured editing engine, so you can continue to refine the video with all their standard tools.

A quick example (below) produces results similar to my original Python script. However, being able to keep editing the video afterwards makes it much more powerful.