Introduction

Since the release of OpenAI’s ChatGPT in 2022, foundation generative AI models have evolved to enable new capabilities and gone multi-modal. Today, artificial intelligence can accept inputs and provide outputs in multiple formats, including images, video, audio, and computer code. 

Dedicated video models like OpenAI’s Sora, Meta’s SAM 2, and Runway’s Gen-3 Alpha are pushing the boundaries of what’s possible with AI-generated video. The potential is undeniable. Earlier this year, OpenAI executives met Hollywood studios to pitch Sora to filmmakers, while Odyssey, a year-old startup, announced it’s building Hollywood-grade visual AI that should generate cinematic scenery, characters, and lighting.

This is the right time for digital media publishers to seize the AI opportunity. Generative AI applications can streamline workflows, reduce costs, and help publishers adopt more engaging formats like video at scale. However, knowing where to begin can be a challenge. 

On one end, foundational models provide a flexible playground for those inclined to geek out and customize them as needed. However, they lack contextual understanding, exhibit inconsistency, and are difficult to tailor for publisher-specific tasks. On the other end, ready-to-use solutions like Aeon offer features tailored to meet the specific needs of digital media publishers.

The market for AI-driven video generation tools is expanding rapidly. According to GlobeNewswire, it is projected to increase from just over half a billion dollars in 2023 to nearly $3 billion by 2033, reflecting a compound annual growth rate (CAGR) of over 18%. Most of that growth will be in the media and entertainment segment, and text-to-video is expected to hold the largest growth share.
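
For readers who want to sanity-check the growth figure, CAGR is the end value divided by the start value, raised to the power of one over the number of years, minus one. The sketch below is a rough check only: the endpoints are rounded stand-ins for "just over half a billion" and "nearly $3 billion" (our assumption, not the report's exact numbers), and the function name is ours.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.18 means 18%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded, illustrative endpoints in billions of USD.
# These are assumptions standing in for the report's exact figures.
start_2023 = 0.55
end_2033 = 2.9

rate = cagr(start_2023, end_2033, years=10)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded inputs the implied rate lands a little above 18%, which is consistent with the projection quoted above.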

Today, an increasing number of AI apps and platforms promise to generate AI videos using a variety of models and methods—some yielding impressive outcomes, while others fall short. Evaluating and navigating these options can be overwhelming. That's why we've tested some of the best, so you, as a publisher, don’t have to.

In this post, we will review some of the best AI video generators, focusing on their standout features and the value they offer to digital media publishers in their AI video creation process. They are listed in no particular order.

Aeon

Aeon is built for publishers, by publishers, with the goal of driving more engagement, site traffic, and revenue. It's the most scalable video production solution, bringing cutting-edge AI capabilities within publishers' reach while offering full editorial control to maintain brand integrity.

image12

Aeon brings enterprise-grade video superpowers to publishers with simple, low-lift workflows:

image6

Aeon’s patented AI technology converts web content into an endless stream of visual narratives, scaling reach and engagement across every page. With just a few clicks, the AI-powered platform turns content into engaging videos, letting publishers create professional quality videos in minutes, not days.

Key Features

Because publisher goals are central to Aeon’s design, its features are more nuanced, resulting in tools that effectively address the specific needs of scaling content, streamlining workflows, and maximizing audience engagement. While Aeon’s feature list continues to expand, here are some key highlights:

  • Fine-tune voices to match brand identity. Create custom voice settings, save presets, and seamlessly integrate them into your videos for a unique and authentic brand voice.

  • Effortlessly clip and crop videos for social media. Smart Crop takes your input video and automatically crops it to a specified aspect ratio, ensuring that your content is perfectly formatted for any platform.

  • Introduce real emotion into text-to-speech video. Real Emotion generates voiceovers that sound remarkably natural and engaging.

  • AI-powered music integration to enhance content appeal. Aeon integrates with Melodie, a leading music library, to bring you an exclusive, AI-powered solution for finding the ideal music track for your videos.

  • Caption videos automatically in a style consistent with your brand. Offers diverse styles, customization options, dynamic placement, and smart caption selection for engaging and memorable videos.

  • Introduce Parallax effects to transform ordinary images into captivating animations, making visual content creation effortless.

  • Edit Tags - balance AI efficiency with full creative control and choose the best of both worlds.

  • Automatic Profanity Bleeping - find and bleep out expletives automatically, eliminating the need for manual work.

  • Multiple Aspect Ratios - resize videos and motion graphics to multiple aspect ratios to streamline your video production workflows.

  • XML Export for Adobe Premiere Pro seamlessly transfers Aeon video projects to Premiere Pro while preserving all the elements. 
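
To make "cropping to a specified aspect ratio" concrete, here is a minimal sketch of the underlying geometry. It is not Aeon's Smart Crop implementation: it is a plain center crop with a function name of our own choosing, whereas a smart crop would presumably also decide where to place the crop window based on the content of the frame.

```python
def center_crop_box(width: int, height: int, ratio_w: int, ratio_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, crop_width, crop_height) for the largest centered
    region with aspect ratio ratio_w:ratio_h inside a width x height frame."""
    # Compare aspect ratios by cross-multiplying to stay in integer math.
    if width * ratio_h > height * ratio_w:
        # Frame is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Frame is taller than (or equal to) the target: keep full width,
        # trim top and bottom.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


# A 1920x1080 landscape frame cropped to vertical 9:16 for social media.
print(center_crop_box(1920, 1080, 9, 16))  # (656, 0, 607, 1080)
```

The example shows why landscape footage loses most of its width when reformatted for vertical platforms, and why choosing the crop position matters.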

Pricing

At Aeon, we prioritize understanding your needs before recommending the ideal plan. Share your contact details, and we will reach out with a solution tailored for you.

Pros

Aeon empowers publishers of all sizes to implement a comprehensive video strategy. Whether you are focused on ad revenue—seeking to increase video inventory, scale branded sponsorships, and drive higher CPM—or in editorial, aiming to leverage AI to automate portions of your video production workflow, Aeon helps streamline content creation and reduce time to market. If you are an e-commerce team looking to boost engagement and conversion metrics through video content, Aeon has you covered.

Cons

Aeon is trusted by leading global publishers who value our platform. However, we remain eager for feedback as a young, growing team. If you're an existing Aeon customer, we welcome your candid insights to help us continuously improve.

Synthesia

Synthesia is among the top enterprise-grade AI video tools publishers can use to produce high-quality videos without expensive equipment or specialized skills. It takes a unique approach to video creation by generating AI avatars (synthetic people that are human-like) that appear to be speaking words from a video script you provide.

image5

Human-like AI avatars on Synthesia

With 230+ AI avatars to choose from (the number depends on your plan), Synthesia turns a supplied video script into audio using AI text-to-speech technology and tailors it to a specific language, accent, or dialect using an AI voice generator. It then syncs that audio to an avatar’s facial expressions and mouth movements, rendering a final ready-to-use video.

image9

Key Features

  • Personal Avatars—create a digital twin that looks and sounds like you, speaks 29 languages, and is set against a realistic background.

  • Avatar Builder—change the color of the avatar’s outfit and add a logo to reflect your brand identity.

  • Dialogue—create scenes with multiple avatars to recreate real-life conversations and engaging scenarios on any topic.

  • AI Voices and Languages—choose from hundreds of natural-sounding voices to transform your script into high-quality voiceovers.

  • Voice Cloning—clone your voice and pair it with your personal avatar to create your perfect digital twin.

  • 1-Click Translations—automatically translate the video into 140+ languages for a worldwide audience reach.

  • AI Video Assistant—transform ideas, decks, PDFs, and websites into videos using AI and pre-designed templates.

  • Templates—300+ templates for different use cases to get going in minutes.

  • Media Library— millions of royalty-free images, videos, icons, GIFs, and soundtracks to add to videos.

Enterprise plan customers can use API integration to link Synthesia to Descript, known for its AI-powered audio and video editing capabilities, to get the best of both worlds. This is particularly useful for those relying on Descript's “Overdub” feature to clone a voice.

image10

Synthesia also provides a sophisticated API interface for integrating data into an automated video creation workflow. For instance, Synthesia-Zapier integration is a no-code feature that lets you set up an automatic video personalization workflow or other data-centered video use cases. 

Pricing

We recommend testing Synthesia with its free plan that provides access to basic features with a video length limitation of 36 minutes. Three other paid options are available to choose from:

image18

Synthesia pricing (as of September 2024)

Pros

Synthesia’s user experience is intuitive and easy to navigate. This lets you focus on creating compelling videos without skimping on quality. You can create lifelike videos without hiring actors or renting studios while choosing avatars, languages, and accents that closely match your brand style. It also eliminates the need for video editing skills.

Cons

While Synthesia offers a decent selection of avatars and voices, the range may still fall short for those with particular branding needs. Minor hiccups in lip-syncing and pronunciation can also occur, slightly tarnishing the otherwise professional videos. At $59 a month, the Creator Plan offers just 360 minutes of video per year, suggesting that the enterprise offering may come with a steep price tag, especially for publisher use cases.
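
To put the Creator Plan numbers in perspective, it helps to convert them into an effective cost per minute of finished video. The sketch below uses only the two figures quoted above. It assumes twelve monthly payments at the listed price with no annual-billing discount, and it assumes the full allowance is used. The function name is ours.

```python
def cost_per_minute(monthly_price: float, minutes_per_year: float) -> float:
    """Effective USD cost per minute of video, assuming twelve monthly
    payments and that the full yearly allowance is used."""
    return monthly_price * 12 / minutes_per_year


# Creator Plan figures from the text: $59 per month, 360 minutes per year.
creator = cost_per_minute(monthly_price=59, minutes_per_year=360)
print(f"Creator Plan: ${creator:.2f} per minute")
```

Under those assumptions the plan works out to roughly two dollars per minute of video. That is manageable for occasional use, but it adds up quickly at publisher volumes.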

Use Cases

The demonstrated use cases are mostly internal. Well-known brands, including DuPont, Xerox, and Zoom, have used Synthesia to create training videos for employees and salespeople while significantly cutting costs and production time. Nevertheless, digital media publishers can leverage the platform’s potential to speed up video-production workflows.

Rizzle

Rizzle eliminates the need for editing expertise and is perfect for small businesses, content creators, freelancers, and podcasters. The platform generates attention-grabbing videos in minutes while providing good creative control to the user.

Rizzle uses advanced technology to automate everything in one go, including text incorporation, media selection, video layouts, animations, transitions, and more.

image8

Key Features

  • Convert Text to Video - accepts text inputs such as prompts, blogs, tweets, and URLs to generate video in minutes. This includes background audio or voiceovers, animation, text elements, and media. Visual elements can be enhanced post-generation using the regenerate feature, or preferred music can be added.

  • Convert Audio to Video - podcast audio files can be converted to videos using downloadable links or by dragging and dropping the audio file (supports MP3, AAC, and OGG).

  • Extensive Customization Options—lets you set the stage to meet your specific needs. It allows control over the duration and provides built-in text-summarization/expansion capabilities. Create videos using a summary, the original text, or an expanded version of your input. Text, media, and metadata can be individually edited for precise control during the regeneration flow.

  • Add Background Music and Voiceovers - Rizzle-generated videos can be enhanced with background music, automatic text-to-speech voiceovers in built-in voices and accents, or trending audio from a copyright-compliant stock library.

  • Popular Aspect Ratios - lets you create videos in 9:16 and 16:9 aspect ratio formats, suiting different platform needs.

  • Diverse Media Options - provides access to thousands of editorial and creative royalty-free stock images, videos, and audio through Rizzle's media library.

  • High-Resolution Viewing - allows you to generate high-definition videos (up to 4K), increasing quality as you climb up the subscription tiers.

  • Copyright Handling - ensures copyright compliance by integrating with stock media providers through APIs and verifying media copyrights.

  • Free Trial - generate videos up to 10 minutes in length using the free trial. 

Pricing

Rizzle’s pricing plans are designed to cater to a diverse audience of creators, from curious beginners to seasoned professionals. The Starter tier, priced at $8 monthly, provides 1,200 tokens and is suited for aspiring YouTubers, podcasters, and freelancers.

The Grow tier ($250 monthly) is designed to meet substantial video production requirements, providing 36,000 tokens annually with support for five users. On average, each 10-minute video with assets requires about 1,800 tokens. Additional token packs are available for purchase. Rizzle also offers enterprise plans for larger teams with custom branding needs and API integrations to create videos at scale.
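
Token-based pricing is easier to judge once it is translated into videos. The back-of-the-envelope sketch below takes the Grow tier figures above at face value. It assumes twelve monthly payments with no discount, no extra token packs, and that every video is a 10-minute video at the average token cost. The variable names are ours.

```python
# Grow tier figures quoted in the text.
GROW_MONTHLY_PRICE = 250       # USD per month
GROW_TOKENS_PER_YEAR = 36_000  # tokens included per year
TOKENS_PER_VIDEO = 1_800       # average for a 10-minute video with assets
VIDEO_MINUTES = 10

# How many average 10-minute videos the yearly allowance covers.
videos_per_year = GROW_TOKENS_PER_YEAR // TOKENS_PER_VIDEO

# Assumes twelve monthly payments and no additional token packs.
annual_cost = GROW_MONTHLY_PRICE * 12
cost_per_video = annual_cost / videos_per_year
cost_per_minute = cost_per_video / VIDEO_MINUTES

print(f"{videos_per_year} videos per year")
print(f"${cost_per_video:.0f} per video, ${cost_per_minute:.0f} per minute")
```

On these assumptions the included tokens cover about 20 ten-minute videos a year. Publishers planning a higher cadence should therefore budget for additional token packs or an enterprise plan.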

Pros

Rizzle is easy to use and requires no editing experience. Its AI-powered video generation produces high-quality results, offering a wide range of styles and templates with extensive customization options for personalized video generation. Multi-platform compatibility makes sharing across various channels easy. 

Cons

Control over fine-grained editing aspects is limited. Certain design elements may require additional customization, while certain features are unavailable on the free plan. 

Use Cases

Rizzle is a great one-stop AI platform to speed up creative production for publishers, video editors, podcasters, YouTube content creators, short video content creators, and thumbnail designers.

Captions

Captions is an AI-powered video creative studio that allows users to automatically add captions, edit videos, and create content with just a few clicks or taps. It offers features like AI voiceovers, dubbing, eye contact correction, and noise reduction, making it an all-in-one solution for both beginners and experienced content creators.

image11

Captions analyzes uploaded video and audio content to automatically generate custom B-rolls, images, animations, sound effects, and transitions that enhance the original footage. Import your unedited video, select an editing style, and let AI handle the rest. 

The platform even applies relevant zooms, transitions, graphics, and captions aligned with the chosen style. It supports AI-powered speech enhancement, automatic trimming of filler words, and eye contact correction.

Key Features

  • AI Script Writer—automatically generates scripts from prompts, sample videos, or articles, helping creators overcome writer's block.

  • AI Video Editing Suite—offers tools for automatic B-roll generation, transitions, and sound effects, streamlining the video editing process.

  • AI Dubbing—translates and dubs videos into over 40 languages, while maintaining the original speaker's voice for seamless localization.

  • AI Eye Contact—corrects eye contact in videos with a single click, ensuring subjects appear to look directly at the camera.

  • AI Shorts—extracts short, high-impact clips with viral potential from longer video content.

  • AI Denoise—automatically removes background noise, improving audio quality with minimal effort.

  • AI Captions—generates precise captions for videos, enhancing accessibility and engagement.

  • Teleprompter—enables users to record videos directly within the app, with a built-in teleprompter feature for smooth presentations.

Pricing

In the web app, you don’t see any pricing until you try to export your work, which isn’t a good practice. A subscription from the Android app costs $9.99 per month or $89.99 per year. iPhone users see different pricing options: Pro ($9.99 per month) and Max ($24.99 per month). Some of the more full-service AI creation features require a Max plan.

Pros

Captions is a versatile application suitable for content creation in various industries and use cases, including marketing, education, corporate training, and social media. It addresses several pain points, including time consumption, engagement issues, accessibility, and technical skills gaps. 

Cons

Though it also has a few standard video editing tools, they fall short of what you get in most other video editing apps. The most compelling AI features require the priciest subscription. The platform is slow and unreliable; most features require upload and server processing. 

Use Cases

Captions’ approach to video is different from others because its tools are designed specifically for editing talking videos. This makes it best suited for videos that convey an idea or experience.

Hour One

Hour One is popular for its simple interface and advanced features. It offers realistic virtual presenters, voice cloning, and an extensive library of resources, making content production highly customizable.

image4

Key Features

  • Stock AI avatars—to easily localize your content by selecting the avatar that best matches your audience.

  • Custom Studio Avatars—to create an avatar that looks and sounds like you to represent your company.

  • Dub words into any video—upload your video to Hour One’s playground to generate auto-dubbing in any voice or language.

  • Webcam and Selfie avatar - create an avatar directly from your webcam or download the mobile app (iOS) to create one in minutes.

  • Offers text-to-speech, 100+ language support, voice cloning, and auto translations.

  • Video editing includes a library of Immersive 3D templates and an intuitive plug-and-play interface for utilizing your in-house design resources. With just a few clicks, you can get the look and feel you want.

  • Supports multiple platform formats & aspect ratios, layout options, screen recorder, and closed captions.

  • AI Video tools—include AI wizards that create scripts with GPT-4 integrations, helping transform ideas into full videos with a single text prompt.

  • The platform also integrates with popular apps and platforms, including YouTube, Articulate 360, and HubSpot.

Pricing

Hour One is expensive, with 360 minutes of video under the pro plan billed at $120. 

image7

Hour One pricing as of September 2024

Pros

The features are intuitive and easy to use and discover. Rendering is quick, and a shared workspace feature is available for team collaboration. Users report positive feedback about the quality of avatars.

Cons

The drag-and-drop interface, while user-friendly, lacks the ability to review each scene independently. The free version, a natural choice for exploration, presents challenges with aspect ratios, leading to stretched and untidy images. 

Use Cases

Publishers can use Hour One to transform headlines and captivate audiences with news and sports content broadcasting. The platform offers over 100 life-like news avatars and real-time news item editing that help bring stories to life in various languages, accents, and ethnicities, enhancing impact and global reach.

Waymark

Waymark harnesses AI to empower advertisers to craft compelling ad-focused video content. The platform is designed to provide a powerful suite of tools for creating engaging, high-quality TV, CTV, and digital ads that captivate and convert.

Earlier this year, Waymark signed deals with prominent media companies, including the E.W. Scripps Co., Fox TV Stations, Spectrum Reach, Gray Television, Beasley Media Group, and Morgan Murphy Media. According to CEO Alex Persky-Stern, Waymark generated over one million ads in 2023 across various media sectors.

Key Features

  • AI-powered video generation automatically creates video ads based on business information and web data. 

  • Web integration scans the internet based on an input URL for relevant business data to incorporate into videos.

  • Customization options allow users to refine videos with adjustments to branding elements. 

  • Multi-platform optimization ensures videos are tailored for TV, CTV, and digital channels. 

  • A user-friendly interface offers a simple, prompt-based generation system for ease of use.

  • Free draft creation allows users to generate and edit draft videos at no cost.

  • A built-in approval system enables clients to sign off on video creatives before publication. 

  • Real-time analytics provide insights into video performance to inform marketing strategies.

Pricing

Waymark can be tested for free by creating an account that lets users generate draft videos. The Team plan is best suited for smaller organizations that operate within a single office. An Enterprise plan is available for media outfits that span multiple markets and designated market areas and comes with user roles and centralized controls. Both plans have custom pricing available upon inquiry. 

Pros

Waymark’s intuitive user interface and AI-powered video creation make professional video ad creation easy regardless of technical expertise. A wide array of customizable templates and advanced editing features allow users to maintain brand alignment.

Cons

While the platform’s AI-powered video creation simplifies the process, it limits the level of control users have over their video content. Access to more advanced features requires a monthly subscription, which may not be cost-effective for all.

Use Cases

Waymark makes premium video advertising accessible to businesses lacking the budget, time, and expertise. Publishers can integrate Waymark's technology into their existing advertising infrastructure through a plugin that allows them to provide advertisers with instant ad creation capabilities as part of the campaign booking process. This can attract more local advertisers by offering affordable video ad creation.

Fliki

Fliki is a creator-focused AI tool that helps transform ideas into stunning videos. It offers lifelike voiceovers, dynamic video clips, and a suite of AI-powered features—all accessible through an intuitive text-to-video editor.

image3

Key Features

  • Text to Video: creates videos in minutes with intuitive script-based editing, high-quality AI voices, and a vast collection of media assets.

  • AI Voiceover: Transform text into lifelike speech. Choose from over 2,000 ultra-realistic voices in 80+ languages, saving time and voiceover costs.

  • Idea-to-Video: Enter a few words to describe your video idea, style, or tone, and within seconds, Fliki's generative AI will create a compelling script and fetch relevant media to accompany it.

  • AI Avatars: Choose from 70+ avatars to represent your narration or create ultra-realistic custom AI avatars (enterprise users) that perfectly match your specific requirements.

  • AI Voice: allows users to create a realistic clone of their voice by recording a 2-minute sample.

  • Blog to Video: creates engaging videos from blog articles with URLs as input.

  • PPT to Video: script-based editing and advanced AI voiceovers effortlessly transform static PPTs into videos.

  • Tweet to Video: leverages the power of motion graphics and engaging visuals to turn ordinary tweets into captivating audio-visual experiences.

  • Image to Video: creates videos from images, adding lifelike voiceovers, sound effects, music, and animations.

Pricing

Fliki is free to test with a 5-minute limit. The premium plan ($66 monthly) provides 7,200 minutes of yearly credit, with quality restricted to 1080p. Publishers looking to scale can contact sales for an enterprise plan.

image2

Fliki’s pricing as of September 2024

Pros

Fliki offers a user-friendly dashboard and interface, quick and easy text-to-video conversion, a variety of voices to choose from, multilingual voice-overs, support for different voice styles, a stock library, and competitive pricing.

Cons

Limited creative control: AI voices fail to replicate the nuance and warmth of a human voice-over artist. Limited image generation capabilities: AI-generated images might not always match the vision and can appear generic. Cost can add up, especially when volume is high. Synthesia’s AI-powered avatars offer more realism than Fliki’s.

Use Cases

With a wide range of features, Fliki is ideal for educators and trainers, social media marketers, entrepreneurs, small businesses, and anyone new to video creation. Its user-friendly interface makes it perfect for beginners looking to create videos without technical expertise. However, publishers looking for highly customized videos must look for alternatives, as Fliki's automation limits creative control. 

Pictory

Pictory is an AI-powered video editing and production platform that converts existing content—such as articles, scripts, and videos—into short-form videos for social media and marketing. Its user-friendly design makes it accessible and requires no editing knowledge. It automates important processes such as finding content highlights, picking images, and implementing transitions while also enabling customization to meet your brand’s style.

image13

Pictory workflow with a BBC story as a URL input

Key Features

image1

Source: Pictory, September 2024

With a plethora of features (see image) on offer, Pictory is a highly functional and clever AI video tool with a number of use cases, making it extremely quick to turn out video content. It can be used in four main ways:

  • Script to video – uses AI to turn a pre-written video script (or transcript) into a full video.

  • Article to video - same as above, except you’re turning an article – e.g., a blog, press release, or news story – into a video.

  • Edit video using text – a video editor that works purely based on text so that you can edit it almost like a word processor. Great for transcribing podcasts and interviews and for turning long-form videos into shorter snippets.

  • Visuals to video - this is closer to classic video editing. You can drag and drop images and videos to create a video slideshow.

Pricing

A free trial lets you test Pictory without requiring payment details. You can create 3 video projects, each up to 10 minutes long. The relatively basic starter package costs just $19 per month, rising to $39 for the Professional plan and $99 per month for teams. For publishers looking to scale, an enterprise plan is available upon inquiry.

image17

Pictory pricing as of September 2024

Pros

Pictory is fast, at least for basic use cases. Uploads and processing times are quick, while editing is easy and intuitive. One can turn around a video from scratch in a matter of minutes. 

It's simple and easy to use, with great library assets, particularly audio tracks. The ability to remove ‘filler words’ and silences from audio and video files is a useful feature. Voices from ElevenLabs (available in the Professional plan) are realistic - you wouldn't know it’s AI. Transcription quality is great, too.

Cons

Generated clips sometimes lack context and accuracy while looking too generic and “stock.” They may not pick the best scenes or images for your text, and there are limited aspect ratios to choose from. The ‘storyboard’ editor tool is basic and may limit freedom and creative options for advanced users. The AI voice-over might not follow punctuation correctly or might mispronounce some words or names. You might need to edit or redo your video to get it right. The support team, while helpful, is reported to be slow to respond.

Use Cases

Publishers can use Pictory to repurpose content efficiently, create engaging social media videos, and ensure brand consistency through features like rapid AI-powered editing, logo placement, custom colors, and brand-specific fonts. Additionally, Pictory can generate short video summaries of longer articles or reports to boost audience engagement.

Capsule

Capsule is currently in public beta but makes some bold claims:

image16

Source: Capsule Video

Its video editing UI draws inspiration from tools like Notion and Slides, making it feel approachable. Layered with AI, it makes creating videos effortless. It's entirely browser-based and powered by ultra-fast cloud GPUs, meaning your device specs need not be high-end. The tool is built for collaboration so that large teams can create gorgeous, on-brand motion graphics in a repeatable and consistent way.

image15

Key Features

  • Custom B-rolls: an automatic speech recognition model transcribes audio from the video into text, generates images using a diffusion model, and uses Capsule's video markup language (CapsuleScript) to automate video editing, stitching B-roll into the video.

image14

  • AI-Captions: quickly creates accurate captions while maintaining stylization and animation within your brand guidelines.

  • Focus on collaboration: Capsule claims it will soon allow everyone on the team to collaborate on video projects, eliminating the need for apps like Slack, Docs, Slides, Figma, Dropbox, Premiere, After Effects, etc.

  • Audio Features: the Voicemagic feature removes background noises, boomy room sounds, and even loud breaths between words. The AutoMix feature balances the volume of dialogue and music across the entire video with the click of a button. Capsule has partnered with Soundstripe to provide access to a library of 10,000+ royalty-free songs from Grammy-winning artists.

Pricing

Capsule offers a free trial, requiring a business email to register. Pricing begins at $99 per month for small businesses, with an enterprise plan available for larger organizations.

image19

Capsule pricing as of September 2024

Pros

Capsule addresses critical pain points in the industry, enabling content and marketing teams to collaborate and work more efficiently. The introduction of collaboration features stands to simplify video production workflows with AI, providing a powerful alternative to traditional outsourcing and editing tools.

Cons

Because Capsule is in the beta phase, many features are being rolled out and may not be currently available. 

Use Cases

Capsule pitches itself as a tool for content teams at media agencies and newsrooms, marketing teams at startups, and large corporations looking to scale video content and remove production bottlenecks. This makes it a "must-explore" tool for publishers.

Conclusion

In the rapidly growing generative AI video market, publishers have a unique opportunity to leverage advanced AI tools that streamline workflows, reduce costs, and enhance content engagement. Options range from foundational models at one end to tailored solutions like Aeon at the other, which offer features specifically designed to meet publishers' diverse needs. Now is the time to thoughtfully assess these tools, identify the most suitable ones, and supercharge your strategic business objectives with the power of AI and video.