Automating Personalized Videos with FFMPEG

By November 9, 2018 Uncategorized

About a year ago, Lee Martin published a Medium post about a marketing campaign he had recently completed for Marilyn Manson’s new album, Heaven Upside Down. Using geolocation data from a previous concert contest, Martin rendered 25,000 unique videos and emailed them to Manson fans. Each video gave the impression of a satellite specifically monitoring the recipient’s home while the song “We Know Where You F*cking Live” played.

Martin describes in detail how he created the videos: Mapbox’s API for the satellite imagery, ImageMagick to convert the photos to black and white, Dataclay’s Templater for rendering the videos, Amazon S3 for hosting the files, and Sidekiq and Postmark for sending the emails. He said the campaign at the time had a 50% open rate and only 5 spam flags (absolutely staggering for email marketing). It’s a great article.

Recently I worked with a client that was also building an automated video personalization flow. Though the team currently uses Templater in a cloud environment, they wanted to rely on server-based scripting as much as possible, so I’ve been working a lot with FFmpeg and ImageMagick lately. Re-reading the post, I noticed that Martin was using Templater and After Effects in a very basic capacity: spinning the satellite photo and compositing the graphics with After Effects is like hiring an oil portraitist to paint a house. Respectfully, After Effects and Templater are more appropriate for heavier tasks.

Therefore I set out to recreate Martin’s project entirely in the command line.

Until this last year, I had only used FFmpeg to convert video files from one format to another. As I dug into the features, I was amazed at how comprehensive the program is: it can convert nearly any audio and photo format as well as video, smooth out shaky footage, composite with alpha channels, and even perform OCR (transcribing onscreen text). So the first step in recreating Martin’s project is to render a satellite JPEG photo into a :15 video loop in which the footage is black and white and rotates 360°.

ffmpeg -loop 1 -t 15 -r 30 -i geo.jpeg -vf "rotate=.133*PI*t:ow=900:oh=900:c=none,hue=s=0" twist.mp4

This creates a

  • :15 second video,
  • 30 fps,
  • rotated .133π radians per second (roughly one full 360° turn over the 15 seconds),
  • 900×900 video frame (the maximum size a 1280×1280 Mapbox picture can be rotated without seeing edges)
  • Saturation set to “0”
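The two numbers in that list can be checked with a little arithmetic. The sketch below is only a sanity check in Python, assuming the Mapbox source image is the 1280×1280 square mentioned above and that the rotate expression is evaluated with t in seconds; the constant names are illustrative.

```python
import math

# Values taken from the command above; the names are illustrative only.
RATE = 0.133 * math.pi   # radians per second, from rotate=.133*PI*t
DURATION = 15            # seconds, from -t 15
SOURCE_SIDE = 1280       # pixels, assumed side of the square Mapbox image

# How many full turns the photo makes over the whole clip.
total_turns = RATE * DURATION / (2 * math.pi)

# A centered square that never shows edges must fit inside the circle
# inscribed in the source image, so its diagonal equals the source side.
max_side = SOURCE_SIDE / math.sqrt(2)

print(f"turns over the clip: {total_turns:.4f}")
print(f"largest edge-free square: {max_side:.1f} px")
```

The result is just under one full turn and a limit of roughly 905 pixels, which is why 900×900 is a comfortable output size.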

Next we create the overlay graphic. It includes the upside-down cross from the album, the album name, artist, and release date, and credit to Mapbox and DigitalGlobe. This could have been done in Photoshop and rendered as a transparent PNG, but for the sake of command-line development, I created the image in ImageMagick. Because I didn’t have the precise fonts used, I found their closest proxies on Google Fonts: PT Serif and UnifrakturCook.

convert -size 900x900 xc:none -fill white -font PT-Serif-Bold -pointsize 18 -gravity northwest -annotate +57+57 "MARILYN MANSON" -font PT-Serif-Bold -pointsize 18 -gravity northeast -annotate +57+57 "IN STORES OCTOBER 6" -font PT-Serif-Bold -pointsize 18 -gravity southeast -annotate +57+57 "©DIGITALGLOBE" -font UnifrakturCook -pointsize 36 -gravity center -annotate +0-385 "Heaven Upside Down" satellite.png

This generates a 900×900 transparent PNG image with all text.

composite -geometry +57+57 -gravity southwest 'Mapbox-Logo.png[100x25]' satellite.png satellite.png

This composites the Mapbox logo, resized on read to fit within 100×25 pixels, into the bottom-left corner.

convert -size 900x900 xc:none -stroke white -strokewidth 6 -draw "line 57,319 57,602" -draw "line 453,319 453,602" -draw "line 57,453 113,453" -draw "line 418,503 488,503" -draw "line 418,545 488,545" -draw "line 793,311 849,311" -draw "line 793,453 849,453" -draw "line 793,595 849,595" crosshairs.png

This generates a 900×900 transparent PNG image with all crosshairs.

composite crosshairs.png satellite.png satellite.png

This composites both images together to one master PNG.

Admittedly, this is also using the wrong tool for the job: because ImageMagick runs on the command line and doesn’t have a GUI, generating this image took a lot more time than composing it in Photoshop. The advantage of ImageMagick would come if any of the text had to change dynamically (e.g., the name of the recipient or the concert date at the nearest venue). Because there was only one master composite graphic for all finished videos, drafting this in the command line was unnecessary.
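To illustrate where ImageMagick would pay off, here is a minimal Python sketch that builds an annotate command containing one line of per-recipient text. The function names, the recipient_name field, and the output filename are hypothetical and not part of Martin’s project; the sketch only reuses ImageMagick options that already appear in the commands above.

```python
import subprocess


def build_overlay_args(recipient_name, out_path="satellite.png"):
    """Return an ImageMagick argument list for a personalized overlay.

    recipient_name is a hypothetical per-video field; the fixed text,
    fonts, sizes, and offsets are copied from the static overlay command.
    """
    return [
        "convert", "-size", "900x900", "xc:none",
        "-fill", "white",
        "-font", "PT-Serif-Bold", "-pointsize", "18",
        "-gravity", "northwest", "-annotate", "+57+57", "MARILYN MANSON",
        # The only dynamic element: the recipient's name in the top-right corner.
        "-gravity", "northeast", "-annotate", "+57+57", recipient_name.upper(),
        out_path,
    ]


def render_overlay(recipient_name, out_path):
    """Run the command; requires ImageMagick and the font to be installed."""
    subprocess.run(build_overlay_args(recipient_name, out_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works anywhere.
    print(" ".join(build_overlay_args("Jane Doe", "satellite_jane.png")))
```

Passing the text as a list element rather than through a shell string avoids quoting problems with names that contain spaces or apostrophes.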

Next came creating a :15 sample of the title track. Cutting a snippet from an audio file can also be done directly in FFmpeg. However, because this clip wouldn’t change with each output video, and to spare FFmpeg from analyzing the entire ~3 minute track on every render, I used the open-source Audacity for this. Using a GUI to pick a precise starting point in the song saved me from having to work out manually which fraction of a second to start the cut at.

Now that we have our spinning satellite photo as a black-and-white :15 video, our satellite graphic to composite, and our audio track, we combine them and then compress the video to under 6MB, small enough to attach directly to an email.

ffmpeg -i twist.mp4 -i satellite.png -i WKWYFL.wav -filter_complex "[0:v][1:v]overlay=(W-w)/2:(H-h)/2:enable='between(t,0,20)'" -map 2:a -pix_fmt yuv420p output.mp4

Without having a copy of the emailed video file, it’s hard to determine what precisely Martin attached in his emails: his Twitter post is 480×480, his Instagram is 640×640. Video dimensions go a long way in determining the final file size, so for the sake of argument we’re exporting 640×640 (if you want users to re-post to Instagram, the size Instagram appears to currently play is 600×600).

ffmpeg -i output.mp4 -c:v libx264 -crf 24 -c:a aac -b:a 128k -vf scale=640:640 -pix_fmt yuv420p final.mp4
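Because -crf targets a quality level rather than a file size, the 6MB limit is not guaranteed by the command itself. As a rough check, the Python sketch below works out the average video bitrate that a 6MB, 15-second file with a 128k audio track can afford. It assumes 1 MB = 1,000,000 bytes and ignores container overhead, and the function name is illustrative.

```python
def video_bitrate_budget_kbps(max_megabytes, duration_s, audio_kbps):
    """Average video bitrate (kbit/s) that keeps the file under the size cap.

    Assumes 1 MB = 1,000,000 bytes and ignores container overhead.
    """
    total_kbps = max_megabytes * 8 * 1000 / duration_s
    return total_kbps - audio_kbps


budget = video_bitrate_budget_kbps(max_megabytes=6, duration_s=15, audio_kbps=128)
print(f"video bitrate budget: {budget:.0f} kbit/s")
```

If the bitrate FFmpeg reports for a render is above this figure, raising the -crf value or reducing the frame size would be the levers to try.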

For the sake of compatibility, to ensure that these videos will play on nearly every computer and mobile device possible, we have to encode them as H.264 MP4s. Other video codecs and containers have been engineered to run much more efficiently (faster encoding, smaller file sizes, better visual and audible quality), but they can’t be guaranteed to run on any device or to be accepted by any social media player. Oh well.

But there you have it: Lee Martin’s personalized marketing video, generated entirely in the command line. This means the whole flow can be written into a single Node or Python script: the API call to Mapbox, the render to video, the upload to S3, and even the email, using a service like Envelopes for Python or SendMail for JavaScript.

Unfortunately, at this point it’s really hard to do an apples-to-apples comparison between my setup and Martin’s. Again, we can’t determine his exact video settings and dimensions, or the specs of his “heroic 2012 Macbook Pro” (I’m rendering on a 2010 Mac Pro tower with a 3.33 GHz 6-core Intel processor and 32GB of RAM). As Martin describes a year later in another Medium post about rendering 1,300 videos for the Foo Fighters, After Effects is engineered to utilize only one core of the processor, so enabling multi-core rendering requires additional plugins that basically just offload compositions to FFmpeg.

I rendered a similar video in my standard After Effects environment using AfterCodecs, a plugin designed to bring FFmpeg libraries into Adobe applications for faster, higher-quality rendering. There, a 6MB video takes 11 seconds to render and is noticeably poorer in quality. The FFmpeg command-line scripts above take 13 seconds to render, and FFmpeg is designed to utilize as many cores as it has available. This means that in a standard server environment, the API call, rendering, and uploading will take ~16 seconds in total on a system similar to my existing computer. In a server environment, however, the computer design is of your choosing, so you could run a 96 vCPU instance on Amazon EC2 or Google App Engine (there’s no such thing as a physical 96-core CPU, so having your own is impossible) and only pay for how long it’s used. By my very rough estimate, that is less than 10 hours for all 25,000 videos, with a computing cost in the hundreds of dollars.

Again, this was an exploration of how feasibly an entirely open-source, command-line operation can generate personalized videos at scale, compared to commercial software packages. Martin’s system rendered 3,000 to 4,000 videos per day; the system above increases that to around 5,500 per day. In theory, this would also reduce man-hours, as each step in Martin’s flow was separate, while this flow combines more actions in a single script. However, as I’m still only a beginner-level developer, creating these scripts took a lot more man-hours than handling everything with Photoshop, After Effects, and Templater would have (again, generating the overlay with ImageMagick was entirely superfluous when there were no dynamic layers).
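The throughput figures follow from simple arithmetic. This Python sketch assumes the ~16 seconds per video quoted above, one render at a time, and a machine that runs around the clock; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def videos_per_day(seconds_per_video):
    """Whole videos finished in 24 hours when rendering one at a time."""
    return SECONDS_PER_DAY // seconds_per_video


def total_hours(video_count, seconds_per_video):
    """Hours needed to render the whole batch sequentially."""
    return video_count * seconds_per_video / 3600


print(videos_per_day(16))                # videos per day at ~16 s each
print(round(total_hours(25_000, 16), 1)) # hours for the full 25,000 batch
```

At 16 seconds each this comes to 5,400 videos per day, in line with the rough figure of 5,500, and about 111 hours for all 25,000 videos on a single machine like mine.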

However, an important note to remember is how great it is to integrate After Effects where it shines best (creating unique and incredible motion graphics and visual effects), and FFmpeg where it shines best (applying a vast collection of variables and rendering at scale). For example, use After Effects to create an animated master composition, render out the video with an alpha channel, and use FFmpeg for compositing and exporting. The fact is there’s a lot of room to be explored in personalization and automation in videos and graphics. And it’s important to use the right tool for the job.

EDIT – 10 November 2018

After posting this, @LeeMartin and @FFmpeg chimed in, and I re-wrote the script to adjust the step sequence. The sequence above separates the steps as:

  1. Create a :15 30fps 900×900 video, rotate the satellite photo, and convert to black & white
  2. Composite the video, satellite graphics, and add the song track to a new video
  3. Scale down the video to 640×640, and compress the compiled video to a lower file size.

The code below simplifies all these steps into one string, and further improves the file for webstreaming:

ffmpeg -loop 1 -t 15 -r 30 -i geo.jpeg -i satellite.png -i wkwyfl.wav -filter_complex "[0:v]rotate=.133*PI*t:ow=900:oh=ow:c=none,hue=s=0[v0];[v0][1:v]overlay=0:0:enable='between(t,0,20)'[v1];[v1]scale=640:640[v2]" -map '[v2]' -map 2:a -c:v libx264 -crf 23 -c:a aac -b:a 128k -pix_fmt yuv420p -movflags +faststart output.mp4

Now the only argument that needs to be passed in is the Mapbox satellite photo (which Martin wrote in his node script). This script now takes my computer ~14 seconds to render (again, in only a single command), and with fewer files and less storage space needed altogether. Couple this with a single python or node script, and the entire process can be created and uploaded in 3 steps.
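As a rough idea of how that single command could sit inside a Python script, the sketch below builds the same FFmpeg call as an argument list with the satellite photo as the only per-video input. The function names and default filenames are hypothetical, and the -y flag (overwrite the output without prompting) is my addition for unattended runs; everything else mirrors the command above.

```python
import subprocess

# Same filter graph as the one-line command: rotate and desaturate the photo,
# overlay the static graphic, then scale down to 640x640.
FILTER_GRAPH = (
    "[0:v]rotate=.133*PI*t:ow=900:oh=ow:c=none,hue=s=0[v0];"
    "[v0][1:v]overlay=0:0:enable='between(t,0,20)'[v1];"
    "[v1]scale=640:640[v2]"
)


def build_render_args(photo_path, output_path,
                      overlay="satellite.png", audio="wkwyfl.wav"):
    """Return the FFmpeg argument list for one personalized video."""
    return [
        "ffmpeg", "-y",                      # -y is an addition for batch use
        "-loop", "1", "-t", "15", "-r", "30", "-i", photo_path,
        "-i", overlay,
        "-i", audio,
        "-filter_complex", FILTER_GRAPH,
        "-map", "[v2]", "-map", "2:a",
        "-c:v", "libx264", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        "-pix_fmt", "yuv420p",
        "-movflags", "+faststart",
        output_path,
    ]


def render(photo_path, output_path):
    """Run FFmpeg; raises CalledProcessError if the render fails."""
    subprocess.run(build_render_args(photo_path, output_path), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so no input files are needed.
    print(" ".join(build_render_args("geo.jpeg", "output.mp4")))
```

Because the arguments are passed as a list and not through a shell, the filter graph needs no outer quotes; the inner single quotes around between(t,0,20) are still read by FFmpeg’s own filter parser.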