Image cropping in the machine learning age

Image cropping has increasingly become a pain in the ass. With content flowing more freely on the web and across brand platforms such as apps and social media, more and more crops of the same image are needed. Even for a single website you often need half a dozen or more crop renditions of an image. Currently there are two approaches to cropping images automatically:

Using simple cropping functions, like those in ImageMagick. These techniques mostly just crop from the center (or from another gravity, if you implement it that way) and they don’t care about the content of the image at all.
More advanced algorithms, like the one used for instance in Cloudinary, use face detection. They try to detect faces in the image and use them as the gravity for the crop. This relies on classic image recognition algorithms that have been around for a while and predate today’s deep learning methods. While this is certainly better, face detection often falls short: faces are missed, objects that aren’t faces are detected as faces, and an image with no faces at all can lead to weird results. Also, some images have faces that aren’t necessarily the most important part of the image.
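To make the idea of a gravity concrete, here is a minimal sketch in plain Python. It is not how ImageMagick or Cloudinary implement it; the function names `gravity_crop_box` and `face_gravity` are made up for this illustration, and the face box is assumed to come from some detector that is not shown here.

```python
def gravity_crop_box(img_w, img_h, crop_w, crop_h, gravity=(0.5, 0.5)):
    """Return (left, top, right, bottom) for a crop of crop_w x crop_h pixels.

    gravity is a relative focal point (x, y) between 0 and 1.
    The default (0.5, 0.5) is a plain center crop.
    """
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop must fit inside the image")
    gx, gy = gravity
    # Center the crop on the focal point ...
    left = round(gx * img_w - crop_w / 2)
    top = round(gy * img_h - crop_h / 2)
    # ... then clamp it so it never leaves the image.
    left = max(0, min(left, img_w - crop_w))
    top = max(0, min(top, img_h - crop_h))
    return left, top, left + crop_w, top + crop_h


def face_gravity(face_box, img_w, img_h):
    """Turn a face box (left, top, right, bottom) into a gravity point.

    The face box is assumed to come from a face detector (hypothetical here).
    """
    left, top, right, bottom = face_box
    return ((left + right) / 2 / img_w, (top + bottom) / 2 / img_h)


# Center crop of a 1000 x 600 image to a 400 x 400 square.
center_box = gravity_crop_box(1000, 600, 400, 400)

# Same crop, but pulled towards a face in the upper right corner.
face_box = gravity_crop_box(
    1000, 600, 400, 400,
    gravity=face_gravity((800, 50, 900, 150), 1000, 600),
)
```

The center crop ignores the content completely, while the second call moves the crop window towards the face and stops at the image border. If the detector misses the face or finds a wrong one, the gravity is wrong and so is the crop.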

There is clearly a lot of room for improvement here. Twitter is rolling out a new cropping algorithm powered by machine learning. It is based on finding the most salient region of an image. There has been a lot of research on this: academics ran eye-tracking studies to understand which parts of an image people fixate on when looking at it. Mostly these seem to be faces, text, animals, objects and regions of high contrast. Based on those insights, a neural network can be trained to find the salient region, and the crop is then centered around it.
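The cropping step itself is simple once a saliency map exists. The following is a minimal sketch, not Twitter’s implementation: it assumes that some trained model (not shown) has already produced a saliency map as a grid of scores, one per pixel or per coarse cell, and the function names are invented for this example.

```python
def most_salient_point(saliency):
    """Return (x, y) of the highest score in a 2D saliency map (list of rows)."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(saliency):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y


def saliency_crop_box(saliency, crop_w, crop_h):
    """Center a crop_w x crop_h window on the most salient point.

    Returns (left, top, right, bottom) in the coordinates of the map.
    """
    height, width = len(saliency), len(saliency[0])
    if crop_w > width or crop_h > height:
        raise ValueError("crop must fit inside the saliency map")
    x, y = most_salient_point(saliency)
    # Center on the peak, then clamp the window to the map borders.
    left = max(0, min(x - crop_w // 2, width - crop_w))
    top = max(0, min(y - crop_h // 2, height - crop_h))
    return left, top, left + crop_w, top + crop_h


# Toy 6 x 4 saliency map; in practice a neural network would predict this.
toy_map = [
    [0.0, 0.1, 0.1, 0.2, 0.3, 0.1],
    [0.0, 0.1, 0.2, 0.4, 0.9, 0.3],
    [0.0, 0.0, 0.1, 0.2, 0.3, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.1, 0.0],
]
box = saliency_crop_box(toy_map, crop_w=2, crop_h=2)
```

Here the crop follows the peak at column 4, row 1 instead of the center of the image. Taking the single highest score is the simplest possible choice; a real system could just as well pick the window with the highest total saliency.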

Twitter has now developed these algorithms further so that they run fast enough for large-scale production use. The results are stunning and show the clear advantages of this approach.

… and with the new one powered by machine learning. Source and more about the research: blog.twitter.com

I think the possibilities are endless: automatic crops in all kinds of digital media, but also in print layouts. Imagine you could just size a frame (via a layout application box, CSS or just proportions), assign an image to it, and have it cropped automatically in the best way possible. I hope to see this very soon in print applications, digital infrastructure and DAM systems.
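As a rough sketch of what such a frame could do behind the scenes: given only the proportions of the frame and a focal point, it picks the largest crop that has those proportions and shifts it towards the focus. The function `fit_frame` and the `focus` parameter are assumptions made for this example, and the focal point is expected to come from a saliency model or face detector that is not shown.

```python
def fit_frame(img_w, img_h, ratio_w, ratio_h, focus=(0.5, 0.5)):
    """Largest crop with aspect ratio ratio_w:ratio_h, placed around focus.

    focus is a relative point (x, y) between 0 and 1.
    Returns (left, top, right, bottom) in pixels.
    """
    if img_w * ratio_h >= img_h * ratio_w:
        # Image is wider than the frame: keep the full height, trim the width.
        crop_h = img_h
        crop_w = img_h * ratio_w // ratio_h
    else:
        # Image is taller than the frame: keep the full width, trim the height.
        crop_w = img_w
        crop_h = img_w * ratio_h // ratio_w
    fx, fy = focus
    # Center on the focal point and clamp to the image borders.
    left = max(0, min(round(fx * img_w - crop_w / 2), img_w - crop_w))
    top = max(0, min(round(fy * img_h - crop_h / 2), img_h - crop_h))
    return left, top, left + crop_w, top + crop_h


# A square teaser and a 16:9 header cut from the same 1200 x 800 image.
square = fit_frame(1200, 800, 1, 1)
header = fit_frame(1200, 800, 16, 9, focus=(0.5, 0.9))
```

The same source image yields both renditions without anyone dragging a crop rectangle by hand. Only the focal point has to be right.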

Georg Obermayr

I’m one of those guys in the media production and publishing scene who are often labeled as thought leaders. But I’m a practitioner. Day in and day out I work as Head of Crossmedia Production in an advertising agency. I’m hands-on, creating content infrastructures and designing the websites, apps and social media stuff that are driven by these infrastructures. This is what grounds me. And it is this daily business work that helps me identify the trends and emerging topics of our field. With that kind of real-world knowledge, I’m an active participant in bringing our industry forward: I write a lot about agile publishing, digital publishing, development, and media production, not just here but also in well-known magazines and journals. I’m a keynote speaker at conferences and do a lot of training and consulting work. Since I’m originally a print person, I was involved in developing industry guidelines for PDFX-ready. I co-authored the book “Agile Publishing”, still the 400-page reference work on how agile processes move user experience and storytelling into the spotlight of today’s multichannel world. I’m living at the intersection of design, content, technology and marketing. How hypes can be moved into practical use is what drives me every day.
www.xing.com/profile/Georg_Obermayr
www.linkedin.com/in/georgobermayr
www.twitter.com/georgobermayr
Buy the book "Agile Publishing" on Amazon
