AI Creativity and the Challenge of Time

Part of PRiSM Future Music #2

Introduction

PRiSM Co-Director Marcus du Sautoy is the Simonyi Professor for the Public Understanding of Science at the University of Oxford and Honorary Visiting Professor at the RNCM. He wrote this new piece especially for PRiSM Future Music #2. As well as reading the article below, Marcus gave an accompanying presentation of this work, which was streamed as part of Future Music #2.

AI Creativity and the Challenge of Time

By Professor Marcus du Sautoy

15 June 2020

When I started writing The Creativity Code, my book exploring the potential of code to be creative in the arts, I was expecting AI to be a decent writer, a pretty good composer, but a poor visual artist. My expectations were driven by a couple of considerations. The current wave of AI is driven by data. There seemed to be a huge data set of written material for an AI to analyse and learn from. Music was similarly rich in data. Both forms also have a discrete quality that gives their evolution a sequential nature: if I have a sequence of words or notes, then there is a discrete choice of what comes next. The visual realm, although rich in data, seemed far trickier to navigate. Traditionally, computer vision has been regarded as one of the great hurdles in creating AI. The brain has evolved over millions of years to take in a huge amount of visual data and interpret it. Music and words are both much younger activities for the human brain.

So it was quite a surprise, as I researched the book, to find my expectations were completely wrong. One of the great successes of machine learning has been to create code that is extremely good at recognising what is in a visual image. By analysing enough labelled images, the code builds up a bank of questions that enables the software to identify what is in an image it hasn’t seen. But if the code is good at recognising visuals, how good is it at creating its own?
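Before turning to creation, a minimal sketch of that recognition side may be useful, assuming a recent version of PyTorch/torchvision and a hypothetical local image file: a network pretrained on labelled photographs is simply asked what it sees in a picture it has never encountered.

```python
# A minimal sketch of image recognition with a pretrained network.
# Assumes torchvision >= 0.13; "photo.jpg" is a hypothetical local file.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)   # trained on labelled ImageNet photos
model.eval()
preprocess = weights.transforms()          # the resizing/normalisation the model expects

image = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1)

top_prob, top_idx = probs.max(dim=1)
print(f"predicted: {weights.meta['categories'][top_idx.item()]} "
      f"(probability {top_prob.item():.2f})")
```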

There have been some very successful projects that have learnt the underlying style of an artist well enough to reproduce more of the same. The Next Rembrandt project produced a very convincing portrait in the style of Rembrandt that often splits an audience when I show it next to a real Rembrandt.

Figure 1: Which image is the AI Rembrandt?

But for me, what is more interesting is code that can break out of current styles and offer something new. The idea of a Generative Adversarial Network, or GAN, pits two algorithms against each other in a creator-versus-discriminator contest: the creator is tasked with producing visuals that break the mould, but not so much that we no longer recognise what is produced as art.
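A minimal sketch of that creator-versus-discriminator loop may make the mechanics concrete. The example below is written in PyTorch on a toy two-dimensional data set (points on a circle) rather than on images, purely so it stays self-contained; it is the generic GAN recipe, not the code behind any of the projects discussed here.

```python
# Toy GAN: a creator (generator) learns to produce samples that a
# discriminator can no longer tell apart from the 'real' data.
import math
import torch
import torch.nn as nn

def real_batch(n=128):
    """Sample 'real' data: points on the unit circle plus a little noise."""
    angles = torch.rand(n, 1) * 2 * math.pi
    pts = torch.cat([torch.cos(angles), torch.sin(angles)], dim=1)
    return pts + 0.05 * torch.randn(n, 2)

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    # 1. The discriminator learns to tell real points from generated ones.
    real = real_batch()
    fake = generator(torch.randn(128, 8)).detach()
    d_loss = (loss_fn(discriminator(real), torch.ones(128, 1)) +
              loss_fn(discriminator(fake), torch.zeros(128, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2. The creator learns to produce points the discriminator accepts as real.
    fake = generator(torch.randn(128, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(128, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

print(generator(torch.randn(5, 8)))  # five newly 'created' points near the circle
```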

For me, this combination of one algorithm creating new proposals, which are then critiqued by the discriminator algorithm, reflects much about human creativity. Paul Klee wrote about this process in his own work: “Already at the very beginning of the productive act, shortly after the initial motion to create, occurs the first counter motion, the initial movement of receptivity. This means: the creator controls whether what he has produced so far is good.”

I have been working closely over the last two years with the Mexican artist Eduardo Terrazas. I have been particularly fascinated by an image, which he calls The Cosmos, that he has used extensively in his work over the decades. He has taken this basic diagram and interpreted it in many different ways.

Figure 2: Cosmos by Eduardo Terrazas

Given the many decades of work Terrazas has produced, I believed there was enough data for a GAN to train on, with the chance of it offering something new.

I took a piece of code called playform.io, developed by Ahmed Elgammal at Rutgers, and trained it on Terrazas’s work. What was exciting for me was that the code understood the diagram underlying the images but offered its own very distinctive take on the structure.

Figure 3: Output from playform.io trained on Terrazas’s Cosmos images.

But as I have experimented more with the code and explored other outputs based on different data sets, I have begun to recognise that the code has its own particular style, which is beginning to emerge. So much so that when I have shown these images to people who have seen a fair amount of AI-generated art, that style gives them the key to distinguishing the AI images from the human art I show alongside them.

Although AI has been quite successful in the visual realm, something I was not expecting, it has in my opinion been far less successful when it comes to music and the written word. The sequential nature of both initially gave AI a way to generate its own examples. By learning the kinds of things that tend to follow particular sequences of words or notes, the AI could offer ways to continue a sentence or phrase that locally made quite a lot of sense. This is why you will find some pretty successful examples of short-form text generation, or, for example, a piece of code like the Jazz Continuator that is able to riff in real time with a human jazz musician, taking a phrase offered by the human and replying with something that makes musical sense as a response.
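A toy sketch of that “learn what tends to follow what, then continue the phrase” idea is below. It is a simple Markov chain over hypothetical MIDI note numbers, not the actual Continuator algorithm, but it shows why such systems can sound locally plausible.

```python
# A toy 'continuator': learn which notes tend to follow which, then reply
# to a phrase with a locally plausible continuation. (A simple Markov chain,
# not the real Continuator algorithm.)
import random
from collections import defaultdict

def train(phrases):
    """For each note, record which notes follow it in the training phrases."""
    table = defaultdict(list)
    for phrase in phrases:
        for a, b in zip(phrase, phrase[1:]):
            table[a].append(b)
    return table

def continue_phrase(table, phrase, length=8):
    """Extend a phrase by repeatedly sampling a plausible next note."""
    out = list(phrase)
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return out

# Hypothetical training data: phrases as lists of MIDI note numbers.
corpus = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60], [60, 64, 67, 72]]
table = train(corpus)
print(continue_phrase(table, [60, 62]))   # e.g. [60, 62, 64, 65, 67, ...]
```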

Last year, in collaboration with Robert Laidlow, we reproduced as part of the 2019 Future Music session at the RNCM an experiment we first created at the Barbican in London, in which an AI trained on Bach’s keyboard works was able to fill in the gaps in one of the English Suites from which chunks had been removed. The audience found it very difficult to distinguish between the AI and the real Bach in this hybrid piece. Interestingly, the one person who could tell was the pianist: the AI was difficult to play because it did not sit under the fingers as smoothly as the Bach did. Since the AI is not embodied, it doesn’t care about fingering.

The other interesting comment that Mahan Esfahani, the keyboard player at the Barbican event, made to me about our experiment is that one has to remember why Bach called these suites English, as opposed to the French Suites or the Italian Concerto. For Bach, language was a key ingredient in his compositional process even when there were no words. The English Suite is meant to capture something of the cadence of the language. We had failed to include this in the dataset for the AI to learn from. And of course this is always going to be one of the problems with hoping to get code to be genuinely and interestingly creative. Our data set, when we as humans are creating, is based not just on visuals or music or literature but on a huge fusion of all these influences and more. Quite often the AI will fail to mirror our creativity because of the narrow nature of its learning process. This is one reason AI so often has difficulty interpreting language: the interpretation is informed by more than just the written word. History, culture and social context are also key data sets.

There are examples of sentences, known as Winograd schemas, which illustrate the limitations of learning language just from data of well-formed sentences without wider context. Take the sentence: “The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.” The choice of feared or advocated clearly changes what the word “they” refers to. A human will be able to unpick this through context and prior knowledge. A machine has extreme difficulty resolving the ambiguity.
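For the curious, here is one rough way to probe a text-only model on such a sentence, assuming the Hugging Face transformers library and the small GPT-2 model: spell out each reading of the pronoun and let the model score it by its own likelihood. Nothing in this setup supplies the real-world knowledge, that councils tend to fear violence while demonstrators might advocate it, which humans bring to the sentence.

```python
# Probe a text-only language model on a Winograd-style sentence by scoring
# each explicit reading of the pronoun. Assumes the 'transformers' library.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def score(sentence):
    """Average negative log-likelihood under the model (lower = more likely)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

for verb in ["feared", "advocated"]:
    for referent in ["councilmen", "demonstrators"]:
        sentence = (f"The city councilmen refused the demonstrators a permit "
                    f"because the {referent} {verb} violence.")
        print(f"{verb:9s} -> {referent:13s}: loss {score(sentence):.3f}")
```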

But for me the defining challenge for AI is to create coherent long-form prose or music. When algorithms attempt to go beyond the successes of locally generating jazz riffs, bursts of Bach or short-form text, they start to struggle. The Jazz Continuator is boring after five minutes of noodling; text starts to lose the plot beyond 350 words. The careful overarching structure of a piece of Bach can’t be learned from short bursts of notes. Although the sequential nature of these art forms gave AI a way to start creating, it is also perhaps the reason for its failure to go further.

There have already been proposals to move away from analysing music via a symbolic representation, which treats it like the written word, and instead to analyse the raw audio that emerges when that symbolic code is played. MuseNet, the algorithm created by OpenAI that we used to realise our Bach project, works in the former mode, processing and generating symbolic music in the form of MIDI files. DeepMind’s WaveNet, in contrast, takes audio files as the data set it learns from. (More can be found in Christopher Melen’s blog post on the PRiSM website, A Short History of Neural Synthesis.)
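To make the distinction concrete, here is a small sketch of the two kinds of training data, assuming the pretty_midi and librosa libraries and hypothetical file names: discrete note events read from a MIDI file on one side, and the raw audio samples of a recording on the other.

```python
# Symbolic versus raw-audio representations of the same piece of music.
# File names are hypothetical; assumes pretty_midi and librosa are installed.
import pretty_midi
import librosa

# Symbolic route (MuseNet-style): a list of discrete note events.
midi = pretty_midi.PrettyMIDI("english_suite.mid")
notes = [(note.pitch, note.start, note.end)
         for instrument in midi.instruments
         for note in instrument.notes]
print(f"{len(notes)} note events, e.g. {notes[:3]}")

# Raw-audio route (WaveNet-style): a long stream of waveform samples.
waveform, sr = librosa.load("english_suite.wav", sr=16000)
print(f"{len(waveform)} audio samples at {sr} Hz ({len(waveform) / sr:.1f} seconds)")
```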

For me it is the time element, which music and to a certain extent literature exploit but which is missing from a static visual, that makes these disciplines different in quality from visual art. It may be that we have to find ways of analysing music a-temporally, as if we were looking at it all in one go rather than listening to it over time. Doing that might help build an AI that can create globally interesting music. Taking some of the successful algorithms from the visual realm and interpreting them musically will, I think, be a fascinating way to approach this challenge.

Take, for example, Google’s wonderful X Degrees of Separation. Using machine learning techniques that analyse the visual features of artworks, X Degrees of Separation finds pathways between any two artefacts, connecting them through a chain of artworks. The algorithm takes one on a scenic route full of surprising connections and masterful works by unknown artists, revealing the hidden beauty of mundane objects.

Figure 4: X Degrees of Separation, from the Brygos Painter to Rembrandt.
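The underlying recipe, as I understand it, is roughly the following (a hedged sketch of the general approach, not Google’s actual pipeline, with hypothetical file names): embed every artwork with a pretrained vision model, link each image to its visually nearest neighbours, and then walk the shortest path between any two chosen works.

```python
# Sketch of an X-Degrees-of-Separation-style pathway between artworks:
# embed images, build a nearest-neighbour graph, find the shortest path.
# File names are hypothetical; assumes torchvision >= 0.13 and networkx.
import networkx as nx
import numpy as np
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
backbone = torch.nn.Sequential(*list(models.resnet50(weights=weights).children())[:-1])
backbone.eval()
preprocess = weights.transforms()

def embed(path):
    """Return a feature vector describing the visual content of one image."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return backbone(x).flatten().numpy()

artworks = ["brygos_cup.jpg", "artwork_001.jpg", "artwork_002.jpg", "rembrandt.jpg"]
vectors = np.stack([embed(p) for p in artworks])

# Connect each artwork to its k most visually similar neighbours.
graph = nx.Graph()
k = 2
for i, vi in enumerate(vectors):
    dists = np.linalg.norm(vectors - vi, axis=1)
    for j in np.argsort(dists)[1:k + 1]:        # skip index 0 (the image itself)
        graph.add_edge(artworks[i], artworks[int(j)], weight=float(dists[j]))

print(nx.shortest_path(graph, "brygos_cup.jpg", "rembrandt.jpg", weight="weight"))
```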

Is there, for example, a way to map musical compositions into a high-dimensional space such that we can chart a chain of connections taking us from Bach’s English Suites to Stormzy’s Shut Up? In my collaboration with Emily Howard on Four Musical Proofs and a Conjecture, we explored a version of this idea, in which Emily composed a path from Beethoven’s Op. 130 to Schubert’s Death and the Maiden.

One of the challenges here is that although visuals are comparatively low-dimensional by nature, music is a hugely high-dimensional shape to encode. The time element again means that finding a way to say whether two pieces of music are “close” in a parameter space would be extremely challenging. But for me it is cracking this challenge of a global appreciation of a work of literature or music that is the next hurdle in creating an AI that goes beyond a clever AI soundbite.
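To make the difficulty concrete, here is one naive stab at musical “closeness”, sketched under loose assumptions (hypothetical audio files, librosa installed): compare the chroma, or pitch-class, profiles of two recordings over time using dynamic time warping. It captures only one narrow dimension of what makes two pieces feel related, and already involves aligning objects that unfold over very different time scales.

```python
# A naive 'distance' between two pieces: align their chroma (pitch-class)
# profiles over time with dynamic time warping. File names are hypothetical.
import librosa

def chroma(path):
    """Return the chroma feature matrix (12 pitch classes x time frames)."""
    y, sr = librosa.load(path, sr=22050)
    return librosa.feature.chroma_cqt(y=y, sr=sr)

a = chroma("bach_english_suite_no2.wav")
b = chroma("stormzy_shut_up.wav")

# Accumulated alignment cost: lower means the pitch content lines up more easily.
D, _ = librosa.sequence.dtw(X=a, Y=b, metric="cosine")
print(f"alignment cost: {D[-1, -1]:.1f}")
```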

Resources

https://experiments.withgoogle.com/x-degrees-of-separation

https://www.playform.io/

https://www.rncm.ac.uk/research/research-centres-rncm/prism/prism-blog/a-short-history-of-neural-synthesis/

https://www.nextrembrandt.com/

About the Author

Marcus du Sautoy is the Simonyi Professor for the Public Understanding of Science at the University of Oxford and Honorary Visiting Professor at the RNCM, where he is Co-Director of PRiSM. He is author of The Creativity Code (Fourth Estate 2019).