speak, sing…

26 Feb 2021

An Encounter with the Artificial

José del Avellanal Carreño in rehearsal with Oliver Janes

RNCM fourth-year undergraduate student José del Avellanal Carreño (RNCM School of Composition) writes about his collaboration with the Birmingham Contemporary Music Group (BCMG) clarinettist, Oliver Janes, using the software PRiSM SampleRNN in his compositional process. This is the first work from an ongoing partnership between PRiSM and BCMG.

José’s work focuses on musical narrative, the interrogation of ‘human music’, and the exploration of its problematic boundaries. He is the first RNCM student to complete a work using PRiSM SampleRNN in collaboration with PRiSM Research Software Engineer, Dr Christopher Melen.

The premiere of ‘speak, sing…’ was recorded live on 26 January 2021 at CBSO Centre, Birmingham. This piece is a BCMG commission for ‘Soliloquies & Dialogues’ in partnership with PRiSM. Released 5 March 2021.

Introduction

One of the most important concepts behind my recent compositional work is the idea of ‘human music’, a notion which I first encountered through my interest in chess. Due to the incredible power of present-day chess engines, terms such as ‘human’ and ‘computer’ moves have become commonplace jargon amongst commentators and analysts, describing how likely a real-life player would be to find such moves without technological assistance.

I found myself intrigued and fascinated by this notion, and it didn’t take long for this idea to find its way into my musical thinking – what is it that we understand as ‘human’ in the realm of music? What constitutes the ‘non-human’? How does one attempt to define the divide between the two? What lies in between? These challenges and provocations offered me exciting territory for the exploration of musical ideas. The announcement of a new call for proposals by PRiSM and BCMG inspired me to shape a new approach to this subject, based around PRiSM’s newest technological development in Machine Learning – the neural network PRiSM SampleRNN.

It is worth noting that, despite their innocent appearance, there is no simple, immediate answer to the aforementioned questions. The idea of ‘human’ in music allows for a great variety of interpretations, due to all of the possible musical parameters that can be submitted to scrutiny in our considerations, but also to the inherently referential – and perhaps to an extent, subjective – nature of the notion of ‘human’. There may not be a satisfactory solution to these complex issues, but to me, that does not invalidate the legitimacy of these questions as a valuable creative stimulus. In fact, I find their openness to be a really attractive quality, an invitation to a wide array of exciting possibilities which I aspire to explore in my work.

With this in mind, I drafted and developed a proposal for a new work for clarinet and electronics. My strategy was to contrast my own music with that generated by PRiSM SampleRNN – representing the ‘human’ and ‘non-human/artificial’ domains, respectively. I would then interrogate these terms, closing the gap between them and bringing them together. In this way, the division between the opposites – human and artificial – is created only to be subsequently questioned and challenged.

A Quick Word on SampleRNN

Before getting into further detail about the compositional process of the piece, it is worth explaining how PRiSM SampleRNN works. As PRiSM Research Software Engineer Dr Christopher Melen explains in his highly informative article A Short History of Neural Synthesis, SampleRNN is an ‘end-to-end’ system, which takes raw audio samples as input and generates new, original audio. Understanding this is key: the network cannot function without input – it requires pre-existing sonic material to learn from in order to produce new content.

Taking this into consideration, I based my working strategy around a series of recorded improvisations by the performer, which would act as an external stimulus for a series of independent responses – my own and those generated by PRiSM SampleRNN – and provide the main musical material for the construction of the piece. This approach allowed me to define a balanced context for the comparison of ‘human’ and ‘artificial’ in the work, since they share a common starting point, but also gave me the opportunity to have the player’s creative influence informing the compositional process, which is something I find really exciting.

For this purpose, BCMG clarinettist Oliver Janes, for whom this piece was written, recorded two contrasting and highly evocative improvisations, which would become the basis of all the musical material of the piece.

RNCM PRiSM · Audio 1 – Improvisation 1 (snippets)
RNCM PRiSM · Audio 2 – Improvisation 2 (snippets)

*Snippets of improvisations presented with permission from Oliver Janes.

Responses

After a period of intense, repeated listening to Oliver’s improvisations, and the development of a good amount of musical material in response, the moment came to test the same process on PRiSM SampleRNN. For this step, I relied on the invaluable help of Dr Melen, and we experimented with the possibilities of the PRiSM network over an initial test session and a later, more extensive one.

We experimented with parameters such as the amount of input audio, the sample rate, the number of epochs (the number of complete passes the network makes through the input during training) and the ‘temperature’ (a setting that controls the degree of randomness in the generation of new samples), as well as using pre-existing audio samples as a seed for the generation process. I had absolutely no clue what to expect when the different batches of generated audio were sent my way! This is what we got…

RNCM PRiSM · Audio 3 – SampleRNN (improv 1, Sample Rate 22050, Temp 0.3)
RNCM PRiSM · Audio 4 – SampleRNN (improv 1, Sample Rate 16k, Epoch 90, Temp 0.6)
RNCM PRiSM · Audio 5 – SampleRNN (improv 1, Sample Rate 16k, Epoch 140, Random Seed, Temp 0.2)

Please turn down your volume before playing this one!

RNCM PRiSM · Audio 6 – SampleRNN (improv 1&2, Sample Rate 16k, Temp 0.9)
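
The ‘Temp’ values in the audio titles above refer to the temperature setting. As a rough illustration of what such a setting typically does in a sampling-based generator, here is a minimal Python sketch. It assumes that the network outputs a score for each possible quantised amplitude level and that the temperature divides those scores before they are turned into probabilities; the function names and the example scores are hypothetical and are not taken from the PRiSM SampleRNN code.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_level(scores, temperature, rng):
    """Draw the index of one amplitude level from the adjusted distribution."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for four amplitude levels (illustrative values only).
example_scores = [2.0, 1.0, 0.5, 0.1]
rng = random.Random(0)

for temp in (0.2, 0.6, 0.9):
    probs = softmax_with_temperature(example_scores, temp)
    level = sample_level(example_scores, temp, rng)
    print(temp, [round(p, 3) for p in probs], level)
```

In this sketch, a low temperature concentrates almost all of the probability on the highest-scoring level, so the output is more predictable, while a temperature closer to 1 spreads the probability more evenly and lets less likely levels through more often.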

These results were indeed surprising, and – as you can hear – presented various degrees of success. The amount of input audio available – roughly 13 minutes – is likely to have posed some problems, such as making the network prone to overfitting. With hindsight, it would have been worth asking for longer improvisations in my instructions to Oliver.

Nonetheless, my first reaction to hearing the audio samples was actually relief! Due to the limited input material available, there was no guarantee that the network would be able to learn enough to generate anything apart from noise and silence. Despite the degraded audio quality and sudden bursts of noise, the PRiSM SampleRNN responses overall contained fascinating musical material to work with, which in a wonderfully uncanny fashion evoked the sound of the clarinet.

Some changes in my approach might have produced less convoluted results, but nevertheless I found myself with a substantial amount of exciting, original, and stimulating content for my piece – and, most importantly, content that contrasted strongly with the set of responses I wrote myself.

Crafting a Piece

Building on the ideas expressed in my proposal, I decided to shape my piece around a general narrative focused on the encounter between the ‘human’ and ‘artificial’ worlds, and how this contact sets off a series of exchanges that progressively close the divide between the two. Using both sets of responses, I assembled my distinct musical environments – ‘human’/clarinet and ‘artificial’/electronics – and started planning out different strategies for bringing them closer together.

I was really attracted to the thought of both domains learning from each other, expanding their musical environments through the integration of elements from the other. This approach presented the main creative challenge, which I came to identify with the idea of ‘translation’: how to express the ‘artificial’ soundworld through the ‘human’ medium (live clarinet), and vice versa. Ultimately, Oliver’s recorded improvisations became my means for introducing ‘human’ characteristics into the electronic realm, and I used transcriptions of the audio generated by PRiSM SampleRNN to incorporate ‘artificial’ features into the live clarinet part.

In summary, the work presents two simultaneous levels of narrative: the dialogue-like interactions between the two domains and the internal transformations produced as a result. These transformations take place independently and at different rates in the two realms; by the end of the piece, the electronics seem to have reached the climax of their exchange, while the clarinet is still at an early stage of its encounter with the ‘artificial’ world.

Final Thoughts

Working on this project, while at times challenging (I do not want to think about how many hours I may have spent transcribing recordings and cleaning audio…), has been a fantastic and hugely rewarding experience. I feel truly honoured to have been so actively involved with PRiSM’s wonderfully exciting technology, a fascinating creative tool that surprised me, confused me, intrigued me, inspired me, and allowed me to push my ideas further in a new, refreshing way.

Today’s ongoing research in the field of AI/ML holds enormous potential for artistic application. PRiSM SampleRNN and its “delirious, slightly broken, angular, inscrutable and ambiguous” outputs (quoting from PRiSM Lecturer in Composition Dr Sam Salem’s article A Psychogeography of Latent Space) offer a glimpse into this fascinating new universe. As a technology under continuous development and refinement, its future prospects look incredibly attractive, and I truly look forward to seeing how other composers and creatives will integrate it within their work, or use it to expand their artistic vision.

Transcriptions and responses for ‘speak, sing…’

Acknowledgements

I would like to express my gratitude to PRiSM Director Professor Emily Howard and BCMG Artistic Director Stephan Meier for their invaluable help in making this project a reality. I would also like to thank Oliver Janes for his fantastic improvisations, the very cornerstone of the piece, stimulating and instructive in equal measure. Last, but not least, I wish to send a wholehearted thank you to Dr Christopher Melen, for his guidance and support in every step of the process, as well as his endless patience with my many questions and queries – spanning hours of Zoom meetings and an obscene number of emails. Thank you for making this piece possible.

The short film below shares insights from José and Oliver’s rehearsal for the performance of ‘speak, sing…’. Compare the PRiSM SampleRNN output that José shares above with the ‘live’ clarinet that he wrote for Oliver.
