December 2022
Kartavya Gupta received his Master of Science in Exact Humanities (EH). His research was supervised by Dr. Ashwin Jayanti. Here’s a summary of his research on Improvised Sequence Generation in North Indian Classical Music:
In recent years there has been a great deal of research on media synthesis and generation using artificial intelligence. Programs like DALL-E can generate an image from a text phrase. StyleGAN from Nvidia can create images of the faces and bodies of people who do not exist. Deep-learning-based video manipulation can replace one person’s likeness with another’s. Programs like GPT-3 can generate text of such quality that it is hard to tell whether a human wrote it. There has also been much progress in generating music with artificial intelligence. MuseNet by OpenAI is a deep neural network that can produce 4-minute musical pieces with different instruments and styles. Jukebox by OpenAI is a neural network that produces music, including simple singing, as raw audio for a few genres and artistic styles. WaveNet from DeepMind, a generative model of raw audio waveforms, can generate background or ambient music. Much of this research takes Western popular and classical music as its subject, generating new performances from an artist’s earlier performances, but there is no research on generating improvised sequences for North Indian classical music.
This thesis presents three methods for generating improvised sequences in North Indian classical music. The first method uses a context-free grammar to generate these sequences. We discuss how sequences are generated with this method and the problems it raises, and we illustrate it by constructing a context-free grammar for raga Yaman.
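As a rough illustration of how a context-free grammar can produce swara sequences, here is a minimal sketch. The grammar itself, its nonterminal names (`PHRASE`, `ASCENT`, `DESCENT`, `CATCH`), and the swara notation (`ma'` for tivra Ma, `Ni.` for lower-octave Ni, `Sa'` for upper Sa) are hypothetical and are not the raga Yaman grammar from the thesis.

```python
import random

# Hypothetical toy grammar: UPPERCASE keys are nonterminals, everything else is a swara.
# Notation (an assumption): "ma'" = tivra Ma, "Ni." = lower-octave Ni, "Sa'" = upper Sa.
GRAMMAR = {
    "PHRASE": [["ASCENT", "DESCENT"], ["CATCH"], ["ASCENT", "CATCH"]],
    "ASCENT": [["Ni.", "Re", "Ga"], ["Ni.", "Re", "Ga", "ma'", "Dha", "Ni", "Sa'"]],
    "DESCENT": [["Sa'", "Ni", "Dha", "Pa", "ma'", "Ga", "Re", "Sa"]],
    "CATCH": [["Ni.", "Re", "Ga", "Re", "Sa"], ["Pa", "Re", "Ni.", "Re", "Sa"]],
}


def expand(symbol, rng):
    """Recursively rewrite `symbol`; anything not in GRAMMAR is a terminal swara."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])  # pick one right-hand side at random
    sequence = []
    for sym in production:
        sequence.extend(expand(sym, rng))
    return sequence


rng = random.Random(0)
for _ in range(3):
    print(" ".join(expand("PHRASE", rng)))
```

Choosing among productions uniformly is a simplifying assumption; attaching weights to the productions (a probabilistic grammar) would let some phrases occur more often than others.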
Our second method uses a bi-gram language model to generate improvised sequences. To evaluate this model, we introduce the Pt. Ajoy Chakraborty Bhoopali Compositions Dataset, in which every composition is by a single artist, Pandit Ajoy Chakraborty. On this dataset, the bi-gram model achieves a Top-1 accuracy of 46.63 percent and a Top-3 accuracy of 83.50 percent. We discuss how to qualify and generate new improvised sequences using this model, as well as its disadvantages.
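A minimal sketch of a bi-gram model over swaras, with Top-k accuracy and sampling-based generation, might look like the following. The function names and the three training phrases are hypothetical toy data in Bhoopali’s swaras, not taken from the Pt. Ajoy Chakraborty dataset, and details such as smoothing are omitted.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each swara is immediately followed by each other swara."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def top_k(counts, prev, k):
    """The k most frequent successors of `prev` (empty if `prev` was never seen)."""
    return [s for s, _ in counts.get(prev, Counter()).most_common(k)]


def top_k_accuracy(counts, sequences, k):
    """Fraction of (prev, next) pairs whose true next swara is among the top-k predictions."""
    hits = total = 0
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            total += 1
            hits += nxt in top_k(counts, prev, k)
    return hits / total if total else 0.0


def generate(counts, start, length, rng):
    """Sample a sequence, drawing each next swara in proportion to bigram counts."""
    seq = [start]
    while len(seq) < length and counts.get(seq[-1]):
        successors = counts[seq[-1]]
        seq.append(rng.choices(list(successors), weights=list(successors.values()))[0])
    return seq


train = [
    ["Sa", "Re", "Ga", "Pa", "Dha", "Sa'"],
    ["Sa'", "Dha", "Pa", "Ga", "Re", "Sa"],
    ["Ga", "Re", "Ga", "Pa", "Ga", "Re", "Sa"],
]
model = train_bigram(train)
print(top_k(model, "Ga", 3))  # ['Re', 'Pa']
print(top_k_accuracy(model, train, 1), top_k_accuracy(model, train, 3))
print(generate(model, "Sa", 8, random.Random(1)))
```

Because a Top-3 hit includes every Top-1 hit, Top-3 accuracy can never be lower than Top-1 accuracy. In practice the accuracy would be measured on held-out compositions rather than on the training phrases used here.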
Our third method uses an LSTM-RNN-based model to generate improvised sequences. We discuss the motivation for this method, the model architecture, and the training settings. Evaluated on the Pt. Ajoy Chakraborty Bhoopali Compositions Dataset, our first LSTM-RNN-based model achieves a Top-1 accuracy of 45.56 percent and a Top-3 accuracy of 80.42 percent. We discuss how to qualify and generate new improvised sequences using this model, as well as the disadvantages of this method.
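To show what an LSTM-RNN next-swara predictor computes, here is a minimal dependency-free sketch of the forward pass: one-hot swara input, a single LSTM layer, and a softmax over the vocabulary. The class name `TinyLSTM`, the hidden size, and the layer layout are illustrative assumptions, not the architecture described in the thesis, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTM:
    """Hypothetical single-layer LSTM + softmax over swaras (untrained, random weights)."""

    def __init__(self, vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.V, self.H = len(self.vocab), hidden

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate (input, forget, candidate, output) acting on [x_t ; h_{t-1}].
        self.gates = {name: (mat(hidden, self.V + hidden), [0.0] * hidden) for name in "ifgo"}
        self.W_out, self.b_out = mat(self.V, hidden), [0.0] * self.V

    def step(self, token, h, c):
        x = [1.0 if t == token else 0.0 for t in self.vocab]  # one-hot encoding of the swara
        xh = x + h
        pre = {name: [a + b for a, b in zip(matvec(W, xh), bias)]
               for name, (W, bias) in self.gates.items()}
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def next_distribution(self, context):
        """Run the LSTM over `context` and return P(next swara) as a dict."""
        h, c = [0.0] * self.H, [0.0] * self.H
        for token in context:
            h, c = self.step(token, h, c)
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return {t: e / total for t, e in zip(self.vocab, exps)}

    def top_k(self, context, k):
        dist = self.next_distribution(context)
        return sorted(dist, key=dist.get, reverse=True)[:k]


model = TinyLSTM(["Sa", "Re", "Ga", "Pa", "Dha", "Sa'"])
print(model.top_k(["Sa", "Re", "Ga"], 3))
```

Training is omitted here; it would typically minimize cross-entropy on the true next swara via backpropagation through time, usually in a framework such as PyTorch or Keras. Until trained, the predictions above are arbitrary. Unlike a bi-gram model, the hidden state `h` carries information from the whole preceding context, not just the last swara.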
Finally, we compare the three methods, discuss the concepts of creativity and co-creativity, and outline future directions for this work.