2024-10-18

Ai music composition review

This review simplifies the Ai process to show exactly what it does and how it fits into creating a generated composition like ARBOREAL. The systematic procedure used to compose ARBOREAL auditioned GPT3 & LSTM models, tested their prediction abilities, and studied their differences. Both model types were examined and used in the song's composition, completing predictions on single instruments, duets, and the full three-piece band structures produced by the algorithms. Both models make weighted-random next-note choices by constructing probabilities from the musical structures of the midi input in the datasets they train on: midi compositions converted to text (symbolic) form, then concatenated or ordered into directory datasets. The more variation a dataset contains grounded in a single concept, like the key of C in the case of ARBOREAL, the more accurate the modeling of the probability choices available for training. Most of the time invested in this project went into constructing original datasets. The final datasets were built from both original pure data generations & Ai predictions. These ongoing additions to the score were interrogated for thematic ideas, and as these thematic datasets advanced, so did the desired outcome. Developing a themed dataset, a database built from a more select group of the data, advances the dataset and thus advances the composition. In our process we found that datasets didn't need to contain everything the models had learned; on the contrary, by removing data we found the prediction models returned better results that advanced more rapidly toward the song's conclusion.
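The weighted-random next-note choice described above can be sketched in a few lines of Python. The note strings below are invented stand-ins for midi scores already converted to text; this is a toy illustration of the probability tables both models build, not the actual GPT3 or LSTM machinery.

```python
import random
from collections import Counter, defaultdict

# Hypothetical symbolic scores: midi files already converted to
# space-separated note tokens, one string per file in the dataset.
dataset = [
    "C4 E4 G4 C5 G4 E4 C4",
    "C4 G4 E4 C5 E4 G4 C4",
    "C4 E4 G4 E4 C4 G4 C5",
]

# Build conditional next-note counts: how often each note follows another.
transitions = defaultdict(Counter)
for score in dataset:
    notes = score.split()
    for cur, nxt in zip(notes, notes[1:]):
        transitions[cur][nxt] += 1

def predict_next(note):
    """Weighted-random next-note choice, as both model types make."""
    counts = transitions[note]
    choices, weights = zip(*counts.items())
    return random.choices(choices, weights=weights, k=1)[0]

print(predict_next("C4"))  # e.g. "E4" or "G4", weighted by frequency
```

Removing files from `dataset` changes the counts, and thus the choices, which is the effect we exploited by pruning data between generations.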

Using original generated midi as the input data for our Ai training datasets, creating thematic datasets that promote directional change in the song's prediction outcomes, removing first-generation data to advance prediction outcomes... there is a possibility that we (on the KicKRaTT project) are the first to do this.

All other artists and commercial productions for or containing Ai music have employed training datasets built from commercial or historical material: known midi scores and obvious samples. The dataset-building process is a constant in an Ai model's development. And lastly, there is a conversation out there that Ai models will require more and more power from the grid to maintain their datasets. This is untrue. Datasets just need to be managed, with old data thrown out to give rise to new data, which in turn gives rise to new outcomes. If you keep the same old data for the needs of now, you risk getting the same old answer.

On that note... totally theorizing here... could an Ai model's knowledge and ability become stagnant if its dataset wasn't managed? Creating a sort of stuck-in-its-ways Ai, a stubborn Ai model personality, if you will... food for thought???

The GPT3 & LSTM models do essentially the same thing, though the LSTM considers previous notes.

Consider the words of Christine McLeavey Payne (CMP) https://christinemcleavey.com/
Interview with CMP https://medium.com/dsnet/interview-with-openai-fellow-christine-mcleavey-payne-aaef948ad571

From the interview...
"On one hand, it’s hard to see an LSTM as creative. Behind the scenes, my model is just predicting how likely it thinks each note is for the upcoming step. It then randomly chooses between those notes, based on that probability. I can increase the randomness and maybe it will seem more creative, but it’s still just based on patterns in the training data."

"On the other hand, maybe human creativity is similar in ways. We all have building blocks that we put together into patterns — often usual, predictable ones, and then sometimes wacky random ones. Maybe the most important thing about human creativity is our ability to recognize the times when our wacky and random experiments are actually beautiful art."

On the one hand, she describes LSTM prediction at its simplest, and every Ai composer should give this credence. CMP just took the mystery right out of Ai and brought it down to earth as the tool that it is.
On the other hand, she makes a very insightful statement about human creativity. I love it.

The GPT3 model predicts along the same lines as the pure data algorithms generate, from its analysis of the patterns found in the midi-to-text datasets. After training on our pure data datasets, we found our GPT3 model's ARBOREAL predictions were determined by almost the same sets of conditional probabilities as the original algorithm structures. The probability differences between the original pure data algorithm and the GPT3 probabilities (determined with datasets of 100 or more files) were within +/- 10. If our algorithm generated, in key, A then E, back to D, the GPT3 model produced the outcome E then D, back to A. In our examination of the GPT3 predictions, the model simply produced an alternative to the pure data generated original, with a rough estimate of a thirty percent chance of the GPT3 model predicting the same notes in the same order as the original score.
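A rough figure like that thirty percent chance is just the product of the conditional probabilities along the sequence. A small sketch, with probabilities invented for illustration rather than measured from the ARBOREAL datasets:

```python
# Invented conditional probabilities, purely for illustration.
probs = {
    ("A", "E"): 0.6,  # P(next = E | current = A)
    ("E", "D"): 0.5,  # P(next = D | current = E)
}

def chance_of_exact_match(sequence):
    """Multiply the conditional probabilities along the sequence."""
    p = 1.0
    for cur, nxt in zip(sequence, sequence[1:]):
        p *= probs[(cur, nxt)]
    return p

print(round(chance_of_exact_match(["A", "E", "D"]), 2))  # 0.3
```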

We were able to define in the GPT3 model a template scale in the key of C for the drum kit prediction.

The LSTM model adds to the GPT3 model the additional functionality of memory, hence its name: Long Short-Term Memory. So how does this memory change the outcome of a prediction? Using the A to E to D example, the LSTM gives credence to the first note in the prediction, A, and from the patterns of the dataset determines that D is the more consistently chosen next note over E, and that after D the more probable choice is E. So from the original A to E to D score, the LSTM predicts A to D to E. The LSTM makes a more considered next-note choice than the general probability choices of the GPT3 model.
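As a toy stand-in for that memory (not a real LSTM), here is a next-note choice conditioned on the previous two notes instead of only the current one; the note sequences are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented note sequences, standing in for scores in a training dataset.
scores = [
    ["A", "D", "E", "A", "D", "E"],
    ["A", "D", "E", "A", "E", "D"],
    ["A", "D", "E", "D", "E", "A"],
]

# history (previous two notes) -> counts of the note that followed
memory = defaultdict(Counter)
for notes in scores:
    for i in range(len(notes) - 2):
        memory[(notes[i], notes[i + 1])][notes[i + 2]] += 1

def most_likely_next(prev, cur):
    """Pick the note most consistently chosen after this two-note history."""
    return memory[(prev, cur)].most_common(1)[0][0]

print(most_likely_next("A", "D"))  # "E" in this toy dataset
```

A real LSTM learns this kind of history-dependence from the data rather than counting it explicitly, but the effect on the prediction is the same: the choice after D depends on what came before D.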

In the end, both can yield the same result; it's just the way in which they go about it. The predictive GPT3 model looks for the same or an alternative pattern of the original notes in the dataset and presents this as its predicted outcome. The LSTM model predicts the next note played giving credence to the previous note played. For our Ai 2024 entry, the first primary dichotomy found between the files in our datasets was between the randomly generated and expression-generated midi scores produced with pure data. The second variation of file dichotomy came from incorporating our studio pianist's musicality into the algorithm's conditional statements. The third dichotomy was the difference between these two prediction models (GPT3 & LSTM) created for our datasets, which was never expressed in our Ai song process document.

song-build-03

ISMIR report
https://ismir.net/
https://transactions.ismir.net/

What is the ISMIR?
The ISMIR is the International Society for Music Information Retrieval, a non-profit organization seeking to advance research in the field of music information retrieval (MIR), a field that aims at developing computational tools for processing, searching, organizing, and accessing music-related data.

The ISMIR grounds itself in the IEEE code of standards
https://www.ieee.org/about/corporate/governance/p7-8.html

ISMIR endeavors are brought to you by...

music.ai(music.ai)
adobe(adobe.com)
suno(suno.com)
riffusion(riffusion.com)
pro sound effects(prosoundeffects.com)
deezer(deezer.com/us/)
steinberg(steinberg.net)
Yamaha(usa.yamaha.com)
cochl.ai(cochl.ai)
Dolby(dolby.com)
Netflix(research.netflix.com)
Audible Magic(audiblemagic.com)
bmat(bmat.com)
yousician(yousician.com)
* The ISMIR has been sponsored by different companies each year over the past decade, with a smaller group of consistent sponsors (adobe, steinberg, dolby).

Ask the question: why are all of these companies interested in the field of music information retrieval? The ISMIR conference is educated each year through many technical research papers submitted by teams, students & researchers of Ai technologies, applications and real-world use in the realms of digital audio. In my research I have yet to see a paper on analog audio, so it's a safe bet that the ISMIR is basically about anything that can be done with digital audio. That's a lot of sponsored effort toward developing commercial audio tools. But if we are at the precipice of a new realm of audio, I guess I can understand all of the interest. Three-dimensional audio? 32-bit float recording? Sample frequency analysis? Yes, there is a lot going on in audio.

The entire list of papers submitted this year, 2024, can be seen here.
https://ismir2024.ismir.net/accepted-papers

"Human-AI Music Process: A Dataset of AI-Supported Songwriting Processes from the AI Song Contest"

and I found references to ISMIR papers related to the international Ai Song Contest going back to 2019.

Here is a previous paper submitted to the ISMIR about the Ai Song Contest of 2020
https://program.ismir2020.net/static/final_papers/167.pdf

I will link to the 2024 Ai Song Contest paper as soon as it becomes available. But I have been unable to locate any of the Ai Song Contest papers pertaining to the "Human-Ai process of the Ai Song Contest" submitted to the ISMIR after 2020.

In the days following the Ai Song Contest, I came across the graphic presentation below, posted on LinkedIn, depicting the use of Ai technologies by contest participants, or how Ai was used in their song-writing process.

ISMIR-rep
