Had DeForest Kelley been the ship's musician, he might have said, "I'm a musician, not a data scientist." There is a conversation every musician must have when integrating new technology into their craft: what are my intentions with this technology, and how far will integrating it pull me from my original purpose? If a certain technology leads you too far from the place where you were creating music, is that truly the path you wish to take? On the other hand, if your talents are adaptable enough, a significant change of direction may be exactly what the situation calls for.

AI hardware build

Processor: AMD Ryzen 9 7950X
GPUs: (2) GeForce RTX 4060
RAM: 64 GB
OS: Pop!_OS
AI models: Optimus-VIRTUOSO (Morpheus -> Yoda) & deepjazz

AI model links...

PIanoAI, Google's A.I. Duet & deepjazz

https://github.com/schollz/PIanoAI

LSTM model

https://github.com/jisungk/deepjazz

Project Los Angeles

https://github.com/asigalov61
https://github.com/asigalov61/Los-Angeles-Music-Composer
https://github.com/asigalov61/Optimus-VIRTUOSO
https://github.com/asigalov61/Morpheus
https://github.com/asigalov61/Yoda

In the developer's own words...

"Alex' Project Los Angeles is dedicated to helping build a fair, safe, ethical, and RIGHT Artificial General Intelligence :)"

Thank you, Alex, for your work.

AI models used in the studio learning workstation, broken down below...

Alex Sigalov
https://github.com/asigalov61

Project Los Angeles
NEW Orpheus SOTA Transformer, Godzilla Transformers, Giant Music Transformer
NEW Tegridy Tools, midi_doctor

asigalov61 : Morpheus (quick learn GPT3)
uploaded to GitHub 3 years ago

Main Features:
1) Most advanced Music AI technology to-date (GPT3+RPR[RGA]) with FULL(!) attention
2) Multiple-embedding technology (proper MIDI encoding with binary velocity)
3) 5-in-1 capabilities: performance, continuation, melody, accompaniment, inpainting(!)
4) Multi-channel MIDI capabilities (9 simultaneous MIDI instruments + drums)
5) Distributed training capabilities (easily train on multiple GPUs out of the box)
6) Pure PyTorch implementation (you only need PyTorch for training and inference)
7) Super-optimized and streamlined code (easy to understand and to modify)
8) BONUS: CLaMP capabilities (CLIP for Music)
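The "9 simultaneous MIDI instruments + drums" in feature 4 lines up neatly with the General MIDI convention: channel 10 (index 9 when counting from zero) is reserved for percussion, leaving nine other channels per port for melodic instruments. A minimal sketch of that channel layout (the program numbers here are my own arbitrary examples, not Morpheus' actual mapping):

```python
# Sketch of a multi-channel MIDI layout: nine melodic channels plus drums.
# General MIDI reserves channel 10 (zero-indexed: 9) for percussion; the
# program numbers below are arbitrary examples, not Morpheus' real mapping.

GM_DRUM_CHANNEL = 9  # channel 10 in 1-indexed MIDI terms

def build_channel_map(programs):
    """Assign up to nine melodic programs to channels 0-8, keeping 9 for drums."""
    if len(programs) > 9:
        raise ValueError("only nine melodic channels available alongside drums")
    channel_map = {ch: prog for ch, prog in enumerate(programs)}
    channel_map[GM_DRUM_CHANNEL] = "drums"  # percussion needs no program change
    return channel_map

# 0=piano, 24=guitar, 32=bass, 40=violin, ... (General MIDI program numbers)
channels = build_channel_map([0, 24, 32, 40, 48, 56, 64, 73, 80])
```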
FAQ
Q) How long should I train for?
A1) Train for no more than 1 epoch. This usually works well. Training longer usually degrades performance.
A2) You can try to cheat with the help of RPR and train only to full convergence (make sure to use random shuffling). But this is really dataset/task dependent, so the trick may not always work for your particular purpose.
Q) What is the idea behind Morpheus 128x128?
A) We basically want to try to squeeze music into a symmetrical AND reasonable space. In this case it's [127, 127, 127, 127*10, 1]. Music generally loves symmetry, and so do transformer NNs. It's not the most perfect arrangement, nor is it the most universal, but it does show better results over asymmetrical encoding schemas.
Q) Why does Morpheus 128x128 not use chordification?
A) Chordification does indeed greatly reduce training-data size. Unfortunately, this comes at a price: quality loss, especially on delicate datasets. Therefore, to allow for maximum music output quality, Morpheus 128x128 excludes chordification.

asigalov61 : YODA
GPT3+RGA(RPR) Version
uploaded to GitHub 3 years ago

Main features:
1) Most advanced Music AI/Transformer technologies to choose from
2) Multiple-embedding technology
3) Pure PyTorch implementation
4) Distributed/Multiple GPU training out-of-the-box

asigalov61 : PERCEIVER
uploaded to GitHub 2 years ago
based on lucidrains' perceiver-ar-pytorch
https://github.com/lucidrains/perceiver-ar-pytorch
Perceiver is geared toward pop-music generation

Google's SOTA (State of the Art) GPT
Perceiver-AR Music Transformer Implementation and Model

https://arxiv.org/abs/2202.07765

Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks.
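The practical win here is arithmetic: full self-attention over n tokens costs on the order of n² pairwise comparisons, while the Perceiver-AR pattern lets only a short latent/suffix segment attend to the whole long prefix. A rough back-of-the-envelope comparison (my own simplification; real costs also depend on depth, heads, and model dimension):

```python
# Back-of-the-envelope attention cost: full self-attention vs. a
# Perceiver-AR-style scheme where only a short latent segment attends
# to the full input. Simplified; real costs depend on depth and heads.

def full_self_attention_cost(n_tokens):
    """Every token attends to every token: O(n^2) comparisons."""
    return n_tokens * n_tokens

def perceiver_ar_cost(n_tokens, n_latents):
    """Short latent segment attends to all tokens, then to itself."""
    return n_latents * n_tokens + n_latents * n_latents

n, m = 100_000, 1024            # 100k-token context, 1k latents
full = full_self_attention_cost(n)
sparse = perceiver_ar_cost(n, m)
print(f"speedup: ~{full / sparse:.0f}x")
```

With these (assumed) numbers, the cross-attention scheme is roughly two orders of magnitude cheaper, which is what makes "over a hundred thousand tokens" practical at all.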

see attached pdf

PERCEIVER deprecated YODA, and YODA deprecated MORPHEUS. What does this mean?
Deprecation is a status applied to software features to indicate that they should be avoided, typically because they have been superseded. Although deprecated features remain in the software, their use may raise warning messages recommending alternative practices, and deprecation may indicate that the feature will be removed in the future. Features are deprecated—rather than immediately removed—in order to provide backward compatibility and give programmers who have used the feature time to bring their code into compliance with the new standard.
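In Python terms, "use may raise warning messages" looks like the following. This is a generic example, not code from these repos; the function names generate_yoda and generate_perceiver are hypothetical stand-ins for an old entry point and its replacement:

```python
# Generic illustration of deprecation: the old function still works,
# but warns callers to migrate. Function names here are hypothetical.
import warnings

def generate_perceiver(prompt):
    """The current, recommended entry point."""
    return f"perceiver output for {prompt!r}"

def generate_yoda(prompt):
    """Deprecated: kept for backward compatibility, forwards to the new API."""
    warnings.warn(
        "generate_yoda is deprecated; use generate_perceiver instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return generate_perceiver(prompt)
```

Callers who ignore the warning keep working; callers who heed it migrate before the old name is finally removed.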

asigalov61 : Euterpe
Multi-Instrumental Music Transformer trained on select datasets
uploaded to GitHub 3 years ago

Features include:
1) Improvisation
2) Single Continuation
3) Auto-Continuation
4) Inpainting
5) Melody Orchestration

asigalov61 : Optimus-VIRTUOSO
uploaded to GitHub 4 years ago
DEPRECATED in favor of Morpheus

ORIGINAL MUSIC COMPOSITION: Optimus-VIRTUOSO Compound Music Composer with custom MIDI option
This is basically an improved reproduction of OpenAI's MuseNet. With a large enough dataset, you should get comparable (or better) results.

And no, you do not really need Sparse Attention/Transformers here unless you want to train on gigabytes of data.

BEST OF BOTH WORLDS: Optimus-VIRTUOSO: Relative Global Attention Edition
GPT3 + RGA(RPR) = AWESOME

If OpenAI's MuseNet and Google's Piano Transformer could have a baby, that would be it :)

Optimus-VIRTUOSO: Multi-Instrumental Relative Global Attention Edition
Now featuring a pre-trained Piano-Violin duo model and Endlesss Violin Carousel MIDI dataset!
This is the same awesome RGA implementation as above but with full multi-instrumental support, continuation options, and other cool features. Check it out! :)

-----------------Other in-house models

Evan Chow : jazzml
uploaded to GitHub 9 years ago
https://github.com/evancchow/jazzml

OpenMusinet3 - trained with a 4,096-token context (the model can support up to 32k)
12 instruments (unlike MuseNet, the instruments aren't categorized by section; they're basically part numbers)
4 dynamics. Qwen-2 model, but with a GPT-2 tokenizer?? (Works weirdly well)

deepjazz - Using Keras & Theano for deep learning driven jazz generation
I built deepjazz in 36 hours at a hackathon. It uses Keras & Theano, two deep learning libraries, to generate jazz music. Specifically, it builds a two-layer LSTM, learning from the given MIDI file. It uses deep learning, the AI tech that powers Google's AlphaGo and IBM's Watson, to make music -- something that's considered as deeply human.
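deepjazz itself is a Keras/Theano two-layer LSTM, but the heart of "learning from the given MIDI file" is just turning a note sequence into fixed-length windows paired with a next-note target. A library-free sketch of that shape (this mirrors what an LSTM pipeline consumes, not deepjazz's actual preprocessing code):

```python
# Stdlib sketch of LSTM-style training data: slide a fixed window over a
# note sequence and pair each window with the note that follows it.

def make_training_pairs(notes, window=4):
    """Return (input_window, next_note) pairs from a flat note sequence."""
    pairs = []
    for i in range(len(notes) - window):
        pairs.append((notes[i:i + window], notes[i + window]))
    return pairs

melody = [60, 62, 64, 65, 67, 65, 64, 62]  # a simple C-major run, as MIDI pitches
pairs = make_training_pairs(melody, window=4)
# first pair: ([60, 62, 64, 65], 67)
```

The model then learns to predict each target note from its window; generation runs the loop in reverse, feeding each predicted note back in as input.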

Ji-Sung Kim - deepjazz author
uploaded to GitHub 9 years ago
Princeton University, Department of Computer Science
hello (at) jisungkim.com

deepjazz Citations
This project develops a lot of preprocessing code (with permission) from Evan Chow's jazzml. Thank you, Evan! Public examples from the Keras documentation were also referenced.

This is a fairly complete list of the in-house learning models used for composing predicted MIDI scores. If you're building your own learning studio system for MIDI only, hopefully this list can help get you started. MIDI prediction with these AI models is still a very cyclic process.

The simplest way I can describe composing with these tools is that the system produces 250 jigsaw-puzzle pieces for a 50-piece puzzle. Does that make any sense?
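In workflow terms, the jigsaw metaphor is overgenerate-then-curate: the models hand you far more material than the piece needs, and your job is selection. A toy sketch of that loop, where the random generator stands in for a real model and the smoothness heuristic stands in for a human ear (all names and numbers here are illustrative):

```python
# Sketch of the "250 pieces for a 50-piece puzzle" workflow: overgenerate
# candidate phrases, score them, keep the few that fit. The generator and
# the scoring heuristic are stand-ins for a real model and a human ear.
import random

def generate_candidates(n, length=8, seed=0):
    """Stand-in for a model: n random phrases of MIDI pitches."""
    rng = random.Random(seed)
    return [[rng.randint(48, 84) for _ in range(length)] for _ in range(n)]

def smoothness(phrase):
    """Toy heuristic: prefer small melodic leaps between adjacent notes."""
    return -sum(abs(a - b) for a, b in zip(phrase, phrase[1:]))

def curate(candidates, keep):
    """Keep the top-scoring few; the rest of the puzzle pieces are discarded."""
    return sorted(candidates, key=smoothness, reverse=True)[:keep]

pieces = generate_candidates(250)   # the 250 jigsaw pieces
kept = curate(pieces, keep=50)      # the 50 that make the picture
```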

There are only three links on which you will need to focus your attention...

https://github.com/
https://huggingface.co/
https://system76.com/pop/

Our original learning workstation was installed on Debian Linux.
The current workstation runs Pop!_OS by System76.