Had DeForest Kelley been the ship's musician, he might have said, "I'm a musician, not a data scientist." There is a conversation every musician must have when integrating new technology into their craft: what are my intentions with this technology, and how much will integrating it distract me from my original purpose? If a certain technology leads you too far from the place where you were creating music, is that truly the path you wish to take? On the other hand, if your talents are adaptable enough, a significant change in direction may be just what the situation calls for.
AI Hardware built
Processor: AMD Ryzen 9 7950X
GPUs: 2× GeForce RTX 4060
RAM: 64 GB
OS: Pop!_OS
AI models: Optimus_Virtuoso (morpheus -> yoda) & deepjazz
AI model links...
PIanoAI, Google_AI_duet & deepjazz
https://github.com/schollz/PIanoAI
LSTM model
https://github.com/jisungk/deepjazz
Project Los Angeles
https://github.com/asigalov61
https://github.com/asigalov61/Los-Angeles-Music-Composer
https://github.com/asigalov61/Optimus-VIRTUOSO
https://github.com/asigalov61/Morpheus
https://github.com/asigalov61/Yoda
In the developer's own words...
"Alex' Project Los Angeles is dedicated to helping build a fair, safe, ethical, and RIGHT Artificial General Intelligence :)"
Thank you, Alex, for your work.
AI models used in the studio learning workstation, broken down below...
Alex Sigalov
https://github.com/asigalov61
Project Los Angeles
NEW Orpheus SOTA Transformer, Godzilla Transformers, Giant Music Transformer
NEW Tegridy Tools, midi_doctor
asigalov61 : Morpheus (quick learn GPT3)
uploaded to GitHub 3 years ago
Main Features:
1) Most advanced Music AI technology to-date (GPT3+RPR[RGA]) with FULL(!) attention
2) Multiple-embedding technology (proper MIDI encoding with binary velocity)
3) 5-in-1 capabilities: performance, continuation, melody, accompaniment, inpainting(!)
4) Multi-channel MIDI capabilities (9 simultaneous MIDI instruments + drums)
5) Distributed training capabilities (easily train on multiple GPUs out of the box)
6) Pure PyTorch implementation (you only need PyTorch for training and inference)
7) Super-optimized and streamlined code (easy to understand and to modify)
8) BONUS: CLaMP capabilities (CLIP for Music)
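Feature 4's channel layout follows the General MIDI convention, where channel 9 (0-indexed) is reserved for drums, leaving nine channels for melodic instruments. A minimal sketch of setting that layout up with raw program-change messages (the instrument numbers here are illustrative, not Morpheus defaults):

```python
# Sketch of the "9 simultaneous MIDI instruments + drums" layout.
# In General MIDI, channel 9 (0-indexed) is percussion; a program-change
# message (status byte 0xC0 | channel) assigns each other channel an
# instrument. The GM program numbers below are illustrative picks.

def program_change(channel, program):
    """Raw 2-byte MIDI program-change message for one channel."""
    return bytes([0xC0 | channel, program])

# Channels 0-8: melodic instruments; channel 9: drums (no program needed).
INSTRUMENTS = [0, 24, 32, 40, 48, 56, 64, 71, 73]  # piano, guitar, bass, ...

setup = b"".join(program_change(ch, prog) for ch, prog in enumerate(INSTRUMENTS))
```

Prepending these 18 bytes of setup messages to a track is all it takes to claim the nine melodic channels before note data starts.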
FAQ
Q) How long should I train for?
A1) Train for no more than 1 epoch. This usually works well. Training longer usually degrades performance.
A2) You can try to cheat with the help of RPR and train only to full convergence (make sure to use random shuffling). But it is really dataset/task dependent, so such a trick may not always work for your particular purpose.
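The one-epoch advice boils down to a single shuffled pass over the dataset, then stop. A toy illustration (the linear model below is a stand-in for demonstration only, not Morpheus itself):

```python
# Toy illustration of "train for no more than 1 epoch": one shuffled
# SGD pass over the data, then stop. Fits y ≈ w*x as a stand-in model.

import random

def train_one_epoch(data, lr=0.001):
    """One SGD pass over `data` (list of (x, y) pairs)."""
    w = 0.0
    random.shuffle(data)           # per A2 above: use random shuffling
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x  # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

data = [(x, 3.0 * x) for x in range(1, 21)]
w = train_one_epoch(data)          # close to the true slope of 3.0
```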
Q) What is the idea behind Morpheus 128x128?
A) We basically want to try to squeeze music into a symmetrical AND reasonable space. In this case it's [127, 127, 127, 127*10, 1]. Music generally loves symmetry, and so do transformer NNs. It's not the most perfect arrangement, nor is it the most universal, but it does show better results than asymmetrical encoding schemas.
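As a rough sketch of what packing events into a bounded, symmetric space can look like, here is a hypothetical five-slot encoder. The slot meanings (delta-time, duration, velocity, channel/pitch combo, flag) are my assumption; consult the Morpheus repo for the actual scheme:

```python
# Hedged sketch of a symmetric event encoding in the spirit of the
# [127, 127, 127, 127*10, 1] space described above. Slot semantics are
# assumed, not taken from the Morpheus source.

def encode_event(delta_time, duration, velocity, channel, pitch):
    """Pack one note event into five bounded integer slots."""
    clamp = lambda v, hi: max(0, min(int(v), hi))
    return [
        clamp(delta_time, 127),
        clamp(duration, 127),
        clamp(velocity, 127),
        clamp(channel, 9) * 127 + clamp(pitch, 126),  # 0 .. 127*10 - 1
        0,  # reserved flag slot
    ]

event = encode_event(delta_time=12, duration=30, velocity=90, channel=2, pitch=60)
```

The point of the clamping is that every slot lands in a fixed, symmetric range, which keeps the vocabulary the transformer sees small and regular.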
Q) Why doesn't Morpheus 128x128 use chordification?
A) Chordification does help to save on training-data size. Unfortunately, this comes at a price: quality loss, especially on delicate datasets. Therefore, to allow for maximum music output quality, Morpheus 128x128 excludes chordification.
asigalov61 : YODA
GPT3+RGA(RPR) Version
uploaded to GitHub 3 years ago
Main features:
1) Most advanced Music AI/Transformer technologies to choose from
2) Multiple-embedding technology
3) Pure PyTorch implementation
4) Distributed/Multiple GPU training out-of-the-box
asigalov61 : PERCEIVER
uploaded to GitHub 2 years ago
based on lucidrains' perceiver-ar-pytorch
https://github.com/lucidrains/perceiver-ar-pytorch
Perceiver is geared toward pop-music generation and is built on Google's SOTA (state-of-the-art) Perceiver AR architecture.
Perceiver-AR Music Transformer Implementation and Model
https://arxiv.org/abs/2202.07765
Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks.
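A back-of-the-envelope comparison shows why this matters: full self-attention over n tokens computes n² attention scores, while Perceiver AR cross-attends the n-token prefix into m latents once and then self-attends only over those latents. Ignoring constants:

```python
# Rough score-count comparison: full self-attention vs. the Perceiver AR
# pattern (cross-attend n-token prefix into m latents, then attend over
# the latents only). Constants and layer counts are ignored.

def self_attention_cost(n):
    return n * n            # every token attends to every token

def perceiver_ar_cost(n, m):
    return n * m + m * m    # prefix -> latents once, then latents only

n, m = 100_000, 1_024       # ~100k-token context, illustrative latent size
full = self_attention_cost(n)
pa = perceiver_ar_cost(n, m)
ratio = full / pa           # roughly two orders of magnitude fewer scores
```

The latent width m here is an illustrative choice, not a number from the paper, but the scaling argument holds for any m much smaller than n.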
see attached pdf
PERCEIVER deprecated YODA, and YODA deprecated MORPHEUS. (What does this mean?)
Deprecation is a status applied to software features to indicate that they should be avoided, typically because they have been superseded. Although deprecated features remain in the software, their use may raise warning messages recommending alternative practices, and deprecation may indicate that the feature will be removed in the future. Features are deprecated—rather than immediately removed—in order to provide backward compatibility and give programmers who have used the feature time to bring their code into compliance with the new standard.
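In code, the pattern usually looks like the sketch below: the old entry point keeps working but emits a DeprecationWarning pointing at its replacement. The function names are hypothetical, not taken from these repos:

```python
# Minimal illustration of deprecation: the superseded function still runs,
# but warns callers to migrate. Names here are made up for the example.

import warnings

def generate_with_yoda(seed):
    warnings.warn(
        "generate_with_yoda is deprecated; use generate_with_perceiver",
        DeprecationWarning,
        stacklevel=2,
    )
    return generate_with_perceiver(seed)  # still works, via the new path

def generate_with_perceiver(seed):
    return f"notes for seed {seed}"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = generate_with_yoda(42)       # works, but warns
```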
asigalov61 : Euterpe
Multi-Instrumental Music Transformer trained on select datasets
uploaded to GitHub 3 years ago
Features include:
1) Improvisation
2) Single Continuation
3) Auto-Continuation
4) Inpainting
5) Melody Orchestration
asigalov61 : Optimus-VIRTUOSO
uploaded to GitHub 4 years ago
DEPRECATED in favor of Morpheus
ORIGINAL MUSIC COMPOSITION: Optimus-VIRTUOSO Compound Music Composer with custom MIDI option
This is basically an improved reproduction of OpenAI's MuseNet. With a large enough dataset, you should get comparable (or better) results.
And no, you do not really need Sparse Attention/Transformers here unless you want to train on gigabytes of data
BEST OF BOTH WORLDS: Optimus-VIRTUOSO: Relative Global Attention Edition
GPT3 + RGA(RPR) = AWESOME
If OpenAI's MuseNet and Google's Piano Transformer could have a baby, that would be it :)
Optimus-VIRTUOSO: Multi-Instrumental Relative Global Attention Edition
Now featuring a pre-trained Piano-Violin duo model and Endlesss Violin Carousel MIDI dataset!
This is the same awesome RGA implementation as above but with full multi-instrumental support, continuation options, and other cool features. Check it out! :)
-----------------Other in-house models
Evan Chow : jazzml (machine-learning jazz improvisation)
uploaded to GitHub 9 years ago
https://github.com/evancchow/jazzml
OpenMusinet3 - trained with a 4,096-token context (the model can support up to 32k)
12 instruments (unlike MuseNet, the instruments aren't categorized by section; they're basically part numbers)
4 dynamics. A Qwen-2 model, but with the GPT-2 tokenizer?? (Works weirdly well.)
deepjazz - Using Keras & Theano for deep learning driven jazz generation
I built deepjazz in 36 hours at a hackathon. It uses Keras & Theano, two deep learning libraries, to generate jazz music. Specifically, it builds a two-layer LSTM, learning from the given MIDI file. It uses deep learning, the AI tech that powers Google's AlphaGo and IBM's Watson, to make music -- something that's considered as deeply human.
Ji-Sung Kim - deepjazz author
uploaded to GitHub 9 years ago
Princeton University, Department of Computer Science
hello (at) jisungkim.com
deepjazz Citations
This project develops a lot of preprocessing code (with permission) from Evan Chow's jazzml. Thank you, Evan! Public examples from the Keras documentation were also referenced.
This is a fairly complete list of the in-house learning models used for composing predicted MIDI scores. If you're building your own learning studio system for MIDI only, hopefully this list can help get you started. MIDI prediction with these AI models is still a very cyclic process.
The simplest way I can describe composing with these tools is that the system produces 250 jigsaw-puzzle pieces for a 50-piece jigsaw puzzle. Does that make any sense?
There are only three links on which you will need to focus your attention:
https://github.com/
https://huggingface.co/
https://system76.com/pop/
Our original learning workstation was installed on Debian Linux.
The current workstation runs Pop!_OS by System76.