Automata: Building an AI That Builds Itself | Python + Mistral Magic

The Colab: https://colab.research.google.com/drive/1vkVUQ-NDZFYAgtIe63_UAkvIjrC_uzvo?usp=sharing

I’ve been brewing this idea for a couple of weeks. I really want to see if I can create a dataset-building loop that automatically crawls and scrapes the web, then uses an LLM to turn what it finds into a dataset. The loop then trains (or fine-tunes) a transformer on that dataset (I might start with GPT-2), evaluates it (pass/fail), and rebuilds the dataset until the transformer is performing well. In the end, the hope is that you give this AI one example instruction/response pair and it generates both a dataset and a language model that can perform that task. Let’s see if this works!
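
Here is a rough sketch of that loop in Python, just to make the shape of the idea concrete. It is not the code from the Colab: every name in it (`run_loop`, `collect`, `to_examples`, `train`, `passes`, `max_rounds`) is a hypothetical placeholder, and the scraper, the LLM, and the trainer are swapped out for tiny fake functions so the control flow runs on its own. Growing the dataset each round (instead of rebuilding it from scratch) is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]      # (instruction, response)
Model = Callable[[str], str]   # anything that maps a prompt to an answer


def run_loop(
    seed: Example,
    collect: Callable[[Example, int], List[str]],
    to_examples: Callable[[List[str], Example], List[Example]],
    train: Callable[[List[Example]], Model],
    passes: Callable[[Model, Example], bool],
    max_rounds: int = 5,
) -> Tuple[List[Example], Optional[Model], int]:
    """Scrape -> build dataset -> train -> evaluate, until pass or max_rounds."""
    dataset: List[Example] = []
    model: Optional[Model] = None
    rounds = 0
    for round_index in range(max_rounds):
        rounds += 1
        raw_pages = collect(seed, round_index)        # web crawl / scrape stage
        dataset.extend(to_examples(raw_pages, seed))  # LLM turns raw text into pairs
        model = train(dataset)                        # train or fine-tune stage
        if passes(model, seed):                       # pass/fail evaluation
            break
    return dataset, model, rounds


# Fake stages so the sketch runs without a GPU, a scraper, or an LLM.
def fake_collect(seed: Example, round_index: int) -> List[str]:
    return [f"page {round_index} about: {seed[0]}"]


def fake_to_examples(raw_pages: List[str], seed: Example) -> List[Example]:
    return [(page, seed[1]) for page in raw_pages]


def fake_train(dataset: List[Example]) -> Model:
    size = len(dataset)
    # Pretend the model only "learns" the task once it has seen 3 examples.
    return lambda prompt: "4" if size >= 3 else ""


def fake_passes(model: Model, seed: Example) -> bool:
    return model(seed[0]) == seed[1]


seed_example = ("What is 2 + 2?", "4")
dataset, model, rounds = run_loop(
    seed_example, fake_collect, fake_to_examples, fake_train, fake_passes
)
print(rounds, len(dataset))
```

To try something real, you would replace the fake stages one at a time (a search scraper for `fake_collect`, an LLM prompt for `fake_to_examples`, a fine-tuning run for `fake_train`) and keep `run_loop` itself unchanged.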

💰Support the stream!
CashApp: https://cash.app/$jawerty210
Venmo: https://venmo.com/jawerty210
Buy Me a Coffee: https://www.buymeacoffee.com/bjGHFVW355

Join The Discord: https://discord.gg/dv4TSzsk27

My Podcast: https://www.youtube.com/@schemeology

My Socials:
Github: https://github.com/jawerty
Twitter: https://twitter.com/jawerty
LinkedIn: https://www.linkedin.com/in/jawerty/
Twitch: https://www.twitch.tv/jaredthecoder10x
Rumble: https://rumble.com/c/c-3572412

00:00 Wtf am I building?
16:50 Setting up the Colab
23:40 Scaffolding / Thinking through architecture
1:23:00 Writing the RUN LOOP
1:41:05 The Google Search Query Generator
2:01:15 Putting it all together / Running the loop
2:37:00 Writing the Google scraper
3:02:00 Raw Data to Prompt parser
3:40:35 Google is detecting us, Writing a Brave Scraper
4:00:40 Testing Google Search Query Gen
4:20:30 Writing a Query Randomizer
4:47:48 Writing a raw data cleaner…then removing
5:07:00 Trying to get the dataset builder prompting to work
5:55:42 Colab is getting f*ckin SLOW
6:24:55 I just need to see it generate good data
6:42:40 Version VII In Chat has a great idea!
6:56:35 I’m getting to a special place
7:16:42 GPU Struggle to death
7:26:40 Writing the auto trainer evaluation
7:48:16 I find a hilarious mistake
8:01:24 The truth / Things start to make sense
8:16:15 Finally have a dataset being auto built / Saying Goodbye