You could soon run ChatGPT-like models on your laptop

Large language models (LLMs) have landed in force in the public space, the most famous at the moment being ChatGPT. Right now, getting access means sending your personal data over to a 'trusted' provider. Nothing new so far; we do this with every Google and Bing search. In fact, what a person has been thinking about in the last minute, or over the last week, could potentially be revealed through their search and browse history alone.
Meta's LLaMA release/leak has AI programmers really busy
Meta (Facebook) had its new ChatGPT-like model released, or rather leaked, just recently, giving hackers, tinkerers and everyone in between a chance to build on and improve this new tech without restrictions or company priorities getting in the way. Open source for the win! We've seen the power of this with Stable Diffusion, whose open approach put it squarely in the spotlight, and for good reason.
'Quantization' is the nerd-word of the day: effectively, a technique used to reduce a model's size and improve its performance on modest hardware without compromising very much on accuracy. Feel free to geek out on the details here. It seems this technique was part of the approach one programmer used to get Meta's new AI model running on a laptop. Albeit a rather expensive laptop, i.e. Apple's new M-series MacBook.
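To make that concrete, here is a minimal sketch of post-training quantization using TensorFlow Lite, in the spirit of the TensorFlow guide linked in the sources below. The tiny Keras model is a hypothetical stand-in, and exact savings vary, but storing weights as 8-bit integers instead of 32-bit floats shrinks a model roughly fourfold.

```python
import tensorflow as tf

# A hypothetical stand-in model; any trained Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Baseline: convert to TFLite keeping full float32 weights.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = converter.convert()

# Post-training quantization: the same conversion with optimization enabled,
# which stores the weights as 8-bit integers instead of 32-bit floats.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_model = converter.convert()

print(f"float32 model:   {len(float_model):,} bytes")
print(f"quantized model: {len(quant_model):,} bytes")  # roughly 4x smaller
```

The laptop port of LLaMA reportedly pushes the same idea further, down to around 4 bits per weight, but the principle is identical: fewer bits per weight means less memory and faster math.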
For now we're going to have to wait for devs to bring out #nocode solutions for mass access, as well as iron out the kinks and potential bugs. The code shared was also based on the smallest of the LLaMA variants, with 7 billion parameters, whereas Meta has released model sizes up to 65 billion parameters; a rough sense of what those sizes mean for memory is sketched below. For the moment it's safe to assume that more parameters equates to a more powerful AI model.
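As a back-of-envelope illustration (my numbers, not Meta's published file sizes), here is how the released LLaMA variants translate into memory at standard 16-bit precision versus 4-bit quantization. Real model files carry some extra overhead.

```python
# Rough memory estimates for the released LLaMA variants.
# Figures are illustrative; actual on-disk sizes vary with format overhead.
PARAMS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}

for name, n in PARAMS.items():
    fp16_gb = n * 2 / 1e9    # 16-bit floats: 2 bytes per parameter
    int4_gb = n * 0.5 / 1e9  # 4-bit quantized: half a byte per parameter
    print(f"{name}: ~{fp16_gb:.0f} GB at fp16, ~{int4_gb:.1f} GB at 4-bit")
```

At 4 bits the 7B model squeezes into a few gigabytes, which is why a well-equipped laptop can hold it in RAM, and why the larger variants start to look plausible on high-memory machines.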
Bottom line:
You get close to big-corporate AI performance without sharing sensitive information or shelling out for usage. Some of Apple's newer laptops could potentially run the larger variants of Meta's new AI model. This shows just how quickly the AI space is moving: from the status quo of "big models" run by "big companies" to potentially running something close to the equivalent on your phone in the next couple of years, or months!
Sources:
- Post-training quantization, TensorFlow documentation
- Inference of Facebook's LLaMA model in pure C/C++, Georgi Gerganov, GitHub