Lightweight AI Chatbot Runs Locally on Phones or Old PCs, No Cloud Required
Most AI chatbots require a lot of processing power, so they usually live in the cloud. You can access ChatGPT from your PC or even a Raspberry Pi, but the local program is just sending your prompts over the internet to OpenAI’s servers for a response. Some LLMs (large language models) can run locally, but they typically require a powerful GPU with plenty of VRAM. So is it even possible to run a chatbot locally on a mobile device?
A new open-source project called MLC LLM is lightweight enough to run locally on almost any device, including iPhones and older PC laptops with integrated graphics. And once MLC LLM is installed, you can turn off your internet connection entirely, because all data and processing stay on your system.
“MLC” stands for machine learning compilation, the process the project’s developers use to slim down models and make them easier to run. The same group of researchers, who go by the name MLC AI, has a related project called Web LLM, which runs the entire chatbot in a web browser. The project also includes contributions from the Catalyst program at Carnegie Mellon University, the SAMPL machine learning research group, the University of Washington, Shanghai Jiao Tong University, and OctoML.
MLC LLM uses Vicuna-7B-V1.1, a lightweight LLM based on Meta’s LLaMA that was trained in March and April 2023. It’s not as good as GPT-3.5 or GPT-4, but it’s pretty decent considering its size.
MLC LLM is currently available for Windows, Linux, macOS and iOS, with easy-to-follow instructions posted by the project’s founders on their site and the full source code available on GitHub. Anyone can install the PC version, but the iOS version requires TestFlight, Apple’s developer testing system, which limits the number of iOS users who can install the test app to 9,000 at a time.
Senior Editor Andrew E. Freedman installed the MLC LLM test app, a 3GB download, on an iPhone 11 Pro Max. However, the app crashed after showing the message “[System] Initialize…” each time he ran it.
There are many reasons why a local chatbot can be preferable to a cloud-hosted solution such as ChatGPT. Your data stays on your device so your privacy is not compromised, internet access is not required, and you have more control over the output.
Install MLC LLM on your PC
I had no problems installing and running MLC LLM on a ThinkPad X1 Carbon (Gen 6) laptop running Windows 11 with a Core i7-8550U CPU and Intel UHD 620 GPU. That’s a five-year-old laptop with integrated graphics and no dedicated VRAM.
To set up MLC LLM, I first had to install Miniconda for Windows, a lightweight version of the popular Conda package manager (the full version, Anaconda, also works). Conda lets you create separate environments, each with its own set of Python packages that don’t conflict with other packages on your system.
After installing Miniconda, I launched Anaconda Prompt (the version of Command Prompt that runs Conda). I then followed the steps on mlc.ai to create an environment called mlc-chat and download the language model into it. The Vicuna-7B-V1.1 model requires only 5.7GB of storage space, and the rest of the project consumes another 350MB.
The chatbot runs in a command prompt window. To launch it, I had to activate the mlc-chat conda environment and enter the command mlc_chat_cli.
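For reference, the setup looked roughly like the following sketch, based on the instructions published on mlc.ai at the time. The package name, Conda channels, and repository URLs here reflect those instructions and may well have changed since, so treat the project’s own site as the source of truth:

```shell
# Create and activate an isolated Conda environment for the chatbot
conda create -n mlc-chat
conda activate mlc-chat

# Git and Git LFS are needed to download the model weights
conda install git git-lfs
git lfs install

# Install the prebuilt chat CLI from the mlc-ai Conda channel
conda install -c mlc-ai -c conda-forge mlc-chat-nightly

# Download the quantized Vicuna-7B-V1.1 weights and prebuilt model libraries
mkdir -p dist
git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib

# Launch the chatbot in the terminal
mlc_chat_cli
```

Because everything lives inside the mlc-chat environment, removing it later (conda env remove -n mlc-chat) cleans up the packages without touching the rest of your system.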
Using MLC LLM
When you launch MLC LLM’s chatbot, the first thing it asks for is your name. It then greets you and asks how it can help, and from there you can ask questions.
On my laptop, the bot was very slow to respond, taking nearly 30 seconds to start typing a response to each query. As with ChatGPT, the answer appears word by word as you watch, so it could take a minute or two for a full answer to finish. I imagine this would be faster on more powerful hardware.
The quality of the responses I received from the LLM was unremarkable. It gave accurate answers to some factual questions, but it made up a fictional biography of me. And each query was treated as an entirely new conversation, so it couldn’t handle follow-up questions.
When I asked the bot who the fifth president of the United States was, it gave the unconventional but arguably defensible answer of Thomas Jefferson. Most people would say James Monroe, the fifth person to hold the office, but if you count presidential terms instead of presidents, Jefferson’s second term was the fifth term overall.
I asked the MLC LLM chatbot “What is Tom’s Hardware?” and got a reasonable answer. But when I asked it about me (“Who is Avram Piltch?”), its answer was completely fictional. It misspelled my last name, made me CEO and co-founder of a company that doesn’t exist, and claimed that company makes augmented-reality contact lenses. Here’s the response:
“Avram Piltsh is CEO and co-founder of OneSight, a company that aims to develop and market contact lenses with the ability to read phone numbers with a camera and access files through contact lenses. Known for developing eye-reading contact lens technology, he is also a noted entrepreneur and inventor who has made significant contributions to the development of the technology industry. He has a passion for innovation and is known for his dedication to finding solutions to the complex problems of the technology industry.”
As far as I know, there’s no one named Avram Piltsh (with an s), and I couldn’t find a company named OneSight that makes AR contact lenses. There is a charity called the OneSight EssilorLuxottica Foundation that provides eyeglasses to people who otherwise couldn’t get them.
When I asked the bot to tell me more about the company, it got confused, because it couldn’t follow the reference back to my previous question.
Also, when I asked the MLC LLM chatbot to write a sonnet about graphics cards, it ignored sonnet meter and rhyme entirely and wrote a very boring poem of 15 lines instead of 14:
In the world of tech that emphasizes specs,
It turns out that the graphics card rules the roost
A fierce battle unfolds for each brand
Each advertises its own thing and each has its own claim
to be the best, to hold
But in the end it’s not a card, it’s bold
Too much Nvidia or AMD.
The deal is, the more you have, the more you’ll see your frames grow
It’s all about the in-game multi-GPU stack
But what about power? How long until it’s gone?
No worries there! Some power is more than we can share
It’s not the cost or price we pay
But the number needed to run the game on that day
And if that game is taxing the system
Needs more power to get the selected framerate
That poem would make a Vogon blush. Perhaps future language models will provide a more satisfying experience.