I lifted the title directly from Simon Willison's post, "llamafile is the new best way to run an LLM on your own computer," even though I don't have the expertise to know if it's the BEST way. I can tell you it's a damn easy way, even on Windows! Simon's explanation works on a Mac, but you can also run this on Windows with the following minuscule changes.
- Download the 4.26GB llamafile-server-0.1-llava-v1.5-7b-q4 file from Justine’s repository on Hugging Face.
- Open Command Prompt and navigate to the location of your downloaded file.
- Simply type the filename and wait for your default browser to display http://127.0.0.1:8080/. (If you'd rather talk to it from a script instead of the web UI, see the sketch after this list.)
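Once the server is up, the page your browser opens is just a front end to a local HTTP API. The post itself only uses the web UI, so treat the following as a minimal sketch under assumptions: llamafile bundles llama.cpp's server, which exposes a `/completion` endpoint accepting a JSON body with `prompt` and `n_predict` fields. The prompt text here is purely illustrative.

```python
# Minimal sketch (not from the original post): query the locally running
# llamafile server over HTTP. Assumes it exposes llama.cpp's /completion
# endpoint on the same port the browser opens (127.0.0.1:8080).
import json
import urllib.request

payload = {
    "prompt": "Explain what a llamafile is in one sentence.",  # illustrative prompt
    "n_predict": 64,  # cap the number of generated tokens
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))

# llama.cpp's server returns the generated text in the "content" field.
print(result["content"])
```

Nothing Windows-specific here; the same snippet should work against the server whether it's running on Windows or a Mac.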
Simon reports that on his M2 Mac he's seeing 55 tokens per second. On my brand-spanking-new Dell office machine running Windows 10, I get 5.5 tokens per second :-( I was getting ready to grump all over Windows here, but on my 2-year-old M1 MacBook Air it barely runs at all, clocking in at 0.35 tokens per second. And now my MacBook Air has crashed. Maybe don't try running it on yours; or do, and tell me why mine sucked so hard! :-)