
LM Studio is a desktop application for developing and experimenting with Large Language Models (LLMs) on your local machine. It provides a familiar chat interface, one-click installation of models from Hugging Face, and the ability to run a local server that mimics OpenAI endpoints.

Compatible Large Language Models (LLMs) from Hugging Face can be run in GGUF (llama.cpp) format, or in MLX format on Apple Silicon Macs. GGUF text embedding models are also available, though some may not work on your machine or may simply be too large.

Key Functionalities

  • Local LLM Execution: Run various LLMs entirely on your computer, fully offline; no internet access is required and no data is sent to the cloud.
  • Chat Interface: Chat through a familiar, user-friendly interface with syntax highlighting, rich formatting, and plenty of customization options.
  • Model Search & Download: Search for and download models from Hugging Face directly within the app.
  • Local Server: Serve models through endpoints compatible with OpenAI's API.
  • Configuration Management: Manage local models and customize their settings to your liking.

Installation Guide

System Requirements

I'm using an Apple Silicon Mac for this in particular. LM Studio supports:

  • macOS: Apple Silicon (M1/M2/M3/M4) with macOS 13.4 or newer; 16GB+ RAM recommended.
  • Windows: x64/ARM systems with at least 16GB RAM and AVX2 instruction set support.
  • Linux: Ubuntu 20.04 or newer, x64 only.

Note: Intel-based Macs are currently unsupported.

Getting LM Studio Installed

Typically I use the Homebrew package manager for Mac, but you can also download the installer from the LM Studio Downloads page.

  1. Download the Installer: Visit LM Studio Downloads to download the installer for your operating system.
  2. Run the Installer: Launch the downloaded file and follow the on-screen instructions.
  3. Install LM Runtimes: Press ⌘ Shift R (Mac) or Ctrl Shift R (Windows/Linux) to install necessary runtimes like llama.cpp (GGUF) or MLX.

Or for Homebrew users:

brew install --cask lm-studio

The lm-studio cask is listed on Homebrew Formulae.


Using LM Studio

Running an LLM

Download some Models:

  • Open the Discover tab (⌘ 2 on Mac, Ctrl 2 on Windows/Linux) and pick a model to download; see "Finding Models in LM Studio" below for details.

Load & Chat with your new Model:

  • Switch to the Chat tab.
  • Use ⌘ L (Mac) or Ctrl L (Windows/Linux) to open the model loader.
  • Select a downloaded or sideloaded model and load it with desired configuration parameters.


Managing Chats

  • Create Conversations: Use ⌘ N (Mac) or Ctrl N (Windows/Linux).
  • Organize Conversations: Create folders using ⌘ Shift N (Mac) or Ctrl Shift N (Windows/Linux).
  • Duplicate Conversations: Right-click on a chat and select "Duplicate".


Chatting with Documents

  • Attach document files (.docx, .pdf, .txt) to your chats.
  • LM Studio uses RAG (Retrieval-Augmented Generation) for long documents, extracting relevant parts to enhance context.
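
As a rough sketch of the retrieval step (this is the general RAG idea, not LM Studio's internal implementation; the embed() helper and the chunking are hypothetical stand-ins):

# General idea behind the RAG retrieval step, not LM Studio internals.
# embed() is a hypothetical stand-in for a text-embedding model.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_chunks(query: str, chunks: list[str], embed, k: int = 3) -> list[str]:
    # Rank document chunks by similarity to the query embedding; the top k
    # get prepended to the prompt as extra context.
    q_vec = embed(query)
    return sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:k]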


Finding Models in LM Studio

Searching & Downloading Models

  • Discover Tab: Accessible via ⌘ 2 (Mac) or Ctrl 2 (Windows/Linux).
  • Search Options: Use keywords (e.g., llama, gemma, lmstudio), a specific user/model string, or a full Hugging Face URL.


Managing Model Directory

  • There's an internal downloader/directory browser for Hugging Face models. Definitely browse around: there are suggested and popular models, or you can search for something specific, and you can download models directly from the browser.
  • Terms like Q3_K_S and Q8_0 refer to different quantizations of the same model, varying in fidelity. The "Q" stands for "Quantization", a technique that shrinks model file sizes at the cost of some quality; see the rough size estimate below.
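
As a back-of-the-envelope illustration (the bits-per-weight figures here are approximations, and real GGUF files also carry metadata), file size scales with parameter count times bits per weight:

# Approximate file sizes for an 8B-parameter model at different
# quantization levels. Bits-per-weight values are rough estimates.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    # 1e9 params cancels 1e9 bytes-per-GB; divide by 8 bits per byte.
    return params_billion * bits_per_weight / 8

for label, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q3_K_S", 3.5)]:
    print(f"8B model at {label}: ~{approx_size_gb(8, bpw):.1f} GB")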

Advanced Features

Configuring Presets

Save commonly used system prompts and inference parameters as named presets for different use cases (reasoning, creative writing, etc.). The system prompt field (found in most AI prompting interfaces) gives the model context and guidelines before you provide your main request: it's where you tell the model how to behave or what rules to follow when generating its response.

Think of it as: Setting the stage, giving instructions to an assistant, or defining the "personality" of the AI for a specific task.


Per-model Defaults

  • Set default load settings for each model directly from the My Models tab.

Prompt Template Customization

  • Override the default prompt template in the My Models tab using Jinja or manual specifications.
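
As an illustration, a ChatML-style template (one common convention, not necessarily what your model expects; check the model card for the template it was actually trained with) looks roughly like this:

{# Illustrative ChatML-style chat template in Jinja. #}
{% for message in messages %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
<|im_start|>assistant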

Speculative Decoding

In the sidebar of the Chat tab, you'll see a section called "Speculative Decoding". Speculative decoding is a technique that speeds up generation in large language models (LLMs) while maintaining response quality.

Speculative decoding involves two models: a larger "main" model and a smaller, faster "draft" model. The draft model quickly suggests potential tokens, which the main model verifies against its own generation. The main model only accepts tokens that align with its output and always generates one additional token after the last accepted draft token. Both models must share the same vocabulary for the draft model to be effective.
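
A minimal greedy-variant sketch (the next_token methods below are hypothetical stand-ins; real engines verify all draft tokens in a single batched forward pass):

# Sketch of one round of speculative decoding (greedy variant).
# main_model and draft_model are hypothetical objects exposing a
# next_token(tokens) method; real implementations batch the verification.
def speculative_step(main_model, draft_model, tokens, k=4):
    # 1. The small draft model cheaply proposes k tokens.
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft_model.next_token(proposal))
    drafted = proposal[len(tokens):]
    # 2. The main model accepts only drafted tokens that match its own
    #    predictions; the first mismatch rejects the rest.
    accepted = list(tokens)
    for tok in drafted:
        if tok != main_model.next_token(accepted):
            break
        accepted.append(tok)
    # 3. The main model always adds one token after the last accepted
    #    draft token, so the round makes progress even with no matches.
    accepted.append(main_model.next_token(accepted))
    return accepted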


API and Server Usage


LM Studio as a Local LLM API Server

  • Serve models from the Developer tab using OpenAI compatibility mode, the enhanced REST API, or the lmstudio-js SDK.
  • Run LM Studio as a service without the GUI for server deployments or background processing.
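
For instance, here's a minimal sketch using the openai Python package against the local server (assuming the default port 1234; the model identifier is whatever LM Studio shows for your loaded model):

# Query LM Studio's OpenAI-compatible server with the standard openai client.
# The api_key is required by the client but ignored by LM Studio.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="your-loaded-model",  # use the identifier LM Studio displays
    messages=[{"role": "user", "content": "Hello from the local server!"}],
)
print(response.choices[0].message.content)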

Structured Output

  • When Structured Output is enabled, model outputs conform to the JSON schema you provide. Enforce schema-based structured output from LLMs via the /v1/chat/completions endpoint.

The example below shows how to make a structured output request using the curl utility.

curl http://{{hostname}}:{{port}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      { "role": "system", "content": "You are a helpful jokester." },
      { "role": "user", "content": "Tell me a joke." }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "joke": { "type": "string" }
          },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'

The API allows structured JSON outputs via the /v1/chat/completions endpoint when a JSON schema is provided. This enables the LLM to respond with valid JSON that adheres to the specified schema, similar to OpenAI's Structured Output API, and is compatible with OpenAI client SDKs.
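
Because of that compatibility, the same request works from the openai Python SDK; here's a sketch under the same port and model-name assumptions as above:

# The same structured-output request via the openai Python client.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="your-loaded-model",
    messages=[
        {"role": "system", "content": "You are a helpful jokester."},
        {"role": "user", "content": "Tell me a joke."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "joke_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"joke": {"type": "string"}},
                "required": ["joke"],
            },
        },
    },
)

joke = json.loads(completion.choices[0].message.content)  # valid JSON per schema
print(joke["joke"])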

Note: Not all models support structured output; smaller models (under about 7 billion parameters) are the most likely to struggle with it.


LM Studio is a powerful tool for local development and experimentation with Large Language Models. Its intuitive interface, robust feature set, and flexible configuration options make it suitable for both beginners and advanced users. Whether you're looking to experiment with existing models or develop your own applications, LM Studio provides a solid foundation for local LLM projects.

Keep your data and queries safe and offline. Hope this helps you in your LLM / dev journeys.


Further Resources: