Local LLM
Learn how to deploy Local LLMs on your own private or containerized servers for security and privacy.
Introduction to Local LLMs
Local Large Language Models (LLMs) are AI models that run directly on your own computer or server—rather than relying on a cloud-based service like OpenAI or Azure. By hosting the model locally, users maintain full control over data privacy and network security, making this approach ideal for organizations that operate within air-gapped, classified, or sensitive environments. This approach has been gaining significant popularity across the AI community, with local deployment options becoming more powerful over time thanks to ongoing research and development from industry leaders such as OpenAI, Google DeepMind (Gemini), and others.
Hardware Considerations
Unlike cloud-hosted AI models, Local LLMs run using your system’s own hardware resources, such as the CPU and GPU. Because all inference happens locally, model performance, response latency, and even model compatibility depend entirely on your machine’s available compute resources. Before deploying, ensure you understand your system’s hardware capabilities—especially GPU memory (VRAM) and CPU performance—as these directly impact whether a given model can run efficiently.
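For example, on a Linux host with an NVIDIA GPU, you can check available VRAM and system memory from the terminal before choosing a model (the exact commands differ on other platforms and GPU vendors):
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
free -h
The first command reports total and used GPU memory, and the second reports system RAM, which matters most for CPU-only inference.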
When selecting a local model, you’ll often see terms like 7B, 13B, or 20B. These refer to the number of parameters (in billions) in the model. Model size influences accuracy, memory requirements, and latency:
- Smaller models (≤ 7B) – Lightweight, faster, and suitable for laptops or edge devices.
- Larger models (≥ 13B) – More capable, but require powerful GPUs or servers.
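As a rough rule of thumb (an approximation that varies by runtime, quantization, and context length), a model’s memory footprint is about its parameter count multiplied by the bytes stored per parameter. For example, a 7B model quantized to 4 bits (~0.5 bytes per parameter) needs on the order of 4–5 GB of VRAM once overhead is included, while the same model at 16-bit precision needs roughly 14 GB.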
For more information on available Local LLMs and their capabilities, check out resources such as Hugging Face and LM Studio's Model Catalog.
Deployment Options for Local LLMs
You can host a local LLM in several ways, depending on your preferred setup and tools:
- Self-Hosted Servers – Launch your own OpenAI-compatible API server (e.g., via http://127.0.0.1:[port]/v1/) directly from your machine's terminal/console.
- Third-Party Tools – Use user-friendly UI platforms, such as LM Studio, AnythingLLM, and Ollama, that automatically manage LLM loading, inference, and API serving.
- Docker Containers – Host your LLM runtime inside a Docker environment for reproducibility and integration with other apps.
Each of these options exposes a specific Base URL endpoint (e.g. http://127.0.0.1:[port]/v1/) that Innoslate can connect to, using standard OpenAI-compatible APIs.
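As an illustration, Ollama (one of the third-party tools above) serves an OpenAI-compatible API on its default port 11434; the commands below are a minimal sketch assuming Ollama is installed, its background service is running (start it with ollama serve if not), and you want the llama3 model:
ollama pull llama3
Or, to run the same server in Docker instead:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3
Either way, the Base URL Innoslate would use is http://127.0.0.1:11434/v1/. Other tools follow the same pattern with their own default ports.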
Configuring Innoslate to Use a Local LLM
Once your Local LLM server is up and running, whether using LM Studio or a self-hosted API, you can connect it to Innoslate to enable AI capabilities such as Requirements AI, Test Case AI, and Risk AI, all while keeping data local and secure.
Step 1. Verify Your Local LLM Server is Running
Before connecting Innoslate, make sure your LLM server is active and reachable. In most setups, the server will run on a localhost endpoint, such as:
http://127.0.0.1:[port]/v1/
or, if Innoslate itself is hosted in Docker, use the following address so the container can reach the LLM server on the host machine (inside a container, 127.0.0.1 refers to the container itself):
http://host.docker.internal:[port]/v1/
Note: The /v1/ endpoint must be OpenAI-compatible (supporting routes such as /v1/models and /v1/chat/completions).
You can test this by visiting the URL in your browser or terminal:
curl http://127.0.0.1:[port]/v1/models
When running tools like LM Studio, the server logs display a “Reachable at” status confirming the server is ready. This is one advantage of front-end UI tools: they surface server status without requiring additional commands or testing.
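To go one step beyond listing models, you can send a minimal chat request to the same server. The request below is a generic OpenAI-compatible example, with [port] and your-model-name as placeholders to replace with your own values:
curl http://127.0.0.1:[port]/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "Hello"}]}'
A JSON response containing a choices array confirms that the chat route Innoslate relies on is working end to end.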
Step 2. Open Innoslate Preferences
- Select the "[organization]'s Preferences" button in the Admin Dashboard.
- In the preferences modal, scroll to Artificial Intelligence Options.
Step 3. Enable AI and Select Local LLM
Ensure that Artificial Intelligence and Chat AI Options are enabled for configuration. Then, configure the following fields:
- Provider: Select "Local LLM" from the dropdown menu. This tells Innoslate to connect to your local AI server using its OpenAI-compatible API.
Note: Prior to Innoslate 4.13, select "OpenAI" from this dropdown instead.
- Base URL: Enter the endpoint where your LLM server is hosted.
- For standard local enterprise setups: http://127.0.0.1:[port]/v1/
- For Docker-based Innoslate deployments: http://host.docker.internal:[port]/v1/
Make sure this URL matches the address your LLM server displays.
- Chat Model: Click the 🔍 (search) icon to fetch available models from your local server. Once connected, you'll see an auto-populated list of all models the server has access to (see the example response after this list).
- Token Limit: Set your preferred maximum token length per request (0 = no limit).
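The model list Innoslate fetches comes from the server's /v1/models route. For reference, an OpenAI-compatible server typically returns a JSON payload shaped roughly like the illustrative example below (the model ID is a placeholder, not actual output from your server):
{"object": "list", "data": [{"id": "your-model-name", "object": "model"}]}
The id values typically correspond to the entries shown in the Chat Model list.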
Step 4. Save and Update Preferences
After filling in the required fields:
- Click "Update" to save your preferences.
- Innoslate will now connect to your local LLM for AI-powered features.
Step 5. Verify Functionality
After saving your preferences, test the connection by performing a simple AI action in Innoslate—such as generating a requirement or summarizing text. If configured correctly, you’ll see a response generated locally and corresponding logs displayed in your LLM server terminal, confirming that the request and response were processed on your machine.
The image below shows an example of a terminal log capturing the AI completion process, including the prompt messages transmitted and processing data such as input, output, and reasoning tokens. This output verifies that all inference is handled securely within your local environment.
Configuring Innoslate with a Local LLM allows you to use advanced AI capabilities while maintaining complete control over your data and compute environment. Once connected, all inference runs securely on your machine, ensuring full privacy and independence from external cloud services. With the setup complete, you’re now ready to use Innoslate’s AI features, powered entirely by your locally hosted model.