Your Private AI Cloud: How to Run LLMs, Stable Diffusion, and Automation Agents on RakSmart Bare Metal Servers

Introduction: The Shift from Cloud AI to Local AI

For the past two years, the AI revolution has been dominated by cloud-based services. ChatGPT, Claude, Gemini, Midjourney — all of these run on servers owned by massive corporations. You send your data to their APIs, they process it, and they send back the results.

This model works, but it comes with significant drawbacks. First, you pay per token or per image, and costs can spiral out of control. Second, you send potentially sensitive business data to third-party servers. Third, you have no control over model updates, downtime, or rate limits.

A powerful techtrend is emerging: the shift toward local AI. Running AI models on your own infrastructure gives you privacy, predictable costs, and complete control. And with modern bare metal servers from providers like RakSmart, local AI is no longer just for large enterprises with six-figure hardware budgets.

In this post, we will explore exactly how to set up and run local AI models on RakSmart bare metal and high-performance VPS plans. We will cover large language models (LLMs) like Llama 3 and Mistral, image generation models like Stable Diffusion, and automation agents that can perform tasks across your infrastructure. You will learn how to turn a RakSmart data center server into your own private AI cloud.


Why RakSmart for Local AI?

Before we dive into the technical setup, let us address the obvious question: Why RakSmart specifically?

Reason 1: High CPU Core Counts. AI inference (using a trained model to generate outputs) benefits from multiple CPU cores. RakSmart’s bare metal servers offer up to 32 physical cores, allowing you to run multiple AI tasks in parallel.

Reason 2: Generous RAM Allocations. Large language models require significant RAM. A 7-billion-parameter model like Llama 2 7B needs 8GB to 16GB of RAM just to load. RakSmart’s bare metal plans offer 64GB, 128GB, or even 256GB of RAM.
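The RAM figures follow from simple arithmetic: at full 16-bit precision each parameter takes 2 bytes, while the 4-bit quantized builds that Ollama commonly serves take roughly a quarter of that. A rough estimator (the 1.2 overhead factor for KV cache and runtime buffers is an assumption, not a measured value):

```python
def model_ram_gb(params_billion: float, bits_per_weight: int = 16,
                 overhead: float = 1.2) -> float:
    """Rough RAM estimate for loading an LLM.

    params_billion : model size in billions of parameters
    bits_per_weight: 16 for fp16, 4 for common 4-bit quantization
    overhead       : multiplier for KV cache and runtime buffers (assumed)
    """
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9  # decimal GB

# A 7B model at fp16 needs roughly 14 GB before overhead;
# the same model quantized to 4 bits fits in about 4 GB.
print(round(model_ram_gb(7, 16), 1))  # 16.8
print(round(model_ram_gb(7, 4), 1))   # 4.2
```

This is why the 8GB-to-16GB range above spans quantized and full-precision builds of the same 7B model.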

Reason 3: Fast NVMe Storage. Loading a model from disk takes time. RakSmart’s NVMe SSDs read at 3,000+ MB per second, cutting model load times from minutes to seconds.

Reason 4: Affordable Pricing. Running the same AI workloads on AWS or Google Cloud would cost 5x to 10x more. RakSmart’s bare metal plans start at a fraction of the price of cloud AI instances.

Reason 5: Full Root Access. You need root access to install drivers, libraries, and models. RakSmart gives you full control over your server.


Hardware Requirements for Local AI

Not all AI models have the same requirements. Here is a quick guide to matching RakSmart server specs to your AI workload.

For Small LLMs (1B to 3B parameters)

  • Use case: Text summarization, basic chat, code completion
  • RakSmart VPS plan: 8 vCPU, 16GB RAM, 100GB NVMe
  • Estimated cost: $30 to $50/month
  • Models: Phi-3 Mini, TinyLlama, Gemma 2B

For Medium LLMs (7B to 13B parameters)

  • Use case: Advanced chat, document analysis, creative writing
  • RakSmart Bare Metal plan: 16 cores, 64GB RAM, 500GB NVMe
  • Estimated cost: $150 to $250/month
  • Models: Llama 3 8B, Mistral 7B, CodeLlama 13B

For Large LLMs (30B to 70B parameters)

  • Use case: Complex reasoning, multi-document Q&A, agent workflows
  • RakSmart Bare Metal plan: 32 cores, 128GB+ RAM, 1TB NVMe
  • Estimated cost: $400 to $600/month
  • Models: Llama 3 70B, Mixtral 8x7B

For Image Generation (Stable Diffusion)

  • Use case: Generating images, editing photos, creating assets
  • RakSmart Bare Metal with GPU: Requires GPU (RakSmart offers GPU options)
  • Alternative: CPU-only Stable Diffusion is possible but slow (2-5 minutes per image)

Step-by-Step: Installing Ollama on a RakSmart Bare Metal Server

Ollama is the easiest way to run LLMs locally. It packages models into a simple command-line tool and REST API.

Prerequisites

  • A RakSmart bare metal or high-RAM VPS running Ubuntu 22.04 or 24.04
  • SSH access to your server
  • At least 16GB of RAM (32GB+ recommended)

Step 1: Connect to Your RakSmart Server

bash

ssh root@your-server-ip

Step 2: Update the System

bash

apt update && apt upgrade -y

Step 3: Install Ollama

Ollama provides a one-line install script:

bash

curl -fsSL https://ollama.com/install.sh | sh

The script will detect your hardware, install dependencies, and set up Ollama as a system service.

Step 4: Verify Installation

bash

ollama --version

You should see something like ollama version 0.5.4 (the exact version number will vary).

Step 5: Pull Your First Model

Let us start with a small, fast model to test:

bash

ollama pull llama3.2:3b

This downloads the 3-billion-parameter version of Llama 3.2. The download size is approximately 2GB and takes 2-5 minutes on a standard internet connection.

Step 6: Run an Inference

bash

ollama run llama3.2:3b "Explain what a data center does in one sentence."

The model will generate a response. The first run after pulling will be slower because the model loads into RAM. Subsequent runs are much faster.

Step 7: Keep Ollama Running as a Service

Ollama automatically runs as a systemd service. Check its status:

bash

systemctl status ollama

To ensure it starts on boot:

bash

systemctl enable ollama

Step 8: Access Ollama’s API Remotely

By default, Ollama only listens on localhost. To access it from other servers or your local machine, you need to bind to 0.0.0.0.

Edit the systemd service file:

bash

mkdir -p /etc/systemd/system/ollama.service.d/
nano /etc/systemd/system/ollama.service.d/override.conf

Add these lines:

text

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Reload and restart:

bash

systemctl daemon-reload
systemctl restart ollama

Now you can send API requests from anywhere:

bash

curl http://your-server-ip:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is automation important?",
  "stream": false
}'
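The same request can be sent from any language that speaks HTTP. A minimal Python sketch using only the standard library (the server address and model name mirror the curl example; adjust them to your setup):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.2:3b") -> dict:
    """Assemble the JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://your-server-ip:11434") -> str:
    """POST the prompt to Ollama and return the generated text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(generate("Why is automation important?"))
```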

Beyond Chat: Using Local AI for Automation

Running a chat model on your RakSmart server is just the beginning. The real power comes from integrating local AI into automation workflows.

Automation Use Case 1: Automated Document Processing

Imagine you run a business that receives dozens of customer support emails, contracts, or invoices every day. Reading and categorizing each document manually takes hours.

With a local LLM running on RakSmart, you can automate this process.

The Workflow:

  1. Emails arrive at your server (via a forwarder or IMAP).
  2. A script extracts the email body and subject line.
  3. The script sends the text to your local Ollama API with this prompt: “Categorize this email into one of: sales inquiry, support request, complaint, partnership offer, or spam. Also extract any names, dates, and action items. Return the result as JSON.”
  4. The LLM returns structured data.
  5. Your script creates a ticket in your support system, forwards sales inquiries to your CRM, or archives spam.
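Steps 3 and 4 of this workflow can be sketched in Python using only the standard library. The endpoint and model name assume Ollama running locally as set up earlier; the "format": "json" field asks Ollama to constrain the reply to valid JSON:

```python
import json
import urllib.request

CATEGORIES = ["sales inquiry", "support request", "complaint",
              "partnership offer", "spam"]

def build_prompt(subject: str, body: str) -> str:
    """Step 3: construct the categorization prompt from the workflow."""
    return (f"Categorize this email into one of: {', '.join(CATEGORIES)}. "
            "Also extract any names, dates, and action items. "
            "Return the result as JSON.\n\n"
            f"Subject: {subject}\n\n{body}")

def categorize(subject: str, body: str) -> dict:
    """Steps 3-4: send the prompt to the local Ollama API, return parsed JSON."""
    payload = json.dumps({
        "model": "llama3.2:3b",
        "prompt": build_prompt(subject, body),
        "format": "json",   # ask Ollama to emit valid JSON only
        "stream": False,
    }).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(json.loads(resp.read())["response"])
```

Step 5 then feeds the returned dictionary into your ticketing system or CRM.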

Why Local Instead of Cloud? Customer emails may contain sensitive information. Sending them to OpenAI or Google could violate privacy policies or data protection laws. Running on your own RakSmart bare metal server keeps all data inside your controlled environment.

Automation Use Case 2: Content Summarization for Internal Wikis

If your team uses internal documentation (Confluence, Notion, or even a WordPress wiki), the amount of information can become overwhelming. New employees struggle to find what they need.

Set up a nightly automation on your RakSmart VPS:

  1. Crawl all internal documentation pages.
  2. Send each page to your local LLM with the prompt: “Summarize this page in 3 bullet points. Focus on actionable information.”
  3. Store the summaries in a separate database.
  4. Generate a “daily digest” email for your team with summaries of recently updated pages.

This turns a chaotic wiki into a knowledge management system.
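Step 4 is plain string assembly once the summaries are in your database. A minimal sketch of the digest builder (the page dictionaries are placeholder data; wiring in your mailer is left out):

```python
def build_digest(pages: list[dict]) -> str:
    """Assemble a daily-digest email body from stored page summaries.

    Each dict: {"title": ..., "url": ..., "summary": [bullet, ...]}
    """
    lines = ["Daily wiki digest - recently updated pages:", ""]
    for page in pages:
        lines.append(f"* {page['title']} ({page['url']})")
        lines.extend(f"  - {point}" for point in page["summary"])
        lines.append("")
    return "\n".join(lines)

print(build_digest([{"title": "Onboarding", "url": "https://wiki/onboarding",
                     "summary": ["Request VPN access on day one"]}]))
```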

Automation Use Case 3: Log Analysis and Anomaly Detection

Your servers generate thousands of log lines every hour. Manually reviewing logs for errors or security incidents is impossible.

Run an automation script every hour that:

  1. Collects the last 1000 log lines from your RakSmart server’s system log, web server log, and database log.
  2. Sends them to your local LLM with the prompt: “Analyze these logs for errors, warnings, or unusual patterns. Flag anything that requires human attention. Ignore routine messages.”
  3. If the LLM finds something concerning, send an alert to your phone via email, Slack, or Telegram.

This is like having a junior system administrator working 24/7 for free.
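A minimal sketch of steps 1 and 2, assuming systemd's journalctl for log collection and Ollama listening on localhost. The chunking keeps each request inside the model's context window; the alert hook from step 3 is left as a comment:

```python
import json
import subprocess
import urllib.request

PROMPT = ("Analyze these logs for errors, warnings, or unusual patterns. "
          "Flag anything that requires human attention. "
          "Ignore routine messages.\n\n")

def tail_logs(n: int = 1000) -> str:
    """Step 1: collect the last n system-log lines via journalctl."""
    return subprocess.run(["journalctl", "-n", str(n), "--no-pager"],
                          capture_output=True, text=True).stdout

def chunk(text: str, max_chars: int = 8000) -> list[str]:
    """Split long logs so each request fits the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def analyze(logs: str) -> str:
    """Step 2: send a log chunk to the local LLM and return its findings."""
    payload = json.dumps({"model": "llama3.2:3b",
                          "prompt": PROMPT + logs,
                          "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Hourly cron job (requires a running Ollama server):
# for part in chunk(tail_logs()):
#     findings = analyze(part)
#     # step 3: send findings to Slack/Telegram if anything was flagged
```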


Running Stable Diffusion for Image Automation

Text generation is useful, but image generation opens even more possibilities for automation.

Installing Stable Diffusion on CPU (RakSmart Bare Metal without GPU)

If your RakSmart bare metal server does not have a GPU, you can still run Stable Diffusion using CPU-optimized backends. Performance will be slower (2-5 minutes per image), but for background automation tasks, this is often acceptable.

Installation:

bash

# Install build tools and dependencies
apt install python3-pip python3-venv libopenblas-dev cmake build-essential git -y

# Create a virtual environment
python3 -m venv stable-env
source stable-env/bin/activate

# Install stable-diffusion.cpp (CPU-optimized); --recursive pulls its submodules
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
mkdir build && cd build
cmake ..
make -j4

# Download a model (SD 1.5 is lighter than SDXL)
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt

Generate an image:

bash

./bin/sd -m v1-5-pruned-emaonly.ckpt -p "a beautiful data center with blue LED lights, highly detailed" -o output.png

Automation Use Case: Dynamic Social Media Images

You run a news website or a blog. Every time you publish a new post, you need a featured image. Finding or creating images manually is time-consuming.

Set up an automation on your RakSmart server:

  1. A WordPress hook triggers whenever a new post is published.
  2. The hook sends the post title and categories to a Python script.
  3. The script constructs a prompt: “Create a professional, minimalist featured image for a blog post titled [TITLE] about [CATEGORY]. Style: clean, modern, blue and white color scheme.”
  4. The script calls your local Stable Diffusion installation.
  5. The generated image is saved and attached to the WordPress post as the featured image.

This entire process runs automatically. Every post gets a unique, relevant featured image without any human intervention.
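Steps 3 and 4 might look like this in Python. The prompt template comes from the workflow above; the sd binary path and model file match the CPU build from the previous section, and the helper names are illustrative:

```python
import subprocess
from pathlib import Path

def build_sd_prompt(title: str, category: str) -> str:
    """Step 3: turn the post title and category into an image prompt."""
    return (f"Create a professional, minimalist featured image for a blog "
            f"post titled {title} about {category}. "
            "Style: clean, modern, blue and white color scheme.")

def generate_featured_image(title: str, category: str,
                            out_dir: str = "featured") -> Path:
    """Step 4: call the stable-diffusion.cpp binary built earlier."""
    Path(out_dir).mkdir(exist_ok=True)
    out = Path(out_dir) / f"{title.lower().replace(' ', '-')}.png"
    subprocess.run(["./bin/sd", "-m", "v1-5-pruned-emaonly.ckpt",
                    "-p", build_sd_prompt(title, category),
                    "-o", str(out)], check=True)
    return out

# Called from the WordPress hook (step 5 attaches the returned file):
# generate_featured_image("Why Bare Metal Wins", "hosting")
```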


Open Claw Meets Local AI: Intelligent Web Agents

Recall the Open Claw concept from our previous series — scripts that “claw” data from the web. When you combine Open Claw with a local LLM, you get intelligent web agents that can understand and act on what they find.

Example: Intelligent Price Monitoring

A basic Open Claw script scrapes prices and stores them. An AI-enhanced version:

  1. Scrapes competitor pricing pages.
  2. Sends the scraped HTML to your local LLM with the prompt: “Extract all product names, prices, discount percentages, and any ‘limited time’ language. Return as JSON.”
  3. The LLM understands context. It can identify that “Buy one get one 50% off” is a discount even if there is no percentage symbol.
  4. Your script compares the extracted data with your own pricing.
  5. If the LLM detects a “limited time” or “flash sale” phrase, your script can trigger an immediate alert to your team.
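One practical detail for step 2: models sometimes wrap their JSON in Markdown fences or add commentary, so the script should parse defensively. A hypothetical helper:

```python
import json
import re

def parse_llm_json(raw: str):
    """Parse JSON from an LLM reply, tolerating ```json fences and chatter.

    Returns the parsed object, or None if no JSON is found.
    """
    # Strip Markdown code fences the model may add around its answer
    raw = re.sub(r"```(?:json)?", "", raw)
    # Find the outermost {...} or [...] block
    match = re.search(r"(\{.*\}|\[.*\])", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

Your comparison and alerting logic in steps 4 and 5 then works on clean Python dictionaries instead of raw model output.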

Example: Automated Form Filling and Testing

If you run a web application, you need to test forms regularly. An AI-powered Open Claw script can:

  1. Navigate to your signup form.
  2. Use a local LLM to generate realistic test data (names, emails, addresses).
  3. Fill out the form automatically.
  4. Submit it and verify the success message.
  5. Repeat 100 times with different data.

This is automated QA testing, running 24/7 on your RakSmart server.


Performance Tuning for Local AI on RakSmart

To get the best performance from your local AI models, apply these optimizations.

Memory Optimization

LLMs consume significant RAM. Monitor usage with:

bash

htop
free -h

If your server runs out of RAM, the operating system will start swapping to disk, which destroys performance. Ensure your RakSmart plan has enough RAM for the largest model you intend to run.

CPU Optimization

For CPU-based inference, enable these optimizations:

bash

# Install optimized math libraries
apt install libopenblas-dev liblapack-dev

# For Ollama, set the thread count via the num_thread option in a
# Modelfile or per API request, matched to your physical core count:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "test",
  "options": {"num_thread": 16},
  "stream": false
}'

Disk Optimization

Store models on the local NVMe drive, not on a network volume. For the random reads involved in loading a model, NVMe drives are roughly an order of magnitude faster than SATA SSDs and dramatically faster than network storage.

bash

# Check your disk type
lsblk -d -o name,rota
# If ROTA = 1, it's spinning disk. ROTA = 0 means SSD/NVMe.

Security Considerations for Local AI

Running AI models on your own RakSmart server is generally more secure than using cloud APIs, but you still need to follow best practices.

1. Keep Your Server Updated

bash

apt update && apt upgrade -y

2. Restrict API Access

Do not leave your Ollama API open to the entire internet. Use a firewall:

bash

# Allow SSH first so enabling the firewall does not lock you out
ufw allow ssh
ufw allow from YOUR_HOME_IP to any port 11434
ufw enable

3. Run Models as a Non-Root User

Create a dedicated user for AI workloads:

bash

useradd -m -s /bin/bash ai-user
su - ai-user

4. Monitor Resource Usage

Set up alerts for CPU, RAM, and disk usage. A runaway model could consume all resources and crash your server.


Future Techtrends: Multi-Model Agents

The next wave of techtrends in local AI is multi-model agents. Instead of one model doing everything, you run multiple specialized models that communicate with each other.

On a powerful RakSmart bare metal server, you could run:

  • A small, fast LLM (3B parameters) for simple classification tasks
  • A medium LLM (13B parameters) for complex reasoning
  • A vision model (like Moondream) for analyzing screenshots and images
  • A text-to-speech model for generating audio responses

These models run in parallel. An agent orchestrator sends each task to the most appropriate model. The result is an AI system that can see, read, reason, and speak — all running on your own hardware in a RakSmart data center.
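The orchestrator's core is just a routing table. A minimal sketch (model names follow the article's examples; the text-to-speech route is omitted since those models typically run outside Ollama):

```python
# Map task types to the specialized model best suited for each
ROUTES = {
    "classify": "llama3.2:3b",   # small, fast LLM for simple labels
    "reason":   "llama3:8b",     # larger LLM for complex reasoning
    "vision":   "moondream",     # screenshot and image analysis
}

def route(task_type: str) -> str:
    """Pick the model for a task; fall back to the reasoning model."""
    return ROUTES.get(task_type, ROUTES["reason"])

def dispatch(task_type: str, prompt: str) -> dict:
    """Build the Ollama request body an orchestrator would send."""
    return {"model": route(task_type), "prompt": prompt, "stream": False}

print(route("classify"))                                        # llama3.2:3b
print(dispatch("vision", "Describe this screenshot")["model"])  # moondream
```

Each dispatched request goes to the same local API shown earlier; the routing table is what turns a single server into a multi-model system.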


Conclusion: Your Private AI Cloud Awaits

You do not need to be a large enterprise to run powerful AI models. With a RakSmart bare metal server, you can host your own private AI cloud for less than $200 per month. You get privacy, predictable costs, complete control, and the ability to integrate AI deeply into your automation workflows.

In the next blog post, we will move from running individual models to building full automation pipelines that connect AI agents to external systems, databases, and APIs — creating truly autonomous digital workers.

