► LLM MODEL INVENTORY
┌─────────────────────────────────────────────────────────────────────┐
│ MODEL 1: LLAMA 3.2 3B INSTRUCT │
├─────────────────────────────────────────────────────────────────────┤
│ File: Llama-3.2-3B-Instruct-Q6_K.gguf │
│ Quantization: Q6_K (6-bit) │
│ Size: ~3.0 GB │
│ Parameters: 3 Billion │
│ Status: LOADED & READY │
│ Location: /opt/webgpu_llm_service/models/ │
│ /srv/packages/shared_resources/models/ │
│ │
│ Characteristics: │
│ • Speed: ⚡⚡⚡ Fast (1-2s response time) │
│ • Quality: ⭐⭐⭐ Good for general queries │
│ • Memory: 💾 Low footprint (~2GB RAM) │
│ • Best For: Quick Q&A, mobile, simple tasks │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ MODEL 2: MISTRAL 7B INSTRUCT V0.1 │
├─────────────────────────────────────────────────────────────────────┤
│ File: mistral-7b-instruct-v0.1.Q8_0.gguf │
│ Quantization: Q8_0 (8-bit) │
│ Size: ~7.2 GB │
│ Parameters: 7 Billion │
│ Status: LOADED & READY │
│ Location: /srv/packages/shared_resources/models/ │
│ /srv/packages/mobile_webgpu/models/ │
│ │
│ Characteristics: │
│ • Speed: ⚡⚡ Moderate (2-4s response time) │
│ • Quality: ⭐⭐⭐⭐⭐ Excellent accuracy │
│ • Memory: 💾💾 Higher footprint (~5GB RAM) │
│ • Best For: Technical, legal, complex reasoning │
└─────────────────────────────────────────────────────────────────────┘
► SYSTEM RESOURCES (DatabaseMart GPU)
► APPS POWERED BY LLM MODELS
LLAMA 3.2 3B - FAST RESPONSE APPS:
┌────────────────────────────────────────────────────────────┐
│ • therobertgreeneai.com - General wisdom & strategy │
│ • businessinformationai.com - Business queries │
│ • moneyinvestingai.com - Investment advice │
│ • gametheorynow.com - Game theory & strategy │
│ • darkpsychological.com - Psychology concepts │
│ • instrumentationexpertai.com - Quick tech answers │
│ • Mobile WEBGPU App (Port 5004) - Mobile optimized │
│ • MATH App (Port 6033) - Simple calculations │
└────────────────────────────────────────────────────────────┘
MISTRAL 7B - HIGH ACCURACY APPS:
┌────────────────────────────────────────────────────────────┐
│ • fbahistoryai.com - Detailed historical analysis │
│ • nationalelectricalcodeai.com - Technical code queries │
│ • floridalawai.com - Legal precision required │
│ • floridastatuehistoryai.com - Legal history │
│ • astmai.com - Technical standards (ASTM) │
│ • iccchat.com - International codes │
│ • buildingcodechat.com - Building regulations │
│ • usstatueschat.com - US legal statutes │
│ • physicschatai.com - Complex physics │
│ • dogecoinelectric.com - Crypto & electrical │
│ • thedukenukem.com - Gaming & technical │
│ • sigmasutra.com - Philosophy & deep thinking │
│ • americashistoryai.com - Historical analysis │
│ • airealestatedeveloper.com - Real estate AI │
└────────────────────────────────────────────────────────────┘
TOTAL DOMAINS POWERED: 20+
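The domain-to-model assignment above can live in one shared routing table. A sketch (domain list abridged; the `model_for` helper and its fast-model default are assumptions, not the deployed config):

```python
# Hypothetical routing table: each frontend domain maps to the model
# that serves it (names follow the app lists above; abridged).
DOMAIN_MODEL = {
    "therobertgreeneai.com": "llama3.2-3b",
    "gametheorynow.com": "llama3.2-3b",
    "nationalelectricalcodeai.com": "mistral-7b",
    "floridalawai.com": "mistral-7b",
    "physicschatai.com": "mistral-7b",
}

def model_for(domain: str) -> str:
    # Unlisted domains fall back to the fast model (an assumed default).
    return DOMAIN_MODEL.get(domain, "llama3.2-3b")
```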
► LLM REQUEST FLOW
1. USER QUERY
│ Example: "Explain NEC 210.8 requirements"
▼
2. FRONTEND RECEIVES (Hostinger VPS)
│ Domain: nationalelectricalcodeai.com
│ JavaScript captures query
▼
3. API CALL TO BACKEND
│ Target: 77.93.154.44:5000 (Main LLM Service)
│ or Port 11434 (Ollama-compatible API)
▼
4. MODEL SELECTION
│ Query complexity analysis:
│ ├─ Simple? → Llama 3.2 3B (faster)
│ └─ Complex? → Mistral 7B (more accurate)
▼
5. GPU PROCESSING
│ Load model layers into NVIDIA P1000 VRAM (offload as capacity allows)
│ Run inference (1-4 seconds)
│ Generate response
▼
6. RESPONSE RETURN
│ JSON response to frontend
│ JavaScript displays to user
▼
7. CACHE FOR NEXT REQUEST
│ Model stays in memory
│ Next query = faster!
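Steps 3-6 of the flow can be sketched as a small backend client: a complexity heuristic picks a model, then the prompt is forwarded to the Ollama-compatible endpoint on port 11434. The hint list, word-count threshold, and model tags below are illustrative assumptions, not the production routing logic:

```python
# Hypothetical router for steps 3-6: pick a model by query complexity,
# then call the Ollama-compatible /api/generate endpoint.
COMPLEX_HINTS = ("code", "statute", "nec", "astm", "prove", "analyze")

def select_model(query: str) -> str:
    """Route long or domain-heavy queries to Mistral 7B, the rest to Llama 3.2 3B."""
    q = query.lower()
    if len(query.split()) > 30 or any(h in q for h in COMPLEX_HINTS):
        return "mistral:7b-instruct-q8_0"
    return "llama3.2:3b-instruct-q6_k"

def ask(query: str, host: str = "http://77.93.154.44:11434") -> str:
    """Forward the query to the backend and return the generated text."""
    import requests  # assumes the requests package is installed
    resp = requests.post(
        f"{host}/api/generate",
        json={"model": select_model(query), "prompt": query, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

The example query from step 1 ("Explain NEC 210.8 requirements") trips the `nec` hint and is routed to Mistral 7B; a short casual question falls through to Llama 3.2 3B.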
KEY EFFICIENCY:
• Same models serve ALL 20+ domains
• Models loaded once, used by everyone
• Cost: $27.50/domain vs $500/domain if separate
• Savings: ~94.5% ($9,450/month!)
► COST EFFICIENCY ANALYSIS
SHARED LLM INFRASTRUCTURE:
Traditional Approach (Separate LLMs per Domain):
20 domains × $500/month = $10,000/month
❌ Expensive
❌ Wasteful (duplicate models)
❌ Complex maintenance
Your Shared Approach (2 Models for All Domains):
1 GPU Server: $500/month
1 Frontend Server: $50/month
Total: $550/month ÷ 20 domains = $27.50 per domain
✅ Cost-effective
✅ Efficient resource use
✅ Centralized updates
MONTHLY SAVINGS: $9,450
ANNUAL SAVINGS: $113,400
3-YEAR SAVINGS: $340,200
Return on Investment:
• Daily infrastructure cost: ~$18.33 ($550/month ÷ 30 days)
• Requests per day: ~10,000
• Cost per inference: ~$0.0018
• Daily revenue potential: $200+
• ROI: ~990% (($200 − $18.33) ÷ $18.33)
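The savings figures above follow directly from the stated inputs ($500/domain traditional, $500 GPU server + $50 frontend server shared, 20 domains). The arithmetic, worked out:

```python
# Worked arithmetic for the shared-infrastructure comparison above.
DOMAINS = 20
traditional = DOMAINS * 500        # $10,000/month if every domain ran its own LLM
shared = 500 + 50                  # GPU server + frontend server = $550/month
per_domain = shared / DOMAINS      # $27.50 per domain

monthly_savings = traditional - shared   # $9,450
annual_savings = monthly_savings * 12    # $113,400
three_year_savings = annual_savings * 3  # $340,200
```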