koratex/Ai_Assistant


Koratex 7ef71f47d1 Initial release 0.5

2026-05-24 13:31:30 +02:00


Setup Guide

Prerequisites

Before starting, ensure you have the following installed:

Python 3.10 - Download from Microsoft Store or python.org
VS Code - Download from code.visualstudio.com
Node.js and npm - Download from nodejs.org (the installer includes npm and npx)
GPT-SoVITS - One-click installer

1. Project Setup

Create Virtual Environment

Open VS Code
File → New Window
File → Open Folder (select your project directory)
Press Ctrl+Shift+P to open the command palette
Type Python: Create Environment and select it
Note: If you don't see this option, install the Python extension:
- Go to the Extensions sidebar (Ctrl+Shift+X)
- Search for "Python" and install it
- Close and reopen VS Code, then try again
Select Venv and choose Python 3.10
Uncheck "Install dependencies from requirements.txt" (we'll do this manually)
Click OK

Install Dependencies

Open a new terminal: Terminal → New Terminal
Verify your virtual environment is active (you should see .venv in the prompt):
```
(.venv) F:\your_project_path>
```

Install dependencies using uv (faster) or pip:

# Option 1: Using uv (recommended - faster)
pip install uv
uv pip install -r requirements.txt

# Option 2: Using pip
pip install -r requirements.txt

This should take 30 seconds to 1 minute depending on your system.
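If the install fails or packages seem to be missing, it can help to confirm that the terminal is really using the virtual environment and Python 3.10. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def python_matches(major: int = 3, minor: int = 10) -> bool:
    """Return True when the running interpreter is the expected version."""
    return sys.version_info[:2] == (major, minor)


print("virtual environment active:", in_virtualenv())
print("running Python 3.10:", python_matches())
```

Run it from the same terminal you used for the install. If the first line prints False, reopen the terminal from VS Code so the .venv is activated.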

2. API Configuration

Create .env File

Create a .env file in the root directory with the following content:

OPENAI_API_KEY="sk-proj-YOUR_API_KEY"
GROQ_API_KEY="YOUR_GROQ_API_KEY"

Get API Keys

OpenAI API Key:
- Sign up at OpenAI Platform
- Add $5 credit (should last 1-2 months for typical usage)
- Copy your API key to the .env file
Note: You can adapt the code to use a local AI model instead, but the streaming code does not support this yet. Local model support is planned.
Groq API Key (Free):
- Sign up at Groq Console
- Copy your API key to the .env file
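Before starting the servers, you may want to check that both keys are present and no longer set to the placeholder values. The sketch below parses the .env text with the standard library only; it is an assumption about how you might check the file, not how the project itself loads it, and the helper names are hypothetical.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return required keys that are absent, empty, or still a placeholder."""
    values = parse_env(text)
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or "YOUR_" in values[key]
    ]


def check_env_file(path: str = ".env") -> list[str]:
    """Read the .env file at `path` and report keys that still need a value."""
    return missing_keys(Path(path).read_text(encoding="utf-8"))
```

Calling `check_env_file()` from the project root returns an empty list when both keys look filled in.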

3. Configuration

Character Configuration

The configuration is spread across four files:

A. `character_config.yaml`

Set the AI prompt
Configure ASR (Automatic Speech Recognition) context
Add a reference audio sample (must be 3-10 seconds long)
Enter the text spoken in the audio file
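Since the reference sample must be 3-10 seconds long, a quick duration check can save a failed TTS run. This is a small sketch that assumes the sample is an uncompressed PCM WAV file (the standard-library wave module cannot read other formats); the function names are illustrative.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """Return True when the clip length falls inside the 3-10 second window."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

For example, `is_valid_reference("my_reference.wav")` (a hypothetical file name) returns False for a 2-second clip.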

B. `client/config.js`

Change the 3D model (VRM_PATH)
Adjust mouth audio threshold (MOUTH_THRESHOLD)
Place model files in client/models/ directory
Update the filename in config
Important: Model must be in VRM 1.0 format (export setting in VRoid Studio)

This file also holds the VR_CONFIG block. The most useful setting there is dollyPosition: [x, y, z], which moves the VR rig. Edit it if your avatar looks too small, too far away, or too close when you put on the headset: Y raises or lowers your virtual eye level, and Z sets the distance from the avatar. Other tunables in the same block: touchRadius (proximity needed to trigger a touch), triggerTouchRadius (extended range while the trigger is held), touchCooldown, hapticIntensity, and hapticDuration.

C. `client/scene/roomConfig.js`

Toggles the room/environment around the avatar. enabled: false runs with no room (default). Set enabled: true and edit url / position / rotation / scale to load a GLB. There's a Japanese classroom example pre-filled at the bottom — copy those values over the active block to switch to it. fixDepth: true is a workaround for rooms that z-fight.

D. `client/scene/objectsConfig.js`

Drops static GLB props into the scene. Each entry needs name (unique key the AI can reference), url (path under client/backgrounds/glb/), and a transform. Two commented-out examples (fish, mattress) show the shape.

Where to find rooms and props: Sketchfab has a huge library of free GLB downloads. Drop the file into client/backgrounds/glb/ and reference it in roomConfig.js or objectsConfig.js.

4. Starting the Servers

Option A: Automatic Start (Recommended)

Edit start_server.bat

Change the following line to match your GPT-SoVITS installation path:

set SOVITS_PATH=D:\PyProjects\GPT-SoVITS-v3lora-20250228\GPT-SoVITS-v3lora-20250228

Run the script:
- In terminal: start_server.bat
- Or double-click the file in File Explorer
Do not close any of the terminal windows that open

Option B: Manual Start

If automatic start doesn't work:

Start the Python server:
```
cd server
python server.py
```
Start the animation server (open a second terminal):
```
cd client
npx vite
```
Open your browser and go to: http://localhost:5173

You should see a 3D model floating on screen.
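If the page stays blank, it can be useful to confirm that both servers are actually listening. The sketch below probes the two ports mentioned in this guide (8001 for the Python bridge, 5173 for the Vite dev server) with a plain TCP connection; the helper names are illustrative, and a successful connection only shows that something is listening, not that it is the right process.

```python
import socket

# Ports taken from this guide: server.py on 8001, Vite on 5173.
SERVICES = {"Python bridge (server.py)": 8001, "Vite dev server": 5173}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

`report()` returns a dictionary with one True/False entry per service.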

5. Running the Chat

Run the main chat script:
```
cd server
python main_chat_v9.py
```
This is the current voice loop. It starts recording when speech is detected, streams the LLM response, generates TTS for each chunk, and plays the audio back in order. It also runs a background click dispatcher that responds verbally when the avatar is touched (see Section 8).
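To illustrate the "TTS per chunk, played back in order" idea, here is a generic sketch of one way to do it: each chunk is handed to a worker pool as soon as it arrives, and a player thread waits on the results in submission order. This is not the code in main_chat_v9.py; `fake_tts`, `speak_in_order`, and the worker count are all hypothetical stand-ins.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS call; returns placeholder audio bytes."""
    return text.encode("utf-8")


def speak_in_order(chunks, play, tts=fake_tts, workers: int = 2) -> None:
    """Synthesize chunks concurrently but play results in arrival order."""
    pending: queue.Queue = queue.Queue()

    def player() -> None:
        # Futures are queued in chunk order, so waiting on each one in
        # turn keeps playback ordered even if a later chunk finishes first.
        while True:
            future = pending.get()
            if future is None:
                break
            play(future.result())

    thread = threading.Thread(target=player)
    thread.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks:  # chunks may be a live stream from the LLM
            pending.put(pool.submit(tts, chunk))
        pending.put(None)  # sentinel: no more chunks
        thread.join()
```

Playback of the first chunk can begin while later chunks are still being generated, which is what keeps the response latency low.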

MCP tool calls (optional): If you have an MCP server running and a config at ~/MCP_functions/mcp_config.json (or the path set in env var MCP_CONFIG_PATH), the script automatically picks it up and lets the model interleave speech and tool calls. Without that file it just streams text — no extra setup needed.
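The lookup described above can be sketched as follows. The guide does not say which location wins when both exist, so this sketch assumes MCP_CONFIG_PATH takes precedence over the default path; `resolve_mcp_config` is a hypothetical helper, not a function from the project.

```python
import os
from pathlib import Path

DEFAULT_MCP_CONFIG = Path.home() / "MCP_functions" / "mcp_config.json"


def resolve_mcp_config(env=None):
    """Return the MCP config path if one exists, otherwise None.

    Assumption: the MCP_CONFIG_PATH environment variable, when set,
    overrides the default location under the home directory.
    """
    env = os.environ if env is None else env
    override = env.get("MCP_CONFIG_PATH")
    candidate = Path(override).expanduser() if override else DEFAULT_MCP_CONFIG
    return candidate if candidate.is_file() else None
```

A return value of None corresponds to the plain text-streaming mode with no tool calls.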
Troubleshooting: If you encounter issues, run the setup check script:
```
cd server
python check_setup.py
```
The check now includes click-interaction and walk-to tests in addition to the LLM / mic / TTS / VRM checks.

6. VR Support

The client supports WebXR through the Vite dev server.

Quest setup (Air Link or Link Cable)

Connect your Quest to the PC:
- Link Cable: plug the headset in, accept the "Allow access" prompt inside the headset, and switch to Quest Link from the Quest menu.
- Air Link: make sure both devices are on the same network and pair them through the Oculus PC app.
Start the servers (Section 4) and refresh the client at http://localhost:5173 inside the headset's browser, or click the Enter VR button on the desktop client while the headset is active.
If the avatar looks too small, too far, or too low, edit VR_CONFIG.dollyPosition in client/config.js and refresh.

Touch / hand interactions

In VR you can reach out and touch the avatar. Hand proximity within touchRadius triggers a click; squeezing the trigger extends the range to triggerTouchRadius. Each touch:

Plays a quick reaction sound + animation on the client (handled by the /send_click_interaction endpoint).
Buffers a [the user touched your <region>] action that the chat loop's background dispatcher picks up and forwards to the LLM, so the avatar also reacts verbally.

Tunables live in VR_CONFIG (touchCooldown, hapticIntensity, hapticDuration).
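The interplay of touchRadius, triggerTouchRadius, and touchCooldown can be pictured with a small sketch. The real logic lives in the JavaScript client; this Python version only illustrates the rule as described above, and it assumes positions are (x, y, z) tuples and that the cooldown is measured in the same time unit as the timestamps.

```python
import math


def touch_triggered(hand, target, touch_radius, trigger_touch_radius,
                    trigger_held, now, last_touch, cooldown) -> bool:
    """Return True when a hand position should register as a touch.

    hand, target: (x, y, z) positions in the same coordinate space.
    trigger_held: widens the reach from touch_radius to trigger_touch_radius.
    now, last_touch, cooldown: suppress repeat touches inside the cooldown.
    """
    if now - last_touch < cooldown:
        return False
    radius = trigger_touch_radius if trigger_held else touch_radius
    return math.dist(hand, target) <= radius
```

With illustrative values, a hand 0.1 units away misses at a touch radius of 0.05 but registers once the trigger is held and the extended radius is 0.2.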

7. Customizing the Scene

The world around the avatar is built from two simple config files:

client/scene/roomConfig.js — one room/environment at a time. Set enabled: true and point url at a GLB under client/backgrounds/glb/.
client/scene/objectsConfig.js — list of static props loaded into the scene. Each entry needs name, url, position, rotation, scale.

Sketchfab is the easiest place to grab free GLB rooms and props. Download → drop into client/backgrounds/glb/ → register in the relevant config → refresh the browser.
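When registering downloads, it helps to see which GLB files are actually in the folder so the url values match the file names exactly. A minimal sketch, assuming it is run from the project root; `list_glb_files` is an illustrative helper.

```python
from pathlib import Path


def list_glb_files(glb_dir: str = "client/backgrounds/glb") -> list[str]:
    """Return the sorted names of .glb files in the given folder."""
    root = Path(glb_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.glb"))


for name in list_glb_files():
    print(name)
```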

8. Server Endpoints (for reference / scripting)

The Python bridge (server/server.py, port 8001) exposes a small REST + WebSocket API the client and chat loop both use. You can hit any of these from curl / Postman to drive the avatar manually.

| Endpoint | What it does |
| --- | --- |
| `POST /talk` | Play an audio file with lip sync + expression |
| `POST /animate` | Trigger a Mixamo (`.fbx`) or VRMA (`.vrma`) animation. Set `animate_type` to `start_mixamo`, `start_vrma`, or `auto` to detect from extension |
| `POST /animate_and_talk` | Combined VRMA + audio with optional delay |
| `POST /set_state` | Switch the head/eye micro-state: `idle`, `listening`, `thinking`, `talking` |
| `POST /walk_to` | Walk the avatar to `{x, y, z}` at a given `speed` |
| `POST /stop_movement` | Cancel walking, return to idle |
| `POST /teleport_to` | Instant move to `{x, y, z}` (no walk anim) |
| `POST /set_movement_speed` | Adjust walking speed on the fly |
| `POST /load_movement_animation` | Load walk/idle GLB for the movement system (`anim_type: "walk"` or `"idle"`) |
| `POST /send_click_interaction` | Touch reaction (called by the client when you click/touch the avatar in VR or with the mouse) |
| `GET /pop_pending_actions` | Drains buffered click actions; used by `main_chat_v9.py` to fold touches into the LLM prompt |
| `POST /vr/position` / `GET /vr/position` | Push / read the VR headset position |
| `WS /ws` | WebSocket the client subscribes to for all broadcasts above |

server/process/vrm_func/vrm_ping.py and vrm_states_ping.py are thin Python wrappers around the most-used POSTs if you want to drive the avatar from your own scripts. test_vr_positions.py shows a walk_to example.
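As a rough illustration of scripting against these endpoints, the sketch below builds JSON POST requests with the standard library. The port (8001) and endpoint paths come from this guide, but the exact JSON field names are assumptions (in particular the flat `x`/`y`/`z`/`speed` layout and the `state` key); check server/server.py or the wrappers mentioned above for the real payload shapes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"
HEAD_STATES = {"idle", "listening", "thinking", "talking"}


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the bridge."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def walk_to_request(x: float, y: float, z: float, speed: float = 1.0):
    """Request for POST /walk_to; the field layout is an assumption."""
    return build_post("/walk_to", {"x": x, "y": y, "z": z, "speed": speed})


def set_state_request(state: str):
    """Request for POST /set_state; the 'state' key is an assumption."""
    if state not in HEAD_STATES:
        raise ValueError(f"unknown state {state!r}; expected one of {sorted(HEAD_STATES)}")
    return build_post("/set_state", {"state": state})
```

To send a request once the server is running, pass it to `urllib.request.urlopen(request, timeout=5)`.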

9. Customization

Facial Expressions

Currently the model's expression defaults to relaxed. You can wire in your own emotion classifier or set it manually.

To change the expression, edit the chunk loop in server/main_chat_v9.py:

for item, item_type in stream_with_functions(messages):
    if item_type == "text":
        text_chunk = item
        tts_text = clean_llm_output(text_chunk)

        # Option 1: plug in an emotion classifier
        # emotion = get_emotion(text_chunk, None, None)
        # expression = map_emotion_to_expression(emotion)

        # Option 2: set manually (current default)
        expression = "relaxed"

Supported VRM 1.0 expressions:

happy
angry
sad
relaxed
surprised
neutral
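If you go with Option 1, the classifier's labels need to be translated into one of the six expressions above. The following is a hypothetical sketch of such a mapping: the emotion labels on the left are invented for illustration and depend entirely on the classifier you plug in, and this function body is not the project's own `map_emotion_to_expression`.

```python
SUPPORTED_EXPRESSIONS = ("happy", "angry", "sad", "relaxed", "surprised", "neutral")

# Hypothetical classifier labels mapped to supported VRM expressions.
EMOTION_TO_EXPRESSION = {
    "joy": "happy",
    "anger": "angry",
    "sadness": "sad",
    "surprise": "surprised",
    "calm": "relaxed",
}


def map_emotion_to_expression(emotion, default: str = "relaxed") -> str:
    """Translate a classifier label into a supported expression name.

    Labels that already match a supported expression pass through;
    anything unrecognized falls back to the default.
    """
    label = (emotion or "").strip().lower()
    if label in SUPPORTED_EXPRESSIONS:
        return label
    return EMOTION_TO_EXPRESSION.get(label, default)
```

Falling back to "relaxed" keeps the current default behavior whenever the classifier returns something unexpected.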

Summary

✅ Install prerequisites (Python 3.10, VS Code, Node.js, GPT-SoVITS)
✅ Create virtual environment and install dependencies
✅ Configure API keys in .env
✅ Customize character_config.yaml, client/config.js, and the client/scene/* configs
✅ Start servers (automatic or manual)
✅ Run server/main_chat_v9.py
✅ (Optional) Plug in VR via Link / Air Link, tweak VR_CONFIG
✅ (Optional) Drop in rooms / props from Sketchfab
✅ (Optional) Customize facial expressions

For issues, run server/check_setup.py to diagnose problems.
