
# Setup Guide

## Prerequisites

Before starting, ensure you have the following installed:

- **Python 3.10** - [Download from Microsoft Store](https://apps.microsoft.com/store/detail/python-310/9PJPW5LDXLZ5) or [python.org](https://www.python.org/downloads/)
- **VS Code** - [Download here](https://code.visualstudio.com/)
- **Node.js and npm** - [Download here](https://nodejs.org/) (includes npx)
- **GPT-SoVITS** - [One-click installer](https://github.com/RVC-Boss/GPT-SoVITS)
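
To confirm that the Node.js tools are reachable before going further, you can look them up on `PATH`. Here is a minimal standard-library sketch. It does not check GPT-SoVITS, which is a folder you download rather than a command.

```python
import shutil

# Node.js tools this guide calls (npx is used later to start Vite).
REQUIRED_COMMANDS = ("node", "npm", "npx")


def find_missing_commands(commands=REQUIRED_COMMANDS) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing_commands()
    print("Missing from PATH: " + ", ".join(missing) if missing else "Node.js tools found on PATH.")
```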

---

## 1. Project Setup

### Create Virtual Environment

1. Open VS Code
2. **File → New Window**
3. **File → Open Folder** (select your project directory)
4. Press **Ctrl+Shift+P** to open the command palette
5. Type `Python: Create Environment` and select it

   > **Note:** If you don't see this option, install the Python extension:
   > - Go to the Extensions sidebar (Ctrl+Shift+X)
   > - Search for "Python" and install it
   > - Close and reopen VS Code, then try again

6. Select **Venv** and choose **Python 3.10**
7. **Uncheck** "Install dependencies from requirements.txt" (we'll do this manually)
8. Click **OK**

### Install Dependencies

1. Open a new terminal: **Terminal → New Terminal**
2. Verify your virtual environment is active (you should see `.venv` in the prompt):
   ```
   (.venv) F:\your_project_path>
   ```

3. Install dependencies using uv (faster) or pip:
   ```bash
   # Option 1: Using uv (recommended - faster)
   pip install uv
   uv pip install -r requirements.txt
   
   # Option 2: Using pip
   pip install -r requirements.txt
   ```
   This should take 30 seconds to 1 minute depending on your system.
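
You can also confirm the environment from Python by comparing `sys.prefix` with `sys.base_prefix`. Inside a venv, the two values differ. The minimal sketch below also checks for the Python 3.10 this guide targets. Save it anywhere (for example as a hypothetical `check_venv.py`) and run it from the VS Code terminal.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means everything looks right."""
    problems = []
    if not in_virtualenv():
        problems.append("Not running inside a virtual environment (.venv).")
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Expected Python 3.10, found {found}.")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Virtual environment looks good.")
```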

---

## 2. API Configuration

### Create .env File

Create a `.env` file in the project root directory with the following content:

```text
OPENAI_API_KEY="sk-proj-YOUR_API_KEY"
GROQ_API_KEY="YOUR_GROQ_API_KEY"
```
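
Before starting anything, you can confirm that both keys are filled in. The sketch below is a hypothetical standard-library check, not the project's own loader. It reads simple `KEY="value"` lines, strips the quotes, and flags any keys that are missing or still hold the template placeholders.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")
# Placeholder values from the template above; real keys should replace them.
PLACEHOLDERS = {"sk-proj-YOUR_API_KEY", "YOUR_GROQ_API_KEY"}


def parse_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=value / KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes, matching the quoted style used in the template.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: Path) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    values = parse_env(path)
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", *PLACEHOLDERS)]


if __name__ == "__main__":
    env_path = Path(".env")  # run from the project root
    problems = missing_keys(env_path) if env_path.exists() else list(REQUIRED_KEYS)
    print("Missing or placeholder:", problems or "none")
```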

### Get API Keys

1. **OpenAI API Key:**
   - Sign up at [OpenAI Platform](https://platform.openai.com/api-keys)
   - Add $5 credit (should last 1-2 months for typical usage)
   - Copy your API key to the `.env` file
   
   > **Note:** You can adapt the code to use a local AI model instead, but the streaming code does not support local models yet (support is planned).

2. **Groq API Key (Free):**
   - Sign up at [Groq Console](https://console.groq.com/keys)
   - Copy your API key to the `.env` file

---

## 3. Configuration

### Character Configuration

There are four main configuration files:

#### A. `character_config.yaml`
- Set the AI prompt
- Configure ASR (Automatic Speech Recognition) context
- Add a reference audio sample (must be 3-10 seconds long)
- Enter the text spoken in that audio file
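
The reference clip must be 3-10 seconds long, so it can help to measure it first. This minimal sketch uses the standard-library `wave` module and assumes the clip is an uncompressed WAV file. Other formats need a different reader.

```python
import wave
from pathlib import Path

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # limits stated in this guide


def wav_duration(path: Path) -> float:
    """Length of a PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: Path) -> bool:
    """True if the clip length falls inside the 3-10 second window."""
    return MIN_SECONDS <= wav_duration(path) <= MAX_SECONDS
```

Call `is_valid_reference(Path("path/to/reference.wav"))` with your own clip. `wav_duration` returns the exact length in seconds if you need to trim it.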

#### B. `client/config.js`
- Change the 3D model: place the model file in the `client/models/` directory and update `VRM_PATH` to point to it
- Adjust the mouth audio threshold (`MOUTH_THRESHOLD`)
- **Important:** The model must be in VRM 1.0 format (export setting in VRoid Studio)

This file also holds the **`VR_CONFIG`** block. Its most useful setting is `dollyPosition: [x, y, z]`. If the avatar looks too small, too far away, or too close when you put on the headset, edit `dollyPosition` to move the VR rig. `Y` raises or lowers your virtual eye level, and `Z` sets your distance from the avatar. The same block also contains `touchRadius` (how close your hand must be to trigger a touch), `triggerTouchRadius` (extended range while the trigger is held), `touchCooldown`, and the haptic settings (`hapticIntensity`, `hapticDuration`).

#### C. `client/scene/roomConfig.js`
Toggles the room/environment around the avatar. `enabled: false` (the default) runs with no room. Set `enabled: true` and edit `url` / `position` / `rotation` / `scale` to load a GLB. A Japanese classroom example is pre-filled at the bottom of the file; copy those values over the active block to switch to it. `fixDepth: true` is a workaround for rooms that z-fight (overlapping surfaces that flicker).

#### D. `client/scene/objectsConfig.js`
Drops static GLB props into the scene. Each entry needs `name` (unique key the AI can reference), `url` (path under `client/backgrounds/glb/`), and a transform. Two commented-out examples (fish, mattress) show the shape.

> **Where to find rooms and props:** [Sketchfab](https://sketchfab.com/) has a huge library of free GLB downloads. Drop the file into `client/backgrounds/glb/` and reference it in `roomConfig.js` or `objectsConfig.js`.

---

## 4. Starting the Servers

### Option A: Automatic Start (Recommended)

1. Edit `start_server.bat`
2. Change the following line to match your GPT-SoVITS installation path:
   ```batch
   set SOVITS_PATH=D:\PyProjects\GPT-SoVITS-v3lora-20250228\GPT-SoVITS-v3lora-20250228
   ```
3. Run the script:
   - In terminal: `start_server.bat`
   - Or double-click the file in File Explorer
4. **Do not close any of the terminal windows that open**

### Option B: Manual Start

If automatic start doesn't work:

1. **Start the Python server:**
   ```bash
   cd server
   python server.py
   ```

2. **Start the animation server** (open a second terminal):
   ```bash
   cd client
   npx vite
   ```

3. Open your browser and go to: [http://localhost:5173](http://localhost:5173)
   
   You should see a 3D model floating on screen.
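
If the page does not load, you can find out which process is down by testing whether anything is listening on the two ports this guide uses. Port 8001 is the Python bridge (`server/server.py`) and port 5173 is Vite. Here is a minimal standard-library sketch. It connects to `localhost` so that both IPv4 and IPv6 bindings are tried.

```python
import socket

# Ports taken from this guide: Python bridge (server.py) and the Vite dev server.
SERVICES = {"Python server (server.py)": 8001, "Vite client": 5173}


def port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection tries every address the host resolves to (IPv4 and IPv6).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "listening" if port_open(port) else "NOT reachable"
        print(f"{name:28} :{port}  {status}")
```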

---

## 5. Running the Chat

1. Run the main chat script:
   ```bash
   cd server
   python main_chat_v9.py
   ```

   This is the current voice loop. It starts recording when it detects speech, streams the LLM response, generates TTS for each chunk, and plays the chunks back in order. It also runs a background click dispatcher that makes the avatar respond verbally when it is touched (see Sections 6 and 8).

   > **MCP tool calls (optional):** If you have an MCP server running and a config file at `~/MCP_functions/mcp_config.json` (or at the path set in the `MCP_CONFIG_PATH` environment variable), the script picks it up automatically and lets the model interleave speech and tool calls. Without that file, it streams plain text and needs no extra setup.

2. **Troubleshooting:** If you encounter issues, run the setup check script:
   ```bash
   cd server
   python check_setup.py
   ```

   Besides the LLM, microphone, TTS, and VRM checks, it also tests click interactions and `walk_to`.
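
The actual lookup lives in `main_chat_v9.py`. As an illustration only, the MCP config rule described above can be written as the sketch below. It assumes that `MCP_CONFIG_PATH` takes precedence over the default path when it is set. Running it shows which file, if any, would be picked up on your machine.

```python
import os
from pathlib import Path
from typing import Optional

# Default location named in this guide; MCP_CONFIG_PATH overrides it (assumed precedence).
DEFAULT_MCP_CONFIG = Path("~/MCP_functions/mcp_config.json")


def resolve_mcp_config() -> Optional[Path]:
    """Return the MCP config path to use, or None when no file exists (text-only mode)."""
    override = os.environ.get("MCP_CONFIG_PATH")
    candidate = (Path(override) if override else DEFAULT_MCP_CONFIG).expanduser()
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = resolve_mcp_config()
    print(f"MCP config: {path}" if path else "No MCP config found; text-only streaming.")
```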

---

## 6. VR Support

The client supports WebXR through the Vite dev server.

### Quest setup (Air Link or Link Cable)

1. Connect your Quest to the PC:
   - **Link Cable:** plug the headset in, accept the "Allow access" prompt inside the headset, and switch to Quest Link from the Quest menu.
   - **Air Link:** make sure both devices are on the same network and pair them through the Oculus PC app.
2. Start the servers (Section 4) and **refresh the client** at [http://localhost:5173](http://localhost:5173) inside the headset's browser, or click the **Enter VR** button on the desktop client while the headset is active.
3. If the avatar looks too small, too far, or too low, edit `VR_CONFIG.dollyPosition` in `client/config.js` and refresh.

### Touch / hand interactions

In VR you can reach out and touch the avatar. Hand proximity within `touchRadius` triggers a click; squeezing the trigger extends the range to `triggerTouchRadius`. Each touch:
- Plays a quick reaction sound + animation on the client (handled by the `/send_click_interaction` endpoint).
- Buffers a `[the user touched your <region>]` action that the chat loop's background dispatcher picks up and forwards to the LLM, so the avatar also reacts verbally.

Tunables live in `VR_CONFIG` (`touchCooldown`, `hapticIntensity`, `hapticDuration`).
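
The touch rule above reduces to a distance test combined with a cooldown. The client implements it in JavaScript; the Python below only illustrates that logic. The class name and the values in the usage are hypothetical, and the real defaults live in `VR_CONFIG`.

```python
import math
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class TouchDetector:
    touch_radius: float          # counterpart of VR_CONFIG.touchRadius
    trigger_touch_radius: float  # counterpart of VR_CONFIG.triggerTouchRadius
    cooldown: float              # counterpart of VR_CONFIG.touchCooldown; same unit as `now`
    _last_touch: float = field(default=-math.inf, init=False, repr=False)

    def update(self, hand: Vec3, target: Vec3, trigger_held: bool, now: float) -> bool:
        """Return True when this frame should fire a touch event."""
        # Holding the trigger extends the reach.
        radius = self.trigger_touch_radius if trigger_held else self.touch_radius
        if math.dist(hand, target) > radius:
            return False
        if now - self._last_touch < self.cooldown:
            return False  # still cooling down from the previous touch
        self._last_touch = now
        return True
```

Because the detector stores the time of the last touch, a hand resting on the avatar fires once per cooldown window instead of on every frame.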

---

## 7. Customizing the Scene

The world around the avatar is built from two simple config files:

- **`client/scene/roomConfig.js`** — one room/environment at a time. Set `enabled: true` and point `url` at a GLB under `client/backgrounds/glb/`.
- **`client/scene/objectsConfig.js`** — list of static props loaded into the scene. Each entry needs `name`, `url`, `position`, `rotation`, `scale`.

[Sketchfab](https://sketchfab.com/) is the easiest place to grab free GLB rooms and props. Download → drop into `client/backgrounds/glb/` → register in the relevant config → refresh the browser.
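
`objectsConfig.js` is JavaScript, so the check below is only a Python illustration. It treats each entry as a dict with the five fields listed above. It flags missing fields, duplicate `name`s (names must be unique because the AI refers to props by name), and URLs that do not end in `.glb`. How the client actually parses the entries is not covered here.

```python
REQUIRED_FIELDS = ("name", "url", "position", "rotation", "scale")


def validate_objects(entries: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every entry looks usable."""
    problems = []
    seen_names = set()
    for i, entry in enumerate(entries):
        missing = [key for key in REQUIRED_FIELDS if key not in entry]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
        name = entry.get("name")
        if name is not None:
            # Names must be unique because the AI refers to props by name.
            if name in seen_names:
                problems.append(f"entry {i}: duplicate name {name!r}")
            seen_names.add(name)
        url = entry.get("url", "")
        if url and not url.lower().endswith(".glb"):
            problems.append(f"entry {i}: url {url!r} is not a .glb file")
    return problems
```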

---

## 8. Server Endpoints (for reference / scripting)

The Python bridge (`server/server.py`, port 8001) exposes a small REST + WebSocket API the client and chat loop both use. You can hit any of these from `curl` / Postman to drive the avatar manually.

| Endpoint | What it does |
|---|---|
| `POST /talk` | Play an audio file with lip sync + expression |
| `POST /animate` | Trigger a Mixamo (`.fbx`) or VRMA (`.vrma`) animation. Set `animate_type` to `start_mixamo`, `start_vrma`, or `auto` to detect from extension |
| `POST /animate_and_talk` | Combined VRMA + audio with optional delay |
| `POST /set_state` | Switch the head/eye micro-state: `idle`, `listening`, `thinking`, `talking` |
| `POST /walk_to` | Walk the avatar to `{x, y, z}` at a given `speed` |
| `POST /stop_movement` | Cancel walking, return to idle |
| `POST /teleport_to` | Instant move to `{x, y, z}` (no walk anim) |
| `POST /set_movement_speed` | Adjust walking speed on the fly |
| `POST /load_movement_animation` | Load walk/idle GLB for the movement system (`anim_type: "walk"` or `"idle"`) |
| `POST /send_click_interaction` | Touch reaction (called by the client when you click/touch the avatar in VR or with the mouse) |
| `GET /pop_pending_actions` | Drains buffered click actions — used by `main_chat_v9.py` to fold touches into the LLM prompt |
| `POST /vr/position` / `GET /vr/position` | Push / read the VR headset position |
| `WS /ws` | WebSocket the client subscribes to for all broadcasts above |

`server/process/vrm_func/vrm_ping.py` and `vrm_states_ping.py` are thin Python wrappers around the most-used POSTs if you want to drive the avatar from your own scripts. `test_vr_positions.py` shows a `walk_to` example.
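
Any HTTP client works for scripting. The sketch below uses only `urllib` from the standard library and assumes the bridge accepts JSON request bodies. Check `server/server.py` (or `vrm_ping.py` / `vrm_states_ping.py`) for the real payload shapes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # server/server.py, per this guide
# Payload shapes are assumptions; check server/server.py for the real field names.


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the bridge's endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, payload: dict, timeout: float = 5.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the Python server running, `post("/walk_to", {"x": 0.5, "y": 0.0, "z": -1.0, "speed": 1.0})` would send a walk command, assuming the field names from the table are used as JSON keys. For `/set_state`, a payload such as `{"state": "thinking"}` is only a guess at the key name.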

---

## 9. Customization

### Facial Expressions

Currently the model's expression defaults to `relaxed`. You can wire in your own emotion classifier or set it manually.

**To change the expression**, edit the chunk loop in `server/main_chat_v9.py`:

```python
for item, item_type in stream_with_functions(messages):
    if item_type == "text":
        text_chunk = item
        tts_text = clean_llm_output(text_chunk)

        # Option 1: plug in an emotion classifier
        # emotion = get_emotion(text_chunk, None, None)
        # expression = map_emotion_to_expression(emotion)

        # Option 2: set manually (current default)
        expression = "relaxed"
```

**Supported VRM 1.0 expressions:**
- `happy`
- `angry`
- `sad`
- `relaxed`
- `surprised`
- `neutral`
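
This guide does not define `map_emotion_to_expression`, the helper referenced in the commented-out Option 1. If you plug in a classifier, a mapping onto the six expressions above could look like the hypothetical sketch below. The classifier labels on the left (`joy`, `anger`, ...) are assumptions and depend on the model you use. Any unrecognized label falls back to `relaxed`, the current default.

```python
from typing import Optional

VRM_EXPRESSIONS = {"happy", "angry", "sad", "relaxed", "surprised", "neutral"}

# Hypothetical classifier labels -> VRM 1.0 preset expressions.
EMOTION_TO_EXPRESSION = {
    "joy": "happy",
    "anger": "angry",
    "disgust": "angry",
    "sadness": "sad",
    "fear": "surprised",
    "surprise": "surprised",
    "neutral": "neutral",
}

DEFAULT_EXPRESSION = "relaxed"  # matches the guide's current default


def map_emotion_to_expression(emotion: Optional[str]) -> str:
    """Map a classifier label (or an expression name) to a supported VRM expression."""
    label = (emotion or "").strip().lower()
    if label in VRM_EXPRESSIONS:
        return label  # already a valid expression name
    return EMOTION_TO_EXPRESSION.get(label, DEFAULT_EXPRESSION)
```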

---

## Summary

1. ✅ Install prerequisites (Python 3.10, VS Code, Node.js, GPT-SoVITS)
2. ✅ Create virtual environment and install dependencies
3. ✅ Configure API keys in `.env`
4. ✅ Customize `character_config.yaml`, `client/config.js`, and the `client/scene/*` configs
5. ✅ Start servers (automatic or manual)
6. ✅ Run `server/main_chat_v9.py`
7. ✅ (Optional) Plug in VR via Link / Air Link, tweak `VR_CONFIG`
8. ✅ (Optional) Drop in rooms / props from Sketchfab
9. ✅ (Optional) Customize facial expressions

For issues, run `server/check_setup.py` to diagnose problems.

Initial release 0.5 (2026-05-24 13:31:30 +02:00)