# Setup Guide

## Prerequisites

Before starting, ensure you have the following installed:

- **Python 3.10** - [Download from Microsoft Store](https://apps.microsoft.com/store/detail/python-310/9PJPW5LDXLZ5) or [python.org](https://www.python.org/downloads/)
- **VS Code** - [Download here](https://code.visualstudio.com/)
- **Node.js and npm** - [Download here](https://nodejs.org/) (includes npx)
- **GPT-SoVITS** - [One-click installer](https://github.com/RVC-Boss/GPT-SoVITS)

---

## 1. Project Setup

### Create Virtual Environment

1. Open VS Code
2. **File → New Window**
3. **File → Open Folder** (select your project directory)
4. Press **Ctrl+Shift+P** to open the command palette
5. Type `Python: Create Environment` and select it

   > **Note:** If you don't see this option, install the Python extension:
   > - Go to the Extensions sidebar (Ctrl+Shift+X)
   > - Search for "Python" and install it
   > - Close and reopen VS Code, then try again

6. Select **Venv** and choose **Python 3.10**
7. **Uncheck** "Install dependencies from requirements.txt" (we'll do this manually)
8. Click **OK**

### Install Dependencies

1. Open a new terminal: **Terminal → New Terminal**
2. Verify your virtual environment is active (you should see `.venv` in the prompt):

   ```
   (.venv) F:\your_project_path>
   ```

3. Install dependencies using uv (faster) or pip:

   ```bash
   # Option 1: Using uv (recommended - faster)
   pip install uv
   uv pip install -r requirements.txt

   # Option 2: Using pip
   pip install -r requirements.txt
   ```

   This should take 30 seconds to 1 minute depending on your system.

---

## 2. API Configuration

### Create .env File

Create a `.env` file in the root directory with the following content:

```text
OPENAI_API_KEY="sk-proj-YOUR_API_KEY"
GROQ_API_KEY="YOUR_GROQ_API_KEY"
```

### Get API Keys

1. **OpenAI API Key:**
   - Sign up at [OpenAI Platform](https://platform.openai.com/api-keys)
   - Add $5 credit (should last 1-2 months for typical usage)
   - Copy your API key to the `.env` file

   > **Note:** You can customize this to use a local AI model if preferred (streaming code doesn't support this yet, but local model support is planned)

2. **Groq API Key (Free):**
   - Sign up at [Groq Console](https://console.groq.com/keys)
   - Copy your API key to the `.env` file

---

## 3. Configuration

### Character Configuration

There are four configuration files:

#### A. `character_config.yaml`

- Set the AI prompt
- Configure ASR (Automatic Speech Recognition) context
- Add reference audio sample (must be 3-10 seconds long)
- Enter the text spoken in the audio file

#### B. `client/config.js`

- Change the 3D model (`VRM_PATH`)
- Adjust mouth audio threshold (`MOUTH_THRESHOLD`)
- Place model files in `client/models/` directory
- Update the filename in config
- **Important:** Model must be in VRM 1.0 format (export setting in VRoid Studio)

This file also holds the **`VR_CONFIG`** block. The most useful knob there is `dollyPosition: [x, y, z]`. If your avatar looks too small (or too far/too close) when you put on the headset, edit `dollyPosition` to move the VR rig: `y` raises/lowers your virtual eye level, and `z` is the distance from the avatar. Other tunables in the same block: `touchRadius` (proximity needed to trigger a touch), `triggerTouchRadius` (extended range while the trigger is held), `touchCooldown`, and haptic intensity.

#### C. `client/scene/roomConfig.js`

Toggles the room/environment around the avatar. `enabled: false` runs with no room (default). Set `enabled: true` and edit `url` / `position` / `rotation` / `scale` to load a GLB. There's a Japanese classroom example pre-filled at the bottom; copy those values over the active block to switch to it. `fixDepth: true` is a workaround for rooms that z-fight.

#### D. `client/scene/objectsConfig.js`

Drops static GLB props into the scene. Each entry needs `name` (unique key the AI can reference), `url` (path under `client/backgrounds/glb/`), and a transform. Two commented-out examples (fish, mattress) show the shape.

> **Where to find rooms and props:** [Sketchfab](https://sketchfab.com/) has a huge library of free GLB downloads. Drop the file into `client/backgrounds/glb/` and reference it in `roomConfig.js` or `objectsConfig.js`.

---

## 4. Starting the Servers

### Option A: Automatic Start (Recommended)

1. Edit `start_server.bat`
2. Change the following line to match your GPT-SoVITS installation path:

   ```batch
   set SOVITS_PATH=D:\PyProjects\GPT-SoVITS-v3lora-20250228\GPT-SoVITS-v3lora-20250228
   ```

3. Run the script:
   - In terminal: `start_server.bat`
   - Or double-click the file in File Explorer
4. **Do not close any of the terminal windows that open**

### Option B: Manual Start

If automatic start doesn't work:

1. **Start the Python server:**

   ```bash
   cd server
   python server.py
   ```

2. **Start the animation server** (open a second terminal):

   ```bash
   cd client
   npx vite
   ```

3. Open your browser and go to: [http://localhost:5173](http://localhost:5173)

You should see a 3D model floating on screen.

---

## 5. Running the Chat

1. Run the main chat script:

   ```bash
   cd server
   python main_chat_v9.py
   ```

   This is the current voice loop. It records on speech, streams the LLM response, generates TTS per chunk, and plays it back in order. It also runs a background click dispatcher that responds verbally when the avatar is touched (see Section 8).

   > **MCP tool calls (optional):** If you have an MCP server running and a config at `~/MCP_functions/mcp_config.json` (or the path set in env var `MCP_CONFIG_PATH`), the script automatically picks it up and lets the model interleave speech and tool calls. Without that file it just streams text; no extra setup is needed.

2. **Troubleshooting:** If you encounter issues, run the setup check script:

   ```bash
   cd server
   python check_setup.py
   ```

   The check now includes click-interaction and walk-to tests in addition to the LLM / mic / TTS / VRM checks.

---

## 6. VR Support

The client supports WebXR through the Vite dev server.

### Quest setup (Air Link or Link Cable)

1. Connect your Quest to the PC:
   - **Link Cable:** plug the headset in, accept the "Allow access" prompt inside the headset, and switch to Quest Link from the Quest menu.
   - **Air Link:** make sure both devices are on the same network and pair them through the Oculus PC app.
2. Start the servers (Section 4) and **refresh the client** at [http://localhost:5173](http://localhost:5173) inside the headset's browser, or click the **Enter VR** button on the desktop client while the headset is active.
3. If the avatar looks too small, too far, or too low, edit `VR_CONFIG.dollyPosition` in `client/config.js` and refresh.

### Touch / hand interactions

In VR you can reach out and touch the avatar. Hand proximity within `touchRadius` triggers a click; squeezing the trigger extends the range to `triggerTouchRadius`. Each touch:

- Plays a quick reaction sound + animation on the client (handled by the `/send_click_interaction` endpoint).
- Buffers a `[the user touched your ]` action that the chat loop's background dispatcher picks up and forwards to the LLM, so the avatar also reacts verbally.

Tunables live in `VR_CONFIG` (`touchCooldown`, `hapticIntensity`, `hapticDuration`).

---

## 7. Customizing the Scene

The world around the avatar is built from two simple config files:

- **`client/scene/roomConfig.js`**: one room/environment at a time. Set `enabled: true` and point `url` at a GLB under `client/backgrounds/glb/`.
- **`client/scene/objectsConfig.js`**: list of static props loaded into the scene. Each entry needs `name`, `url`, `position`, `rotation`, `scale`.

[Sketchfab](https://sketchfab.com/) is the easiest place to grab free GLB rooms and props. Download → drop into `client/backgrounds/glb/` → register in the relevant config → refresh the browser.

---

## 8. Server Endpoints (for reference / scripting)

The Python bridge (`server/server.py`, port 8001) exposes a small REST + WebSocket API the client and chat loop both use. You can hit any of these from `curl` / Postman to drive the avatar manually.

| Endpoint | What it does |
|---|---|
| `POST /talk` | Play an audio file with lip sync + expression |
| `POST /animate` | Trigger a Mixamo (`.fbx`) or VRMA (`.vrma`) animation. Set `animate_type` to `start_mixamo`, `start_vrma`, or `auto` to detect from extension |
| `POST /animate_and_talk` | Combined VRMA + audio with optional delay |
| `POST /set_state` | Switch the head/eye micro-state: `idle`, `listening`, `thinking`, `talking` |
| `POST /walk_to` | Walk the avatar to `{x, y, z}` at a given `speed` |
| `POST /stop_movement` | Cancel walking, return to idle |
| `POST /teleport_to` | Instant move to `{x, y, z}` (no walk anim) |
| `POST /set_movement_speed` | Adjust walking speed on the fly |
| `POST /load_movement_animation` | Load walk/idle GLB for the movement system (`anim_type: "walk"` or `"idle"`) |
| `POST /send_click_interaction` | Touch reaction (called by the client when you click/touch the avatar in VR or with the mouse) |
| `GET /pop_pending_actions` | Drains buffered click actions; used by `main_chat_v9.py` to fold touches into the LLM prompt |
| `POST /vr/position` / `GET /vr/position` | Push / read the VR headset position |
| `WS /ws` | WebSocket the client subscribes to for all broadcasts above |

`server/process/vrm_func/vrm_ping.py` and `vrm_states_ping.py` are thin Python wrappers around the most-used POSTs if you want to drive the avatar from your own scripts. `test_vr_positions.py` shows a `walk_to` example.

---

## 9. Customization

### Facial Expressions

Currently the model's expression defaults to `relaxed`. You can wire in your own emotion classifier or set it manually.

**To change the expression**, edit the chunk loop in `server/main_chat_v9.py`:

```python
for item, item_type in stream_with_functions(messages):
    if item_type == "text":
        text_chunk = item
        tts_text = clean_llm_output(text_chunk)

        # Option 1: plug in an emotion classifier
        # emotion = get_emotion(text_chunk, None, None)
        # expression = map_emotion_to_expression(emotion)

        # Option 2: set manually (current default)
        expression = "relaxed"
```

**Supported VRM 1.0 expressions:**

- `happy`
- `angry`
- `sad`
- `relaxed`
- `surprised`
- `neutral`

---

## Summary

1. ✅ Install prerequisites (Python 3.10, VS Code, Node.js, GPT-SoVITS)
2. ✅ Create virtual environment and install dependencies
3. ✅ Configure API keys in `.env`
4. ✅ Customize `character_config.yaml`, `client/config.js`, and the `client/scene/*` configs
5. ✅ Start servers (automatic or manual)
6. ✅ Run `server/main_chat_v9.py`
7. ✅ (Optional) Plug in VR via Link / Air Link, tweak `VR_CONFIG`
8. ✅ (Optional) Drop in rooms / props from Sketchfab
9. ✅ (Optional) Customize facial expressions

For issues, run `server/check_setup.py` to diagnose problems.
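
### Appendix: example expression mapper (sketch)

If you go with Option 1 in the Facial Expressions section, you need some function that turns a classifier label into one of the supported VRM 1.0 expressions. The sketch below is one possible shape for `map_emotion_to_expression`, not the project's actual implementation. The input labels (`joy`, `anger`, and so on) and the fallback behaviour are assumptions; swap in whatever labels your classifier really returns.

```python
# Expression names taken from the "Supported VRM 1.0 expressions" list.
SUPPORTED_EXPRESSIONS = {"happy", "angry", "sad", "relaxed", "surprised", "neutral"}

# Hypothetical classifier labels -> VRM expression.
# These label names are an assumption; adjust them to your classifier.
EMOTION_TO_EXPRESSION = {
    "joy": "happy",
    "anger": "angry",
    "sadness": "sad",
    "surprise": "surprised",
}

# Matches the guide's current default expression.
DEFAULT_EXPRESSION = "relaxed"


def map_emotion_to_expression(emotion):
    """Return a supported VRM expression for a classifier label.

    Unknown or missing labels fall back to DEFAULT_EXPRESSION so the
    chunk loop always has a valid expression to send.
    """
    if not isinstance(emotion, str):
        return DEFAULT_EXPRESSION
    label = emotion.strip().lower()
    # A label that is already a valid expression name passes through.
    if label in SUPPORTED_EXPRESSIONS:
        return label
    return EMOTION_TO_EXPRESSION.get(label, DEFAULT_EXPRESSION)
```

Falling back to `relaxed` keeps the behaviour identical to the manual default whenever the classifier returns something the table does not cover.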