Text-to-video, transcription, and automatic clip extraction.
The Video superskill is a complete AI video pipeline. Generate videos from text prompts or reference images using Veo 3.1, Kling V3 Pro, or Sora 2 Pro — in resolutions up to 4K. Upload existing videos and they are automatically transcribed, then the AI analyzes the transcript to identify and extract highlight clips. Clips can be reframed to portrait orientation for social media stories and reels, all handled by your agent end-to-end.
Capabilities available out of the box.
Platforms and services your agent connects to.
CLI commands available with this superskill.
List videos in your library.
clawpow videos listGet details of a specific video including transcription status and metadata.
clawpow videos get <videoId>Upload a video or audio file. Transcription starts automatically.
clawpow videos upload <file>Delete a video and its clips from the library.
clawpow videos delete <videoId>Analyze a transcription and create short-form clips with optional portrait reframing.
clawpow videos clip <videoId>List clips for a video with optional status filter.
clawpow videos clips <videoId>Generate a video from a text prompt or reference image using AI.
clawpow videos generateGet details and status of a video generation.
clawpow videos generation <generationId>List all video generations.
clawpow videos generationsCreate a new video project for composing timeline-based videos.
clawpow videos projects createList all video projects.
clawpow videos projects listGet project details including timeline elements.
clawpow videos projects get <projectId>Delete a video project.
clawpow videos projects delete <projectId>Add a video, caption, graphic, or soundtrack element to a project timeline.
clawpow videos projects add <projectId>Remove an element from a project timeline.
clawpow videos projects remove-element <projectId>Update properties of an existing element on the timeline.
clawpow videos projects update-element <projectId>Render a video project into a final MP4 file via Remotion Lambda.
clawpow videos projects render <projectId>Generate a new AI actor with portrait (9:16) and landscape (16:9) images.
clawpow videos actors createList all available actors (platform and user-created).
clawpow videos actors listGet details of a specific actor including all image URLs.
clawpow videos actors get <actorId>Delete a user-created actor.
clawpow videos actors delete <actorId># Video
Upload videos, transcribe them, extract highlight clips, or generate entirely new videos from text prompts.
## When to Use
- User wants to upload and manage video content
- User needs a video transcribed
- User wants to extract highlight clips from a longer video
- User wants to generate a video from a text prompt or image
- User wants to repurpose long-form video into shorter clips for social media
## How It Connects
- **Social Media** → Attach video clips to social posts using `--media` with the asset ID.
- **Documents** → Video transcriptions are saved as document pages for reference and search.
- **Design** → For static images, use the design skill instead. Video handles moving content.
- **Blog** → Embed video content in blog articles or create written summaries from transcripts.
- **Newsletter** → Include video links or thumbnails in newsletter video blocks.
## Quick Start
List your videos:
```bash
clawpow videos list --json
```
Upload a video:
```bash
clawpow videos upload ./my-video.mp4 --title "My Video" --json
```
## Commands
### List Videos
```bash
clawpow videos list --json
```
List all videos in your library.
**Returns:** Array of `{_id, fileName, durationMs, transcriptionStatus, assetId, _creationTime}`
### Get Video
```bash
clawpow videos get <videoId> --json
```
Get details of a specific video including transcription status and metadata.
**Options:**
- `<videoId>` (required, positional) -- Video ID
**Returns:** Full video object with `{_id, fileName, mimeType, durationMs, transcriptionStatus, detectedLanguage, transcript, assetId, url, studioUrl, _creationTime}`
### Upload Video
```bash
clawpow videos upload ./my-video.mp4 --title "My Video" --json
clawpow videos upload ./podcast.mp3 --title "Episode 42" --json
```
Upload a video or audio file to your library. Transcription starts automatically after upload.
**Options:**
- `<file>` (required, positional) -- Path to video or audio file
- `--title <text>` (required) -- Title for the video
- `--language <code>` -- Language hint for transcription (e.g. "en", "es"). Auto-detected if omitted.
**Supported formats:** mp4, mov, webm, mpeg, avi, mp3, wav, m4a, ogg, flac
**Returns:** `{"videoId": "..."}`
### Remove Video
```bash
clawpow videos delete <videoId> --json
```
Delete a video and its clips from the library.
**Options:**
- `<videoId>` (required, positional) -- Video ID
**Returns:** `{"deleted": true, "videoId": "..."}`
### Create Clips
```bash
clawpow videos clip <videoId> --json
clawpow videos clip <videoId> --format portrait --max-clips 3 --json
clawpow videos clip <videoId> --min-clips 2 --max-clips 5 --max-duration 60 --json
```
Analyze a transcription and create short-form clips with optional portrait reframing. Requires the video to be transcribed first.
**Options:**
- `<videoId>` (required, positional) -- Video ID (must be transcribed)
- `--format <format>` -- Output format: portrait or landscape (default: portrait)
- `--min-clips <n>` -- Minimum number of clips to generate (1-10, default: 1)
- `--max-clips <n>` -- Maximum number of clips to generate (1-10, default: 5)
- `--max-duration <seconds>` -- Maximum clip duration in seconds (10-180, default: 90)
**Returns:** `{"videoId": "...", "status": "clipping", "nextCommand": "clawpow videos clips <videoId>"}`
### List Clips
```bash
clawpow videos clips <videoId> --json
clawpow videos clips <videoId> --status ready --json
```
List clips for a video with optional status filter.
**Options:**
- `<videoId>` (required, positional) -- Video ID
- `--status <status>` -- Filter by clip status
**Returns:** Array of `{_id, videoId, title, startTime, endTime, durationMs, format, status, assetId, url, _creationTime}`
### Generate Video
```bash
clawpow videos generate --prompt "A drone shot over mountains at sunset" --json
clawpow videos generate --prompt "Product reveal" --image ./product.png --model kling-v3-pro --json
clawpow videos generate --prompt "City timelapse" --model sora-2-pro --duration 15 --json
clawpow videos generate --prompt "Abstract art" --model veo-3.1 --resolution 4k --no-audio --json
```
Generate a video from a text prompt, or from a text prompt plus a reference image.
**Options:**
- `--prompt <text>` (required) -- Text prompt describing the video
- `--model <name>` -- Model: veo-3.1 (default), kling-v3-pro, sora-2-pro
- `--image <path>` -- Reference image (switches to image-to-video mode)
- `--resolution <res>` -- 720p, 1080p (default), 4k (veo-3.1 only)
- `--duration <seconds>` -- Duration in seconds (default: 5, clamped to model max)
- `--aspect-ratio <ratio>` -- 16:9 (default), 9:16, 1:1
- `--no-audio` -- Disable audio generation
- `--json` -- Structured output
**Returns:** `{"generationId": "...", "status": "queued", "studioUrl": "...", "nextCommand": "clawpow videos generation <generationId>"}`
**Model limits:**
| Model | Max Resolution | Max Duration |
|-------|---------------|-------------|
| veo-3.1 | 4K | 8s |
| kling-v3-pro | 1080p | 10s |
| sora-2-pro | 1080p | 25s |
### Get Generation
```bash
clawpow videos generation <generationId> --json
```
Get details and status of a video generation.
**Returns:** Full generation object with `{_id, status, mode, model, prompt, resolution, duration, audio, assetId, width, height, outputDurationMs, error, url, studioUrl, _creationTime}`
### List Generations
```bash
clawpow videos generations --json
clawpow videos generations --status ready --json
```
List all video generations with optional status filter.
**Options:**
- `--status <status>` -- Filter by status: pending, processing, uploading, ready, failed
**Returns:** Array of generation objects with `url` included when an asset is available
## Output Format
All commands support `--json` for structured output. Without `--json`, output is human-readable.
## Important Notes
- Always use `--json` for machine-parseable output
- Video IDs are Convex document IDs (e.g., "k57abc123def")
- Transcription and clip creation are paid operations (consume credits)
- Video listing, getting, uploading, and clip listing are free operations
- A video must be transcribed before clips can be created
- Clip generation uses AI to find the most engaging segments
- Portrait reframing automatically adjusts framing for vertical video formats
- Video generation is a paid operation (consumes credits based on model, resolution, duration, and audio)
- Available models: veo-3.1 (best quality, supports 4K), kling-v3-pro (best for cinematic/dialogue), sora-2-pro (longest duration up to 25s)
Works great together with these.