Intelligent Auto-Framing or Speaker (Subject) Tracking During Video Meetings.
The system must use a combination of Audio Localization and Face Detection to identify the active speaker and dynamically zoom/pan the frame to focus on them.
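One way to combine the two signals, sketched here only as an illustration: convert each detected face's horizontal position into an approximate angle, then pick the face closest to the direction the audio localizer reports. The names `Face`, `face_azimuth_deg`, and `pick_active_speaker`, the 78 degree field of view, and the 15 degree tolerance are all assumptions for this example, not part of any existing product or API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Face:
    """Hypothetical face bounding box in pixels (top-left origin)."""
    x: float
    y: float
    w: float
    h: float


def face_azimuth_deg(face: Face, frame_width: float, horizontal_fov_deg: float) -> float:
    """Approximate horizontal angle of the face center relative to the camera axis.

    Uses a simple linear mapping from pixel offset to angle, which is an
    assumption that ignores lens distortion.
    """
    center_x = face.x + face.w / 2
    offset = center_x / frame_width - 0.5  # -0.5 (left edge) .. 0.5 (right edge)
    return offset * horizontal_fov_deg


def pick_active_speaker(
    faces: Sequence[Face],
    audio_azimuth_deg: float,
    frame_width: float,
    horizontal_fov_deg: float = 78.0,  # assumed webcam field of view
    max_error_deg: float = 15.0,       # assumed matching tolerance
) -> Optional[int]:
    """Return the index of the face nearest the audio direction, or None if no face is close enough."""
    best_index = None
    best_error = max_error_deg
    for index, face in enumerate(faces):
        error = abs(face_azimuth_deg(face, frame_width, horizontal_fov_deg) - audio_azimuth_deg)
        if error <= best_error:
            best_index, best_error = index, error
    return best_index


faces = [Face(100, 200, 200, 200), Face(1500, 200, 200, 200)]
speaker = pick_active_speaker(faces, audio_azimuth_deg=24.0, frame_width=1920)
```

Returning `None` when no face is within tolerance lets the caller keep the current framing instead of jumping to a wrong subject, for example when the sound comes from someone outside the camera's view.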
When multiple people are visibly present, the system must dynamically adjust the zoom and position to frame all subjects equally and optimally.
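A minimal sketch of group framing, assuming faces arrive as `(x, y, w, h)` pixel boxes with positive width and height: take the union of all boxes, add padding, widen or heighten the region to match the frame's aspect ratio, and clamp it to the frame. The function name `group_crop` and the 25% padding are hypothetical choices for illustration.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def group_crop(boxes: Sequence[Box], frame_w: float, frame_h: float, padding: float = 0.25) -> Box:
    """Return a crop rectangle that contains every face box and keeps the frame's aspect ratio."""
    if not boxes:
        return (0.0, 0.0, float(frame_w), float(frame_h))  # nothing detected: show the full frame

    # Union of all face boxes.
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2

    # Pad on every side so subjects are not cut off at the edges.
    crop_w = (right - left) * (1 + 2 * padding)
    crop_h = (bottom - top) * (1 + 2 * padding)

    # Grow the shorter dimension so the crop matches the frame's aspect ratio.
    aspect = frame_w / frame_h
    if crop_w / crop_h < aspect:
        crop_w = crop_h * aspect
    else:
        crop_h = crop_w / aspect

    # Never request a region larger than the frame itself.
    scale = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w *= scale
    crop_h *= scale

    # Center on the group, then clamp so the crop stays inside the frame.
    x = min(max(center_x - crop_w / 2, 0.0), frame_w - crop_w)
    y = min(max(center_y - crop_h / 2, 0.0), frame_h - crop_h)
    return (x, y, crop_w, crop_h)


crop = group_crop([(400, 300, 200, 200), (1200, 300, 200, 200)], 1920, 1080)
```

In practice the resulting rectangle would likely be smoothed over time before being applied, so that small head movements do not cause the frame to jitter.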
The primary AI model for face detection and initial framing calculation should run on the local client (Desktop/Mobile) to minimize latency and server load.
Ensure the video framing adjustment is synchronized with the audio feed, so that the frame does not visibly lag behind and only start moving well after the speaker has begun talking, which is unsettling to watch.