My Vox - AI Powered Voice Cloning

CASE STUDY

AI-based web application for singing voice conversion

Terzetto Labs is a web3 and AI service provider that strives to make life easier for everyday technology users. The team was approached by a client who wanted to launch a web app that converts users' voices into other singers' vocals using AI-trained models. The main goal was a seamless user experience in which users could easily access and use the AI models through a web interface. The client also required a song-sharing feature for platforms like Spotify, where users could build an audience and earn revenue. The team used an on-demand GPU implementation to deliver an impactful yet seamless experience for users.

The Client

The client was a tech-enthusiast couple who wanted to make a difference for everyday users looking to create and improve their vocals with AI tuning. They have a deep passion for music and enjoy singing indie songs, which motivated them to build an app where users can earn money based on their subscribers. Early deployment of the app gave the client a first-mover advantage with high network accessibility.

Potential Challenges

The main challenge was to host the trained AI model on standard web servers without introducing latency or inference issues. This was difficult to achieve because standard servers are not designed for such heavy computation.

During development it soon became evident that the model's performance was suboptimal due to insufficient computational resources. CPU processing limits were frequently hit even by the simple matrix calculations of the neural network.

Performance Issues with AI Models on CPU Web Servers

01
Low-Latency Requirement

Real-time voice conversion required instant processing, but CPU-based servers introduced delays. Optimizing inference time was essential for a seamless experience.

02
Poor Performance of AI model

Limited computational power on CPUs led to slow model performance, making conversion times impractically long and reducing overall efficiency.

03
Computationally Intensive

AI models required heavy processing for deep learning computations. Standard web servers struggled with these workloads, causing frequent performance bottlenecks.

04
Bandwidth Bottlenecks

Handling large audio files and AI-generated outputs efficiently was critical. Large data transfers could slow performance and affect responsiveness.

05
Parallel Optimization

CPUs lacked the ability to efficiently run AI models in parallel, leading to slow execution times. A better infrastructure was needed to handle multiple tasks simultaneously.

06
Bad User Experience

Delays, buffering, and slow interactions frustrated users. Optimizing both backend processing and frontend responsiveness was crucial for a smooth experience.

Integration Complexity

These issues pointed to the need for a hardware accelerator or GPU service that could perform parallel processing while keeping latency low and the system scalable. Acknowledging the challenges and limitations of hosting AI models on traditional web servers, the team conducted in-depth research into how AI models behave and can be optimized on CPUs. The aim was to find a cost-effective and scalable solution that could handle peak loads without compromising performance.

Terzetto Labs Solution

At first, a detailed discussion was conducted with the client to understand their vision and requirements. Intensive research was later carried out to identify the best practices for hosting AI models and optimizing performance. After careful consideration, an on-demand GPU service was identified that provided the necessary computational power in an economical manner.

The client agreed with this approach, so the team proceeded with the training and implementation of AI models that could be hosted on an on-demand GPU-service architecture. These models were integrated into the web interface where all user interactions occur. Thorough testing was also conducted to ensure the system's reliability and performance.
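As a rough illustration of the on-demand idea, the sketch below models a worker that only runs while conversion jobs are queued and stops again once the queue is empty. It is a toy model under stated assumptions, not the architecture the team deployed: the class `OnDemandWorker`, its methods, and the `convert` callback are all hypothetical names, and no real GPU provider API is used.

```python
from collections import deque
from typing import Callable, Deque, List


class OnDemandWorker:
    """Toy model of a GPU worker that runs only while jobs are queued.

    Hypothetical sketch: a real service would start and stop a remote
    GPU instance where this class only flips a flag.
    """

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert              # stand-in for model inference
        self._queue: Deque[str] = deque()
        self.running = False                 # True only while processing
        self.cold_starts = 0                 # how often the worker was started

    def submit(self, audio_id: str) -> None:
        """Queue an audio file for conversion."""
        self._queue.append(audio_id)

    def drain(self) -> List[str]:
        """Start the worker if jobs exist, process them all, then stop."""
        if not self._queue:
            return []                        # nothing queued: stay stopped
        self.running = True
        self.cold_starts += 1
        results = []
        while self._queue:
            results.append(self._convert(self._queue.popleft()))
        self.running = False                 # scale back to zero when idle
        return results


if __name__ == "__main__":
    worker = OnDemandWorker(lambda audio_id: f"{audio_id}-converted")
    worker.submit("song-a")
    worker.submit("song-b")
    print(worker.drain())
    print("running after drain:", worker.running)
```

The point of the design is that compute is only in use while there is work to do, which is what makes an on-demand service economical for bursty workloads.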

In addition to training the AI models, another major task was to deliver the project accurately within a tight deadline. This required coordinated and meticulous planning, which the team provides on every client project.

Monetization Channel: A mechanism was also implemented to distribute users' songs to other platforms like SoundCloud, Spotify, and Amazon. With this feature, users can distribute songs with licensed AI voices and earn revenue from third-party platforms.
Integration of Payment System: Stripe was integrated into the website for payment processing, providing a secure and efficient way to handle transactions. This integration enables seamless financial operations while ensuring compliance with industry standards.
Design of Implementation:

Impact & Outcomes

Through the efforts of the team, the project was delivered within the tight deadline and according to the features defined by the client. The following were implemented:

01
GPU Implementation

On-demand GPU implementation for better response time. For example, the conversion times for a 7.8 MB audio file were:

On CPU: 7 min 11 sec.
On GPU: just 28 sec.
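To put the two timings in perspective, the short calculation below derives the speedup and throughput from the figures quoted above. Only the file size and the two durations come from the case study; the function names are illustrative.

```python
# Figures quoted in the case study for one 7.8 MB audio file.
FILE_SIZE_MB = 7.8
CPU_SECONDS = 7 * 60 + 11   # 7 min 11 sec = 431 s
GPU_SECONDS = 28


def speedup(baseline_s: float, improved_s: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Megabytes of audio processed per second of wall-clock time."""
    return size_mb / seconds


if __name__ == "__main__":
    print(f"Speedup: {speedup(CPU_SECONDS, GPU_SECONDS):.1f}x")
    print(f"Time saved: {1 - GPU_SECONDS / CPU_SECONDS:.1%}")
    print(f"CPU throughput: {throughput_mb_per_s(FILE_SIZE_MB, CPU_SECONDS):.3f} MB/s")
    print(f"GPU throughput: {throughput_mb_per_s(FILE_SIZE_MB, GPU_SECONDS):.3f} MB/s")
```

For this single file, the GPU run is roughly 15 times faster than the CPU run. One measurement does not establish how the ratio behaves for other file sizes or models.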

02
Credit-Based System

To ensure accessibility while maintaining a sustainable revenue model, a credit-based system was implemented with free initial credits and paid top-up options.
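A credit system of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming each new account starts with a fixed number of free credits and each conversion costs one credit. The constants, the class `CreditAccount`, and the exception `InsufficientCredits` are hypothetical; the case study does not state the actual credit amounts or pricing.

```python
from dataclasses import dataclass

# Hypothetical values: the case study does not state the real amounts.
FREE_INITIAL_CREDITS = 3
CREDITS_PER_CONVERSION = 1


class InsufficientCredits(Exception):
    """Raised when an account cannot pay for a conversion."""


@dataclass
class CreditAccount:
    """Tracks a user's credit balance, starting with the free allowance."""

    balance: int = FREE_INITIAL_CREDITS

    def top_up(self, credits: int) -> None:
        """Add purchased credits (called after a successful payment)."""
        if credits <= 0:
            raise ValueError("top-up amount must be positive")
        self.balance += credits

    def charge_conversion(self) -> None:
        """Deduct the cost of one voice conversion, or refuse if too low."""
        if self.balance < CREDITS_PER_CONVERSION:
            raise InsufficientCredits("buy more credits to continue")
        self.balance -= CREDITS_PER_CONVERSION


if __name__ == "__main__":
    account = CreditAccount()
    account.charge_conversion()
    print("balance after one conversion:", account.balance)
    account.top_up(10)
    print("balance after top-up:", account.balance)
```

Checking the balance before deducting keeps the account from going negative, so a conversion job is only started once it has been paid for.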

03
Sharing and Monetization

Users can share their AI-generated songs across multiple platforms and monetize their content.

04
Payment Integration

The Stripe payment gateway was integrated into the application. This ensures a reliable and user-friendly experience for purchasing credits and processing payments.

05
Fast Conversion Times

The AI model was meticulously optimized to ensure fast and accurate voice conversion.

06
Intuitive UI/UX

The platform features a clean and modern UI that guides users through the voice conversion process with minimal friction.

The Bottom Line

The development of the project ensured high accuracy and low latency of the AI model, leading to a seamless user experience. The figures stated by the client confirm that significant engagement and satisfaction were achieved by the project.
