Terzetto Labs is a web3 and AI services firm that strives to make technology easier for everyday users. A client approached the team seeking to launch a web app that converts users' voices into other singers' vocals using AI-trained models. The main goal was a seamless user experience: users should be able to access and use the AI models easily through a web interface. The client also required a song-sharing feature for platforms such as Spotify, through which users could build an audience and earn revenue. To create an impactful yet seamless experience, the team adopted an on-demand GPU implementation.
The client was a tech-enthusiast couple who wanted to make a difference for everyday users looking to create and improve their vocals with AI tuning. Their deep passion for music, including singing indie songs, motivated them to build an app where users can earn money based on their subscribers. Early deployment helped the client enjoy a first-mover advantage with high network accessibility.
The main challenge was hosting the trained AI model on standard web servers without introducing latency or inference issues. This was difficult to achieve because standard servers are not built for such heavy computation.
During development it soon became evident that the model's performance was suboptimal due to insufficient computational resources: the limits of CPU processing were hit frequently, even for simple matrix calculations in the neural network.
Real-time voice conversion required instant processing, but CPU-based servers introduced delays. Optimizing inference time was essential for a seamless experience.
Limited computational power on CPUs led to slow model performance, making conversion times impractically long and reducing overall efficiency.
AI models required heavy processing for deep learning computations. Standard web servers struggled with these workloads, causing frequent performance bottlenecks.
Handling large audio files and AI-generated outputs efficiently was critical. High data transfer rates could slow performance and affect responsiveness.
CPUs lacked the ability to efficiently run AI models in parallel, leading to slow execution times. A better infrastructure was needed to handle multiple tasks simultaneously.
Delays, buffering, and slow interactions frustrated users. Optimizing both backend processing and frontend responsiveness was crucial for a smooth experience.
A hardware accelerator or GPU service was therefore needed to perform parallel processing while keeping latency low and the system scalable. Acknowledging the challenges of hosting AI models on traditional web servers, the team researched in depth how AI models behave and can be optimized on CPUs. The aim was to find a cost-effective, scalable solution that could handle peak loads without compromising performance.
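The "on-demand" pattern described here can be sketched as a job queue that provisions a worker only when conversions are waiting and releases it once the queue drains, so no idle GPU time is billed. The sketch below is a simplified illustration with assumed names; the worker is simulated with a thread, whereas the real system would start a cloud GPU instance at that point.

```python
# Minimal sketch of an on-demand worker pattern. In production, starting the
# thread below would instead provision a cloud GPU instance (hypothetical).
import queue
import threading

jobs: "queue.Queue[str]" = queue.Queue()
results: list = []
worker_lock = threading.Lock()
worker_running = False

def gpu_worker() -> None:
    """Drain the queue, then shut down so no idle GPU time is billed."""
    global worker_running
    while True:
        try:
            job = jobs.get(timeout=1.0)  # idle timeout: release the worker
        except queue.Empty:
            with worker_lock:
                worker_running = False
            return
        results.append(f"converted:{job}")  # stand-in for real inference
        jobs.task_done()

def submit(job: str) -> None:
    """Queue a conversion; provision a worker only if none is running."""
    global worker_running
    jobs.put(job)
    with worker_lock:
        if not worker_running:
            worker_running = True
            threading.Thread(target=gpu_worker, daemon=True).start()

for name in ("song_a.wav", "song_b.wav"):
    submit(name)
jobs.join()  # wait until the on-demand worker has processed everything
print(results)
```

The key cost property is the idle timeout: the worker exists only while jobs are queued, which is what makes per-second GPU billing economical for bursty traffic.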
First, a detailed discussion was held with the client to understand their vision and requirements. Research then identified best practices for hosting AI models and optimizing performance. After careful consideration, an on-demand GPU service was chosen that provided the necessary computational power economically.
The client agreed with this approach, so the team proceeded with the training and implementation of AI models that could be hosted on an on-demand GPU-service architecture. These models were integrated into the web interface where all user interactions occur. Thorough testing was also conducted to ensure the system's reliability and performance.
In addition to training the AI models, another major task was delivering the project within a tight deadline with utmost accuracy. This required synchronized, meticulous planning, a standard the team assures its clients.
Through the team's efforts, the project was delivered within the tight deadline and with the features defined by the client. The following were implemented:
On-demand GPU implementation for better response time. For example, the conversion times for a 7.8 MB audio file on CPU and GPU were:
To ensure accessibility while maintaining a sustainable revenue model, a credit-based system was implemented with free initial credits and paid top-up options.
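A credit system like the one described can be modeled as a small ledger. The sketch below is illustrative only: the class name, the signup grant of 3 free credits, and the 1-credit conversion cost are assumptions, not the app's real pricing.

```python
# Simplified credit-ledger sketch. The signup grant and per-conversion cost
# are illustrative assumptions, not the production values.
SIGNUP_CREDITS = 3
CONVERSION_COST = 1

class CreditAccount:
    def __init__(self) -> None:
        self.balance = SIGNUP_CREDITS  # free initial credits on signup

    def top_up(self, credits: int) -> None:
        """Add credits purchased through the payment gateway."""
        if credits <= 0:
            raise ValueError("top-up must be positive")
        self.balance += credits

    def charge_conversion(self) -> bool:
        """Deduct one conversion's cost; refuse if the balance is too low."""
        if self.balance < CONVERSION_COST:
            return False
        self.balance -= CONVERSION_COST
        return True

acct = CreditAccount()
used = sum(acct.charge_conversion() for _ in range(5))
print(used, acct.balance)  # only the 3 free credits are spent: 3 0
```

Keeping the charge check and the deduction in one method avoids a negative balance when conversions are requested concurrently per account.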
Users were enabled to share their AI-generated songs across multiple platforms and monetize their content.
The Stripe payment gateway was integrated into the application. This ensures a reliable and user-friendly experience for purchasing credits and processing payments.
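One concrete piece of a Stripe integration is verifying webhook notifications (for example, confirming a completed credit purchase). Stripe's documented scheme signs the string "{timestamp}.{payload}" with HMAC-SHA256 using the endpoint's signing secret, delivered in the Stripe-Signature header as "t=...,v1=...". The sketch below checks that signature with the standard library; the secret and payload are made up, and in practice the official stripe library's stripe.Webhook.construct_event handles this.

```python
# Sketch of Stripe webhook signature verification using only the standard
# library. Secret and payload below are made-up demo values.
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    # Header format: "t=<timestamp>,v1=<hex hmac>"
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    signed = parts["t"].encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parts["v1"])

secret = "whsec_test"  # demo value; real secrets come from the Stripe dashboard
payload = b'{"type": "checkout.session.completed"}'
ts = "1700000000"
good = hmac.new(secret.encode(), ts.encode() + b"." + payload, hashlib.sha256).hexdigest()
header = f"t={ts},v1={good}"
print(verify_stripe_signature(payload, header, secret))  # True
```

Using hmac.compare_digest rather than == keeps the comparison constant-time, which is the standard defense against timing attacks on signature checks.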
The AI model was meticulously optimized to ensure fast and accurate voice conversion.
The platform features a clean, modern UI that guides users through the voice conversion process with minimal friction.
The project achieved high accuracy and low latency for the AI model, leading to a seamless user experience. Figures reported by the client confirm that the project drove significant engagement and satisfaction.