In this project, we built a deep learning model that detects background acoustics in audio, converts them into text, and displays that text as captions below the video or audio. The model is a bi-directional transformer: its encoder takes the audio file and converts it into an intermediate representation, which the decoder then turns into text. The project can be useful to different groups of people. For example, the captions can help deaf people learn about background sounds, and the government can use it to examine audio files for information about background noise, among other uses.
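As an illustration of the captioning step, here is a minimal sketch of turning detected background sounds into caption text. It assumes the SRT subtitle format, which the project description does not name, and the `SoundEvent` class, its fields, and the sample labels are hypothetical stand-ins for whatever the model actually outputs.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """Hypothetical model output: one background sound with times in seconds."""
    start: float
    end: float
    label: str


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(events: list[SoundEvent]) -> str:
    """Render events as numbered SRT caption blocks, ordered by start time."""
    blocks = []
    ordered = sorted(events, key=lambda e: e.start)
    for index, event in enumerate(ordered, start=1):
        time_range = f"{to_timestamp(event.start)} --> {to_timestamp(event.end)}"
        # Square brackets mark the caption as a sound description, not speech.
        blocks.append(f"{index}\n{time_range}\n[{event.label}]\n")
    return "\n".join(blocks)


# Made-up sample events, for illustration only.
events = [
    SoundEvent(3.5, 6.0, "dog barking"),
    SoundEvent(0.0, 2.25, "rain on a window"),
]
print(to_srt(events))
```

A video player that supports SRT can load the resulting text and show each description under the video for the given time range.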
Technologies used
We used Python and Java as our main programming languages. We built the deep learning model, a bi-directional transformer, with the PyTorch library for Python, and used Java for data processing and formatting. The dataset was around 4 GB.
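To make the encoder-decoder idea concrete, here is a rough PyTorch sketch. It is not our production model: it uses PyTorch's built-in `nn.Transformer` rather than a hand-written one, and the class name `AudioCaptioner`, the log-mel style input features, and every size below are assumptions chosen for illustration. The encoder attends over the whole clip in both directions, which is one reading of "bi-directional", while the decoder only sees earlier caption tokens.

```python
import torch
from torch import nn


class AudioCaptioner(nn.Module):
    """Illustrative audio-to-text transformer (all hyperparameters are assumed)."""

    def __init__(self, n_mels=64, vocab_size=1000, d_model=128,
                 nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.audio_proj = nn.Linear(n_mels, d_model)        # audio frame -> model width
        self.token_emb = nn.Embedding(vocab_size, d_model)  # caption token -> model width
        self.pos_emb = nn.Embedding(max_len, d_model)       # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def _add_positions(self, x):
        # x has shape (batch, length, d_model); positions broadcast over the batch.
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)

    def forward(self, features, tokens):
        src = self._add_positions(self.audio_proj(features))
        tgt = self._add_positions(self.token_emb(tokens))
        length = tokens.size(1)
        # True above the diagonal blocks attention to future caption tokens.
        causal = torch.triu(
            torch.ones(length, length, dtype=torch.bool, device=tokens.device),
            diagonal=1,
        )
        hidden = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(hidden)  # (batch, length, vocab_size) logits


model = AudioCaptioner()
features = torch.randn(2, 100, 64)         # 2 clips, 100 frames, 64 features each
tokens = torch.randint(0, 1000, (2, 12))   # 2 partial captions of 12 token ids
logits = model(features, tokens)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

Training such a model would typically compare these logits against the caption shifted by one token using a cross-entropy loss.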
Difficulties we faced
The most challenging part of this project was building a bi-directional model that can recognize background acoustics and separate them from the normal audio.
Solutions
To overcome this challenge, we did some R&D and built our own transformer model from scratch.
A-1205, PNTC, Times Of India Press Rd, Vejalpur, Ahmedabad, Gujarat 380015
IN: +91 9157652641 info@tesseracttechnolabs.com
© 2022 All Rights Reserved | Tesseract Technolabs | Privacy Policy | Terms & Conditions