In this project, we created a deep learning model that can detect background acoustics, convert the audio into text, and print it as captions below the video/audio. For this project, we built a bi-directional transformer: the encoder is able to understand the audio files and converts the audio data into an intermediate representation, which the decoder then turns into text. This project can be useful in many different ways to different types of people. For example, it can help deaf people learn about background acoustics through the captions, or it can help the government investigate audio files for information about background noises, among other uses.
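The encoder-decoder flow described above can be sketched in PyTorch. This is a minimal illustration only, not the project's actual code: the class and attribute names (AudioCaptioner, audio_proj), the layer sizes, the mel-feature dimension, and the vocabulary size are all assumptions.

```python
import torch
import torch.nn as nn

class AudioCaptioner(nn.Module):
    """Hypothetical sketch: the encoder ingests audio feature frames and the
    decoder emits caption tokens. All dimensions are illustrative."""

    def __init__(self, n_mels=80, d_model=256, vocab_size=1000):
        super().__init__()
        self.audio_proj = nn.Linear(n_mels, d_model)   # audio frames -> model dim
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)      # logits over caption vocabulary

    def forward(self, audio_feats, tokens):
        # audio_feats: (batch, frames, n_mels); tokens: (batch, seq)
        src = self.audio_proj(audio_feats)
        tgt = self.tok_embed(tokens)
        # Causal mask so each caption position only attends to earlier tokens.
        causal = self.transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(h)

model = AudioCaptioner()
logits = model(torch.randn(2, 100, 80), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # one logit vector per caption position
```

At inference time, captions would be generated token by token by feeding the decoder its own previous outputs, which is standard for encoder-decoder transformers.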
Technologies used
We used Python and Java as our main programming languages. We created the deep learning model, a bi-directional transformer, using the PyTorch library in Python. We used Java for data processing and formatting; the dataset was around 4 GB.
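The data-processing step, turning raw waveforms into the feature frames the encoder consumes, could look like the following. This is a sketch in Python/NumPy for illustration only (the project's actual pipeline was written in Java), and the 25 ms frame length and 10 ms hop at 16 kHz are assumed defaults, not values from the project.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice a 1-D waveform into overlapping frames
    # (25 ms frames with a 10 ms hop at 16 kHz).
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def log_spectrogram(x, frame_len=400, hop=160):
    # Window each frame, take the power spectrum, and log-compress it.
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)   # small epsilon avoids log(0)

wave = np.random.randn(16000)      # one second of synthetic 16 kHz audio
feats = log_spectrogram(wave)
print(feats.shape)                 # (frames, frequency bins)
```

Each row of the resulting matrix is one time step for the encoder, so a one-second clip becomes roughly a hundred feature frames.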
Difficulties we faced
The most challenging part of this project was creating a bi-directional model that is able to understand such audio acoustics and separate them from the normal audio.
Solutions
To overcome this challenge, we did some R&D and created our own transformer model from scratch.