Case Study

Project description

  • In this project, we built a deep learning model that detects background acoustics, converts the audio into text, and displays that text as captions below the video or audio.

  • For this project, we created a bidirectional transformer. Its encoder interprets the audio file and converts it into an intermediate representation, which the decoder then decodes into text.

  • This project can be useful to many different people. For example, the captions can help deaf people learn about background sounds, and government agencies can use it to examine audio files for information about background noise, among other uses.
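
As an illustration of the captioning step described above, here is a minimal sketch that renders detected sound events as caption blocks. It assumes the model's output has already been reduced to (start, end, text) tuples and uses the common SRT subtitle layout. The case study does not say which caption format the project used, so the format, the function names `format_timestamp` and `to_srt`, and the sample events are all hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(events):
    """Render (start_seconds, end_seconds, text) tuples as SRT caption blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(events, start=1):
        # Square brackets mark the caption as a background sound (an assumed convention).
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n[{text}]\n"
        )
    return "\n".join(blocks)


# Hypothetical events, standing in for decoded model output.
events = [(0.0, 2.5, "dog barking"), (3.2, 6.0, "rain falling")]
print(to_srt(events))
```

A video player that supports SRT files can then show these captions below the video at the listed times.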

Technologies used

  • We used Python and Java as our main programming languages.

  • We built the deep learning model, a bidirectional transformer, using the PyTorch library for Python.

  • We also used Java for data processing and formatting. The dataset was around 4 GB.
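
To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch of a model that takes audio features and produces caption-token scores. It is not the project's model: the project built its transformer from scratch, while this sketch relies on PyTorch's built-in `nn.Transformer`. The class name `AudioCaptioner`, the log-mel style input, and all sizes (`n_mels`, `vocab_size`, `d_model`, and so on) are assumptions chosen for illustration, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class AudioCaptioner(nn.Module):
    """Hypothetical encoder-decoder transformer: audio features in, caption-token logits out."""

    def __init__(self, n_mels=64, vocab_size=1000, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Project each audio frame (assumed to be n_mels features) to the model width.
        self.input_proj = nn.Linear(n_mels, d_model)
        # Embed caption tokens for the decoder.
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=256,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, features, tokens):
        # features: (batch, frames, n_mels); tokens: (batch, length)
        src = self.input_proj(features)
        tgt = self.token_emb(tokens)
        # The encoder attends in both directions over the audio frames (no mask),
        # while the decoder gets a causal mask so each token only sees earlier tokens.
        length = tokens.size(1)
        causal = torch.triu(
            torch.full((length, length), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        hidden = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(hidden)  # (batch, length, vocab_size)


# Random stand-in data: 2 clips of 50 frames each, and 7 caption tokens per clip.
model = AudioCaptioner()
features = torch.randn(2, 50, 64)
tokens = torch.randint(0, 1000, (2, 7))
logits = model(features, tokens)
print(logits.shape)
```

The encoder output plays the role of the intermediate representation, and the decoder turns it into one score per vocabulary entry at each caption position.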

Difficulties we faced

  • The most challenging part of this project was building a bidirectional model that can understand background acoustics and separate them from the normal audio.


  • To overcome this challenge, we did some research and development and built our own transformer model from scratch.

CV-NLP-Audio Processing

10 months
