Pakistani Professor Develops First Ever Urdu Speech Recognition Database

Smart little artificial intelligence-based assistants can be found everywhere these days. For us Pakistanis, however, the biggest hurdle to using these systems in our everyday lives remains the language barrier.

Most assistants are programmed to recognize speech in English and even today, there is a lack of programs that can recognize and translate Urdu speech. That may be about to change soon, thanks to a group of Pakistani researchers at ITU’s Center for Speech and Language Technologies (CSaLT) laboratory.

For a computer to recognize any language, it first needs a speech corpus, the most basic ingredient of a recognition system. Such a corpus is a database covering all the basic distinct sounds (phonemes) used in everyday speech in that language.

Dr. Agha Ali Raza, an assistant professor at Information Technology University, Lahore, who holds a PhD in Language Technologies, has, along with his team, publicly released a corpus of Urdu sentences that covers all possible distinct sounds.

Called the “CSaLT Phonetically Rich Urdu Speech Corpus”, it consists of 70 minutes of transcribed read speech: 708 sentences covering all 63 possible phonemes. In total, it contains 5,656 unique words and is available for download at the research center’s website.

“Speech recognition is a two-step process. The corpus will give the computer application access to all possible phonemes used in the formation of meaningful Urdu words from everyday speech,” said Dr. Raza.

He further elaborated that although there are 63 distinct phonemes in Urdu, these don’t correspond to just 63 distinct sounds in everyday speech. The sound made for a phoneme may vary from one utterance to another depending on the phonemes that come before and after it in a word. As a result, any phoneme x can appear in up to 63 × 63 possible contexts (tri-phoneme sounds), one for each combination of preceding and following phoneme. The corpus he is releasing covers all these possible sounds.
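The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function names and the three-symbol toy inventory are made up for the example, and the counts are upper bounds, since not every combination of neighbours occurs in real words.

```python
from itertools import product

NUM_PHONEMES = 63  # phoneme count quoted in the article


def context_count(num_phonemes):
    """Number of (left, right) neighbour pairs a single phoneme can have."""
    return num_phonemes * num_phonemes


def all_triphones(phonemes):
    """Every (left, centre, right) triple over a phoneme inventory."""
    return product(phonemes, repeat=3)


def triphones_in(sequence):
    """Set of triphones that actually occur in one phoneme sequence."""
    return {tuple(sequence[i:i + 3]) for i in range(len(sequence) - 2)}


# Toy inventory (hypothetical symbols, not the real Urdu phoneme set).
toy = ["a", "b", "k"]

print(context_count(NUM_PHONEMES))          # contexts per phoneme: 63 * 63
print(NUM_PHONEMES ** 3)                    # upper bound on distinct triphones
print(len(list(all_triphones(toy))))        # 3 ** 3 triples for the toy set
print(triphones_in(["k", "a", "b", "a"]))   # triphones found in one sequence
```

A helper like `triphones_in` is the kind of building block one could use to measure how many of the possible contexts a given set of sentences actually contains.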

Dr. Raza’s work on this corpus started under the supervision of Dr. Sarmad Hussain as part of his master’s thesis at the National University of Computer and Emerging Sciences (FAST), Lahore. Later on, he and Dr. Hussain were also helped by Huda Sarfraz, Inaam Ullah and Zahid Sarfaraz.

Thanks to this corpus, the process of making a speech recognition program for the Urdu language has just gotten a lot easier. All that is needed is a repository of the words used every day in the Urdu language.

“We hope that release of this corpus will also prove beneficial for regional languages in the country and languages lacking ample linguistic resources all over the world. Those interested in working on those languages can follow our technique to develop similar corpora of sentences in those languages,” he says.

“The technique used in development of this corpus will work for any language for which written material is available.”
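The article does not describe the technique itself. As a rough illustration only, one common way to build a phonetically rich sentence set from written material is greedy selection: repeatedly pick the sentence that adds the most not-yet-covered sound units. The sketch below assumes each candidate sentence has already been converted to a list of phoneme symbols (for example with a pronunciation lexicon); the function names, sentence ids and symbols are all hypothetical, and this is not presented as the CSaLT team's method.

```python
def units(phonemes, n=1):
    """Set of n-phoneme units (n=1 phonemes, n=3 triphones) in a sequence."""
    return {tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)}


def greedy_select(sentences, n=1):
    """Pick sentences until no remaining sentence adds a new unit.

    sentences: dict mapping a sentence id to its list of phoneme symbols.
    Returns (chosen ids in selection order, set of covered units).
    """
    remaining = {sid: units(ph, n) for sid, ph in sentences.items()}
    covered = set()
    chosen = []
    while remaining:
        # Sentence contributing the most units not covered so far.
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy candidates (hypothetical ids and phoneme symbols).
candidates = {
    "s1": ["a", "b"],
    "s2": ["a", "b", "k", "t"],
    "s3": ["t", "m"],
}
chosen, covered = greedy_select(candidates)
print(chosen)        # ids of the selected sentences
print(len(covered))  # number of distinct phonemes covered
```

Passing `n=3` would select for triphone coverage instead of single phonemes, at the cost of needing far more candidate text.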

Source: TechnologyReview


Rehan Ahmed

welldone :)

Good work. I have been using their Phonetic Keyboard for a few years now. Would be nice if this can be incorporated into Android Search somehow.

Actually, this can be a major breakthrough in VoIP development because the language barrier was an issue. If this is nurtured well, we can develop major applications in our own language.

Superb

any download link?

Aamir says:

January 23, 2017 at 5:26 pm

Click here to go to the download page, then click “Download”.

Note:
Download via IDM as it’s of 457MB.
- Asad says:
  
  January 23, 2017 at 11:21 pm
  
  What can we do with this file?

Great Job!

Pakistani Professor: Grow up.

great work (Y)
I am a CS student from Karachi working on my graduation project, “Urdu Sales Bot”, but the only issue I have faced is Urdu text-to-voice. Can you help me by sharing any SDK (API) for C# or Python?
it would be very helpful for me :)
Thank You
