Kosha - building an open music database for Indian classical music
Submitted by Srihari Sriraman (@ssrihari) on Wednesday, 20 January 2016
Technical level: Beginner
Learn about a technically stimulating open source project in the music domain that lets you contribute to the community directly and immediately.
Why isn’t there a standard dictionary of ragas? Why isn’t there a standard list of compositions of the trinity of Carnatic Music?
For a form of music that is centuries old and revolves around ragas and compositions, there is surprisingly little information on the internet that is structured and coherent. While the question of “Why?” can be daunting, “How do we solve it?” is easily answered: we build an open music database, by the community, for the community.
And while we’re at it, we’ll build adaptive and intelligent internet crawlers in Clojure, implement a ‘soundex’ algorithm for Indic languages within Postgres, analyse music programmatically to rate its quality, and even find patterns in it!
Skeleton of the talk
The data is out there
- Details of the different kinds of sources for the database: the most promising websites, books, the biggest reserves of recorded music, the varying quality of those recordings, and their usability for research purposes.
- [6 minutes]
Mining the data into one place - a database
- What has been accomplished so far in the kosha repository, the ragavardhini repository, and r4g4.com.
- Building an intelligent crawler/scraper that can automatically tag and categorize the data it scrapes, so we don’t have to write scrapers manually for every website.
- Mining data from books (digital and physical), and retrieving music from known large reserves (AIR, academy, etc.).
- [10 minutes]
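The idea of a crawler that tags what it scrapes, rather than relying on a hand-written scraper per site, could be sketched roughly as follows. This is only an illustration in Python, not the project's Clojure code: the seed vocabulary, `label_cell`, and `infer_columns` are hypothetical names, and the approach shown (voting on what each table column means from the values it contains) is one assumed way to do it.

```python
from collections import Counter

# Hypothetical seed vocabulary; a real system would grow this from the database.
SEED_VOCAB = {
    "raga": {"hamsadhwani", "kalyani", "todi"},
    "tala": {"adi", "rupakam"},
    "composer": {"tyagaraja", "muthuswami dikshitar", "syama sastri"},
}


def label_cell(text):
    """Return the category whose vocabulary contains this cell, else None."""
    normalized = " ".join(text.lower().split())
    for label, vocab in SEED_VOCAB.items():
        if normalized in vocab:
            return label
    return None


def infer_columns(rows):
    """Guess what each column of a scraped table holds by majority vote."""
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    labels = []
    for i in range(n_cols):
        votes = Counter(label_cell(row[i]) for row in rows if i < len(row))
        votes.pop(None, None)  # cells we could not recognise do not vote
        labels.append(votes.most_common(1)[0][0] if votes else "unknown")
    return labels


if __name__ == "__main__":
    scraped = [
        ["Vatapi Ganapatim", "Hamsadhwani", "Adi", "Muthuswami Dikshitar"],
        ["Endaro Mahanubhavulu", "Sri", "Adi", "Tyagaraja"],
    ]
    print(infer_columns(scraped))
```

Because the labels come from the cell contents rather than from column positions, the same function would apply to a site that lists the composer first and the raga last. Columns that stay "unknown" (here, the composition title) would need other cues.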
Cleaning, knitting and organizing
- Data from multiple sources will be duplicated, noisy, incomplete, and inaccurate. We’ll engineer ways to deduplicate, denoise, and connect pieces of information together so that we fill in the gaps. This is non-trivial, and enters the big data realm.
- To find related chunks of data, we’ll have to improve how keyword search works, given that we’re representing Indic-language words in English, where the same word can be spelled in several ways.
- [10 minutes]
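To make the spelling problem above concrete, here is a minimal sketch of a phonetic key for romanized names, used to group likely duplicates. It is written in Python for brevity rather than inside Postgres, it is not the classic Soundex algorithm, and the rewrite rules are illustrative assumptions, not the project's actual rules.

```python
import re
import unicodedata
from collections import defaultdict

# Illustrative, hand-picked rules (assumptions); applied in order.
_RULES = [
    (r"[^a-z]", ""),              # drop spaces, hyphens, combining marks
    (r"sh", "s"),                 # "sh" and "s" are often interchanged
    (r"([kgcjtdpb])h", r"\1"),    # aspirated spellings: th -> t, bh -> b
    (r"aa", "a"),                 # long-vowel spellings
    (r"ee", "i"),
    (r"oo", "u"),
    (r"(.)\1+", r"\1"),           # collapse doubled letters
    (r"m$", ""),                  # trailing "m", e.g. "Mohanam" vs "Mohana"
]


def phonetic_key(name):
    """Reduce a romanized name to a key shared by its common spellings."""
    # NFKD splits letters like "ā" into "a" plus a combining mark,
    # which the first rule then removes.
    key = unicodedata.normalize("NFKD", name).lower()
    for pattern, replacement in _RULES:
        key = re.sub(pattern, replacement, key)
    return key


def group_by_key(names):
    """Group names whose phonetic keys match; candidates for deduplication."""
    groups = defaultdict(list)
    for name in names:
        groups[phonetic_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["Shankarabharanam", "Sankarabaranam", "Thodi", "Todi", "Kalyani"]
    for key, variants in group_by_key(names).items():
        print(key, variants)
```

Rules this aggressive will sometimes merge names that are genuinely different, so matches are better treated as candidates for review than as confirmed duplicates.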
Using the mined data
- What has been accomplished so far in r4g4’s APIs and interface.
- Building an interface to search the contents by keyword or free-form text and play music with ease, with access to all the information about the kriti (meaning, notation, etc.).
- Building APIs so that other applications can use the database for other purposes, such as research.
- [10 minutes]
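As a rough picture of what a free-form search behind such an API might return, here is a small Python sketch. The record fields, the `search` and `search_json` functions, and the response shape are all hypothetical and do not describe r4g4's actual API.

```python
import json

# Hypothetical records; field names are assumptions for illustration.
RECORDS = [
    {"kriti": "Vatapi Ganapatim", "raga": "Hamsadhwani",
     "tala": "Adi", "composer": "Muthuswami Dikshitar"},
    {"kriti": "Endaro Mahanubhavulu", "raga": "Sri",
     "tala": "Adi", "composer": "Tyagaraja"},
]


def search(records, query):
    """Return records in which every query word appears in some field."""
    tokens = query.lower().split()
    hits = []
    for record in records:
        haystack = " ".join(record.values()).lower()
        if all(token in haystack for token in tokens):
            hits.append(record)
    return hits


def search_json(records, query):
    """Wrap the hits in a JSON body that another application could consume."""
    hits = search(records, query)
    return json.dumps(
        {"query": query, "count": len(hits), "results": hits},
        ensure_ascii=False,
    )


if __name__ == "__main__":
    print(search_json(RECORDS, "adi tyagaraja"))
```

Matching here is plain substring matching on the stored spelling; in practice it would need to be combined with spelling-tolerant matching of romanized names.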
Why I think this is a big step forward
- Enter fantasy land (not really).
- Eliminating textbooks in music classes, removing the need to remember the scales of thousands of ragas, embracing compositions with their meaning every time, learning at home, and having access to reliable information in our hands.
- [5 minutes]
Srihari is a FOSS enthusiast. He has contributed to GIMP, Eclipse, and Diaspora, and is excited about opportunities to give back. Over the last couple of years, he has worked on building an experimentation platform, delving into a particularly dense domain, meeting tight latency SLAs, and engineering assembly lines in software using Clojure.
He sings, and does music things in technology: he has worked on synthesizing gamakas, and has been building an open Carnatic music database (the ragavardhini repo, r4g4.com) in his spare time.
He is a partner at nilenso, a hippie, tree-hugging, bicycle-riding software cooperative based in Bangalore. He blogs, plays basketball, and performs Carnatic music occasionally.