31 Days, 31 Ideas

31 innovative ideas to transform the Jewish future from Daniel Sieradski, posted over the course of 31 days, beginning January 1, 2010.

January 6, 2010 at 11:50pm

6. Jewish texts XML specification, repository and API

Today’s topic is really not going to dazzle anyone, but it’s fundamental to fostering a culture of innovation around educational software development in the Jewish community.

If there were one thing that, I believe, could spark a creative revolution, it would be providing unfettered access to the sum total of public domain Jewish texts (Tanakh, Mishnah, Gemara, Meforshim, etc.) as an XML dataset for free and unconditional use by Web software developers.

Ever look at E-Daf.com? It’s pretty sweet. You get graphical scans of the whole Talmud online with audio lectures that translate and explain what’s going on in the text. But you can’t copy the text and you can’t click the words, so you can’t get more directly helpful functionality like click-on translations, pronunciations, concordance, or even search. Therefore, learning, researching and interacting with the text is slow going and limited.

This problem is solved by offering actual digitized text (like that of this page), which can be copied and modified, rather than scans. And so that’s not really a problem, right? There are plenty of Web sites out there today that offer fully digitized versions of our primary Jewish sources, including Snunit, which has most of them online. But there’s a snag: the way that text is formatted. Sure, you can copy and paste this text, you can search through it, and you can even manually modify it to create hyperlinks. But it’s not structured in a way that’s useful to Web developers seeking to build advanced interactive tools that empower educators, students, researchers, rabbis, congregants, publishers, artists…

What’s needed is XML text, or more specifically, an XML standard for Jewish texts. XML is a way of formatting text that enables it to be understood by software as a dataset, imbuing content (in our case, specific characters, words, paragraphs, chapters and so on) with metadata that can be associated in a database (such as translations, cantillation marks, vowels, etc.). You can view some example XML datasets here.
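To make that concrete, here is a rough sketch of what word-level markup might look like and how a program could read it. The element and attribute names (`verse`, `w`, `translit`, `gloss`) are my own invention for illustration; they are not taken from TanakhML, the Open Siddur Project, or any published standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative assumptions,
# not part of any existing Jewish-texts XML standard.
SAMPLE = """
<verse book="Genesis" chapter="1" number="1">
  <w n="1" translit="bereshit" gloss="in the beginning">בראשית</w>
  <w n="2" translit="bara" gloss="created">ברא</w>
  <w n="3" translit="elohim" gloss="God">אלהים</w>
</verse>
"""


def parse_verse(xml_text):
    """Turn one <verse> element into a plain dictionary of words and metadata."""
    root = ET.fromstring(xml_text.strip())
    words = [
        {
            "text": w.text,
            "translit": w.get("translit"),
            "gloss": w.get("gloss"),
        }
        for w in root.findall("w")
    ]
    return {
        "book": root.get("book"),
        "chapter": int(root.get("chapter")),
        "number": int(root.get("number")),
        "words": words,
    }


verse = parse_verse(SAMPLE)
```

With the text in this shape, a click-on translation is just a lookup of `verse["words"][i]["gloss"]` for whichever word the reader clicked, which is exactly what a scanned image can't give you.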

Currently, numerous independent efforts are underway to create XML datasets from Jewish texts. These include TanakhML, which is the best demonstration I’ve encountered so far of the potential of XML when applied to Hebrew texts (poorly designed as it is), and The Open Siddur Project, which is a promising and inspiring work in progress.

But why should these individuals have to take this burden upon themselves when the entire Jewish community stands to benefit? And why should every developer who has a great idea have to reinvent the wheel every time they want to start a new Jewish software project? Coding these texts as XML is arduous and time-consuming. The task is daunting enough to stop most enterprising developers in their tracks.

What I propose is the creation of a Jewish texts XML consortium that draws on the knowledge of the best and brightest Hebrew linguists and Jewishly-minded software engineers to devise a standardized specification for biblical Hebrew texts. This consortium would then oversee the creation of an online repository of XML formatted Jewish texts along with an application programming interface (API) that enables developers to access and utilize that data in their applications.
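As a thought experiment, here is the kind of thing a developer might expect from such a repository: ask for a passage by reference, or search for a word, and get text back. This is only a toy that runs in memory. The `TextRepository` class, its method names, and the XML layout are all hypothetical, since no such specification or API exists yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository content; the <text>/<chapter>/<verse> layout is an
# assumption made for this sketch, not a proposed standard.
REPOSITORY_XML = """
<text title="Genesis">
  <chapter n="1">
    <verse n="1">בראשית ברא אלהים את השמים ואת הארץ</verse>
    <verse n="2">והארץ היתה תהו ובהו</verse>
  </chapter>
</text>
"""


class TextRepository:
    """Toy stand-in for the proposed repository and API (all names hypothetical)."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text.strip())

    def get_verse(self, chapter, verse):
        """Return the text of one verse, or None if the reference does not exist."""
        node = self.root.find(f"./chapter[@n='{chapter}']/verse[@n='{verse}']")
        return node.text if node is not None else None

    def search(self, word):
        """Return (chapter, verse) pairs whose text contains the exact word."""
        hits = []
        for chapter in self.root.findall("chapter"):
            for verse in chapter.findall("verse"):
                if word in verse.text.split():
                    hits.append((int(chapter.get("n")), int(verse.get("n"))))
        return hits


repo = TextRepository(REPOSITORY_XML)
first_verse = repo.get_verse(1, 1)
matches = repo.search("תהו")
```

A real service would sit behind a Web address and cover the whole bookshelf, but the point stands: once the consortium settles the structure, lookups and searches like these take a few lines instead of a few months.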

An example of a current working model for this approach, minus the consortium, is Hebcal.com, which provides free and open access to Hebrew calendars formatted to meet various specifications, including vCal, iCal and RSS, an XML-based format. Hebcal is used often by Web software developers seeking to integrate Hebrew calendar support into their applications.

And while it could take years to process the entire Jewish bookshelf, you could potentially crowdsource the data entry to the Jewish public by creating a system like Amazon’s Mechanical Turk, which offers micropayments for the performance of data entry tasks, or the Brooklyn Museum’s, which organizes games and other creative activities that involve participants in tagging their collection.

Once that door is unlocked, the sky’s the limit for creative Jewish software development. As for now, the field’s growth is stunted.

Tomorrow, an example of an application that would benefit from such a breakthrough.
