<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" 
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
	<channel>
<title>Entagen Blog</title><link>http://www.entagen.com/index.html</link><description>Accelerating Insight</description><dc:language>en</dc:language><dc:creator>chris@entagen.com</dc:creator><dc:rights>Copyright 2008 Entagen</dc:rights><dc:date>2010-09-24T16:00:49-04:00</dc:date><admin:generatorAgent rdf:resource="http://www.realmacsoftware.com/" />
<admin:errorReportsTo rdf:resource="mailto:chris@entagen.com" /><sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
<lastBuildDate>Mon, 24 Nov 2008 14:32:43 -0500</lastBuildDate><item><title>Adding case-insensitive ORDER BY support to SPARQL queries with OpenRDF Sesame</title><dc:creator>chris@entagen.com</dc:creator><category>Semantic Technologies</category><dc:date>2010-09-24T16:00:49-04:00</dc:date><link>http://www.entagen.com/page3/page11/files/6e69c54a361952e9ea5b341e156e7161-5.html#unique-entry-id-5</link><guid isPermaLink="true">http://www.entagen.com/page3/page11/files/6e69c54a361952e9ea5b341e156e7161-5.html#unique-entry-id-5</guid><content:encoded><![CDATA[<p>We&#8217;ve been doing a lot of work with triples and SPARQL queries lately and I came across a need to order the results of a given SPARQL query.  It&#8217;s simple enough to do &#8211; just add an ORDER BY clause to your query, like this:</p> <br /><pre> <br />SELECT * WHERE {<br /> 	?id &lt;http://www.w3.org/2008/05/skos#prefLabel&gt; ?preferredLabel .<br />}<br />ORDER BY DESC( ?preferredLabel  )<br />limit 100</pre> <br /><p>TADA!  The results of that query are now ordered by the ?preferredLabel variable.  Except that they&#8217;re actually being ordered by the character codes (ASCII values) of the text in ?preferredLabel &#8211; so you end up with a <strong>case-sensitive</strong> ORDER BY.  That means that, in ascending order, every uppercase letter sorts before any lowercase letter (&#8216;Foo&#8217; comes before &#8216;bar&#8217;, even though you would expect the order to be &#8216;bar&#8217;, then &#8216;Foo&#8217;); with DESC, as in the query above, that order is simply reversed.</p> <br /><p>Luckily, the fine contributors to the <a href="http://www.openrdf.org/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.openrdf.org');">OpenRDF</a> project have a solution.  In the sesame-ext directory of their Subversion repository lives a little project called &#8220;xpath-functions&#8221;.  
You can get a little bit of the background from this <a href="http://sourceforge.net/mailarchive/message.php?msg_id=711A6E44E5028648A8D15B1DC32E58E0848319%40scomp0038.wurnet.nl" onclick="javascript:pageTracker._trackPageview('/outbound/article/sourceforge.net');">mailing list thread</a> &#8211; but suffice it to say that this project has several classes that implement the <a href="http://www.openrdf.org/doc/sesame2/api/org/openrdf/query/algebra/evaluation/function/Function.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.openrdf.org');">Function</a> interface, allowing you to use XPath functions like:</p> <br /><ul> <br /><li>Concat</li> <br /><li>Date</li> <br /><li>DateTime</li> <br /><li>LowerCase</li> <br /><li>UpperCase</li> <br /></ul> <br /><p>I wasn&#8217;t able to find a JAR file already built and ready for download, but it&#8217;s easy enough to build the JAR from scratch yourself.</p> <br /><p>First, check out the source code for the sesame-ext xpath-functions project (http://repo.aduna-software.org/svn/org.openrdf/sesame-ext/xpath-functions/trunk/).</p> <br /><p>Make a few updates to the pom.xml in the checked-out code &#8211; update the sesame.version property to the version you&#8217;re using (in this case we&#8217;re working with version 2.3.1):</p> <br /><pre> <br />&lt;properties&gt;<br />	&lt;project.build.sourceEncoding&gt;UTF-8&lt;/project.build.sourceEncoding&gt;<br />	&lt;project.reporting.outputEncoding&gt;UTF-8&lt;/project.reporting.outputEncoding&gt;<br /> <br />	&lt;sesame.version&gt;2.3.1&lt;/sesame.version&gt;<br />	&lt;slf4j.version&gt;1.5.6&lt;/slf4j.version&gt;<br />&lt;/properties&gt;<br /></pre> <br /><p>Add the Aduna Maven repository to the pom.xml:</p> <br /><pre> <br /> &lt;repositories&gt;<br />   &lt;repository&gt;<br />     &lt;id&gt;aduna&lt;/id&gt;<br />     &lt;name&gt;aduna&lt;/name&gt;<br />     &lt;url&gt;http://repo.aduna-software.org/maven2/releases&lt;/url&gt;<br />   
&lt;/repository&gt;<br />&lt;/repositories&gt;<br /></pre> <br /><p>Then run:<br /> <br /><code><br /> <br />mvn package<br /> <br /></code><br /> <br />from the command line to build the JAR file.</p> <br /><p>If you&#8217;re working with a local Sesame repository, you can just add the JAR to your classpath.  If you&#8217;re using the openrdf-sesame WAR file running on an application server to host your repository, then you&#8217;ll need to add the JAR to the classpath of that web application on the server (typically its WEB-INF/lib directory).</p> <br /><p>Once you have the JAR file in the appropriate spot, you can use the XPath functions in the ORDER BY clause, like this:</p> <br /><pre> <br />PREFIX fn: &lt;http://www.w3.org/2005/xpath-functions#&gt;<br />SELECT * WHERE {<br /> 	?id &lt;http://www.w3.org/2008/05/skos#prefLabel&gt; ?preferredLabel .<br />}<br />ORDER BY DESC( fn:lower-case(?preferredLabel)  )<br />limit 100</pre> <br /><p>Notice the &#8220;PREFIX fn:&#8221; line and the &#8220;fn:lower-case(?preferredLabel)&#8221; in the ORDER BY clause.  
Now the ORDER BY will be applied to the results in a <strong>case-insensitive</strong> manner!</p>]]></content:encoded></item><item><title>Large-scale data the end of scientific method?</title><dc:creator>chris@entagen.com</dc:creator><category>Data Mining</category><category>Visualization</category><category>Omics</category><dc:date>2008-12-17T10:11:26-05:00</dc:date><link>http://www.entagen.com/page3/page11/files/cea51800dfc20adea1cf8f1982dc2c21-3.html#unique-entry-id-3</link><guid isPermaLink="true">http://www.entagen.com/page3/page11/files/cea51800dfc20adea1cf8f1982dc2c21-3.html#unique-entry-id-3</guid><content:encoded><![CDATA[Chris Anderson, editor in chief of Wired Magazine, recently wrote an interesting article entitled <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory" rel="self">&ldquo;The End of Theory: The Data Deluge Makes the Scientific Method Obsolete&rdquo;</a> and subsequently took some serious criticism, both in the comments section of the online article and in further counter <a href="http://arstechnica.com/news.ars/post/20080625-why-the-cloud-cannot-obscure-the-scientific-method.html" rel="self">articles</a>, for proclaiming, basically, that the scientific method is dead. In one of his concluding statements he writes,<br /><br />&ldquo;The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.&rdquo;<br /><br />While it is no doubt true that the availability of the types of large-scale data that we have today makes de novo pattern-finding much more possible and powerful, that doesn&rsquo;t negate the power of time-tested hypothesis testing. Instead, I would contend that this isn&rsquo;t an either/or scenario. 
Both processes, both ways of doing things, are powerful and useful, and in fact it&rsquo;s when you use the two together in an iterative cycle that you can gain the greatest value from, and the deepest insight into, your data.<br /><br />For example, in much of the work that we&rsquo;ve done, it is in the process of examining data for the sake of testing a hypothesis that we see a novel pattern or correlation which offers new insight. The follow-up to that insight is then critical and is, rightly so, a process of hypothesis testing.<br /><br />There is another aspect to all of this that requires consideration as well. At some point, the massive amounts of data, and all of the analyses conducted on them, need to yield knowledge. That, to me, is possibly the greatest hurdle in this brave new data-filled world we&rsquo;re in.<br /><br />Chris&rsquo;s article contains a number of examples, each of which contains a snapshot of gorgeous visualizations rendered for the sake of conveying the results derived from the various analyses run. 
That&rsquo;s because in many ways, at the very core of all of this, visualization is simply one of the best, if not the best, bridges we have between data and knowledge, between a sea of information and something that we can act on, whether it is to design a novel therapeutic, buy or sell stock, or detect an influenza outbreak.<br />]]></content:encoded></item><item><title>Mass TLC Panel Discussion</title><dc:creator>chris@entagen.com</dc:creator><category>Events</category><dc:date>2008-12-03T00:30:47-05:00</dc:date><link>http://www.entagen.com/page3/page11/files/588368a334f8a2cc647ce9a246844cd3-2.html#unique-entry-id-2</link><guid isPermaLink="true">http://www.entagen.com/page3/page11/files/588368a334f8a2cc647ce9a246844cd3-2.html#unique-entry-id-2</guid><content:encoded><![CDATA[Christopher Bouton participated in a <a href="http://lifesciences081202.eventbrite.com/" rel="self">panel discussion</a> for the <a href="http://www.masstlc.org/" rel="self">Massachusetts Technology Leadership Council</a> on December 2nd. Dr. Joseph Cerro of the <a href="http://www.schoonergroup.com/" rel="self">Schooner Group LLC</a> moderated. Other participants included:<br /><br />Dr. Martin Leach, Executive Director, Basic Research and Biomarker IT, Merck Research Laboratories<br />Jens Hoefkens, Managing Director, GeneData USA<br />Michelle Pontinen, R&D Practice Leader, Capgemini US, LLC<br />Matthew Trunnell, Head, Research Computing, Broad Institute<br />]]></content:encoded></item></channel>
</rss>
