The RDF community does a pretty good job writing code around specifications. But there could be more generic code that supports people testing or implementing algorithms for RDF graph data. There are many tiny libraries on my local machine that fit into that gap and only lack better documentation or code coverage. @rdfjs/score is one of them. It finally got a small readme file and was added as an experimental feature to rdf-ext.
@rdfjs/score helps to score terms within a dataset.
It implements several scoring algorithms, plus some functions that combine multiple score functions in different ways.
I have already used it for the following use cases:
- Sort subjects/resources based on importance for the user
- Find the root of a tree-like graph
- Select the literal with the best language match
Let’s have a closer look at each of the use cases based on an example which can also be found in this Gist. But before we go into details, here is a code snippet that shows how to import the packages and how to prepare the data for the function calls:
import housemd from 'housemd'
Sort subjects/resources based on importance
If you have a list of subjects/resources, you may want to sort them in the UI to show the more important ones at the top.
PageRank is used to create a score value based on the relative importance of a term.
See the Wikipedia article for more details.
In the code, a new score function instance is created with rdf.score.pageRank().
That function is called with the input data.
The result is sorted with rdf.score.sort.
That’s it.
The code for the output shows how to access the term and score value.
import rdf from 'rdf-ext'
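To make the idea behind the score more tangible, here is a minimal, library-free sketch of PageRank over plain edges. It is not the implementation behind rdf.score.pageRank(). The function name, the edge format, the damping factor, the iteration count, and the sample edges are all assumptions made up for illustration.

```javascript
// Hypothetical helper, not part of @rdfjs/score: PageRank over [from, to] edges.
function pageRank (edges, { damping = 0.85, iterations = 50 } = {}) {
  // collect all nodes and their outgoing links
  const nodes = [...new Set(edges.flatMap(([from, to]) => [from, to]))]
  const outgoing = new Map(nodes.map(node => [node, []]))

  for (const [from, to] of edges) {
    outgoing.get(from).push(to)
  }

  // start with an equal score for every node
  let scores = new Map(nodes.map(node => [node, 1 / nodes.length]))

  for (let i = 0; i < iterations; i++) {
    // score held by nodes without outgoing links is spread evenly over all nodes
    let dangling = 0

    for (const node of nodes) {
      if (outgoing.get(node).length === 0) {
        dangling += scores.get(node)
      }
    }

    const base = (1 - damping + damping * dangling) / nodes.length
    const next = new Map(nodes.map(node => [node, base]))

    // every node passes its score on to the nodes it links to
    for (const node of nodes) {
      const targets = outgoing.get(node)

      for (const target of targets) {
        next.set(target, next.get(target) + damping * scores.get(node) / targets.length)
      }
    }

    scores = next
  }

  // return { term, score } pairs, most important first
  return nodes
    .map(term => ({ term, score: scores.get(term) }))
    .sort((a, b) => b.score - a.score)
}

// made-up edges: everybody links to 'house', 'cameron' also links to 'chase'
const edges = [
  ['cameron', 'house'],
  ['chase', 'house'],
  ['foreman', 'house'],
  ['wilson', 'house'],
  ['cameron', 'chase']
]

const ranked = pageRank(edges)

for (const { term, score } of ranked) {
  console.log(`${term}: ${score.toFixed(3)}`)
}
```

In this sketch the node with the most incoming links ends up first, which is the behavior you want when the more important resources should appear at the top of a list.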
Find the root
You will face this problem if you have an API that is very open about the types and names of the incoming nodes. A ruleset for finding the root could look like this:
- Check if a term matches the request URL
- Find resources with a specific rdf:type
- Find the root if there could be nested resources with the same rdf:type
The following example implements exactly the described ruleset:
rdf.score.exists checks if there is a subject that matches the given named node and scores it with 1.
rdf.score.type scores resources that have the given rdf:type with 1.
rdf.score.pageRank is used again.
This time it takes care of finding the root of a nested structure.
rdf.score.product combines the results of rdf.score.type and rdf.score.pageRank.
Everything is then wrapped with rdf.score.fallback.
It calls one score function after another until a result is found.
If rdf.score.exists already returns a result, the other score functions are not called at all.
import rdf from 'rdf-ext'
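The way the combinators interact can be sketched without the library. The following mini versions are hypothetical and do not mirror the signatures of @rdfjs/score. Triples are plain objects, terms are strings, and fewIncoming is a simple stand-in for the ranking step that the article solves with rdf.score.pageRank.

```javascript
// Hypothetical mini combinators, not the @rdfjs/score API.
// A score function takes a list of triples and returns [{ term, score }].

// scores the given subject with 1 if it occurs in the data
const exists = subject => triples =>
  triples.some(t => t.s === subject) ? [{ term: subject, score: 1 }] : []

// scores every subject that has the given rdf:type with 1
const type = cls => triples =>
  triples
    .filter(t => t.p === 'rdf:type' && t.o === cls)
    .map(t => ({ term: t.s, score: 1 }))

// stand-in ranking (assumption): fewer incoming links means a higher score
const fewIncoming = () => triples =>
  [...new Set(triples.map(t => t.s))].map(term => ({
    term,
    score: 1 / (1 + triples.filter(t => t.o === term).length)
  }))

// multiplies the scores of all terms that every score function returned
const product = fns => triples => {
  const results = fns.map(fn => new Map(fn(triples).map(r => [r.term, r.score])))

  if (results.length === 0) {
    return []
  }

  return [...results[0].keys()]
    .filter(term => results.every(result => result.has(term)))
    .map(term => ({
      term,
      score: results.reduce((value, result) => value * result.get(term), 1)
    }))
}

// calls one score function after another until one returns a result
const fallback = fns => triples => {
  for (const fn of fns) {
    const result = fn(triples)

    if (result.length > 0) {
      return result
    }
  }

  return []
}

// made-up data: a collection that contains a nested collection
const triples = [
  { s: 'ex:list', p: 'rdf:type', o: 'ex:Collection' },
  { s: 'ex:list', p: 'ex:member', o: 'ex:sublist' },
  { s: 'ex:sublist', p: 'rdf:type', o: 'ex:Collection' }
]

const findRoot = requestUrl => fallback([
  exists(requestUrl),
  product([type('ex:Collection'), fewIncoming()])
])

const best = results => [...results].sort((a, b) => b.score - a.score)[0]

// the request URL matches a subject, so the later rules are never evaluated
const byUrl = best(findRoot('ex:sublist')(triples))

// no subject matches, so the type and ranking rules decide
const byType = best(findRoot('ex:unknown')(triples))

console.log(byUrl.term, byType.term)
```

Because product only keeps terms that all of its score functions returned, a resource without the expected rdf:type can never win through the ranking alone.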
Select the best language match
A very common problem is selecting the literal with the best language match.
The order of the languages defines the priority, and * can be given as a wildcard.
The example does exactly that with the rdf.score.language function.
import rdf from 'rdf-ext'
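The matching logic itself is small enough to sketch in plain JavaScript. This is not the code behind rdf.score.language. The function name scoreLanguages, the literal objects, and the way positions are turned into scores are assumptions for illustration only.

```javascript
// Hypothetical helper, not the rdf.score.language implementation.
// priorities is an ordered list of language tags, '*' matches any language.
function scoreLanguages (literals, priorities) {
  return literals
    .map(literal => {
      // an exact match wins over the wildcard
      const exact = priorities.indexOf(literal.language)
      const index = exact !== -1 ? exact : priorities.indexOf('*')

      // earlier positions get higher scores, no match gives 0
      const score = index === -1 ? 0 : (priorities.length - index) / priorities.length

      return { term: literal, score }
    })
    .filter(result => result.score > 0)
    .sort((a, b) => b.score - a.score)
}

// made-up labels in three languages
const labels = [
  { value: 'Haus', language: 'de' },
  { value: 'house', language: 'en' },
  { value: 'maison', language: 'fr' }
]

// Italian is preferred but missing, English is next, anything else is accepted last
const bestLabel = scoreLanguages(labels, ['it', 'en', '*'])[0]

// without the wildcard, literals in other languages are dropped
const onlyItalian = scoreLanguages(labels, ['it'])

console.log(bestLabel.term.value, onlyItalian.length)
```

The wildcard at the end of the list acts as a last resort, so the user still gets some label when none of the preferred languages is available.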
Summary
I’m pretty happy with the basic interfaces and structures of the library. The score functions already cover a broad spectrum of use cases, but they were implemented on demand without an overall concept. In some cases, there may be multiple ways to get the same result. The names of the score functions are OK, but there may be more intuitive ones. That’s why the package is still a 0.x version. Feedback is welcome so that I can incorporate it into the 1.0 release.