Score RDF/JS terms

Posted on 2022-10-02

The RDF community does a pretty good job writing code around specifications. But there could be more generic code that supports people testing or implementing algorithms for RDF graph data. There are many tiny libraries on my local machine which fit into that gap, just lacking better documentation or code coverage. @rdfjs/score is one of them, which finally got a small readme file and was added as an experimental feature to rdf-ext.

@rdfjs/score helps to score terms within a dataset. It implements several scoring algorithms, plus some functions that combine multiple score functions in different ways. I have already used it for the following use cases:

- Sort subjects/resources based on importance for the user
- Find the root of a tree-like graph
- Select the literal with the best language match

Let’s have a closer look at each of the use cases based on an example which can also be found in this Gist. But before we go into details, here is a code snippet that shows how to import the packages and how to prepare the data for the function calls:

import housemd from 'housemd'
import rdf from 'rdf-ext'

const ns = {
  rdf: rdf.namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#'),
  schema: rdf.namespace('http://schema.org/')
}

// import the House M.D. dataset
const dataset = rdf.dataset(housemd({ factory: rdf }))

// get all subject named nodes in the dataset
const subjects = rdf.termSet(
  rdf.clownface({ dataset })
    .in()
    .filter(ptr => ptr.term.termType === 'NamedNode')
    .terms
)

// get all literal objects in the dataset
const literals = rdf.termSet(
  rdf.clownface({ dataset })
    .has(ns.rdf.type, ns.schema.Person)
    .out(ns.schema.givenName)
    .filter(ptr => ptr.term.termType === 'Literal')
    .terms
)

// the input structures that will be handed over to the score functions
const input = { dataset, terms: subjects }
const inputLiterals = { dataset, terms: literals }

export {
  input,
  inputLiterals,
  ns,
}

Sort subjects/resources based on importance

If you have a list of subjects/resources, you may want to sort them in the UI to show the more important ones at the top. PageRank produces a score value based on the relative importance of a term. See the Wikipedia article for more details. In the code, rdf.score.pageRank() creates a new score function, which is then called with the input data. The result is sorted with rdf.score.sort. That’s it. The output loop shows how to access the term and score value of each result.

import rdf from 'rdf-ext'
import { input } from './common.js'

function example () {
  const results = rdf.score.sort(rdf.score.pageRank()(input))

  for (const result of results) {
    console.log(`${result.term.value} (${result.score})`)
  }
}

example()
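
To give a rough idea of what a PageRank-style score does, here is a minimal, self-contained sketch on plain string pairs. It does not use rdf-ext or @rdfjs/score and is not how the library implements the algorithm. The name pageRankSketch, the damping factor of 0.85, the fixed iteration count, and the ex: identifiers are all assumptions made for illustration.

```javascript
// Hypothetical helper for illustration only, not the @rdfjs/score implementation.
// edges: array of [from, to] pairs, e.g. the subject and object of each triple.
function pageRankSketch (edges, { damping = 0.85, iterations = 50 } = {}) {
  const nodes = [...new Set(edges.flat())]
  const outgoing = new Map(nodes.map(node => [node, []]))

  for (const [from, to] of edges) {
    outgoing.get(from).push(to)
  }

  // start with an even distribution of rank
  let scores = new Map(nodes.map(node => [node, 1 / nodes.length]))

  for (let i = 0; i < iterations; i++) {
    // rank held by nodes without outgoing links is spread over all nodes
    let dangling = 0
    for (const node of nodes) {
      if (outgoing.get(node).length === 0) {
        dangling += scores.get(node)
      }
    }

    const base = (1 - damping + damping * dangling) / nodes.length
    const next = new Map(nodes.map(node => [node, base]))

    // every node passes a damped, equal share of its rank to each target
    for (const node of nodes) {
      const targets = outgoing.get(node)
      for (const target of targets) {
        next.set(target, next.get(target) + damping * scores.get(node) / targets.length)
      }
    }

    scores = next
  }

  return scores
}

// made-up example graph: ex:a and ex:b link to ex:c, ex:c links back to ex:a
const edges = [
  ['ex:a', 'ex:c'],
  ['ex:b', 'ex:c'],
  ['ex:c', 'ex:a']
]

// sort descending by score, similar in spirit to sorting the score results
const ranked = [...pageRankSketch(edges)].sort((a, b) => b[1] - a[1])

for (const [node, score] of ranked) {
  console.log(`${node} (${score.toFixed(3)})`)
}
```

In this toy graph ex:c receives the most incoming links and ends up at the top, while ex:b, which nothing points to, ends up at the bottom.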

Find the root

You will face this problem if you have an API that is very open about the types and names of the incoming nodes. A ruleset for finding the root could look like this:

- Check if a term matches the request URL
- Find resources with a specific rdf:type
  - Find the root if there could be nested resources with the same rdf:type

The following example implements exactly the described ruleset:

rdf.score.exists checks if there is a subject that matches the given named node and scores it with 1. rdf.score.type scores resources with the given rdf:type with 1. rdf.score.pageRank is used again. This time it takes care of finding the root of a nested structure. rdf.score.product combines the results of rdf.score.type and rdf.score.pageRank. Everything is then wrapped with rdf.score.fallback. It calls one score function after another until a result is found. If rdf.score.exists already returns a result, the other score functions are not called at all.

import rdf from 'rdf-ext'
import { input, ns } from './common.js'

function example () {
  const results = rdf.score.sort(
    rdf.score.fallback([
      rdf.score.exists({ subject: rdf.namedNode('https://housemd.rdf-ext.org/person/lisa-cuddy') }),
      rdf.score.product([
        rdf.score.type(ns.schema.Person),
        rdf.score.pageRank()
      ])
    ])(input)
  )

  for (const result of results) {
    console.log(`${result.term.value} (${result.score})`)
  }
}

example()
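
The way fallback and product compose score functions can be sketched without the library. The following is a simplified illustration on plain strings, assuming that a score function maps an input to a list of { term, score } objects. productSketch, fallbackSketch, the three toy score functions, and the ex: identifiers are hypothetical, and dropping terms that are missing from one of the product inputs is an assumption, not confirmed library behavior.

```javascript
// Hypothetical stand-ins for illustration only, not the @rdfjs/score code.

// multiply the scores of the terms that every score function returned
function productSketch (scoreFunctions) {
  return input => {
    const [first, ...rest] = scoreFunctions.map(fn => fn(input))

    return first.flatMap(({ term, score }) => {
      let product = score

      for (const results of rest) {
        const match = results.find(result => result.term === term)
        if (!match) return [] // assumption: a term missing in one list is dropped
        product *= match.score
      }

      return [{ term, score: product }]
    })
  }
}

// return the results of the first score function that yields any result
function fallbackSketch (scoreFunctions) {
  return input => {
    for (const fn of scoreFunctions) {
      const results = fn(input)
      if (results.length > 0) return results
    }

    return []
  }
}

// toy score functions working on plain strings instead of RDF/JS terms
const exists = wanted => ({ terms }) =>
  terms.filter(term => term === wanted).map(term => ({ term, score: 1 }))
const isPerson = ({ terms }) =>
  terms.filter(term => term.startsWith('ex:person/')).map(term => ({ term, score: 1 }))
const weight = ({ terms }) =>
  terms.map((term, index) => ({ term, score: 1 / (index + 1) }))

const input = { terms: ['ex:person/a', 'ex:person/b', 'ex:thing/c'] }
const score = wanted => fallbackSketch([exists(wanted), productSketch([isPerson, weight])])

const direct = score('ex:thing/c')(input) // the first rule matches
const viaFallback = score('ex:missing')(input) // the second rule is used

console.log(direct)
console.log(viaFallback)
```

When the first rule finds the requested term, the loop in fallbackSketch returns early and the product is never evaluated, which mirrors the short-circuit behavior described above.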

Select the best language match

A very common problem is selecting the literal with the best language match. The order of the languages defines the priority, and * can be given as a wildcard. The example does exactly that with the rdf.score.language function.

import rdf from 'rdf-ext'
import { inputLiterals } from './common.js'

function example () {
  const results = rdf.score.sort(rdf.score.language(['en', 'de', '*'])(inputLiterals))

  for (const result of results) {
    console.log(`${result.term.value} (${result.score})`)
  }
}

example()
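
The priority idea can be illustrated on plain objects. This is only a sketch under assumptions: languageScoreSketch is a hypothetical name, the literals are simple { value, language } objects with made-up values, and the concrete score values are invented here. The real rdf.score.language function may weight matches differently.

```javascript
// Hypothetical helper for illustration only, not the @rdfjs/score code.
// priorities: language tags ordered from most to least preferred, '*' matches any language.
function languageScoreSketch (priorities) {
  const wanted = priorities.map(language => language.toLowerCase())

  return literals => {
    const results = []

    for (const literal of literals) {
      const language = literal.language.toLowerCase()
      // the first entry that matches exactly or is the wildcard decides the priority
      const index = wanted.findIndex(entry => entry === language || entry === '*')

      if (index === -1) continue // assumption: unmatched literals get no score at all

      // earlier entries in the priority list yield higher scores
      results.push({ term: literal, score: (wanted.length - index) / wanted.length })
    }

    return results.sort((a, b) => b.score - a.score)
  }
}

// made-up literals
const literals = [
  { value: 'Haus', language: 'de' },
  { value: 'house', language: 'en' },
  { value: 'maison', language: 'fr' }
]

const matches = languageScoreSketch(['en', 'de', '*'])(literals)

for (const match of matches) {
  console.log(`${match.term.value}@${match.term.language} (${match.score.toFixed(2)})`)
}
```

With the priority list en, de, *, the English literal comes first, the German one second, and the French one is only picked up by the wildcard. Leaving out the wildcard would drop it from the results in this sketch.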

Summary

I’m pretty happy with the basic interfaces and structures of the library. The score functions already cover a broad spectrum of use cases but were implemented on demand without an overall concept. There could be multiple ways to get the same results for some cases. The names of the score functions are ok, but maybe there are more intuitive ones. That’s why the package is still a 0.x version. Feedback is welcome so that I can incorporate it into the 1.0 release.
