AIFB DataSet


AIFB DataSet is a Semantic Web (RDF) dataset used as a benchmark in data mining. The dataset consists of a single file of approximately 3 megabytes. It records the organizational structure of the AIFB institute at the University of Karlsruhe.

Prerequisites

Get the data

The dataset is distributed from https://figshare.com/articles/AIFB_DataSet/745364.

1 Download the data file. In which file format is the data encoded?

Notation3
RDF/XML
JSON-LD

2 Which ontology does the dataset use?

SWRC
FOAF
SIOC


Get context

The dataset was used in the paper "Kernel Methods for Mining Instance Data in Ontologies". Find and read the description of the dataset on page 10.

How many instances of the class "Person" does the paper record?

2,547
1,058
1,232


Python

Set up a Python environment with rdflib installed, load the AIFB file, and count the number of times the "affiliation" property is used:

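from rdflib import Graph, URIRef

g = Graph()
# Graph.load() is no longer available in recent rdflib releases; Graph.parse() is the supported call
g.parse('aifbfixed_complete.n3', format='n3')
# Count the triples whose predicate is the SWRC "affiliation" property
len(list(g.triples((None, URIRef("http://swrc.ontoware.org/ontology#affiliation"), None))))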

The URIs of the affiliations can be collected with:

affiliations = g.triples((None, URIRef("http://swrc.ontoware.org/ontology#affiliation"), None))
groups = set(affiliation[2] for affiliation in affiliations)

How many different affiliations are there?

Find the names of the affiliations via the property "http://swrc.ontoware.org/ontology#name".