Jan Gylta jgylta@online.no
Copyright © 2002 Jan Gylta, Robert Barta
When developing Topic Map based applications, a programmer inevitably has to deal with the data structures the underlying API provides. This article demonstrates how this can be done at a considerably higher level of abstraction using an XPath derivative, XTMPath. We show its basic use for navigating Topic Map related data structures, and also how to create new data based on XTMPath expressions. All this is done using XTM::base, a set of Perl packages for Topic Map processing.
One of the problems we encountered in developing applications based on XTM::base is that one needs a thorough understanding of the data structures to do even the simplest accesses and manipulations. While XTM::base does its best to abstract away the internal representation, the deeply nested data structures present a high barrier to entry for a casual programmer with basic Topic Map know-how. Things get really difficult once the map is not only queried and traversed, but also modified or even extended.
In the following we want to demonstrate how the tasks of navigating and locating Topic Map information can be simplified with XTMPath. XTMPath is a specialization of XPath that offers a declarative way to specify which particular information a developer is interested in. XTMPath is insofar a specialization as
Another use of XTMPath is to create Topic Map data structures, say, associations and topics. This takes the familiar XPath approach one step further. But first, we look at the problem of locating information.
Before we can search we need a simple topic map. Let us assume that we cover a popular Australian theme in AsTMa=:
VB (beer-brand)
bn: Victoria Bitter
oc: http://www.fosters.com.au/beer/about/brands/beer/vic_bitter.asp
in (marketing): Australia's favorite full strength beer brand

BudWeiser (beer-brand)
bn: BudWeiser
oc: http://www.budweiser.com/

(is-owned-by)
owner : anheuser-busch
property : BudWeiser

and that we have stored this into a file beers-r-us.atm. Then we can easily read that in using the CPAN module XTM::base:
use XTM;
use XTM::AsTMa;
my $tm = new XTM (tie => new XTM::AsTMa (file => 'beers-r-us.atm'));
After that the $tm object contains our topic map.
To make use of XTM::Path you will need to import the module and to instantiate an object from it. This object can be used for searching within maps or map components like topics and associations.
use XTM::Path;
my $xtmp = new XTM::Path (default => $tm);
Our map $tm is used as the default search context if nothing else is specified later.
First, let us find all topics with the instance type beer-brand using the find method.
my @beers = $xtmp->find(
    'topic[instanceOf/topicRef/@href = "#beer-brand"]');
The find method returns a list of object references, each pointing to an XTM::topic object.
Since we have not specified any new context, the search context defaults to our map.
The XTMPath expression itself will be familiar to XPath users: the XTMPath processor first looks for topic nodes within the map and then keeps only those for which the predicate instanceOf/topicRef/@href = "#beer-brand" evaluates to true.
The path instanceOf/topicRef/@href might look a bit confusing at first, but it follows directly from the way a topic is represented in XTM. The BudWeiser topic in our map would be represented as
<topic id="BudWeiser">
  <instanceOf>
    <topicRef xlink:href="#beer-brand"/>
  </instanceOf>
  <occurrence>
    <resourceRef xlink:href="http://www.budweiser.com/"/>
  </occurrence>
</topic>
As usual, the @ character is used to address an attribute, while the rest simply traverses XTM elements.
What is probably hardest for a beginner is to correctly identify a path to the desired element, especially for someone used to working only with AsTMa=. The best way to overcome this problem is to have a sample topic and a sample association available in XML and to identify the path from there. The XTM standard itself might be handy as well.
If we further want to iterate over these topics and extract the base names and occurrences of each, we have to provide the current topic as the search context:
foreach my $topic (@beers) {
    my @basenames   = $xtmp->find('baseNameString/text()', $topic);
    my @occurrences = $xtmp->find('occurrence/resourceRef/@href', $topic);
    print qq{<a href="$occurrences[0]">$basenames[0]</a>\n};
}
We can observe that asking for text() or an attribute value like @href gives us a Perl string scalar rather than an object, as a DOM programmer might expect. Still, we receive a list of values, from which we selected only the first one for output.
That a path expression is interpreted relative to a context is not new to XPath developers. There are considerable differences, though, in how XTMPath treats axes. First, the current implementation is limited in that it supports only the child and descendant axes using the '/' syntax. More conceptually, in contrast to generic XPath, the XTMPath processor is well aware of the underlying DTD. It can immediately determine whether an expression like topic/member can return any value or not.
This knowledge can also be used to implement some of the DWIMming (Do What I Mean) functionality that Perl programmers feel so comfortable with: instead of explicitly defining a path to the parts in which we are interested, we can leave the details of the route to the processor. For example, the following XTMPath expressions are equivalent:
/topic/occurrence/resourceData
/topic//resourceData
topic/resourceData
In all cases the processor will first look for a topic data structure in the current context (a map object). Since it knows that a resourceData component must be inside an occurrence component, there is no need to spell out the intermediate steps or to use the descendant axis '//'. Only in situations in which there is no unique path will the processor flag an error. In this light, there is no difference between using '//' and '/', nor, for that matter, does a leading '/' at the front of an expression make any difference.
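A minimal sketch of how such DTD-aware path completion could work is given here. It does not use XTM::Path and makes no claim about that module's internals: the %CHILDREN table is a hypothetical, heavily reduced excerpt of the XTM element hierarchy, and paths_between and complete_path are illustrative helper names. The rule that a direct child wins over a longer route is likewise an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical, heavily reduced parent => children table. The real XTM DTD
# allows more elements and deeper nesting than shown here.
my %CHILDREN = (
    topic       => [qw(instanceOf baseName occurrence)],
    association => [qw(instanceOf member)],
    instanceOf  => [qw(topicRef)],
    baseName    => [qw(baseNameString)],
    occurrence  => [qw(resourceRef resourceData)],
    member      => [qw(roleSpec topicRef)],
    roleSpec    => [qw(topicRef)],
);

# Return every chain of element names leading from $from down to $target.
sub paths_between {
    my ($from, $target) = @_;
    my @found;
    for my $child (@{ $CHILDREN{$from} || [] }) {
        push @found, [$child] if $child eq $target;
        push @found, map { [$child, @$_] } paths_between($child, $target);
    }
    return @found;
}

# Fill the gaps in an abbreviated path; die if the completion is not unique.
sub complete_path {
    my ($expr) = @_;
    # '/' and '//' are treated alike, and a leading '/' is ignored.
    my @steps = grep { length } split m{/+}, $expr;
    my @full  = (shift @steps);
    for my $step (@steps) {
        my @routes = paths_between($full[-1], $step);
        # Assumption of this sketch: a direct child wins over longer routes.
        my @direct = grep { @$_ == 1 } @routes;
        @routes = @direct if @direct;
        die "no path from $full[-1] to $step\n"        if @routes == 0;
        die "ambiguous path from $full[-1] to $step\n" if @routes > 1;
        push @full, @{ $routes[0] };
    }
    return join '/', @full;
}

print complete_path($_), "\n"
    for '/topic/occurrence/resourceData', '/topic//resourceData', 'topic/resourceData';
```

With this table all three expressions expand to topic/occurrence/resourceData, whereas association/topicRef is rejected as ambiguous because a topicRef can be reached through instanceOf, through member, and through member/roleSpec.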
Other limitations have to do with the fact that XTMPath is used within a programming environment. So, for instance, predicates of the form position() = 2 are not implemented, as it is assumed that the programming language already provides list processing features.
Working with XTM::Path, one soon realizes that it adds some overhead and that it should be used with care. Among other tricks mentioned in the man page, one is to use variables, similar to the placeholders that SQL APIs offer.
Let us assume that we are looking for topics having a particular base name:
foreach my $name (...list of names...) {
    my @bn = $xtmp->find (qq{topic[baseNameString = "$name"]}, $tm);
    ...
}
For every iteration of the loop the XTMPath processor will first parse the expression, which contains a (possibly different) base name every time. Then it will execute the query.
As parsing an XTMPath expression is an expensive operation, we can introduce a parameterized version of the expression which remains the same throughout all iterations:
foreach my $name (...list of names...) {
    my @bn = $xtmp->find ('topic[baseNameString = ?n]',
                          $tm,
                          { n => $name });
    ...
}
With the variable n in place, the expression is parsed only the first time it is used and then remains cached in the $xtmp object. In a separate hash we provide the mapping between variable names and their values for one particular query.
Simple tests show that this can yield significant speedups when dealing with many different names, as above.
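The effect of the placeholder can be illustrated without XTM::Path. The following sketch assumes a cache keyed by the expression string, which is only our reading of "remains cached" and not a description of the module's internals. The names parse_expr, parsed and %cache are hypothetical, and the "parser" is a trivial stand-in that merely counts how often it is called.

```perl
use strict;
use warnings;

my %cache;              # expression string => parsed form
my $parse_count = 0;    # how often the (pretend) expensive parser ran

# Stand-in for an expensive parser: it merely splits the expression.
sub parse_expr {
    my ($expr) = @_;
    $parse_count++;
    return [ split m{/}, $expr ];
}

# Look up the parsed form, parsing only when the string has not been seen.
sub parsed {
    my ($expr) = @_;
    return $cache{$expr} //= parse_expr($expr);
}

my @names = ('Victoria Bitter', 'BudWeiser', 'XXXX');

# Interpolated names: every iteration yields a new string to parse.
parsed(qq{topic[baseNameString = "$_"]}) for @names;
my $interpolated = $parse_count;

# Placeholder: the string is identical each time, so it is parsed once.
$parse_count = 0;
parsed('topic[baseNameString = ?n]') for @names;
my $parameterized = $parse_count;

print "interpolated: $interpolated parses, parameterized: $parameterized parse\n";
```

For the three names the interpolated form triggers three parses and the parameterized form one; the gap grows linearly with the number of distinct names.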
XTMPath was also conceived as a simple means of creating Topic Map data structures. To create a new topic with the type beer-brand, all that has to be written is
my $XXXX = $xtmp->create (
'/topic[instanceOf/topicRef/@href = "#beer-brand"]');
The created topic would have been defined in AsTMa= as
t-00000001 (beer-brand)
with some automatically generated topic id.
As it is possible to specify a number of predicates in an XTMPath expression, we can also provide more information for the topic:
my $XXXX = $xtmp->create (
'/topic
[@id = "xxxx"]
[instanceOf/topicRef/@href = "#beer-brand"]
[baseNameString = "XXXX"]
[occurrence/resourceRef/@href = "http://www.xxxx.com.au/home.asp"]
');
This would be equivalent to
xxxx (beer-brand)
bn: XXXX
oc: http://www.xxxx.com.au/home.asp
Again, the XTM::Path module is smart enough to make educated guesses when interpolating paths that contain gaps. /topic/instanceOf/@href is equivalent to /topic/instanceOf/topicRef/@href because topicRef is the only element below /topic/instanceOf that has an attribute href.
The predicates are all interpreted in the current search/create context. This can be rather useful when it comes to creating more complex structures like associations:
my $a = $xtmp->create(
'/association
[instanceOf/topicRef/@href = "#is-owned-by"]
[member
[roleSpec/topicRef/@href = "#owner"]
[topicRef/@href = "#foster-group"]
]
[member
[roleSpec/topicRef/@href = "#property"]
[topicRef/@href = "#VB"]
]
');
The association in object $a is the same as one created from the following XTM:
<association>
  <instanceOf>
    <topicRef xlink:href="#is-owned-by"/>
  </instanceOf>
  <member>
    <roleSpec>
      <topicRef xlink:href="#owner"/>
    </roleSpec>
    <topicRef xlink:href="#foster-group"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="#property"/>
    </roleSpec>
    <topicRef xlink:href="#VB"/>
  </member>
</association>
Once the association has been handled and an association object is created, all predicates are interpreted on that hierarchical level. The brackets around instanceOf/topicRef/@href create the association type without actually changing the context. The following members are created on the same level.
As one can see from the creation of the members, predicates can also be nested. Within one member we move the context one step down, underneath the member element. Each member element is created first; the role and player for that member are then generated inside it.
The notable part is that predicates can contain other XTMPath expressions, including those with further predicates. Here we follow the principle that an XTMPath expression should itself evaluate to true when applied to the object it generated.
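The principle that a creating expression must hold for the created object can be sketched with plain nested hashes. This is not how XTM::Path stores its data: the hash layout is an assumption made purely for illustration, set_path and get_path are hypothetical helpers, and each predicate is reduced to a fully spelled-out path with a value.

```perl
use strict;
use warnings;

# Store $value under a path such as 'instanceOf/topicRef/@href',
# creating the intermediate levels on the way down.
sub set_path {
    my ($node, $path, $value) = @_;
    my @steps = split m{/}, $path;
    my $last  = pop @steps;
    $node = ($node->{$_} //= {}) for @steps;
    $node->{$last} = $value;
    return;
}

# Fetch the value stored under a path, or undef if a step is missing.
sub get_path {
    my ($node, $path) = @_;
    for my $step (split m{/}, $path) {
        return unless ref($node) eq 'HASH' && exists $node->{$step};
        $node = $node->{$step};
    }
    return $node;
}

# Each predicate is a path/value pair, as in [path = "value"].
my @predicates = (
    [ 'instanceOf/topicRef/@href'    => '#beer-brand' ],
    [ 'baseName/baseNameString'      => 'XXXX' ],
    [ 'occurrence/resourceRef/@href' => 'http://www.xxxx.com.au/home.asp' ],
);

# "Create": build the structure the predicates describe.
my %topic;
set_path(\%topic, @$_) for @predicates;

# "Find": every predicate used for creation must now hold.
my $failed = grep { (get_path(\%topic, $_->[0]) // '') ne $_->[1] } @predicates;
print $failed ? "invariant violated\n" : "all predicates hold\n";
```

The point of the design is that one expression serves as both a constructor and a test: whatever was used to build the object can be reused unchanged to find it again.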
The XTM::Path module is, however, not limited to creating topics or associations. Any valid XTM data structure can be built. The following creates only a member, which can then be processed further:
my $member = $xtmp->create(
'/member
[roleSpec/topicRef/@href = "#owner"]
[topicRef/@href = "#foster-group"]
');
In this article we presented a programming technique for navigating Topic Map data structures using an XPath-like approach. This abstraction allows casual programmers to deal with deeply nested internal representations without having to learn all their rules. The XTMPath processor then operates on the internal data after all the necessary merging has occurred. This approach, however, assumes that the developer is familiar with the XML structure this information would have when serialized into XTM. This may be a problem in cases where other authoring notations are used.
We then showed that XTMPath can also be used to generate data structures. While convenient and certainly less error-prone than using a low-level API, the method cannot automatically provide defaults as prescribed in the XTM standard.
Although useful for navigation, XTMPath does not aspire to become a fully fledged Topic Map query language. It is meant for locating information and is adequate for medium-sized problems. It is not suitable for heavy-duty use, as the implementation carries a performance penalty. While sufficient as a proof of concept, the current implementation lacks a couple of features; for instance, only a small subset of the XPath predicate operators has been adopted. There is also no technical reason why operations like update and delete should not be covered.
Of more than academic interest, though, is how knowledge about a particular XML structure, such as that of XTM, affects the implementation of an XPath-like processor. Only this knowledge makes DWIMming possible at all.