|A semantic modelling approach to biological parameter interoperability|
Lowry, R.; Bird, L.; Haaring, P. (2007). A semantic modelling approach to biological parameter interoperability, in: Vanden Berghe, E. et al. (Ed.) Proceedings Ocean Biodiversity Informatics: International Conference on Marine Biodiversity Data Management, Hamburg, Germany 29 November to 1 December, 2004. VLIZ Special Publication, 37: pp. 121-128
In: Vanden Berghe, E. et al. (Ed.) (2007). Proceedings Ocean Biodiversity Informatics: International Conference on Marine Biodiversity Data Management, Hamburg, Germany 29 November to 1 December, 2004. VLIZ Special Publication, 37. IOC Workshop Report, 202. VI, 192 pp.
In: VLIZ Special Publication. Vlaams Instituut voor de Zee (VLIZ): Oostende. ISSN 1377-0950
|Also published as |
- Lowry, R.; Bird, L.; Haaring, P. (2004). A semantic modelling approach to biological parameter interoperability, in: Ocean Biodiversity Informatics, Hamburg, Germany: 29 November to 1 December 2004: book of abstracts. pp. 14
Modelling; Parameters; Marine
The BODC Parameter Dictionary currently contains over 16,500 terms, of which nearly 11,000 pertain to biological parameters. The Rijkswaterstaat database in the Netherlands covers over 10,000 types of measurement, most of which are either chemical or biological. A requirement to populate a metadatabase described in terms of the BODC dictionary from the Rijkswaterstaat database meant that parameter interoperability between these two information sources needed to be addressed. One technique for approaching this is manual mapping: working term by term through one of the information sources and searching for matching terms in the other. However, whilst this may be feasible for dictionaries containing tens of terms, it is totally unrealistic when the counts run into the thousands, so an alternative, automated approach was required.
Automation was initially attempted using a semantic matching tool developed at Rijkswaterstaat to offer a restricted list of BODC terms (preferably a single term) as the possible matches for each measurement. However, this met with limited success because the BODC dictionary consisted of plain-language terms that had not been written with machine processing in mind and had no constraints on either syntax or vocabulary. To appreciate the problem, consider the programming required to recognise that ‘Calanus abundance’, ‘Number of Calanus’, ‘Calanus count’ and ‘Abundance of Calanus’ essentially mean the same thing. Further, no dictionary, especially one without vocabulary constraints, is perfect, and there is a high risk that matches will be missed because of basic errors such as spelling mistakes.
The Rijkswaterstaat database is described in terms of a data model that qualifies measurements through associated attributes describing, amongst other things, what was measured and how it was measured. This is an example of a semantic model, in which an entity is described in terms of discrete items of information called semantic elements. Ideally, these elements are atomic and unambiguous, and therefore well suited to machine interpretation. It was concluded that the only way a mapping could be achieved would be to develop a model along similar lines to describe the BODC dictionary and then map the two models.
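To make the contrast between label matching and semantic-model matching concrete, here is a minimal sketch in which each term is described by a few semantic elements and the two sources are mapped by comparing those elements rather than the label text. It is an illustration under stated assumptions only: the element set (`measured_property`, `measured_object`, `method`), the example terms and the function `map_models` are all hypothetical and are not taken from the BODC dictionary, the Rijkswaterstaat data model or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen => hashable, so descriptions can be dict keys
class SemanticDescription:
    """A measurement broken into discrete semantic elements (hypothetical element set)."""
    measured_property: str         # what was measured, e.g. "abundance"
    measured_object: str           # what it was measured on, e.g. a taxon name
    method: Optional[str] = None   # how it was measured; None if unspecified


# Hypothetical plain-language dictionary terms, each annotated with its elements.
dictionary_terms: Dict[str, SemanticDescription] = {
    "Calanus abundance": SemanticDescription("abundance", "Calanus", "net haul"),
    "Abundance of Acartia": SemanticDescription("abundance", "Acartia", "net haul"),
}

# Hypothetical measurement types from the second source, described by the same elements.
database_measurements: Dict[str, SemanticDescription] = {
    "Number of Calanus": SemanticDescription("abundance", "Calanus", "net haul"),
    "Acartia count": SemanticDescription("abundance", "Acartia", "microscopy"),
}


def map_models(source: Dict[str, SemanticDescription],
               target: Dict[str, SemanticDescription]) -> Dict[str, List[str]]:
    """For each source entry, list the target terms whose semantic elements are all identical."""
    index: Dict[SemanticDescription, List[str]] = {}
    for term, description in target.items():
        index.setdefault(description, []).append(term)
    # Labels are ignored entirely; only the element values take part in the match.
    return {label: index.get(description, []) for label, description in source.items()}


mapping = map_models(database_measurements, dictionary_terms)
for label, matches in mapping.items():
    print(f"{label!r} -> {matches}")
```

Here ‘Number of Calanus’ maps to ‘Calanus abundance’ even though the two labels share only one word, while ‘Acartia count’ finds no match because its method element differs. Requiring equality on every element is a deliberately strict choice for the sketch; a practical mapping would probably also need rules for partially specified descriptions.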