Linked Open Data Overview

This resource gives an overview of the benefits of opening up and semanticizing data, as well as the challenges an institution might face in doing so.

List of abbreviations

.CSV : Comma-Separated Values
.JSON : JavaScript Object Notation
API : Application Programming Interface
CHIN : Canadian Heritage Information Network
CIDOC CRM : Conceptual Reference Model of the International Council of Museums' International Committee for Documentation
LOD : Linked Open Data
RDF : Resource Description Framework
SPARQL : SPARQL Protocol and RDF Query Language
URI : Uniform Resource Identifier

Linked Open Data: A Short Introduction

Linked Open Data (LOD) methodologies are increasingly being advocated by the heritage community to structure, disseminate, exchange and use museum data. This approach relies on two essential principles:

the data must be accessible under an open licence (i.e. having as few legal restrictions as possible, as these prevent reuse by third parties);
the data must be linked on the basis of common and predefined standards (models and vocabularies).

In short, LOD are information shared under an open licence (with greater or lesser restrictions) and linked based on principles and technologies defined by the World Wide Web Consortium (Uniform Resource Identifier (URI), Resource Description Framework (RDF), SPARQL Protocol and RDF Query Language (SPARQL) endpoint, etc.). The use of common vocabularies to properly categorize and identify the data comprising this content, together with standardized data models, makes the information more usable by various computer applications. These applications can thus cross-analyze, interpret, reuse and contextualize this information, even if it initially comes from different sources. This means that information held by multiple institutions about the same item can be gathered and used easily and for the benefit of everyone, thus fostering the emergence of new inter-institutional knowledge.
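As a rough illustration of how a shared URI lets applications combine information from different sources, the following minimal sketch represents statements as (subject, predicate, object) triples, the basic unit of RDF, and groups them by subject. It uses plain Python rather than an RDF library, and every URI, vocabulary term, institution name and value in it is a hypothetical placeholder, not real museum data.

```python
from collections import defaultdict

# Hypothetical triples published by two institutions. All URIs and values
# are invented for illustration only.
museum_a = [
    ("http://example.org/object/123", "http://example.org/vocab/title", "Birchbark canoe"),
    ("http://example.org/object/123", "http://example.org/vocab/heldBy", "Museum A"),
]
museum_b = [
    ("http://example.org/object/123", "http://example.org/vocab/material", "birchbark"),
    ("http://example.org/object/456", "http://example.org/vocab/title", "Snowshoes"),
]


def merge_by_subject(*datasets):
    """Group triples from several sources under their shared subject URI."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            # A predicate may carry several values, so each one maps to a list.
            merged[subject].setdefault(predicate, []).append(obj)
    return dict(merged)


combined = merge_by_subject(museum_a, museum_b)

# Everything known about object 123, regardless of which institution said it.
for predicate, values in combined["http://example.org/object/123"].items():
    print(predicate, "->", values)
```

Because both hypothetical institutions identify the item with the same URI, their statements fall under a single entry without any manual matching. In practice, an RDF store queried through SPARQL would play this role.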

Such sharing reduces the work involved in many facets of digital collection management by drawing on expertise and updates provided by other teams with established authority.

The use of LOD allows the burden of data management to be dispersed by:

facilitating research for practitioners;
promoting open documentation with heritage content;
contributing to the presentation and enhancement of content by heritage professionals, researchers and the public.

Several external resources can be consulted within our Zotero library.

Linked Open Data for Cultural Institutions

At the moment, the Canadian Heritage Information Network (CHIN) is developing a model intended for Canadian collections of artefacts. Its facet dedicated to people and groups is currently being developed and will be tested shortly. The Objects facet of this Collections model is intended to be able to align with linked.art’s model for art institutions.

If, as an institution, you want to semanticize your data, CHIN would be happy to collaborate with you on this matter and advise you as best we can. As a general rule, you should take the following main elements into consideration:

The use of open licences for your data: knowing that you can choose which data to make available and that different licences can be applied to different data (although an open licence is always preferable from an LOD standpoint). For example, you could decide to make all information pertaining to an object available in LOD without providing the image of that object.
The cleaning of your data: knowing that messy data can still be published but will not be as semantically sound. There are tools to semi-automate this process (OpenRefine and the Getty’s extension to it, for example). CHIN can advise you on this if need be. Keep in mind that if you want to publish rich LOD, the data cleaning process must be integrated with a semantic model that suits your needs. This will largely depend on the semantic valuation you are aiming to reach.
The development of a cultural heritage semantic model is most often based on the Conceptual Reference Model of the International Council of Museums' International Committee for Documentation (CIDOC CRM), and this will be the case with CHIN’s model. The easiest way for an institution to semanticize its data is to use a pre-existing model rather than develop one of its own. You are invited to use CHIN's model once it is available, and should you wish to use linked.art's version, CHIN will be happy to put you in contact with them.
The publication of the semanticized and enriched data does not amount to its visualisation. As a result, the development of interface(s) is the next important step in a digital data strategy that is specific to your institution, should you want to make the data available to the public online. In most cases, the model you use or develop should not be determined by your intended visual displays (interfaces). Rather, it should be selected or developed according to your needs and use cases (such as domain experts’ questions that could eventually become queries).
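To give a sense of what semi-automated data cleaning can look like, here is a minimal sketch of key-based clustering, which groups spelling variants of the same value so that a person can review them. It is a simplified approximation of the kind of clustering offered by data cleaning tools, not the implementation used by any particular tool. The function names and the sample names are assumptions chosen for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified matching key: trim, lowercase, drop ASCII
    punctuation, then sort the unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_variants(values):
    """Return only the groups of distinct values that share a key."""
    clusters = defaultdict(list)
    for value in values:
        clusters[fingerprint(value)].append(value)
    return {key: group for key, group in clusters.items() if len(set(group)) > 1}


# Hypothetical maker names as they might appear in a messy export.
names = ["Carr, Emily", "Emily Carr", " emily carr ", "Tom Thomson"]

for key, variants in cluster_variants(names).items():
    print(key, "<-", variants)
```

A sketch like this only proposes candidates. Deciding which variant is authoritative, and which URI it should point to, remains a curatorial decision.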

You will find below a list of benefits and challenges that we have identified as part of our research. Keep in mind that many of the challenges can be mitigated by using a strategic approach.

Benefits

LOD offer a number of advantages, especially when it comes to accessibility and visibility online. LOD are a set of tools and principles that can benefit heritage institutions because they can:

Increase the discoverability rate of:
- Institutions and their collections;
- Artefacts and actors (people and groups) represented in the dataset;
- Anyone who openly contributes.
Foster more nuanced data (online and offline) by:
- Generating new knowledge;
- Creating new results that original authors/owners of data were not looking for/into;
- Showing errors that might have gone unnoticed.
Contribute to greater knowledge and understanding of the data by:
- Helping disseminate new ideas more rapidly and widely, which in turn triggers new research studies and serves as an impetus for knowledge;
- Making this knowledge widely known through reuse and publication, which can be put to immediate use in teaching;
- Enabling citizen advocacy groups and researchers to analyze data, producing new and better insights.
Diminish the financial and human resources needed for day-to-day tasks by:
- Distributing the maintenance of data across the network when it comes to researching, gathering and presenting heritage data;
- Minimizing the risk of using old/outdated metadata.
Offer an opportunity to engage stakeholders as well as citizens:
- Researchers and academics might be interested in micro-data;
- Decision makers and the public might be interested in higher-level aggregates;
- More people can access information, including those who would otherwise not have access to institutions and their databases, etc.;
- Citizens and others can familiarize themselves with the collections, so that the museum's reach and societal impact can be much broader, especially as a contributor to an open, knowledgeable and creative society, given that people increasingly expect transparency from museums;
- Institutions can themselves use the datasets to further engage their own audience.
Standardize data which:
- Diminishes the risk of data loss through multiple conversions;
- Enables manipulation and analysis of data, making it more easily usable and visualizable;
- Renders heritage information more accessible to search engines.
Encourage socio-economic development by:
- Adopting transparency and accountability principles when it comes to engaging audiences;
- Making data re-usable for profit and non-profit organizations alike by giving broad access to the most recent data, which organizations can then build on;
- Offering better documentation and statistics when asking for private or public funding (or, in turn, when evaluating such proposals on the part of the public body).

Institutions that do enter the open-access arena usually do so for the following reasons:

The cost of administering rights and permission fees for artworks that are subject to copyright is comparable to or higher than that of paying fees for these works (although this is highly dependent on the collection);
As a result of the remix culture of the Internet, audiences now expect open access from museums;
Open-access principles are considered to be a mission-serving imperative of the 21st century;
Open access fosters community engagement and expands the reach and scalability of online collections.

Challenges

The value of the data catalogue is realized when people use it, so it relies more on user engagement than on the mere availability of data:
- Users should be in a position to discover the data they expect to find and be equipped to use it;
- Rigorous work might be devalued because it takes longer to produce and many more resources to promote than “noisy” content (such as a big controversy or chatter about no specific content).
The passage to LOD entails a paradigm shift when it comes to assessing and commenting on data:
- It entails acquiring new expertise or networks of advisors who are knowledgeable about LOD;
- Institutions are often fearful that they will lose their ability to sell images, hence cutting themselves off from significant revenue and financial independence (image revenues, however, are usually minimal, especially in Canada, where the market is relatively small; in addition, it is possible to open only select data, thus excluding images if necessary);
- Who is considered to have authority and knowledge over information (as opposed to data, which remains strictly under the umbrella of its host institution) might change as more information is generated;
- The decentralization of information implies subjecting data to public scrutiny and questioning the authority of institutions, especially in the case of conflicting or problematic data for sensitive datasets.
The catalogue has to be built according to who the users will be, which might involve:
- A re-evaluation of the needs of the community following a change in the data management landscape (where the users of the data will no longer be solely cataloguers);
- A need for the data to not only be structured and classified, but to also be meaningfully and consistently organized (i.e. the information not only has to be retrievable, but the path to reach it and its place within the structure must be meaningful as well);
- A transparent data production/contribution process where users expect to have access to original information, be able to scrutinize it and have a way to manipulate it themselves.
There is a risk of users misinterpreting or misrepresenting data either deliberately or through a lack of understanding:
- This might generate intense debates with no single authority to adjudicate who is knowledgeable and who is not. However, the reverse is also true, as opening up data exposes it to scrutiny by a wider set of experts that the host institution might not have known about;
- Everyone must be able to use, reuse and redistribute the data easily, but provisions to communicate with data contributors (at all stages, namely production, storage and distribution) must also be offered to users.
Opening data is generally not a priority for stakeholders:
- Maintaining, cleaning and opening data can be resource-intensive;
- Fear of criticism when it comes to problematic, incomplete or inaccurate datasets is a real concern for institutions;
- Converting an existing dataset to an LOD portal can be daunting, especially as information technologies and management systems have been developed without considering public use or the groups that are now likely to mobilize the data.

Feasibility Guidelines

In an interview with Jason Bailey, Neal Stimler suggested the following process to navigate the opening of your data (Bailey 2019: 1-2):

Perform a thorough rights assessment using relevant resources such as:
Consult with licensed legal counsel
Build tools to provide mass self-serve access to data and digital asset sets. These tools typically come in the form of:
1. A museum's collection on a website;
2. A public application programming interface (API);
3. A GitHub repository of data in the Comma-Separated Values (.CSV) and JavaScript Object Notation (.JSON) formats. Data should be offered with the same permissions and legal frameworks as associated image assets. The API serves application developers and partners, while .CSV and .JSON formatted data mainly support researchers and scholars.
Ensure open-access content is hosted in partnership with crucial aggregation platforms such as Wikidata, Wikimedia Commons and Internet Archive.
Ensure decisions are evaluated and made with respect to cultural and ethical considerations of open access in collaboration with communities and scholars.
An internal working group or project team from relevant areas across the organization should be assembled. The internal group would be directed by a project manager who leads the project vision and has ultimate decision-making authority. Partnering with allied organizations engaged with an institution’s users and working directly with Creative Commons are strongly recommended to implement best practices.
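As a minimal sketch of the self-serve data files mentioned in the process above, the following example writes the same small set of records to both .CSV and .JSON text using only Python's standard library. The field names, record values and the `to_csv` and `to_json` helpers are assumptions made for illustration. They do not reflect any particular museum's schema or publishing pipeline.

```python
import csv
import io
import json

# Hypothetical collection records. Field names and values are illustrative.
records = [
    {"id": "1", "title": "Birchbark canoe", "licence": "CC0"},
    {"id": "2", "title": "Snowshoes", "licence": "CC0"},
]


def to_csv(rows):
    """Serialize a list of flat dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records to indented JSON, keeping non-ASCII text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


csv_text = to_csv(records)
json_text = to_json(records)

print(csv_text)
print(json_text)
```

Generating both formats from a single source keeps the two files consistent with each other, and recording the licence alongside each record makes the applicable permissions explicit to whoever downloads the data.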

For more information on LOD, including an overview of best practices when publishing LOD, please refer to CHIN’s working documentation on the matter (please be advised that certain contents may only be available in French).

Selected Bibliography

Bailey, Jason. 2019. ‘Solving Art’s Data Problem - Part One, Museums’. Artnome (blog). 29 April 2019.

Data, Open Art. 2018. ‘Museums: Interactive Map with Wikidata’. Open Data Art (blog). 16 December 2018.

Edson, Michael Peter. 2019. ‘Wikimania 2019 Keynote Address’. Keynote presented at Wikimania 2019, Stockholm, SE, April 29.

Goldman, Kathryn. 2018. ‘Open Access Images of Public Domain Work’. Creative Law Center (blog). 2018.

Hyland, Bernadette, Ghislain A. Atemezing, and Boris Villazón-Terrazas. 2014. ‘Best Practices for Publishing Linked Data’. W3C Working Group Note. 9 January 2014.

Kela, Riitta. 2019. ‘Opening Collections as Open Data: Challenges and Possibilities’. In Documenting Culture: A Culture of Documentation. International Council of Museums (ICOM). Tokyo, JP.

McCarthy, Douglas. 2019. ‘Licensing Policy and Practice in Open Glam’. Medium, 30 May 2019.

Oomen, Johan, Enno Meijers, and Wilbert Helmus. 2016. ‘Network Digital Heritage: Towards A Distributed Network of Heritage Information’. International Conference on Digital Preservation (IPRES). Amsterdam, NL: Dutch Digital Heritage Network.

Open GLAM. 2020. ‘Declaration on Open Access for Cultural Heritage’. 21 January 2020.

Open Knowledge Foundation. 2012. ‘Resources’. OpenGLAM. 27 November 2012. Retrieved 12 May 2021.

Openness: Politics, Practices, Poetics (PDF format). 2017. Living Archives. Malmö, SE: Malmö University.

Sanderhoff, Merete, ed. 2014. Sharing Is Caring: Openness and Sharing in The Cultural Heritage Sector. Copenhagen, DK: Statens Museum for Kunst.

Schrier, Bill. 2014. ‘Government Open Data: Benefits, Strategies, and Use’. The Evans School Review, Alumni Perspective, 4 (1): 12–27.

Stimler, Neal, and Louise Rawlinson. 2019. ‘Where Are The Edit and Upload Buttons? Dynamic Futures for Museum Collections Online’. In MuseWeb. Boston, MA: MuseWeb 2019.

Stinson, Alex. 2018. ‘Wikidata in Collections: Building a Universal Language for Connecting GLAM Catalogs’. Medium (blog). 9 April 2018.

Vathana, Anly, and Dev Pramil Audsin. 2013. ‘An Open Analysis on Open Data’ (PDF format). Submission paper. In Open Data on the Web, 4. London, GB: W3C.

Wallace, Andrea. 2017. ‘Access and the Digital Surrogate: Openness as a Philosophy’. Presented at the National Digital Forum, Wellington, NZ, November 27.

Page details

2023-08-30
