The third dialogue of the Road to Bern via Geneva initiative, titled ‘Sharing data: Towards a data commons’, was held online on 26 May 2020. The event was organised by the Permanent Mission of Switzerland to the UN in Geneva and the Geneva Internet Platform (GIP), and was co-hosted by the European Organization for Nuclear Research (CERN) and the International Trade Centre (ITC).
In his introductory remarks, Jean-Pierre Reymond, Chargé de mission, Head of Innovation Partnerships, Permanent Mission of Switzerland in Geneva, noted that the aim of the dialogue series is to prepare the contribution of the Geneva ecosystem to the UN World Data Forum in Bern. The series covers the full data cycle (collection, protection, sharing, and use of data) and is intended to foster a community of experts and practitioners working together to implement digital co-operation. Cross-sectoral discussions among international organisations and other partners on data challenges and opportunities contribute to building the right bridges for digital co-operation.
Sharing and developing commons with clear co-operation rules means more prosperity for all. Data are no exception in this regard.
Dorothy Tembo, ITC Executive Director ad interim, introduced a new phrase: ‘intelligent data is the new data’. She highlighted an emerging data dilemma: we are surrounded by all kinds of data, yet there is a constant lack of the right kind of data for policy response. For example, there is not enough data to estimate progress towards the sustainable development goals (SDGs) or to shape environmental policies. The solution to this dilemma is data sharing: making as much data available to as many people as possible while fully observing data privacy and security rules. Sharing will enable data to realise its value, and it will reveal gaps and what is needed to fill them.
Tembo then briefly presented the work of ITC on data. The mission of ITC is to foster inclusive and sustainable development, with a particular focus on micro, small, and medium enterprises (MSMEs). ITC’s data products, such as Trade Map, Market Access Map, Export Potential Map, and Sustainability Map, enable companies and institutions to identify market opportunities and help policymakers assess national trade performance and prepare for trade negotiations.
ITC collaborates closely with international organisations on data, in particular with the WTO and UNCTAD. Together with the WTO and UNDESA, ITC developed the ePing portal, which helps SMEs track product requirements in export markets. With the WTO and UNCTAD, ITC is developing the Global Trade Helpdesk (GTH), an accessible free online portal that brings together crucial trade information from across agencies for MSMEs.
International organisations rely on national governments for data. However, collecting data at the national level is not easy and is resource-heavy. To foster data sharing in the public domain, ITC recommends:
1. Developing a set of common principles for data sharing.
2. Increasing awareness of data sharing among governments and institutions, not only for sharing primary data, but also for sharing processed data and intelligence.
3. Investing in capacity building for data collection and processing.
The private sector also needs to be engaged in more data sharing. For such collaboration, the private sector needs assurances and incentives, such as:
1. Putting in place protocols for public-private data sharing: This will provide greater assurance that whatever data companies choose to share will only be used for sound public objectives under agreed terms and conditions.
2. Ensuring full privacy protection: As private sector data can be privacy-sensitive, it should be aggregated and anonymised with clear guidelines on cleaning data before sharing.
3. Sharing data should be easy: Simple changes in procedures such as online portals can reduce the cost of sharing data for companies.
4. Providing incentives: Incentives in the form of joint products, policy development, reciprocal sharing, recognition, and honour awards could generate greater interest among businesses in sharing data more actively.
Finally, Tembo stressed that only by putting in place adequate measures for security and privacy can we ensure that the benefits of data sharing outweigh the potential risks.
Eckhard Elsen, Director for Research and Computing and member of the CERN directorate, underlined that the data the European Organization for Nuclear Research (CERN) collects needs to be preserved and kept accessible to users. CERN has operated under the principles of collaboration and openness among researchers since it was founded in 1954. Collaboration yields better results than the sum of individual contributions. In the context of the COVID-19 pandemic, if all the data collected were shared, countermeasures against the coronavirus might be accelerated. Openness starts with open source code, algorithms, and key ideas that can be used to analyse data. In this regard, the motto is ‘what has been paid by the public should be made available to the public’.
Since 2014, CERN has been implementing open access policies and offering scientific articles free of charge. It founded the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3), a global consortium of 3000 libraries, to make all scholarly output on particle physics available for free. CERN is committed to extending this to books and to whatever future media prove useful for scientific research. CERN has already piloted an Open Data Portal for testing theoretical ideas, developing algorithms, and exploring new analysis ideas. Large structured data sets benefit many applications, such as machine learning, where they make algorithms more robust and may lead to new discoveries. Elsen concluded by noting that CERN is in the middle of defining an open access and open data policy that looks at the longer-term use of data. CERN is publicly funded and wants to give back to society by making the results of its research available to interested parties: open access to research results, open data to improve algorithms, and open software that can be reused.
Session 1 – Setting the scene: Benefits and risks of sharing/not sharing data
Mr Robert Koopman (Chief Economist, World Trade Organization (WTO)) noted that all three Geneva-based trade organisations have different data efforts.
The UN Conference on Trade and Development (UNCTAD) collects trade data from all of its member states, integrates it globally, and makes it widely available to other agencies. The WTO mainly focuses on trade data, tariff data, and data on non-tariff measures related to member commitments under the agreements. Data sets at the WTO also create a foundation for members to negotiate trade commitments. The International Trade Centre (ITC) collects data from UNCTAD and the WTO, as well as other public and private data, and integrates it to make it available and usable for private-sector decision-making.
All these organisations face challenges regarding data sharing, including: the creation of an inventory of public and private data sets that might be useful to stakeholders; the sensitivity of members’ data; and the discrepancy of incentives around data collection in the private and public sectors.
Koopman noted that there is, however, opportunity to use some of the principles operating under the public sector in a way that allows the public sector to take private-sector data and generate more insights. He also noted that national statistical agencies lack funding to transition their capabilities to providing data insights. On data categories, Koopman said that statisticians and the management structure sometimes do not share the same vocabulary.
Mr Thierno Ibrahima Diop (Lead Data Scientist, Baamtu (Senegal); Co-founder, GalsenAI; Ambassador, Zindi) spoke about access to data in the African context.
The public sector in Africa does not have a lot of data, and does not have the infrastructure to hold, protect, or share data with the public. Additionally, the public sector struggles with digitalisation, human resources, and funding necessary for data engineering. The public sector can conduct research and data collection when helped by international organisations, while the private sector, such as telecom companies, does not have the will to share data.
Diop noted that developing countries should be supported in digitalising, as well as in collecting, storing, anonymising, analysing, and sharing data. The private sector should be made aware of the benefits of sharing data, and of having researchers and data scientists work on that data, trying new algorithms to gain more value from it.
Mr Diego Kuonen (Professor of Data Science, Geneva School of Economics and Management (GSEM), University of Geneva; CEO, Statoo Consulting) highlighted that there are two important moments in the lifetime of data: the creation and the usage of data by the customer or user.
Before data is shared, it is necessary to reflect on how it was created, which measuring instruments were used to collect it, how much of it is trustworthy, who can use it, and what can be done with it. He also noted that open data offers the possibility to share data, but that this is not obligatory.
Governments, international organisations, and companies tend to create data graveyards, and there is no assessment of the trustworthiness of this data. Trustworthy outputs necessitate trustworthy inputs. Creating a culture of data, both for those who create it and for those who use it, is one of the key challenges for the trustworthiness of data measuring systems.
Kuonen noted that data has strategic value. He also suggested building up a common terminology strategically in companies, from the top down. There are common elements among all the players, and the different players in the data ecosystem need to collaborate so that they can benefit from each other.
There are four important barriers to overcome in order to systematically and universally leverage data to inform public decision-making, and to realise the full potential of privately held data for the public good, noted Ms Nuria Oliver (Chief Data Scientist, Data-Pop Alliance).
These barriers are: the political and human dimensions of data sharing; technological challenges; governance and ethical barriers; and the lack of financial models that would make projects leveraging privately held data for the public good financially sustainable.
Oliver concluded by sharing the following points:
- In many cases, actionable insights derived from data may be enough, and data sharing may not be necessary.
- Participatory systems and standards, as well as oversight mechanisms, should be developed to enable data sharing across companies and sectors.
- Investments in education, capacity building, research, and outreach efforts are important.
- Incentives should be implemented, and regulations enabled, to accelerate the use of data that is in the public interest, following ethical principles.
- Government and engagement models should be promoted (e.g. through data stewards, chief ethical officers, and oversight boards).
- State and regional business-to-government data sharing groups should be established.
- Local and regional centres of excellence that leverage privately held data in the public interest should be supported and should set an example.
- Funding should be provided to address the four barriers to leveraging data and to enable sustainable models of data sharing for the public good.
Data can enable us to transition to evidence-driven societies, for and by the people, because sharing is caring.
Session moderator Ms Marion Jansen (Chief Economist, Director of Division for Market Development, ITC) concluded the session by underlining the main takeaways. ‘Sharing is caring’, but actors first need to consider which data categories they want to share, and at which level of processing. Incentives for sharing data might need to be designed by taking into account the types of challenges that various actors are facing. Actors also have to consider what they want to do with the data, so that they do not create additional data graveyards.
After the morning panel, the audience was split into breakout groups to identify concrete proposals for the UN World Data Forum.
Mr Craig Burgess, Development Cooperation Specialist, the World Health Organization (WHO), moderated the first breakout room and divided his proposal into three key issues: infrastructure, organisation, and principles.
- Infrastructure: need to identify hierarchy, data stewards, decision-making bodies, common codes, and data qualifiers.
- Organisational needs: (a) harmonising laws and policies, (b) creating memoranda of understanding (MoUs) between organisations and other actors, and (c) setting up structures to collect (including micro), store (clear duration), analyse, and use quality data while protecting citizen rights and confidentiality.
- Principles: important to go beyond security and confidentiality, and address quality, transparency, trust, equity, and ethics (both in terms of accountability to communities most in need and in terms of fair competition for access to data).
With regard to the second question, Burgess noted that stimulating public-private partnerships can leverage technical and financial resources and strengthen understanding, co-ordination, and information flows between sectors to address community-based problems. However, issues such as trust, ethics, and asymmetries between the public and private sectors need to be addressed, possibly through corporate social responsibility plans, regulation, fairer e-commerce, and ensuring that governments retain ownership of public data.
Mr Yannick Heiniger, Partnerships Manager, the International Committee of the Red Cross (ICRC), referred to a point that cut across his breakout session: the need to truly engage with intermediary and/or neutral actors, who could:
- Explore what data sharing means for different organisations (that have different mandates),
- Generate indicators needed to start data sharing,
- Build and compile evidence and case studies that illustrate the balance between data sharing and data protection.
He also stressed that organisations need to equip their leadership and staff with relevant digital skills and collaborate with universities and other educational institutions to develop and provide courses on digital literacy. According to Heiniger, data education can be divided into three levels:
- Building capacity on basic digital hygiene and basic data knowledge.
- Creating interdependencies between data, technologies, legal and organisational issues.
- Providing decision-makers with a safe space to exchange views and learn from each other.
Reflecting on the second question, Heiniger pointed to the common denominator of most of the organisations participating in the Road to Bern: the need for more examples, real stories, and more collaboration. The biggest challenge is to balance data sharing with data protection in a way that reinforces the different mandates of organisations.
Ms Alica Daly, Senior Policy Officer on Artificial Intelligence and Data, the World Intellectual Property Organization (WIPO), pointed out that increasing the capacity of international organisations to deal with data is probably the most important deliverable from the breakout discussion on the first question. She also referred to the need for clear objectives for the use of the collected data, such as providing benefits to those sharing the data and serving the public good.
With regard to the second question, she highlighted, amongst other things, the need to determine current barriers (governance, funding, skills, resources, complexity of data, perception of lack of audience for data, and lack of skilled audience for data) and to create more awareness of success stories, best practices, and advantages of data sharing.
Session 2 – How do we move forward: Win-win solutions for data sharing
Moderated by Dr Tim Smith (Head of the Collaboration, Devices, Applications Group, European Organization for Nuclear Research (CERN)), the session aimed to address the following three tensions:
- Tension between purpose-driven data sharing and ‘simple’ open data
- Tension between data equality and equity
- Tension between incentives and obligations
Dr Zeynep Engin (Senior Research Associate, University College London (UCL)) brought a human-rights perspective into the conversation by emphasising that data connects to real people and real issues. When thinking of data sharing, we need to take into consideration individuals, i.e., data subjects and their privacy.
According to Engin, the discussion around data sharing should be taken to a higher level and include issues of human rights, and democratic values and principles.
Mr John Wilbanks (Chief Commons Officer, Sage Bionetworks) noted that we should not confuse open data with data commons, and that we should define a data commons spectrum, and avoid framing data as either open or closed. To that end, he referred to the current COVID-19 crisis and the inability to use open data due to privacy concerns. To address the crisis, his team opted for a governed data commons hosted in a central repository that cannot be downloaded or redistributed, but is accessible to trusted users (e.g. researchers, universities, healthcare institutions).
However, these governed commons, which represent ‘an estuary between open and closed’ data, pose quite a challenge for many organisations due to the lack of technical infrastructure, funding, and trained staff. To address these issues, Wilbanks advocated for transnational infrastructure support.
Mr Alex Sceberras Trigona (Former Minister of Foreign Affairs of Malta) reflected upon his involvement in promoting the protection of the critical infrastructure of the Internet as a global public good and as a commons, and referred to data as part of this critical infrastructure. He also stressed the need for an architecture which would allow the creation of commons categories, and would be structured around topics such as purpose, persons, time, place, and the like.
In the domestic sphere, he advocated for a regulator that would operate the shared data. Such a regulator, however, would be hard to establish at the international level.
Ms Veronica Cretu (Governance Lead, Innovating Governance Association Austria) shared her experience in supporting governments, especially in Eastern Europe, in releasing open government data. The idea behind the initiative was that the more data the government releases, the more transparent it will be, and the more trust it will generate among citizens.
She also referred to considerable tension and resistance among government officials to opening up, stemming mostly from their lack of understanding of what constitutes data. Cretu has noticed a gradual increase in awareness and understanding among governments on the one hand, and a push from the citizenry, the private sector, and civil society, which demanded more data, on the other.
She also noted that, while governments across the globe are increasingly adopting data regulations and frameworks, the results are still quite modest. According to her, investing in infrastructure is as important as having appropriate policies and frameworks.
Ms Mitchell Baker (Executive Chairwoman and CEO, Mozilla Foundation and Mozilla Corporation) reflected on the Common Voice project launched by Mozilla, which aims to build a data library of voices, a commons that would particularly help those areas where not everyone can read or where reading requires education that is unavailable.
Baker also pointed to the immense value of open source data, or the open web as they call it, but noted that this was an inconceivable notion, a ‘crazy idea’, just a couple of decades ago. She also highlighted the lessons learned from open source software about choosing the right licence for open data and consciously covering the commercialisation aspects. However, she warned that the idea that everything should be free of charge and completely open ‘has come back to haunt us’ and has opened the doors to economic and political surveillance.
Baker added that public goods are underfunded and called on the UN to ensure funding for open source and public good projects.