1 Introduction

The ongoing development of the Internet and of Information and Communication Technologies in general provides access to enormous and growing volumes of textual information, which is varied and written in different languages. However, IT systems, which can easily process data, cannot directly process human language. The emergence of new technologies, based on increased computational power and access to huge amounts of data in this post-digitalization era, is turning Human Language Technologies (HLT) into a real solution for overcoming language barriers. In this sense, HLT is a critical enabling technology for exploiting the available digital resources [1].

HLT applies scientific methods and information technology to the understanding of human language. Tools such as search engines, intelligent personal assistants, text classifiers and machine translation have become essential to our day-to-day work. However, there are many other situations in which natural language processing and machine translation could be key to offering citizens new advanced services and to optimizing processes in both business and public administrations. Any step towards improved understanding, synthesis, classification or machine translation of unstructured information generates value for society and could apply to all business sectors.

It is worth highlighting the reasons for adopting a Plan for Language Technologies at the national level. The language technologies market is growing rapidly, and recent reports estimate substantial growth in the global market over the coming years. Among the Top 10 Strategic Technology Trends, natural language processing appears as an essential technology under the first trend, “Artificial Intelligence and Advanced Machine Learning”. This technology is directly linked to the massive data and extensive parallel processing power that characterize this era, and it aims at the development of systems which “can learn and change future behavior, leading to the creation of more intelligent devices and programs”. Moreover, conversational systems appear as the seventh trend [2]. In addition to this technological progress, other key factors support the adoption of a plan for the advancement of language technologies in Spain. These factors can be explained within three contexts: the national, the European and the international.

At the Spanish national level, both the Digital Agenda for Spain and the Spanish Strategy for Science, Technology and Innovation establish the development of “Digital Economy and Society” as one of the general challenges that require the greatest efforts in Scientific and Technological Research, Development and Innovation (R&D&I) [3]; they also highlight the potential of the Information and Communication Technologies (ICT) sector as one of Spain’s strengths for leading scientific, technological and business development. This industry is identified as a Strategic Innovative Area [3]. Furthermore, the Autonomous Regions of Spain also highlight the potential of ICT as a driving force for the economy in their Research and Innovation Strategies for Smart Specialization (RIS3) [4].

The State Secretariat for Information Society and the Digital Agenda (hereinafter SESIAD, its Spanish acronym) is the body responsible for implementing and coordinating the Digital Agenda as part of the Strategic Action Plan for the Digital Economy and Society of Spain. In this respect, SESIAD has adopted the Plan for Language Technologies because integrating natural language processing and machine translation could improve services and processes in terms of both quality and quantity.

At the European Union level, there are 24 official languages and more than 60 national and regional minority languages [5]. Spanish is one of the 24 official languages, while the co-official languages of Spain, namely Catalan, Galician and Euskera (Basque), are among the more than 60 national and regional minority languages. Language diversity is one of the greatest cultural assets; however, it is also considered a barrier that needs to be overcome, especially when new initiatives are taken at the European level, such as the establishment of a Digital Single Market (DSM). The HLT industry could play a crucial role here, since language technology components are embedded in many digital products [6, 9].

However, the development of applications for a specific language and, in many cases, for a particular area of knowledge depends on the availability of technology and resources for that language in the relevant field of expertise.

In the case of Spain, the availability of such resources for Spanish is similar to that for German or French, although to a lesser extent and with some considerable gaps, despite the number of Spanish speakers being much higher. For the co-official languages of Spain, the level is lower. The cost of these resources is substantial and cannot be borne by small- and medium-sized enterprises (SMEs).

To ensure that applications are available in Spanish and the co-official languages of Spain, a comprehensive plan for fostering HLT should consider measures such as:

  • Improving the quantity, quality, variety and availability of the supporting language resources and tools. In this respect, language resources could be built from the vast amount of information generated by the public sector, within the framework of the Re-use of Public Sector Information (RPSI) policy [7].

  • Expanding training in these technologies to ICT professionals in the private and public sectors.

  • Introducing Natural Language Processing (NLP) and Machine Translation (MT) technologies into the content managed by the Public Administration. Introducing these technologies in sectors such as Health or Justice could significantly improve public services, both quantitatively and qualitatively.

At the international level, Spanish is the world’s second most widely spoken language after Mandarin Chinese, based on the number of native speakers, and third in terms of total number of speakers, after English [8]. Thus, the Spanish language’s capacity for internationalization is huge, as nine of every ten speakers are located outside Spain [8]. In addition, Spanish is estimated to be the second most widely used language in business transactions globally, due to growth in the Latin American market. Sharing a common language represents an opportunity to strengthen ties with the Ibero-American community and to reach a wider market for the HLT industry.

The Spanish Language Technologies Plan, launched by the State Secretariat for Information Society and Digital Agenda (SESIAD), is a five-year plan framed within the Digital Agenda for Spain. Its geographical and institutional scope covers the various Autonomous Regions and the three co-official languages of Spain: Catalan, Galician and Euskera (Basque).

Given the multidisciplinary nature of language technology, the Plan is inter-institutional: it promotes language technologies by coordinating all the actions of Spain’s Central Administration, in conjunction with the authorities of Spain’s Autonomous Regions.

The Plan is structured in five pillars with different measures under each pillar.

2 HLT in Spain

Following the steps taken at the ministerial level, the Committee of Experts was asked to prepare a report analysing the HLT situation in Spain, together with a needs assessment within the Spanish context. The report pointed out the strengths, weaknesses, opportunities and threats, as shown in Table 1.

Table 1. SWOT Analysis of HLT in Spain.

From the above analysis, the following conclusions can be drawn:

  • The language technology sector is a cross-cutting emerging sector, linked to innovation, with the capacity to drive growth, competitiveness and high quality employment.

  • Development in HLT is unstoppable and Spain should seize this opportunity.

Spain has the means, but the country’s Central Administration must promote and coordinate initiatives in conjunction with the authorities of Spain’s Autonomous Regions and with the countries of Ibero-America, to make the most of this opportunity.

3 Objectives and Implementation

The general objective of the Spanish Language Technologies Plan is to promote HLT in Spanish and Spain’s co-official languages. Three specific objectives are identified:

  1. Development of Linguistic Infrastructure

  2. Promoting HLT Industry

  3. Counting on Public Administration as a Promoter of HLT Industry

The breakdown into activities is structured within an implementation plan that guarantees coordination of efforts, in order to find possible synergies and avoid overlap. The Implementation Plan is structured in five pillars, with a set of measures identified under each pillar. The five pillars are the following:

  1. A governance pillar to ensure the management and follow-up of the activities.

  2. Three core technical pillars corresponding to the three specific objectives.

  3. A fifth pillar for the implementation of a set of flagship projects.

3.1 Governance Pillar

This pillar operates horizontally throughout the lifetime of the Plan to ensure coordination, complementarity, collaboration and mutual assistance among the parties involved at the different levels of the Spanish Administration: national, regional and local. Compatibility and complementarity with European strategies and initiatives is also an important dimension of the governance pillar [1]. The pillar covers the operational planning and the periodic evaluation of the Plan.

Two established committees are responsible for the governance of the Plan: the Steering Committee and the Experts’ Committee. The first is a decision-making body focusing on managerial aspects, while the second acts as a liaison between the Steering Committee and the stakeholders, providing technical advice to the decision makers. Both implement governance through three measures:

  • Operational Planning

  • Evaluation and Follow-up

  • Coordination with Public Administrations

3.2 Development of Linguistic Infrastructure

This is the first of the three pillars that map directly to the specific objectives of the Plan. It focuses on building a robust infrastructure for Language Technologies, including both language resources and processing tools for Spanish and the co-official languages. It is important to adopt an integrative approach which avoids overlap and seeks coordination of efforts and synergies between stakeholders. The following measures are planned to develop the Linguistic Infrastructure:

  1. Design a plan for the development of linguistic infrastructure after creating an inventory of the currently available linguistic infrastructures, classified and assessed according to their quantity, quality and availability.

  2. The plan should aim to reduce the gap between the linguistic infrastructure for NLP and MT in Spanish and the co-official languages of Spain and that available for English. In addition, campaigns should be organized for the assessment and evaluation of language resources.

  3. Select technical interoperability standards, appropriate licence policies and mechanisms for protecting personal data in the generation of language resources.

  4. Purchase or develop common tools for generating and evaluating linguistic infrastructures.

  5. Facilitate access to available high-quality linguistic infrastructure in Spanish and the co-official languages of Spain, free of charge or at a low cost (at least for innovative SMEs, the research sector and the public administrations).

3.3 Promoting HLT Industry

The report prepared by the Committee of Experts in NLP and MT in Spain shed light on the need to boost the HLT industry, as Public Administrations and many industry sectors are not aware of its potential or of the useful role it could play in a broad range of products and services. To face this challenge, it is necessary to increase the visibility of the HLT industry both locally and internationally.

HLT Visibility. At the local level, the HLT industry should be supported so that it reaches a profile high enough to attract talented specialists to Spain and to compete internationally.

This support should include mechanisms that encourage the transfer of knowledge from academia to industry. Furthermore, it should include the training of researchers and developers in the field as a necessary step to create a critical mass of HLT specialists in Spain.

For the implementation, the following measures are planned:

  1. Design a plan to raise the industry’s profile and improve transfer from research to industry.

  2. Train professionals and specialists in the HLT field through coordinated actions aiming at improving the profile of HLT providers. Among these actions, the following are planned:

    • including specific courses on language technology in university programmes

    • creating online training (MOOCs)

    • organizing competitions such as hackathons

    • supporting industrial doctorate and master’s programmes

    • offering specialised grants

  3. Enhance visibility through the following actions:

    • organizing training seminars for SMEs and professionals

    • organizing conferences, seminars and forums

    • participating in national and international trade fairs

    • promoting cloud computing (SaaS)

Internationalization of HLT Sector. At the international level, support should be provided to Spanish companies in the HLT sector to enable them to reach international markets, especially the Ibero-American and North American markets, among other emerging markets. For corporations that are already well established in the Ibero-American market, a different kind of support should be given to consolidate their competitiveness and their position in this market.

In addition, it is indispensable to strengthen cooperation with Ibero-American counterparts and to seek their support and active involvement, so as to join efforts to lead the HLT industry for the Spanish language.

A set of measures is identified for implementing HLT internationalization:

  1. Conducting a study on the state of the art of the internationalisation of Spanish companies in the HLT industry, especially in the countries with the biggest markets for internationalisation.

  2. Designing an internationalisation plan.

  3. Cooperation with Ibero-America through participation in key events such as the Ibero-American Summit and through collaboration with the Secretariat-General for Ibero-America (SEGIB), in addition to other Ibero-American institutions.

  4. Coordination with the Spanish Agency for International Cooperation and Development (AECID), the Network of Science Counsellors Abroad and other existing associations of Spanish scientists abroad.

  5. Integration of HLT into the areas that are currently being funded within the international dimension of the Strategic Action Plan for the Economy and Digital Society.

  6. Including the HLT industry among the investment opportunities in Spain under the ICT sector of the “Invest in Spain Programme”, which aims to attract foreign investment to Spain.

  7. Including HLT in the agreements and memoranda of understanding (MoU) signed in the future with Ibero-American countries (or with countries from other regions).

  8. Promoting the development of linguistic infrastructure and the availability of open public information on variants of Spanish.

  9. Identifying trade fairs, conferences or other events for marketing the products and projects being carried out by Spanish companies in the HLT industry. These activities could be carried out in collaboration with entities such as ICEX, the Spanish public entity responsible for promoting exports, investment and internationalization for Spanish industry.

  10. Studying the possibility of providing grants to incubators or accelerators, or of presenting twinning projects between small and large companies with an international dimension.

3.4 Public Administration as Promoter of HLT Industry

This pillar can be viewed from two perspectives: the Public Administration as a user of NLP and MT on the one hand, and the Public Administration as a provider of resources for NLP and MT on the other.

Creating Shared Platforms Within Public Administrations for HLT. Considering the Public Administration as a user, HLT components could be integrated in its processes and systems. This integration would apply the recommendations of the Commission for the Reform of the Public Administrations (CORA) [10] as public services could be improved significantly in terms of quality, capacity, efficiency and cost.

A shared platform for resources saves costs and avoids duplication of effort, thus making procedures more efficient. Moreover, HLT could provide the Public Administration with useful insights through data analysis. The suggested platform will be based on re-usable and interoperable components, preferably with unrestrictive, open source licences.

A set of measures is identified to implement this line:

  1. Design a development plan for natural language processing and machine translation platforms in the public administrations.

  2. Establish a clear organisational and financial structure to ensure the sustainability of the platform beyond the lifetime of the Plan.

  3. Develop a common natural language processing and machine translation platform for the public administrations, with the following essential requirements:

    • Facilitate the launching of advanced services based on natural language processing and machine translation in the national and regional administrations.

    • Develop a scalable infrastructure based on components for the parallel processing of large corpora of documents.

    • Guarantee confidentiality of public services.

    • Add different components and language resources to the flow of language processing, with different licensing models and processing methods.

    • Introduce tools for anonymization, editing, post-editing of machine translations, etc.

    • Facilitate access to general-purpose language resources and field-specific resources (mainly the specialized resources necessary for the development of the flagship projects proposed in the fifth pillar).

    • Ensure exploitation and standardization of the language resources generated under the RPSI policy.

    • Enable different models of implementation and distribution (embedded, local cluster, remote, and implementation at supercomputing centres).
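The component-based requirements above (a processing flow assembled from components with different licences, anonymization tools, and parallel processing of a corpus) can be illustrated with a minimal Python sketch. It is not the platform foreseen by the Plan: the `Component` structure, the component names, the licence labels and the two regular expressions are all hypothetical, chosen only to show the idea.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Component:
    """One step in the processing flow (hypothetical structure)."""
    name: str
    licence: str                  # licence label attached to the component
    run: Callable[[str], str]     # takes a text, returns the processed text


# Illustrative patterns only: real anonymization needs much more than regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ID_NUMBER = re.compile(r"\b\d{8}[A-Za-z]\b")  # assumed shape: 8 digits + letter


def anonymize(text: str) -> str:
    """Mask e-mail addresses and ID-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return ID_NUMBER.sub("[ID]", text)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def process(doc: str, flow: List[Component]) -> str:
    """Apply every component of the flow to one document, in order."""
    for component in flow:
        doc = component.run(doc)
    return doc


def process_corpus(docs: Iterable[str], flow: List[Component],
                   workers: int = 4) -> List[str]:
    """Process documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: process(d, flow), docs))


# A flow mixing components under different (hypothetical) licences.
FLOW = [
    Component("anonymizer", "Apache-2.0", anonymize),
    Component("whitespace-normalizer", "MIT", normalize_whitespace),
]

if __name__ == "__main__":
    corpus = ["Contacto:  ana@example.org,  DNI 12345678Z.",
              "Sin datos   personales."]
    for line in process_corpus(corpus, FLOW):
        print(line)
```

Keeping each step behind the same text-in, text-out interface is what lets components be added, replaced or re-licensed without touching the rest of the flow; a thread pool is used here only for brevity, and a cluster or supercomputing deployment would need a different executor.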

Linguistic Resources Generated from Public Administrations Within the Policy for Re-use of Public Sector Information. This line considers the Public Administration as a provider of resources for HLT. It is based on the assumption that the huge amounts of data and information generated by the Public Administration could be used as valuable linguistic resources.

Nevertheless, converting these resources into freely available linguistic resources (or available at a low cost) requires a set of measures for the development of tools and the adoption of technical interoperability standards, appropriate licence policy and mechanisms for protecting personal data. These measures are detailed as follows:

  1. Apply the recommendations of the “Re-use of Public Sector Information” (RPSI) [7] policy through:

    • Introducing the concept of linguistic linked open data in RPSI.

    • Promoting this concept within the public administrations.

    • Introducing the concept of linguistic linked open data at international forums and events such as the International Open Data Conference (IODC), in collaboration with Ibero-America, to raise the profile of the linguistic linked open data policy.

  2. Identify corpora of public-sector information that could be converted into language resources.

  3. Select technical interoperability standards, open-licence policies and personal data protection mechanisms.

  4. Facilitate access to the common tools necessary for generating and exploiting these language resources (e.g. anonymisers, text alignment, processing flows) on the natural language processing platform of the public administrations foreseen in this Plan.

  5. Develop a catalogue of these open language resources within the open data portal, with an advanced user experience.

3.5 Flagship HLT Projects in Public Administration

This pillar focuses on developing a set of flagship HLT projects within strategic sectors of the Public Administration. These projects should address well-identified problems in which the integration of HLT would have a high impact, on a wide scale, on the services or products offered by the public administration. Thus, they require vertical and horizontal scalability to process large amounts of information and to serve many users at the same time. Implementing these projects successfully would create a high social impact and, hence, turn them into a reference model for other public administrations. It would also demonstrate the latent potential of HLT when integrated into public services (Fig. 1).

The Plan contemplates strategic sectors of high impact such as Health, Justice, Education and Tourism. However, more sectors might be considered during the lifetime of the Plan based on the following criteria:

  • Commitment of the competent bodies to ensure leadership by those who understand the problem well and are competent to solve it.

  • Clear and demonstrated benefits of HLT.

  • High economic and social impact.

  • Development of the entire value chain.

  • Generation of re-usable resources to avoid dependency on certain exclusive solutions and technologies which prevent resource portability.

  • Synergies with the other measures in the Plan for the Advancement of Language Technology and, in particular, with the generation of language resources and with the public administrations’ natural language processing and machine translation platform.

  • Transferability to future projects.

Fig. 1. Structure of the Plan

4 Concluding Remarks

Since the official launch of the Plan in October 2015, implementation has been in progress according to the activities of the different pillars. Steps have been taken to join efforts and bring together the different HLT actors from public administration, academia and industry. The activities and initiatives undertaken are available on the official webpage of the Plan: http://www.agendadigital.gob.es/tecnologias-lenguaje/Paginas/plan-impulso-tecnologias-lenguaje.aspx.