1 Introduction

In recent years, artificial intelligence models have achieved good results in various visual computing tasks [1, 2]. In this context, the concept of digital cultural heritage (CH) twin has been proposed. It transposes the concept of digital twin, well known in the field of artificial intelligence, to the CH domain, with the aim of defining a complete digital modeling and characterization of every CH item (Footnote 1). The digital representation of any CH item should then be characterized by two components: the first concerns the physical description of the item, and the second, the digital CH twin, is used to represent the intangible (immaterial) and emotional messages transmitted by the item.

The work described in this paper should be understood mainly as a contribution towards a digital representation of CH entities of any kind that is better and more complete than current standards allow. Up to now, as also shown in the related work section, this digital representation has mainly taken into account the physical aspects of CH items, where the term physical denotes all the measurable or easily quantifiable properties currently used to characterize and identify these items, i.e. dimensions, support, execution technique, style, name of the artist, information about the owner, places and collections to which the item belongs, etc. The immaterial/conceptual features proper to CH entities concern fundamental aspects of the aesthetic fruition processes, like emotions, feelings, sentiments, evocations, associations, cultural suggestions, temporal and historical recollections, etc., which are normally totally neglected in digitization operations. Even the simple, factual description of the inner meaning expressed by these entities, which is particularly important because it represents the core of the emotional message they transmit to the user/observer, is normally expressed in a particularly sketchy way. Note that the European Commission (EC) seems to be aware of this state of affairs: see, e.g. a 2020 EC Call for the implementation of advanced digitization operations, where the Commission asked for the digital representation of immaterial aspects like the socio-historical and cultural context and the temporal evolution, with, apparently, few concrete results. Accordingly, the concept of digital CH twin has been proposed to take correctly (and simultaneously) into account the two dimensions, physical and immaterial/conceptual.

The present work conforms to the concept of digital CH twin by emphasizing, in particular, the importance of the immaterial and emotional aspects of CH items. In spite of the different works proposed in the literature for pattern extraction and representation from a visual computing perspective [7], and of the recent focus of the computer vision research community on deep learning-based problems with large-scale training datasets [8, 9], this research direction is still little explored. Accordingly, there seems to be no way of formalizing these emotional aspects in a computer-usable manner that can be considered sufficiently exhaustive (and interoperable) for the purpose. To overcome this limitation, the Narrative Knowledge Representation Language (NKRL) [10] has been adopted for modeling the digital CH twin, due to its ability to represent complex situations and events, behaviours, attitudes, etc. in a simple but rigorous and efficient way; see also our previous work in this context described in [11]. NKRL has been expressly created, thanks mainly to several European projects, to formalize and manage as accurately as possible those real-world, dynamically characterized and highly ubiquitous entities that are designated as narratives, see [12, 13]. In an operational context, a narrative is seen as a coherent set (i.e. its components are logically and chronologically related), possibly of cardinality 1, of spatio-temporally constrained elementary events describing the activities, states, experiences, behaviours, etc. of the (human and non-human) entities involved in the global narrative. In this general framework, a recent experiment has been conducted in the context of a project regarding an advanced digital representation of the Mona Lisa (La Gioconda) painting by Leonardo da Vinci, considering in particular the hidden painting that lies beneath Mona Lisa’s portrait on the same poplar panel. In this experiment, NKRL [10] has been selected for digitally modeling the whole of the Mona Lisa’s digital CH twin, with particular attention to the aspects related to the hidden painting narrative.

The reasons for choosing NKRL to implement our digital CH twins, as will appear clearly in the next sections, are mainly linked to its greater expressiveness with respect to the usual tools in the Semantic Web (SW) style. The lack of expressiveness of these tools is well known within the SW community itself, see, e.g. [14] and, from a more technical point of view, [15]. This problem derives mainly from the systematic adoption, by the SW community, of the so-called binary model as the main knowledge representation principle. In this approach, as its name indicates, the number of arguments that can be associated with a given predicate/property is limited to two: the general format of the “triples” that represent the basic structures on which the whole SW edifice rests can then be expressed as simple relationships of the type individual1-property-individual2/value. The limitations of this approach quickly become apparent as soon as we must represent everyday situations involving several entities, linked by multiple relationships associated with strong spatio-temporal constraints. NKRL’s improved expressiveness depends largely on its adoption of an n-ary Knowledge Representation approach instead of the standard binary one, see Sect. 3 below for the details; this approach is essential for denoting difficult notions such as the emotions, feelings, sentiments, etc. that must be represented inside our digital CH twins. As for the contributions of the work described in this paper to the progress of NKRL with respect to [10], the most important is, undoubtedly, a new, additional demonstration of its general (and well-recognized) richness and flexibility. This can be easily verified by noticing NKRL’s capacity, largely displayed in this paper, to take correctly into account the Iconographic Narratives, which are not considered and analyzed in [10]: a component of the Cultural Heritage domain that is very important (from both a quantitative and a qualitative/cultural point of view) and, at the same time, very difficult to deal with in Knowledge Representation practice.

The following main points summarize the contributions of this paper.

  • We introduce a digital CH twin which includes not only the physical, quantifiable properties of the CH items, as in the previous literature, but also their immaterial/conceptual features, i.e. emotions, feelings, sentiments, evocations, associations, cultural suggestions, temporal and historical recollections, etc.;

  • Through NKRL, we adopt an n-ary Knowledge Representation approach instead of the standard binary one, which is essential to model the emotions, feelings, sentiments, etc. that must be represented inside the digital CH twin;

  • The work introduced in [10] is extended in two ways: (i) we apply NKRL to the CH domain, and (ii) we study NKRL’s capacity to take into consideration the Iconographic Narratives, a component of the CH domain that is very important and very difficult to deal with in Knowledge Representation practice.

The paper is organized as follows. Section 2 provides a general background on the CH digitization procedures present in the literature. Section 3 describes preliminary concepts about digital modeling through NKRL. Section 4 presents the data and the experimental setting adopted for modeling the hidden painting’s narrative with NKRL. Section 5 describes the results obtained from the experiment. Section 6 discusses these results. Finally, Sect. 7 draws conclusions about the proposed approach and outlines future work directions.

2 Related works

CH digitization procedures have a long tradition that goes back to the sixties. Initially, they were limited to elementary descriptions of the CH items that could be likened, in practice, to standard annotations created using keywords (metadata) extracted from thesauri such as the Art and Architecture Thesaurus (AAT, http://www.getty.edu/research/tools/vocabularies/aat/index.html), ICONCLASS (http://www.iconclass.nl) or the Union List of Artist Names (ULAN, http://www.getty.edu/research/tools/vocabularies/ulan/index.html). They were thus typically focused on the use, mainly for documentary/information retrieval purposes, of the physical aspects of the digital CH twins introduced above. The use of (relatively simple) data models that were very popular in the eighties and nineties, such as the original Dublin Core (http://dublincore.org/) proposal, defined as pidgin metadata in [16], or of more sophisticated models like the VRA (Visual Resources Association) Core 4 XML Schema (https://www.loc.gov/standards/vracore/schemas.html), has contributed to improving the efficacy and generality of these annotation procedures. The descriptions of iconographic items obtained through this kind of process correspond then (with reference, e.g. to the VRA model) to unstructured sequences of binary entities of the property-value type (like “name-Peter Paul Rubens” or “culture-Flemish”) linked to simple XML attributes. The conversion of several of these models into RDF-compatible tools under the influence of the Semantic Web initiative, see the DCMI (Dublin Core Metadata Initiative) Abstract Model [17] or the RDF(S) version of the CIDOC CRM tool (http://www.cidoc-crm.org/rdfs/5.0.4/cidoc-crm), has further increased their interoperability and standardization potential. CIDOC CRM (Conceptual Reference Model), see [18] for the last version (6.2.6), is a well-known and powerful tool that aims at providing definitions and a formal, ontology-oriented structure for describing the main implicit and explicit concepts (99 in the last version) and relationships (191) used in CH documentation. Since 9/12/2006, it has been the official standard ISO 21127:2006.

If the adoption of a Semantic Web (RDF) perspective has undoubtedly produced important beneficial effects with respect to the digital encoding of the physical properties of iconographic items, it does not seem to have brought real progress in the search for concrete solutions for representing the second, immaterial side of the digital CH twins. A canonical and well-known example in this context is the RDF-based description of Claude Monet’s Garden at Sainte-Adresse painting included in the Image Annotation on the Semantic Web W3C Incubator Group Report [19]. This description includes an impressive amount of documentary/bibliographic/descriptive/physical details. However, the description of the actual content/deep meaning of the painting and of the message it could transmit to the observer is reduced to three extremely sketchy statements, e.g. <vra:subject>Adolph Monet (artist’s father)</vra:subject>, recording the presence in the picture of three persons (three relatives of Claude Monet). No information is given about the real semantic content of the painting, for example, the way the figures are seated in front of the sea, their mutual relationships, their attitude towards the peaceful and bright landscape, the impressive number of flowers, etc. The Incubator Group Report goes back to 2007, and we could suppose that the state of the art in the formal description of iconographic items described in this report is now largely outdated. This is not always the case. We can see, for example, in the very complex and convoluted formalization of the Mona Lisa painting carried out in 2013 in a Europeana context [20] using the RDF-oriented Europeana Data Model (EDM), that the description of the semantic content of this painting is repeatedly reduced to flat statements in the style of “the subject of the painting is a woman” ([20]:19).

A number of proposals for specialized systems for the description/management of iconographic items, based on extensions of CIDOC CRM, have been advanced in recent years. In [21], for example, the authors suggest expanding CIDOC CRM by means of the Situations & Descriptions (S&D) module of the DOLCE Semantic Web tool. In this context, CIDOC CRM should be used to represent the properties and the basic relationships of these items, while the S&D module should supply an in-depth formal description of their informative content; the iconographic domain of application concerns some temple scenes and reliefs pertaining to the Meroitic civilization in Sudan. The paper shows clearly, through some concrete examples, how CIDOC CRM, in association with other ontologies, could be employed to formalise the relationships between temples, scenes and reliefs, highlighting in particular some spatial relationships. By contrast, the contribution of S&D to the formalization of the informative content is not concretely illustrated in the paper and seems, therefore, to belong more to the domain of intent than to that of tangible results, even if the authors mention, in the Conclusion, the existence of some unspecified successful tests. Another recent proposal for extending CIDOC CRM concerns the implementation of an ontological tool called VIR (Visual Representation) [22], see also https://ncarboni.github.io/vir/. VIR expands the key entities and properties of CIDOC CRM by introducing seven new classes (e.g. Iconographical Atom) and 20 new relationships (e.g. portray/is portrayed) corresponding to the needs of the visual and art historical community. According to the authors, the resulting model is able to provide a clear distinction between denotation and signification of a CH element, permitting, in particular, the definition of diverse denotative criteria for the same representation. Taking as an example the representation of a widely known iconographic character, Saint George, the VIR-supported ontological/graphical representations allow us to follow the evolution in time of the set of attributes illustrating this character. These are reduced to horse and spear in the frescoes of the Panagia Phorbiotissa church in Cyprus, to which Castle, Princess, Lake and Dragon are added when the Saint George killing the dragon painting by Vittore Carpaccio is taken into consideration. This work is of course interesting, but it contributes little to the representation of the deep meaning of the iconographic items examined. For example, the modelling of Carpaccio’s picture is reduced to a simple binary representation where an individual, Saint George, is denoted by a set of static properties (horse, spear, castle, etc.), and the highly complex, dramatic and dynamic illustration of the fight with the dragon is totally ignored.

We can conclude this short discussion of the present state of the art in the modelling of the inner meaning of Cultural Items by mentioning a recent paper [23], which concerns the project Cultural-ON (Cultural Ontology) resulting from a collaboration between the Italian Ministry for CH and the Italian National Council for Research. The project has been recently restructured (see http://dati.beniculturali.it/cultural_on/to) to adapt it to other ontological initiatives of the Ministry, but its basic approach has not changed. In this paper we find, e.g. the formal description of a sculpture by Giambologna denoting the kidnapping of a Sabine woman by a Roman warrior, on display at the Capodimonte museum in Naples in the framework of the so-called Collezione Farnese. The description of this sculpture is restricted to purely documentary/information retrieval-oriented statements like cis: isMemberOf collection: Collezione_Farnese or cis: isInSite: Sede_del_Museo_di_Capodimonte, without any attempt to model the frightened woman, the kidnapper, the man who tries to prevent the kidnapping, the reciprocal attitudes of the different protagonists of the scene, etc.

Table 1 presents a summary of the aforementioned works, selected as the most relevant ones in the literature with the closest scope to the proposed approach, and their contribution to the literature.

Table 1 Current approaches of the literature and their contribution

Finally, note that finding a solution to the above difficulties is not made easier by the systematic choice of RDF as implementation support, because of the well-known lack of expressiveness that has affected all the Semantic Web tools from the beginning. This problem is linked to the choice of using only the quite limited binary Knowledge Representation model. In this model, a concept is simply defined through a set of properties/attributes; when the concept is instantiated into a concrete individual, each associated property can only link this individual to another individual or to a value, individual1-property-individual2/value. Such an approach makes it particularly difficult to set up complete and effective formal descriptions of real-world, complex information structures like spatio-temporal data, contexts, reified situations, human intentions and behaviours, etc. The evident solution to this problem is to use, instead of binary models, n-ary ones where a given predicate can be associated with multiple arguments; for example, the n-ary purchase relation/property concerns events where at least a seller, a buyer, a good, a price, and a timestamp are involved. Unfortunately, the need for an n-ary solution is not universally accepted, also because of the associated implementation difficulties. To give only one example, a common misunderstanding consists in asserting that the use of specific n-ary Knowledge Representation structures is not necessary, given that any n-ary relationship can be simply reduced to a set of binary relationships. This is true in principle, and this sort of decomposition can also be useful for, e.g. dealing with very practical problems like storing n-ary relationships efficiently in standard databases. However, binary and n-ary relationships are conceptually irreconcilable, and an n-ary relation cannot be reduced to the simple addition of binary elements without losing its deep meaning: it is impossible to reason about, e.g. the possible reasons and consequences, the context, etc. of a purchase without considering the purchase event in its conceptual entirety, i.e. by taking all its arguments simultaneously into account.
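The contrast between the two models can be made concrete with a small sketch. The Python below is purely illustrative: the class `Purchase`, the helpers `to_triples` and `from_triples`, and all the individual names and values are assumptions introduced for this example, and do not belong to NKRL or to any Semantic Web library. It stores one purchase event as a single n-ary record and then decomposes it into binary triples attached to a reified event node.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Purchase:
    """One n-ary purchase event: all its arguments are held together."""
    seller: str
    buyer: str
    good: str
    price: float
    timestamp: str


def to_triples(event_id, event):
    """Decompose the n-ary event into (subject, property, value) triples
    hung on a reified event node (hypothetical naming scheme)."""
    triples = [(event_id, "type", "Purchase")]
    for role, value in asdict(event).items():
        triples.append((event_id, role, value))
    return triples


def from_triples(event_id, triples):
    """Regroup every triple of one event node into a single n-ary record."""
    args = {prop: value for subj, prop, value in triples
            if subj == event_id and prop != "type"}
    return Purchase(**args)


# Hypothetical data, for illustration only.
p = Purchase("gallery_1", "collector_1", "painting_1", 1200.0, "2020-05-01")
triples = to_triples("purchase_1", p)
for t in triples:
    print(t)
```

The decomposition is mechanically possible, which is the point conceded in the text; the sketch also shows that no single triple carries the event, so any reasoning about the purchase has to regroup all of them first, as `from_triples` does.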

To sum up what has been expounded in this section, we can say that:

  • The first effective solutions employed for the digitisation of CH entities go back to the sixties; they were implemented in the form of annotations, making use of keywords/metadata extracted from existing thesauri such as AAT or ICONCLASS, and concerned mainly the physical (see above) aspects of these entities.

  • Original data models expressly created for CH needs, like the first version of the Dublin Core or VRA, appeared in the eighties/nineties. They were quite unsophisticated from a Knowledge Representation point of view, and mainly suitable for implementing applications in a documentary/information retrieval style.

  • The specification/implementation of CIDOC CRM can be considered as the culmination point of this first phase. An official ISO standard since 2006, still largely (and successfully) in use, CIDOC CRM is based on a real ontology including concepts and relationships. This tool, however, like the Semantic Web tools and the overwhelming majority of the tools used in a CH context, supports only binary relations between classes/concepts, which is a significant limitation from an expressiveness point of view.

  • Since 2000, Dublin Core, VRA, CIDOC CRM, etc. have progressively migrated towards Semantic Web (RDF) compatible versions. This is undoubted progress from an interoperability and standardization point of view. Unfortunately, it does not solve the expressiveness problems of the currently used tools, which then encounter significant difficulties in expressing, without a significant loss of meaning, difficult situations involving complex relationships among the entities concerned, see the examples above in this section.

  • The digital CH twin approach can represent a solution to the above problems given that, in conformity with the general digital twin philosophy, its proper digital twin component is devoted to taking into account all the semantic/conceptual/immaterial aspects of given CH entities independently of their degree of complexity. This implies, however, the use of really advanced, n-ary based Knowledge Representation tools in the NKRL style, while, e.g. a recent suggestion for the use of the digital twin concept in the CH domain, see [24], proposes simply to make use, as usual, of CIDOC CRM with some additional binary features.

3 Preliminaries on NKRL in the cultural heritage domain

From an ontological point of view, the most interesting characteristic of NKRL concerns the addition of an ontology of elementary events to the usual ontology of concepts. This last is called HClass (hierarchy of classes) in an NKRL context and allows us to represent the strictly lexical aspects of the specific information to be dealt with. HClass presents some interesting, original aspects from an ontological point of view with respect, e.g. to the representation of difficult notions like substances and colors [10]:132-137. However, HClass’ architecture is relatively traditional, and its concepts are represented, to a large extent, according to the usual Semantic Web binary model.

A pure binary-based approach faces, however, major difficulties when the entities to be represented are not simple notions/concepts that can be defined in advance and then inserted in a tree-shaped static ontology, but denote instead dynamic situations. The latter are characterized, in fact, by the presence of complex spatio-temporal information and of mutual relationships among their constituent elements (including, e.g. intentions and behaviours). In specifying the ontology of elementary events of NKRL, an augmented n-ary approach has therefore been chosen. In this approach, n-ary means making use of a formal representation where a given predicate can be associated with multiple arguments: representing the n-ary purchase relation implies, e.g. associating with a purchase-like predicate several arguments such as a seller, a buyer, a good, a price, a timestamp, etc. (Footnote 2). Augmented means that, within n-ary representations, the logico-semantic links between the predicate and its arguments are explicitly represented making use of the notion of functional role [26]; see in this context the seminal “What’s in a Link” paper by William Woods [25]. The NKRL representation of a simple narrative like “Peter gives a book to Mary” will then include the indication that Peter plays the role of subject/agent of the action of giving, book is the object of this action and Mary the beneficiary.

In the NKRL ontology of elementary events, accordingly, the nodes are represented by augmented n-ary knowledge patterns called templates; this ontology is therefore known as HTemp, the hierarchy of templates. Templates denote formally general classes of elementary states/situations/actions/episodes, etc. (designated collectively, for simplicity, as elementary events). Examples are being present in a place, experiencing a given situation, having a specific attitude, sending/receiving messages, etc. The instances of templates, called predicative occurrences, describe formally the meaning of specific elementary events pertaining to one of these classes. The general representation of a template is given by Eq. 1:

$$\begin{aligned} (L_i (P_j (R_1 a_1) (R_2 a_2)...(R_n a_n))), \end{aligned}$$
(1)

where \(L_i\) is the symbolic label identifying a given template, \(P_j\) is a conceptual predicate, and \(R_k\) is a generic functional role, used to specify the logico-semantic function carried out by its filler \(a_k\), a standard predicate argument, with respect to predicate \(P_j\).

We can note that, to avoid the ambiguities of natural language and any possible combinatorial explosion problem, see Zarri [10]:56–61, both the conceptual predicate of Eq. 1 and the associated functional roles are represented as primitives. Predicates \(P_j\) belong then to the closed set {BEHAVE, EXIST, EXPERIENCE, MOVE, OWN, PRODUCE, RECEIVE}, and the roles \(R_k\) to the set {SUBJ(ect), OBJ(ect), SOURCE, BEN(e)F(iciary), MODAL(ity), TOPIC, CONTEXT} (Footnote 3). As a consequence of the use of seven conceptual predicates, the HTemp hierarchy is structured into seven branches, each of which includes only the templates created, see Eq. 1, around one of the seven predicates \(P_j\). HTemp presently (December 2022) includes more than 150 templates, which are very easy to specialize and customize.
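As a minimal sketch of the syntax of Eq. 1, the following Python represents a predicative occurrence as a label, a predicate and a list of (role, filler) pairs, and rejects any predicate or role outside the two closed sets quoted above. The class `Occurrence`, its method `render`, the label `ex.c1` and the individual names are assumptions made for this illustration; in particular, the choice of MOVE for the “Peter gives a book to Mary” example is a guess made for the sake of the sketch, not the template actually prescribed by HTemp.

```python
from dataclasses import dataclass

# Closed sets of primitives, as listed in the text.
PREDICATES = frozenset(
    {"BEHAVE", "EXIST", "EXPERIENCE", "MOVE", "OWN", "PRODUCE", "RECEIVE"})
ROLES = frozenset(
    {"SUBJ", "OBJ", "SOURCE", "BENF", "MODAL", "TOPIC", "CONTEXT"})


@dataclass(frozen=True)
class Occurrence:
    label: str        # L_i, the symbolic label
    predicate: str    # P_j, the conceptual predicate
    arguments: tuple  # ((R_1, a_1), ..., (R_n, a_n))

    def __post_init__(self):
        # Only primitives from the closed sets are accepted.
        if self.predicate not in PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate}")
        for role, _filler in self.arguments:
            if role not in ROLES:
                raise ValueError(f"unknown role: {role}")

    def render(self):
        """Print the occurrence in the parenthesized layout of Eq. 1."""
        args = " ".join(f"({role} {filler})" for role, filler in self.arguments)
        return f"({self.label} ({self.predicate} {args}))"


# Hypothetical encoding of "Peter gives a book to Mary".
gift = Occurrence(
    "ex.c1", "MOVE",
    (("SUBJ", "PETER"), ("OBJ", "BOOK_1"), ("BENF", "MARY")))
print(gift.render())
```

Keeping the role name next to each filler is what distinguishes this augmented layout from a purely positional argument list: the order of the pairs carries no meaning, the role does.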

Table 2 shows the NKRL template Produce:Entity corresponding to the general syntax of Eq. 1.

Table 2 The Produce:Entity template

When needed, determiners (attributes) can be added to templates and predicative occurrences to introduce further details about the basic core of their formal representation, i.e. the symbolic label, conceptual predicate, functional roles and arguments of the predicate of Eq. 1. In the template of Table 2, for example, the variables var2, var5 and var7 denote determiners/attributes of the location type, represented, in the corresponding predicative occurrences, by specific terms of the HClass concept location_ or by individuals derived from these terms. Modulators represent an important category of determiners/attributes that, unlike the location determiners (which can be associated only with the fillers of the SUBJ, OBJ, SOURCE and BENF functional roles), apply to a full, well-formed template or predicative occurrence to particularize its meaning. They are classed into three categories: temporal (begin, end, obs(erve)), deontic (oblig(ation), fac(ulty), interd(iction), perm(ission)) and modal (abs(olute), against, for, main, ment(al), poss(ible), wish, etc.), see [5]:71–75.

It is interesting to remark that the NKRL representation of a full (visual or textual) narrative can always be expressed in tree-shaped format, i.e. as a knowledge graph, to use a now very popular terminology. This possibility represents, among other things, a simple way to immediately check the logical coherence of a generic digitized narrative.

3.1 Second order conceptual structures

In the context of accurate, complete and digitally exploitable representations of any sort (pictorial, textual, etc.) of complex and structured narrative events, there is an evident need for efficient tools able to collect and join together, within a unified and coherent framework, all their possible basic, formalized fragments that could be equated, in some way, to NKRL’s predicative occurrences. NKRL is endowed with several tools capable of satisfying this need; this represents a further proof of the advantage of NKRL’s approach with respect to other proposals committed to the formalization of narrative-like information.

A first, elementary way to associate together predicative occurrences in an NKRL context concerns the possibility of making use of a sort of co-reference mechanism allowing us to logically associate two or more predicative occurrences where the same individual(s) appear(s).
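The co-reference idea can be sketched in a few lines of Python. The function `shared_individuals`, the occurrence labels and the individual names below are hypothetical and serve only to illustrate the principle; the actual NKRL mechanism is not specified at this level of detail in the text. Each occurrence is reduced to the set of individuals it mentions, and two or more occurrences are associated whenever they mention the same individual.

```python
from collections import defaultdict


def shared_individuals(occurrences):
    """Map each individual to the labels of the occurrences mentioning it,
    keeping only individuals that appear in two or more occurrences."""
    index = defaultdict(set)
    for label, individuals in occurrences.items():
        for individual in individuals:
            index[individual].add(label)
    return {ind: sorted(labels)
            for ind, labels in index.items() if len(labels) >= 2}


# Hypothetical occurrences, each reduced to the individuals it mentions.
occurrences = {
    "ex.c1": {"PETER", "BOOK_1", "MARY"},
    "ex.c2": {"MARY", "LIBRARY_1"},
    "ex.c3": {"BOOK_1", "LIBRARY_1"},
}
links = shared_individuals(occurrences)
for individual, labels in sorted(links.items()):
    print(individual, labels)
```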

The most general and interesting way of logically associating single predicative occurrences is, however, to make use of second order structures created through the reification of the single occurrences. These structures reflect, at the digital formal level, surface linguistic connectivity phenomena like causality, goal or indirect speech. Reification is intended here, as usual in a computer science context, as the possibility of creating new objects out of already existing entities and to say something about them without making reference to the original entities. In NKRL, reification is implemented using the symbolic labels (the \(L_i\) terms of Eq. 1) of the predicative occurrences according to two different conceptual mechanisms.

A first solution concerns the possibility of referring to an elementary or complex event as an argument of another (elementary) event, where a complex event corresponds to a coherent set of elementary events. The (surface) connectivity phenomenon involved here is indirect speech. An informal example can be that of an elementary event X describing someone who speaks about Y, where Y is itself an elementary/complex event. In NKRL, this mechanism is called completive construction, see [10]:87–91.

The second mechanism allows us to associate together, through several types of connectivity operators, elementary/complex events that, unlike the previous case, can still be regarded as fully independent entities. This mechanism, called binding occurrences, see [10]:91–98, is realized in the form of lists made up of a binding operator \(Bn_i\) and its \(L_i\) arguments, see Eq. 2:

$$\begin{aligned} (Lb_k (Bn_i L_1 L_2 ... L_n)). \end{aligned}$$
(2)

\(Lb_k\) is the symbolic label identifying the global binding structure: unlike templates and predicative occurrences, binding occurrences are characterized by the absence of any predicate and functional role. The \(Bn_i\) operators are: ALTERN(ative), COORD(ination), ENUM(eration), CAUSE, REFER(ence) (the weak causality operator), GOAL, MOTIV(ation) (the weak intentionality operator), and COND(ition). Their precise logico-semantic definitions can be found in [10]:92. The binding occurrences \(bc_i\) must necessarily conform to a set of mandatory restrictions, like the following:

  • Each term (argument) \(L_j\) that, in a binding list, is associated with one of the above operators denotes exactly a single predicative or binding occurrence \(c_j\) that is defined externally to the list. Therefore, the arguments \(L_j\) are always single terms and cannot consist of lists of symbolic labels.

  • Within binding occurrences of the ALTERN, COORD and ENUM type, no restrictions are imposed on the cardinality of the list, i.e. on the possible number of arguments \(L_j\).

  • In the binding occurrences labeled with CAUSE, REFER, GOAL, MOTIV and COND, only two arguments \(L_m\) and \(L_n\) are admitted. The occurrences labeled with these five operators are then simply of the form (\(Lb_k\) (\(Bn_i\) \(L_m\) \(L_n\))). In these lists, the arguments \(L_m\) and \(L_n\) can denote, in general, either a predicative or a binding occurrence; the exception is represented by the COND(ition) binding occurrences, where the first argument, \(L_m\), must necessarily correspond to a predicative occurrence, see [10]:93–95.
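The three restrictions listed above lend themselves to a mechanical check, sketched below in Python. The function `check_binding`, the `kinds` table (which records whether each externally defined label denotes a predicative or a binding occurrence) and the labels used are hypothetical names introduced for this example; the sketch covers only the restrictions quoted in the text, not the full set defined in [10].

```python
# Operators grouped by the cardinality restriction stated in the text.
LIST_OPERATORS = {"ALTERN", "COORD", "ENUM"}                 # any number of arguments
PAIR_OPERATORS = {"CAUSE", "REFER", "GOAL", "MOTIV", "COND"}  # two arguments only


def check_binding(operator, arguments, kinds):
    """Return the list of restrictions violated by (operator arg_1 ... arg_n).

    `kinds` maps every externally defined symbolic label to either
    "predicative" or "binding".
    """
    if operator not in LIST_OPERATORS | PAIR_OPERATORS:
        return [f"unknown binding operator: {operator}"]
    problems = []
    for arg in arguments:
        if not isinstance(arg, str):
            # Arguments are single terms, never lists of labels.
            problems.append("each argument must be a single symbolic label")
        elif arg not in kinds:
            problems.append(f"{arg} is not defined outside the binding list")
    if operator in PAIR_OPERATORS and len(arguments) != 2:
        problems.append(f"{operator} admits exactly two arguments")
    if operator == "COND" and arguments:
        first = arguments[0]
        if isinstance(first, str) and kinds.get(first) != "predicative":
            problems.append(
                "the first argument of COND must be a predicative occurrence")
    return problems


# Hypothetical labels, for illustration only.
kinds = {"ex.c1": "predicative", "ex.c2": "predicative",
         "ex.c3": "predicative", "ex.b1": "binding"}
print(check_binding("COORD", ["ex.c1", "ex.c2", "ex.c3"], kinds))
print(check_binding("COND", ["ex.b1", "ex.c1"], kinds))
```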

3.2 Inference process

Querying/reasoning are particularly important activities in an NKRL context; even if they do not represent a crucial topic in the context of this specific paper, we think they deserve at least a quick mention to get a more complete picture of this language. The interested reader can refer, e.g. to [10]:183–243 for additional information.

These activities range from the direct questioning of a knowledge base of NKRL formal structures by means of search patterns \(p_i\) (basically, formal queries obtained as partial instantiations of standard templates) to the execution of high-level inference procedures.

The latter include, e.g. the transformation rules. These try to adapt, from a semantic point of view, a search pattern \(p_i\) that failed (i.e. that was unable to find a unification within the knowledge base) to the real contents of this base, making use of a sort of analogical reasoning. Formally, this means that the transformation rule will try to automatically convert \(p_i\) into one or more different patterns \(p_1, p_2... p_n\) that are not strictly equivalent but only semantically close to the original one. As an informal example, derived from the concrete use of transformation procedures in the context of the NKRL representation of the central scene of Velázquez’s Surrender of Breda masterpiece, see [13], we can mention the following common-sense rule: “If a given person stops a submissiveness expression towards herself/himself by another person, this could imply a positive attitude of the first person towards the second”. Given the failure of a search pattern \(p_i\) trying to find, in the NKRL knowledge base corresponding to the Surrender of Breda, direct evidence of a positive attitude of the winning Spanish general towards the defeated Dutch governor of the city, this rule allows an indirect answer to be supplied to the user by retrieving the information that, in the context of the key-to-the-city handover ceremony, the winner has clearly stopped the loser’s attempt to genuflect in front of him.
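The control flow of a transformation can be sketched as follows, under strong simplifying assumptions: occurrences and search patterns are flat Python dictionaries, unification is plain equality on the keys present in the pattern, and a rule is a function returning alternative patterns. The helper names, the key `about`, the predicates chosen and the individuals `SPANISH_GENERAL` and `DUTCH_GOVERNOR` are all invented for this illustration and do not reproduce the actual NKRL encoding of the Surrender of Breda given in [13].

```python
def matches(pattern, occurrence):
    """A pattern unifies with an occurrence if all its key/value pairs agree."""
    return all(occurrence.get(key) == value for key, value in pattern.items())


def search(pattern, knowledge_base):
    return [occ for occ in knowledge_base if matches(pattern, occ)]


def search_with_transformations(pattern, knowledge_base, rules):
    """Try the pattern directly; if it fails, try semantically close patterns."""
    found = search(pattern, knowledge_base)
    if found:
        return "direct", found
    for rule in rules:
        for alternative in rule(pattern):
            found = search(alternative, knowledge_base)
            if found:
                return "transformed", found
    return "failed", []


def positive_attitude_rule(pattern):
    """Toy version of the common-sense rule quoted in the text: stopping a
    submissiveness expression may indicate a positive attitude."""
    if pattern.get("about") != "positive_attitude":
        return []
    alternative = {"predicate": "PRODUCE", "about": "stop_submissiveness"}
    for role in ("SUBJ", "BENF"):
        if role in pattern:
            alternative[role] = pattern[role]
    return [alternative]


# Hypothetical one-fact knowledge base and query.
knowledge_base = [
    {"label": "breda.c1", "predicate": "PRODUCE", "SUBJ": "SPANISH_GENERAL",
     "BENF": "DUTCH_GOVERNOR", "about": "stop_submissiveness"},
]
query = {"predicate": "BEHAVE", "SUBJ": "SPANISH_GENERAL",
         "BENF": "DUTCH_GOVERNOR", "about": "positive_attitude"}
status, answers = search_with_transformations(
    query, knowledge_base, [positive_attitude_rule])
print(status, [occ["label"] for occ in answers])
```

The answer returned through a transformed pattern is not equivalent to a direct one, which is why the sketch reports the status alongside the retrieved occurrences.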

Another important category of high-level procedures is represented by the hypothesis rules. These allow us to automatically build up a sort of causal explanation for an elementary event (a predicative occurrence) retrieved by direct query within an NKRL knowledge base. By following a sequence of partially pre-defined reasoning steps, they can generate search patterns able to retrieve information supporting the explanation hypothesis. These rules are expressed as biconditionals of the type X iff \(Y_1\) and \(Y_2\) ... and \(Y_n\), where the head X of the rule corresponds to a predicative occurrence \(c_j\) to be explained and the reasoning steps \(Y_i\) (called condition schemata in a hypothesis context) must all be satisfied. This means that, for each of them, at least one successful search pattern \(p_i\) must be (automatically) derived by the hypothesis InferenceEngine in order to find a successful unification with some information of the base. In this case, the set of predicative occurrences \(c_1, c_2, \ldots, c_n\) retrieved by the condition schemata \(Y_i\) thanks to their conversion into \(p_i\) can be interpreted as a context/causal explanation of the original occurrence \(c_j\) (X).

We can also note that an interesting feature of the NKRL rule system concerns the possibility of making use of transformations when working in a hypothesis context, i.e. of utilizing these two modalities of inference in an integrated way; see [10]:216–231 for further details. In practice, this means that, whenever a search pattern \(p_j\) is derived from a condition schema \(Y_i\) of a hypothesis to implement a step of the reasoning process, we can use this pattern as it has been automatically built up by the hypothesis InferenceEngine from its parent condition schema, but also in a transformed form if the appropriate transformation rules exist. In this way, a hypothesis that was bound to fail because of the impossibility of deriving a successful \(p_j\) from one of its condition schemata \(Y_i\) can now continue if a new \(p_j\), obtained using a transformation rule, finds a successful unification within the knowledge base, thus obtaining new values for the hypothesis variables. Moreover, this strategy can also be used to discover implicit relationships among the stored data.
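The interplay between hypothesis rules and transformations can be sketched as follows. This is a minimal illustration under stated assumptions, not the NKRL InferenceEngine: condition schemata are given directly as dictionary search patterns, variable binding between reasoning steps is omitted, and the names `run_hypothesis`, `find` and the `transform` callback are hypothetical.

```python
def unify(pattern, occurrence):
    """A pattern unifies with an occurrence when every slot it specifies matches."""
    return all(occurrence.get(slot) == value for slot, value in pattern.items())


def find(pattern, kb):
    """Return the first occurrence unifying with the pattern, or None."""
    return next((occ for occ in kb if unify(pattern, occ)), None)


def run_hypothesis(conditions, kb, transform=None):
    """Check a rule 'X iff Y_1 and ... and Y_n'.

    conditions: one search pattern per condition schema Y_i (assumed already derived).
    transform:  optional function mapping a failed pattern to alternative patterns.
    Returns the identifiers of the explaining occurrences, or None on failure.
    """
    explanation = []
    for pattern in conditions:
        hit = find(pattern, kb)
        if hit is None and transform is not None:
            # Integrated mode: retry the failed step with transformed patterns.
            for alternative in transform(pattern):
                hit = find(alternative, kb)
                if hit is not None:
                    break
        if hit is None:
            return None  # one condition schema failed, so the hypothesis fails
        explanation.append(hit["id"])
    return explanation


# Hypothetical toy data, used only to exercise the function.
toy_kb = [
    {"id": "c1", "predicate": "OWN", "SUBJ": "A"},
    {"id": "c2", "predicate": "MOVE", "SUBJ": "B"},
]
toy_conditions = [
    {"predicate": "OWN", "SUBJ": "A"},
    {"predicate": "EXIST", "SUBJ": "B"},
]


def toy_transform(pattern):
    """Hypothetical rule replacing an EXIST pattern by a MOVE pattern."""
    if pattern.get("predicate") == "EXIST":
        return [dict(pattern, predicate="MOVE")]
    return []
```

Without `toy_transform` the second reasoning step fails and the whole hypothesis is abandoned; with it, the step succeeds through the transformed pattern and both occurrences are returned as the explanation.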

Additional formal details about NKRL in general can be found in the more than 300 pages of the book mentioned in [10]. Moreover, several detailed papers deal formally with specific aspects of NKRL. For example, [30] describes the formalism used to specify/implement the different (backward chaining-oriented) NKRL inference engines, and [31] explains in some detail the formal system of rules and symbolisms adopted for representing and reasoning about temporal information. See also two recent PhD theses about specific formal aspects of NKRL: [32] deals with the use of NKRL for modelling causal relationships among complex events, and [33] formalises in depth the logic of the NKRL structured arguments (expansions) described above in the paper.

3.3 Summary of the proposed model

In this section, we provide a summary of the proposed method (for a schematic representation, see the flowchart of Fig. 1).

The process starts with the definition of the NKRL ontology of elementary events, whose nodes are represented by the templates denoting formally general classes of elementary states/situations/actions/episodes etc. Examples can be: to be present in a place, experience a given situation, have a specific attitude, send/receive messages, etc. A template is characterized by an identification label \(L_i\), a conceptual predicate \(P_j\) (belonging to the closed set {BEHAVE, EXIST, EXPERIENCE, MOVE, OWN, PRODUCE, RECEIVE}), a set of functional roles \(R_1...R_n\) (belonging to the set {SUBJ(ect), OBJ(ect), SOURCE, BEN(e)F(iciary), MODAL(ity), TOPIC, CONTEXT}), and a set of standard predicate arguments \(a_1...a_n\) (location and modulators represent important categories of determiners/attributes).

Fig. 1 Flowchart of the proposed model

Templates are instantiated as predicative occurrences to formally describe the meaning of specific elementary events pertaining to one of the template classes.
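To make the relation between a template and its predicative occurrences more concrete, the sketch below models a template as a small Python class whose predicate and roles are checked against the two closed sets listed above. It is an illustration only: the class `Template`, its fields, and the dictionary returned by `instantiate` are hypothetical, the constraints on the role fillers are ignored, and the optional roles of Produce:Entity are reduced to TOPIC for brevity.

```python
from dataclasses import dataclass

# Closed sets of conceptual predicates and functional roles, as listed in the text.
PREDICATES = {"BEHAVE", "EXIST", "EXPERIENCE", "MOVE", "OWN", "PRODUCE", "RECEIVE"}
ROLES = {"SUBJ", "OBJ", "SOURCE", "BENF", "MODAL", "TOPIC", "CONTEXT"}


@dataclass
class Template:
    name: str
    predicate: str
    mandatory: frozenset
    optional: frozenset = frozenset()

    def __post_init__(self):
        if self.predicate not in PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate}")
        unknown = (set(self.mandatory) | set(self.optional)) - ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")

    def instantiate(self, label, **fillers):
        """Create a predicative occurrence, checking which roles are filled."""
        given = set(fillers)
        missing = set(self.mandatory) - given
        extra = given - set(self.mandatory) - set(self.optional)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        return {"label": label, "predicate": self.predicate, "roles": dict(fillers)}


# Assumption: only SUBJ and OBJ are mandatory; TOPIC stands in for the optional roles.
produce_entity = Template("Produce:Entity", "PRODUCE",
                          frozenset({"SUBJ", "OBJ"}), frozenset({"TOPIC"}))

gio3_c16 = produce_entity.instantiate(
    "gio3.c16",
    SUBJ="LEONARDO_DA_VINCI",
    OBJ="CARTOON_DRAWING_1",
    TOPIC=("SPECIF", "portrait_", "ISABELLA_D_ARAGONA"),
)
```

An instantiation that omits a mandatory role, or that uses a role the template does not foresee, is rejected with a `ValueError`.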

Predicative occurrences can be logically associated using the co-reference mechanism and reification. The former makes it possible to associate two or more predicative occurrences in which the same individual(s) appear(s). The latter is implemented according to two different conceptual mechanisms: (i) completive construction, i.e. the possibility of referring to an elementary or complex event as an argument of another (elementary) event (e.g. an elementary event X describing someone who speaks about Y, where Y is itself an elementary/complex event), and (ii) binding occurrences, which make it possible to associate, through several types of connectivity operators, elementary/complex events as fully independent entities.

The set of predicative occurrences defines the NKRL knowledge base, which is represented as a knowledge graph. It can be questioned using search patterns \(p_i\) or hypothesis rules. The latter strategy makes it possible to generate causal explanations (corresponding to a new set of predicative occurrences \(c_1, c_2, \ldots, c_n\)) for a predicative occurrence \(c_j\) retrieved by direct querying within the knowledge base, making use of rules of the type X iff \(Y_1\) and \(Y_2\) ... and \(Y_n\). The head X of the rule corresponds to the predicative occurrence \(c_j\) to be explained, and the reasoning steps \(Y_i\) must all be satisfied.

4 Data collection

We try to overcome the limitations of the previously introduced flat formalizations of the Mona Lisa painting [20] by adopting the proposed NKRL method for representing some immaterial/conceptual aspects of the Mona Lisa image (see Fig. 2). In particular, we refer to some of the most relevant intangible items of the visual narrative underlying the hidden painting that lies beneath the Mona Lisa image painted by Leonardo Da Vinci on the same poplar panel.

Fig. 2 Original portrait of Leonardo Da Vinci’s Mona Lisa [12]

More specifically, a problem that has long troubled Mona Lisa specialists concerns the identification of the woman represented in the hidden painting, i.e. the portrait, visible to us only in X-rays, that was indubitably painted by Leonardo Da Vinci and that lies beneath Mona Lisa on the same poplar panel [11] (see Fig. 3). According, e.g. to Lillian Feldmann Schwartz [12], this woman has nothing to do with Mona Lisa and probably represents Isabella d’Aragona, wife of the Duke of Milan Gian Galeazzo Sforza; at the end of the 15th century, Leonardo had worked for the Sforza family.

Fig. 3 X-ray portrait that lies beneath Mona Lisa on the same poplar panel [12]

Given the above, the aim is to construct an NKRL knowledge graph related to Schwartz’s hypothesis, and then to evaluate it according to the syntactic and semantic accuracy of the graph [34]. When entities and relations, represented by nodes and edges, accurately capture real-world phenomena, the graph is said to be accurate. In particular, syntactic accuracy is the degree to which the data conform to the grammatical rules established for the domain and/or data model, whereas semantic accuracy is the degree to which data values accurately describe real-world phenomena.

To pursue this objective, semantically relevant text sentences on the main aspects of Schwartz’s hypothesis were manually extracted from [12] as input data for creating the knowledge graph. They relate, essentially, to the lack of correspondence between the facial characteristics of the woman in the hidden painting and those of Mona Lisa. More exactly, eyes, mouth, nose tips, hairlines and chins do not match between Mona Lisa and the unknown woman. A correspondence exists, on the contrary, between the unknown woman and the woman represented in Leonardo’s cartoon, a (very retouched) preparatory study for a portrait of Isabella d’Aragona (see Fig. 4). Specifically, the eyes of the woman in the hidden painting were, in L. F. Schwartz’s words, “lined up with the eyes on the cartoon, the hairlines fell into place and overlapped exactly”.

Fig. 4 Leonardo’s original cartoon [12]

In the rest of the paper, a syntactic and a semantic performance evaluation of the knowledge graph are described (see Sects. 5 and 6, respectively).

5 Results

Table 3 reproduces the NKRL narrative, gio3, which formalizes the hidden painting issue. Fig. 5 shows the same narrative in tree format.

Table 3 NKRL representation of the hidden painting issue

Fig. 5 Tree-shaped representation of Table 3 formalism

A simple example, the predicative occurrence identified by the symbolic label (\(L_i\)) gio3.c16, clarifies what the use of Eq. 1 above implies from a concrete point of view. When the NKRL template Produce:Entity in Table 2 is instantiated to represent the elementary event “In the period 1490-1495, Leonardo Da Vinci realized, in the form of a cartoon drawing, the portrait of Isabella d’Aragona”, we can see from gio3.c16 that the predicate \(P_j\) of Eq. 1 (PRODUCE in this case) introduces three arguments \(a_k\), namely LEONARDO_DA_VINCI, CARTOON_DRAWING_1 and (SPECIF portrait_ ISABELLA_D_ARAGONA), via the functional roles (\(R_k\)) SUBJ(ect), OBJ(ect) and TOPIC, respectively. As Table 2 shows, in the actual representation of all the NKRL templates, the \(a_k\) terms of Eq. 1 are implemented in the form of variables \(var_i\) associated with constraints expressed by HClass terms or combinations of these terms; these constraints must be satisfied when the occurrences are created.

In NKRL, individuals are denoted using upper-case characters, and concepts using lower-case characters. In our example, the individual LEONARDO_DA_VINCI is an instance (through intermediate HClass concepts like individual_person) of the high-level concept human_being_or_social_body. It therefore satisfies the constraint imposed on var1 of the Produce:Entity template and, as a consequence, on all the possible fillers of the SUBJ(ect) role in all the predicative occurrences derived from this template. Similarly, CARTOON_DRAWING_1 is an instance of cartoon_drawing, a specific HClass term included in the branch of the HClass ontology that has the (high-level) concept artefact_ at the top and includes intermediate items like drawing_ and artistic_artefact; it therefore satisfies the constraint on var3, the OBJ(ect) filler of the template. With respect now to the TOPIC filler of gio3.c16, (SPECIF portrait_ ISABELLA_D_ARAGONA), we can note first that the main element of this structured argument or expansion, i.e. portrait_, is a HClass concept instead of an individual as in the previous two cases. This conforms to an important NKRL principle, which requires limiting as much as possible any unnecessary proliferation of individuals in concrete NKRL applications, because of the difficulty of creating and coherently managing large numbers of such entities; see [26] in this context. In any case, portrait_ is a specific term of image_, a concept pertaining to the information_content subtree of HClass, and the constraint on var9 of Produce:Entity is thus satisfied.
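The constraint checks described in this paragraph amount to a subsumption test over the HClass hierarchy. The sketch below shows one way to express it, assuming a tiny, hand-written fragment of HClass: the dictionaries `PARENT` and `INSTANCE_OF`, the function `subsumed_by`, and in particular the direct parent links (for example the relative order of drawing_ and artistic_artefact) are hypothetical simplifications, since the text only names these concepts as belonging to the same branch.

```python
# Hypothetical fragment of HClass: concept -> parent concept.
# The real hierarchy contains more intermediate concepts than shown here.
PARENT = {
    "individual_person": "human_being_or_social_body",
    "cartoon_drawing": "drawing_",
    "drawing_": "artistic_artefact",
    "artistic_artefact": "artefact_",
    "portrait_": "image_",
    "image_": "information_content",
}

# Individuals (upper case) and the concept they are direct instances of.
INSTANCE_OF = {
    "LEONARDO_DA_VINCI": "individual_person",
    "CARTOON_DRAWING_1": "cartoon_drawing",
}


def subsumed_by(term, constraint):
    """True if the term (individual or concept) falls under the constraint concept."""
    concept = INSTANCE_OF.get(term, term)
    while concept is not None:
        if concept == constraint:
            return True
        concept = PARENT.get(concept)  # climb one level; None at the top
    return False
```

With this fragment, `subsumed_by("LEONARDO_DA_VINCI", "human_being_or_social_body")` holds, mirroring the check on var1, while a drawing would be rejected as filler of a role constrained to information_content.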

Returning now to the structured arguments/expansions and their syntax, we can note that these particular kinds of \(a_k\) arguments take the form of recursive lists introduced by the four AECS operators: the alternative operator ALTERN(ative) = A, the distributive operator ENUM(eration) = E, the collective operator COORD(ination) = C and the attributive operator SPECIF(ication) = S. The last operator is widely used in the predicative occurrences of Table 3; in the example of gio3.c16, it is used to specify exactly that the portrait we are speaking of is Isabella d’Aragona’s portrait. An example of the use of the operator COORD(ination) is given in the gio3.c7 predicative occurrence: in this case, the TOPIC’s filler is represented by three SPECIF lists, where the first tells us that PAINTING_2 (the hidden painting) is painted on the same poplar plank used for Mona Lisa’s portrait and, more precisely, that it is located under this portrait, the second that this specific painting is named the hidden painting, and the third that PAINTING_2 is visible only under X-ray. This particular expansion also allows us to introduce the so-called priority rule, see [5]:68-70, which governs the interweaving of the AECS operators within a structured argument by forbidding, e.g. the use of a list of the COORD type within the scope of a SPECIF list. The inverse is obviously admitted, as the structured argument of gio3.c7 demonstrates.
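A check of this kind on nested AECS lists can be sketched as a short recursive function. The text only states the COORD/SPECIF case; the full ordering used below (ALTERN, ENUM, COORD, SPECIF from outermost to innermost) and the decision to allow an operator to be nested inside itself are assumptions of this sketch, as are the tuple encoding and the name `respects_priority`.

```python
# Assumed priority order, from the outermost to the innermost operator.
PRIORITY = ["ALTERN", "ENUM", "COORD", "SPECIF"]


def respects_priority(expansion, enclosing=None):
    """Check a structured argument encoded as nested tuples.

    An expansion looks like ("COORD", ("SPECIF", "a", "b"), "c"); plain strings
    stand for HClass concepts or individuals.
    """
    if not isinstance(expansion, tuple):
        return True  # a simple term is always acceptable
    operator, *arguments = expansion
    if enclosing is not None and PRIORITY.index(operator) < PRIORITY.index(enclosing):
        return False  # e.g. a COORD list within the scope of a SPECIF list
    return all(respects_priority(arg, operator) for arg in arguments)


# Simplified, hypothetical rendering of the TOPIC filler of gio3.c7.
topic_c7 = ("COORD",
            ("SPECIF", "PAINTING_2", "poplar_plank"),
            ("SPECIF", "PAINTING_2", "HIDDEN_PAINTING"),
            ("SPECIF", "PAINTING_2", "x_ray"))

# A forbidden structure: COORD inside SPECIF.
forbidden = ("SPECIF", "portrait_", ("COORD", "A", "B"))
```

The first structure is accepted because SPECIF lists appear inside COORD, whereas the second is rejected because it places a COORD list inside a SPECIF list.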

Once a given template has been instantiated to obtain a valid n-ary predicative occurrence, the latter is reified by means of a symbolic name like gio3.c16, which corresponds to \(L_i\) of Eq. 1 and allows the inclusion of this occurrence within wider conceptual structures. With respect to templates and their instantiation procedures, we can also add that, in Table 2, the elements in square brackets are optional: this means that, for example, in the predicative occurrences derived from the Produce:Entity template, only the SUBJ and OBJ roles and their associated elements are necessarily present.

Several examples of the use of the different categories of modulators appear in Table 3. In particular, the temporal modulator obs(erve) is frequently used to denote that, at the specific date associated with the attribute date-1, the elementary event corresponding to the predicative occurrence we are dealing with (e.g. gio3.c7) is in progress, without making any assumptions about the existence of this event before and after the given date. The modal modulator negv (negated event) used in gio3.c10 is particularly important, given that it points out that the elementary event corresponding to the predicative occurrence where negv appears did not take place. In the gio3.c10 example, UNKNOWN_WOMAN_1 has not been recognized as Mona Lisa.

In Table 3, we can also observe a last category of attributes/determiners, concerning the two operators date-1 and date-2. They are necessarily associated with any well-formed NKRL predicative occurrence and are used in general to represent the temporal interval normally associated with the elementary event corresponding to a particular occurrence. Linking a specific date (full date, uncertain date, reconstructed date, etc.) associated with date-1 with one of the three temporal modulators allows us to denote the beginning of a specific event (modulator begin), the termination of an event (end) or, as already stated, the fact that a specific event is in progress at the date-1 date (obs). A detailed description of the formal system used in NKRL for the representation and management of temporal information can be found, e.g. in [31].

5.1 Results on second-order conceptual structures

In Table 3, we have utilized the individual UNKNOWN_WOMAN_1 to denote the unknown personage whose portrait lies beneath Mona Lisa’s portrait. The consistent utilization of the co-reference mechanism allows us, then, to associate together seven predicative occurrences of Table 3 (gio3.c6, gio3.c10, gio3.c12, etc.) and thus to create, at low cost, a (possibly interesting) thematic cluster. With respect to the remark about the practical dangers associated with an unjustified proliferation of individuals, we can note that the creation of the individual UNKNOWN_WOMAN_1 is here absolutely necessary and therefore fully justified. Introducing a generic unknown_woman concept instead of the individual within the seven predicative occurrences would, in fact, have made the management problems of the HClass terms even worse, by forcing the developers to create useless and expensive inference procedures to verify, when finding unknown_woman in one of the seven occurrences, that this woman really corresponds to the one mentioned in the others.
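The thematic clusters produced by co-reference can be obtained with a simple index from individuals to occurrence labels. The sketch below assumes that each occurrence is reduced to the flat list of symbols it mentions and that individuals are recognized by their upper-case spelling, following the NKRL convention; the function name `coreference_clusters` and the simplified symbol lists for the three occurrences are hypothetical and are not copied from Table 3.

```python
from collections import defaultdict


def coreference_clusters(occurrences):
    """Map each individual (upper-case symbol) to the occurrences mentioning it."""
    clusters = defaultdict(list)
    for label, symbols in occurrences.items():
        for symbol in sorted(set(symbols)):  # count each symbol once per occurrence
            if symbol.isupper():             # concepts are lower case and skipped
                clusters[symbol].append(label)
    return dict(clusters)


# Simplified, hypothetical symbol lists for three of the seven occurrences.
sample = {
    "gio3.c6": ["LEONARDO_DA_VINCI", "PAINTING_2", "UNKNOWN_WOMAN_1", "portrait_"],
    "gio3.c10": ["UNKNOWN_WOMAN_1", "MONA_LISA"],
    "gio3.c12": ["UNKNOWN_WOMAN_1", "MONA_LISA", "eye_"],
}
clusters = coreference_clusters(sample)
```

Looking up `clusters["UNKNOWN_WOMAN_1"]` then returns the labels of all the occurrences about the unknown woman, without any additional inference step; this is exactly what a generic unknown_woman concept could not guarantee.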

Completive construction is illustrated, e.g. by the association of the occurrences gio3.c3/gio3.c4 in Table 3, where gio3.c4 (a binding occurrence) is introduced by gio3.c3 as filler of its OBJ(ect) role; the # prefix in gio3.c3 indicates that its associated term is not a HClass item but an occurrence label. Some formal restrictions must be respected. For example, only the OBJ, MODAL, TOPIC and CONTEXT functional roles of a predicative occurrence \(pc_i\) can accept as filler the symbolic label \(L_j\) of a (generic) occurrence \(c_j\); the latter can then be, as in the case of the gio3.c3/gio3.c4 association in Table 3, one of the binding occurrences introduced below.
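The restriction on the roles that may host an occurrence label is easy to verify mechanically. The following sketch assumes that an occurrence is a dictionary of role fillers and that a completive filler is a string starting with `#`; the function name `illegal_completive_roles` and the individual `LILLIAN_SCHWARTZ` are hypothetical.

```python
# Roles that may accept the label of another occurrence as filler (from the text).
COMPLETIVE_ROLES = {"OBJ", "MODAL", "TOPIC", "CONTEXT"}


def illegal_completive_roles(roles):
    """Return the roles that hold a '#label' filler although they may not."""
    return sorted(
        role
        for role, filler in roles.items()
        if isinstance(filler, str) and filler.startswith("#")
        and role not in COMPLETIVE_ROLES
    )


# Simplified rendering of gio3.c3: its OBJ filler points to gio3.c4.
gio3_c3_roles = {"SUBJ": "LILLIAN_SCHWARTZ", "OBJ": "#gio3.c4"}

# An ill-formed variant in which the label is used as SUBJ filler.
ill_formed_roles = {"SUBJ": "#gio3.c4", "OBJ": "PAINTING_2"}
```

The first dictionary yields an empty list, whereas the second reports `SUBJ` as an illegal host for an occurrence label.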

We can observe that most of the binding occurrences of Table 3 are of the COORD type, see gio3.c1, gio3.c2, gio3.c4, gio3.c11, etc. In contrast, gio3.c8, i.e. (CAUSE gio3.c10 gio3.c11), is a binding occurrence of the causal type. It necessarily includes only two arguments, the general meaning being that the event corresponding to its first argument gio3.c10, i.e. recognizing that UNKNOWN_WOMAN_1 is not Mona Lisa, is caused by all the inconsistencies described by the occurrences included in another binding occurrence, gio3.c11. In a context of formal representation of causal situations, it is worth emphasizing the difference between the conceptual meanings associated with the two binding operators CAUSE and REFER(ence), the syntax of the corresponding binding occurrences being identical, i.e. (\(Lb_k\) (\(Bn_i\) \(L_m\) \(L_n\))), with \(Bn_i\) corresponding to CAUSE or REFER. In the CAUSE case, \(L_n\) is both necessary and sufficient to explain \(L_m\) while, in the REFER case, \(L_n\) is necessary but not sufficient to explain \(L_m\). This means that, in the interpretation of the Mona Lisa expert who inspired the Table 3 coding, and with reference to the REFER occurrence of Table 3, i.e. gio3.c9: (REFER gio3.c14 gio3.c15), all the explanations put forward by Lillian Feldmann Schwartz to support her Isabella d’Aragona hypothesis are necessary but not sufficient to definitively validate this hypothesis.
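The common syntax of the two causal operators, and their different strength, can be captured in a few lines. The sketch below is illustrative only: the dictionary encoding of a binding occurrence and the names `make_binding` and `explain` are hypothetical, and no arity check is assumed for the other binding operators.

```python
# Operators that take exactly two arguments (L_m, L_n), as stated in the text.
CAUSAL_STRENGTH = {
    "CAUSE": "necessary and sufficient",
    "REFER": "necessary but not sufficient",
}


def make_binding(label, operator, *arguments):
    """Build a binding occurrence (Lb_k (Bn_i L_m L_n ...))."""
    if operator in CAUSAL_STRENGTH and len(arguments) != 2:
        raise ValueError(f"{operator} takes exactly two arguments")
    return {"label": label, "operator": operator, "args": list(arguments)}


def explain(binding):
    """Spell out the meaning of a CAUSE or REFER binding occurrence."""
    if binding["operator"] not in CAUSAL_STRENGTH:
        raise ValueError("not a causal binding occurrence")
    explained, explaining = binding["args"]
    strength = CAUSAL_STRENGTH[binding["operator"]]
    return f"{explaining} is {strength} to explain {explained}"


gio3_c9 = make_binding("gio3.c9", "REFER", "gio3.c14", "gio3.c15")
```

Calling `explain(gio3_c9)` produces the reading given in the text: gio3.c15 is necessary but not sufficient to explain gio3.c14.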

6 Discussion

The importance of this work mainly resides in overcoming the flat formalizations of the semantic content of the Mona Lisa painting previously introduced in the literature (e.g. see [20], which adopts the RDF-oriented EDM) by means of the NKRL approach based on the digital CH twin, which is designed to take into account all the semantic/conceptual/immaterial aspects of given CH entities, independently of their degree of complexity.

We also observe that NKRL is general and efficient with respect to the representation and management of any form of narrative (in written or pictorial form), from both a knowledge representation and an inferential point of view. In fact, simple descriptions in metadata form or simple knowledge representation tools are unsuitable for representing complex situations/events. This reveals the potential of NKRL to explore the immaterial components (including the emotional factors) associated with any CH entity in a more expressive, complete and meaningful form. This is mainly due to the use of a rich n-ary approach, in the NKRL style, for the creation of highly expressive inference rules. In their NKRL version, the building blocks (i.e. the atoms) of these rules can directly represent complex situations and events, actions, scenarios, narratives, etc. They are not limited, then, to the use of the (inexpressive) standard clauses built using only unary and dyadic (binary) predicates. Modelling the rule atoms according to a simple binary format implies in fact, among other things, the impossibility of stating even relatively simple implications of the if-then type with a reduced number of clauses. This leads in practice to the (systematic) decomposition of the original formulation of the rules into (possibly very large) numbers of simple binary clauses, which in turn causes serious problems from both a logical and an operational point of view. Techniques of this type are relatively simple to use when the rules to be built are quite simple, but they can be out of place in more realistic, complex and dynamic situations like those typically dealt with in an NKRL context (see the Sect. 5 example). In such cases, very often, the decomposition techniques would simply be unable to deal with the specific situations at hand using a limited number of binary clauses.

As for the obtained results, we have already seen in Sect. 5 that NKRL is able to represent the hidden painting issue through the gio3 narrative and its different predicative occurrences. In particular, the formal representation of the hidden painting topic has been structured in three parts. gio3.c2 informs us that a hidden painting exists, that it has been realized by Leonardo, and lists some of its characteristics. gio3.c3 and the associated #gio3.c4 occurrence (completive construction) explain why L. F. Schwartz rejects the identification of the hidden woman with La Gioconda and suggests, instead, that the hidden portrait could represent Isabella d’Aragona. The last predicative occurrence, gio3.c5, points out that several art historians agree with L. F. Schwartz about her Isabella d’Aragona hypothesis.

More specifically, the first component of the representation, gio3.c2, consists of two predicative occurrences, gio3.c6 and gio3.c7. The former explains that a painting (conventionally: PAINTING_2) concerning the portrait of a woman (conventionally: UNKNOWN_WOMAN_1) was produced by Leonardo Da Vinci within the temporal interval 1497-1503. The latter informs us that (i) PAINTING_2 has been painted on the same poplar plank as PAINTING_1 (conventionally, Mona Lisa’s portrait), (ii) it is located under PAINTING_1, (iii) it is known as the HIDDEN_PAINTING, and (iv) it is visible only under X-ray analysis.

Next, gio3.c3 explains that, in January 1988, L. F. Schwartz circulated the information described in gio3.c4 by means of a scientific paper published in The Visual Computer journal, while gio3.c4 informs us that the information spread by L. F. Schwartz via her paper is formed of two parts, gio3.c8 and gio3.c9. The first part of the disseminated information (see gio3.c8) states that what is described in gio3.c10 is caused by the events collected in gio3.c11. The elementary event represented by occurrence gio3.c10 is a negated event (modulator negv), i.e. UNKNOWN_WOMAN_1 is not MONA_LISA. The reasons for failing to identify UNKNOWN_WOMAN_1 with MONA_LISA are partially collected in gio3.c11: gio3.c12 explains that a dissimilarity exists between the eyes of UNKNOWN_WOMAN_1 and those of MONA_LISA, and a dissimilarity also exists between the mouths of UNKNOWN_WOMAN_1 and MONA_LISA (see gio3.c13).

The second component of L. F. Schwartz’s message is represented by a REFER(ence) binding occurrence, i.e. one expressing weak causality (see gio3.c9).

Here, gio3.c14 informs us that it is possible (modal modulator poss(ible)) that the unknown woman corresponds to Isabella d’Aragona, while gio3.c15 explains that the reasons for this (possible) identification are given in the three predicative occurrences it lists. These occurrences, gio3.c16, gio3.c17 and gio3.c18, explain, respectively, that (i) in the period 1490-1495, Leonardo Da Vinci realized, in the form of a cartoon drawing, the portrait of Isabella d’Aragona, (ii) there is a correspondence between the eyes of Isabella d’Aragona and those of the unknown woman, and (iii) there is a correspondence between the hairline of UNKNOWN_WOMAN_1 and the hairline of WOMAN_54. Finally, gio3.c5 explains that several art historians agree with L. F. Schwartz about her Isabella d’Aragona hypothesis.
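The overall organization of the gio3 narrative, as described in this section, can be summarized as a small tree. The structure below is reconstructed from the prose only and is not copied from Table 3: in particular, the COORD type of gio3.c15, the link type labelled `completive`, and the names `NARRATIVE`, `render` and `leaves` are assumptions made for this sketch.

```python
# Label -> (link type, children), reconstructed from the prose description.
NARRATIVE = {
    "gio3.c1": ("COORD", ["gio3.c2", "gio3.c3", "gio3.c5"]),
    "gio3.c2": ("COORD", ["gio3.c6", "gio3.c7"]),
    "gio3.c3": ("completive", ["gio3.c4"]),  # OBJ filler #gio3.c4
    "gio3.c4": ("COORD", ["gio3.c8", "gio3.c9"]),
    "gio3.c8": ("CAUSE", ["gio3.c10", "gio3.c11"]),
    "gio3.c9": ("REFER", ["gio3.c14", "gio3.c15"]),
    "gio3.c11": ("COORD", ["gio3.c12", "gio3.c13"]),
    "gio3.c15": ("COORD", ["gio3.c16", "gio3.c17", "gio3.c18"]),
}


def render(label, depth=0):
    """Return the indented lines of the subtree rooted at the given label."""
    link, children = NARRATIVE.get(label, ("", []))
    lines = ["  " * depth + label + (f" [{link}]" if link else "")]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(label):
    """Return the occurrences of the subtree that introduce no other occurrence."""
    _, children = NARRATIVE.get(label, ("", []))
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


tree_lines = render("gio3.c1")
```

Printing `"\n".join(tree_lines)` gives an indented view of the eighteen occurrences, from the top-level gio3.c1 down to elementary events such as gio3.c12 and gio3.c18.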

A similar approach can be adopted in several real-life contexts related to cultural heritage, such as virtual museums and art galleries, where iconographic narratives can be used to represent the facts, feelings and emotions aroused by a CH item. The ability of NKRL to formalize (even complex) immaterial aspects of the CH item, together with the possibility of querying the knowledge graph, can open innovative ways of interacting with the virtual object. For example, text narrations describing complex emotions and feelings could be retrieved for the virtual items by querying the knowledge base, leading to a deeper understanding of the item’s characteristics. Other important applications of the NKRL knowledge graph could be in the context of CH semantic search platforms and of virtual assistants that respond to visitors’ inquiries at art events and exhibitions. All these aspects point to promising research directions involving the use of knowledge graphs for metaverse-based systems, expert systems, virtual assistance, information retrieval and semantic search platforms.

7 Conclusion

This paper introduced an advanced digital methodology for representing all the relevant physical and symbolic/conceptual characteristics of any kind of CH item, and for allowing generalized access in order to make their global fruition possible. Following the research direction of the so-called digital CH twin approach, NKRL has been employed for representing and formalizing, in particular, the immaterial aspects and characteristics addressed by this approach, thanks to its well-known capacity for dealing with important immaterial/symbolic aspects of many different realities. In order to prove the effectiveness of the proposed methodology, NKRL has been used to represent in digital form a component of the whole CH domain that is particularly important both numerically and culturally, namely complex expressive CH entities, with particular reference to the iconographic narrative entities corresponding to stories told in visual form and conveyed by works of art like paintings, drawings, frescoes, mosaics, sculptures, murals, etc. In particular, an experiment has been performed in the context of a project regarding the digital modelling of the Mona Lisa (La Gioconda) painting by Leonardo Da Vinci. In this experiment, following the digital CH twin approach, the main characteristics of the hidden painting that lies beneath Mona Lisa on the same poplar panel were represented and modelled in NKRL format.

The possibility of creating an NKRL knowledge graph to formalize the semantic/conceptual/immaterial aspects of given CH entities, regardless of their degree of complexity, can have a considerable impact in different fields, such as virtual and augmented reality, information retrieval and semantic search platforms, expert systems and virtual assistance. The knowledge graph can be questioned in real time, paving the way for the creation of CH virtual objects, virtual assistants and expert systems that can interact with visitors at art events and exhibitions, and for the definition of semantic search platforms related to CH.

The theories and methods put forward in this work offer some relevant benefits, but they mark only the beginning of our investigation. The proposed work still has some limitations that serve as a springboard for additional research in this area. The first limitation is that semantically relevant sentences are manually extracted from Schwartz’s text for creating the NKRL knowledge graph. The second is that only Schwartz’s hypothesis is formalized in our work. As a result, a limited number of details of Mona Lisa are considered, and they only partially describe the hidden features of the painting.

As a direction for future work, we aim to overcome the current limitation of manually selecting relevant text sentences for the construction of the knowledge graph. This might be accomplished by introducing a natural language processing module for automatically extracting the most important keywords or phrases, possibly also performing annotation tasks. We also aim to consider more hypotheses about Mona Lisa, with further details about the hidden features of the painting. We have so far considered a single hypothesis because this makes the task of modelling the knowledge graph easier. However, it causes us to lose information about the hidden characteristics of the painting. We hope to extend our work in the future to remove this restriction.