Most Common Words – A cP Systems Solution

Nicolescu, Radu

doi:10.1007/978-3-319-73359-3_14


Conference paper
First Online: 31 December 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10725)

Abstract

Finding the most common words in a text file is a famous “programming pearl”, originally posed by Jon Bentley (1984). Several interesting solutions have been proposed: by Knuth (an exquisite model of literate programming, 1986), McIlroy (an engineering example of combining a timeless set of tools, 1986), and Hanson (an alternate efficient solution, 1987). Here we propose a concise and efficient solution based on the fast parallel and associative capabilities of cP systems. We also check their parallel sorting capabilities and propose a dynamic version of the classical pigeonhole algorithm.

References

1. Bentley, J., Knuth, D., McIlroy, D.: Programming pearls: a literate program. Commun. ACM 29(6), 471–483 (1986). http://doi.acm.org/10.1145/5948.315654
2. Knuth, D.E.: Literate programming. Comput. J. 27(2), 97–111 (1984). http://dx.doi.org/10.1093/comjnl/27.2.97
3. Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco (1996)
4. Nicolescu, R.: Parallel and distributed algorithms in P systems. In: Gheorghe, M., Păun, G., Rozenberg, G., Salomaa, A., Verlan, S. (eds.) CMC 2011. LNCS, vol. 7184, pp. 35–50. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28024-5_4
5. Nicolescu, R.: Parallel thinning with complex objects and actors. In: Gheorghe, M., Rozenberg, G., Salomaa, A., Sosík, P., Zandron, C. (eds.) CMC 2014. LNCS, vol. 8961, pp. 330–354. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14370-5_21
6. Nicolescu, R.: Structured grid algorithms modelled with complex objects. In: Rozenberg, G., Salomaa, A., Sempere, J.M., Zandron, C. (eds.) CMC 2015. LNCS, vol. 9504, pp. 321–337. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-28475-0_22
7. Nicolescu, R.: Revising the membrane computing model for byzantine agreement. In: Leporati, A., Rozenberg, G., Salomaa, A., Zandron, C. (eds.) CMC 2016. LNCS, vol. 10105, pp. 317–339. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54072-6_20
8. Nicolescu, R., Ipate, F., Wu, H.: Programming P systems with complex objects. In: Alhazov, A., Cojocaru, S., Gheorghe, M., Rogozhin, Y., Rozenberg, G., Salomaa, A. (eds.) CMC 2013. LNCS, vol. 8340, pp. 280–300. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54239-8_20
9. Nicolescu, R., Wu, H.: Complex objects for complex applications. Rom. J. Inf. Sci. Technol. 17(1), 46–62 (2014)
10. Păun, G., Rozenberg, G., Salomaa, A. (eds.): The Oxford Handbook of Membrane Computing. Oxford University Press Inc., New York (2010)
11. Tel, G.: Introduction to Distributed Algorithms. Cambridge University Press, Cambridge (2000)
12. Van Wyk, C.J.: Literate programming. Commun. ACM 30(7), 583–599 (1987). http://doi.acm.org/10.1145/28569.315738

Author information

Authors and Affiliations

Department of Computer Science, University of Auckland, Private Bag, 92019, Auckland, New Zealand
Radu Nicolescu

Corresponding author

Correspondence to Radu Nicolescu.

Editor information

Editors and Affiliations

University of Bradford, Bradford, United Kingdom
Marian Gheorghe
Leiden University, Leiden, The Netherlands
Grzegorz Rozenberg
Turku Centre for Computer Science, Turku, Finland
Arto Salomaa
University of Milan-Bicocca, Milan, Italy
Claudio Zandron

Appendix A. cP Systems: P Systems with Complex Symbols

We present the details of our cP framework, simplified from our earlier papers [5, 6].

A.1 Complex Symbols as Subcells

Complex symbols, or subcells, play the roles of cellular micro-compartments or substructures, such as organelles, vesicles or cytoophidium assemblies (“snakes”), which are embedded in cells or travel between cells, but without having the full processing power of a complete cell. In our proposal, subcells represent nested labelled data compartments which have no processing power of their own: they are acted upon by the rules of their enclosing cells.

Our basic vocabulary consists of atoms and variables, collectively known as simple symbols. Complex symbols are similar to Prolog-like first-order terms, recursively built from multisets of atoms and variables. Together, complex symbols and simple symbols (atoms, variables) are called symbols and can be defined by the following formal grammar:
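A grammar consistent with this description and with the examples listed below can be sketched as follows. The nonterminal names and the exact layout are assumptions; the content (functors are atoms, arguments are possibly empty multisets of symbols) follows the surrounding text.

$$
\begin{array}{lcl}
\langle \textit{symbol} \rangle & ::= & \langle \textit{atom} \rangle \mid \langle \textit{variable} \rangle \mid \langle \textit{term} \rangle \\
\langle \textit{term} \rangle & ::= & \langle \textit{functor} \rangle \; \texttt{(} \; \langle \textit{argument} \rangle \; \texttt{)} \\
\langle \textit{functor} \rangle & ::= & \langle \textit{atom} \rangle \\
\langle \textit{argument} \rangle & ::= & \lambda \mid \{ \langle \textit{symbol} \rangle \}^{+}
\end{array}
$$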

Atoms are typically denoted by lower case letters (or, occasionally, digits), such as a, b, c, $\textit{1}$. Variables are typically denoted by uppercase letters, such as X, Y, Z. Functors are term (subcell) labels; here functors can only be atoms, not variables.

For improved readability, we also consider anonymous variables, which are denoted by underscores (“$\_$”). Each underscore occurrence represents a new unnamed variable and indicates that something, in which we are not interested, must fill that slot.

Symbols that do not contain variables are called ground, e.g.:

Ground symbols: a, $a(\lambda )$, a(b), a(bc), $a(b^2 c)$, a(b(c)), $a(bc(\lambda ))$, a(b(c)d(e)), $a(b(c)d(e(\lambda )))$, $a(bc^2 d)$.
Symbols which are not ground: X, a(X), a(bX), a(b(X)), a(XY), $a(X^2)$, a(XdY), a(Xc()), a(b(X)d(e)), a(b(c)d(Y)), $a(b(X^2)d(e(Xf^2)))$; also, using anonymous variables: $\_$, $a(b\_)$, $a(X\_)$, $a(b(X)d(e(\_)))$.
This term-like construct which starts with a variable is not a symbol (this grammar defines first-order terms only): X(aY).

Note that we may abbreviate the expression of complex symbols by removing inner $\lambda $’s as explicit references to the empty multiset, e.g. $a(\lambda ) = a()$.

In concrete models, cells may contain ground symbols only (no variables). Rules may however contain any kind of symbols: atoms, variables and terms (whether ground or not).
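The distinction between ground and non-ground symbols can be illustrated with a small Python sketch. The encoding is an assumption made for illustration only, not the paper's notation: atoms are lowercase strings or digits, variables are strings that start with an uppercase letter (or the underscore for an anonymous variable), and a term is a pair of a functor and a list of argument symbols read as a multiset.

```python
# Assumed encoding (illustrative only): atom = "a", variable = "X" or "_",
# term = (functor, [argument symbols]) with the list read as a multiset.

def is_var(s):
    """True for named variables (uppercase initial) and the anonymous variable."""
    return isinstance(s, str) and (s[:1].isupper() or s == "_")

def is_ground(sym):
    """A symbol is ground when no variable occurs anywhere inside it."""
    if isinstance(sym, str):
        return not is_var(sym)
    _functor, args = sym
    return all(is_ground(a) for a in args)

print(is_ground(("a", [("b", ["c"]), ("d", ["e"])])))   # a(b(c)d(e)) -> True
print(is_ground(("a", [("b", ["X"]), ("d", ["e"])])))   # a(b(X)d(e)) -> False
print(is_ground(("a", [])))                             # a(lambda)   -> True
```

Under this encoding, a check such as `is_ground` is what a cell would need to enforce on its contents, while rules are free to carry non-ground symbols.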

Unification. All symbols which appear in rules (ground or not) can be (asymmetrically) matched against ground terms, using an ad-hoc version of pattern matching, more precisely, a one-way first-order syntactic unification (one-way, because cells may not contain variables). An atom can only match another copy of itself, but a variable can match any multiset of ground terms (including $\lambda $). This may create a combinatorial non-determinism, when a combination of two or more variables is matched against the same multiset, in which case an arbitrary matching is chosen. For example:

Matching $a(b(X)fY) = a(b(cd(e))f^2g)$ deterministically creates a single set of unifiers: $X, Y = cd(e), fg$.
Matching $a(XY^2) = a(de^2f)$ deterministically creates a single set of unifiers: $X, Y = df, e$.
Matching $a(b(X)c(\textit{1}X)) = a(b(\textit{1}^2)c(\textit{1}^3))$ deterministically creates one single unifier: $X = \textit{1}^2$.
Matching $a(b(X)c(\textit{1}X)) = a(b(\textit{1}^2)c(\textit{1}^2))$ fails.
Matching $a(XY) = a(df)$ non-deterministically creates one of the following four sets of unifiers: $X, Y = \lambda , df$; $X, Y = df, \lambda $; $X, Y = d, f$; $X, Y = f, d$.
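The matching described above can be sketched as a small backtracking search. This is a minimal illustration, not the paper's implementation: the encoding (lowercase strings for atoms, uppercase-initial strings for variables, `(functor, [arguments])` pairs for terms) and all function names are assumptions. The sketch enumerates every assignment permitted by the stated rules, where an atom matches one copy of itself and a variable matches any multiset of ground terms, including $\lambda$.

```python
from collections import Counter
from itertools import product

# Assumed encoding (not from the paper): atoms are lowercase strings or digits,
# variables are strings starting with an uppercase letter, and a term is a
# pair (functor, [argument symbols]); the argument list is read as a multiset.

def is_var(s):
    return isinstance(s, str) and s[:1].isupper()

def canon(sym):
    """Canonical, hashable form of a ground symbol (arguments sorted)."""
    if isinstance(sym, str):
        return sym
    functor, args = sym
    return (functor, tuple(sorted((canon(a) for a in args), key=repr)))

def remove(bag, items):
    """Return bag minus items as a list, or None if items is not contained in bag."""
    left, need = Counter(bag), Counter(items)
    if any(left[k] < n for k, n in need.items()):
        return None
    return list((left - need).elements())

def sub_bags(bag):
    """Yield every distinct sub-multiset of bag (including the empty one)."""
    counts = Counter(bag)
    for picks in product(*(range(n + 1) for n in counts.values())):
        yield [k for k, n in zip(counts, picks) for _ in range(n)]

def match(pattern, ground, env):
    """Yield each extension of env under which pattern equals the ground multiset."""
    pattern = sorted(pattern, key=is_var)      # atoms and terms first, variables last
    if not pattern:
        if not ground:
            yield env
        return
    head, rest = pattern[0], pattern[1:]
    if is_var(head):
        # A bound variable must reuse its value; a free one may take any sub-multiset.
        options = [list(env[head])] if head in env else sub_bags(ground)
        for chosen in options:
            left = remove(ground, chosen)
            if left is not None:
                value = tuple(sorted(chosen, key=repr))
                yield from match(rest, left, {**env, head: value})
    elif isinstance(head, str):
        left = remove(ground, [head])          # an atom matches one copy of itself
        if left is not None:
            yield from match(rest, left, env)
    else:
        functor, args = head
        for g in set(ground):                  # try each distinct ground term
            if not isinstance(g, str) and g[0] == functor:
                for env2 in match(args, g[1], env):
                    yield from match(rest, remove(ground, [g]), env2)

def unifiers(pattern, ground):
    """All distinct unifier sets for one pattern symbol against one ground symbol."""
    found = {frozenset(env.items()) for env in match([pattern], [canon(ground)], {})}
    return [dict(f) for f in found]

# Example from the text: a(XY) against a(df) admits four unifier sets.
for u in unifiers(("a", ["X", "Y"]), ("a", ["d", "f"])):
    print(u)
```

A cP system would pick one of the returned unifier sets arbitrarily; the sketch returns all of them so that the source of the non-determinism is visible. Empty tuples in the output stand for $\lambda$.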

A.2 High-Level or Generic Rules

Typically, our rules use states and are applied top-down, in the so-called weak priority order.

Pattern matching. Rules are matched against cell contents using the pattern matching discussed above, which involves the rule’s left-hand side, promoters and inhibitors. Moreover, the matching is valid only if, after substituting variables by their values, the rule’s right-hand side contains ground terms only (so no free variables are injected in the cell or sent to its neighbours), as illustrated by the following sample scenario:

The cell’s current content includes the ground term:

$n(a \, \phi (b \, \phi (c) \, \psi (d)) \, \psi (e)).$
The following (state-less) rewriting rule is considered:

$n(X \, \phi (Y \, \phi (Y_1) \, \psi (Y_2)) \, \psi (Z)) ~ \rightarrow ~ v(X) \, n(Y \, \phi (Y_2) \, \psi (Y_1)) \, v(Z).$
Our pattern matching determines the following unifiers:

$X = a$, $Y = b$, $Y_1 = c$, $ Y_2 = d$, $Z = e$.
This is a valid matching and, after substitutions, the rule’s right-hand side gives the new content:

$v(a) ~ n(b \, \phi (d) \, \psi (c)) ~ v(e).$
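The substitution step of this scenario can be reproduced with a short sketch. It assumes an illustrative encoding that is not part of the paper: atoms are lowercase strings, variables are uppercase-initial strings, terms are `(functor, [arguments])` pairs, and the functors $\phi$ and $\psi$ are written `"phi"` and `"psi"`. The unifiers are supplied by hand, exactly as listed above; the function only substitutes them and rejects any variable left unbound.

```python
def is_var(s):
    return isinstance(s, str) and s[:1].isupper()

def substitute(symbols, env):
    """Replace every variable in a multiset of symbols by the multiset bound to it."""
    out = []
    for s in symbols:
        if is_var(s):
            if s not in env:
                # The matching is invalid: a free variable would enter the cell.
                raise ValueError(f"unbound variable {s} on the right-hand side")
            out.extend(env[s])
        elif isinstance(s, str):
            out.append(s)                      # atoms are copied unchanged
        else:
            functor, args = s
            out.append((functor, substitute(args, env)))
    return out

# Right-hand side: v(X) n(Y phi(Y2) psi(Y1)) v(Z)
rhs = [("v", ["X"]),
       ("n", ["Y", ("phi", ["Y2"]), ("psi", ["Y1"])]),
       ("v", ["Z"])]

# Unifiers from the scenario: X = a, Y = b, Y1 = c, Y2 = d, Z = e
env = {"X": ["a"], "Y": ["b"], "Y1": ["c"], "Y2": ["d"], "Z": ["e"]}

print(substitute(rhs, env))
```

The printed structure encodes $v(a) ~ n(b \, \phi (d) \, \psi (c)) ~ v(e)$, the new content given in the scenario.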

Generic rules format. We consider rules of the following generic format (we call this format generic, because it actually defines templates involving variables):
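The component names explained in the list below suggest a template of the following shape. The exact layout and separators are assumptions, reconstructed from those names only:

$$
\textit{current-state} ~ \textit{symbols} \ldots ~ \rightarrow_{\alpha} ~ \textit{target-state} ~ (\textit{in-symbols}) \ldots ~ (\textit{out-symbols})_{\delta} \ldots ~ \mid ~ \textit{promoters} \ldots ~ \neg \, \textit{inhibitors} \ldots
$$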

Where:

current-state and target-state are atoms or terms;
symbols, in-symbols, promoters and inhibitors are symbols;
in-symbols become available after the end of the current step only, as in traditional P systems (we can imagine that these are sent via an ad-hoc fast loopback channel);
subscript $\alpha \in \{\mathtt{min}, \mathtt{max}\}$ indicates the application mode, as further discussed in the example below;
out-symbols are sent, at the end of the step, to the cell’s structural neighbours. These symbols are enclosed in round parentheses which further indicate their destinations, above abbreviated as $\delta $. The most usual scenarios include:
- $(a)\downarrow _i$ indicates that a is sent over outgoing arc i (unicast);
- $(a)\downarrow _{i,\,j}$ indicates that a is sent over outgoing arcs i and j (multicast);
- $(a)\downarrow _\forall $ indicates that a is sent over all outgoing arcs (broadcast).
All symbols sent via one generic rule to the same destination form one single message and they travel together as one single block (even if the generic rule is applied in mode $\mathtt{max}$).

Example. To explain our rule application mode, let us consider a cell, $\sigma $, containing three counter-like complex symbols, $c(\textit{1}^2)$, $c(\textit{1}^2)$, $c(\textit{1}^3)$, and the two possible application modes of the following high-level “decrementing” rule:
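Consistent with the left-hand side $c(\textit{1}\, X)$ and the outcomes discussed below, the decrementing rule can be written as follows; the state names $S_1$ and $S_2$ are assumptions:

$$
(\rho _\alpha ) \quad S_1 ~ c(\textit{1}\, X) ~ \rightarrow _{\alpha } ~ S_2 ~ c(X).
$$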

The left-hand side of rule $\rho _\alpha $, $c(\textit{1}\, X)$, can be unified in three different ways, to each one of the three c symbols extant in cell $\sigma $. Conceptually, we instantiate this rule in three different ways, each one tied and applicable to a distinct symbol:
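Under the same assumption about the state names $S_1$ and $S_2$, the three instances, one per counter in cell $\sigma $, can be written out as:

$$
\begin{array}{ll}
(\rho _1) & S_1 ~ c(\textit{1}^2) ~ \rightarrow ~ S_2 ~ c(\textit{1}), \\
(\rho _2) & S_1 ~ c(\textit{1}^2) ~ \rightarrow ~ S_2 ~ c(\textit{1}), \\
(\rho _3) & S_1 ~ c(\textit{1}^3) ~ \rightarrow ~ S_2 ~ c(\textit{1}^2).
\end{array}
$$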

1. If $\alpha = \mathtt{min}$, rule $\rho _{\mathtt{min}}$ non-deterministically selects and applies one of these virtual rules $\rho _1$, $\rho _2$, $\rho _3$. Using $\rho _1$ or $\rho _2$, cell $\sigma $ ends with counters $c(\textit{1})$, $c(\textit{1}^2)$, $c(\textit{1}^3)$. Using $\rho _3$, cell $\sigma $ ends with counters $c(\textit{1}^2)$, $c(\textit{1}^2)$, $c(\textit{1}^2)$.
2. If $\alpha = \mathtt{max}$, rule $\rho _{\mathtt{max}}$ applies in parallel all these virtual rules $\rho _1$, $\rho _2$, $\rho _3$. Cell $\sigma $ ends with counters $c(\textit{1})$, $c(\textit{1})$, $c(\textit{1}^2)$.
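The two application modes can be simulated with a few lines of Python. This is a sketch under simplifying assumptions: each counter $c(\textit{1}^n)$ is stored as the integer $n$, states are ignored, and the function name `apply_decrement` is hypothetical.

```python
import random

def apply_decrement(counters, mode, rng=random):
    """Apply c(1 X) -> c(X) for one step; each counter c(1^n) is stored as the int n."""
    # c(1 X) only unifies with counters that hold at least one copy of 1.
    applicable = [i for i, n in enumerate(counters) if n >= 1]
    if not applicable:
        return list(counters)
    if mode == "max":
        chosen = set(applicable)               # every instance fires in parallel
    elif mode == "min":
        chosen = {rng.choice(applicable)}      # one instance, picked arbitrarily
    else:
        raise ValueError("mode must be 'min' or 'max'")
    return [n - 1 if i in chosen else n for i, n in enumerate(counters)]

print(apply_decrement([2, 2, 3], "max"))            # [1, 1, 2]
print(sorted(apply_decrement([2, 2, 3], "min")))    # one of the two outcomes in item 1
```

In `min` mode the sorted result is either `[1, 2, 3]` or `[2, 2, 2]`, depending on which instance the random choice selects.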

Special cases. Simple scenarios involving generic rules are sometimes semantically equivalent to loop-based sets of non-generic rules. For example, consider the rule

$$ S_1 ~ a(x(I) \; y(J)) ~ \rightarrow _{\mathtt{max}} ~ S_2 ~ b(I) ~ c(J), $$

where the cell’s contents guarantee that I and J only match integers in ranges [1, n] and [1, m], respectively. Under these assumptions, this rule is equivalent to the following set of non-generic rules:

$$ S_1 ~ a_{i,j} ~ \rightarrow S_2 ~ b_i ~ c_j, ~ \forall i \in [1,n], j \in [1,m]. $$

However, unification is a much more powerful concept, which cannot be generally reduced to simple loops.
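The loop-based expansion in the special case above can be made concrete with a short sketch that lists the non-generic rules as strings. The function name and the textual rule format are assumptions chosen for illustration.

```python
def expand_rules(n, m):
    """List the n*m non-generic rules S1 a_{i,j} -> S2 b_i c_j as strings."""
    return [f"S1 a_{i},{j} -> S2 b_{i} c_{j}"
            for i in range(1, n + 1)
            for j in range(1, m + 1)]

rules = expand_rules(3, 4)
print(len(rules))    # 12
print(rules[0])      # S1 a_1,1 -> S2 b_1 c_1
```

The expanded ruleset grows as $n \cdot m$, whereas the generic form remains a single rule whatever the ranges are.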

Benefits. Generic rules of this type allow (i) reasonably fast parsing and processing of subcomponents, and (ii) algorithm descriptions with fixed-size alphabets and fixed-size rulesets, independent of the size of the problem and the number of cells in the system (often impossible with only atomic symbols).

Synchronous vs asynchronous. In our models, we do not make any syntactic difference between the synchronous and asynchronous scenarios; this is strictly a runtime assumption [4]. Any model can run on both the synchronous and asynchronous runtime “engines”, although the results may differ. Our asynchronous model closely matches the standard definition of asynchronicity used in distributed algorithms [3, 11]; however, this is not needed in this paper, so we do not pursue this topic here.

Cite this paper

Nicolescu, R. (2018). Most Common Words – A cP Systems Solution. In: Gheorghe, M., Rozenberg, G., Salomaa, A., Zandron, C. (eds) Membrane Computing. CMC 2017. Lecture Notes in Computer Science, vol 10725. Springer, Cham. https://doi.org/10.1007/978-3-319-73359-3_14

DOI: https://doi.org/10.1007/978-3-319-73359-3_14
Published: 31 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73358-6
Online ISBN: 978-3-319-73359-3
eBook Packages: Computer Science, Computer Science (R0)
