Hard to Evaluate: Coaching Services

  • Siegfried Greif


Coaching is a service which is co-produced in interaction with the client as well as being both intangible and confidential. Coaching processes are characterized by a high level of complexity and disparity and are as a result almost impossible to standardize. Taken together, these characteristics lead to coaching being a service that is hard to evaluate for both the customer and the client. As a result, coaching is often only informally evaluated, which brings with it the risk that these assessments turn out to be devaluing and indeed for the client even discriminatory. Professionalism demands that the particularities of coaching are accommodated by the use of sound scientific methods of evaluation. An evaluation model will be presented here that encompasses the three criteria of structural, process and result quality.


Keywords: Customer Satisfaction, Scientific Evaluation, Emotional Clarity, Social Comparison Theory, Management Coaching

3.1 Controversies Surrounding Coaching as a Service, and Evaluation Methods

“Coaching is not a service”, but rather “another useful form of interaction”, coaching pioneer Looss (2014) recently and trenchantly declared in his keynote speech at a coaching congress. He justifies his thesis with the point that coaching is difficult to rate in terms of categories of achievement. In his conclusions on the possibility of evaluating coaching, he did not fundamentally speak out against its assessment; however, in his view such evaluations are often “desperate and moving attempts to make life predictable”. He is presumably not the only one to hold this critical view. Quite a few professional coaches evidently have a lot to say against evaluation methods. Schmidt-Tanger (2014), for example, took the view at the same congress that coaching, in her professional understanding, should never be conducted in a standardized form. It would have a negative impact on the non-standardizable coaching relationship if the human resources department of a company were to ask the clients to rate their coaching with predetermined standard questions. In this way, standardization of the coaching process would be encouraged.

How legitimate are these or similar frequently heard fundamental objections and reservations regarding the evaluation of coaching? Must a new third category of paid work be created for coaching beyond the classical differentiation between goods and services? One that encompasses forms that provide services that are not generally tangible? Should coaching associations press for coaching not to be evaluated, or at least not using standardized methods, or should they go with the scientists who hold that the scientific assessment of coaching is necessary in order to make the “best practices” usable for clients, as demanded by Stober and Grant (2006), the international protagonists of a scientifically-based coaching profession? The questions concerning the professionalization of coaching that are dealt with in this contribution refer to the clarification of complex contradictions and conflicts within the profession as well as to the issue of whether a professional self-perception of coaching is conceivable without practical and scientific evaluation. The following section offers an overview of the term “evaluation”, its meaning and various approaches and methods. After a brief introduction to the term and to the general significance of evaluations, the term service is considered as well as the question as to what constitutes the particularities of coaching in comparison with other services and why coaching is hard to evaluate. Finally, the reader is invited to enter into the discourse between science and practice regarding the further development of a professional self-perception and to take an active part in evaluation research as an important basis for professionalization.

3.2 The Term Evaluation and Its Meaning

3.2.1 Various Methods of Evaluation

“Evaluation” is generally understood as “an inquiry process for collecting and synthesizing evidence” for an assessment of the value, worth, or quality of an object, person, or process (Fournier 2005, p. 140). Companies attempt to evaluate the usefulness of coaching in advance, before they start the search for coaches and place an order. Often, companies follow the recommendations of credible experts when making their decisions (Stephan and Gross 2011, p. 178). They could however evaluate scientific studies into quality characteristics and the effectiveness of coaching before making their decisions. In order to check whether the expected results have been achieved after the introduction of coaching, they can attempt to evaluate the organization and conduct of the coaching as well as its results. If they are provided with the resources for this, they can carry out a comprehensive evaluation on the basis of an evaluation model and scientific methods or, more pragmatically, implement only a standard questionnaire on customer satisfaction. This decision is not without consequences for the development of the profession, as will be illustrated in the following.

3.2.2 Why Do People Evaluate?

Why do people evaluate products, services and the people and organizations that produce them? In his classical and today still fundamentally important Social Comparison Theory, the social psychologist Festinger (1954) assumes, in line with evolutionary theory, that it is an existential need of people to compare their own evaluations with the opinions of other relevant persons (cf. the overview by Frey et al. 1993). This comparison of evaluations or “social validation” is necessary in order to avoid misjudgments in a complex and ever-changing world. It enables people to make decisions that are based upon subjectively certain evaluations. This need for comparative evaluation is fundamental. The greater the uncertainty, the stronger the need to evaluate. Festinger differentiates between two types of criteria that are employed in order to evaluate the correctness of one’s own evaluations:
  1. “Objective” criteria (inter-subjectively verifiable data, e.g. physical measurement values, turnover data etc.) and

  2. Social criteria (comparison with the subjective evaluations of other people, for example opinions expressed by other people regarding their experiences with coaches).


According to Festinger, “objective” criteria are generally preferred. If these are not available or appear insufficient for evaluation purposes, a person’s own evaluations or judgements are compared with those of other relevant reference persons. Festinger assumed that, for the purposes of social comparison, people are preferred who are similar to the person doing the comparing. According to later studies, other reliable-seeming persons are also consulted according to the object of evaluation (Frey et al. 1993).

With regard to the evaluation of coaching, we can conclude, like Festinger, that evaluations of coaching are unavoidable. They are based on the fundamental need to safeguard one’s own opinion. The tendency to prefer “objective data” when evaluating is also, according to Festinger, a basis for decision-making that is recommended not only by scientists. It is in accordance with a natural preference for data the quality of which is uncontested.

3.2.3 Evaluations as Social Constructions

In his theory, Festinger sketches how people construct their evaluations by means of processes of social comparison. In doing so he indicated at a very early point that evaluations do not rest upon “objective” data alone, above all when they relate to complex objects of evaluation. Complex evaluations are jointly constructed by means of communication with other persons. Reformulated in constructivist terms, this general conclusion applies not only to the forming of opinions on an individual level but also to communication with others.

Greif et al. (2004) describe how judgments of planned change processes in companies are spread by means of informal evaluation communications via internal communication networks. They assume that these evaluations of, for example, changes, are much more influential than the official evaluating statements made by the companies. Thus informally communicated evaluations influence readiness to support the planned changes as well as whether employees react with open or concealed individual or collective resistance.

According to the practical observations of the author as an external and internal consultant and coach in various organizations, coaching really lends itself to the spreading of informal stories. Coaching, and the managers who ask for coaching, are even made fun of on the informal office grapevine: “The fact that he needs coaching proves that he is incompetent …”. External and internal coaches who are not connected to the internal unofficial channels of communication often have no idea how they and their clients are discriminated against here (especially in the more down-to-earth areas of production).

However, if the need for evaluation is universal, and if evaluations of coaching, as either negative or positive informal social constructions, are unavoidable, it makes little sense to oppose informal assessments and methodical evaluations on principle. The following case study describes the interplay between informal and formal evaluations.

In 1994, at the firm Felix Schoeller (at that time the global market leader for photographic and special papers), a new position and training programme for shift managers in production was created under the specialist control of the author of this chapter. The training modules were based upon methods of self-organized learning (Greif and Scheidewig 1998). In the first training module the trainee shift managers were to concern themselves with changes that were under way in the firm in order to then inform their staff about the planned changes and discuss with them how they could or would like to be involved. It was a tricky task. The changes had been set in motion by a well-known firm of consultants and were based upon the method still popular then of saving on personnel costs by means of Lean Management. The firm of consultants had managed to set almost the entire staff - right up to the senior master workmen and in part even the production line managers - against them and the planned changes. They did this by means of a concept that was in our view basically insufficiently practicable for this industry, as well as by making big mistakes in their communication of the necessary changes. At the same time it was, even in the opinion of the workers’ council, urgently necessary to bring about considerable reductions in costs in order to save jobs.

In the seminar, in agreement with the internal project managers and human resources development, we allowed ourselves the freedom to use as an example a project that was not based upon the Lean Management concept, which was limited to saving personnel costs, but rather an internal logistics implementation project in which no reduction in personnel took place and savings in material costs running into the millions were achieved. It consisted of replacing the plastic packaging of the enormous paper rolls with cardboard rolls, which also happened to be a more environmentally friendly solution. The only idea to be taken up from the Lean Management approach was that of an increase in self-organization and the shifting of responsibility to the lower working levels. To evaluate the project leader’s presentation and discussion of the implementation project in the seminar, the trainee shift managers received a short questionnaire with standard scales and open feedback questions of the type generally used by us at the time for evaluating seminars at the university.

In order to support the trainee shift managers in the preparation of their tricky communication task, we conducted a transfer-coaching process of three to five sessions with them. They knew that their staff would evaluate their information transfer and the discussion anonymously (using the same questionnaire that they themselves had used for feedback in the seminar). This type of evaluation of management behavior had never taken place up to this point in the firm.

In the end we had an implementation rate of 100% of the participants, something we had never before experienced in seminars. All of them received an evaluation of good or very good from their staff for their information meetings. These experiences of success without doubt increased the motivation and self-confidence of the future shift managers. This was then followed by very positive formal reports at the upper management levels. Even in the informal evaluation communications within the teams, the courageous appearances of the trainee shift managers were a subject of discussion. Previously there had been doubt and many informal complaints on almost all levels (including the management levels) about why shift masters were being trained as “shift managers” who then even “needed coaching”. These discriminatory informal evaluation communications ended from that day on.

This case study is an example of how, against the background of Festinger’s theory, a simple evaluation method can make an explainable and very constructive psychological contribution towards increasing the evaluation certainty of the people involved. It additionally demonstrates how in this way discriminatory informal evaluation communications about coaching and the client can be halted or at least reduced. According to Festinger, it is psychologically important to use, where possible, objective numbers (in this case 100% implementation, “80% of employees found the presentation to be good or very good” etc.), but also to include and process as feedback non-standardized expressions of opinion given in response to open questions.

This example also demonstrates why evaluations are social constructs. It was already known to us before the evaluation took place that criticism had been expressed informally and at least in part justifiably concerning the change concept. This led to internally agreed re-evaluations and a partial new construction of the target concept. That which was to be achieved and evaluated in the training program was also developed together with the project management and the head of human resources as well as other stakeholders. The questionnaire, which had been kept deliberately short for use in the team, had to be literally constructed and its implementation had to be carefully negotiated with those responsible in the firm as well as the participants (leaders, human resources, workers’ council, and participants). Even in the most extensive scientific investigations pragmatic selection decisions cannot be avoided. Evaluations thus always retain gaps and can never be conducted as comprehensively as one might wish.

A result that is regarded as a success by the company management and the other participants or people affected is based in the end on a kind of collage of incomplete data and information that is regarded as reliable in the estimation and evaluation communications of the opinion leaders and participants. This does not mean that such information collages and those overall assessments that are constructed by means of the evaluation are entirely arbitrary. If in our example the evaluation conducted by us had been arbitrary or incorrect in the view of the participants (in the way it was carried out or with regard to the result) then we would have found this out very quickly by means of the informal communication networks. If we think further along the lines of the evolution-theoretical assumptions in Festinger’s Social Comparison Theory we can happily postulate that scientists experienced in methodology as well as practitioners capable of reflection are able to find an “applicable and professional” combination of data and opinions by means of which the risk of faulty decisions can be reduced. Anyone who carries out evaluation studies in organizations should have learned to construct them so that they improve the certainty of decision-making and at the same time maintain contact with the informal evaluation networks.

According to scientific standards, evaluation studies must be conducted in a manner that is open to unexpected results. It also follows that (at least within the organization) the results need to be communicated transparently and openly, independently of whether they turn out to be negative or positive. Negative results offer the project leaders and participants the opportunity to initiate and manage an open and transparent discussion about the observed weaknesses of the changes and the need for improvement, in that they themselves first suggest improvements before the measures can be condemned entirely to failure by informal rejection that can no longer be controlled. Applied to the evaluation of coaching, this means that coaches who fundamentally reject accepted practical or scientific evaluation methods for coaching leave the evaluation in the hands of informal judgments and decisions.

In the following section, those particularities of coaching will be examined that make it so hard to evaluate the interactions within coaching as well as their effects. First of all, however, the question of whether coaching can be categorized as a service will be addressed.

3.3 Coaching as a Particular Kind of Service

3.3.1 The Term Service

The classical economic classification of sectors of the economy differentiates between a primary sector (agriculture and forestry, animal husbandry and fishing) and a secondary sector (manufacturing and processing industries, mining, trades and crafts, energy and water supply). The “tertiary” sector was added as a “category for the rest” into which fall all other paid occupations that do not belong in either the first or second categories (Nerdinger 2011, p. 12). It was only recently that this extremely heterogeneous but economically strong and expanding “remaining sector” came to be known under the common term of “services”, and that attempts were made to find commonalities between the services performed in this sector.

The Meaning of the Term “Service”

It is revealing to examine the origins of the term “service” as well as its evolving meanings (Harper 2016). “Servant” has Old French roots in “servant” (foot-soldier) and “servire” in the sense of “to do duty toward, show devotion to” (Latin: “be devoted”). We can derive a role constellation from these word meanings that gives the determining role to the service recipient, whereby recognition for the service provider and for that which he provides is certainly also implicit.

Scientific Definitions of the Term Service

One may be skeptical as to whether general common features can be found for the heterogeneous sector of “services” by means of which it can be clearly differentiated from “goods”. The business economists Meffert and Bruhn (2012, pp. 214–216) describe and discuss various definitions and the “constitutive” characteristics of services. They prefer a definition that (1) includes the necessary potential of the service provider, (2) the process by means of which the service is provided, and finally (3) the result aimed for. They thus define services in a highly complex manner as “independent marketable performances/services that are connected with the provision of (e.g. insurance services) and/or the employment of abilities (an example of potential-oriented services is hairdressing services). Internal (for example business premises, personnel, equipment) and external factors (that do not lie within the realm of influence of the service provider) are combined within the framework of the process of creation (process orientation). The factorial combination of the service provider is employed with the goal of achieving useful effects (e.g. a car inspection) for people (customers) and their objects (e.g. their cars and result orientation)” (Meffert and Bruhn 2012, pp. 238–239; free translation). The authors explicitly emphasize that their demarcation by means of definition does not intend to “imply a homogeneity that is non-existent in the service sector”, but rather emphasize the great variety and heterogeneity, which make a more exact differentiation of different types of services necessary.

The definition appears complicated. To put it simply, three main components are addressed: (1) the necessary service skills (or potential) of the service provider, (2) internal and external factors in the process of producing the services and (3) as a result, “useful effects” (for people and their goals). Coaching can be subsumed unambiguously and unproblematically under this definition.

The question remains, however, as to whether all goods or products manufactured within a firm can be excluded using this definition. This also has to do with the fact that manufacturers nowadays ever more frequently combine the sale of their products with the provision of services (for example specialist advice helping the customer to install the product and servicing of the appliances at the customer’s location). There are also products that would as such fall under the definition offered by Meffert and Bruhn. One might for example think that the manufacturing of a skin cream demands “skills” as well as the factors generally listed in the definition for the manufacturing process, and in particular produces a “useful effect” for people in the literal sense. Skin cream is however clearly not a service.

Maleri and Frietsche (2008) point out three characteristics that are more suitable for demarcation purposes: (1) the immaterial nature of the service, (2) the employment of so-called external factors in the production process (i.e., that the clients participate as external factors in the production of the service) and (3) “no raw materials are used”.

In the view of the economists quoted here, the evaluation of the quality of the results of services is difficult because services are immaterial or intangible. It follows that objective measurement values are hardly suitable, and that instead the subjective perceptions and judgments of the customers need to be called upon. Examples are surveys on customer satisfaction, but in particular on customer loyalty and willingness to recommend. These subjective evaluations are taken very seriously by economists because the turnover of service providers has been shown empirically to be positively influenced by them (see Nerdinger’s summary 2011, p. 105 ff.).

3.3.2 Characteristics and Various Types of Services

Major Characteristics and Types of Services

The defining major characteristics of services are summarized very clearly and succinctly by Schneider and White (2004). The characteristics can be more or less pronounced depending on the service. In the following, these general features are described using quotations from Schneider and White (2004, pp. 6–9). Afterwards, the individual characteristics are regarded by the author of this chapter in relation to coaching and questions are raised with regard to the evaluation of coaching.
  1. Relative Intangibility

    “Pure services cannot be seen, touched, held, or stored - they have no physical manifestation.” Many services have however tangible components, for example car repair, in which a new exhaust pipe is installed. When we apply the major features of services described by Schneider and White (2004) to coaching, then this occupation clearly belongs to those services that are almost entirely intangible. It may be that in individual cases partial exceptions exist, for example when a client takes with him a questionnaire that he has filled in or notes an appointment in his calendar. The evaluation of the quality of intangible services is difficult. As already mentioned above, subjective perceptions and judgments or observable consequences of assumed intangible changes have to be resorted to. How this is possible is described below.

  2. Relative Inseparability

    Listening to a symphony orchestra or watching a play at the theatre are examples of “pure” services in which production and consumption by the customer are not separable in terms of time. They are not produced at one time and then stored at another time or in another place for later use. They are produced by the service provider and at the same time “consumed” by the customer in a delivery experience. This simultaneity is also known by the Latin term uno-actu principle. When production and consumption take place simultaneously, this decreases the possibilities for testing and improving quality before delivery.


Coaching belongs without doubt to those services that are, according to Schneider and White (2004), partly inseparable. The service comes about in an interactive process between coach and client. One example is a client who in a moment during the coaching process develops a strong feeling of being understood and supported by the coach. The process of coaching is a substantial part of the outcome. However, the clients can also, strictly speaking, “take something with them” from the coaching process if they keep in mind the support of the coach or a newly won insight, or especially if they note down a plan as to how they can finally implement a change in behavior in their everyday lives for which they have been aiming for a long time.

Since the clients play a major part in the interaction and the result generated, one can speak in the case of coaching of a “co-production” of the service. Constructivists prefer to use the term “co-construction” here. However, “construction” is often used in the context of building, e.g. “Street under construction” etc. If one wants to emphasize that at least for the client something new has come into existence during the jointly created service, then one can describe this process as “co-creation”. One consequence for the evaluation of the quality of coaching is that not only the competence and behavior of the coach within the process need to be analyzed in connection with the interaction, but also those of the client.
  3. Relative Heterogeneity

    Coaching is extremely heterogeneous and manifold in form, and in the end as varied as the clients and their themes, goals and contexts. Schneider and White (2004, pp. 8–9) emphasize the point that services are more heterogeneous and less open to standardization than goods in their production and delivery: they also depend on the interaction with the client. They must in each case be adapted to the individual expectations, abilities and previous knowledge of the customer.


In coaching, the diversity of clients or customers that is described by Schneider and White (2004) is no doubt very clear to all coaches, and they attempt to take this into account by adapting their methods and tools. Some feel that in coaching it doesn’t make much sense to use tools or standard methods of evaluation. These methods do not however need to be seen as imperative: they can instead be understood as rules of thumb or heuristic principles that are useful for orientation purposes but that always need to be adapted for use. (This applies explicitly to this author’s guidelines.)

It is possible to determine profiles for the different types of services on the basis of how pronounced each of the features described is, as shown in Fig. 3.1.
Fig. 3.1 Different types of services

In Fig. 3.1, the profiles of three different typical services are compared: coaching, textile sales in a clothes shop for demanding customers and selling burgers in a restaurant from one of the large chains. The coaching profile stands out clearly with its extremely low level of tangibility and high values for inseparability and heterogeneity. Were we however to draw up a profile for psychotherapeutic work, then it would take on a similar appearance. A professional counseling session conducted according to a prescribed standard procedure would likewise demonstrate a similar profile in terms of tangibility and inseparability (although with significantly less pronounced values), but have a much lower level of heterogeneity. In a burger restaurant, tangible, standardized edible goods are sold. Only the standard design of the restaurant, the outer appearance and taste of the burger, and the friendliness of the personnel would altogether lead to the value for tangibility being located just below the maximum value. Since however the production and consumption of the burger are clearly separable, and since the standardization of the sale of the burgers makes almost no allowance for differences between customers, this service is very similar to the production and delivery of goods in the manufacturing trade. What is interesting is that in high-end clothes shops, the necessity for a psychologically favorable presentation of the current items of clothing reduces the tangibility of this service. The individual and different wishes of the customers must be taken very much into account, which is why the heterogeneity is relatively high.

Research into the particularities of services and the development of methods for evaluating the quality of services is generally still relatively young and not very reliable, as Meffert and Bruhn (2012) found. Schneider and White (2004) rely on research from the field of service psychology (Nerdinger 2011; Schneider and Bowen 1995).

Qualifications of the Clients

Schneider and Bowen (1995, p. 93) found that many customers do not understand their role in the co-production of the service. It is therefore necessary at least to inform the customers of what is expected of them as co-producers. Schneider and Bowen recommend finding suitable ways of getting across to the customers what exactly is expected of them as co-producers and how they can make a contribution towards more efficient processes and optimal results. This is already common practice for some complex services, for example before a new company software system is introduced: the customers receive a specialist introduction and training.

As far as coaching is concerned, it would appear questionable whether the usual short introduction in the first coaching session with information about the general principles of coaching is sufficient, and whether the client is able to implement them straight away in the coaching process. It would appear more appropriate to give staff an introduction to coaching with a special further education module, for example in the case of managers within the framework of their management seminars. In a study project in the field of occupational and organizational psychology at the University of Osnabrück we developed a model of this kind for future managers with three or four “test coaching” sessions. Acceptance and satisfaction on the part of the participants as well as interest in coaching were consistently high.

Further Typical Coaching Characteristics

Alongside the general characteristics of services addressed above, the following features are particularly characteristic for coaching:
  • Strict discretion: Coaches commit themselves to strict discretion with regard to their clients. Unlike doctors and psychotherapists, however, they are not strictly bound by professional secrecy laws. Nevertheless, evaluation studies demand strict anonymity for the participants. This can be ensured by means of strict anonymization procedures and trustworthy evaluators (for example independent scientists).

  • Relative independence of the purchaser and service recipient: Unlike psychotherapy, which in many countries is paid for by health insurance companies, coaching is, with the exception of private customers, commissioned and financed by firms in a manner similar to company training measures. However, the subjects and goals in coaching are not completely determined by the purchaser, as is the case with training measures, but rather by the clients themselves. The firm’s goals are presented to the client in the form of expectations. Because of the discretion aspect, the purchaser can control only indirectly the extent to which these goals are taken into account in the process, for example by means of talks before and after the coaching process. It would be interesting to take a closer look at how often conflicts result from this very particular constellation and whether they affect the level of acceptance of coaching on the part of the purchaser.

3.3.3 Coaching as a Difficult Service to Assess

Meffert and Bruhn (2012) closely examine the complexity of services and its significance for the customers’ estimation of service quality. Similar to Benkenstein and Güthoff (1996), they differentiate between five dimensions of complexity: (1) the number of service components, (2) multi-personality (number of persons involved), (3) heterogeneity of the service components, (4) length of the period of production and (5) the individuality of the customers (cited in Meffert and Bruhn 2012, pp. 330–341). They use these dimensions to differentiate between types of services. As an example of a service with a high level of complexity in all dimensions they name a heart transplant operation, whereas the dispensing of cash from a cash dispenser scores low on all dimensions. Coaching would be categorized as a service whose complexity is very high on four of these dimensions. The only exception is the dimension multi-personality, because coaching is normally conducted by only one person.
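The classification described above can be sketched as a small data structure. This is only an illustration: the text reports the three services as high or low on each dimension, so the binary high/low coding, the class name `ServiceProfile` and the dimension identifiers are assumptions made for the example, not part of Meffert and Bruhn’s typology.

```python
from dataclasses import dataclass

# The five complexity dimensions named in the text (Meffert and Bruhn 2012).
# The identifiers are hypothetical labels chosen for this sketch.
DIMENSIONS = (
    "number_of_components",
    "multi_personality",
    "heterogeneity",
    "production_period",
    "customer_individuality",
)


@dataclass(frozen=True)
class ServiceProfile:
    name: str
    high_on: frozenset  # dimensions rated "high"; all others count as "low"

    def complexity_count(self) -> int:
        """Number of dimensions on which the service is rated high."""
        return len(self.high_on & set(DIMENSIONS))


# The three examples as described in the text.
heart_transplant = ServiceProfile("heart transplant", frozenset(DIMENSIONS))
cash_dispenser = ServiceProfile("cash dispenser", frozenset())
coaching = ServiceProfile(
    "coaching", frozenset(DIMENSIONS) - {"multi_personality"}
)

for service in (heart_transplant, coaching, cash_dispenser):
    print(f"{service.name}: high on {service.complexity_count()} of {len(DIMENSIONS)}")
```

A finer-grained rating scale per dimension would be a natural extension, but the text gives no such values.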

Complex services can be evaluated only partially and with difficulty by the customers, according to Meffert and Bruhn (2012). This means that their purchase risk is high (Meffert and Bruhn 2012, p. 2611). In order to reduce the evaluation uncertainty that results from incomplete information, customers rely instead on trust-inducing so-called “surrogates”, such as the quality of the relationship (the perceived goodness of the relationship between service provider and customer, in particular the customer’s subjective level of trust and the familiarity between customer and service provider; Meffert and Bruhn 2012, p. 991), or simply how well known the provider is or the quality of the brand that he represents (p. 1420).

Coaching can profit as a service from the fact that a high level of trust can be established in the working relationship (Palmer and McDowall 2010), which may then be used by the customer as a surrogate for the evaluation of the goodness of the service. The confidentiality aspect however makes it difficult for the majority of people who have no coaching experience to evaluate complex coaching services in advance on the basis of verifiable information. That which takes place “in secret” can easily be taken as a surface for the projection of accusations and acts of discrimination in informal evaluations. The term “coach”, unlike the term “psychotherapist”, is not a protected professional term, and the quality of the certificates awarded by coaching associations for coaching qualifications or the status of coach is hardly verifiable for the customer as a measure of quality.

3.4 Coaching as a Profession at the Crossroads

3.4.1 Understanding Coaching as a Particular Form of Service

According to Fietze, from the point of view of the sociology of professions, scientific research has an important role to play in the professionalization of coaching (Fietze 2011, p. 31 and Chap. 1 in this book). Research serves to increase the legitimization of coaching as an independent profession in that it contributes towards clarification of the coaching concept by means of scientific discourse as well as proving its effectiveness by means of scientific evaluation studies (see Kotte et al., Chap. 2 in this book). However, more attention should also be paid in the debate on professionalization to the fact that science not only has a legitimization function for the profession but also an orientation function, as Fietze notes. This is achieved by means of discourses between practitioners and scientists regarding the professional understanding of coaching as well as with regard to clarifying its constituent features. Comparisons with other professions and clarification of the boundaries to them play a significant role in this.

As has been shown, coaching can be understood as a service and can thus be linked to the interdisciplinary efforts to clarify and research the characteristics of services and to differentiate between types of services. In summary, five features characterize the particular nature of coaching:
  1. A very high level of intangibility,
  2. Inseparability (uno-actu principle) and co-creation of the service provided,
  3. A high level of heterogeneity (taking into account the individuality of the client),
  4. Strict discretion, and
  5. Relative independence of the purchaser and service recipient.


When we look at these characteristics, we see that the characteristics profile of coaching differs clearly from that of most other services. A discussion of the similarities and differences between coaching and other services can contribute towards the further development of a differentiated professional self-understanding within the profession and help to clarify “the contours of the professional identity” (Fietze 2011, p. 27, free translation). There are services that are comparable with coaching in many, though not all, aspects. We can examine here whether these professions struggle with comparable problems in evaluating the service provided. Psychotherapy, for example, could be discussed as an interesting case for comparison in many respects, even though its solutions can hardly be adopted as a transferable model.

In terms of the public image of the profession, it would make little sense for coaching to shut itself out from the current attempts at clarification in the service sciences by regarding itself as a form of interaction or activity that is incomparable to services, when it can obviously be categorized as a service. When in addition the service sciences, and in particular economic analyses, are rejected wholesale, this leads in my view to a dead end for the future of the profession, because it is blind to current scientific discourses. At the current crossroads for the further development of the self-perception of coaching, connecting with current service-science and economic analyses would be more constructive. It is not true that economists only ever evaluate things in terms of economic figures. The economists quoted here regard services in their standard works through more than just “economic glasses”; instead they analyze in a differentiated manner the difficulties and possibilities with regard to how complex, co-produced, “immaterial services” can be evaluated, communicated and improved in terms of their quality. Drawing more closely on the language of the economists could even help the coaching profession to overcome the communication barriers separating it from its business customers and managers, as well as make comprehensible to them how coaching services can be co-created, evaluated, and improved in terms of quality by means of common efforts.

3.4.2 Taking on the Challenges of Evaluation

The client’s difficulty in unambiguously evaluating the quality of coaching or coaches is the downside of the particular features of coaching as a service. Coaches who believe that coaching is impossible to evaluate and that it would be better to do entirely without practical and scientific evaluation do not thereby get rid of people’s interest in evaluating coaching. They leave the evaluation of coaching to those informal forms of evaluation that can quickly lead to negative discrimination against the client. These kinds of “evaluation communications” (see above) are very difficult to influence by means of professional evaluations of coaching. The advantage of scientific evaluations is that according to academic standards they are transparent in terms of the methods and criteria used and can thus be criticized very exactly.

Companies mainly rely on personal recommendations, that is, informal evaluative communications, when looking for suitable coaches (Stephan and Gross 2011, p. 168). They attempt to ensure the quality of the coaching by means of “an exact clarification of the task and definition of the goal from the start” (35% of the companies), retrospective ratings of goal attainment (46%), and feedback dialogues (35%) (Stephan and Gross 2011, p. 178; only the highest percentages of the survey of 51 executives in charge of coaching are cited). In the face of such rudimentary criteria for evaluation, which can be easily criticized, we are still a long way away from a professional evaluation of coaches and coaching. It is a challenge for science and the profession to promote the development and use of better-suited systematic methods of investigation and criteria for the evaluation and improvement of the quality of coaching. Evaluation research into psychotherapeutic methods (Hautzinger and Pauli 2009) can serve as an example for how these challenges can be overcome despite perhaps even greater difficulties in terms of evaluation.

With the current state of research, the development of an “evaluation model” is to be recommended as a basis when planning a systematic evaluation (Wottawa and Thierau 1998). When undertaking an evaluation of coaching, the systematic and personal preconditions, the measures and process characteristics that appear relevant, and the expected results need to be included and specified in a model of this kind. In the case of services, these distinctions correspond to the criteria of structural, process and result quality. On the basis of this model, suitable methods can then be selected with which the quality characteristics can be assessed. In a general evaluation model for coaching, Greif (2013, 2017) differentiates the following:
  1. Antecedents of coaching, which influence the outcome before coaching has started (characteristics of coaches, clients and organizations),
  2. Coaching process (coaching relationship, attributes and behavior of coach and client, and their interactions and instantaneous outcome),
  3. Short-term outcome (general evaluation criteria as well as criteria specific to coaching for the clients, coach, and organization),
  4. Long-term outcome (criteria for the clients, coach, and organization),
  5. Favorable organizational context conditions (see also Chap. 2 in this book).


The model can, like any model, be further extended or differentiated as new knowledge is gained.
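For planning purposes, the five components of the model can be written down as a simple checklist. The sketch below is one possible way of doing this, not the author’s procedure. The assignment of each component to structural, process or result quality is an assumed reading of the text, and the class name, function names and planned methods are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    quality: str  # structural / process / result (assumed mapping, not from the source)
    methods: list = field(default_factory=list)  # instruments planned for this component


def build_model():
    """The five components of the general evaluation model (Greif 2013, 2017)."""
    return [
        Component("antecedents", "structural"),
        Component("coaching process", "process"),
        Component("short-term outcome", "result"),
        Component("long-term outcome", "result"),
        Component("organizational context conditions", "structural"),
    ]


def missing_methods(model):
    """Names of components for which no instrument has been planned yet."""
    return [c.name for c in model if not c.methods]


# Hypothetical planning state: methods chosen for two components so far.
model = build_model()
model[2].methods += ["goal attainment rating", "positive and negative affect scales"]
model[1].methods.append("qualitative interview")

print(missing_methods(model))
```

Keeping the components in a plain list makes it easy to extend or differentiate the model later, in line with the remark above.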

Attempts have been made by consultants to estimate the economic return on investment through coaching (Anderson and Anderson 2005). The implausibly high estimates (for example an increase of the ROI of 689%) result from estimation artifacts (Greif 2013). Coaching can probably influence indicators of economic efficiency at best indirectly and in the long term (when for instance more employees actively participate in organizational changes as a result of a transformational leadership style resulting from coaching).

With regard to the individual criteria of the evaluation model, the current state of research already makes it possible to list examples of recommended instruments and methods that have demonstrated themselves to be scientific and practicable for their assessment (Greif 2013, 2017). In the following, criteria are listed as examples that appear suitable for the evaluation of the short-term results for the client.

The first sub-group of criteria is made up of general scales, which can also be used to compare coaching with other services, for example training (for more details see Greif 2013, 2017):
  1. Questionnaires for evaluating customer satisfaction and loyalty. (Here, established national and international scales can be used, as recommended by Meffert and Bruhn 2012 for comparative evaluations of services.)
  2. Ratings of the degree of goal attainment or satisfaction with goal achievement. Alongside ratings of customer satisfaction, this is apparently the most commonly used criterion for evaluating coaching.
  3. Scales of positive and negative affect. (With this very short questionnaire people can describe their current feelings. Affect scales of this type are very reliable and “robust” in their practical application, and are used both nationally and internationally in the most varied of fields of application, for example for evaluating psychotherapy and other interventions in the health sector. Coaching appears to reduce negative affect and feelings of helplessness.)
  4. Self-esteem. (This construct is considered to be an important protective mental basis for personality. Coaching can make an increase in strength possible here.)
  5. General well-being. (Scales assessing general well-being are used in many fields of psychosocial research, above all in life or health coaching.)

Instruments and scales also exist with which effects can be measured that are more specific to coaching, such as (for more details see Greif 2013):
  1. Evaluating goal clarity (for example before and after coaching).
  2. Self-efficacy scales. (Here the subjects are asked to what extent they expect to be able to cope with impending tasks effectively and successfully. Self-efficacy is a beneficial motivational precondition for starting to act. It can be increased by coaching.)
  3. Emotional clarity. (This scale is related to one of the dimensions of emotional intelligence. It is notable that coaching can promote self-perceived clarity with regard to one’s own feelings.)
  4. Increase in result-oriented problem- and self-reflection. (These scales ask whether different kinds of problem- or self-reflections have led to results. Strong effects from coaching could be demonstrated using them. Their application for evaluation is preferable to several standard scales applied in the field, which are problematic because they correlate with rumination and depression, see Greif and Berg 2011.)


The selection put together here only includes scales (Greif 2013) whose statistical reliability and construct validity (that is, that the internal dimensional structure is in accordance with expectations) have been checked. In addition, their correlation with further external criteria has been investigated. Even though they are only based upon the self-report and estimations of the clients, they allow conclusions to be drawn about important changes. Wherever possible, however, the constructs should be secured through a multi-method approach (applying a variety of data types and sources), that is, through additional behavioral data or at least reports from a number of different people: examples would be 360° ratings, or in the case of stress management coaching by measuring blood pressure, heart rate variability or other values that are easy to measure. So far, however, only a few such studies exist for coaching.
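A before-and-after comparison of the kind mentioned above for goal clarity can be summarized with a few lines of code. The sketch below is purely illustrative: the ratings are invented for the example, the function name `pre_post_change` is hypothetical, and the standardized mean difference for paired data is a common descriptive statistic chosen here as an assumption, not a method prescribed by the sources cited.

```python
from statistics import mean, stdev


def pre_post_change(pre, post):
    """Mean change and standardized mean change for paired self-report ratings.

    pre and post hold one rating per client, in the same client order.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired ratings of equal length")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    # Standardized change is undefined if every client changed by the same amount.
    effect = mean_diff / sd_diff if sd_diff > 0 else float("nan")
    return mean_diff, effect


# Invented example ratings on a 1-5 goal clarity scale (not real data).
goal_clarity_before = [2, 3, 3, 4, 2]
goal_clarity_after = [4, 4, 3, 5, 4]

change, standardized = pre_post_change(goal_clarity_before, goal_clarity_after)
print(f"mean change: {change:.2f}, standardized change: {standardized:.2f}")
```

Without a comparison group, such a figure only describes change and cannot by itself attribute it to coaching, which is one reason why a multi-method approach is recommended.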

Alongside quantitative methods and data, qualitative methods are also always to be recommended in the “method mix” (Greif 2013, 2017). Examples would be interviews in which the subjects freely describe their coaching experiences and results and answer open questions on predetermined subjects. Anonymous case studies can be put together using this method in which it can be shown by means of examples how coaching has been conducted. Clearly described case studies are very important for illustrating intangible and hard to evaluate services and as a basis for the qualification of clients.

One constructivist method, which prompts the client and coach to reflect together on the coaching that has taken place and on the perceived subjective causes and results, is the Coaching-Explorer. 1 With this method, coach and client note the perceived results and the subjective causes of the individual results in key words on small cards (using different colors for positive results, for negative or unachieved results, and for the related subjective causes). By means of this structure-laying technique, the subjective theories of both the coach and client can be reconstructed and visualized. The structure-images of the coaches and the clients are compared with each other and reflected upon.

When using this method a strong spontaneous interest can be observed on the part of the client when comparing their evaluations and explanations with those of the coach. The clients are happy to invest the additional time necessary in order to do this in the final session. This observation matches the assumption formulated above after Festinger (1954) that people have a strong need to compare their own judgments with those of other relevant persons in order to reduce their evaluation uncertainty. Clients can learn by example how coaching functions and what results can be targeted by means of this joint reflection and the coach’s explanations. By means of the visualization using the card technique both coach and client can very quickly decide which of the results noted on the cards can be communicated without problem to the client’s boss or colleagues and how the client can effectively explain the coaching process in the case of curious enquiries without giving away anything confidential. The clients thus become coaching experts in their own right as well as disseminators of the method.

3.4.3 Every Coaching Professional Is Called upon

Development of the profession is not a task that can be dealt with by means of well-formulated sentences in the statutes of the coaching federations. Only when it is supported by the attitudes and behavior of the professionals can it be effective. Every coach is confronted with the question as to whether and by means of which practical and scientific methods coaching can in their view be at least partially evaluated, and must answer the question as to how their occupation can be described in all its particularity as well as be distinguished from other services.

Many coaches appear to be content with the status quo of the profession as long as they personally find enough clients who are satisfied with their coaching. They do not involve themselves in the discourse with the scientists or in evaluation research. The question that remains open is not just what consequences such privatization tendencies may have for the profession, but more fundamentally, whether a freezing of the status quo in the development of the profession can be made possible at all through excluding oneself from the scientific discourse. Since society and the sciences continue to develop, this presumably means a backward step and eventually “de-professionalization”. At international scientifically oriented coaching conferences, however, a new “spirit of discourse” between scientists and practitioners is becoming tangible. It invites us to continually concern ourselves with the self-conception of coaching and the methods and outcome of evaluation research. Everyone is called upon to join in this discussion and take an active part.


  1. Guidelines on the author’s website: [retrieved from 24.1.2016].


  1. Anderson, D. L., & Anderson, M. C. (2005). Coaching that counts—Harnessing the power of leadership coaching to deliver strategic value. Amsterdam: Elsevier Butterworth-Heinemann.
  2. Benkenstein, M., & Güthoff, J. (1996). Typologie von Dienstleistungen. Ein Ansatz auf der Grundlage system- und käuferverhaltenstheoretischer Überlegungen. Zeitschrift für Betriebswirtschaft, 66(12), 1493–1510.
  3. Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7(2), 117–140.
  4. Fietze, B. (2011). Chancen und Risiken der Coachingforschung - eine professionssoziologische Perspektive. In R. Wegener, A. Fritze, & M. Loebbert (Eds.), Coaching entwickeln. Forschung und Praxis im Dialog (pp. 24–33). Wiesbaden: VS Verlag.
  5. Fournier, D. M. (2005). Evaluation defined. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 139–140). Thousand Oaks, CA: Sage.
  6. Frey, D., Daeuenheimer, D., Parge, O., & Haisch, J. (1993). Die Theorie sozialer Vergleichsprozesse. In D. Frey & M. Irle (Eds.), Theorien der Sozialpsychologie - Bd. I Kognitive Theorien (2nd ed., pp. 81–122). Bern: Huber.
  7. Greif, S. (2013). Conducting organizational based evaluations of coaching and mentoring programs. In J. Passmore, D. B. Peterson, & T. Freire (Eds.), The Wiley-Blackwell handbook of the psychology of coaching and mentoring (pp. 445–470). Oxford: Wiley Blackwell.
  8. Greif, S. (2017). Researching outcomes of coaching. In T. Bachkirova, G. Spence, & D. Drake (Eds.), The SAGE handbook of coaching (pp. 569–588). London: Sage.
  9. Greif, S., & Berg, C. (2011). Result-oriented self-reflection—Report on the construct validation of theory-based scales. Retrieved from Osnabrück (retrieved from January 26, 2016).
  10. Greif, S., Runde, B., & Seeberg, I. (2004). Erfolge und Misserfolge beim change management. Göttingen: Hogrefe.
  11. Greif, S., & Scheidewig, V. (1998). Selbstorganisiertes Lernen von Schichtleitern. In S. Greif & H.-J. Kurtz (Eds.), Handbuch Selbstorganisiertes Lernen (2nd ed., pp. 347–362). Göttingen: Verlag für Angewandte Psychologie.
  12. Harper, D. (2016). Etymological dictionary Lancaster. Retrieved January 19, 2016 from
  13. Hautzinger, M., & Pauli, P. (Eds.). (2009). Psychotherapeutische Methoden, Enzyklopädie der Psychologie, Psychologische Interventionsmethoden (Vol. 2). Göttingen: Hogrefe.
  14. Looss, W. (2014). Die Irrwege der Coaching-Könige: Warum Coaches keine Dienstleister sind, Kongress Coaching heute: Zwischen Königsweg und Irrweg. Erding (February 20–21, 2014): Hochschule für angewandtes Management.
  15. Maleri, R., & Frietsche, U. (2008). Grundlagen der Dienstleistungsproduktion (5th ed.). Heidelberg: Springer.
  16. Meffert, H., & Bruhn, M. (2012). Dienstleistungsmarketing: Grundlagen - Konzepte - Methoden. Mit Fallstudien (7th ed.). Stuttgart: Gabler (Kindle Edition).
  17. Nerdinger, F. W. (2011). Psychologie der Dienstleistung. Göttingen: Hogrefe.
  18. Palmer, S., & McDowall, A. (Eds.). (2010). The coaching relationship. London: Routledge.
  19. Schmidt-Tanger, M. (2014). Coachen statt Klönen – Emotionale Relevanz und Status im Coaching, Kongress Coaching heute: Zwischen Königsweg und Irrweg. Erding (February 20–21, 2014): Hochschule für angewandtes Management.
  20. Schneider, B., & Bowen, D. E. (1995). Winning the service game. Boston: Harvard Business School Press.
  21. Schneider, B., & White, S. S. (2004). Service quality—Research perspectives. Thousand Oaks, CA: Sage.
  22. Stephan, M., & Gross, P.-P. (Eds.). (2011). Organisation und Marketing von Coaching: Beiträge des Marburger Coaching Symposiums 2010. Wiesbaden: VS Verlag.
  23. Stober, D. R., & Grant, A. M. (Eds.). (2006). Evidence based coaching handbook: Putting best practices to work for your clients. New York: Wiley.
  24. Wottawa, H., & Thierau, H. (1998). Lehrbuch evaluation. Bern: Huber.

Copyright information

© Springer Fachmedien Wiesbaden GmbH 2017

Authors and Affiliations

  1. Osnabrück, Germany
