
We defined data science in Chapter 2 and covered what it means to be a “data scientist.” In this chapter, you’ll see how to break that role into several team roles. Then you’ll see how this team can work together to build a greater data science mindset.

Putting Data Scientists in Perspective

As you learned in Chapter 2, there’s some confusion surrounding the role of a data scientist. In 2001, William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”Footnote 1 This paper was the first to merge the fields of statistics and computer science to create a new area of innovation called “data science.” At the same time, Leo Breiman published “Statistical Modeling: The Two Cultures,”Footnote 2 which described how statisticians should change their mindset and embrace a more diverse set of tools. These two papers created a foundation for data science, but that foundation was built on the field of statistics.

In 2008, some top data gurus from Facebook and LinkedIn got together to discuss their day-to-day challenges. They realized they were doing similar things. They saw their role as a crossover of many different disciplines. They decided to call this role a “data scientist.”

At the time, a data scientist was defined by little more than a list of qualities. For example:

  • Understand data

  • Know statistics and math

  • Apply machine learning

  • Know programming

  • Be curious

  • Be a great communicator and hacker

They were renaissance enthusiasts who crossed over into many different fields.

The problem is that this list of skills is not easily found in one person. Each of us is predisposed to certain areas based on our individual talents. We usually gravitate toward our talents, and then work to refine our craft. A statistician will often work to become a better statistician. A business analyst will work to refine his or her communication skills. There is also a lot of organizational pressure to specialize. Most large organizations are divided into functional areas. There’s some need for common understanding, but not always common expertise.

People are also notoriously bad at assessing their own abilities. The famous Dunning-Kruger studyFootnote 3 found that people with limited skill in an area often rated themselves as highly skilled, dramatically overestimating their expertise. A gifted statistician may rate themselves as an excellent communicator, but you don’t need to be a good communicator to be a great statistician. A great statistician could easily have a long career even if he or she fumbles through presentations.

That’s why most organizations divide the work up into teams. Individuals on the team will have their own areas of expertise. A cross-functional team doesn’t assume that everyone is an expert. Instead, it encourages individuals to learn from each other’s strengths and cover each other’s weaknesses. A team made up only of data scientists might not be able to identify those weaknesses. Without someone who can point out its blind spots, the team will fumble.

I once worked for an organization that had a team of data scientists building out a cluster. There was some concern from the business because the higher-ups had no idea what the team was building. They were frustrated because they were paying for something they didn’t understand. I went to a few of the meetings. The team of data scientists demonstrated a simple MapReduce job. The business managers stared blankly at the screen and occasionally glanced at their smartphones. To an outsider, it was obvious from the yawns and eye rubbing that the team was not doing a great job communicating.

After the meeting, I wrote a matrix on the whiteboard. I listed the following six skill sets:

  • Data

  • Development

  • Machine learning

  • Statistics

  • Math

  • Communication

I asked the data scientists to rate how they felt they were doing in each of these areas from 1 to 10 (1 being poor and 10 being best) so we could look for areas to improve. I took that same list of skill sets and showed it to one of the business analysts. I asked the analyst to rate the team the same way.

The results are shown in Table 6-1.

Table 6-1. Data scientists’ and business analysts’ ratings

It was a classic Dunning-Kruger result. In the places where the data scientists rated themselves as highly skilled, they dramatically overestimated their expertise. The data scientists all came from quantitative fields. They were statisticians, mathematicians, and data analysts. They couldn’t identify their own blind spots. It took someone from an entirely different field to shine a light on their challenges.
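To make the whiteboard exercise concrete, the comparison can be sketched in Python as the difference between the team’s self-ratings and the outside ratings for each skill, sorted so the largest overestimates come first. This is a minimal sketch: the function name `rating_gaps` is hypothetical, and the scores in the usage lines are made-up placeholders, not the values from Table 6-1.

```python
# Hypothetical sketch: compare a team's self-ratings with an outside
# observer's ratings (1 = poor, 10 = best) and rank skills by the gap.
# Skill names come from the chapter; the sample scores are placeholders.

SKILLS = ["Data", "Development", "Machine learning",
          "Statistics", "Math", "Communication"]


def rating_gaps(self_ratings: dict, outside_ratings: dict) -> list:
    """Return (skill, self - outside) pairs, largest overestimate first."""
    gaps = []
    for skill in SKILLS:
        # A positive gap means the team rated itself higher than the observer did.
        gaps.append((skill, self_ratings[skill] - outside_ratings[skill]))
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for illustration only (not Table 6-1).
    team = {"Data": 8, "Development": 7, "Machine learning": 8,
            "Statistics": 9, "Math": 9, "Communication": 8}
    analyst = {"Data": 7, "Development": 6, "Machine learning": 6,
               "Statistics": 8, "Math": 8, "Communication": 3}
    for skill, gap in rating_gaps(team, analyst):
        print(f"{skill:<18} {gap:+d}")
```

Sorting by the gap, rather than by either score alone, puts the skills where the two groups disagree most at the top of the list, which is where the team’s blind spots are most likely to be.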

If you’re part of a large organization trying to get value from data science, it would be a mistake to rely on a few superhero data scientists. Individuals who come from a similar background have a tendency to share the same blind spots. Academic research shows that you often get better insights from a cross-functional team with varied backgrounds.Footnote 4

There is some wisdom in our eclectic organizational structures. People with marketing, business, and management backgrounds deserve their place at the data science table. It’s unrealistic to assume that people with a quantitative background will come up with all the questions and insights that people from other fields would. Keep your team varied and you’re more likely to have great results.

Seeing Value in Different Skills

One of the dangers to your data science team is putting too much emphasis on data scientists. Remember that data scientists are multidisciplinary. They should know about statistics, math, development, and machine learning—all while understanding the customer and coming up with interesting questions. Most data scientists come from engineering, math, and statistics backgrounds. This means that they’re likely to share a similar approach to questioning and look at the data from a shared perspective.

It’s unlikely that someone who’s spent a career in math and statistics will have as much insight into customers as someone who’s spent his or her career in marketing. Being an expert in one field doesn’t imply expertise in another.

Most people who claim to be multidisciplinary have a few very strong skills and some knowledge of other areas. If you feel very confident in many areas, you likely have some large skill gaps you haven’t noticed. It also means that a team made up only of data scientists can share the same blind spots and be prone to groupthink.

One way to keep this from happening is to allow people with other backgrounds to participate in your data science team. Remember that good data science relies on interesting questions. There’s no reason why these interesting questions should only come from people who analyze the data.

Think about your running shoe web site. A data analyst shouldn’t have a problem finding web sites that referred customers to the store. Let’s say that most of the customers came from Twitter, Google, and Facebook. There were also quite a few customers who came from other running shoe web sites. A good data analyst can easily create a report of the top 50 web sites customers visited just before buying from you. Trying to find out where people are coming from is a good analytics question. It’s about gathering the data, counting it, and displaying it in a nice report, as shown in Figure 6-1.

Figure 6-1. Referral-site total visits and referral type

Note

Facebook, Twitter, and Instagram seem to bring in strong traffic, both paid and organic. Pinterest drives an amount of traffic comparable to the other sites, but about half of that traffic comes from paid advertisements. See how to create these charts at http://ds.tips/fRa4a.
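The analytics report described above is mostly counting. A minimal sketch in Python, assuming each visit is recorded as a hypothetical (referrer URL, is-paid) pair, reduces each referrer to its host name, tallies total, paid, and organic visits per site, and keeps the top 50. The function name `referral_report` and the record format are assumptions for illustration; a real site would likely pull this data from its web analytics tool.

```python
from collections import Counter
from urllib.parse import urlparse


def referral_report(visits, top_n=50):
    """Summarize visits per referring site.

    visits: iterable of (referrer_url, is_paid) pairs (hypothetical format).
    Returns (site, total, paid, organic) tuples for the top_n sites by total.
    """
    totals = Counter()
    paid = Counter()
    for referrer_url, is_paid in visits:
        # Reduce a full referrer URL to its host name, e.g. "twitter.com".
        site = urlparse(referrer_url).netloc.lower()
        if site.startswith("www."):
            site = site[len("www."):]
        totals[site] += 1
        if is_paid:
            paid[site] += 1
    # most_common() orders sites by total visits, highest first.
    return [(site, total, paid[site], total - paid[site])
            for site, total in totals.most_common(top_n)]


if __name__ == "__main__":
    # Made-up visits for illustration only.
    sample = [
        ("https://www.facebook.com/somepost", False),
        ("https://www.facebook.com/ads/123", True),
        ("https://twitter.com/someone/status/1", False),
        ("https://www.pinterest.com/pin/42", True),
    ]
    for site, total, paid_visits, organic in referral_report(sample, top_n=3):
        print(f"{site:<15} total={total} paid={paid_visits} organic={organic}")
```

This is exactly the kind of "gather, count, and display" work the chapter calls analytics; the deeper questions that follow are not answered by the code.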

A data science team goes deeper. The team might ask, why are there more people coming from Twitter than Google? Are people tweeting pictures of shoes? How many more people would visit the site if we bought advertising on Twitter? Is one site better than the other for releasing new products? Are people more likely to visit the site if they see a picture of a shoe? These questions come from curiosity about the business, not from expertise with the data. There’s no reason a business analyst, marketing specialist, or project manager can’t ask these questions.

A study of economics departments showed that when people from different disciplines collaborated, they were more likely to produce higher-quality publications. Diversity of opinion was a benefit to the quantity and quality of their work. In addition, people from different backgrounds are more likely to disagree. Disagreement causes everyone to work harder. In the end, this makes everyone’s arguments stronger. If everyone on your team easily agrees on the best questions, you’re probably not asking very interesting questions.

When you create your data science team, try to include many people from different parts of your organization. You want everyone in your organization to think about how they’ll be more data-driven. If you only hire data scientists for your team, you’re likely to make data science seem like a dark art, something only a few highly skilled people should attempt. This will make your data science less creative and more disconnected from the rest of the organization.

In your data science team, it’s important to separate analysis from insight. A data analyst captures, counts, and presents the data. Insights are much tougher to get. You need to follow the scientific method of posing interesting questions and looking for results. Don’t let your team only produce data analysis. You want them to work harder. It’s likely that someone from the business side will push the team to ask more interesting questions. It’s also likely that someone from a marketing team will have interesting questions about your customer.

Some organizations have started moving in this direction. Companies like LinkedIn have created data walls that show different reports and charts from the data analysts. These walls of information allow people from all over the organization to see if there is anything interesting in the data. A marketing assistant might see an interesting story or an intern in human resources might think of an interesting question. This is a good way to get feedback from other parts of the organization.

Some organizations are going further and making sure that each data science team has a representative from both the marketing and the project management offices. This ensures that your data science team has someone who specializes in thinking about the customer as well as someone who understands how to deliver value to the rest of the organization.

Creating a Data Science Mindset

One term you’ll hear frequently in relation to data science teams is “data-driven.” It’s a little bit of a tricky term. We all like to use data to drive our decisions. If you decide not to eat sushi from a gas station, it’s based on real data. You’re using past experience and maybe some observations to make a good decision. More often than not, your intuition is right—or at least half right. Try not to think of data-driven decision-making as a drop-in replacement for your own intuition. A data-driven culture uses data to enhance the team’s intuition, not to replace it.

Your data science team will be the starting point for creating a larger data science mindset that has a deeper relationship with data. Try to think of data-driven organizations as companies with many data science teams reinforcing a data science mindset. These teams create a culture of questions and insight. They should help the organization not only collect data, but also make it actionable.

A data science team will have three major areas of responsibility. These three areas create the foundation for your data science team, which will help the rest of your organization embrace this new mindset. They are:

  • Collecting, accessing, and reporting on data (groundwork): This involves processing the raw data into something that everyone else can understand.

  • Asking good questions: This drives interesting data experiments, and may come from the team members who don’t necessarily have a technology background. They can be from business, marketing, or management. They ask interesting business questions and push everyone to question their assumptions.

  • Making the data actionable: This will be the responsibility of team members who are primarily concerned with what the team has learned and how this data can be applied to the organization.

I once worked for a retail organization that sold home hardware and construction supplies. The company maintained several call centers because many customers preferred to call in their orders instead of using a mobile application.

The company was just starting out with data science and wanted the data science team to understand why these customers preferred to call in, because it costs a lot to maintain call centers. In addition, orders taken over the phone were much more likely to have errors. There were three people on the data science team: someone who understood the data, a business analyst, and a project manager. The three of them got together and tried to understand why these customers preferred to call in.

The business analyst was the first person to start asking questions. Do these customers have an account to order through their mobile phone? Are they professionals or residential customers? How much are they spending?

The team then created the data reports, shown in Figure 6-2. The data showed that most of the people were professionals who regularly placed several orders through their mobile devices. The orders they placed via the call center were much smaller than the orders placed through the mobile application. Around 80% of the transactions were less than $20. The business analyst had the follow-up question, “Why are some of our most loyal professional customers calling in for orders less than $20?”

Figure 6-2. Data reports for sales channels

Note

Most of the orders are placed by organizations; however, most of the orders placed through calls are by individuals. The average total value of an order placed through calls by individuals is the lowest across all categories. See how to create these charts at http://ds.tips/3uprU.
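The team’s first reports boil down to grouping orders by channel and customer type. A minimal sketch in Python, assuming a hypothetical `Order` record with a channel, a customer type, and an amount in dollars, computes the order count, average order value, and share of orders under the $20 threshold the chapter mentions for each group. The field names and channel labels are assumptions, and the sample orders in the usage lines are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    channel: str        # e.g. "phone" or "mobile" (assumed labels)
    customer_type: str  # e.g. "professional" or "residential" (assumed labels)
    amount: float       # order total in dollars


def channel_summary(orders, small_threshold=20.0):
    """Group orders by (channel, customer_type) and summarize each group."""
    groups = defaultdict(list)
    for order in orders:
        groups[(order.channel, order.customer_type)].append(order.amount)
    summary = {}
    for key, amounts in groups.items():
        small = sum(1 for amount in amounts if amount < small_threshold)
        summary[key] = {
            "orders": len(amounts),
            "avg_value": sum(amounts) / len(amounts),
            "share_small": small / len(amounts),  # fraction under the threshold
        }
    return summary


if __name__ == "__main__":
    # Made-up orders for illustration only.
    sample = [
        Order("phone", "professional", 12.50),
        Order("phone", "professional", 8.00),
        Order("phone", "residential", 64.00),
        Order("mobile", "professional", 420.00),
    ]
    for (channel, customer_type), stats in sorted(channel_summary(sample).items()):
        print(f"{channel:<7} {customer_type:<13} "
              f"n={stats['orders']} avg=${stats['avg_value']:.2f} "
              f"under $20={stats['share_small']:.0%}")
```

A summary like this can show that small phone orders come from professional customers, but it cannot explain why; in the chapter, that answer came from talking to customer service representatives.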

After looking at the data and talking to a few customer service representatives, they figured out that these customers were calling because they needed a small part to fix a big problem. The customer service representatives were looking up that part while these professionals were on job sites. Most of the time on the phone was spent describing, identifying, and expediting a crucial part that they needed.

The team tried an experiment. They contacted a few of their high-volume, professional customers and asked them to send a picture when they needed an emergency part. They called it their “Pic-it-Ship-it” program. They hoped this would increase customer satisfaction and decrease the time spent on the phone trying to describe the part.

The data science team was small, but they still covered all three areas of responsibility. They collected the data and created interesting reports. The business analyst asked interesting questions and got some insight into the customers. Finally, the project manager organized an experiment and started a small trial program. They collected new data, asked interesting questions, and made the insights actionable.

Before the data science team ran these experiments, the organization always assumed that these people were small-dollar, residential customers who were more comfortable on the phone than with a mobile application. Their intuition was only partially right. The majority of the people calling were actually some of their most valuable customers. A data science mindset with a diverse team led to better insights.

Summary

In this chapter, you explored the roles in a data science team. You found out what skills to bring to the table. You also saw how you can create a data science mindset. In Chapter 7, you’ll find out how to form your team.