Keywords

1 Introduction

Visualisation is a subject of growing importance to the field of computer forensics, in particular when dealing with large volumes of data from certain sources, such as social media sites. In an age where personal digital communication devices that can house information about our whereabouts, our conversations, or even current emotions are common [1], there are a multitude of ways that this information can be of value to a forensics investigation. Providing varied information on the different facets of an individual’s activity [2]. The data sets can be extensive considering some of the new devices providing monitoring of personal fitness or health. It may be possible to query personal digital devices and social media for a wide range of data: What on-line names do they use? What information was sent to their contacts or received from their contacts? Even who had an elevated heart rate? The volume of data is further complicated by the process of trying to identify a conversation traversing multiple communication mechanisms, Facebook, Twitter, WhatsApp, iMessage, email, and SMS across multiple devices.

Existing research has made some progress in addressing this issue [36]. However the difficulty currently faced by forensic examiners is that the process involves extensive data sets, is time-consuming and requires a degree of manual processing. There are current tools that are able to visualise aspects of the data as part of a digital forensics investigation. Current visualisation tools provide only limited interactivity and an expert understanding of different visualisation tools, often with different input formats.

This paper presents an architecture that encourages direct interaction with social media data in the visualisations to compliment the exploratory and investigative mindset of an investigator. Social media application data is stored in an abstracted fashion into cloud-based storage and a defined API provides the interface for different visualisation tools to make requests for data. Visualisation tools retrieve data bound by their inherent data types and can send data directly to other visualisation tools depending on the pairing of compatible input/output data types. The key contributions of this work are summarised below:

  • We present a visualisation tool framework encouraging data exploration between tools.

  • We demonstrate how heterogenous social media data sources can be unified and exploited by investigators.

  • We highlight a novel way of recording the visualisation trail to store the thought process of an investigator.

The rest of this paper is arranged as follows. Firstly, related work in the area of cell phone forensics visualisation is considered, secondly the architecture of the platform is described, thirdly a case study is presented to highlight the advantages of the platform, fourthly the outcome of the case study is analysed, finally the conclusions and future work are discussed.

2 Related Work

As stated by Garfinkel [7] forensic visualisations tend to serve two purposes, presentation and discovery. The former specialising in describing what has happened to a courtroom, the latter helping an investigator reach conclusions about some criminal activity. Discovery tools should be [7] data driven, have fixed, predictable, static output and interactive for producing the visualisation. In addition to [7] we should also be seeking to record the data-mining activities the investigator uses in order to retrace through the thought/decision making process (Fig. 1). The visualisations should indeed be repeatable so prosecution and defence can view the same output however a means of visualising how an investigator derived their conclusions would also be beneficial when describing the process to a jury for example.

Fig. 1.
figure 1

A scattergraph plot sends a list of usernames from gathered social media data to a listing tool.

Social media should be an area of considerable interest to the forensics investigator as it provides an insight into user actions and activities although these need to be interpreted with care [8].

Mutawa [9] has suggested when considering smartphones and social media that these devices are “…a goldmine for forensic investigators” and demonstrated that social media activity is retained on a number of cell phones, while recognising that this type of communication takes place across multiple devices [9]. A number of efforts have been made to examine the connections between mobile devices to better understand user activities [3]. Other tools have been applied to the visualization of social media including NodeXL [10] an extension of Microsoft Excel to enable a ‘network graph’ visualisation of data. There have also been efforts made into visualizing other aspects of social networks [4] and visualising the results. There are therefore existing tools, techniques and solutions of both commercial and open source forensics tools that can be applied to an investigation involving social media data. There are also a number of big data tools that can be used for social media analysis, including those tools used for e-discovery [1115]. The tool proposed in this paper bridges the gap between these tools providing an extensible social media analysis tool capable of visualisations that are appropriate for forensic analysis.

2.1 Cell Phone Forensics Visualisations

There are existing tools available for forensic visualisation; these can provide some visual representation of the data. How can we visualise cell phone social media communications? Commercial software is already looking at social-based data. However, as can be seen below, commercial applications are not yet able to visualise much social networking information, instead relying on more manual query languages to obtain answers in investigations. Social links tend to be made from more traditional digital communications, emails between individuals, SMS/MMS messages, telephone calls, etc. Open source software tends to follow a more generic approach, towards visualising any structured data rather than anything focusing on a specific niche like cell phone forensics.

The following considers the state of cell phone forensics and how recovered social media data is being incorporated into visualisations.

2.1.1 AccessData MPE+

MPE+ is the mobile phone forensics application from AccessData, producers of FTK and FTK Imager. MPE+ itself specialises in data extraction from mobile phones [16], and has its own visualisation component [17]. It uses SMS, MMS, and call histories to look for relationships, and has a social analysis that looks for relationships via email addresses. Mention is made of an SQL Builder feature, which allows investigators to run custom queries on app data, but nothing specific is mentioned about correlations from social media sources.

2.1.2 Micro Systemation XRY and XAMN

Micro Systemation focuses on the data extraction with XRY and the visualisation with XAMN [18]. XAMN provides a number of different views including timeline, list, connections and geographic. Although XRY provides the means to extract application data, XAMN does not currently visualise social media communications.

2.1.3 CelleBrite UFED Link Analysis

CelleBrite provides presentation and discovery visualisations through the Link Analysis software [19]. Link Analysis provides the means to view commonalities across multiple cell phones in terms of phone calls, SMS, MMS, email, chat and a few others. Social media communications do not presently feature in Link Analysis.

2.1.4 Oxygen Forensics

Oxygen Forensics suite contains several different visualisation tools [20], timeline, web connections & locations, links & stats, but of particular interest is the Social Graph. The Social Graph looks at relationships between device owners and their contacts, i.e. how much time was spent using different communication mechanisms. Metrics include call length, number of messages sent and times of communication [20].

2.1.5 Open Source Software (OSS) Cell Phone Tools

OSS tools tend to concentrate on the extraction of data from the standard sources (call history, SMS, MMS, email [21]) and then on an application-by-application basis (e.g. WhatsApp [22], Skype [23]). Some of the tools generate lists as a visual aid for presentation. There are a number of OSS tools that lend themselves expertly to big-data social network visualisation [24] but are not designed for the type of focus required of the types of digital forensic investigations described in this paper.

To the best of the authors’ knowledge no one has addressed the discovery side of visualisation for cell phone forensics in the open source community with a focus on the analysis and correlation of social media activity.

3 Architecture and Implementation

The architecture of the platform discussed in this paper is introduced below, highlighting the different core components of the storage, tool-to-tool data exchange and the audit process of the examiner.

3.1 Cloud Storage

SQlite is a database format found on most cell phones, as it is portable (i.e., store as files) and provides the advantages that come with relational database architectures. Mobile applications on Google’s Android and Apple’s iOS primarily make use of SQlite.

One of the main issues with these relational databases is the way the information is stored. This is usually in a standard format, as per the SQlite specification, although the actual table and column layouts are bespoke on the needs of the application. This poses a challenge to investigators and any tools they may rely upon. Applications are updated very frequently and it would not take much for a database design to change overnight.

In order to support the tools that have been developed, the database design proposed in [5, Fig. 2] provides a unifying database design for storing social media information. Individual parsers for Twitter and Facebook were created to convert the SQlite information into XML and then these were transformed then into the relevant social media XML format which is inserted into the cloud-based data store. Apache Cassandra was used to store the information and an XML middleware was then used to store and extract information when required by the visualisations.

Fig. 2.
figure 2

History Manager records an investigators progress.

Fig. 3.
figure 3

Recording the visualisation progression can reveal much about an investigation when expert testimony is required.

Fig. 4.
figure 4

Rapid visual investigation: Timeline (Top Left) to Scattergraph (Top Right) to InfoList (Bottom).

3.2 Interacting with Different Visualisation Tools

To facilitate data exchange between visualisations, the tools must incorporate the following components:

A configuration file (CDOC) that describes the quantity and type of data it can accept as input. These XML CDOCs are used whenever an investigator selects a subset of data within one visualisation tool. A list is presented to the investigator of other visualisation tools that can accept the subset as input.

An information file (IDOC) that is used to transport the data between tools.

It is an XML file that describes where the relevant data is in the underlying data source that the receiving tool needs to query. The structure stores the source visualisation tool, the destination visualisation tool, the type of information to be visualised, and the identifier of the row entry that stores the information. In the following sample, we can see two tools exchanging information relating to the relationships between social media accounts (Fig. 5).

Fig. 5.
figure 5

Sample IDOC XML.

With these incorporated, the actual exchange of data becomes a straightforward task for the forensic investigator. As in Fig. 1, an analyst selects the grouping or pattern they wish to investigate further, selects another visualisation tool that can receive such input from the library, and the new visualisation tool is invoked with the subset. The investigator can even create a new instance of the same visualisation tool (albeit with the reduced dataset) if they wish.

As the process of identifying compatible visualisation tools based on their data-types is performed in real-time, new visualisation tools can be integrated without closing down the entire system and disrupting the investigative process.

3.3 Recording the Thought Process

When a forensic practitioner is using visualisation as a mechanism to discover facts relating to an investigation, the process is typically unidirectional. Large datasets are visualised, groupings of interest are investigated, subsets are extracted for further visualisation until a discovery is made (Fig. 3). With the architecture, we log the exchange that takes place between visualisation tools to let an investigator reinvoke existing visuals at a later date. As mentioned above, each visualisation tool is started with an IDOC XML file that describes the location of the data the tool needs to request data from the cloud data store.

As the investigator moves from one visualisation to another, we store the IDOC files and use a GUI tool called the History Manager (Fig. 2) to visualise the investigator’s trail. Any views the investigator had seen previously are stored with the IDOC. The History Manager provides the facility to the investigator to select an existing IDOC and launch the visualisation tool, with the same inputs, and produce the same visual output again. The History Manager simply stores the files and other descriptive information in a hierarchical directory structure, which can easily be stored and recalled for later use or archiving.

4 Case Study

There are a range of potential sources for social media data. Access to corporate servers may be a lengthy legal process leading investigators to consider other possible sources [5, 25]. Therefore the case study detailed in this paper uses data collected directly from the mobile phones.

A simple scenario was developed to test the system between three individuals typifying communication usage across several platforms such as instant messages, voice over IP and different devices. As part of the study the individuals also shared documents using Google drive and email. The simulated criminal activity is the selling of illegally obtained information and software. Much of the discussion between the phone owners take place over Facebook and Twitter; in particular making use of Twitter’s direct messaging function and Facebook’s Messenger application.

The investigator, as part of his research, wants to look into the social media communications to identify any evidence of nefarious activity. However, it appears the thread of communication occurs over several social media types. In particular the following are found during an investigation of applications found on the cell phones:

  • com.twitter.android/db/xxxxxxxxxx.db

  • com.facebook.orca/db/threads_db2

These SQLite files contain private messages between the phone owner and others. The investigator begins by ingesting the files of interest into the data store. Once these have been parsed, he opens the Timeline tool to obtain an appreciation of the timescales in the data.

The investigator then tries to look for correlations between the different usernames and social media formats found. To do this he interacts with the Timeline tool (Fig. 4, Top Left) by clicking the background, which initiates a search for other tools in the architecture that can take these visualised data-types as input. The Scattergraph tool is selected, and the dataset is transferred as an IDOC XML file. This is stored in the HistoryManager and can be accessed at a later date if required.

The Scattergraph tool (Fig. 4, Top Right) initially shows the two types it was provided (Date/Time and Usernames). The axes are altered to show the Source Interaction against the Usernames. This highlights how users sent messages via different social media formats. From this visualisation, the investigator now wants to look at the actual messages sent, across all social media types, ordered by time.

The Infolist application (Fig. 4, Bottom) provides a view that lets the investigator see the messages sent in time order. It includes all social media formats from this case study and is not focused purely on one type.

5 Case Study Results and Evaluation

The case study highlights the advantages of the architecture. An investigator is able to easily transition from one tool to another, and rapidly data mine to gain an understanding of what has happened in their investigation. It should be clear from the case study that the emphasis in this paper is on the facilitation of interactivity rather than any perceived novelty of the visualisations themselves.

The investigator has the ability to choose which visualisation tools they want to use when visually mining the data. In this fashion they are able to make discoveries more rapidly as the data mining process is effectively tailored to suit.

By looking for commonalities across social media formats we are able to process the data collectively. Using the right combination of visualisation tools conversation threads across different communication mediums can be aligned and interpreted in a more straightforward fashion than existing mechanisms.

The analysis process in the case study was kept short for brevity. However, true of both a quick triage and a full investigation, the History Manager stores all interactions between visualisation tools. The IDOCs are effectively a log or audit trail of an investigators visualisation thought process.

6 Conclusions

Visualisation of social media is of major importance to computer forensic investigators in an age where personal digital communication occurs over various applications on multiple platforms. Existing tools and techniques for analysis include a range of commercial and open source forensics applications and Big Data visualisation tools. This paper has highlighted the lack of forensics tools that are appropriate for interactively visualising social media from cell phones. This paper describes the testing of an open source platform capable of large scale forensics visualisation. Data is stored in an abstracted fashion into cloud-based storage and a defined API provides the interface for multiple interactive visualisation tools to access the data. Visualisation tools retrieve data and can send data to other tools. This platform supports a forensic investigators analysis by enabling multiple simultaneous interactive visualizations to be generated: While capturing the investigators actions to record and log the process of the investigation. Future work will concentrate on the possibility of integrating other third-party visualisations into the platform, and the exploration of other potential evidence sources in a unified fashion to provide a more holistic view of an investigation.