Abstract
Data underpins all open-source intelligence investigations. Without data there is nothing to build upon, combine, analyse or draw conclusions from. This chapter outlines some of the processes an investigator can use to obtain data from open sources, as well as methods for preparing this data in usable formats for further analysis. First, it discusses why data needs to be collected from open sources. Second, it introduces the different types of data that may be encountered, including unstructured and structured data sources, and where such data can be obtained. Third, it reviews methods for information extraction, the first step in preparing data for further analysis. Finally, it covers some of the privacy, legal and ethical good practices that should be followed when accessing, interrogating and using open-source data.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Gibson, H. (2016). Acquisition and Preparation of Data for OSINT Investigations. In: Akhgar, B., Bayerl, P., Sampson, F. (eds) Open Source Intelligence Investigation. Advanced Sciences and Technologies for Security Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-47671-1_6
DOI: https://doi.org/10.1007/978-3-319-47671-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47670-4
Online ISBN: 978-3-319-47671-1
eBook Packages: Computer Science, Computer Science (R0)