Getting Started with Databricks: Tools for Understanding Massive Data Sets

  • Robert Ilijason

A hit-the-ground-running introduction to the expansive world of understanding your data. If you don't understand what all the hoopla around data analytics is about, but you know you need to jump in somewhere, this video is for you. By learning a handful of techniques, you'll start turning great quantities of data into useful information that helps you better understand your own data.

This video is a user-friendly entry point that teaches you how to use machine learning algorithms to move from the trenches of data to making informed data-driven decisions. It begins with a brief introduction to data analysis and guidance on how to set up a development environment in Databricks. From there you will investigate data using popular tools such as SQL and Python on top of Apache Spark. You will then learn how to clean up data and prepare it for use in a machine learning algorithm. The final segment offers guidance for presenting the results visually.

What You Will Learn

  • Investigate small and large datasets using SQL and Python

  • Clean and prepare data for advanced analytics

  • Run machine learning algorithms

  • Show data results using visualizations

Who This Video Is For

People who work with data, including analysts, data engineers, data scientists, and the data curious. Basic knowledge about the data field is assumed, but no special skills are needed.

This video will show you how to get started with Databricks to move from the trenches of data to making informed data-driven decisions.

About The Author

Robert Ilijason

Robert Ilijason is a 20-year veteran in the business intelligence segment. He has worked as a contractor for some of Europe's biggest companies and has done large-scale analytics work within retail, telecom, banking, government, and more. He has seen a lot of trends come and go over the years, but he believes that, unlike most of them, Spark in the cloud, especially with Databricks, is a game changer.


About this video

Robert Ilijason
Online ISBN
Total duration: 1 hr
Copyright information: © Robert Ilijason 2021

Video Transcript


Hi, and welcome to this introduction to Databricks. My name is Robert Ilijason, and I will follow you throughout the course. I’ve been working within the business intelligence and data science fields for about 20 years, working mostly with large enterprise customers. I’ve been working with Spark and Databricks a little over four or five years.

I even wrote a book about it for Apress, Beginning Apache Spark Using Azure Databricks. Feel free to check it out. So in this course, we'll be looking at large-scale analytics: what it is, why it's hot right now, and what will happen in the future. We'll also be looking at Databricks and seeing how it fits into the larger picture.

You will, of course, also learn how to work with the tool and learn some of the tricks of the trade so that you can get started easier and faster than if you were to start from scratch. Throughout the course, we’ll be looking at a case study and trying to figure out if we can see what the quality of wine is just by looking at the core components such as fixed acidity, citric acid, and density.

This is a classic machine learning problem because we have a bunch of old data, and we're trying to extrapolate it to new data, that is, new wines. To do that, we need to learn how to look at the data and understand it. We need to learn how to clean the data and prepare it for machine learning. And of course, we need to learn how to run the machine learning algorithms and read the results.

So by the end of the course, I hope that you will have learned how to do a complete machine learning project in Databricks. Thank you for taking this course, and I hope you’ll enjoy it.
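
As a rough illustration of the workflow the introduction describes (look at old data, clean it, then use it to estimate the quality of a new wine), here is a minimal sketch in plain Python. It is not the course's code: the rows are made-up illustrative values, the function names are hypothetical, and a simple k-nearest-neighbours average stands in for whatever algorithm the course actually uses. In Databricks the same steps would more likely be done with Spark DataFrames and a machine learning library.

```python
# Illustrative sketch only: the rows below are made-up values, not real wine data.
from math import sqrt

# Each row: (fixed_acidity, citric_acid, density, quality). None marks a missing value.
old_wines = [
    (7.4, 0.00, 0.9978, 5),
    (7.8, 0.04, 0.9968, 5),
    (11.2, 0.56, 0.9980, 6),
    (None, 0.30, 0.9970, 6),  # incomplete row, dropped during cleaning
    (8.5, 0.28, 0.9969, 7),
    (7.9, 0.32, 0.9964, 7),
]


def clean(rows):
    """Drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def feature_ranges(rows):
    """Return (min, max) for each feature column, used to scale features to 0..1."""
    columns = zip(*[row[:-1] for row in rows])
    return [(min(col), max(col)) for col in columns]


def scale(features, ranges):
    """Scale each feature to 0..1 so that no single column dominates the distance."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, (low, high) in zip(features, ranges)
    ]


def predict_quality(train_rows, ranges, new_features, k=3):
    """Average the quality of the k most similar old wines (k-nearest neighbours)."""
    target = scale(new_features, ranges)
    distances = []
    for row in train_rows:
        point = scale(row[:-1], ranges)
        distance = sqrt(sum((a - b) ** 2 for a, b in zip(point, target)))
        distances.append((distance, row[-1]))
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(quality for _, quality in nearest) / len(nearest)


train = clean(old_wines)          # step 1: clean the old data
ranges = feature_ranges(train)    # step 2: prepare it (feature scaling)
new_wine = (8.0, 0.30, 0.9966)    # a hypothetical new wine without a quality score
predicted = predict_quality(train, ranges, new_wine)  # step 3: predict
print(f"Predicted quality: {predicted:.1f}")
```

The scaling step matters here because density varies in the third decimal place while fixed acidity varies by whole units; without scaling, the distance would be driven almost entirely by acidity.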