Azure Databricks
Azure Databricks is a cloud-based big data analytics platform. It provides a collaborative environment for data scientists, data engineers and business analysts to work together on big data projects. Azure Databricks on the Federal Science DataHub (FSDH) combines the power of Apache Spark with a collaborative notebook interface, making it easy to build and deploy data pipelines, machine learning models and analytics applications.
Azure Databricks is ideal for:
- processing large datasets
- building machine learning models
- running interactive queries
- collaborating on data projects
The FSDH allows users to provision Azure Databricks for research, so they can analyze data at large scale.
Visual Studio Code extension
Using the Databricks extension for Visual Studio Code (VS Code), users can connect to a Databricks workspace from within VS Code, allowing them to:
- write code locally in VS Code and run it remotely on a Databricks cluster to leverage cloud computing
- run SQL queries on a Databricks cluster and see the results locally in VS Code
- manage Databricks clusters
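As a rough illustration of writing code locally and running it on a remote cluster, the sketch below uses the Databricks Connect Python package. It assumes that package is installed and that the workspace connection (host, credentials and cluster) has already been configured. The function names are hypothetical and chosen for illustration only. This is one possible approach, not necessarily how the FSDH or the extension is set up.

```python
def build_monthly_mean_query(table: str, date_col: str, value_col: str) -> str:
    """Build a Spark SQL query that averages value_col for each calendar month."""
    month = f"date_trunc('month', {date_col})"
    return (
        f"SELECT {month} AS month, avg({value_col}) AS mean_value "
        f"FROM {table} GROUP BY {month} ORDER BY month"
    )


def run_on_cluster(query: str):
    """Run the query on a remote Databricks cluster and return the rows locally."""
    # Imported lazily so this file still loads where Databricks Connect is absent.
    from databricks.connect import DatabricksSession

    # Assumption: workspace host, credentials and cluster are already configured
    # (for example through a Databricks configuration profile).
    spark = DatabricksSession.builder.getOrCreate()
    return spark.sql(query).collect()
```

With a hypothetical table such as climate.daily_temperature, calling run_on_cluster(build_monthly_mean_query("climate.daily_temperature", "obs_date", "temp_c")) would send the query to the cluster and return the result rows to the local session.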
Google APIs and services
Users can call Google APIs from Databricks to run tools available only on Google Cloud, such as BigQuery or Earth Engine, which extends what can be accomplished on the FSDH.
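As a minimal sketch of calling a Google API from a Databricks notebook, the example below queries BigQuery with the google-cloud-bigquery Python client. It assumes that library is installed on the cluster and that Google Cloud credentials are already available to it. How credentials are provided on the FSDH is not covered here, and the helper names and project ID are hypothetical.

```python
def make_client(project_id: str):
    """Create a BigQuery client for the given Google Cloud project."""
    # Imported lazily so this file still loads where the library is not installed.
    from google.cloud import bigquery

    # Assumption: credentials are discovered from the environment.
    return bigquery.Client(project=project_id)


def fetch_rows(client, sql: str) -> list:
    """Run a query and return each result row as a plain dictionary."""
    job = client.query(sql)  # starts the query job
    return [dict(row.items()) for row in job.result()]  # waits for the job to finish
```

For example, fetch_rows(make_client("my-project"), "SELECT 1 AS n") would run a trivial query in the hypothetical project my-project. Passing the client in as an argument keeps the query logic separate from the authentication setup.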
Sample use cases
- A team of researchers at Environment and Climate Change Canada uses Azure Databricks to analyze climate data and automatically build Power BI dashboards for monitoring trends. The FSDH provides them with a secure and scalable environment to process their data.
- Data scientists at Agriculture and Agri-Food Canada use Azure Databricks to host a database of pathogens present in Canada. They can now build dashboards from their data and, in future, train machine learning models on it.