Repo: Databricks (DBRX), published Apr 3, 2020

databricks/devrel

HTML

Description: This repository contains the notebooks and presentations we use for our Databricks Tech Talks

Language: HTML

Stars: 734

Forks: 447

Open issues: 8

Created: 2020-04-03T21:32:14Z

Pushed: 2025-01-06T03:23:48Z

Default branch: master

Fork: no

Archived: no

README:

tech-talks

This repository contains the notebooks and presentations we use for our Databricks Tech Talks.

You can find links to the tech talks below as well as the notebooks for these sessions directly in the repo.

Sections

  • [Upcoming Tech Talks](#Upcoming-Tech-Talks)
  • [Featured](#Featured)
  • [Previous Tech Talks](#Previous-Tech-Talks)
  • [COVID 19 Samples](#COVID-19-Samples)
  • [Datasets](#datasets)
  • [Notebooks](#notebooks)

Upcoming-Tech-Talks

2020-04-29 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Introduction to Apache Spark

This workshop covers the fundamentals of Apache Spark, the most popular big data processing engine. In this workshop, you will learn how to ingest data with Spark, analyze the Spark UI, and gain a better understanding of distributed computing. We will be using the Novel Coronavirus (COVID-19) data released by the Johns Hopkins Center for Systems Science and Engineering (CSSE). Prior basic Python experience is recommended.

2020-04-30 - Using Delta as a Change Data Capture Source

While it is common to use Delta Lake as a sink for change data captured from traditional data sources, customers are increasingly asking how to use Delta tables as a source for a change data capture (CDC) process. Put differently, how can we read a stream of changes from a Delta table so that they can be propagated downstream? In each of these cases, we want to capture a change stream from a Delta table and send it somewhere for further processing. In this session, we will discuss the architecture, use cases, and solutions.
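One way to picture a change stream is as the difference between two versions of a table. The sketch below is plain Python, not Delta Lake or Spark code, and it is not the approach presented in the session. The `diff_snapshots` helper, the integer keys, and the sample rows are all hypothetical, chosen only to show the kind of records a downstream consumer of change data might receive.

```python
from typing import Dict, Iterator, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]  # primary key -> row (hypothetical table layout)


def diff_snapshots(old: Snapshot, new: Snapshot) -> Iterator[Tuple[str, int, Row]]:
    """Yield (change_type, key, row) for every difference between two snapshots."""
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            yield ("insert", key, new[key])
        elif key not in new:
            yield ("delete", key, old[key])
        elif old[key] != new[key]:
            yield ("update", key, new[key])


# Two made-up versions of the same table, keyed by id.
version_0 = {1: {"status": "open"}, 2: {"status": "open"}}
version_1 = {1: {"status": "closed"}, 3: {"status": "open"}}

changes = list(diff_snapshots(version_0, version_1))
for change in changes:
    print(change)
```

Each emitted tuple can then be forwarded to whatever system needs it. A real pipeline would read changes incrementally rather than compare full snapshots.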

Featured

[Notebook | Johns Hopkins CSSE COVID-19 Analysis](./samples/JHU%20COVID-19%20Analysis.html)

This notebook processes and performs a quick analysis of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE (https://github.com/CSSEGISandData/COVID-19). The data is updated in the /databricks-datasets/COVID/CSSEGISandData/ location regularly so you can access the data directly. The following animated GIF shows the COVID-19 confirmed cases and deaths per 100K people per the Johns Hopkins CSSE dataset spanning March 22nd to April 14th, 2020.
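The per-100K normalization mentioned above is simple arithmetic: multiply the count by 100,000 and divide by the population. A minimal sketch follows; the `per_100k` function name and the figures are illustrative placeholders, not values from the Johns Hopkins dataset or code from the notebook.

```python
def per_100k(count: int, population: int) -> float:
    """Return the rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Illustrative numbers only: 250 cases in an area of 500,000 people.
rate = per_100k(250, 500_000)
print(rate)  # 50.0
```

Normalizing this way makes areas with very different populations comparable on the same scale.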

[Notebook | NY Times COVID-19 Analysis](./samples/NYT%20COVID-19%20Analysis.html)

This notebook processes and performs a quick analysis of the NY Times COVID-19 dataset (https://github.com/nytimes/covid-19-data). The data is updated in the /databricks-datasets/COVID/covid-19-data/ location regularly so you can access the data directly. The following animated GIFs show the COVID-19 confirmed cases and deaths per 100K people from the NY Times dataset, spanning a two-week window around when educational facilities were closed in Washington (3/13) and New York (3/18) states.

Previous-Tech-Talks

2020-04-23 - Predictive Maintenance (PdM) on IoT Data for Early Fault Detection w/ Delta Lake

Predictive Maintenance (PdM) differs from routine or time-based maintenance approaches: it combines various sensor readings with sophisticated analytics on thousands of logged events in near real time, and it promises severalfold improvements in cost savings because tasks are performed only when warranted. The collaborative Data and Analytics platform from Databricks is a great technology fit for these use cases, providing a single unified platform to ingest the sensor data, perform the necessary transformations and exploration, run ML, and generate valuable insights.

2020-04-22 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Machine Learning with scikit-learn

scikit-learn is one of the most popular open-source machine learning libraries among data science practitioners. This workshop will walk through what machine learning is, the different types of machine learning, and how to build a simple machine learning model. This workshop focuses on the techniques of applying and evaluating machine learning methods, rather than the statistical concepts behind them. We will be using the Novel Coronavirus (COVID-19) data released by the Johns Hopkins Center for Systems Science and Engineering (CSSE). Prior basic Python experience is recommended.

2020-04-16 - Diving into Delta Lake: DML Internals

In the earlier sessions of the Delta Lake Internals webinar series, we described how the Delta Lake transaction log works. In this session, we will dive deeper into how commits, snapshot isolation, and partitions and files change when performing deletes, updates, merges, and structured streaming.
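As a rough mental model of how files change under DML, each commit can be viewed as a list of "add file" and "remove file" actions, and a snapshot is whatever remains after replaying the commits up to a given version. The sketch below is a simplified plain-Python illustration under that assumption, not Delta Lake's actual implementation; the file names and the `snapshot_at` helper are hypothetical.

```python
from typing import List, Set, Tuple

Action = Tuple[str, str]  # ("add" | "remove", file path)

# Hypothetical log: one list of actions per commit, indexed by version.
log: List[List[Action]] = [
    # v0: initial write of two data files
    [("add", "part-000.parquet"), ("add", "part-001.parquet")],
    # v1: a delete rewrites one file (old file removed, new file added)
    [("remove", "part-001.parquet"), ("add", "part-002.parquet")],
    # v2: an append adds one more file
    [("add", "part-003.parquet")],
]


def snapshot_at(log: List[List[Action]], version: int) -> Set[str]:
    """Replay commits 0..version and return the set of live data files."""
    live: Set[str] = set()
    for actions in log[: version + 1]:
        for kind, path in actions:
            if kind == "add":
                live.add(path)
            elif kind == "remove":
                live.discard(path)
            else:
                raise ValueError(f"unknown action: {kind}")
    return live


print(sorted(snapshot_at(log, 0)))
print(sorted(snapshot_at(log, 2)))
```

In this model, a reader pinned to version 0 still sees `part-001.parquet`, because it replays fewer commits than a reader at version 2. That is one way to think about snapshot isolation.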

2020-04-15 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Data Analysis with Pandas

This workshop is on pandas, a powerful open-source Python package for data analysis and manipulation. In this workshop, you will learn how to read data, compute summary statistics, check data distributions, conduct basic data cleaning and transformation, and plot simple visualizations. We will be using the Novel Coronavirus (COVID-19) data released by the Johns Hopkins Center for Systems Science and Engineering (CSSE). Prior basic Python experience is recommended.
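The first few steps listed above (reading data, basic cleaning, summary statistics) might look roughly like this. It is a minimal sketch that assumes pandas is installed. It uses a small made-up CSV in place of the Johns Hopkins data, and its column names are hypothetical rather than taken from the workshop notebooks.

```python
import io

import pandas as pd

# Made-up sample standing in for the real dataset; note the missing last value.
csv_text = """date,region,confirmed
2020-04-01,A,10
2020-04-02,A,15
2020-04-01,B,4
2020-04-02,B,
"""

# Read the data, parsing the date column into timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Basic cleaning: treat the missing count as zero and store counts as integers.
df["confirmed"] = df["confirmed"].fillna(0).astype(int)

# Summary statistics for the numeric column.
print(df["confirmed"].describe())

# Total confirmed cases per region.
totals = df.groupby("region")["confirmed"].sum()
print(totals)
```

Whether a missing count should become zero, be dropped, or be filled from a neighboring day depends on the dataset. Filling with zero here is only for illustration.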

2020-04-08 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Introduction to Python on Databricks

Python is a popular programming language because of its wide range of applications, including but not limited to data analysis, machine learning, and web development. This workshop covers major foundational concepts necessary for you to start coding in Python, with a focus on data analysis. You will learn about different types of variables, for loops, functions, and conditional statements. No prior programming knowledge is required.
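The concepts named above fit in a few lines. This is a small illustrative sketch rather than material from the workshop; the variable names and values are made up.

```python
# Variables of different types.
city = "Seattle"          # str
new_cases = [3, 0, 7, 2]  # list of int
threshold = 5             # int


def label(count: int, limit: int) -> str:
    """Return 'high' when count exceeds limit, otherwise 'low'."""
    # Conditional statement: choose a result based on a comparison.
    if count > limit:
        return "high"
    return "low"


# A for loop that applies the function to each value in the list.
labels = []
for count in new_cases:
    labels.append(label(count, threshold))

print(city, labels)
```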

2020-04-02 - [Diving into Delta Lake: Enforcing and Evolving…
