O'Reilly Data Newsletter

1. Cracking the box: Interpreting black box ML models

Here are a few ways to tackle the interpretability of ML models using Python.

2. Making Apache Spark effortless for all of Uber

Uber’s engineering team offers a deep, deep dive into how Uber uses Spark.

+ Case study on O’Reilly learning: How Uber used natural language processing and deep learning to improve customer service satisfaction.

3. Matplotlib 3.1 cheatsheet

Handy.

4. How to build and manage training data with Snorkel

In this episode of the O’Reilly Data Show podcast, Ben Lorica talks with Alex Ratner about using Snorkel to build and manage training data.

+ Alex Ratner’s talk, Building and Managing Training Datasets for ML with Snorkel, is part of a strong slate of sessions on data, data networks, and data quality at the O’Reilly Artificial Intelligence Conference in San Jose.

5. Making uncommon knowledge common

Rich Barton founded Expedia and cofounded Zillow and Glassdoor. “The Rich Barton Playbook is building Data Content Loops to disintermediate incumbents and dominate Search. And then using this traction to own demand in their industries. Or as he puts it ‘Power to the People.’”

IN COLLABORATION WITH IBM

Free ebook: Getting Your Data Ready for AI

One of the biggest obstacles to a successful AI implementation is getting the data ready: the tedious process of accessing, labeling, and transforming your data. This practical ebook explores a new crop of self-service data preparation tools and services for automating data wrangling. And it’s free, courtesy of IBM. Check it out.

6. 10 simple rules for Jupyter Notebooks

Check out these ten best practices for writing and sharing computational analysis in Jupyter Notebooks. These guidelines will help ensure that your work is maintainable, reproducible, and understandable.

7. Why Monzo wasn’t working on July 29

In this post, Chris Evans describes Monzo’s recent outage (caused by issues with the company’s configuration and usage of Apache Cassandra), how the team identified the problem, and how Monzo plans to split its single Cassandra cluster into multiple clusters.

+ Building Applications with Apache Cassandra, live online O’Reilly course, September 6

8. 5 sampling algorithms you should know

Here’s an intro to common sampling techniques.

IN COLLABORATION WITH Cisco

Cisco Data Intelligence Platform

Come hear Siva Sivakumar’s keynote on the new Cisco Data Intelligence Platform (CDIP) at Strata in New York. Discover how this cloud-scale architecture brings big data, AI/compute farm, and storage tiers together as a single entity while letting each tier scale independently to address IT issues in the modern data center.

9. Blockchain solutions in enterprise

What are the crucial steps for a successful blockchain-based solution?

10. Resume not working?

Seven reasons your resume could be failing you.

Small-group, hands-on training at Strata in New York

Strata’s two-day intensive, expert-led training courses give you hands-on experience in the technologies you need to succeed. They cover everything from TensorFlow to recommendation systems to serverless data apps. Class sizes are kept small so you get individual attention, which means the courses fill up fast. Reserve your seat at the courses that interest you ASAP.

O’Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472 (707) 827-7000