O'Reilly Data Newsletter

1. The secret to a successful data governance strategy

Here's how to create a flexible data governance platform with a multimodel database architecture.

2. GPU-accelerated analytics database

In this episode of the O'Reilly Podcast, Amit Vij, CEO and cofounder of Kinetica, discusses how organizations are using GPU-accelerated databases to converge artificial intelligence and business intelligence on a single platform.

3. Google's Colaboratory

Colaboratory is a Jupyter Notebook environment created to help disseminate machine learning education and research. It requires no setup to use and works a lot like Google Docs. (Colaboratory FAQ here.)

4. Stop using word2vec

"Word vectors are awesome but you don't need a neural network, and definitely don't need deep learning, to find them." This post shows how to get word vectors by factorizing a 2D matrix of word co-occurrences. This one shows how to factorize a 3D tensor. (Both use the scikit-tensor Python library for multilinear algebra and tensor factorizations.)
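
For a feel of the 2D case, here is a minimal sketch of the general idea, not the code from either post. It assumes a toy tokenized corpus and swaps in NumPy's SVD in place of scikit-tensor; the function name, window size, and log scaling are illustrative choices.

```python
import numpy as np

def cooccurrence_vectors(sentences, window=2, dim=2):
    """Hypothetical helper: word vectors from a factorized co-occurrence matrix."""
    vocab = sorted({word for sent in sentences for word in sent})
    index = {word: i for i, word in enumerate(vocab)}

    # Count how often each pair of words appears within `window` tokens.
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1

    # Log scaling (an assumption here) damps very frequent pairs.
    matrix = np.log1p(counts)

    # Factorize and keep the top `dim` components as the word vectors.
    u, singular_values, _ = np.linalg.svd(matrix)
    vectors = u[:, :dim] * np.sqrt(singular_values[:dim])
    return vocab, vectors

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab, vectors = cooccurrence_vectors(corpus)
for word, vec in zip(vocab, vectors):
    print(word, vec)
```

In this toy corpus "cat" and "dog" appear in identical contexts, so they end up with the same vector, which is the property that makes factorized co-occurrence counts useful as word vectors.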

5. Automated root cause analysis for Spark application failures

How to reduce troubleshooting time from days to seconds.

In collaboration with Zoomdata

Free: The State of Data Analytics and Visualization Adoption

Find out who's doing what in the marketplace. Understand what's working and what isn't. This free O'Reilly report, courtesy of Zoomdata, explains:
  • The industries making the biggest use of data analytics and visualization technologies
  • The most commonly analyzed sources of big data
  • The most popular technologies for analyzing streaming data
Get the free report →

6. Apps move rapidly to the cloud—Data? Not so much.

A recent 451 Research report commissioned by Schneider Electric, Customer Insight: Future-Proofing Your Colocation Business, shows that while use of the public cloud is growing, it's not a one-way road: 41% of the organizations surveyed moved some data and applications to the public cloud, and then back to colocation facilities.

7. 4 cringe-worthy mistakes startups make with data

Amanda Richardson, chief data and strategy officer for HotelTonight, describes four data trends that "aren't just lazy but also ineffective, misleading, and costing companies a ton of opportunity."

8. Technology trends in China

In this episode of the O'Reilly Data Show Podcast, Rhea Liu, analyst at China Tech Insights, discusses machine intelligence for content distribution, logistics, smarter cities, facial recognition, mobile payments, and even bike sharing.

In collaboration with data Artisans

Free: Introduction to Apache Flink

Do you want to know how to analyze real-time streaming data in large-scale systems for fraud detection, security, recommendations, and other mission-critical business functions? Analyzing data streams at large scale and low latency has been challenging, until now. In this book, you'll learn architectural considerations and best practices for getting started with Apache Flink, a powerful open source stream processor. And it's free, compliments of data Artisans.
Download the free ebook →

9. An intro to hierarchical modeling

"This visual explanation introduces the statistical concept of hierarchical modeling, also known as mixed effects modeling" (used for modeling nested data).

10. Neural networks working together

This post dissects the design of an AI application. Zocdoc's tool for reading insurance cards uses different neural networks for detecting and reading different parts of the card. It's not a single AI; it's several, working together. This is an interesting example of how real-world AI systems will be built.

Read more about data on oreilly.com →