1. The current state of applied data science
|
Ben Lorica explores recent trends in practical use and key bottlenecks in supervised machine learning.
|
2. PyTorch or TensorFlow?
|
Awni Hannun details the differences he found between PyTorch and TensorFlow (with an emphasis on programmability and flexibility for a deep learning stack rather than performance).
|
3. How do you read math-heavy ML papers?
|
How (besides reading this newsletter, of course) do you keep up with all the recently released ML papers—particularly the math-dense ones? This is an interesting Reddit thread offering advice on skimming and/or reading math-heavy ML papers. Sample: "The secret for reading algebra-heavy papers is NOT trying to follow the algebra on the first read."
+ Likewise, Medium has 6,548 articles about AI/ML/DL written by 3,528 people. Here are the top 100, ranked by number of recommends.
|
Online learning just got stickier
|
If you learn best by actually getting your hands on the material, we've got good news. Our new "Powered by Jupyter" live online training courses let you do live coding and data analysis right in your browser as you work alongside your instructor. No tedious setup or installation needed; just jump right in and start learning.
|
4. 3 enablers for ML in data unification
|
Human-guided ML pipelines for data unification and cleaning might be the only way to provide complete and trustworthy datasets for effective analytics.
+ Check out Ihab Ilyas's session, Solving data cleaning and unification using human-guided machine learning, at the Strata Data Conference in New York, September 25–28.
|
5. Machine learning for humans
|
"Simple, plain-English explanations accompanied by math, code, and real-world examples."
|
Strata NY courses selling out
|
The Apache Spark for machine learning and data science 2-day course has sold out. The Cloudera big data architecture workshop has only 2 spots left. The other 2-day training courses and tutorials are booking up fast (as are hotel rooms). If you are planning to attend the Strata Data Conference in New York, September 25–28, make your plans soon.
Check it out (but do it fast).
|
6. 3 things driving enterprise adoption of Jupyter
|
William Merchan describes the fundamental trends driving the adoption of Jupyter and its deployment in large organizations.
|
7. Going multicloud with AWS and GCP
|
In this post, Charles Allen discusses the lessons Metamarkets learned in moving to a multicloud model and compares AWS and GCP as cloud providers.
+ "A multicloud strategy is the foundation for digital transformation."
|
8. How to build an image recognizer in R with a few images
|
"Training an image recognition system requires LOTS of images—millions and millions of them. It involves feeding those images into a deep neural network, and during that process, the network generates 'features' from the image....But if you don't have millions of images, it's still possible to generate these features from a model that has already been trained on millions of images."
|
In collaboration with Lightbend
What's the role of ML in fast data and streaming applications?
|
Emre Velipasaoglu, principal data scientist at Lightbend, will host a free 60-minute webcast that will explore machine learning methods for fast data and streaming applications, ideal use cases, and how ML can help you unlock value from your data. The webcast takes place Tuesday, September 12, at 10am PT.
|
9. Semantic data lake architecture in healthcare
|
Here is a look at how data lakes work for Montefiore Health System and the role of semantics and graph databases in the data lake architecture.
+ The data lake: Improving the role of Hadoop in data-driven business management (session at Strata in NY)
|
10. A paper explained
|
"Explained simply: How DeepMind taught AI to play video games" is a wonderfully gentle explanation of this important AI paper.
|