Data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, data mining, databases, and visualization.

RedisGraph v1.0 released, benchmarking proves it's 6-600 times faster than existing...

RedisGraph was released in beta mode six months ago. On the 14th of November, RedisLabs announced the general availability of RedisGraph v1.0. RedisGraph is a Redis module that adds graph database functionality to Redis. It delivers a fast and efficient way to store, manage, and process graphs, around 6 to 600 times faster than [...]

TimescaleDB 1.0 officially released

On Tuesday, the team at Timescale announced the official production release of TimescaleDB 1.0. Two months ago, the team released its initial release candidate. With the official release, TimescaleDB 1.0 is now the first enterprise-ready time-series database that supports full SQL and scale. This release has crossed over 1M downloads and production deployments at Comcast, [...]

An empirical evaluation of imbalanced data strategies from a practitioner’s point...

This research tested the following well-known strategies to deal with binary imbalanced data on 82 different real-life data sets (sampled to imbalance...

The What-If Tool: Code-Free Probing of Machine Learning Models

Posted by James Wexler, Software Engineer, Google AI. Building effective machine learning (ML) systems means asking a lot of questions. It's not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better: How would changes to [...]

Relational inductive biases, deep learning, and graph networks

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been...

Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development

As a company heavily invested in AI, Uber aims to leverage machine learning (ML) in product development and the day-to-day management of our business. In pursuit of this goal, our data scientists spend considerable amounts of time prototyping and validating powerful new types of ML models to solve Uber’s most challenging problems (e.g., NLP based [...]

Clustering and Learning from Imbalanced Data

ArXiv article. A learning classifier must outperform a trivial solution; in the case of imbalanced data, this condition usually does not hold true. To overcome this...
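To illustrate the "trivial solution" the abstract refers to, here is a minimal sketch, not taken from the article. It assumes a hypothetical binary data set with 95 negatives and 5 positives and a trivial classifier that always predicts the majority class; the function names are illustrative only.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# The trivial solution: predict the majority class for every sample.
y_trivial = [0] * len(y_true)

baseline = majority_baseline_accuracy(y_true)
balanced = balanced_accuracy(y_true, y_trivial)

print(f"plain accuracy of trivial classifier: {baseline:.2f}")
print(f"balanced accuracy of trivial classifier: {balanced:.2f}")
```

Under these assumptions the trivial classifier scores high on plain accuracy while never detecting a positive, which is why a learned model on imbalanced data is usually compared against this baseline with a class-aware metric.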

Google Dataset Search Launched to Help Analysts Scour Repositories

Google Dataset Search is a new product in the beta phase that you can use to find datasets published online. The single interface allows you to search repositories worldwide. Imagine you start a new analytics project. For example, let’s say you want to explore numbers pertaining to Boston Public Schools. Before, you would search for it in [...]

PipelineDB 1.0.0, the high-performance time-series aggregation for PostgreSQL

Three years ago, the PipelineDB team published the very first release of PipelineDB, as a fork of PostgreSQL. It received enormous support and feedback from thousands of organizations worldwide, including several Fortune 100 companies. It was highly requested that the fork be released as an extension of PostgreSQL. Yesterday, the team released PipelineDB 1.0.0 as [...]