Facebook launched Horizon, its first open source reinforcement learning platform for large-scale products and services, yesterday. The workflows and algorithms in Horizon are built on open source frameworks such as PyTorch 1.0, Caffe2, and Spark, which makes Horizon accessible to anyone who uses RL at scale.
“We developed this platform to bridge the gap between RL’s growing impact in research and its traditionally narrow range of uses in production. We deployed Horizon at Facebook over the past year, improving the platform’s ability to adapt RL’s decision-based approach to large-scale applications”, reads the Facebook blog.
Facebook has already used the new platform to gain performance benefits such as delivering more relevant notifications, optimizing streaming video bitrates, and improving personalized suggestions in Messenger. Given Horizon's open design and toolset, however, it can also benefit other organizations applying RL.
Harnessing reinforcement learning for large-scale production
Horizon uses reinforcement learning to make decisions at scale by taking into account the issues specific to the production environments. These include feature normalization, distributed training, large-scale deployment, and data sets with thousands of varying feature types.
Moreover, according to Facebook, applied RL models are more sensitive to noisy and unnormalized data than traditional deep networks are. Horizon therefore preprocesses state and action features in parallel with the help of Apache Spark. Once the training data has been preprocessed, PyTorch-based algorithms handle normalization and training on GPUs.
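The idea behind feature normalization can be sketched in a few lines. This is an illustrative standardization pass, not Horizon's actual preprocessing code; the feature layout (a list of dicts) is a hypothetical stand-in for real training rows.

```python
from statistics import mean, stdev

def normalize_features(rows):
    """Standardize each feature to zero mean and unit variance.

    rows: list of dicts mapping feature name -> raw value.
    Returns a new list of dicts with standardized values.
    """
    keys = rows[0].keys()
    # Compute per-feature mean and standard deviation across all rows.
    stats = {k: (mean(r[k] for r in rows), stdev(r[k] for r in rows))
             for k in keys}
    normalized = []
    for r in rows:
        # Guard against zero variance to avoid division by zero.
        normalized.append({k: (r[k] - stats[k][0]) / (stats[k][1] or 1.0)
                           for k in keys})
    return normalized
```

In a real pipeline this kind of statistic would be computed in a distributed pass (e.g. with Spark) and applied at training time, rather than in plain Python.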
Horizon's design also focuses on large clusters, where distributed training across many GPUs at once lets engineers solve problems with millions of examples. Horizon supports algorithms such as Deep Q-Network (DQN), parametric DQN, and deep deterministic policy gradient (DDPG) models. During training, Horizon runs counterfactual policy evaluation (CPE), a set of methods used to predict the performance of a newly learned policy, and logs the CPE results to TensorBoard. Once training completes, Horizon exports the models via ONNX so that they can be served efficiently at scale.
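At the core of the DQN-family algorithms mentioned above is the Bellman target used to update the Q-function. The toy helper below shows that target for a single transition; it is a sketch of the general technique, not Horizon's implementation, and the discount factor value is arbitrary.

```python
def dqn_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Compute the DQN training target for one transition.

    reward: immediate reward r observed after taking the action.
    next_q_values: estimated Q(s', a') for every action a' in the
        next state (e.g. the output of a target network).
    gamma: discount factor.
    terminal: True if the episode ended, so no future value accrues.
    """
    if terminal:
        return reward
    # Bellman target: r + gamma * max over next actions of Q(s', a').
    return reward + gamma * max(next_q_values)
```

A training loop would regress the online network's Q(s, a) toward this target for each sampled transition.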
Usually, in many RL domains, a model's performance is measured by trying it out. However, since Horizon operates on large-scale production systems, models must be tested thoroughly before being deployed at scale. Whenever Horizon solves a policy optimization task, the training workflow automatically runs state-of-the-art policy evaluation techniques, including sequential doubly robust policy evaluation and MAGIC.
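To make the doubly robust idea concrete, here is a hedged sketch of a single-step doubly robust estimator, the building block behind the sequential variant named above. The function and variable names are illustrative and not part of Horizon's API; it assumes discrete actions and known logging-policy probabilities.

```python
def doubly_robust(logged, q_model, target_prob, logging_prob):
    """Estimate a target policy's value from logged bandit data.

    logged: list of (action, reward) pairs from the logging policy.
    q_model: dict action -> model-estimated reward for that action.
    target_prob: dict action -> probability under the target policy.
    logging_prob: dict action -> probability under the logging policy.
    """
    total = 0.0
    for action, reward in logged:
        # Direct-method term: model's expected reward under the
        # target policy.
        dm = sum(target_prob[a] * q_model[a] for a in q_model)
        # Importance-weighted correction for the model's error on
        # the action that was actually logged.
        w = target_prob[action] / logging_prob[action]
        total += dm + w * (reward - q_model[action])
    return total / len(logged)
```

The estimator stays accurate if either the reward model or the importance weights are correct, which is what makes it "doubly" robust.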
This evaluation is then combined with anomaly detection, which automatically alerts engineers if a new iteration of the model performs radically differently from the previous one, before the policy is deployed to the public.
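The anomaly check described above can be sketched as a simple guard that compares consecutive evaluation scores. This is a toy illustration, not Horizon's detector, and the 20% tolerance is an arbitrary example value.

```python
def is_anomalous(prev_score, new_score, tolerance=0.2):
    """Flag a model iteration whose evaluation score deviates
    sharply from the previous iteration's score.

    Returns True if the relative change exceeds the tolerance.
    """
    if prev_score == 0:
        # No baseline to compare against; any nonzero score is a change.
        return new_score != 0
    return abs(new_score - prev_score) / abs(prev_score) > tolerance
```

In a deployment pipeline, a True result would block the rollout and page an engineer rather than letting the new policy go live.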
Facebook plans to add new models and model improvements, along with CPE integrated with real metrics, to Horizon in the future.
“We are leveraging the Horizon platform to discover new techniques in model-based RL and reward shaping, and using the platform to explore a wide range of additional applications at Facebook, such as data center resource allocation and video recommendations. Horizon could transform the way engineers and ML models work together”, says Facebook.
For more information, check out the official Facebook blog.