Reproducibility is an essential requirement in many research fields, including those built on machine learning techniques. Yet many ML research publications are either not reproducible or too difficult to reproduce.
PyTorch Hub Announced to Improve Machine Learning Research Reproducibility
The PyTorch team announced the release of PyTorch Hub yesterday. It is a simple API and workflow that provides the basic building blocks for improving machine learning research reproducibility.
For those unfamiliar with it, PyTorch is a machine learning library for Python. It is based on the Torch library and is used in applications such as natural language processing.
PyTorch Hub comes with a pre-trained model repository designed to facilitate research reproducibility and new research. It offers built-in support for Colab and integration with Papers With Code. PyTorch Hub also ships with a broad set of models, including classification, segmentation, transformer and generative models.
Furthermore, PyTorch Hub supports publishing pre-trained models from a GitHub repository. To do so, you add a simple hubconf.py file to the repository; this file lists the supported models and the dependencies required to run them.
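To make the idea of a hubconf.py file more concrete, here is a minimal sketch of what one might look like. The entrypoint name tiny_classifier, its parameters and the toy model are hypothetical and not taken from any real repository; only the dependencies list and the "one function per model" layout follow the convention described above.

```python
# hubconf.py: placed at the root of a GitHub repository (illustrative sketch)

# Packages that must be installed before any entrypoint can be used.
dependencies = ["torch"]


def tiny_classifier(pretrained=False, num_classes=10, **kwargs):
    """Hypothetical entrypoint: a small linear classifier over 28x28 inputs.

    pretrained (bool): a real hubconf.py would load published weights when True.
    num_classes (int): size of the output layer.
    """
    # Imported lazily so that reading this file does not itself require torch.
    import torch.nn as nn

    if pretrained:
        # Assumption: no checkpoint exists for this made-up model.
        raise NotImplementedError("no pretrained weights for this sketch")

    # Flatten the image, then map it to class scores with one linear layer.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, num_classes),
    )
```

Every public function defined in the file acts as an entrypoint, so a repository that publishes several models would simply define several such functions.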
For example, you can check out repositories such as torchvision, huggingface-bert and gan-model-zoo. In the case of torchvision, each model file in the repository can function and be executed independently. The models need no separate entry-points and require no package other than PyTorch, which hubconf.py records as a dependency. Once a repository has a hubconf.py, you can send a pull request based on the template specified on the GitHub page.
According to the official blog post, “Our goal is to curate high-quality, easily-reproducible, maximally-beneficial models for research reproducibility. Hence, we may work with you to refine your pull request and in some cases reject some low-quality models to be published. Once we accept your pull request, your model will soon appear on Pytorch hub webpage for all users to explore.”
PyTorch Hub allows users to explore available models, load a model and understand what methods are available for any given model. Have a look at the examples below:
Explore available entrypoints
You can use the torch.hub.list() API to list all available entrypoints in a repository. Apart from pre-trained models, the Hub also allows auxiliary entrypoints, such as bertTokenizer for preprocessing input to the BERT models, which makes the user workflow smoother.
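Listing entrypoints could be sketched as follows. The repository name "pytorch/vision" is used as an example, and the names_containing helper is a hypothetical convenience added for illustration, not part of PyTorch. The call itself needs PyTorch installed and network access to GitHub.

```python
def list_entrypoints(repo="pytorch/vision"):
    """Return the entrypoint names exposed by a repository's hubconf.py."""
    import torch  # imported here so the helper below works without torch

    # Fetches the repository from GitHub and inspects its hubconf.py.
    return torch.hub.list(repo)


def names_containing(entrypoints, fragment):
    """Hypothetical helper: keep the entrypoint names that contain `fragment`."""
    return sorted(name for name in entrypoints if fragment in name)


# Example (requires torch and network access):
#   entrypoints = list_entrypoints("pytorch/vision")
#   print(names_containing(entrypoints, "resnet"))
```

Filtering the returned list by name is just one way to narrow down a large repository; the result of torch.hub.list() is an ordinary list of strings.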
Loading a model
You can use the torch.hub.load() API to load a model entrypoint. The companion torch.hub.help() API provides useful information about how to instantiate a model.
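A minimal sketch of loading a model is shown below, assuming PyTorch is installed and GitHub is reachable. The function name describe_and_load is hypothetical, and the repository and entrypoint defaults are only examples. torch.hub.help() returns the entrypoint's docstring, and torch.hub.load() calls the entrypoint and returns whatever it builds.

```python
def describe_and_load(repo="pytorch/vision", entrypoint="resnet18", **kwargs):
    """Print an entrypoint's documentation, then return the object it builds."""
    import torch

    # The docstring usually explains which arguments the entrypoint accepts.
    print(torch.hub.help(repo, entrypoint))

    # Extra keyword arguments are forwarded to the entrypoint function.
    return torch.hub.load(repo, entrypoint, **kwargs)


# Example (requires torch and network access):
#   model = describe_and_load("pytorch/vision", "resnet18")
#   model.eval()  # switch to inference mode before making predictions
```

Reading the help text first is a convenient habit, because the accepted arguments differ from one entrypoint to another.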
If you're interested, you can learn more about PyTorch Hub in the official announcement.