At ISC High Performance 2017, held in Frankfurt, Germany, deep learning is driving new computing innovation as processor manufacturers and systems developers race to deliver products optimised for deep learning applications.
2017 appears to be the year that deep learning becomes a mainstream computing technology. This is good news for HPC developers, as it is increasing demand for HPC hardware, but optimisations are still needed to fine-tune both hardware and software for deep learning and future AI research.
Cray announced the Cray Urika-XC analytics software suite, which aims to bring analytics and deep learning tools to the company’s line of Cray XC supercomputers.
Nvidia launched the PCIe-based version of its Volta V100 GPU. The company also demonstrated the use of its GPU technology in combination with deep learning as part of the Human Brain Project.
HPE launched new server solutions aimed specifically at HPC and AI workloads, while Mellanox highlighted its work to fine-tune its technology for AI and deep learning applications. Mellanox announced that deep learning frameworks such as TensorFlow, Caffe2, Microsoft Cognitive Toolkit, and Baidu PaddlePaddle can now leverage Mellanox’s smart offloading capabilities. Mellanox claims that this technology can provide near-linear scaling across multiple AI servers.
The Cray Urika-XC solution is a set of applications and tools optimised to run seamlessly on the Cray XC supercomputing platform. In basic terms, the company is taking the toolset it developed for the Urika-GX platform, optimising it for deep learning, and then applying the software and tools to its XC series of supercomputers.
The software package comprises the Cray Graph Engine, the Apache Spark analytics environment, the BigDL distributed deep learning framework for Spark, the distributed Dask parallel computing libraries for analytics, and widely used languages for analytics including Python, Scala, Java, and R.
The Cray Urika-XC analytics software suite highlights the convergence of traditional HPC and data-intensive computing – such as deep learning – as core workloads for supercomputing systems in the coming years.
As data volumes in HPC grow, the industry is responding by moving away from the previous FLOPS-centric model towards a more data-centric model. This requires innovation not only in parallel processing, network, and storage performance but also in the software and tools used to process the vast quantities of data needed to train deep learning networks.
While deep learning is not the only trigger for this new model, it exemplifies the changing paradigm of architectural design in HPC.
One example is the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, which currently uses the Cray Urika-XC solution on Piz Daint. Following its recent upgrade, Piz Daint is now one of the fastest supercomputers in the world.
‘CSCS has been responding to the increased needs for data analytics tools and services,’ said Professor Thomas Schulthess, director of the Swiss National Supercomputing Centre (CSCS). ‘We were very fortunate to participate with our Cray supercomputer Piz Daint in the early evaluation phase of the Cray Urika-XC environment. Initial performance results and scaling experiments using a subset of applications including Apache Spark and Python have been very promising. We look forward to exploring future extensions of the Cray Urika-XC analytics software suite.’
Also this week at ISC, Nvidia announced the PCI Express version of its latest Tesla GPU accelerator, the Volta-based V100. The SXM2 form factor card was first announced earlier this year at the company’s GPU Technology Conference (GTC), but users can now use the more traditional PCIe slot to connect the Volta-based GPU.
Hardware was not the only thing in the spotlight, however: the company also highlighted some of the latest research that makes use of these technologies, such as the Human Brain Project. Created in 2013 by the European Commission, the project aims to gather, organise and disseminate data describing the brain and its diseases, and to simulate the brain itself.
Scientists at the Jülich Research Center (Forschungszentrum Jülich), in Germany, are developing a 3D multi-modal model of the human brain. They do this by analysing thousands of ultrathin histological brain slices using microscopes and advanced image analysis methods, and then reconstructing these slices into a 3D computer model.
Analysing and registering high-resolution 2D image data into a 3D reconstruction is both data- and compute-intensive. To process this data as quickly as possible, the Jülich researchers are using JURON, one of two pilot supercomputers delivered by IBM and NVIDIA to the Jülich Research Center.
The JURON cluster is composed of 18 IBM Minsky servers, each with four Tesla P100 GPU accelerators connected by NVIDIA NVLink interconnect technology.
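As a quick check on the scale of the system, the total number of accelerators follows directly from the two figures above. The variable names in this short sketch are illustrative only.

```python
# Figures quoted above for the JURON pilot system.
servers = 18          # IBM Minsky servers in the cluster
gpus_per_server = 4   # Tesla P100 accelerators per server

# Total accelerators available across the whole cluster.
total_gpus = servers * gpus_per_server
print(f"JURON has {total_gpus} Tesla P100 GPUs in total")
```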
Deep learning drives product innovation across the industry
Hewlett Packard Enterprise was also keen to get in on the AI action as the company launched the HPE Apollo 10 Series.
The HPE Apollo 10 Series is a new platform optimised for entry-level deep learning and AI applications. The HPE Apollo sx40 System is a 1U dual-socket Intel Xeon Gen10 server with support for up to four NVIDIA Tesla SXM2 GPUs with NVLink. The HPE Apollo pc40 System is a 1U dual-socket Intel Xeon Gen10 server with support for up to four PCIe GPU cards.
‘Today, customers’ HPC requirements go beyond superior performance and efficiency,’ said Bill Mannel, vice president and general manager, HPC and AI solutions, Hewlett Packard Enterprise. ‘They are also increasingly considering security, agility and cost control. With today’s announcements, we are addressing these considerations and delivering optimised systems, infrastructure management, and services capabilities that provide A New Compute Experience.’
Collaboration to drive AI performance
Mellanox announced that it is optimising its existing technology to help accelerate deep learning performance. The company said that deep learning frameworks such as TensorFlow, Caffe2, Microsoft Cognitive Toolkit, and Baidu PaddlePaddle can now leverage Mellanox’s smart offloading capabilities to increase performance and, the company claims, provide near-linear scaling across multiple AI servers.
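To make the term ‘near-linear scaling’ concrete, the short sketch below computes scaling efficiency: the measured throughput at each server count divided by what perfect linear scaling from the smallest configuration would give. The function name and the throughput figures are hypothetical, invented purely for illustration; they are not Mellanox measurements.

```python
def scaling_efficiency(throughput_by_nodes):
    """Return {nodes: efficiency} relative to the smallest node count measured.

    An efficiency of 1.0 means perfectly linear scaling; values slightly
    below 1.0 are what is usually meant by 'near-linear'.
    """
    base_nodes = min(throughput_by_nodes)
    # Per-node throughput at the baseline configuration.
    base_rate = throughput_by_nodes[base_nodes] / base_nodes
    return {
        nodes: rate / (base_rate * nodes)
        for nodes, rate in sorted(throughput_by_nodes.items())
    }


# Hypothetical training throughput (images per second) by number of servers.
measured = {1: 1000.0, 2: 1960.0, 4: 3840.0, 8: 7520.0}

efficiency = scaling_efficiency(measured)
for nodes, eff in efficiency.items():
    print(f"{nodes} servers: {eff:.0%} of ideal linear scaling")
```

With these made-up numbers, eight servers deliver 94 per cent of the throughput that ideal linear scaling would predict. In practice, the gap is largely down to the time spent exchanging data between servers, which is the overhead interconnect offloading is intended to reduce.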
The announcement highlights Mellanox’s work to ensure its products meet the requirements of users running deep learning workloads, but it also demonstrates the company’s willingness to work with partners, such as Nvidia, to further improve the performance and integration of their respective technologies.
‘Advanced deep neural networks depend upon the capabilities of smart interconnect to scale to multiple nodes, and move data as fast as possible, which speeds up algorithms and reduces training time,’ said Gilad Shainer, vice president of marketing at Mellanox Technologies. ‘By leveraging Mellanox technology and solutions, clusters of machines are now able to learn at a speed, accuracy, and scale that push the boundaries of the most demanding cognitive computing applications.’
A key point of the announcement is that Mellanox is working with partners to ensure that deep learning frameworks and hardware, such as Nvidia GPUs, are compatible with its interconnect fabric, which should help promote Mellanox networking solutions to AI and deep learning users.
More information was provided by Duncan Poole, director of platform alliances at NVIDIA: ‘Developers of deep learning applications can take advantage of optimised frameworks and NVIDIA’s upcoming NCCL 2.0 library which implements native support for InfiniBand verbs and automatically selects GPUDirect RDMA for multi-node or NVIDIA NVLink when available for intra-node communications.’