Making Spark Fly: NVIDIA Accelerates World’s Most Popular Data Analytics Platform
The world’s most popular data analytics application, Apache Spark, now offers revolutionary GPU acceleration to its more than half a million users through the general availability release of Spark 3.0.
Databricks provides the leading cloud-based enterprise Spark platform, run on over a million virtual machines every day. At the Spark + AI Summit today, Databricks announced that Databricks Runtime 7.0 for Machine Learning features GPU-accelerator aware scheduling with Spark 3.0, developed in collaboration with NVIDIA and other community members.
Google Cloud recently announced the availability of a Spark 3.0 preview on Dataproc image version 2.0, noting the powerful NVIDIA GPU acceleration that’s now possible thanks to the collaboration of the open source community. We’ll be hosting a webinar with Google Cloud on July 16 to dive into these exciting new capabilities for data scientists.
In addition, the new open source RAPIDS Accelerator for Apache Spark is now available to accelerate ETL (extract, transform, load) and data transfers to boost analytics performance from end to end, without any code changes.
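To make "without any code changes" concrete, here is a minimal sketch of how GPU scheduling and the RAPIDS Accelerator are typically switched on through Spark configuration rather than through edits to the job itself. The configuration keys are the ones documented for Spark 3.0 resource scheduling and the RAPIDS Accelerator plugin. The helper `spark_submit_args`, the job name `etl_job.py`, the jar file names and the resource amounts are hypothetical placeholders, not values from this announcement.

```python
# Illustrative only: resource amounts, jar names and the job name are assumptions.
GPU_CONF = {
    # Spark 3.0 accelerator-aware scheduling: request one GPU per executor ...
    "spark.executor.resource.gpu.amount": "1",
    # ... and let four tasks share it (each task claims a quarter of a GPU).
    "spark.task.resource.gpu.amount": "0.25",
    # Load the RAPIDS Accelerator so supported SQL/DataFrame operations run on the GPU.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
}


def spark_submit_args(app, conf, jars=()):
    """Build a spark-submit argument list from a config dict (hypothetical helper)."""
    args = ["spark-submit"]
    if jars:
        args += ["--jars", ",".join(jars)]
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # The application script is passed through unchanged; only the flags differ.
    cmd = spark_submit_args(
        "etl_job.py", GPU_CONF, jars=["rapids-4-spark.jar", "cudf.jar"]
    )
    print(" ".join(cmd))
```

The existing Spark job stays untouched and only the launch configuration changes. Depending on the cluster manager, a GPU discovery script (`spark.executor.resource.gpu.discoveryScript`) may also need to be set, and the right GPU-per-task ratio depends on the workload.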
Faster performance on Spark means not only faster insights but also lower costs, since enterprises can complete workloads using less infrastructure.
Accelerated Data Analytics: Scientific Computing Makes Sense of AI
Spark is increasingly in the news, and for good reason.
Data is essential to helping organizations navigate shifting opportunities and possible threats. But to do so, they need to decipher the critical clues hidden in their data.
Organizations add to their heaps of information every time a customer clicks on a website, hosts a call with customer support or generates a daily sales report. With the rise of AI, data analytics has become critical to helping companies spot trends and stay ahead of changing markets.
Until recently, data analytics relied on small datasets to glean historical insights. That data was highly structured, processed through ETL and stored in traditional data warehouses.
ETL often becomes a bottleneck for data scientists working on AI-based predictions and recommendations. Estimated to take up 70-90 percent of a data scientist’s time, ETL slows down workflows and ties up sought-after talent on the most mundane part of their work.
When a data scientist is waiting for ETL, they’re not retraining their models to gain better business intelligence. Traditional CPU infrastructure can’t scale efficiently to accommodate these workloads, which often causes costs to balloon.
With GPU-accelerated Spark, ETL no longer spells trouble. Industries such as healthcare, entertainment, energy, finance, retail and many others can now cost-effectively accelerate their data analytics insights.
The Power of Parallel Processing for Data Analytics
GPU parallel processing allows computers to...