Originally published on Wes McKinney's blog.
I’m excited to announce that NVIDIA AI Labs has signed on as a supporter of Ursa Labs. NVIDIA’s new open source RAPIDS data science platform uses Apache Arrow for an interoperable representation of tabular data (data frames). We are looking forward to collaborating on our respective development roadmaps and growing the ecosystem of projects that use Arrow.
This new financial support will enable us to grow our team of full-time open source software developers.
Apache Arrow: An Open Standard for Columnar Data on the GPU
When NVIDIA and a group of startups embarked on the GPU Open Analytics Initiative last year to build libraries of pandas-style analytics for CUDA-powered GPU devices, they had to solve the problem of data portability between systems. The cost of serialization or memory copying can frequently exceed that of the underlying GPU kernel execution.
Anyone who’s been actively following my blog posts, presentations, and Twitter feed will know that Apache Arrow is fundamentally about making it easy to share data and share algorithms.
- If you can’t share data at zero cost, then crossing languages or computation environments incurs a high penalty
- You generally cannot share algorithms (code) without utilizing common in-memory data structures
The Arrow columnar format is an ideal fit for in-memory analytics on GPUs. The contiguous columnar memory layout is efficient for the manycore CUDA execution model.
It’s an exciting time both for the Apache Arrow community as well as GPGPU developers who might be interested in powering up their feature engineering and data preparation with GPU-enabled libraries.
We’re grateful for NVIDIA’s support of Ursa Labs and look forward to growing a larger team and continuing to make progress on the tech roadmap.
If you or your company would be interested in becoming a sponsor also, please get in touch with us at email@example.com.