NVIDIA has this week announced the availability of cuTENSOR v1.4, which now supports up to 64-dimensional tensors and distributed multi-GPU tensor operations, and includes an improved tensor contraction performance model. The cuTENSOR v1.4 software is now available to download for free, allowing you to check out its capabilities. cuTENSOR is a high-performance CUDA library for tensor primitives. Its features include extensive mixed-precision support (FP64 inputs with FP32 compute, and FP32 inputs with FP16, BF16, or TF32 compute), complex-times-real operations and conjugate (without transpose) support.
NVIDIA cuTENSOR v1.4 new features
“The cuTENSOR Library is a first-of-its-kind GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics.”
cuTENSOR v1.4 supports up to 64-dimensional tensors, arbitrary data layouts and trivially serializable data structures as well as offering support for various activation functions, arbitrary tensor permutations and conversion between different data types.
- Supports up to 64-dimensional tensors.
- Supports distributed, multi-GPU tensor operations.
- Improved tensor contraction performance model (i.e., algo CUTENSOR_ALGO_DEFAULT).
- Improved performance for tensor contractions that have an overall large contracted dimension (i.e., a parallel reduction was added).
- Improved performance for tensor contractions that have a tiny contracted dimension (<= 8).
- Improved performance for outer-product-like tensor contractions (e.g., C[a,b,c,d] = A[b,d] * B[a,c]).
- Additional bug fixes.
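To make the index notation in the list above more concrete, here is a minimal sketch in plain Python of what an outer-product-like contraction such as C[a,b,c,d] = A[b,d] * B[a,c] computes, next to a contraction that does have a contracted dimension. This is only an illustration of the arithmetic and not the cuTENSOR API: the function names outer_product_like and contract_over_k are hypothetical, and tensors are assumed to be stored as nested lists.

```python
from itertools import product


def outer_product_like(A, B):
    """Illustrative only: C[a,b,c,d] = A[b,d] * B[a,c].

    No index is shared between A and B, so nothing is summed;
    every element of C is a single product.
    """
    nb, nd = len(A), len(A[0])   # A is indexed as A[b][d]
    na, nc = len(B), len(B[0])   # B is indexed as B[a][c]
    C = [[[[0.0] * nd for _ in range(nc)] for _ in range(nb)] for _ in range(na)]
    for a, b, c, d in product(range(na), range(nb), range(nc), range(nd)):
        C[a][b][c][d] = A[b][d] * B[a][c]
    return C


def contract_over_k(A, B):
    """Illustrative only: C[m,n] = sum over k of A[m,k] * B[k,n].

    Here k is the contracted dimension: it appears in both inputs
    and is summed away, so it does not appear in the output.
    """
    nm, nk, nn = len(A), len(B), len(B[0])
    return [
        [sum(A[m][k] * B[k][n] for k in range(nk)) for n in range(nn)]
        for m in range(nm)
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]   # extents: b = 2, d = 2
    B = [[5, 6, 7]]        # extents: a = 1, c = 3
    C = outer_product_like(A, B)
    print(C[0][1][2][0])   # A[1][0] * B[0][2]
    print(contract_over_k([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In the first function the output has four modes and no summation, which is the case the release notes call outer-product-like. In the second, the length of k is the contracted dimension that the notes describe as either large or tiny (<= 8).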
Source: NVIDIA