February 26, 2025
Accelerating Generative AI with PyTorch: Segment Anything 2 - Fast and furious inference with low latency and fast cold starts
This post is a follow-up to our first entry in the multi-part series on accelerating generative AI models with pure, native PyTorch, this time with a focus on latency and elastic scalability. We use torch.compile and torch.export to create highly optimized, low-latency versions of SAM2 that can be scaled up quickly on new instances.
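For readers curious what that looks like in code, here is a minimal sketch of the two APIs named above, using a tiny stand-in module rather than the actual SAM2 model (the full pipeline in the post is more involved):

```python
import torch
import torch.nn as nn

# Tiny stand-in module; the real post applies these steps to the SAM2 model itself.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()

# torch.compile JIT-optimizes the model for low-latency inference.
compiled = torch.compile(model, mode="max-autotune")

# torch.export captures an ahead-of-time graph that can be saved and
# reloaded on a fresh instance, which is what enables fast cold starts.
example_input = (torch.randn(1, 3, 64, 64),)
exported = torch.export.export(model, example_input)
torch.export.save(exported, "model_exported.pt2")
```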
February 11, 2025
Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms
PyTorch 2.6 has just been released with a set of exciting new features, including torch.compile compatibility with Python 3.13, new security and performance enhancements, and a change to the default value of torch.load's weights_only parameter (now True). PyTorch also announced the deprecation of its official Anaconda channel.
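As a quick illustration of the torch.load change (a sketch, not code from the post): with the new default, loading restricts unpickling to tensors and primitive containers, and trusted checkpoints containing arbitrary Python objects need an explicit opt-out.

```python
import torch

# Save a plain state_dict-style checkpoint.
torch.save({"weight": torch.randn(4, 4)}, "checkpoint.pt")

# In PyTorch 2.6 this defaults to weights_only=True, which only unpickles
# tensors and primitive containers (safer for untrusted files).
state = torch.load("checkpoint.pt")

# For trusted checkpoints holding arbitrary Python objects, opt back in
# to full unpickling explicitly.
full = torch.load("checkpoint.pt", weights_only=False)
```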
February 05, 2025
Enabling advanced GPU features in PyTorch - Warp Specialization
Meta: Hongtao Yu, Manman Ren, Bert Maher, Shane Nay; NVIDIA: Gustav Zhu, Shuhao Jiang
January 29, 2025
PyTorch 2.6 Release Blog
We are excited to announce the release of PyTorch® 2.6 (release notes)! This release features multiple improvements for PT2: torch.compile can now be used with Python 3.13; a new performance-related knob, torch.compiler.set_stance; and several AOTInductor enhancements. Besides the PT2 improvements, another highlight is FP16 support on x86 CPUs.
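To give a flavor of the new knob, here is a minimal sketch based on the documented stance names (not code from the release post): set_stance changes how torch.compile-d functions behave globally, for example forcing eager execution while debugging.

```python
import torch

@torch.compile
def fn(x):
    return torch.sin(x) + torch.cos(x)

# Normal compiled execution.
y = fn(torch.randn(8))

# Force compiled functions to run eagerly (handy for debugging),
# then restore the default compile behavior.
torch.compiler.set_stance("force_eager")
y_eager = fn(torch.randn(8))
torch.compiler.set_stance("default")
```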
January 24, 2025
How Intel Uses PyTorch to Empower Generative AI through Intel Arc GPUs
Intel has long been at the forefront of technological innovation, and its recent venture into Generative AI (GenAI) solutions is no exception. With the rise of AI-powered gaming experiences, Intel sought to deliver an accessible and intuitive GenAI inferencing solution tailored for AI PCs powered by Intel’s latest GPUs. By leveraging PyTorch as the backbone of its development efforts, Intel successfully launched AI Playground, an open source application that showcases advanced GenAI workloads.
January 21, 2025
Accelerating LLM Inference with GemLite, TorchAO and SGLang
Large Language Models (LLMs) are typically very resource-intensive, requiring significant amounts of memory, compute, and power to operate effectively. Quantization provides a solution by reducing weights and activations from 16-bit floats to lower bit widths (e.g., 8-bit, 4-bit, 2-bit), achieving significant speedups and memory savings while also enabling larger batch sizes.
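As a rough sketch of the idea (assuming TorchAO's weight-only quantization API on a toy model, not the post's GemLite kernels or SGLang integration): the 16-bit weights of a 7B-parameter model take roughly 14 GB, while 4-bit weights take roughly 3.5 GB, about a 4x reduction.

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Back-of-the-envelope memory math for a 7B-parameter model:
#   16-bit weights: 7e9 params * 2.0 bytes ~= 14.0 GB
#    4-bit weights: 7e9 params * 0.5 bytes ~=  3.5 GB  (about 4x smaller)

# Toy model quantized with TorchAO's int8 weight-only config; the post
# combines TorchAO with GemLite kernels and SGLang for end-to-end serving.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
quantize_(model, int8_weight_only())  # replaces Linear weights with int8 versions

with torch.no_grad():
    y = model(torch.randn(1, 4096))
```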