Balancing Performance, Cost and Efficiency: A Technical Analysis of AWS g4dn and g4ad GPU Instances

As reliance on cloud technology intensifies, the demand for reliable, high-performance and cost-efficient GPU-accelerated cloud instances grows. Among the numerous service providers, Amazon Web Services (AWS) sets itself apart by offering an extensive array of GPU instances designed to accommodate various computational needs. The AWS G4ad and G4dn instances are notable for their distinct blends of attributes, performance and cost efficiency.

In this analysis, the focus will be on the hardware specifications, performance capabilities and cost-effectiveness of these two instance types, comparing them across both DirectX 11 and DirectX 12 workloads. The intention is to present the distinct benefits of each instance type, facilitating an informed decision-making process based on specific requirements. G4ad.4xlarge and G4dn.4xlarge will be used for this case study, though the methodology could be applied to any instance size.

Deep-Dive into Hardware Architecture

The choice of GPUs in these instances differs significantly. The G4ad instance incorporates AMD Radeon Pro V520 GPUs, recognized for their ability to manage graphics-intensive applications and high-quality gaming. On the other hand, the G4dn instance deploys NVIDIA Tesla T4 GPUs, appreciated for their AI (Artificial Intelligence), ML (Machine Learning) and data analytics processing prowess.

In the context of CPUs, the G4ad instance employs AMD EPYC 7002 series processors, known for their solid performance and efficiency. The G4dn instance, conversely, operates with Intel Xeon Scalable processors, which offer reliable performance across numerous workloads and present the advantage of specific optimizations aligned with Intel’s architecture.

Comparative Analysis - AWS G4ad vs. G4dn

A comparison of the G4ad and G4dn instances’ hardware reveals both similarities and differences.

At the 4xlarge tier, the G4ad and G4dn instances are equipped with 16 virtual CPUs (vCPUs) and one GPU, offering strong computational capabilities for various workloads. However, a difference is observed in the GPU memory: the G4dn instance comes with 16 GiB of GPU memory, while the G4ad instance is outfitted with 8 GiB.

Here is a comparison of the on-demand costs of the .4xlarge instances used in this experiment:

  • g4dn.4xlarge: $1.204 per hour
  • g4ad.4xlarge: $0.867 per hour
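The hourly gap compounds quickly over sustained use. A quick sketch of the monthly difference, assuming a single always-on instance and AWS's usual 730-hours-per-month approximation:

```python
# Published on-demand rates for the two 4xlarge instances (USD/hour)
G4DN_HOURLY = 1.204
G4AD_HOURLY = 0.867

HOURS_PER_MONTH = 730  # AWS's standard monthly-hour approximation

g4dn_monthly = G4DN_HOURLY * HOURS_PER_MONTH
g4ad_monthly = G4AD_HOURLY * HOURS_PER_MONTH
savings = 1 - G4AD_HOURLY / G4DN_HOURLY  # fractional saving from choosing g4ad

print(f"g4dn.4xlarge: ${g4dn_monthly:.2f}/month")
print(f"g4ad.4xlarge: ${g4ad_monthly:.2f}/month")
print(f"g4ad saves roughly {savings:.0%} on compute")
```

At these rates the g4ad works out to roughly 28% cheaper before any performance differences are considered.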

AWS G4ad Instance

  • CPU: The AMD EPYC 7002 series processor powers the G4ad instance. This CPU uses the efficient Zen 2 architecture, featuring a 7-nanometer process technology and support for up to 64 cores per socket.
  • GPU: The AMD Radeon Pro V520 GPU, with its RDNA architecture, is the graphics powerhouse in the G4ad. Designed for high-performance tasks, it optimizes performance per watt and clock speed.

AWS G4dn Instance

  • CPU: G4dn instances use Intel Xeon Scalable processors based on the Cascade Lake architecture, mitigating hardware vulnerabilities and supporting advanced features like Intel Optane DC Persistent Memory.
  • GPU: NVIDIA’s Turing-based Tesla T4 GPU is utilized in G4dn instances, designed to handle diverse computational tasks including AI, machine learning and gaming.

Performance and Cost-effectiveness

Although both instances have their unique strengths, the ultimate decision depends on the specific workload requirements and budget constraints.

When it comes to graphic-intensive tasks, the G4ad instance, with its AMD Radeon Pro V520 GPUs, tends to have an edge due to the GPU’s focus on high-quality graphics rendering. It is also worth noting that the G4ad instance is more cost-effective, making it an attractive choice for businesses looking to manage their cloud costs without sacrificing performance.

DirectX 11 vs DirectX 12

DirectX 11 and DirectX 12, both products of Microsoft, are graphics APIs with differing application scopes and performance characteristics.

DirectX 11 is a high-level API introduced in 2009, known for its extensive backward compatibility, making it a go-to for developers targeting a wide audience, including those utilizing older hardware. It operates using a hardware abstraction layer, simplifying development but potentially curtailing the full exploitation of hardware capabilities.

DirectX 12, unveiled in 2015, operates as a lower-level API. It gives developers a direct hardware pathway, enabling greater performance optimization but demanding a thorough comprehension of hardware architecture. Although this gives scope for better performance, it adds to the complexity of development.

DirectX 11 provides broad reach and easy development, while DirectX 12 offers the promise of superior performance, albeit at the expense of increased complexity. Your choice between the two hinges on the specific needs and objectives of your project.


Methodology

Our methodology for this case study prioritized a simplified setup to ensure the highest degree of parity between the G4ad and G4dn instances. The key differences lay in their hardware configurations and device drivers.

After establishing the basic AWS instances, we proceeded with installing the most recent drivers, guided by the official AWS Driver Installation Guide.

Subsequently, we installed 3DMark from UL and Heaven Benchmark from Unigine to prepare for our benchmarks. We ran both DirectX 11 and DirectX 12 Time Spy tests across these instances.

The final stage involved setting up and running tests on Nextira’s Studio in the Cloud workstations. These tests were centered around the Unreal Engine (UE) editor and the construction of a sample 3D third-person player game.

To better gauge the load on each instance under varying usage scenarios, we conducted tests with one, three and five instances of the game running simultaneously without playing.
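Capturing utilization figures like those reported below means polling per-sample GPU metrics while the game instances run. On the NVIDIA-backed G4dn this can be scripted against nvidia-smi's CSV query output; a minimal sketch (the query fields shown are standard nvidia-smi options, but the sampling approach is illustrative, not the exact tooling used in this study, and the AMD-backed G4ad would need different tooling):

```python
import subprocess

# Standard nvidia-smi CSV query: GPU utilization (%) and memory used (MiB)
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]

def parse_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line from nvidia-smi into (gpu_util_pct, mem_used_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)

def poll_once() -> tuple[int, int]:
    """Take a single reading (requires an NVIDIA driver on the instance)."""
    out = subprocess.check_output(QUERY, text=True)
    return parse_sample(out.splitlines()[0])

# Example: a reading of 46% utilization with 6553 MiB in use parses to:
print(parse_sample("46, 6553"))  # (46, 6553)
```

Calling `poll_once()` in a loop at a fixed interval during each one-, three- and five-instance run yields a utilization trace that can be averaged per workload.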

This methodical approach, while maintaining simplicity, allowed us to generate an accurate comparison of the performance and efficiency of the G4ad and G4dn instances under different workloads.

A note on interpretation: the separate tests produce different output metrics, so each test is compared individually, and an overall comparison is drawn at the end from the combined results.

Benchmarks and Performance Analysis

To better compare these two instance types, we put them through a series of workloads and benchmarks. Here is what we found:

Unigine Heaven Benchmark with DirectX 11:

| Instance | Avg. Frames Per Second (FPS) | Score | Min FPS | Max FPS |
|---|---|---|---|---|
| G4ad | 100.8 | 2540 | 9.1 | 223.3 |
| G4dn | 98.2 | 2473 | 24.3 | 214.3 |

G4ad Instance

The G4ad instance, with its AMD Radeon Pro V520 GPU, manages an average FPS of 100.8, which is slightly higher than its G4dn counterpart. This results in an overall score of 2540. The FPS range extends from 9.1 to 223.3. The AMD Radeon Pro V520 GPU in the G4ad instances is known for its dedicated support and optimization for DirectX 11, which could account for its slightly higher average FPS and overall score. However, it exhibits a broader range of minimum and maximum FPS, suggesting variability or inconsistency in performance.

G4dn Instance

The G4dn instance, powered by the NVIDIA Tesla T4 GPU, achieves a slightly lower average FPS of 98.2, resulting in a total score of 2473. The FPS range for G4dn is slightly less varied, ranging from 24.3 to 214.3. On the other hand, the NVIDIA Tesla T4 GPU in the G4dn instances, while it has strong AI and ML capabilities, might not be as optimized for DirectX 11, resulting in a slightly lower average FPS and overall score. Nonetheless, it shows a tighter range of minimum and maximum FPS, implying a more consistent performance.

Nextira’s Studio in the Cloud Running Unreal Engine Sample Project DirectX 12 Workload:

| Instance | Sample Workloads | GPU Utilization | Dedicated GPU Memory Use | Shared GPU Memory Use | GPU Temperature | Average Frame Rate |
|---|---|---|---|---|---|---|
| G4ad | 1 | 98% | 2.8 GB | 0.1 GB | 58°C | 32 FPS |
| G4ad | 3 | 98% | 7.6 GB | 0.2 GB | 58°C | 34 FPS |
| G4ad | 5 | 98% | 7.7 GB | 4.7 GB | 58°C | 35 FPS |
| G4dn | 1 | 20% | 2.3 GB | 0.2 GB | 46–47°C | 41 FPS |
| G4dn | 3 | 46% | 6.4 GB | 0.4 GB | 46–47°C | 39 FPS |
| G4dn | 5 | 46% | 10.5 GB | 0.6 GB | 46–47°C | 35 FPS |

G4ad Instances

For the G4ad instances, GPU utilization remains consistently high at 98% across all workloads, indicating the GPU’s heavy involvement in the tasks. In terms of dedicated GPU memory use, there is a marked increase from 2.8 GB at workload 1 to 7.6 GB and 7.7 GB at workloads 3 and 5, respectively. This suggests more complex tasks, or tasks requiring more GPU memory as the workload increases.

It is also worth noting that shared GPU memory use increases significantly from workload 3 to workload 5 (0.2 GB to 4.7 GB), suggesting more shared memory-intensive tasks in workload 5. The GPU temperature remains constant at 58°C and there is a minor increase in average frame rate from 32 FPS at workload 1 to 35 FPS at workload 5.

G4dn Instances

For the G4dn instances, GPU utilization is much lower compared to G4ad, standing at 20% for workload 1 and 46% for workloads 3 and 5. This indicates that these workloads on G4dn may not be as GPU intensive, or the instance may have more efficient utilization of GPU resources.

The dedicated GPU memory usage for G4dn also increases as the workload number increases: 2.3 GB at workload 1, 6.4 GB at workload 3 and peaking at 10.5 GB at workload 5. Notably, G4dn seems to handle larger memory usage more efficiently than G4ad, given its lower GPU utilization rates.

The shared GPU memory use also increases as workloads progress but at a much smaller scale than G4ad, ranging from 0.2 GB to 0.6 GB. The GPU temperature for G4dn is lower than G4ad, standing at 46°C and 47°C. Lastly, the average frame rate begins at a higher 41 FPS at workload 1, dips to 39 FPS at workload 3 and matches G4ad’s 35 FPS at workload 5.

From these observations, G4dn instances manage GPU resources more efficiently across all workloads when compared to G4ad. However, the suitability of either would depend on the specific needs of the tasks at hand.

3DMark Time Spy with DirectX 12:

| Instance | Graphics Test 1 | Graphics Test 2 | Graphics Score | GPU Utilization | CPU Test | CPU Score | CPU Utilization |
|---|---|---|---|---|---|---|---|
| G4ad | 49.23 fps | 39.38 fps | 5384 | 99% | 56.3 fps | 7949 | 18% |
| G4dn | 56.46 fps | 38.68 fps | 7525 | 48% | 48.44 fps | 3231 | 48% |

G4ad Instance

In the graphics tests, the G4ad instance demonstrates slightly higher performance in Graphics Test 1 (49.23 fps) compared to Graphics Test 2 (39.38 fps). This results in a graphics score of 5384. The GPU utilization is extremely high at 99%, indicating that the GPU is fully engaged during these tests.

As for the CPU test, it produces a high frame rate of 56.3 fps, leading to a substantial CPU score of 7949. Despite this high performance, the CPU utilization stands at just 18%, indicating that the CPU has substantial resources left untapped during the test.

G4dn Instance

The G4dn instance shows stronger performance in Graphics Test 1 (56.46 fps) than in Graphics Test 2 (38.68 fps). These results lead to an overall higher graphics score of 7525 compared to G4ad. Interestingly, despite its higher score, the GPU utilization is only 48%, suggesting a more efficient GPU utilization or a more powerful GPU compared to G4ad.

In the CPU test, the G4dn instance demonstrates a frame rate of 48.44 fps, lower than G4ad, leading to a significantly lower CPU score of 3231. This lower performance corresponds with a higher CPU utilization of 48%, indicating that despite using more CPU resources, the G4dn does not perform as well in CPU tasks as the G4ad.
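One way to make the "efficiency" observation concrete is to normalize each score by the utilization it cost. Using the Time Spy numbers above, a rough derived metric (points per percentage point of utilization, our own calculation, not a 3DMark output):

```python
# (score, utilization %) pairs taken from the Time Spy results above
results = {
    "G4ad": {"graphics": (5384, 99), "cpu": (7949, 18)},
    "G4dn": {"graphics": (7525, 48), "cpu": (3231, 48)},
}

for instance, tests in results.items():
    for test, (score, util) in tests.items():
        print(f"{instance} {test}: {score / util:.1f} points per % utilization")
```

By this measure the G4dn extracts far more graphics score per unit of GPU load (about 156.8 vs 54.4 points per percent), while the G4ad's CPU is the clear standout on the CPU side.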

Making the Most of Your Investment

When it comes to getting the most bang for your buck, the G4ad shines. Using the Unigine Heaven Benchmark's average FPS as the performance metric, the G4ad provides the more cost-effective solution.

The G4ad also demonstrated heavy use of its available resources, sustaining high GPU utilization on GPU-bound tasks. But remember, the best practice is to run your own benchmarks using workloads representative of your applications to make the optimal choice.
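That cost-effectiveness claim can be checked directly from the numbers above: dividing each instance's Unigine Heaven average FPS by its on-demand hourly rate gives a rough frames-per-dollar figure.

```python
# Average FPS from the Heaven benchmark and on-demand hourly price (USD)
instances = {
    "g4ad.4xlarge": {"avg_fps": 100.8, "price": 0.867},
    "g4dn.4xlarge": {"avg_fps": 98.2, "price": 1.204},
}

for name, data in instances.items():
    fps_per_dollar = data["avg_fps"] / data["price"]
    print(f"{name}: {fps_per_dollar:.1f} avg FPS per $/hour")
```

The g4ad comes out ahead at roughly 116 FPS per dollar-hour versus about 82 for the g4dn, a gap driven far more by price than by raw frame rate.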

The AMD EPYC 7002 series processor, which powers the AWS G4ad instance, shows particular strength in gaming applications with complex multithreading and compute demands. Its robust performance stands out in scenarios where complex computations are needed, such as real-time strategy and massively multiplayer online games. These game types demand significant CPU resources to process AI, game logic and pathfinding, and to manage extensive game worlds with multiple concurrent players. The G4ad instance, with its superior CPU performance in our Time Spy results, handles these computational requirements efficiently, resulting in smoother gameplay and an improved user experience.

Optimizing Workloads for Cloud Instances

To maximize the performance and cost-effectiveness of an accelerated compute instance, consider the following strategies:

  1. Utilize GPU-specific features: Each GPU has unique features and capabilities that can enhance your workload. Make use of these features and associated open-source libraries and tools to optimize your tasks.
  2. Optimize shaders and compute kernels: Shaders and compute kernels play a pivotal role in GPU workloads. Tailor these elements to the specific architecture and capabilities of the GPU to optimize their performance.
  3. Leverage GPU libraries and Software Development Kits (SDKs): GPU manufacturers provide libraries and SDKs designed for their hardware. Using these resources can streamline your development process, offer performance optimizations and enable access to unique features.
  4. Use GPU profiling tools: Profiling tools can identify performance bottlenecks, understand GPU utilization and guide optimization efforts.
  5. Engage with developer resources: GPU manufacturers offer forums, documentation and developer support. Engaging with these resources and the wider developer community can provide valuable insights and best practices.

Wrapping Up

Deciding between the G4ad and G4dn AWS instances depends on your specific workloads and their requirements, but this comparison should provide a solid starting point for your decision-making process. Both instances offer impressive capabilities: the G4ad's strong performance in several workloads and its cost-effectiveness make it an enticing choice for many organizations, while the G4dn's higher DirectX 12 graphics score, larger GPU memory and flexible graphics and compute use cases make it the quicker choice for those prioritizing higher resolution and consistency. Remember, conducting your own benchmarks will always provide the most relevant insights for your specific needs.
