HP Anyware PCoIP Session Metrics in the Amazon Cloud

HP Anyware, formerly known as Teradici, is a product that provides high-fidelity remote access to virtual workstations using the PCoIP protocol. It offers a secure, high-definition, and highly responsive computing experience on a remote desktop hosted either on-premises or in the cloud. It is very popular with media, entertainment, gaming, and engineering users who want a high-powered remote workstation for graphically demanding workloads.

Here at Nextira, we have helped many clients use HP Anyware in AWS. There are many infrastructure patterns that can be followed to implement it based on your needs, as this doc shows.

Once your AWS/HP Anyware infrastructure is ready and fully functional, your users start connecting to their workstations, and the support and troubleshooting phase begins. Graphical performance can be affected by the PCoIP server, the end user’s network connection, company firewalls or other intermediate devices, and load on the user’s end client. When someone is having a poor graphical experience, you need metrics to find out where the issue lies. HP offers some tools to get metrics from a PCoIP session, such as the PCoIP Session Statistics Viewer, but it can be challenging to pull those metrics into your existing observability systems and monitor them proactively.

In this post, we will explore a solution to programmatically push these metrics to AWS CloudWatch to empower your IT team to support your users.

Approach

The EC2 instances used as workstations run a PCoIP agent. HP Anyware has two agent types: a standard agent and a graphics agent that uses GPU acceleration. The agent periodically writes metrics to log files. Our strategy has three steps: use AWS SSM to configure the AWS CloudWatch agent on the workstations so that it pushes these logs to CloudWatch Logs; apply a CloudWatch Logs subscription filter that forwards specific log lines to an AWS Lambda function; and have that function publish the values to CloudWatch Metrics.

Diagram of the process: AWS Systems Manager installs and configures the CloudWatch agent on the workstation, the workstation pushes logs to CloudWatch Logs, filtered logs are sent to AWS Lambda, and the processed metrics are pushed to CloudWatch.

Implementation

CloudWatch Log Group

We will first create a log group in CloudWatch, where all the log data will be pushed:

cloudwatch.tf

				
resource "aws_cloudwatch_log_group" "pcoip_logs" {
  name              = "pcoip"
  retention_in_days = 14
  tags              = var.tags
}
				
			

CloudWatch Agent Configurations

The agent logs are located in /var/log/pcoip-agent/ (Linux) or C:\ProgramData\Teradici\PCoIPAgent\logs (Windows), so we need to install and configure the CloudWatch agent to push logs from there. The configurations are templates that Terraform renders and stores in AWS SSM parameters:

cloudwatch/config_windows.json

				
{
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "C:\\ProgramData\\Teradici\\PCoIPAgent\\logs\\*Printing*",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-printing-service",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    },
                    {
                        "file_path": "C:\\ProgramData\\Teradici\\PCoIPAgent\\logs\\*pcoip_agent*",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-agent",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    },
                    {
                        "file_path": "C:\\ProgramData\\Teradici\\PCoIPAgent\\logs\\*pcoip_vhid*",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-vhid",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    },
                    {
                        "file_path": "C:\\ProgramData\\Teradici\\PCoIPAgent\\logs\\*pcoip_server*",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-server",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    }
                ]
            }
        },
        "log_stream_name": "default_log_stream"
    }
}
				
			

cloudwatch/config_linux.json

				
{
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/pcoip-agent/agent.log",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-agent",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    },
                    {
                        "file_path": "/var/log/pcoip-agent/session-launcher.log",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-launcher",
                        "timestamp_format": "%Y-%m-%dT%H:%M:%S",
                        "timezone": "UTC"
                    }
                ]
            }
        },
        "log_stream_name": "default_log_stream"
    }
}
				
			

ssm.tf

				
resource "aws_ssm_parameter" "cloudwatch_config_linux" {
  name = "/cw/workstations/linux"
  type = "String"
  tags = var.tags

  value = replace(templatefile("${path.module}/cloudwatch/config_linux.json", {
    log_group = aws_cloudwatch_log_group.pcoip_logs.name
  }), "/\n| /", "")
}

resource "aws_ssm_parameter" "cloudwatch_config_windows" {
  name = "/cw/workstations/windows"
  type = "String"
  tags = var.tags

  value = replace(templatefile("${path.module}/cloudwatch/config_windows.json", {
    log_group = aws_cloudwatch_log_group.pcoip_logs.name
  }), "/\n| /", "")
}
				
			
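The `replace(...)` call above strips newlines and spaces from the rendered template, presumably to keep the stored parameter value compact (Standard-tier SSM parameters are limited to 4 KB). As a minimal sketch of the same idea outside Terraform, the Python below fills the `${log_group}` placeholder and re-serializes the JSON without whitespace. The shortened template and the `render_minified` helper are assumptions made for illustration; they are not part of the module.

```python
import json
from string import Template

# Shortened stand-in for cloudwatch/config_linux.json (one collect_list entry only).
CONFIG_TEMPLATE = """
{
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/pcoip-agent/agent.log",
                        "log_group_name": "${log_group}",
                        "log_stream_name": "{instance_id}-pcoip-agent"
                    }
                ]
            }
        }
    }
}
"""


def render_minified(template_text: str, **variables: str) -> str:
    """Fill ${...} placeholders, then re-serialize the JSON without whitespace."""
    rendered = Template(template_text).substitute(**variables)
    # json.loads also validates the rendered document before it is stored.
    return json.dumps(json.loads(rendered), separators=(",", ":"))


if __name__ == "__main__":
    minified = render_minified(CONFIG_TEMPLATE, log_group="pcoip")
    print(len(minified), minified)
```

Parsing and re-serializing only removes whitespace between tokens, so a value that contains spaces would survive intact, whereas a blanket space-stripping regex would alter it. The `{instance_id}` placeholder has no `$` prefix, so it passes through untouched for the CloudWatch agent to resolve.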

Workstations Provisioning

To automatically install and configure the CloudWatch agent on the workstations, we will use AWS Systems Manager (AWS SSM). We will create an AWS SSM document and an association between this document and our workstations.

Keep in mind the prerequisites the AWS SSM Agent needs in order to work on an EC2 instance.

We need to ensure that the SSM Agent is running on our workstations (it is included in new AWS-provided AMIs), that the instances have connectivity to the SSM service, and that the instance profile has a policy that allows the agent to work.
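
One way to check these prerequisites is to ask SSM which instances it currently sees as online. The sketch below assumes a boto3 SSM client is passed in and uses its `describe_instance_information` call; the `find_unmanaged` helper name and the instance IDs in the usage note are hypothetical.

```python
from typing import Iterable, Set


def find_unmanaged(ssm_client, instance_ids: Iterable[str]) -> Set[str]:
    """Return the instance IDs that SSM does not report as online."""
    online = set()
    kwargs = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page.get("InstanceInformationList", []):
            if info.get("PingStatus") == "Online":
                online.add(info["InstanceId"])
        # Follow pagination until the service stops returning a token.
        token = page.get("NextToken")
        if not token:
            break
        kwargs = {"NextToken": token}
    return set(instance_ids) - online
```

A call such as `find_unmanaged(boto3.client("ssm"), ["i-0123456789abcdef0"])` would return the workstations that still need attention, whether because the agent is not running, the network path is missing, or the instance profile lacks permissions.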

ssm/cloudwatch_agent_provisioning.yaml

				
schemaVersion: '2.2'
description: 'Install and run CW agent'
mainSteps:
- action: aws:configurePackage
  name: InstallCWAgent
  inputs:
    name: AmazonCloudWatchAgent
    action: Install
- action: aws:runDocument
  name: ConfigureCWAgentLinux
  precondition:
      StringEquals:
      - platformType
      - Linux
  inputs:
    documentType: SSMDocument
    documentPath: AmazonCloudWatch-ManageAgent
    documentParameters:
      action: configure
      optionalConfigurationSource: ssm
      optionalConfigurationLocation: "${linux_ssm_parameter}"
- action: aws:runDocument
  name: ConfigureCWAgentWindows
  precondition:
      StringEquals:
      - platformType
      - Windows
  inputs:
    documentType: SSMDocument
    documentPath: AmazonCloudWatch-ManageAgent
    documentParameters:
      action: configure
      optionalConfigurationSource: ssm
      optionalConfigurationLocation: "${windows_ssm_parameter}"
				
			

ssm.tf

				
resource "aws_ssm_document" "cloudwatch_agent" {
  name            = "install-cloudwatch-agent"
  document_type   = "Command"
  document_format = "YAML"
  tags            = var.tags

  content = templatefile("${path.module}/ssm/cloudwatch_agent_provisioning.yaml", {
    linux_ssm_parameter   = aws_ssm_parameter.cloudwatch_config_linux.name
    windows_ssm_parameter = aws_ssm_parameter.cloudwatch_config_windows.name
  })
}

resource "aws_ssm_association" "cloudwatch_agent" {
  name             = aws_ssm_document.cloudwatch_agent.id
  association_name = "install-cloudwatch-agent-in-workstations"

  targets {
    key    = "tag:${var.workstation_tag.key}"
    values = [var.workstation_tag.value]
  }
}
				
			

Filtering and Parsing Logs

Now that the workstation logs are in CloudWatch Logs, we can create a log subscription filter that sends specific log lines from the log group to a Lambda function for processing:

cloudwatch.tf

				
resource "aws_cloudwatch_log_subscription_filter" "pcoip_metrics" {
  depends_on = [aws_lambda_permission.allow_trigger_metrics_publisher]

  name            = "pcoip-metrics"
  log_group_name  = aws_cloudwatch_log_group.pcoip_logs.name
  filter_pattern  = "?MGMT_PCOIP_DATA ?VGMAC ?MGMT_IMG"
  destination_arn = module.metrics_publisher.lambda_function_arn
}

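# Region lookup used to build the CloudWatch Logs service principal below
data "aws_region" "current" {}

resource "aws_lambda_permission" "allow_trigger_metrics_publisher" {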
  action        = "lambda:InvokeFunction"
  function_name = module.metrics_publisher.lambda_function_arn
  principal     = "logs.${data.aws_region.current.name}.amazonaws.com"
  source_arn    = "${aws_cloudwatch_log_group.pcoip_logs.arn}:*"
}
				
			
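In the filter pattern above, each term is prefixed with `?`, which CloudWatch Logs treats as an OR: a log event is forwarded if it contains any one of the terms. The sketch below approximates that behaviour locally so you can check which log lines would reach the Lambda. It handles only patterns made of `?term` entries and uses a plain substring test, which may differ from the service in edge cases; both function names are hypothetical.

```python
from typing import List


def parse_or_terms(filter_pattern: str) -> List[str]:
    """Split a pattern made only of ?term entries into its terms."""
    terms = filter_pattern.split()
    if not terms or not all(t.startswith("?") and len(t) > 1 for t in terms):
        raise ValueError("this sketch only handles patterns of the form '?a ?b ...'")
    return [t[1:] for t in terms]


def would_forward(filter_pattern: str, message: str) -> bool:
    """Approximate the subscription filter: True if any term occurs in the message."""
    return any(term in message for term in parse_or_terms(filter_pattern))


if __name__ == "__main__":
    pattern = "?MGMT_PCOIP_DATA ?VGMAC ?MGMT_IMG"
    print(would_forward(pattern, "LVL:2 VGMAC :Stat frms"))
    print(would_forward(pattern, "unrelated agent message"))
```

Filtering at the subscription keeps the Lambda from being invoked for log lines that none of its regular expressions could match.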

lambda.tf

				
module "metrics_publisher" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "4.2.0"

  function_name                     = "pcoip-metrics-publisher"
  description                       = "Lambda to process PCoIP logs and push metrics to CloudWatch"
  handler                           = "main.lambda_handler"
  runtime                           = "python3.12"
  source_path                       = "${path.module}/lambda/metrics_publisher"
  artifacts_dir                     = "${path.module}/builds"
  publish                           = true
  recreate_missing_package          = true
  ignore_source_code_hash           = true
  attach_policy_statements          = true
  cloudwatch_logs_retention_in_days = 14
  tags                              = var.tags

  policy_statements = {
    cloudwatch = {
      effect    = "Allow"
      actions   = ["cloudwatch:PutMetricData"]
      resources = ["*"]
    }
  }

  environment_variables = {
    metrics_namespace = var.metrics_namespace
  }
}
				
			

lambda/metrics_publisher/main.py

				
import base64
import json
import zlib
import boto3
import re
import os
from datetime import datetime

NAMESPACE = os.environ.get("metrics_namespace")
cw_client = boto3.client("cloudwatch")

# Events based on https://help.teradici.com/s/article/1395
events_definitions = [
    {
        "event_pattern": r".*MGMT_PCOIP_DATA.*Tx thread info.*(?P<bw_group>bw limit\D*(?P<bw>[\d.]*))\W*(?P<avg_tx_group>avg tx\D*(?P<avg_tx>[\d.]*))\W*(?P<avg_rx_group>avg rx\D*(?P<avg_rx>[\d.]*)).*",
        "name": "Bandwidth metrics",
        "metrics": [
            {"Name": "PCoIPBandwidthLimit", "Unit": "Kilobytes/Second", "group": "bw"},
            {"Name": "PCoIPAvgTx", "Unit": "Kilobytes/Second", "group": "avg_tx"},
            {"Name": "PCoIPAvgRx", "Unit": "Kilobytes/Second", "group": "avg_rx"},
        ],
    },
    {
        "event_pattern": r".*MGMT_PCOIP_DATA.*Tx thread info.*(?P<rtt_group>round trip time\D*(?P<rtt>[\d.]*))\W*(?P<variance_group>variance\D*(?P<variance>[\d.]*))\W*(?P<rto_group>rto = (?P<rto>[\d.]*))\W*(?P<last_group>last\D*(?P<last>[\d.]*))\W*(?P<max_group>max\D*(?P<max>[\d.]*)).*",
        "name": "Latency metrics",
        "metrics": [
            {"Name": "PCoIPRoundTripTime", "Unit": "Milliseconds", "group": "rtt"},
            {"Name": "PCoIPVariance", "Unit": "Milliseconds", "group": "variance"},
        ],
    },
    {
        "event_pattern": r".*VGMAC.*Stat frms\W*(?P<R>R\D*(?P<r_a>[\d.]*)/(?P<r_i>[\d.]*)/(?P<r_o>[\d.]*))\W*(?P<T>T\D*(?P<t_a>[\d.]*)/(?P<t_i>[\d.]*)/(?P<t_o>[\d.]*)).*(?P<loss>Loss\D*(?P<r_loss>[\d.]*)%/(?P<t_loss>[\d.]*)%).*",
        "name": "Packet loss metrics",
        "metrics": [
            {"Name": "PCoIPPacketLossR", "Unit": "Percent", "group": "r_loss"},
            {"Name": "PCoIPPacketLossT", "Unit": "Percent", "group": "t_loss"},
        ],
    },
    {
        "event_pattern": r".*MGMT_PCOIP_DATA.*ubs-BW-decr\W*(?P<decrease_loss_group>Decrease\D*(?P<decrease_loss>[\d.]*))\W*(?P<current_group>current\D*(?P<current>[\d.]*))\W*(?P<active_group>active\D*(?P<active_from>[\d.]*)\D*(?P<active_to>[\d.]*))\W*(?P<adjust_factor_group>adjust factor\D*(?P<adjust_factor>[\d.]*)%)\W*(?P<floor_group>floor\D*(?P<floor>[\d.]*))\W*",
        "name": "Floor metrics",
        "metrics": [],
    },
    {
        "event_pattern": r".*MGMT_IMG.*log \(SoftIPC\).*(?P<tbl_group>tbl\W*(?P<tbl>[\d.]*))\W*(?P<fps_group>fps\W*(?P<fps>[\d.]*))\W*(?P<q_group>quality\W*(?P<quality>[\d.]*)).*",
        "name": "Image metrics",
        "metrics": [
            {"Name": "PCoIPQuality", "Unit": "Percent", "group": "quality"},
            {"Name": "PCoIPFPS", "Unit": "Count", "group": "fps"},
            {"Name": "PCoIPTBL", "Unit": "Count", "group": "tbl"},
        ],
    },
    {
        "event_pattern": r".*MGMT_IMG.*log \(SoftIPC\).*(?P<group1>bits\/pixel\W*(?P<bits_pixel>[\d.]*))\W*(?P<group2>bits\/sec\W*(?P<bits_sec>[\d.]*))\W*(?P<group3>MPix\/sec\W*(?P<mpix_sec>[\d.]*)).*",
        "name": "Image metrics",
        "metrics": [
            {"Name": "PCoIPBitsPerPixel", "Unit": "Count", "group": "bits_pixel"},
            {"Name": "PCoIPBitsPerSec", "Unit": "Count", "group": "bits_sec"},
            {"Name": "PCoIPMpixPerSec", "Unit": "Count", "group": "mpix_sec"},
        ],
    },
]


def decode_event(event):
    decoded_event = base64.b64decode(event)
    decoded_event = zlib.decompress(decoded_event, 16 + zlib.MAX_WBITS).decode("utf-8")
    decoded_event = json.loads(decoded_event)

    return decoded_event


def convert_to_number(x):
    # Prefer int; fall back to float for values such as "66.5"
    try:
        return int(x)
    except ValueError:
        return float(x)


def handle_event(event, instance_id, match, timestamp):
    # Publish one data point per metric defined for the matched event
    for metric in event["metrics"]:

        cw_client.put_metric_data(
            Namespace=NAMESPACE,
            MetricData=[
                {
                    "MetricName": metric["Name"],
                    "Dimensions": [
                        {"Name": "InstanceId", "Value": instance_id},
                    ],
                    "Value": convert_to_number(match.group(metric["group"])),
                    "Unit": metric["Unit"],
                    "Timestamp": timestamp,
                },
            ],
        )


def lambda_handler(event, context):
    # Decode message sent by CloudWatch
    decoded_event = decode_event(event["awslogs"]["data"])
    print(decoded_event)

    # Get relevant data
    log_stream = decoded_event["logStream"]
    log_events = decoded_event["logEvents"]
    instance_id = re.search("i-[a-zA-Z0-9]{17}", log_stream).group(0)

    # Iterate over all messages
    for log_event in log_events:
        message = log_event["message"]
        timestamp = datetime.utcfromtimestamp(
            int(log_event["timestamp"]) / 1000
        ).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

        # Iterate over messages patterns
        for event_definition in events_definitions:

            # Find a match with any of the regex definitions
            match = re.search(event_definition["event_pattern"], message)
            if match:
                print(
                    "Event matched!",
                    {"event_type": event_definition["name"], "message": message},
                )

                # Push metric
                handle_event(event_definition, instance_id, match, timestamp)
                break
				
			
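To try the decoding and parsing steps without deploying anything, you can build a synthetic subscription payload locally. The sketch below wraps a message in a reduced version of the envelope that CloudWatch Logs sends to Lambda (real payloads carry additional fields), decodes it with the same steps as the function above, and applies the bandwidth pattern. The sample log line is hypothetical: it is shaped to fit the pattern and is not copied from a real agent log. The instance ID and the `build_awslogs_event` helper are placeholders as well.

```python
import base64
import gzip
import json
import re
import zlib

# Bandwidth pattern from events_definitions, written as a raw string.
BANDWIDTH_PATTERN = re.compile(
    r".*MGMT_PCOIP_DATA.*Tx thread info.*"
    r"(?P<bw_group>bw limit\D*(?P<bw>[\d.]*))\W*"
    r"(?P<avg_tx_group>avg tx\D*(?P<avg_tx>[\d.]*))\W*"
    r"(?P<avg_rx_group>avg rx\D*(?P<avg_rx>[\d.]*)).*"
)

# Hypothetical log line, shaped to fit the pattern; not taken from a real log.
SAMPLE_MESSAGE = (
    "MGMT_PCOIP_DATA :Tx thread info: "
    "bw limit = 900, avg tx = 66.5, avg rx = 4.3 (KBytes/s)"
)


def build_awslogs_event(log_stream, messages, timestamp_ms=0):
    """Wrap messages in a reduced CloudWatch Logs subscription envelope."""
    payload = {
        "logStream": log_stream,
        "logEvents": [{"timestamp": timestamp_ms, "message": m} for m in messages],
    }
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


def decode_event(data):
    """Same decoding steps as the Lambda: base64, gzip, JSON."""
    raw = zlib.decompress(base64.b64decode(data), 16 + zlib.MAX_WBITS)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    event = build_awslogs_event("i-0123456789abcdef0-pcoip-server", [SAMPLE_MESSAGE])
    decoded = decode_event(event["awslogs"]["data"])
    match = BANDWIDTH_PATTERN.search(decoded["logEvents"][0]["message"])
    print(match.group("bw"), match.group("avg_tx"), match.group("avg_rx"))
```

Before relying on any pattern, run it against lines taken from your own agent logs, since the log format can vary between agent versions.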

Results

These are some metrics extracted from a workstation running an active PCoIP session:

TX and RX

PCoIPAvgRx and PCoIPAvgTx results extracted from a workstation running an active PCoIP session.

Round Trip Time

PCoIPRoundTripTime and PCoIPVariance results extracted from a workstation running an active PCoIP session.

Packet Loss

PCoIPPacketLossR and PCoIPPacketLossT results extracted from a workstation running an active PCoIP session.
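
The same series can also be read back programmatically. As a sketch, the helper below builds the keyword arguments for the CloudWatch `get_metric_data` call, using the `InstanceId` dimension that the Lambda publishes. The `build_metric_query` name, the one-hour window, and the 60-second period are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(namespace, instance_id, metric_names, hours=1, period=60):
    """Build keyword arguments for CloudWatch get_metric_data."""
    end = datetime.now(timezone.utc)
    queries = []
    for index, name in enumerate(metric_names):
        queries.append(
            {
                # Query IDs must start with a lowercase letter.
                "Id": f"m{index}",
                "Label": name,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": name,
                        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                    },
                    "Period": period,
                    "Stat": "Average",
                },
            }
        )
    return {
        "MetricDataQueries": queries,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }
```

With boto3 available, `boto3.client("cloudwatch").get_metric_data(**build_metric_query("pcoip", "i-0123456789abcdef0", ["PCoIPRoundTripTime", "PCoIPVariance"]))` would return the averaged round trip time and variance for that workstation, assuming the namespace was set to `pcoip`.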

TL;DR

PCoIP session metrics can be extracted programmatically from the workstations’ PCoIP agent logs and pushed to CloudWatch using AWS Systems Manager, AWS CloudWatch and AWS Lambda.

AWS Systems Manager is used to install and configure the CloudWatch agent in the workstations, and to store CloudWatch configurations in AWS SSM parameters.

The AWS CloudWatch agent pushes logs from the workstations to AWS CloudWatch Logs; a CloudWatch Logs subscription filter selects the relevant log lines and invokes the processing Lambda; AWS CloudWatch Metrics stores the results.

AWS Lambda is used to process filtered logs from AWS CloudWatch Logs and extract metrics from there.

Reference

The code presented in this post is available as a Terraform module.

You can deploy it in your account using:

				
module "pcoip_metrics" {
  source = "git@github.com:SixNines/hp-anywhere-session-metrics"

  metrics_namespace = "pcoip"
  
  workstation_tag = {
    key   = "type"
    value = "workstation"
  }
}
				
			
