Add support for configuring EFA receiver on EC2 via JSON#2093
mitali-salvi wants to merge 12 commits into main
Add a new translator that converts CloudWatch Agent JSON config to OTel YAML config for the awsefareceiver. Customers can configure EFA metrics collection under metrics.metrics_collected.efa with optional measurement filtering and collection interval overrides.

The translator follows the existing awsnvme translator pattern:
- Reads from metrics::metrics_collected::efa config path
- Supports collection_interval with agent-level fallback (default 60s)
- Supports measurement filtering with efa_ prefix auto-detection
- Validates metric names against the 22 known EFA metrics
- Disables unselected metrics when measurement list is specified

Assisted-by: Kiro CLI
Register the EFA receiver in the host delta metrics pipeline alongside diskio and net receivers. Add EFA as a recognized native OTel receiver in the adapter layer and include efaKey in the cumulativetodelta processor's default keys so EFA alone satisfies the delta pipeline requirement. Assisted-by: Kiro CLI
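For readers unfamiliar with why EFA metrics go through the delta pipeline, the sketch below illustrates the general idea of cumulative-to-delta conversion for a monotonic counter. It is not the cumulativetodelta processor's implementation; the `deltaTracker` type and its reset handling (re-baseline without emitting a point) are assumptions made for illustration.

```go
package main

import "fmt"

// deltaTracker remembers the last cumulative value seen for each series key
// (for example, metric name plus device). Hypothetical type for illustration.
type deltaTracker struct {
	last map[string]uint64
}

func newDeltaTracker() *deltaTracker {
	return &deltaTracker{last: make(map[string]uint64)}
}

// observe records a cumulative sample and returns the delta since the
// previous sample. The second return value is false when no delta can be
// produced: on the first sample for a key, or when the counter decreased
// (treated here as a reset, so the new value only becomes the baseline).
func (t *deltaTracker) observe(key string, cumulative uint64) (uint64, bool) {
	prev, seen := t.last[key]
	t.last[key] = cumulative
	if !seen || cumulative < prev {
		return 0, false
	}
	return cumulative - prev, true
}

func main() {
	tr := newDeltaTracker()
	for _, v := range []uint64{1000, 1500, 1200, 1300} {
		d, ok := tr.observe("efa_tx_bytes", v)
		fmt.Println(v, d, ok)
	}
}
```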
Add efaDefinitions to the JSON schema under metrics_collected. Unlike other receivers, measurement is optional for EFA - an empty config enables all 22 metrics by default. Assisted-by: Kiro CLI
e6c81b3 to df22c22
This PR was marked stale due to lack of activity.
@@ -0,0 +1,123 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Need to add golden YAML tests.
The existing tests already use the golden JSON→YAML pattern (testdata/.json input → testdata/.yaml expected output). We now have 9 test cases covering: selective metrics, non-prefixed names, empty measurements, wildcard, measurement-required errors, and interval overrides. Added wildcardConfig.json/yaml golden pair. See commit 4e79df7.
By the golden yaml tests, I mean the ones in translator/tocwconfig/sampleConfig/ that test the end to end pipelines and not just the individual components. You could probably add this to the complete linux one.
Mind posting a screenshot of what the actual metrics look like in the CW metrics console? Do they have device and interfaceID as dimensions? Without host/InstanceId, are these currently getting aggregated to the account level?
…ics, add wildcard support
- Make measurement field required (error if missing), consistent with other host plugins
- Replace hardcoded allEfaMetrics map with reflection on MetricsConfig struct tags
- Add wildcard '*' support in measurement list to enable all metrics
- Update tests for measurement-required behavior and add wildcard test case
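Deriving the list of valid metric names from struct tags, as this commit describes, might look something like the sketch below. The `metricConfig` and `metricsConfig` types are hypothetical stand-ins for the receiver's generated config, and the use of `mapstructure` tags is an assumption based on common OTel Collector conventions rather than something confirmed in this PR.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// metricConfig and metricsConfig are hypothetical stand-ins for the
// receiver's MetricsConfig; only three example fields are shown.
type metricConfig struct {
	Enabled bool `mapstructure:"enabled"`
}

type metricsConfig struct {
	EfaTxBytes   metricConfig `mapstructure:"efa_tx_bytes"`
	EfaRxBytes   metricConfig `mapstructure:"efa_rx_bytes"`
	EfaRxDropped metricConfig `mapstructure:"efa_rx_dropped"`
}

// metricNames collects the mapstructure tag name of every field of a struct
// (or pointer to struct), so the valid-name set follows the struct
// definition instead of a hand-maintained map.
func metricNames(cfg any) []string {
	t := reflect.TypeOf(cfg)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("mapstructure")
		// Drop tag options such as ",omitempty" and skip untagged fields.
		name, _, _ := strings.Cut(tag, ",")
		if name == "" || name == "-" {
			continue
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(metricNames(metricsConfig{}))
}
```

The appeal of this approach is that adding a metric to the config struct automatically makes it a valid measurement name, at the cost of coupling validation to the tag convention.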
- Remove append_dimensions property (declared but not implemented in translator)
- Add required: [measurement] to match basicMetricDefinition pattern
Standalone workflow_dispatch workflow that builds the agent and runs the efa_ec2 integration test on an EFA-enabled EC2 instance via terraform/ec2/efa in the test repo.
- Add ec2_efa_matrix output to GenerateTestMatrix job
- Add EC2EfaIntegrationTest job (linux-only, follows GPU pattern)
- Uses terraform_dir from matrix (terraform/ec2/efa)
- Remove standalone efa-ec2-integration-test.yml workflow
… EFA
- Remove wildcard ["*"] support from measurement to maintain consistency with existing plugins (diskio, net, nvidia_gpu, cpu) which all require explicit metric names
- Add drop_original_metrics to efaDefinitions schema and toDropMap so customers can suppress original metrics when using aggregation_dimensions
- Remove wildcard test case and test data files
- Simplify translator to always unmarshal metrics config
Moving EFA EC2 integration test workflow to a separate PR.
Description of the issue
Add support for configuring the awsefareceiver via the CWA JSON configuration. Customers will be able to configure EFA (Elastic Fabric Adapter) metrics collection under metrics.metrics_collected.efa.

EFA is a distinct hardware type (Elastic Fabric Adapter) that exposes 22 cumulative monotonic sum metrics via /sys/class/infiniband/ sysfs. The receiver is a native OTel receiver (not adapter-based) that runs in the hostDeltaMetrics pipeline with cumulative-to-delta processing.

Configuration Examples

{
  "metrics": {
    "metrics_collected": {
      "efa": {
        "measurement": ["efa_tx_bytes", "efa_rx_bytes", "efa_rdma_read_bytes"],
        "metrics_collection_interval": 30
      }
    }
  }
}

{
  "metrics": {
    "metrics_collected": {
      "efa": {
        "measurement": ["tx_bytes", "rx_dropped"]
      }
    }
  }
}

{
  "metrics": {
    "aggregation_dimensions": [["aws.efa.device"]],
    "metrics_collected": {
      "efa": {
        "measurement": ["efa_tx_bytes", "efa_rx_bytes"],
        "drop_original_metrics": ["efa_tx_bytes", "efa_rx_bytes"]
      }
    }
  }
}

License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Requirements
Before committing your code, please do the following steps.
make fmt and make fmt-sh
make lint
Integration Tests
To run integration tests against this PR, add the ready for testing label.