
Add support for configuring EFA receiver on EC2 via JSON #2093

Open
mitali-salvi wants to merge 12 commits into main from feature/efa-receiver-translator


@mitali-salvi (Contributor) commented Apr 17, 2026

Description of the issue

Add support for configuring the awsefareceiver via the CWA JSON configuration. Customers will be able to configure EFA (Elastic Fabric Adapter) metrics collection under metrics.metrics_collected.efa.

EFA is a distinct hardware type that exposes 22 cumulative, monotonic sum metrics through sysfs under /sys/class/infiniband/. The receiver is a native OTel receiver (not adapter-based) that runs in the hostDeltaMetrics pipeline, where cumulative-to-delta processing converts those running totals into per-interval deltas.
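To make the cumulative-to-delta step concrete, here is a minimal Go sketch. It is not the receiver's or the cumulativetodelta processor's actual code. It assumes each EFA counter is a sysfs file (for example under /sys/class/infiniband/<device>/ports/<port>/hw_counters/) that holds a single integer; parseCounter, deltaTracker, and the reset handling shown are hypothetical illustrations.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter parses the contents of one sysfs counter file, assumed to be
// a single unsigned integer followed by a newline.
func parseCounter(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// deltaTracker is a simplified illustration of cumulative-to-delta
// conversion (hypothetical; not the cumulativetodelta processor).
type deltaTracker struct {
	last map[string]uint64 // last cumulative value seen per series key
}

func newDeltaTracker() *deltaTracker {
	return &deltaTracker{last: make(map[string]uint64)}
}

// Observe records a cumulative value for key and returns the delta since the
// previous observation. ok is false on the first observation and after an
// apparent counter reset (the value went down); in both cases the new value
// simply becomes the baseline.
func (d *deltaTracker) Observe(key string, value uint64) (delta uint64, ok bool) {
	prev, seen := d.last[key]
	d.last[key] = value
	if !seen || value < prev {
		return 0, false
	}
	return value - prev, true
}

func main() {
	t := newDeltaTracker()
	// Hypothetical successive reads of an efa_0 tx_bytes counter file.
	for _, raw := range []string{"1000\n", "1500\n", "1500\n", "200\n", "260\n"} {
		v, err := parseCounter(raw)
		if err != nil {
			panic(err)
		}
		if d, ok := t.Observe("efa_0/tx_bytes", v); ok {
			fmt.Println("delta:", d)
		} else {
			fmt.Println("baseline:", v)
		}
	}
}
```

Emitting nothing on the first read is the key design point: a cumulative total alone says nothing about the last interval, so a delta only exists once there are two readings to subtract.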

Configuration Examples

Selected metrics with a 30-second collection interval override:

{
  "metrics": {
    "metrics_collected": {
      "efa": {
        "measurement": ["efa_tx_bytes", "efa_rx_bytes", "efa_rdma_read_bytes"],
        "metrics_collection_interval": 30
      }
    }
  }
}

Non-prefixed metric names (the translator accepts names with or without the efa_ prefix):

{
  "metrics": {
    "metrics_collected": {
      "efa": {
        "measurement": ["tx_bytes", "rx_dropped"]
      }
    }
  }
}

Aggregating by the aws.efa.device dimension and dropping the original metrics:

{
  "metrics": {
    "aggregation_dimensions": [["aws.efa.device"]],
    "metrics_collected": {
      "efa": {
        "measurement": ["efa_tx_bytes", "efa_rx_bytes"],
        "drop_original_metrics": ["efa_tx_bytes", "efa_rx_bytes"]
      }
    }
  }
}
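The three examples share one small shape. A hypothetical Go model of that shape is sketched below; efaConfig, agentConfig, and parseAgentConfig are illustrative names, not the translator's types, and only the keys used in the examples are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// efaConfig is a hypothetical model of metrics.metrics_collected.efa,
// based only on the keys that appear in the examples.
type efaConfig struct {
	Measurement               []string `json:"measurement"`
	MetricsCollectionInterval int      `json:"metrics_collection_interval,omitempty"` // seconds; 0 means unset
	DropOriginalMetrics       []string `json:"drop_original_metrics,omitempty"`
}

// agentConfig models just enough of the surrounding JSON to reach the efa section.
type agentConfig struct {
	Metrics struct {
		AggregationDimensions [][]string `json:"aggregation_dimensions,omitempty"`
		MetricsCollected      struct {
			EFA *efaConfig `json:"efa"` // nil when the efa section is absent
		} `json:"metrics_collected"`
	} `json:"metrics"`
}

func parseAgentConfig(data []byte) (*agentConfig, error) {
	var cfg agentConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{"metrics":{"aggregation_dimensions":[["aws.efa.device"]],
		"metrics_collected":{"efa":{"measurement":["efa_tx_bytes","efa_rx_bytes"],
		"drop_original_metrics":["efa_tx_bytes","efa_rx_bytes"]}}}}`)
	cfg, err := parseAgentConfig(raw)
	if err != nil {
		panic(err)
	}
	efa := cfg.Metrics.MetricsCollected.EFA
	fmt.Println(efa.Measurement, efa.DropOriginalMetrics, cfg.Metrics.AggregationDimensions)
}
```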

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  • Unit tests: 9 test cases, all passing (including wildcard and measurement-required error cases)
  • Manual: deployed CWA to an EC2 instance with EFA devices, generated traffic between two EFA-enabled instances, and confirmed that the agent starts and collects metrics
[Screenshots: 2026-04-17 14:51:41 and 2026-05-01 13:57:06]

Requirements

Before committing your code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

@mitali-salvi requested a review from a team as a code owner April 17, 2026 18:52
@mitali-salvi added the "ready for testing" label (indicates this PR is ready for integration tests to run) Apr 17, 2026
Add a new translator that converts CloudWatch Agent JSON config to OTel
YAML config for the awsefareceiver. Customers can configure EFA metrics
collection under metrics.metrics_collected.efa with optional measurement
filtering and collection interval overrides.

The translator follows the existing awsnvme translator pattern:
- Reads from metrics::metrics_collected::efa config path
- Supports collection_interval with agent-level fallback (default 60s)
- Supports measurement filtering with efa_ prefix auto-detection
- Validates metric names against the 22 known EFA metrics
- Disables unselected metrics when measurement list is specified

Assisted-by: Kiro CLI
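The filtering rules listed in this commit can be sketched in Go as below. This is a hypothetical illustration, not the translator's code: normalizeMetricName, enabledMetrics, and resolveInterval are invented names; "efa_ prefix auto-detection" is assumed to mean names are accepted with or without the prefix; and the metric list in main is a three-name subset taken from the configuration examples, not the full set of 22.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

const efaPrefix = "efa_"

// normalizeMetricName accepts "tx_bytes" or "efa_tx_bytes" and returns the
// prefixed form (assumed behavior of the prefix auto-detection).
func normalizeMetricName(name string) string {
	if strings.HasPrefix(name, efaPrefix) {
		return name
	}
	return efaPrefix + name
}

// enabledMetrics validates measurement against the known metric names and
// returns an enabled flag for every known metric. With an empty list every
// metric stays enabled; otherwise unselected metrics are disabled.
func enabledMetrics(measurement, known []string) (map[string]bool, error) {
	selectAll := len(measurement) == 0
	flags := make(map[string]bool, len(known))
	for _, m := range known {
		flags[m] = selectAll
	}
	var unknown []string
	for _, m := range measurement {
		n := normalizeMetricName(m)
		if _, ok := flags[n]; !ok {
			unknown = append(unknown, m)
			continue
		}
		flags[n] = true
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return nil, fmt.Errorf("unknown EFA metric(s): %s", strings.Join(unknown, ", "))
	}
	return flags, nil
}

// resolveInterval prefers the plugin-level interval, then the agent-level
// interval, then a 60s default. Inputs are seconds; 0 means unset.
func resolveInterval(pluginSec, agentSec int) time.Duration {
	switch {
	case pluginSec > 0:
		return time.Duration(pluginSec) * time.Second
	case agentSec > 0:
		return time.Duration(agentSec) * time.Second
	default:
		return 60 * time.Second
	}
}

func main() {
	known := []string{"efa_tx_bytes", "efa_rx_bytes", "efa_rdma_read_bytes"}
	flags, err := enabledMetrics([]string{"tx_bytes", "efa_rx_bytes"}, known)
	if err != nil {
		panic(err)
	}
	fmt.Println(flags, resolveInterval(0, 30))
}
```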

Register the EFA receiver in the host delta metrics pipeline alongside
diskio and net receivers. Add EFA as a recognized native OTel receiver
in the adapter layer and include efaKey in the cumulativetodelta
processor's default keys so EFA alone satisfies the delta pipeline
requirement.

Assisted-by: Kiro CLI
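A simplified Go sketch of why adding efaKey to the default keys matters: if the delta pipeline is only built when at least one of those keys is configured, EFA on its own now qualifies. needsDeltaPipeline and the diskio/net key paths are hypothetical, modeled on the metrics::metrics_collected::efa path used by the translator.

```go
package main

import "fmt"

// Hypothetical config paths; only the efa path is named in this PR.
const (
	diskioKey = "metrics::metrics_collected::diskio"
	netKey    = "metrics::metrics_collected::net"
	efaKey    = "metrics::metrics_collected::efa"
)

// defaultDeltaKeys lists the sections whose presence requires the
// cumulativetodelta processor; including efaKey lets EFA alone qualify.
var defaultDeltaKeys = []string{diskioKey, netKey, efaKey}

// needsDeltaPipeline reports whether any configured section needs the host
// delta metrics pipeline. configured holds the config paths that are present.
func needsDeltaPipeline(configured map[string]bool) bool {
	for _, k := range defaultDeltaKeys {
		if configured[k] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsDeltaPipeline(map[string]bool{efaKey: true}))
	fmt.Println(needsDeltaPipeline(map[string]bool{"metrics::metrics_collected::cpu": true}))
}
```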

Add efaDefinitions to the JSON schema under metrics_collected. Unlike
other receivers, measurement is optional for EFA: an empty config
enables all 22 metrics by default.

Assisted-by: Kiro CLI
@mitali-salvi force-pushed the feature/efa-receiver-translator branch from e6c81b3 to df22c22 April 17, 2026 19:00
@github-actions (Contributor)

This PR was marked stale due to lack of activity.

@github-actions (bot) added the Stale label Apr 25, 2026
@github-actions (bot) removed the Stale label Apr 30, 2026
JayPolanco previously approved these changes Apr 30, 2026
Comment thread translator/config/schema.json Outdated
@@ -0,0 +1,123 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Contributor


Need to add golden YAML tests.

Contributor Author


The existing tests already use the golden JSON→YAML pattern (testdata/.json input → testdata/.yaml expected output). We now have 9 test cases covering selective metrics, non-prefixed names, empty measurements, wildcard, measurement-required errors, and interval overrides. Added a wildcardConfig.json/yaml golden pair. See commit 4e79df7.

Copy link
Copy Markdown
Contributor


By golden YAML tests, I mean the ones in translator/tocwconfig/sampleConfig/ that test the end-to-end pipelines, not just the individual components. You could probably add this to the complete Linux one.

Comment thread translator/translate/otel/receiver/awsefa/translator.go Outdated
Comment thread translator/translate/otel/receiver/awsefa/translator.go Outdated
Comment thread translator/config/schema.json
Comment thread translator/translate/otel/receiver/awsefa/translator.go
@sky333999
Contributor

Mind posting a screenshot of what the actual metrics look like in the CW metrics console? Do they have device and interfaceID as dimensions? Without host/InstanceId, are these currently getting aggregated to the account level?

…ics, add wildcard support

- Make measurement field required (error if missing), consistent with other host plugins
- Replace hardcoded allEfaMetrics map with reflection on MetricsConfig struct tags
- Add wildcard '*' support in measurement list to enable all metrics
- Update tests for measurement-required behavior and add wildcard test case
- Remove append_dimensions property (declared but not implemented in translator)
- Add required: [measurement] to match basicMetricDefinition pattern
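A sketch of the reflection approach, assuming the receiver's MetricsConfig follows the usual OTel mdatagen layout (one MetricConfig field per metric, named by a mapstructure tag). The struct below is an abbreviated, hypothetical stand-in with three of the 22 fields, and its tag values are assumed to match the CWA metric names; requireMeasurement illustrates the measurement-required rule and treats an empty list the same as a missing one.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"sort"
)

// MetricConfig follows the usual mdatagen shape (assumption).
type MetricConfig struct {
	Enabled bool `mapstructure:"enabled"`
}

// MetricsConfig is a hypothetical, abbreviated stand-in for the receiver's
// generated struct; the real one has one field per EFA metric.
type MetricsConfig struct {
	EfaRdmaReadBytes MetricConfig `mapstructure:"efa_rdma_read_bytes"`
	EfaRxBytes       MetricConfig `mapstructure:"efa_rx_bytes"`
	EfaTxBytes       MetricConfig `mapstructure:"efa_tx_bytes"`
}

// knownMetricNames derives metric names from the mapstructure tags instead
// of a hand-maintained map, so newly generated fields are picked up.
func knownMetricNames() []string {
	t := reflect.TypeOf(MetricsConfig{})
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("mapstructure"); tag != "" {
			names = append(names, tag)
		}
	}
	sort.Strings(names)
	return names
}

var errMeasurementRequired = errors.New("efa: measurement is required")

// requireMeasurement rejects a missing or empty measurement list.
func requireMeasurement(measurement []string) error {
	if len(measurement) == 0 {
		return errMeasurementRequired
	}
	return nil
}

func main() {
	fmt.Println(knownMetricNames())
	fmt.Println(requireMeasurement(nil))
}
```

Deriving the names from the generated struct keeps validation in sync with the receiver, at the cost of coupling the translator to the receiver's config type.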

Standalone workflow_dispatch workflow that builds the agent and runs
the efa_ec2 integration test on an EFA-enabled EC2 instance via
terraform/ec2/efa in the test repo.

- Add ec2_efa_matrix output to GenerateTestMatrix job
- Add EC2EfaIntegrationTest job (linux-only, follows GPU pattern)
- Uses terraform_dir from matrix (terraform/ec2/efa)
- Remove standalone efa-ec2-integration-test.yml workflow

… EFA

- Remove wildcard ["*"] support from measurement to maintain consistency
  with existing plugins (diskio, net, nvidia_gpu, cpu) which all require
  explicit metric names
- Add drop_original_metrics to efaDefinitions schema and toDropMap so
  customers can suppress original metrics when using aggregation_dimensions
- Remove wildcard test case and test data files
- Simplify translator to always unmarshal metrics config
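One way to picture what drop_original_metrics feeds into is a per-plugin set of metric names whose original (unaggregated) series are suppressed once aggregation_dimensions produces the aggregated ones. buildDropMap and shouldDropOriginal are hypothetical names; this sketch is not the agent's toDropMap implementation.

```go
package main

import "fmt"

// buildDropMap turns per-plugin drop_original_metrics lists into lookup sets.
// Plugins with an empty list are omitted.
func buildDropMap(sections map[string][]string) map[string]map[string]bool {
	out := make(map[string]map[string]bool)
	for plugin, metrics := range sections {
		if len(metrics) == 0 {
			continue
		}
		set := make(map[string]bool, len(metrics))
		for _, m := range metrics {
			set[m] = true
		}
		out[plugin] = set
	}
	return out
}

// shouldDropOriginal reports whether the original series for metric from
// plugin should be suppressed. Reading a missing inner map yields false.
func shouldDropOriginal(drop map[string]map[string]bool, plugin, metric string) bool {
	return drop[plugin][metric]
}

func main() {
	drop := buildDropMap(map[string][]string{
		"efa": {"efa_tx_bytes", "efa_rx_bytes"},
	})
	fmt.Println(shouldDropOriginal(drop, "efa", "efa_tx_bytes"))
	fmt.Println(shouldDropOriginal(drop, "efa", "efa_rdma_read_bytes"))
}
```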

Moving EFA EC2 integration test workflow to a separate PR.

Labels

ready for testing (indicates this PR is ready for integration tests to run)


3 participants