CNTRLPLANE-2649: always run OTE CI profiles for kso #78299

Open
ropatil010 wants to merge 1 commit into openshift:main from ropatil010:sso-update

Conversation

@ropatil010
Contributor

@ropatil010 ropatil010 commented Apr 24, 2026

Hi @ingvagabund

Earlier we added OTE profiles for the scheduler operator, but with the parameters always_run: false and optional: true.
We need to make sure that both the non-OTE and the OTE CI profiles work properly. Once the OTE profiles work fine, we can remove the non-OTE CI profiles e2e-aws-operator and e2e-aws-operator-preferred-host.

Summary by CodeRabbit

  • Tests
    • Consolidated multiple OTE E2E targets into a single combined OTE test that runs suites sequentially on one cluster.
    • Updated presubmit jobs to trigger and always run the combined OTE target, removing prior per-target presubmits and optional serial jobs.
    • Added a new orchestration script, CI step/ref, and test entry to execute and capture operator serial/parallel/preferred-host suites with aggregated results.
  • Chores
    • Added ownership metadata defining approvers and reviewers for the new step registry paths.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 24, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 24, 2026

@ropatil010: This pull request references CNTRLPLANE-2649 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ropatil010
Contributor Author

/assign @ingvagabund

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

Walkthrough

Consolidates three AWS OTE tests into a single combined OTE test, updates presubmit jobs to use the combined target (making the presubmit always run), and adds a new step-registry OTE script, ref, metadata, and OWNERS for cluster-kube-scheduler-operator.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Test Config: ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml | Removed three separate OTE test entries and added a single e2e-aws-operator-ote-combined test referencing the new cluster-kube-scheduler-operator-e2e-ote ref. |
| Presubmit Jobs: ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml | Replaced presubmits for separate OTE variants with a combined presubmit e2e-aws-operator-ote-combined: updated job name/context, trigger regex, rerun command, container --target, set always_run: true, and removed optional/serial presubmits. |
| Step Registry, OTE flow (new): ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh, ...-ref.yaml, ...-ref.metadata.json | Added a new OTE runner script that prepares creds/env, detects platform/cluster type, waits for cluster/operator stability, creates artifact dirs, and sequentially runs three openshift-tests suites while collecting logs/JUnit. Added a ref YAML (timeout/resources) and metadata JSON for ownership. |
| Step Registry OWNERS: ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS, ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/OWNERS | Added OWNERS files listing approvers and reviewers for the step-registry path and the e2e-ote subpath. |

Sequence Diagram(s)

sequenceDiagram
    participant CI as CI System
    participant Presubmit as Presubmit Job
    participant StepRef as Step Ref (ref.yaml)
    participant Script as e2e-ote-commands.sh
    participant Cluster as Cluster API
    participant TestRunner as openshift-tests
    participant Artifacts as Artifact Storage

    CI->>Presubmit: trigger presubmit (e2e-aws-operator-ote-combined)
    Presubmit->>StepRef: invoke step ref
    StepRef->>Script: execute e2e-ote-commands.sh
    Script->>Cluster: load credentials, detect platform, wait for stability
    Script->>Cluster: apply permissions / proxy configuration if needed
    Script->>Artifacts: create artifact directories, record timestamps
    Script->>TestRunner: run operator-serial suite
    TestRunner-->>Artifacts: write console.log and junit
    Script->>TestRunner: run operator-parallel suite
    TestRunner-->>Artifacts: write console.log and junit
    Script->>TestRunner: run preferred-host-serial suite
    TestRunner-->>Artifacts: write console.log and junit
    Script-->>Presubmit: return aggregated exit status
    Presubmit-->>CI: report result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR does not contain Ginkgo test code with dynamic or unstable test names; changes are CI/CD configs, OWNERS files, and bash script with static test suite references. |
| Test Structure And Quality | ✅ Passed | The custom check for Ginkgo test code quality is not applicable to this PR. The openshift/release repository is a CI/CD configuration repository containing YAML and bash files, not application code with Ginkgo tests. |
| Microshift Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e tests; it only reorganizes existing OTE test variants and creates CI infrastructure to run existing test suites. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR does not add new Ginkgo e2e tests; only CI/CD infrastructure modifications and orchestration of pre-existing openshift-tests suites. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR only modifies CI/CD test infrastructure files, not deployment manifests, operator code, or controllers with scheduling constraints. |
| Ote Binary Stdout Contract | ✅ Passed | The PR adds a bash orchestration script that invokes openshift-tests directly, not an OTE binary extension, so the JSON stdout contract does not apply. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds bash CI orchestration script, not new Ginkgo e2e tests. Script lacks IPv4-specific hardcoding and handles disconnected environments, making IPv6 check inapplicable. |
| Title check | ✅ Passed | The PR title accurately reflects the main change: consolidating three separate OTE CI test variants into a single always-run test profile for the kube-scheduler-operator. |

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml (1)

103-110: Consider adding a timeout for consistency with other OTE tests.

The e2e-aws-operator-preferred-host-serial-ote test lacks a timeout setting, while both e2e-aws-operator-serial-ote and e2e-aws-operator-parallel-ote have timeout: 8h0m0s. Since this is also a serial E2E test, it may benefit from the same timeout configuration to prevent unexpected job terminations.

Proposed fix to add timeout
 - as: e2e-aws-operator-preferred-host-serial-ote
   steps:
     cluster_profile: openshift-org-aws
     env:
       TEST_SUITE: openshift/cluster-kube-scheduler-operator/preferred-host/serial
     test:
     - ref: openshift-e2e-test
     workflow: ipi-aws
+  timeout: 8h0m0s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml`
around lines 103 - 110, Add a timeout to the preferred-host serial OTE test:
locate the job named e2e-aws-operator-preferred-host-serial-ote (the block with
TEST_SUITE: openshift/cluster-kube-scheduler-operator/preferred-host/serial) and
add a timeout entry matching the other OTE jobs (e.g., timeout: 8h0m0s) under
the steps section so the job uses the same maximum runtime as
e2e-aws-operator-serial-ote and e2e-aws-operator-parallel-ote.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4cef7004-3977-4899-a2bc-32ccf183aa97

📥 Commits

Reviewing files that changed from the base of the PR and between ab1c2b4 and 850f738.

📒 Files selected for processing (2)
  • ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml
  • ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml

@openshift-merge-bot
Contributor

@ropatil010: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@ropatil010: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.

@ropatil010
Contributor Author

/pj-rehearse pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-preferred-host-serial-ote

@openshift-merge-bot
Contributor

@ropatil010: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ropatil010
Contributor Author

/pj-rehearse pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-serial-ote

@openshift-merge-bot
Contributor

@ropatil010: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ropatil010
Contributor Author

/assign @gangwgr @sandeepknd
PTAL on this PR.

@ingvagabund
Member

Before moving forward with this: is there a limit on how many CI jobs it is still practical to run? The current KSO test suite is quite minimal. Is it worth spinning up another cluster (= more resources) to run this, versus running the KSO test suite as part of already existing jobs?

@ingvagabund
Member

Also, s/sso/kso in the PR name.

@gangwgr
Contributor

gangwgr commented Apr 24, 2026

Why do we need 3 jobs for only 4-5 cases? It will be too expensive if we don't have more cases.

ropatil010 added a commit to ropatil010/release that referenced this pull request Apr 24, 2026
Consolidates 3 separate OTE test jobs into a single job that runs all
test suites sequentially on one AWS cluster, reducing infrastructure
costs by 66%.

Changes:
- Created custom step-registry component: cluster-kube-scheduler-operator-e2e-ote
- Removed 3 individual jobs (e2e-aws-operator-serial-ote,
  e2e-aws-operator-parallel-ote, e2e-aws-operator-preferred-host-serial-ote)
- Added consolidated job: e2e-aws-operator-ote-combined

The new job runs all three test suites sequentially:
1. openshift/cluster-kube-scheduler-operator/operator/serial
2. openshift/cluster-kube-scheduler-operator/operator/parallel
3. openshift/cluster-kube-scheduler-operator/preferred-host/serial

Each test suite maintains separate log files and JUnit directories for
proper test result tracking.

Benefits:
- Reduces from 3 AWS clusters to 1 cluster (66% cost reduction)
- Same test coverage maintained
- Separate JUnit results for debugging

Addresses resource efficiency concern from PR review:
openshift#78299 (comment)

Co-Authored-By: Rohit Patil <ropatil@redhat.com>
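
The per-suite log files and JUnit directories described in the commit message above could be laid out roughly as follows. This is a minimal sketch, not the script from this PR: the `suite_label` helper and the way labels are derived are assumptions, while the `e2e-*.log` and `junit-*` naming follows the artifact paths quoted in this PR's review comments.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Common prefix of the three suite names listed in the commit message.
SUITE_PREFIX="openshift/cluster-kube-scheduler-operator/"

# Hypothetical helper: turns a suite name into a short, filesystem-safe
# label, e.g. ".../operator/serial" becomes "operator-serial".
suite_label() {
    local short="${1#"${SUITE_PREFIX}"}"
    echo "${short//\//-}"
}

# ARTIFACT_DIR is normally provided by CI; a temp dir keeps the sketch runnable.
ARTIFACT_DIR="${ARTIFACT_DIR:-$(mktemp -d)}"

for suite in \
    "${SUITE_PREFIX}operator/serial" \
    "${SUITE_PREFIX}operator/parallel" \
    "${SUITE_PREFIX}preferred-host/serial"; do
    label="$(suite_label "${suite}")"
    log_file="${ARTIFACT_DIR}/e2e-${label}.log"
    junit_dir="${ARTIFACT_DIR}/junit-${label}"
    mkdir -p "${junit_dir}"
    echo "${suite} -> ${log_file}, ${junit_dir}"
done
```

Keeping one log and one JUnit directory per suite means a failure can be traced to its suite without untangling a shared console log.
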
@openshift-ci
Contributor

openshift-ci Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ropatil010
Once this PR has been reviewed and has the lgtm label, please ask for approval from gangwgr. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Apr 24, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh`:
- Around line 148-180: The script currently aborts on the first failing
openshift-tests run because of errexit; modify the test-run blocks that call
openshift-tests run (for TEST_SUITE values like
"openshift/cluster-kube-scheduler-operator/operator/serial",
"operator/parallel", and "preferred-host/serial") so each invocation captures
its exit code without exiting the script (e.g., run the command normally but
append || true or run in a subshell and capture $?), write its logs to
"${ARTIFACT_DIR}/e2e-*.log" and junit to "${ARTIFACT_DIR}/junit-*" as before,
and record the per-suite result into a variable/array (e.g., SUITE_FAILURES or
LAST_EXIT) so you can print per-suite status immediately and, after all suites
run, exit non-zero if any recorded exit codes indicate failure; ensure you use
the existing TEST_ARGS, TEST_PROVIDER and ARTIFACT_DIR variables when invoking
openshift-tests run.
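
One possible shape for this, sketched under assumptions: `run_suite` is a hypothetical stand-in for the real `openshift-tests run` invocation (it fails for one suite so the failure path is exercised without a cluster), and `failed_suites` is an illustrative name. Note that appending `|| true` alone would discard the exit status, so the sketch tests each command with `if` instead, which errexit does not act on.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical stand-in for `openshift-tests run "$1" ...`.
# It returns non-zero for exactly one suite to simulate a failure.
run_suite() {
    [[ "$1" != "operator/parallel" ]]
}

rc=0
failed_suites=()

for suite in operator/serial operator/parallel preferred-host/serial; do
    # A command tested by `if` does not trigger errexit, so a failing
    # suite is recorded and the loop moves on to the next one.
    if run_suite "${suite}"; then
        echo "PASS: ${suite}"
    else
        echo "FAIL: ${suite}"
        failed_suites+=("${suite}")
        rc=1
    fi
done

echo "Failed suites: ${failed_suites[*]:-none}"
# A real step would finish with: exit "${rc}"
```

All three suites run regardless of individual failures, and the final status is non-zero if any of them failed.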

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 53e91a6c-c50f-4bbf-9ea3-6ec61a830eb3

📥 Commits

Reviewing files that changed from the base of the PR and between 850f738 and 41894c4.

📒 Files selected for processing (5)
  • ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml
  • ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml
✅ Files skipped from review due to trivial changes (2)
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml

@openshift-ci openshift-ci Bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Apr 24, 2026
ropatil010 added a commit to ropatil010/release that referenced this pull request Apr 24, 2026
Consolidates 3 separate OTE test jobs into a single job that runs all
test suites sequentially on one AWS cluster, reducing infrastructure
costs by 66%.

Changes:
- Created custom step-registry component: cluster-kube-scheduler-operator-e2e-ote
  - Includes proper error handling to run all suites even if individual tests fail
  - Each test suite has separate log files and JUnit directories
  - Supports all cloud providers (AWS, GCP, Azure, vSphere, OpenStack, etc.)
  - Includes cluster stability checks and retry strategy for presubmits

- Removed 3 individual jobs:
  - e2e-aws-operator-serial-ote
  - e2e-aws-operator-parallel-ote
  - e2e-aws-operator-preferred-host-serial-ote

- Added consolidated job: e2e-aws-operator-ote-combined
  - Runs automatically on PRs
  - Blocks merges if tests fail

Test suites executed sequentially:
1. openshift/cluster-kube-scheduler-operator/operator/serial
2. openshift/cluster-kube-scheduler-operator/operator/parallel
3. openshift/cluster-kube-scheduler-operator/preferred-host/serial

Error handling:
- Initializes rc=0 before test execution
- Uses if/else blocks to capture failures without aborting
- All three test suites run regardless of individual failures
- Exits with rc=1 if any test suite fails
- Ensures all test logs and JUnit artifacts are generated

OWNERS files:
- Created OWNERS at both parent and subdirectory levels
- Contains only valid OpenShift org members
- Generated metadata file for step registry

Benefits:
- Reduces from 3 AWS clusters to 1 cluster (66% cost reduction)
- Same test coverage maintained
- Separate JUnit results for debugging
- Proper artifact generation even on test failures

Addresses resource efficiency concern from PR review:
openshift#78299 (comment)

Addresses CodeRabbit review feedback:
openshift#78299 (review)

Co-Authored-By: Rohit Patil <ropatil@redhat.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh (1)

25-29: Use a Bash array for TEST_ARGS.

Line 156, Line 171, and Line 186 expand TEST_ARGS unquoted. It works for the current single flag, but the first arg containing whitespace or shell metacharacters will break all three invocations.

Suggested change
-TEST_ARGS=""
+TEST_ARGS=()
 if [[ "${JOB_TYPE:-}" == "presubmit" && ( "${PULL_BASE_REF:-}" == "main" || "${PULL_BASE_REF:-}" == "master" ) ]]; then
     if openshift-tests run --help 2>/dev/null | grep -q 'retry-strategy'; then
-        TEST_ARGS+=" --retry-strategy=aggressive"
+        TEST_ARGS+=(--retry-strategy=aggressive)
         echo "Enabled aggressive retry strategy for presubmit"
     fi
 fi
@@
-if openshift-tests run "${TEST_SUITE}" ${TEST_ARGS} \
+if openshift-tests run "${TEST_SUITE}" "${TEST_ARGS[@]}" \
@@
-if openshift-tests run "${TEST_SUITE}" ${TEST_ARGS} \
+if openshift-tests run "${TEST_SUITE}" "${TEST_ARGS[@]}" \
@@
-if openshift-tests run "${TEST_SUITE}" ${TEST_ARGS} \
+if openshift-tests run "${TEST_SUITE}" "${TEST_ARGS[@]}" \

Also applies to: 156-186
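
To illustrate the difference outside the script, here is a small self-contained sketch. `count_args` and the `--note` flag are hypothetical and exist only for the demonstration; they are not part of openshift-tests.

```shell
#!/usr/bin/env bash
set -o nounset

# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Hypothetical flag whose value contains a space.
args_string=' --note=two words'
args_array=(--note='two words')

# Unquoted string expansion is word-split: the value becomes two arguments.
# shellcheck disable=SC2086
string_count="$(count_args ${args_string})"

# Quoted array expansion keeps each element as exactly one argument.
array_count="$(count_args "${args_array[@]}")"

# An empty array expands to zero arguments. The ${arr[@]+...} guard keeps
# this safe under `set -u` on Bash releases older than 4.4.
empty=()
empty_count="$(count_args ${empty[@]+"${empty[@]}"})"

echo "string=${string_count} array=${array_count} empty=${empty_count}"
```

If the step script runs with nounset on an older Bash, the guarded form used for `empty` may be needed for `TEST_ARGS` as well; whether that applies here depends on the CI image, which is not shown in this thread.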

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh`
around lines 25 - 29, The script builds TEST_ARGS as a plain string which breaks
when flags contain whitespace or metacharacters; change TEST_ARGS to a Bash
array (use TEST_ARGS=() instead of TEST_ARGS=""), append flags with
TEST_ARGS+=(--retry-strategy=aggressive) where currently TEST_ARGS+=" …", and
update all invocations that expand TEST_ARGS (the openshift-tests run calls
referenced around the earlier TEST_ARGS uses and the later expansions at the
sites noted) to use the safe expansion "${TEST_ARGS[@]}" so each element is
passed as a separate quoted argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh`:
- Around line 151-194: You’re running three suites sequentially by setting
TEST_SUITE and invoking openshift-tests run (operator/serial, operator/parallel,
preferred-host/serial) on the same cluster without any explicit cleanup or
documented isolation; fix this by adding explicit per-suite isolation and
verification steps: after each openshift-tests run (the invocations that write
to ${ARTIFACT_DIR}/e2e-*.log and --junit-dir ${ARTIFACT_DIR}/junit-*), perform a
deterministic teardown or cluster-state verification (e.g., delete test
namespaces/resources or run a cluster-sanity check) and record its result (so rc
is set on failures), or alternatively run each TEST_SUITE in an isolated
namespace/unique resource prefix and document in the job comment or README that
TEST_SUITE definitions are order-independent and self-cleaning; ensure these
changes reference the TEST_SUITE variable and the openshift-tests run calls so
reviewers can find and validate the isolation.
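
One way to approach the cluster-state verification, as a rough sketch only: snapshot a list of resource names before and after a suite and flag anything new. The `leftover_names` helper and the sample snapshots are hypothetical; in a real step the lists could come from something like `oc get namespaces -o name`, and whether namespaces are the right resource to compare is an assumption.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper: prints names that appear in the "after" list but
# not in the "before" list. Both arguments are newline-separated lists.
leftover_names() {
    comm -13 <(sort <<<"$1") <(sort <<<"$2")
}

# Hypothetical snapshots taken before and after one suite.
before=$'namespace/default\nnamespace/openshift-kube-scheduler'
after=$'namespace/default\nnamespace/e2e-test-leftover\nnamespace/openshift-kube-scheduler'

rc=0
leftovers="$(leftover_names "${before}" "${after}")"
if [[ -n "${leftovers}" ]]; then
    # Record the problem in rc instead of aborting, so later suites still run.
    echo "Suite left resources behind:"
    echo "${leftovers}"
    rc=1
fi
```

Folding the check into the same `rc` that tracks suite failures keeps a dirty cluster visible in the job result without stopping the remaining suites.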

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 530e6adf-4d01-48e8-b279-00c63c515c63

📥 Commits

Reviewing files that changed from the base of the PR and between 41894c4 and 1f2a854.

📒 Files selected for processing (7)
  • ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml
  • ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.metadata.json
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml
✅ Files skipped from review due to trivial changes (4)
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.metadata.json
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml

@ropatil010
Contributor Author

/pj-rehearse pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined

@openshift-merge-bot
Contributor

@ropatil010: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@ropatil010: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined | openshift/cluster-kube-scheduler-operator | presubmit | Presubmit changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml (1)

84-108: Please confirm the cost tradeoff before making this mandatory.

This turns the OTE job into an always-run presubmit while the existing non-OTE operator presubmits in this file are still kept, so every PR now pays for another AWS cluster/job instead of swapping coverage. Worth getting explicit owner sign-off on the added default load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml`
around lines 84 - 108, The job currently sets always_run: true for the OTE
presubmit (name:
pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined,
context: ci/prow/e2e-aws-operator-ote-combined), which forces AWS cost on every
PR; change always_run to false (or remove the always_run key) so it is not
mandatory, and add a note in the PR requesting explicit owner sign-off (or add
an annotation/label indicating "requires-owner-approval") before reintroducing
always_run:true to ensure the cost tradeoff is approved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ef6fb2c5-fa57-4b7d-9ec3-2c012a0c616e

📥 Commits

Reviewing files that changed from the base of the PR and between 1f2a854 and f6cadd3.

📒 Files selected for processing (7)
  • ci-operator/config/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main.yaml
  • ci-operator/jobs/openshift/cluster-kube-scheduler-operator/openshift-cluster-kube-scheduler-operator-main-presubmits.yaml
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-commands.sh
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.metadata.json
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml
✅ Files skipped from review due to trivial changes (4)
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.metadata.json
  • ci-operator/step-registry/cluster-kube-scheduler-operator/OWNERS
  • ci-operator/step-registry/cluster-kube-scheduler-operator/e2e-ote/cluster-kube-scheduler-operator-e2e-ote-ref.yaml

openshift-ci Bot commented Apr 24, 2026

@ropatil010: all tests passed!

@ropatil010
Contributor Author

CI Logs:

A single cluster was used to run all of these test suites:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/78299/rehearse-78299-pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined/2047656350128803840

  1. Parallel
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/78299/rehearse-78299-pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined/2047656350128803840/artifacts/e2e-aws-operator-ote-combined/cluster-kube-scheduler-operator-e2e-ote/artifacts/e2e-operator-parallel.log

passed: (300ms) 2026-04-24T14:23:04 "[sig-scheduling] kube scheduler operator [Operator][Parallel] should expose metrics endpoints accessible via prometheus"

  2. Serial
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/78299/rehearse-78299-pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined/2047656350128803840/artifacts/e2e-aws-operator-ote-combined/cluster-kube-scheduler-operator-e2e-ote/artifacts/e2e-operator-serial.log

passed: (1.2s) 2026-04-24T14:08:23 "[sig-scheduling] kube scheduler operator [Operator][Serial] should create configmap when scheduler CR is updated"
passed: (3.3s) 2026-04-24T14:08:22 "[sig-scheduling] kube scheduler operator [Operator][Serial] should update policy configmap when source configmap changes"
2 pass, 0 flaky, 0 skip (7m52s)

  3. Host
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/78299/rehearse-78299-pull-ci-openshift-cluster-kube-scheduler-operator-main-e2e-aws-operator-ote-combined/2047656350128803840/artifacts/e2e-aws-operator-ote-combined/cluster-kube-scheduler-operator-e2e-ote/artifacts/e2e-preferred-host-serial.log

passed: (16m1s) 2026-04-24T14:50:08 "[sig-scheduling] kube scheduler operator [PreferredHost][Serial] should communicate with kube-apiserver over preferred host [Timeout:30m]"
1 pass, 0 flaky, 0 skip (23m3s)

@ropatil010 changed the title from "CNTRLPLANE-2649: always run OTE CI profiles for sso" to "CNTRLPLANE-2649: always run OTE CI profiles for kso" on Apr 24, 2026

ropatil010 commented Apr 24, 2026

Design: a single cluster is used and every suite runs on it. The job triggers the suites in this order:

  1. The serial suite, which runs all serial cases one at a time.
  2. The parallel suite, which runs all of its cases concurrently, however many there are.
  3. The preferred-host serial suite, which runs its cases one at a time.

If the number of serial or parallel cases grows in the future, the same single cluster handles them automatically, with no need for new CI profiles. If we ever need two separate CI profiles, we can call the existing openshift-e2e-test, which works fine.
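
The sequence described above can be sketched roughly as follows. This is not the actual `cluster-kube-scheduler-operator-e2e-ote-commands.sh`: `run_suite` is a hypothetical placeholder for whatever command launches a suite, and continuing after a failed suite is an assumption. The suite names and log file names are taken from the results and artifacts posted in this PR, and `ARTIFACT_DIR` is the artifacts directory that ci-operator provides to steps.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Fall back to a temporary directory when run outside CI.
ARTIFACT_DIR="${ARTIFACT_DIR:-$(mktemp -d)}"

# Hypothetical placeholder: replace the body with the real suite runner command.
run_suite() {
  echo "running suite $1"
}

run_all_suites() {
  local failed=0 entry suite log
  for entry in \
    "operator/serial:e2e-operator-serial.log" \
    "operator/parallel:e2e-operator-parallel.log" \
    "preferred-host/serial:e2e-preferred-host-serial.log"; do
    suite="${entry%%:*}"
    log="${ARTIFACT_DIR}/${entry##*:}"
    # Blocks until the suite finishes, so suites never overlap on the cluster.
    if ! run_suite "${suite}" >"${log}" 2>&1; then
      # Assumption: record the failure and keep going with the next suite.
      echo "suite ${suite} failed, see ${log}" >&2
      failed=1
    fi
  done
  return "${failed}"
}

status=0
run_all_suites || status=$?
echo "aggregated exit status: ${status}"
```

A real step would end with `exit "${status}"` so that the job is reported as failed if any suite failed.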

  ┌──────────────────────────┬──────────────────┬──────────┬─────────────────────┐
  │        Test Suite        │ Start Time (UTC) │ Duration │       Status        │
  ├──────────────────────────┼──────────────────┼──────────┼─────────────────────┤
  │ 1. operator/serial       │ 14:08:22         │ ~7m 52s  │ ✅ Passed (2 tests) │
  ├──────────────────────────┼──────────────────┼──────────┼─────────────────────┤
  │ 2. operator/parallel     │ 14:23:04         │ ~9m 6s   │ ✅ Passed (1 test)  │
  ├──────────────────────────┼──────────────────┼──────────┼─────────────────────┤
  │ 3. preferred-host/serial │ 14:27:06         │ ~23m 3s  │ ✅ Passed (1 test)  │
  └──────────────────────────┴──────────────────┴──────────┴─────────────────────┘
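
For a rough sense of how long the combined job spends running tests, the three reported durations can be added up. This is only a sketch: `to_seconds` is an illustrative helper, the inputs are the approximate durations from the table, and the sum excludes cluster installation and teardown.

```shell
#!/usr/bin/env bash

# Illustrative helper: convert "<M>m<S>s" (for example 7m52s) to seconds.
to_seconds() {
  local d="$1"
  local m="${d%%m*}"   # minutes: everything before the first "m"
  local s="${d#*m}"    # seconds with the trailing "s"
  s="${s%s}"
  echo $(( m * 60 + s ))
}

total=0
for d in 7m52s 9m6s 23m3s; do
  total=$(( total + $(to_seconds "$d") ))
done
echo "total: $(( total / 60 ))m$(( total % 60 ))s"   # prints "total: 40m1s"
```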

Visualization

  Our Script (Sequential Invocation):
  ┌─────────────────────────────────────────────────────────┐
  │ Suite 1: operator/serial                                │
  │   ├─ Test 1 [Serial] ──┐                                │
  │   ├─ Test 2 [Serial] ──┤ Run one at a time              │
  │   └─ Test 3 [Serial] ──┘                                │
  └─────────────────────────────────────────────────────────┘
           ↓ (Wait for completion)
  ┌─────────────────────────────────────────────────────────┐
  │ Suite 2: operator/parallel                              │
  │   ├─ Test 1 [Parallel] ─┬─ Run concurrently             │
  │   ├─ Test 2 [Parallel] ─┤                               │
  │   └─ Test 3 [Parallel] ─┘                               │
  └─────────────────────────────────────────────────────────┘
           ↓ (Wait for completion)
  ┌─────────────────────────────────────────────────────────┐
  │ Suite 3: preferred-host/serial                          │
  │   └─ Test 1 [Serial] ── Run alone                       │
  └─────────────────────────────────────────────────────────┘

@ingvagabund @gangwgr PTAL on this PR.
