[INS-455] Unify common logic in Atlassian Data Center detectors#4907
[INS-455] Unify common logic in Atlassian Data Center detectors#4907mustansir14 wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7c19046. Configure here.
|
This is completely irrelevant to the goal of this PR. But can you please add similar logic in |
MuneebUllahKhan222
left a comment
There was a problem hiding this comment.
Couple of changes that need to be addressed, Other than that the PR is good to go.
| @@ -170,35 +136,13 @@ func (s Scanner) FromData(ctx context.Context, verify bool, data []byte) (result | |||
|
|
|||
| func verifyPAT(ctx context.Context, client *http.Client, baseURL, token string) (bool, error) { | |||
| endpoint := strings.TrimRight(baseURL, "/") + "/rest/api/user/current" | |||
There was a problem hiding this comment.
No need to do TrimRight here because we are already doing that in FindEndpoints function.
There was a problem hiding this comment.
You might be right here, but doing so would add to the changes in this PR and my goal is to keep detector specific changes as minimal as possible. Also it doesn't really harm to have additional check
| u, err := detectors.ParseURLAndStripPathAndParams(baseURL) | ||
| if err != nil { | ||
| return false, err | ||
| } | ||
| u.Path = "rest/api/1.0/projects" | ||
| q := u.Query() | ||
| q.Set("limit", "1") | ||
| u.RawQuery = q.Encode() |
There was a problem hiding this comment.
We can get rid of all this and do
baseURL + "/rest/api/1.0/projects?limit=1"
There was a problem hiding this comment.
I was trying to make as minimal changes as possible to the detectors' specific logic. Also we generally prefer to use net/url to work with URLs and paths. This is valid for your other comment as well.
| u, err := detectors.ParseURLAndStripPathAndParams(baseURL) | ||
| if err != nil { | ||
| return false, nil, err | ||
| } | ||
| u.Path = "/rest/api/2/myself" |
There was a problem hiding this comment.
We can get rid of all this and do
baseURL + "/rest/api/2/myself"
There was a problem hiding this comment.
Same comment as above
I don't really think that's a good idea, as this will add more complexity to the PR and will require additional tests. I would say we stick to the goal of this PR and do this in a separate optimizations PR. |

Background
Three detectors target Atlassian Data Center (self-hosted) products:
JiraDataCenterPAT,ConfluenceDataCenter, andBitbucketDataCenter. Reviewing them side-by-side revealed significant duplication: identical structural-validation logic, the same bearer-auth HTTP plumbing, and the same URL-extraction pipeline repeated verbatim in each file.What changed
New shared package:
pkg/detectors/atlassiandatacenter/Following the pattern established by
pkg/detectors/aws/, the three detectors are now co-located under a common parent that also houses shared utilities incommon.go.GetDCTokenPat(prefixes []string) *regexp.RegexpReturns the compiled PAT regex for Jira/Confluence DC tokens (44-char base64 strings whose decoded form is
<numeric-id>:<random-bytes>). Previously each detector hand-rolled the same pattern with inline literal strings.IsStructuralPAT(candidate string) boolValidates that a base64 candidate decodes to the
<digits>:<bytes>structure. This function was copy-pasted verbatim between Jira and Confluence.GetURLPat(prefixes []string) *regexp.RegexpReturns the compiled URL regex for self-hosted Atlassian instance URLs. Each detector calls this once at package init time and stores the result in a package-level var, preserving the same compile-once behaviour as before.
go-re2compiles via CGo/FFI so doing this per-chunk would be a meaningful regression.FindEndpoints(data string, urlPat *regexp.Regexp, resolve func(...string) []string) []stringExtracts keyword-scoped URLs from a chunk using the pre-compiled
urlPat, passes them throughs.Endpoints(which merges configured endpoints), deduplicates, and strips trailing slashes. All three detectors had their own multi-step version of this pipeline.MakeVerifyRequest(ctx, client, fullURL, token string) (bool, map[string]any, error)Sends a Bearer-authenticated GET and interprets the response:
200→(true, decoded-JSON-body, nil),401→(false, nil, nil), other →(false, nil, error). Previously each detector built the request inline and contained its own copy of the200 / 401 / defaultstatus-code switch. Jira readsbody["displayName"]andbody["emailAddress"]from the returned map; Confluence and Bitbucket discard it.Detector changes
Each detector now imports the shared package and delegates to it:
isStructuralPAT(Jira + Confluence)atlassiandatacenter.IsStructuralPATatlassiandatacenter.GetDCTokenPat(keywords)atlassiandatacenter.FindEndpoints(...)atlassiandatacenter.MakeVerifyRequest(...)The URL regex is also standardised across all three detectors. Confluence and Bitbucket previously used an unbounded
\d+port pattern and allowed hostnames starting with.or-; all three now use[a-zA-Z0-9][a-zA-Z0-9.\-]*with\d{1,5}for the port (matching the stricter Jira original).A
keywordspackage-level variable is introduced in Jira (Confluence and Bitbucket already had one) so the keyword list is defined once and reused by the regex,FindEndpoints, andKeywords().TestIsStructuralPATmigratedThe structural-PAT test previously lived in
jiradatacenterpat_test.goand called the private function. It is now incommon_test.goand covers tokens from all three products.common_test.goalso adds tests forGetDCTokenPat,FindEndpoints, andMakeVerifyRequest.What did not change
Scannertype,Keywords(),Type(),Description(), andFromData()logic.displayName,emailAddress) is unchanged.invalidHostscache anderrNoHostsentinel are unchanged.?limit=1query parameter is unchanged.Note: It appears that Bitbucket DC detector was introduced but never registered in
defaults.go. This PR also registers it. With that in mind, and of course to test the other changes as well, I also ran the usual corpora tests we do for new detectors.Corpora Tests
Checklist:
make test-community)?make lintthis requires golangci-lint)?Note
Medium Risk
Mostly a refactor, but it changes shared URL matching/endpoint extraction and verification plumbing across three detectors, which could impact detection/verification behavior and false positive/negative rates.
Overview
Consolidates duplicated logic across Atlassian Data Center detectors by introducing a new shared
atlassiandatacenterhelper package for PAT/URL regexes, endpoint extraction, structural PAT validation, and a common Bearer-auth verification request helper.Updates the Jira/Confluence/Bitbucket DC detectors to delegate to these shared helpers (including a standardized, slightly stricter URL pattern), removes their inlined structural/HTTP code, and relocates detector imports in
defaults.go/tests (plus registering Bitbucket DC as a no-cloud-endpoint detector). Adds a comprehensivecommon_test.gocovering the new helpers and moves structural-PAT tests out of Jira-specific tests.Reviewed by Cursor Bugbot for commit 7c19046. Bugbot is set up for automated code reviews on this repo. Configure here.