
feat(webportal): Add export to Web Portal function #7994

Open
grantfitzsimmons wants to merge 28 commits into main from issue-7606

Conversation

@grantfitzsimmons
Member

@grantfitzsimmons grantfitzsimmons commented Apr 18, 2026

Fixes #7606

This PR adds support for exporting query results as a Web Portal package (ZIP) for import into the Specify Web Portal. It introduces backend endpoints and logic for generating the export, updates frontend components so users can trigger and download the export, and adds the appropriate permission checks and notifications. It also includes tests and validation improvements for tree-related queries.

At this time, exports produced by this tool require no changes on the Web Portal side.

Button in Query Builder / Notification when complete
(screenshots)

Policy Note
(screenshot) Users need to be granted a specific permission to export to the Web Portal.

Web Portal Export Feature:

  • Added backend support for exporting queries as a Web Portal ZIP package via a new endpoint (exportwebportal/), including a new query_to_web_portal_zip function and integration with the export workflow.
  • Implemented frontend UI to trigger the Web Portal export from the query builder, including permission checks, export button, and download link in notifications.
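As a rough sketch of what the backend assembles, here is the four-entry package layout the Web Portal importer expects. The function name and payload shapes below are illustrative only; the real query_to_web_portal_zip derives these payloads from the query results rather than taking them as arguments.

```python
import json
import zipfile

def build_web_portal_zip(dest, portal_data_csv, flds, instance_settings, solr_schema_xml):
    # Write the four PortalFiles/ entries the Web Portal importer expects.
    # `dest` may be a file path or a writable binary file object.
    with zipfile.ZipFile(dest, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('PortalFiles/PortalData.csv', portal_data_csv)
        zf.writestr('PortalFiles/flds.json', json.dumps(flds, indent=2))
        zf.writestr('PortalFiles/PortalInstanceSetting.json',
                    json.dumps(instance_settings, indent=2))
        zf.writestr('PortalFiles/SolrFldSchema.xml', solr_schema_xml)
```

The entry names match the ZIP contents listed in the testing instructions below.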

Testing and Validation:

  • Added backend tests for the new export endpoint and attachment mapping logic.

Error Reporting
This also adds handling for error reporting if an export (either CSV, KML, or Web Portal) fails.

Screenshot 2026-04-26 at 9 37 03 PM

exampleWebPortalfailure.txt
exampleCSVfailure.txt
exampleKMLfailure.txt

Example

  • Specify Query (screenshot)
  • Portal (screenshot)
  • Attachments (screenshot)
  • Map (screenshot)
  • Uniqueness Check (screenshot)

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add automated tests

Testing instructions

  • Confirm the new Export to Web Portal button appears in the query export controls (when you have permission).
  • Confirm the new Export to Web Portal button does not appear in the query export controls if you do not have permission to that tool (Querybuilder → Query → Export To Web Portal).
  • Start the export and wait for the completion notification.
  • Use the notification or download link to retrieve the generated ZIP package.
  • Open the ZIP and confirm it contains:
    • PortalFiles/PortalData.csv
    • PortalFiles/flds.json
    • PortalFiles/PortalInstanceSetting.json
    • PortalFiles/SolrFldSchema.xml
  • Inspect PortalFiles/flds.json and verify:
    • the spid and img rows are present
    • attachment metadata is emitted correctly (if there are attachments)
    • attachment Title is derived from attachment.title when available
  • Verify existing query export behavior still works for normal CSV exports.
  • Confirm tree-related query validation still passes after the new tree parsing validation changes.
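The ZIP checks above can be scripted. A minimal sketch (check_portal_zip is a hypothetical helper; the expected entry names come from this checklist):

```python
import json
import zipfile

EXPECTED_ENTRIES = {
    'PortalFiles/PortalData.csv',
    'PortalFiles/flds.json',
    'PortalFiles/PortalInstanceSetting.json',
    'PortalFiles/SolrFldSchema.xml',
}

def check_portal_zip(zip_file):
    # Verify the package layout and that flds.json carries the spid/img rows.
    with zipfile.ZipFile(zip_file) as zf:
        missing = EXPECTED_ENTRIES - set(zf.namelist())
        assert not missing, f'missing entries: {missing}'
        flds = json.loads(zf.read('PortalFiles/flds.json'))
        solrnames = {row.get('solrname') for row in flds}
        assert {'spid', 'img'} <= solrnames, 'spid/img rows missing from flds.json'
    return True
```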

DEVELOPERS: To fully verify this works, the export needs to be imported into the Web Portal. To confirm it is an adequate replacement for the Specify 6 exporter, test the same export query in both versions. All KU databases have a mapping set up in them, which can be compared easily.

  • Import the ZIP into a Specify Web Portal instance to confirm the package can be ingested.
  • Verify that the rows in the query are the same count as the rows in the portal
  • Verify that all visible fields (within the query itself) are searchable in the portal when using 'advanced' search
  • Verify that all attachments marked as public (via attachment.ispublic) are visible in the portal
  • Verify that the asset server URL and collection are set correctly (based on the docker environment for the Specify installation)
  • Verify that the map plots all points (as long as longitude1, latitude1, longitude2, or latitude2 are present; keep webportal-installer#64, "If second set of coordinates fields are present, they are required for Geo Coords", in mind), and make sure they are clickable (so a dialog appears)
  • Check that the attachment 'Title' is shown in the dialog header when viewing an attachment
  • Verify that no unexpected console errors are logged in the web portal (no more than before, comparing against a standard Specify 6-exported data set in the portal)
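For the row-count comparison, a small sketch assuming the tab-delimited PortalData.csv layout this PR targets (portal_row_count is a hypothetical helper, not part of the PR):

```python
import csv
import io
import zipfile

def portal_row_count(zip_file):
    # Count data rows in PortalData.csv, excluding the header row.
    # Compare against the row count the Query Builder shows for the same query.
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open('PortalFiles/PortalData.csv') as fh:
            reader = csv.reader(io.TextIOWrapper(fh, encoding='utf-8'), delimiter='\t')
            return sum(1 for _ in reader) - 1
```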

Summary by CodeRabbit

  • New Features

    • "Export to Web Portal" added to Query Builder — permission-controlled ZIP export (data, images, metadata), endpoint and UI button included; duplicate-record-ID detection with a warning dialog.
  • Bug Fixes

    • Export flows for CSV, KML, and Web Portal now emit clear success/failure notifications and offer downloadable error payloads (including tracebacks when available).
  • Tests

    • New tests for web-portal export flow and field-spec parsing.
  • Localization

    • Localized messages for export success/failure and duplicate-ID warnings.

@grantfitzsimmons grantfitzsimmons marked this pull request as ready for review April 22, 2026 15:06
@grantfitzsimmons grantfitzsimmons modified the milestones: 7.12.2, 7.12.1 Apr 22, 2026
@lexiclevenger lexiclevenger requested a review from a team April 22, 2026 19:48
Collaborator

@lexiclevenger lexiclevenger left a comment


Looks good to me! Tested using 'Vanessa Mapping 2' query on sbmnhiz_2025_12_12. The files appear as expected, and the existing query and csv export features don't seem to have been impacted.

Triggered by 22e4cc0 on branch refs/heads/issue-7606
@grantfitzsimmons
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a Web Portal export feature: backend ZIP generator and export wiring, new POST view and URL, frontend UI/permission/localization updates, notification renderers, adjustments to QueryFieldSpec/tree utils, and tests for field-spec parsing and the export view.

Changes

  • Web Portal export core (specifyweb/backend/stored_queries/execution.py, specifyweb/backend/stored_queries/web_portal_export.py): Introduces exporttype == 'webportal' in do_export, wraps CSV/KML/Web Portal exports in try/except to emit query-export-to-{exporttype}-failed (includes error and traceback) on exceptions, and adds query_to_web_portal_zip(...) to generate the ZIP (writes PortalData.csv, flds.json, PortalInstanceSetting.json, SolrFldSchema.xml).
  • API & routing (specifyweb/backend/stored_queries/views.py, specifyweb/backend/stored_queries/urls.py): Adds the export_to_web_portal POST view (enforces execute and the new export_to_web_portal permission, parses JSON, resolves the collection, generates a filename, starts a daemon thread to call do_export) and registers the exportwebportal/ route.
  • QueryFieldSpec parsing (specifyweb/backend/stored_queries/queryfieldspec.py): Restricts legacy relation/tree-rank parsing behind is_tree_table; when field is None, preserves legacy sentinel semantics for formatted relation stringids on non-tree tables by setting tree_rank_name and appending a TreeRankQuery where appropriate.
  • Tree utils validation (specifyweb/backend/trees/utils.py): Adds runtime validation in get_search_filters and get_treedefs to raise ValueError for unknown tree names.
  • Backend tests (specifyweb/backend/stored_queries/tests/test_execution/test_field_specs_from_json.py, specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py): Adds tests covering QueryFieldSpec.from_stringid behaviors (tree-rank sentinel vs none) and tests for the export_to_web_portal view and _portal_attachment_map behavior (auth, daemon thread spawn, query expectations).
  • Frontend export UI (specifyweb/frontend/js_src/lib/components/QueryBuilder/Export.tsx): Adds duplicate-record-ID detection and duplicateWarning dialog state; adds a permission-gated "export to web portal" button that posts to /stored_query/exportwebportal/.
  • Notifications & renderers (specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx, specifyweb/frontend/js_src/lib/localization/notifications.ts): Adds notification renderers and localization strings for CSV/KML/Web Portal completion and failure; failed renderers offer plain-text error downloads, and web-portal-complete offers a /static/depository/<file> link.
  • Frontend permissions & localization (specifyweb/frontend/js_src/lib/components/Permissions/definitions.ts, specifyweb/frontend/js_src/lib/localization/query.ts): Adds export_to_web_portal to frontend operation policies and new localized strings for duplicate-ID warnings.

Sequence Diagram(s)

sequenceDiagram
    actor Frontend as Frontend (UI)
    participant BackendView as Backend View
    participant Executor as Export Thread
    participant ExportModule as Web Portal Export Module
    participant Notifier as Notification System

    Frontend->>BackendView: POST /stored_query/exportwebportal/ (query JSON)
    BackendView->>BackendView: Verify execute & export_to_web_portal permissions
    BackendView->>BackendView: Parse JSON, resolve collection, build filename
    BackendView->>Executor: Start daemon thread -> do_export(exporttype='webportal', ...)
    BackendView-->>Frontend: 200 OK
    Executor->>ExportModule: query_to_web_portal_zip(...)
    ExportModule->>ExportModule: Build query, map attachments, compute flds.json
    ExportModule->>ExportModule: Generate PortalData.csv, SolrFldSchema.xml, PortalInstanceSetting.json
    ExportModule->>ExportModule: Write ZIP to depository path
    ExportModule->>Notifier: Emit 'query-export-to-webportal-complete' or 'query-export-to-webportal-failed' (payload includes file or error+traceback)
    Notifier->>Frontend: Deliver notification (download link or error payload)
🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

  • Automatic Tests (⚠️ Warning): The PR includes tests for the view endpoint and helper functions, but lacks unit tests for the core query_to_web_portal_zip() function, inconsistent with CSV and KML export patterns. Resolution: add a test_query_to_web_portal_zip.py file with direct unit tests for query_to_web_portal_zip(), matching the pattern of existing CSV and KML test files.
  • Testing Instructions (⚠️ Warning): The PR lacks comprehensive testing instructions and automated validation of ZIP file contents, structure, and metadata formatting. Resolution: add integration tests validating ZIP contents, CSV format, and JSON/XML structure; create a TESTING.md with manual verification steps; populate the PR template's testing section.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): Title clearly summarizes the main change: adding Web Portal export functionality to the query system.
  • Linked Issues check (✅ Passed): All primary objectives from issue #7606 are met: ZIP structure (PortalFiles with flds.json, PortalData.csv, PortalInstanceSetting.json, SolrFldSchema.xml), field metadata generation, CSV export with tab delimiter, Solr type mapping, notification with download link, UI integration, and permission controls.
  • Out of Scope Changes check (✅ Passed): All changes are directly related to the Web Portal export feature: backend export logic, field spec parsing improvements, URL routing, permissions, views, tests, frontend UI components, and localization. No unrelated modifications detected.
  • Description check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.





@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py (1)

11-25: Assert the worker wiring, not just that a thread was created.

This still passes if the view starts the wrong target or passes the wrong export args. Please assert the Thread call includes target=do_export and the expected 'webportal' payload so this endpoint’s contract is actually covered.
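A sketch of the strengthened assertions, using stand-ins for the view body and do_export rather than the real Specify modules (the args tuple shape is illustrative, not the actual view's call signature):

```python
from unittest import mock

def do_export(*args, **kwargs):
    """Stand-in for the real exporter the view hands off to."""

def start_export_thread(thread_cls):
    """Stand-in for the view body: spawn a daemon worker for the export."""
    t = thread_cls(target=do_export, args=('webportal', {'fields': []}))
    t.daemon = True
    t.start()

thread = mock.Mock()
start_export_thread(thread)
# Assert the wiring, not just that a thread was created:
thread.assert_called_once_with(target=do_export, args=('webportal', {'fields': []}))
assert thread.return_value.daemon is True
thread.return_value.start.assert_called_once()
```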

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py`
around lines 11 - 25, The test currently only asserts that Thread was created
and started; update test_export to assert the Thread call includes the correct
target and payload: verify Thread was called with target=do_export (the function
used to perform exports) and that the args or kwargs passed include the export
type 'webportal' (e.g., args containing 'webportal' or kwargs['payload']
containing 'webportal'), while keeping the existing assertions that
thread.return_value.daemon is True and start() was called; use the patched
Thread mock to inspect the call arguments (e.g.,
thread.assert_called_once_with(...) or thread.call_args) to enforce the endpoint
wiring to do_export with the expected 'webportal' argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 280-292: Wrap the call to query_to_web_portal_zip(...) in a
try/except inside the branch where exporttype == 'webportal'; on exception catch
the error, log/format the error details (e.g., str(e) and traceback) and emit a
notification/message with type 'query-export-to-web-portal-failed' including the
error details (instead of letting the daemon thread die), otherwise set
message_type to 'query-export-to-web-portal-complete' as before; update the same
scope where message_type is currently assigned so downstream notification code
uses the failure message when an exception occurs.

In `@specifyweb/backend/stored_queries/queryfieldspec.py`:
- Around line 263-275: The code appends a TreeRankQuery sentinel for legacy
relation strings which makes make_stringid treat it as a real tree rank; instead
create and append a distinct sentinel (e.g.,
LegacyRelationSentinelQuery.create(extracted_fieldname, node.name) or a similar
new class/marker) in the branch that detects legacy relation stringids (the
block that currently calls TreeRankQuery.create and appends to join_path), and
update the serialization/round-trip logic (where TreeRankQuery is inspected
during make_stringid/serialization) to ignore this new sentinel type as a tree
rank so it does not get serialized as a real tree field. Ensure references:
TreeRankQuery.create, join_path append, extracted_fieldname,
is_tree_table(node), and the make_stringid/serialization checks are updated to
treat the new sentinel distinctly.

In `@specifyweb/backend/stored_queries/views.py`:
- Around line 212-227: The current code calls check_permission_targets against
request.specify_collection before parsing the request body, letting a caller
supply spquery['collectionid'] to switch targets; fix by parsing the JSON
(spquery) first, determine the intended collection (use spquery['collectionid']
if present else request.specify_collection.id), then re-run
check_permission_targets against that target collection id and the same user for
the permissions QueryBuilderPt.execute and QueryBuilderPt.export_to_web_portal
before fetching Collection or proceeding; update references to spquery and
collection accordingly (functions/methods: check_permission_targets,
QueryBuilderPt.execute, QueryBuilderPt.export_to_web_portal, spquery,
collection, request.specify_collection).

In `@specifyweb/backend/stored_queries/web_portal_export.py`:
- Around line 528-537: The metadata entry for 'spid' in metadata_rows currently
declares 'solrtype': 'int' but the exported spid values are UUID strings; update
the metadata to use a string-compatible Solr type (e.g. change 'solrtype' from
'int' to 'string' or other string/uuid type used by your Solr schema) for the
'spid' dict in metadata_rows and make the same change for the other 'spid'
occurrence (the second dict instance around the later block) so the metadata
matches flds.json and the actual exported data.
- Around line 268-295: _simplify_portal_field_metadata currently omits computed
mapping fields (e.g., concepturl and spdescription) causing lossy output; update
the function (in _simplify_portal_field_metadata) to include those keys when
building simplified (either by adding 'concepturl' and 'spdescription' to the
for-key tuple or by copying any existing mapping-related keys from field_meta
into simplified), so that the simplified dict preserves all computed mapping
metadata produced earlier by the exporter.
- Around line 550-579: The current code materializes the whole result set into
data_rows and builds output_rows and portal_data in memory; instead stream rows
into the ZIP/CSV writer to avoid O(N) memory. Modify the export logic that uses
data_rows and output_rows: iterate query.yield_per(...) (or the incoming list as
an iterator) and write each computed row immediately after computing spid,
cleaned_values (via _clean_cell), contents, img (obtain portal attachments in
batches or change _portal_attachment_map to accept an iterator/batched ids), and
geoc (via _build_geoc_value), rather than appending to output_rows or calling
_serialize_portal_data to assemble a giant string; keep header =
['spid','contents','img','geoc', *[column_def[1] for column_def in column_defs]]
and write it once before streaming rows. Ensure portal_attachments lookup is
available for streamed ids (batch _portal_attachment_map calls or lazy fetch)
and remove the final portal_data buffering step so the ZIP is written
incrementally.
- Around line 350-363: The export currently writes comma-delimited rows with
solrname headers; change the CSV writer in _serialize_portal_data to use a tab
delimiter (csv.writer(..., delimiter='\t')) and ensure the header row contains
the user-facing column names from your field metadata (flds.json colname)
instead of solr field names—map the incoming header list to the corresponding
colname values before calling writer.writerow. Also update the other CSV-writing
spot in this module that builds PortalData.csv (the other csv.writer usage) to
use the same tab delimiter and colname-mapped header so the portal receives
tab-delimited rows with flds.json.colname headers.

In `@specifyweb/frontend/js_src/lib/localization/query.ts`:
- Around line 366-379: The four new localization keys exportToWebPortal,
webPortalExportDuplicateRecordIds, webPortalExportDuplicateRecordIdsDescription,
and webPortalExportDuplicateHint only define 'en-us' and must be populated for
the other supported locales in the same dictionary; update each key to include
the same locale branches used by nearby entries (copy patterns from other keys
in this file), adding proper translations or an explicit fallback (e.g.,
duplicate the English string) for each locale so the new button and dialog are
localized across all supported languages.
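The try/except failure-notification pattern described in the execution.py comments above can be sketched as follows; export_fn and notify are illustrative stand-ins, not the real Specify API.

```python
import traceback

def run_export(exporttype, export_fn, notify):
    # Guarded dispatch: any exception during an export emits a
    # '...-failed' notification instead of silently killing the worker thread.
    try:
        filename = export_fn()
    except Exception as e:
        notify(f'query-export-to-{exporttype}-failed',
               {'error': str(e), 'traceback': traceback.format_exc()})
        return None
    notify(f'query-export-to-{exporttype}-complete', {'file': filename})
    return filename
```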


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6b54294a-b37c-4377-9016-55676a40078a

📥 Commits

Reviewing files that changed from the base of the PR and between 64f3cc2 and 540a55d.

📒 Files selected for processing (13)
  • specifyweb/backend/stored_queries/execution.py
  • specifyweb/backend/stored_queries/queryfieldspec.py
  • specifyweb/backend/stored_queries/tests/test_execution/test_field_specs_from_json.py
  • specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py
  • specifyweb/backend/stored_queries/urls.py
  • specifyweb/backend/stored_queries/views.py
  • specifyweb/backend/stored_queries/web_portal_export.py
  • specifyweb/backend/trees/utils.py
  • specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx
  • specifyweb/frontend/js_src/lib/components/Permissions/definitions.ts
  • specifyweb/frontend/js_src/lib/components/QueryBuilder/Export.tsx
  • specifyweb/frontend/js_src/lib/localization/notifications.ts
  • specifyweb/frontend/js_src/lib/localization/query.ts

@github-project-automation github-project-automation Bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Apr 27, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 273-295: The current export dispatch (checking exporttype and
calling query_to_csv, query_to_kml, or query_to_web_portal_zip) silently falls
through for unknown exporttype values and still emits a success message; update
the dispatch in stored_queries/execution.py to add an explicit else branch that
handles unsupported export types by setting an error message_type (or raising an
exception) and including a clear error description referencing exporttype, so no
success notification is sent for invalid types; reference the export dispatch
that calls query_to_csv, query_to_kml, and query_to_web_portal_zip and modify it
to fail fast for unexpected exporttype values.
- Around line 270-307: The fields_from_json call is currently outside the try
block so parse errors won't trigger the failure notification; move the
fields_from_json(spquery['fields']) call into the try block (above the
exporttype conditionals) so any exception raised during parsing is caught, then
use the resulting field_specs variable in the existing export calls
(query_to_csv, query_to_kml, query_to_web_portal_zip) and let the existing
except block create the Message via Message.objects.create with the
error_details and message_type as implemented.

In
`@specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx`:
- Around line 118-121: Add a runtime type-guard helper in
NotificationRenderers.tsx (e.g., parseExportErrorPayload or isErrorPayload) that
accepts notification.payload.error, checks that it's an object with
string-valued "error" and "traceback" properties, and returns normalized strings
(falling back to a safe JSON.stringify or a default message if validation
fails). Replace the existing unsafe casts in the 'query-export-to-csv-failed'
renderer (and the other two failed-export renderers referenced) to call this
helper and use the returned safe error and traceback strings when building the
downloadable error file content.


📥 Commits

Reviewing files that changed from the base of the PR and between 540a55d and 4c29e59.

📒 Files selected for processing (3)
  • specifyweb/backend/stored_queries/execution.py
  • specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx
  • specifyweb/frontend/js_src/lib/localization/notifications.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • specifyweb/frontend/js_src/lib/localization/notifications.ts

Makes sure any exceptions raised while parsing field specs are caught by the export's exception handling

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 300-303: The code currently always includes full traceback via
traceback.format_exc() into error_details and ultimately into Message.content;
update the block that builds error_details (where error_details = {'error':
str(e), 'traceback': traceback.format_exc()}) so the 'traceback' key is only
added when settings.DEBUG is True (mirror the other background-task notification
pattern), and ensure any Message.content serialization uses that guarded
error_details; apply the same guard for the subsequent similar block around
lines 305-309 that serializes error_details into Message.content.
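A minimal sketch of the guarded payload construction this comment asks for; `debug` stands in for settings.DEBUG, and build_error_details is a hypothetical helper:

```python
import traceback

def build_error_details(exc, debug):
    # Only attach the traceback when DEBUG is on; in production the full
    # stack trace should go to server logs (logger.exception) instead of
    # the user-facing notification payload.
    details = {'error': str(exc)}
    if debug:
        details['traceback'] = traceback.format_exc()
    return details
```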


📥 Commits

Reviewing files that changed from the base of the PR and between 4c29e59 and 7141458.

📒 Files selected for processing (1)
  • specifyweb/backend/stored_queries/execution.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 299-309: The current except block only stores str(e) and a null
traceback when settings.DEBUG is False, losing actionable stack traces; update
the except block around Message.objects.create to log the exception (e.g., call
logger.exception or logging.exception with context) so the full traceback is
emitted to server logs in production, and change the payload construction so the
'traceback' key is included only when traceback.format_exc() is non-empty (e.g.,
build error_details conditionally using tb = traceback.format_exc() and add
'traceback': tb only if tb). Ensure you reference the same variables (e,
error_details, message_type, filename) and keep Message.objects.create(...)
semantics unchanged.


📥 Commits

Reviewing files that changed from the base of the PR and between 7141458 and 14d4711.

📒 Files selected for processing (1)
  • specifyweb/backend/stored_queries/execution.py


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
specifyweb/backend/stored_queries/execution.py (1)

299-313: ⚠️ Potential issue | 🟠 Major

Don’t ship raw tracebacks in user-facing export errors.

This payload is rendered as a downloadable error file, so serializing traceback.format_exc() here exposes server internals to any user who can run exports. logger.exception(...) already keeps the full stack trace in server logs, so the notification should stay sanitized outside debug mode.

🔧 Suggested change
-            tb = traceback.format_exc()
-            error_details = {'error': str(e)}
-            if tb:
-                error_details['traceback'] = tb
+            error_details = {'error': str(e)}
+            if settings.DEBUG:
+                error_details['traceback'] = traceback.format_exc()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@specifyweb/backend/stored_queries/execution.py` around lines 299 - 313,
Remove the raw traceback from the user-facing error payload: stop capturing
traceback.format_exc() into tb and do not add it to error_details when building
the Message (keep logger.exception(...) as-is for server logs). In the except
block around logger.exception(...) and Message.objects.create(...), only include
a sanitized error value (e.g. error_details = {'error': str(e)} or a generic
message like 'Internal server error') and omit any 'traceback' key so no server
internals are serialized to the downloadable error file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f64f1b09-a0ff-45e9-9430-2a6a199fd529

📥 Commits

Reviewing files that changed from the base of the PR and between 14d4711 and 5c6459f.

📒 Files selected for processing (1)
  • specifyweb/backend/stored_queries/execution.py

This was a privilege escalation vulnerability. The `get_query` helper (used by `ephemeral` and `batch_edit`) already correctly parses JSON first, resolves the collection, then checks permissions. The three export functions were inconsistent and vulnerable.
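To make the safe ordering concrete, here is a minimal, self-contained sketch of the parse-then-resolve-then-check pattern that `get_query` follows. The stubs (`resolve_collection`, the target list) are illustrative, not this PR's exact helpers:

```python
import json

calls = []

def resolve_collection(spquery):
    # Stub: derive the collection from the parsed payload itself.
    calls.append('resolve')
    return spquery.get('collectionid', 4)

def check_permission_targets(collection_id, user_id, targets):
    # Stub standing in for the real permission gate.
    calls.append(('check', collection_id))

def parse_export_request(body, user_id):
    spquery = json.loads(body)                    # 1. parse the untrusted JSON first
    collection_id = resolve_collection(spquery)   # 2. resolve the query's collection
    check_permission_targets(                     # 3. only then check permissions
        collection_id, user_id, ['execute', 'export_csv'])
    return spquery, collection_id

spquery, cid = parse_export_request('{"collectionid": 7}', user_id=1)
```

The point is purely the ordering: permissions are checked against the collection the untrusted payload actually resolves to, never against a collection chosen before parsing.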
Contributor

@melton-jason melton-jason left a comment


Haven't completed a review yet, but here are my thoughts on the few files I could get to in an hour.
Nice job on the branch so far! 👏

Comment on lines +299 to +314
except Exception as e:
    logger.exception(
        "Export failed for %s: collection %s, file %s, type %s",
        user, collection, filename, exporttype,
    )
    tb = traceback.format_exc()
    error_details = {'error': str(e)}
    if tb:
        error_details['traceback'] = tb
    message_type = f'query-export-to-{exporttype}-failed'
    Message.objects.create(user=user, content=json.dumps({
        'type': message_type,
        'file': filename,
        'error': error_details,
    }))
    return
Contributor


(optional)

Catching all exceptions here can generally be pretty dangerous while wrapped in a session_context.
Note that our SQLAlchemy session generator is set up to rollback changes made during the session on exceptions:

def make_session_context(session_maker):
    @contextmanager
    def _session_context():
        session, connection = session_maker()
        try:
            if connection is None:
                connection = session.connection()
            session.info['connection'] = connection
            yield session
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()
    return _session_context

session_context = make_session_context(lambda: (Session(), None))

If there are any side effects in query_to_csv, query_to_kml, or query_to_web_portal_zip, they will be implicitly committed.
I don't think there are any side effects right now, so this may be fine at the moment; in the future, though, it could cause problems.
Another use case for raising the exception and allowing the session to rollback is to maintain a consistent transaction state within the SQLAlchemy session.
Consider the case where an internal SQLAlchemy flush fails for whatever reason. According to https://docs.sqlalchemy.org/en/14/orm/session_basics.html#rolling-back, continued use of the session could result in an exception (https://docs.sqlalchemy.org/en/14/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar).

I'm pretty sure these cases are all handled at the moment, but the code is sensitive to a lot of (seemingly innocuous) changes that could cause issues down the line.

Comment on lines +167 to +168
check_permission_targets(collection.id, request.specify_user.id, [
    QueryBuilderPt.execute, QueryBuilderPt.export_csv])
Contributor

@melton-jason melton-jason Apr 27, 2026


Conceptually, aren't we exporting the CSV in the collection the user is signed into, and running the query in the Collection specified by the query?

Say the user was signed into Collection A and exported a Query that was run in Collection B.
It's clear we should at least check that the user has the correct Query permissions in Collection B, but which collection should we check Export CSV permissions against?

This is definitely a question regarding the scope of the permission. I can see arguments for all three sides:

  • The query is being executed in Collection B, regardless of what collection the user is signed in to, so we only need to check Export CSV permission in Collection B
  • The action was initiated and is being performed in Collection A by the user, so we should check whether they have Export permissions in Collection A
  • Both of the above apply, so we should check Export permissions in both collections

From a strictly data-origin standpoint, I think it probably makes the most sense to keep this as-is: that is, scope the permission check to the Collection the query is scoped to.
Do we think that approach would be the most intuitive interpretation of the permission? (this is not a rhetorical question, down to hear what other people think! 😅)
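For illustration, the third option (check both collections) could be sketched as below. The split between `query_collection_id` and `login_collection_id` is hypothetical, and the permission checker is passed in rather than imported:

```python
def check_export_permissions(query_collection_id, login_collection_id,
                             user_id, targets, check_permission_targets):
    # Always gate against the collection the query is scoped to...
    check_permission_targets(query_collection_id, user_id, targets)
    # ...and, when it differs, also against the collection the user initiated
    # the export from, so both interpretations of the permission are satisfied.
    if login_collection_id != query_collection_id:
        check_permission_targets(login_collection_id, user_id, targets)
```

A user signed into Collection A exporting a query scoped to Collection B would then need export permission in both; when the two collections coincide, only one check runs.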

Comment on lines +234 to +239
thread = Thread(
    target=do_export,
    args=(spquery, collection, request.specify_user, file_name, 'webportal', None),
)
thread.daemon = True
thread.start()
Contributor


(optional, performance)

I know we've been using threads for Query exports (CSV, KML, and now WebPortal).
These queries can use a significant amount of simultaneous memory resources on the server. As noted in #7858 and its sub-issues (particularly #7875 in this case), this high allocation of resources can persist in the main Specify process until memory is released (which oftentimes will be when the process is stopped or killed).

Ideally, these potentially very memory-intensive operations should be performed in a separate process. That way, as soon as the operation is completed the allocated memory will be returned to the operating system and reused (there is some memory overhead when spawning a new process, which for Specify will largely be ~82 MiB for its setup and global structures as of v7.11.4).
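A hedged sketch of what that swap might look like. Note that every argument to a Process target must be picklable, and Django model instances (spquery, collection, user) are not, so the usual workaround is to pass primary keys and re-fetch inside the child (not shown here):

```python
from multiprocessing import Process

def run_export_in_process(do_export, spquery_id, collection_id, user_id,
                          file_name, exporttype):
    # Same call shape as the Thread version above, but the child process's
    # memory is returned to the OS as soon as the export finishes.
    proc = Process(
        target=do_export,
        args=(spquery_id, collection_id, user_id, file_name, exporttype, None),
        daemon=True,
    )
    proc.start()
    return proc
```

The caller would then poll or join the returned process instead of relying on a daemon thread inside the main Specify process.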

Member Author


@melton-jason Maybe I can open a new issue for this. What is your suggestion for implementation?
