feat(webportal): Add export to Web Portal function #7994

grantfitzsimmons wants to merge 28 commits into `main`
Conversation
also adds support for attachments and removes the `web_asset_store.xml` component of the asset server URL
adds some small comments here as well
lexiclevenger left a comment:
- Confirm the new Export to Web Portal button appears in the query export controls (when you have permission).
- Confirm the new Export to Web Portal button does not appear in the query export controls if you do not have permission to that tool (QueryBuilder → Query → Export To Web Portal).
- Start the export and wait for the completion notification.
- Use the notification or download link to retrieve the generated ZIP package.
- Open the ZIP and confirm it contains:
  - `PortalFiles/PortalData.csv`
  - `PortalFiles/flds.json`
  - `PortalFiles/PortalInstanceSetting.json`
  - `PortalFiles/SolrFldSchema.xml`
- Inspect `PortalFiles/flds.json` and verify:
  - the `spid` and `img` rows are present
  - attachment metadata is emitted correctly (if there are attachments)
  - attachment `Title` is derived from `attachment.title` when available
- Verify existing query export behavior still works for normal CSV exports.
- Confirm tree-related query validation still passes after the new tree parsing validation changes.
Looks good to me! Tested using the 'Vanessa Mapping 2' query on sbmnhiz_2025_12_12. The files appear as expected, and the existing query and CSV export features don't seem to have been impacted.
Triggered by 22e4cc0 on branch refs/heads/issue-7606
@coderabbitai review
✅ Actions performed: review triggered.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your settings. Use the following commands to manage reviews, or use the checkboxes below for quick actions:
📝 Walkthrough
Adds a Web Portal export feature: backend ZIP generator and export wiring, a new POST view and URL, frontend UI/permission/localization updates, notification renderers, adjustments to QueryFieldSpec/tree utils, and tests for field-spec parsing and the export view.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Frontend as Frontend (UI)
    participant BackendView as Backend View
    participant Executor as Export Thread
    participant ExportModule as Web Portal Export Module
    participant Notifier as Notification System
    Frontend->>BackendView: POST /stored_query/exportwebportal/ (query JSON)
    BackendView->>BackendView: Verify execute & export_to_web_portal permissions
    BackendView->>BackendView: Parse JSON, resolve collection, build filename
    BackendView->>Executor: Start daemon thread -> do_export(exporttype='webportal', ...)
    BackendView-->>Frontend: 200 OK
    Executor->>ExportModule: query_to_web_portal_zip(...)
    ExportModule->>ExportModule: Build query, map attachments, compute flds.json
    ExportModule->>ExportModule: Generate PortalData.csv, SolrFldSchema.xml, PortalInstanceSetting.json
    ExportModule->>ExportModule: Write ZIP to depository path
    ExportModule->>Notifier: Emit 'query-export-to-webportal-complete' or 'query-export-to-webportal-failed' (payload includes file or error+traceback)
    Notifier->>Frontend: Deliver notification (download link or error payload)
```
🚥 Pre-merge checks: ✅ 4 passed, ❌ 2 failed (2 warnings).
Actionable comments posted: 8
🧹 Nitpick comments (1)
specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py (1)
Lines 11-25: Assert the worker wiring, not just that a thread was created. This still passes if the view starts the wrong target or passes the wrong export args. Please assert the `Thread` call includes `target=do_export` and the expected `'webportal'` payload so this endpoint's contract is actually covered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py` around lines 11 - 25, The test currently only asserts that Thread was created and started; update test_export to assert the Thread call includes the correct target and payload: verify Thread was called with target=do_export (the function used to perform exports) and that the args or kwargs passed include the export type 'webportal' (e.g., args containing 'webportal' or kwargs['payload'] containing 'webportal'), while keeping the existing assertions that thread.return_value.daemon is True and start() was called; use the patched Thread mock to inspect the call arguments (e.g., thread.assert_called_once_with(...) or thread.call_args) to enforce the endpoint wiring to do_export with the expected 'webportal' argument.
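As a minimal, self-contained sketch of the strengthened assertions (the names `do_export` and `start_export_view` are hypothetical stand-ins, not the real module layout):

```python
from unittest.mock import patch

def do_export(*args, **kwargs):
    """Stand-in for the real export worker; the real signature differs."""

def start_export_view():
    """Stand-in for the view handler under test."""
    from threading import Thread
    thread = Thread(target=do_export, args=('webportal',))
    thread.daemon = True
    thread.start()

with patch('threading.Thread') as thread_mock:
    start_export_view()
    # Assert the wiring, not merely that some thread object was started:
    thread_mock.assert_called_once()
    assert thread_mock.call_args.kwargs['target'] is do_export
    assert 'webportal' in thread_mock.call_args.kwargs['args']
    assert thread_mock.return_value.daemon is True
    thread_mock.return_value.start.assert_called_once()
```

Inspecting `thread_mock.call_args` this way catches a view that starts the wrong target or passes the wrong export type, which a bare "a thread was created" assertion would not.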
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 280-292: Wrap the call to query_to_web_portal_zip(...) in a
try/except inside the branch where exporttype == 'webportal'; on exception catch
the error, log/format the error details (e.g., str(e) and traceback) and emit a
notification/message with type 'query-export-to-web-portal-failed' including the
error details (instead of letting the daemon thread die), otherwise set
message_type to 'query-export-to-web-portal-complete' as before; update the same
scope where message_type is currently assigned so downstream notification code
uses the failure message when an exception occurs.
In `@specifyweb/backend/stored_queries/queryfieldspec.py`:
- Around line 263-275: The code appends a TreeRankQuery sentinel for legacy
relation strings which makes make_stringid treat it as a real tree rank; instead
create and append a distinct sentinel (e.g.,
LegacyRelationSentinelQuery.create(extracted_fieldname, node.name) or a similar
new class/marker) in the branch that detects legacy relation stringids (the
block that currently calls TreeRankQuery.create and appends to join_path), and
update the serialization/round-trip logic (where TreeRankQuery is inspected
during make_stringid/serialization) to ignore this new sentinel type as a tree
rank so it does not get serialized as a real tree field. Ensure references:
TreeRankQuery.create, join_path append, extracted_fieldname,
is_tree_table(node), and the make_stringid/serialization checks are updated to
treat the new sentinel distinctly.
In `@specifyweb/backend/stored_queries/views.py`:
- Around line 212-227: The current code calls check_permission_targets against
request.specify_collection before parsing the request body, letting a caller
supply spquery['collectionid'] to switch targets; fix by parsing the JSON
(spquery) first, determine the intended collection (use spquery['collectionid']
if present else request.specify_collection.id), then re-run
check_permission_targets against that target collection id and the same user for
the permissions QueryBuilderPt.execute and QueryBuilderPt.export_to_web_portal
before fetching Collection or proceeding; update references to spquery and
collection accordingly (functions/methods: check_permission_targets,
QueryBuilderPt.execute, QueryBuilderPt.export_to_web_portal, spquery,
collection, request.specify_collection).
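A minimal sketch of the reordered flow, with `check_permission_targets`, `QueryBuilderPt`, and the `request` attributes passed in as stand-ins (the real view is not dependency-injected like this):

```python
import json

def export_web_portal(request, check_permission_targets, QueryBuilderPt, Collection):
    spquery = json.loads(request.body)                     # 1. parse the body first
    collection_id = spquery.get(
        'collectionid', request.specify_collection.id)     # 2. resolve the target
    check_permission_targets(                              # 3. check against it
        collection_id, request.specify_user.id,
        [QueryBuilderPt.execute, QueryBuilderPt.export_to_web_portal])
    collection = Collection.objects.get(id=collection_id)  # 4. only now fetch
    return spquery, collection
```

The point of the ordering is that the permission check always runs against the collection the query will actually execute in, so a caller cannot escalate by supplying a different `collectionid` than the one checked.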
In `@specifyweb/backend/stored_queries/web_portal_export.py`:
- Around line 528-537: The metadata entry for 'spid' in metadata_rows currently
declares 'solrtype': 'int' but the exported spid values are UUID strings; update
the metadata to use a string-compatible Solr type (e.g. change 'solrtype' from
'int' to 'string' or other string/uuid type used by your Solr schema) for the
'spid' dict in metadata_rows and make the same change for the other 'spid'
occurrence (the second dict instance around the later block) so the metadata
matches flds.json and the actual exported data.
- Around line 268-295: _simplify_portal_field_metadata currently omits computed
mapping fields (e.g., concepturl and spdescription) causing lossy output; update
the function (in _simplify_portal_field_metadata) to include those keys when
building simplified (either by adding 'concepturl' and 'spdescription' to the
for-key tuple or by copying any existing mapping-related keys from field_meta
into simplified), so that the simplified dict preserves all computed mapping
metadata produced earlier by the exporter.
- Around line 550-579: The current code materializes the whole result set into
data_rows and builds output_rows and portal_data in memory; instead stream rows
into the ZIP/CSV writer to avoid O(N) memory. Modify the export logic that uses
data_rows and output_rows: iterate query.yield_per(...) (or the incoming list as
an iterator) and write each computed row immediately after computing spid,
cleaned_values (via _clean_cell), contents, img (obtain portal attachments in
batches or change _portal_attachment_map to accept an iterator/batched ids), and
geoc (via _build_geoc_value), rather than appending to output_rows or calling
_serialize_portal_data to assemble a giant string; keep header =
['spid','contents','img','geoc', *[column_def[1] for column_def in column_defs]]
and write it once before streaming rows. Ensure portal_attachments lookup is
available for streamed ids (batch _portal_attachment_map calls or lazy fetch)
and remove the final portal_data buffering step so the ZIP is written
incrementally.
- Around line 350-363: The export currently writes comma-delimited rows with
solrname headers; change the CSV writer in _serialize_portal_data to use a tab
delimiter (csv.writer(..., delimiter='\t')) and ensure the header row contains
the user-facing column names from your field metadata (flds.json colname)
instead of solr field names—map the incoming header list to the corresponding
colname values before calling writer.writerow. Also update the other CSV-writing
spot in this module that builds PortalData.csv (the other csv.writer usage) to
use the same tab delimiter and colname-mapped header so the portal receives
tab-delimited rows with flds.json.colname headers.
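The tab-delimited serialization described in the last point can be sketched like this; the `'solrname'`/`'colname'` keys are assumptions about the flds.json shape, not the module's real metadata format:

```python
import csv
import io

def serialize_portal_data(field_meta, rows):
    # Header uses the user-facing colname values, not the solr field names.
    header = [field['colname'] for field in field_meta]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t')  # portal expects tab-delimited rows
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Routing both CSV-writing spots through one helper like this also keeps the delimiter and header mapping from drifting apart again.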
In `@specifyweb/frontend/js_src/lib/localization/query.ts`:
- Around line 366-379: The four new localization keys exportToWebPortal,
webPortalExportDuplicateRecordIds, webPortalExportDuplicateRecordIdsDescription,
and webPortalExportDuplicateHint only define 'en-us' and must be populated for
the other supported locales in the same dictionary; update each key to include
the same locale branches used by nearby entries (copy patterns from other keys
in this file), adding proper translations or an explicit fallback (e.g.,
duplicate the English string) for each locale so the new button and dialog are
localized across all supported languages.
---
Nitpick comments:
In `@specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py`:
- Around line 11-25: The test currently only asserts that Thread was created and
started; update test_export to assert the Thread call includes the correct
target and payload: verify Thread was called with target=do_export (the function
used to perform exports) and that the args or kwargs passed include the export
type 'webportal' (e.g., args containing 'webportal' or kwargs['payload']
containing 'webportal'), while keeping the existing assertions that
thread.return_value.daemon is True and start() was called; use the patched
Thread mock to inspect the call arguments (e.g.,
thread.assert_called_once_with(...) or thread.call_args) to enforce the endpoint
wiring to do_export with the expected 'webportal' argument.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 6b54294a-b37c-4377-9016-55676a40078a
📒 Files selected for processing (13)
- specifyweb/backend/stored_queries/execution.py
- specifyweb/backend/stored_queries/queryfieldspec.py
- specifyweb/backend/stored_queries/tests/test_execution/test_field_specs_from_json.py
- specifyweb/backend/stored_queries/tests/test_views/test_export_web_portal.py
- specifyweb/backend/stored_queries/urls.py
- specifyweb/backend/stored_queries/views.py
- specifyweb/backend/stored_queries/web_portal_export.py
- specifyweb/backend/trees/utils.py
- specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx
- specifyweb/frontend/js_src/lib/components/Permissions/definitions.ts
- specifyweb/frontend/js_src/lib/components/QueryBuilder/Export.tsx
- specifyweb/frontend/js_src/lib/localization/notifications.ts
- specifyweb/frontend/js_src/lib/localization/query.ts
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 273-295: The current export dispatch (checking exporttype and
calling query_to_csv, query_to_kml, or query_to_web_portal_zip) silently falls
through for unknown exporttype values and still emits a success message; update
the dispatch in stored_queries/execution.py to add an explicit else branch that
handles unsupported export types by setting an error message_type (or raising an
exception) and including a clear error description referencing exporttype, so no
success notification is sent for invalid types; reference the export dispatch
that calls query_to_csv, query_to_kml, and query_to_web_portal_zip and modify it
to fail fast for unexpected exporttype values.
- Around line 270-307: The fields_from_json call is currently outside the try
block so parse errors won't trigger the failure notification; move the
fields_from_json(spquery['fields']) call into the try block (above the
exporttype conditionals) so any exception raised during parsing is caught, then
use the resulting field_specs variable in the existing export calls
(query_to_csv, query_to_kml, query_to_web_portal_zip) and let the existing
except block create the Message via Message.objects.create with the
error_details and message_type as implemented.
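The parse-inside-try reordering from the second point can be sketched as follows. Every name here is a stand-in (`parse_fields` for `fields_from_json`, `notify` for the `Message` creation), not the real `do_export` signature:

```python
def run_export(spquery, parse_fields, exporter, notify):
    try:
        # Parsing now sits inside the try, so a malformed field spec produces
        # a failure notification instead of silently killing the daemon thread.
        field_specs = parse_fields(spquery['fields'])
        exporter(field_specs)
    except Exception as exc:
        notify('query-export-failed', str(exc))
        return
    notify('query-export-complete', None)
```

With the parse outside the try, an invalid saved query would raise before any handler ran and the user would never hear back; moving it inside makes the failure path uniform for parse and export errors alike.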
In `@specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx`:
- Around line 118-121: Add a runtime type-guard helper in
NotificationRenderers.tsx (e.g., parseExportErrorPayload or isErrorPayload) that
accepts notification.payload.error, checks that it's an object with
string-valued "error" and "traceback" properties, and returns normalized strings
(falling back to a safe JSON.stringify or a default message if validation
fails). Replace the existing unsafe casts in the 'query-export-to-csv-failed'
renderer (and the other two failed-export renderers referenced) to call this
helper and use the returned safe error and traceback strings when building the
downloadable error file content.
📒 Files selected for processing (3)
- specifyweb/backend/stored_queries/execution.py
- specifyweb/frontend/js_src/lib/components/Notifications/NotificationRenderers.tsx
- specifyweb/frontend/js_src/lib/localization/notifications.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- specifyweb/frontend/js_src/lib/localization/notifications.ts
Makes sure any exceptions raised while parsing field specs are caught by the export's exception handling
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 300-303: The code currently always includes full traceback via
traceback.format_exc() into error_details and ultimately into Message.content;
update the block that builds error_details (where error_details = {'error':
str(e), 'traceback': traceback.format_exc()}) so the 'traceback' key is only
added when settings.DEBUG is True (mirror the other background-task notification
pattern), and ensure any Message.content serialization uses that guarded
error_details; apply the same guard for the subsequent similar block around
lines 305-309 that serializes error_details into Message.content.
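A sketch of the guarded payload described above; `debug` stands in for `django.conf.settings.DEBUG`, and the returned dict mirrors the `Message` content shape, both assumptions drawn from the review text:

```python
import traceback

def build_export_failure(exc, filename, exporttype, debug):
    error_details = {'error': str(exc)}
    if debug:
        # Full traceback only in debug mode; production relies on server logs.
        error_details['traceback'] = traceback.format_exc()
    return {
        'type': f'query-export-to-{exporttype}-failed',
        'file': filename,
        'error': error_details,
    }
```

Keeping the payload construction in one small function also makes the DEBUG guard trivial to unit-test in both modes.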
📒 Files selected for processing (1)
specifyweb/backend/stored_queries/execution.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 299-309: The current except block only stores str(e) and a null
traceback when settings.DEBUG is False, losing actionable stack traces; update
the except block around Message.objects.create to log the exception (e.g., call
logger.exception or logging.exception with context) so the full traceback is
emitted to server logs in production, and change the payload construction so the
'traceback' key is included only when traceback.format_exc() is non-empty (e.g.,
build error_details conditionally using tb = traceback.format_exc() and add
'traceback': tb only if tb). Ensure you reference the same variables (e,
error_details, message_type, filename) and keep Message.objects.create(...)
semantics unchanged.
📒 Files selected for processing (1)
specifyweb/backend/stored_queries/execution.py
♻️ Duplicate comments (1)
specifyweb/backend/stored_queries/execution.py (1)
Lines 299-313: ⚠️ Potential issue | 🟠 Major: Don't ship raw tracebacks in user-facing export errors.
This payload is rendered as a downloadable error file, so serializing `traceback.format_exc()` here exposes server internals to any user who can run exports. `logger.exception(...)` already keeps the full stack trace in server logs, so the notification should stay sanitized outside debug mode.

🔧 Suggested change

```diff
-tb = traceback.format_exc()
-error_details = {'error': str(e)}
-if tb:
-    error_details['traceback'] = tb
+error_details = {'error': str(e)}
+if settings.DEBUG:
+    error_details['traceback'] = traceback.format_exc()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@specifyweb/backend/stored_queries/execution.py` around lines 299 - 313, Remove the raw traceback from the user-facing error payload: stop capturing traceback.format_exc() into tb and do not add it to error_details when building the Message (keep logger.exception(...) as-is for server logs). In the except block around logger.exception(...) and Message.objects.create(...), only include a sanitized error value (e.g. error_details = {'error': str(e)} or a generic message like 'Internal server error') and omit any 'traceback' key so no server internals are serialized to the downloadable error file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@specifyweb/backend/stored_queries/execution.py`:
- Around line 299-313: Remove the raw traceback from the user-facing error
payload: stop capturing traceback.format_exc() into tb and do not add it to
error_details when building the Message (keep logger.exception(...) as-is for
server logs). In the except block around logger.exception(...) and
Message.objects.create(...), only include a sanitized error value (e.g.
error_details = {'error': str(e)} or a generic message like 'Internal server
error') and omit any 'traceback' key so no server internals are serialized to
the downloadable error file.
📒 Files selected for processing (1)
specifyweb/backend/stored_queries/execution.py
This was a privilege escalation vulnerability. The `get_query` helper (used by `ephemeral` and `batch_edit`) already correctly parses JSON first, resolves the collection, then checks permissions. The three export functions were inconsistent and vulnerable.
melton-jason left a comment:
Haven't completed a review yet, but here are my thoughts on the few files I could get to in an hour.
Nice job on the branch so far! 👏
```python
except Exception as e:
    logger.exception(
        "Export failed for %s: collection %s, file %s, type %s",
        user, collection, filename, exporttype,
    )
    tb = traceback.format_exc()
    error_details = {'error': str(e)}
    if tb:
        error_details['traceback'] = tb
    message_type = f'query-export-to-{exporttype}-failed'
    Message.objects.create(user=user, content=json.dumps({
        'type': message_type,
        'file': filename,
        'error': error_details,
    }))
    return
```
(optional)
Catching all exceptions here can generally be pretty dangerous while wrapped in a session_context.
Note that our SQLAlchemy session generator is set up to roll back changes made during the session on exceptions:
specify7/specifyweb/backend/stored_queries/models.py
Lines 19 to 37 in b50a592
If there are any side effects in query_to_csv, query_to_kml, or query_to_web_portal_zip, then they will be implicitly committed.
I don't think there are any side effects at the moment, so this might be theoretically fine at the moment. In the future though, this might cause problems.
Another use case for raising the exception and allowing the session to rollback is to maintain a consistent transaction state within the SQLAlchemy session.
Consider the case where an internal SQLAlchemy flush fails for whatever reason. According to https://docs.sqlalchemy.org/en/14/orm/session_basics.html#rolling-back, continued use of the session could result in an exception (https://docs.sqlalchemy.org/en/14/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar).
I'm pretty sure these cases are all generally handled at the moment, but the code is sensitive to a lot of (seemingly innocuous) changes that could cause some issues down the line
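The rollback behaviour Jason describes can be sketched like this; the shape of the real session generator in `models.py` is an assumption here, but the pattern is the standard one: a broad `except` inside the context swallows the exception before the context manager ever sees it, so the session commits instead of rolling back.

```python
from contextlib import contextmanager

@contextmanager
def session_context(session):
    try:
        yield session
        session.commit()       # only reached if the body did not raise
    except Exception:
        session.rollback()     # exceptions propagate and changes roll back
        raise
```

Re-raising (or at least not catching) inside the `with` body is what lets the `except` branch run; notify-then-reraise preserves both the user notification and the rollback.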
```python
check_permission_targets(collection.id, request.specify_user.id, [
    QueryBuilderPt.execute, QueryBuilderPt.export_csv])
```
Conceptually, aren't we exporting the CSV in the collection the user is signed into, and running the query in the Collection specified by the query?
Say the user was signed into Collection A and exported a Query that was run in Collection B.
It's clear we should at least check that the user has the correct Query permissions in Collection B, but which collection should we check Export CSV permissions against?
This is definitely a question regarding the scope of the permission. I can see arguments for all three sides:
- The query is being executed in Collection B, regardless of what collection the user is signed in to, so we only need to check Export CSV permission in Collection B
- The action was initiated and is being performed in Collection A by the user, so we should check whether they have Export permissions in Collection A
- Both of the above apply, so we should check Export permissions in both collections
From a strictly data-origin standpoint, I think it probably makes the most sense to keep this as-is: that is, scope the permission check to the Collection to which the query is scoped.
Do we think that approach would be the most intuitive interpretation of the permission? (this is not a rhetorical question, down to hear what other people think! 😅)
```python
thread = Thread(
    target=do_export,
    args=(spquery, collection, request.specify_user, file_name, 'webportal', None),
)
thread.daemon = True
thread.start()
```
(optional, performance)
I know we've been using threads for Query exports (CSV, KML, and now WebPortal).
These queries can use a significant amount of simultaneous memory resources on the server. As noted in #7858 and its sub-issues (particularly #7875 in this case), this high allocation of resources can persist in the main Specify process until memory is released (which oftentimes will be when the process is stopped or killed).
Ideally, these potentially very memory-intensive operations should be performed in a separate process. That way, as soon as the operation is completed the allocated memory will be returned to the operating system and reused (there is some memory overhead when spawning a new process, which for Specify will largely be ~82 MiB for its setup and global structures as of v7.11.4).
@melton-jason Maybe I can open a new issue for this. What is your suggestion for implementation?
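One process-based variant of the existing thread call could look like the sketch below. The argument tuple mirrors the thread wiring shown earlier; whether `do_export` and its arguments survive pickling across process boundaries is an assumption that would need checking, and a proper task queue (e.g. Celery) would add retries and backpressure on top of this at the cost of more infrastructure.

```python
from multiprocessing import Process

def start_export_process(spquery, collection, user, file_name, do_export):
    # Same call shape as the Thread version, but in a child process: when the
    # process exits, the export's memory is returned to the operating system
    # instead of lingering in the main Specify process.
    proc = Process(
        target=do_export,
        args=(spquery, collection, user, file_name, 'webportal', None),
    )
    proc.start()
    return proc
```

Note that under the `spawn`/`forkserver` start methods the target and arguments must be picklable, which the current in-memory `collection`/`user` objects may not be; that is the main design question a new issue would need to settle.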
Fixes #7606
This PR adds support for exporting query results as a Web Portal package (ZIP) for import into the Specify Web Portal. It introduces backend endpoints and logic for generating the export, updates frontend components to allow users to trigger and download the export, and adds appropriate permission checks and notifications. It even includes tests and validation improvements for tree-related queries.
At this time, exports using this tool require no changes on the Web Portal side of things.
Web Portal Export Feature:
- Adds a new backend export endpoint (`exportwebportal/`), including a new `query_to_web_portal_zip` function and integration with the export workflow.

Testing and Validation:
Error Reporting
This also adds handling for error reporting if an export (either CSV, KML, or Web Portal) fails.
exampleWebPortalfailure.txt
exampleCSVfailure.txt
exampleKMLfailure.txt
Example
Specify Query
Portal
Attachments
Map
Uniqueness Check
Checklist
- Code is self-explanatory (or properly documented)
Testing instructions
- Confirm the button appears or is hidden based on permission (QueryBuilder → Query → Export To Web Portal).
- Open the ZIP and confirm it contains `PortalFiles/PortalData.csv`, `PortalFiles/flds.json`, `PortalFiles/PortalInstanceSetting.json`, and `PortalFiles/SolrFldSchema.xml`.
- Inspect `PortalFiles/flds.json` and verify the `spid` and `img` rows are present and attachment `Title` is derived from `attachment.title` when available.

DEVELOPERS: To really make sure this works, we need to import this into the Web Portal. To really make sure it is an adequate replacement for the Specify 6 exporter, it is a good idea to test the same export query in both versions. All KU databases have a mapping setup in them, which can be compared easily.

- Verify only public attachments (`attachment.ispublic`) are visible in the portal.
- Verify coordinates export correctly when `longitude1`, `latitude1`, `longitude2`, or `latitude2` are present (but keep "If second set of coordinates fields are present, they are required for Geo Coords" webportal-installer#64 in mind), and make sure they are clickable (so a dialog appears).

Summary by CodeRabbit
New Features
Bug Fixes
Tests
Localization