691 Commits

Author SHA1 Message Date
sua yoo
85913112a2
Upgrade lit + shoelace to reduce build size (#938)
* upgrade lit

* upgrade shoelace

* upgrade testing libraries

* add webpack bundle analyzer

* revert shoelace changes

* remove bundle analyzer

* remove console log
2023-07-20 11:50:05 +02:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
* Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
Anish Lakhwara
db851b8360
Merge pull request #976 from webrecorder/ansible-lint-action
feat: ansible lint github action
2023-07-19 03:22:22 +10:00
Ilya Kreymer
a5312709bb
fix issues that caused cronjob container to crash: (#987)
- don't set CRAWL_TIMEOUT to "None" in configmap, and if encountered, just set to 0
- run register_exit_handler() after run loop has been inited
2023-07-18 18:08:53 +02:00
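The CRAWL_TIMEOUT handling described in #987 above can be illustrated with a minimal sketch. The helper name `parse_crawl_timeout` is hypothetical and not taken from the repository; the commit only states that a value of "None" should be treated as 0. Treating a missing, empty, or non-numeric value as 0 is an additional assumption made here.

```python
import os


def parse_crawl_timeout(raw):
    """Return a crawl timeout in seconds; 0 means no timeout.

    Hypothetical helper: only the "None" -> 0 rule comes from the commit
    message. Handling of missing, empty, or invalid values is an assumption.
    """
    if raw is None:
        return 0
    raw = raw.strip()
    if raw in ("", "None"):
        return 0
    try:
        # Clamp negative values to 0 (assumption)
        return max(int(raw), 0)
    except ValueError:
        return 0


# Read the value from the environment, as a container would receive it
# from a configmap
timeout = parse_crawl_timeout(os.environ.get("CRAWL_TIMEOUT"))
```

With this approach a stray "None" string written into the configmap no longer raises when the value is converted to an integer.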
sua yoo
c5b3be0680
Fix frontend formatting pre-commit (#991)
* update lint staged config

* remove prettier defaults
2023-07-18 17:51:13 +02:00
Anish Lakhwara
4fed3ed1b0
fix: resolve ansible pipenv dependencies successfully (#977) 2023-07-18 17:39:38 +02:00
Ilya Kreymer
5dede47874 remove accidentally added values file! 2023-07-16 15:05:08 +02:00
Anish Lakhwara
bc82f562dc feat: ansible lint github action 2023-07-10 17:58:47 -07:00
Ilya Kreymer
2372f43c2c
frontend: fix to collection editor with crawls and uploads (#971)
* frontend:
- follow up to #969, fixes crawl workflows by using crawl-specific endpoint and merging results

* get crawls and uploads concurrently

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-07-10 19:29:19 +02:00
sua yoo
f3660839bf
Allow users to add uploads to collections (#968)
* show uploads in 'Select Uploads' section
2023-07-09 22:21:50 -07:00
Ilya Kreymer
7d694754c6
uploads api ext: (#970)
- also support collectionId filter on /all-crawls
- update tests
2023-07-09 22:12:54 -07:00
Ilya Kreymer
f1bce310d0
uploads api: support filtering uploads by collectionId (#969)
tests: add collection filter test
2023-07-09 10:54:30 -07:00
Ilya Kreymer
a640f58657
Tests: fix test get crawl loop (#967)
* tests: add sleep() between all looping get_crawl() calls to avoid tight request loop, also remove unneeded loop
will likely fix occasional '504 timeout' test failures where frontend is overwhelmed with /replay.json requests
2023-07-08 17:16:11 -07:00
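The idea in #967 above, sleeping between repeated get_crawl() calls instead of polling in a tight loop, might look roughly like the following sketch. The `wait_until_finished` function, its parameters, and the `finished` field check are assumptions for illustration; the actual test helpers in the repository may differ.

```python
import time


def wait_until_finished(get_crawl, crawl_id, interval=5.0, max_attempts=60,
                        sleep=time.sleep):
    """Poll get_crawl(crawl_id) until the result has a truthy 'finished' value.

    get_crawl is any callable returning a dict (hypothetical stand-in for an
    API request). Sleeping between attempts avoids a tight request loop.
    """
    for _ in range(max_attempts):
        data = get_crawl(crawl_id)
        if data.get("finished"):
            return data
        # Pause before the next request so the server is not overwhelmed
        sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish in time")
```

Passing `sleep` as a parameter is a design choice that lets tests substitute a no-op and run without real delays.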
Henry Wilkinson
d9e73fcbc3
Reorder Limits section (#966)
* Reorder Limits section

- Minor text change to section names
  - "Limit Per Page" → "Per-Page Limits"
  - "Limit Per Crawl" → "Per-Crawl Limits"

* Reorder limits section in documentation
2023-07-08 08:54:30 -07:00
Anish Lakhwara
fd310f620a
fix: mongodb uri password not accessible on second API call (#964) 2023-07-08 08:48:50 -07:00
Anish Lakhwara
9489c1e00d
fix: configure_kubectl is the variable name (#963) 2023-07-08 08:13:54 -07:00
Anish Lakhwara
df82a4755f
fix: pass ansible-lint in DO playbook (#962)
* fix: pass ansible-lint in DO playbook

* fix: don't break s3 module
2023-07-08 08:13:23 -07:00
Ilya Kreymer
8eeb66e11f
Frontend more upload path fixes (#961)
* additional fixes for #935:
- don't use artifactType for detail pages, ensure correct artifact selected based on path

* naming tweaks:
- from uploads detail, return to 'All Uploads' with filter
- from crawls detail, return to 'All Crawls' with filter
- rename general to 'All Archived Data'
2023-07-07 15:41:03 -07:00
Anish Lakhwara
478719d59a
fix: only use db_create when the db is created (#959) 2023-07-07 14:38:03 -07:00
Ilya Kreymer
d3a757e20b
partial fix for: #935: (#960)
- add route for /artifacts/upload/<id> to be used for uploads
- link uploads to /artifacts/upload/<id> instead of /artifacts/crawl/<id>
2023-07-07 14:23:26 -07:00
sua yoo
de4b18aa67
List crawls, uploads, and all objects in UI (#941)
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it as "Crawls" into view
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-07 13:20:28 -07:00
Ilya Kreymer
d7cb47390e
readd support for passing in 'crawler_extra_args' for additional/custom (#957)
options not covered by standard crawler opts (removed setting all args this way in #889)
2023-07-07 12:08:40 -07:00
Ilya Kreymer
2038e3d668
remove default: similar to #952, remove default extraHops setting as it disables 'url list' extraHops by forcing the value to 0 (#954) 2023-07-07 12:08:30 -07:00
Ilya Kreymer
7139b9a7a9
operator: ensure finished is always set (#953) 2023-07-07 12:08:15 -07:00
Anish Lakhwara
99117a532b
feat: configure mongodb firewall (#949)
Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
2023-07-07 09:15:36 -07:00
Anish Lakhwara
c5803dcda0
feat: configure kubectl through ansible (#948)
Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
2023-07-07 09:15:18 -07:00
Anish Lakhwara
dd3d9001fb
fix: idempotent mongodb creation, with saved facts (#945)
Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
2023-07-07 09:14:12 -07:00
Ilya Kreymer
00eb62214d
Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove

* uploads api: (part of #929)
- new UploadedCrawl object which extends BaseCrawl, has name and description
- support multipart form data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures, abort multipart upload
- ensure uploads added within org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patch /uploads
- support upload details via /replay.json to support replay
- add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far).
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')

* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state

* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'

* collections: support adding and removing both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads

bump version to 1.6.0-beta.2
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-07-07 09:13:26 -07:00
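The upload path layout mentioned in #937 above, /uploads/<uuid>/<sanitized-filename>-<random>.wacz, can be sketched as follows. The function names, the exact sanitization rule (keeping only letters, digits, underscores, and hyphens), and the 8-character hex suffix are assumptions for illustration, not the repository's actual implementation.

```python
import re
import secrets
import uuid


def sanitize_filename(name):
    """Reduce a user-supplied filename to a safe stem (assumed rule set)."""
    # Drop any directory components so the name cannot escape the upload dir
    stem = name.rsplit("/", 1)[-1]
    # Remove the .wacz extension; it is re-added when the path is built
    if stem.lower().endswith(".wacz"):
        stem = stem[:-5]
    # Collapse every run of disallowed characters into a single hyphen
    stem = re.sub(r"[^A-Za-z0-9_-]+", "-", stem).strip("-")
    return stem or "upload"


def upload_path(filename):
    """Build /uploads/<uuid>/<sanitized-filename>-<random>.wacz."""
    upload_id = uuid.uuid4()
    suffix = secrets.token_hex(4)  # 8 hex characters
    return f"/uploads/{upload_id}/{sanitize_filename(filename)}-{suffix}.wacz"
```

The random suffix keeps two uploads with the same original filename from colliding, and stripping directory components keeps the result inside the upload's own path.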
Anish Lakhwara
e1d6de21a0
docs: ansible deploy docs reflect expected env var names (#946)
Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
2023-07-06 21:57:19 -07:00
Tessa Walsh
29a6f0f6bc
Fix links in watch crawl after workflow crawl completes (#943) 2023-07-06 15:04:26 -07:00
Tessa Walsh
bf1e817da3
Unset default scopeType for seeds so they inherit parent scopeType by default (#952) 2023-07-06 15:03:05 -07:00
Henry Wilkinson
8a240ad044
Fixes z-index (#939) 2023-07-04 23:05:09 -04:00
Henry Wilkinson
ac4716614e
Minor grammatical changes to documentation (#919) 2023-07-04 17:14:49 -04:00
Ilya Kreymer
4c8de3160b typo fix: fix extra trailing quote on CRAWL_ARGS in configmap.yaml 2023-06-16 18:55:21 -07:00
Ilya Kreymer
e37f220d6c version: bump to 1.6.0-beta.1 2023-06-16 18:53:32 -07:00
Tessa Walsh
c7051d5fbf
Backend API consistency pass (#921)
* Make API add and update method returns consistent

- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name

- remove Profile response model, as returning added / id only
- reformat

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-06-16 18:52:46 -07:00
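The response convention from #921 above, where updates return {"updated": True} and adds return {"added": True} with optional extra fields, could be captured by two small helpers. The helper names are hypothetical and are not taken from the repository; only the response shapes come from the commit message.

```python
def added_response(**extra):
    """Response body for a successful add, plus optional fields such as id."""
    return {"added": True, **extra}


def updated_response(**extra):
    """Response body for a successful update, plus optional fields."""
    return {"updated": True, **extra}


# Example: an add endpoint that also reports the new object's id and name
body = added_response(id="example-id", name="My Collection")
```

Centralizing the shape in one place makes it harder for individual endpoints to drift from the convention.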
Ilya Kreymer
d9ad8c11d2 frontend: fix RWP_BASE_URL not being set correctly for nginx image 2023-06-13 00:04:46 -07:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
Henry Wilkinson
71e9984e65
Adds documentation link and version copy button to footer (#920)
* Updates footer

- Adds documentation link
- Adds label to GitHub link, moves outside of the version code
- Adds copy button to version code for quick access when filing bug reports :)

* Comments out invisible div

* Improves responsiveness on mobile
2023-06-12 17:51:21 -07:00
Ilya Kreymer
ec3404c798
Fix Extra URLs in Scope (#913)
* scope fix: when using 'Custom Page Prefix' scope (fixes #873)
- don't include primary seed URL in include list
- don't always add trailing slash to extra in scope URLs
- set seed scope to 'prefix' (supported via webrecorder/browsertrix-crawler#318) instead of re-including seed URL
- add comments on using 'custom' to indicate 'Custom Prefix Scope' semantics on frontend, setting actual scope to 'prefix' on backend
- remove unneeded conditional for additional urls, main scopeType overridden per seed anyway
2023-06-12 17:29:41 -07:00
Henry Wilkinson
79703baa69
Org Settings documentation & Getting Started docs page updates 2023-06-11 17:39:16 -04:00
Henry Wilkinson
2364433932
Admin Panel Minor Frontend Style Updates (#915)
- Unifies trash icons on all pages to use trash3 (there were a few stragglers!)
- Brings styling of org quotas dialogue in-line with the rest of our dialogues
- Adds missing localization strings
- Swaps button with icon button to match table row action styling elsewhere
2023-06-10 19:21:34 -07:00
Tessa Walsh
325355d991
Fix post-crawl collection stats update and add test (#918)
This fixes #917, where crawls added to a collection via the workflow
autoAddCollections were not successfully represented in the crawl
and page count stats in the collection after completing.
2023-06-10 19:06:25 -07:00
Henry Wilkinson
8477919989
Adds all workflow settings to the user docs with descriptions (#894)
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-06-08 14:28:58 -04:00
Tessa Walsh
e10b7093c7
Fix bug preventing deleting collections with no crawls (#912) 2023-06-08 11:28:30 -07:00
Ilya Kreymer
9707fb55e4
fix finished workflows incorrectly being displayed as running (#909) 2023-06-08 11:26:42 -07:00
Ilya Kreymer
4428184aea
frontend: configure running with a fixed 'replay.json', auth headers passed via separate config (#899)
wabac.js will reload the replay.json on 403 with new token (will be in next version of wabac.js)
presign urls: make presign timeout configurable (in minutes), defaults to 60 mins
dockerfile: fix configuring RWP_BASE_URL
2023-06-08 11:26:26 -07:00
Henry Wilkinson
d286555396
Adds initial version of the documentation style guide (#891)
* Adds initial version of the documentation style guide

* Adds a note about adding new pages

* Instructs users about where to edit the `nav:` for the section

* Adds acronym rule clarification
2023-06-07 16:54:49 -07:00
Tessa Walsh
120f7ca158
Precompute crawl file stats (#906) 2023-06-07 16:39:49 -07:00
Ilya Kreymer
dd757961fc
config: add overridable 'user_agent_suffix' and 'user_agent' to values.yaml, (#910)
passed to crawler --userAgentSuffix and --userAgent params, respectively, using
'quote' to support spaces in user-agent.
config: re-order settings to put 'Crawler Settings' section first, followed by 'Cluster Settings'
fixes #787
2023-06-07 12:01:12 -07:00