Commit Graph

749 Commits

Author SHA1 Message Date
Tessa Walsh
a2435a013b
Add totalSize to workflow API endpoints (#783) 2023-04-20 17:23:59 -04:00
Ilya Kreymer
3f41498c5c quickfix: fix typo, remove unnecessary async 2023-04-18 16:14:15 -07:00
Tessa Walsh
a9c1c54194
Make btrix helper work with microk8s (#768)
* Check for microk8s

* Use python3

* Add note about installing pytest

* Add chart/local.yaml to .gitignore to avoid committing
2023-04-18 08:50:46 -04:00
Ilya Kreymer
821d29bd2a
crawlconfig api: add 'currCrawlState' and 'currCrawlTimeStart' to crawlconfig list api (already queried on backend) (#770)
* crawlconfig api: add 'currCrawlState' and 'currCrawlTimeStart' to crawlconfig list api (already queried on backend)
2023-04-17 23:13:13 -07:00
Tessa Walsh
6b19f72a89
Add crawl errors endpoint (#757)
* Add crawl errors endpoint

If this endpoint is called while the crawl is running, errors are
pulled directly from redis.

If this endpoint is called when the crawl is finished, errors are
pulled from mongodb, where they're written when crawls complete.

* Add nightly backend test for errors endpoint

* Add errors for failed and cancelled crawls to mongo

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-04-17 12:59:25 -04:00
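The source selection described in the crawl errors commit above can be sketched roughly as follows. This is a minimal illustration, not the backend's code: `get_crawl_errors`, `RUNNING_STATES`, and the dict-based stand-ins for redis and mongodb are all hypothetical names introduced here.

```python
from typing import Dict, List

# Assumption: which state names count as "still running" is a guess for illustration.
RUNNING_STATES = {"starting", "running"}


def get_crawl_errors(crawl: dict, live_errors: Dict[str, List[str]]) -> List[str]:
    """Return errors for a crawl from the live store or from the stored record.

    `live_errors` stands in for redis (keyed by crawl id); the `errors` field
    on `crawl` stands in for what is written to mongodb when a crawl completes.
    """
    if crawl["state"] in RUNNING_STATES:
        # Crawl still running: read errors directly from the live store.
        return list(live_errors.get(crawl["id"], []))
    # Crawl finished (including failed or cancelled): read the persisted errors.
    return list(crawl.get("errors", []))
```

With this shape, a caller does not need to know which store answered the request; only the crawl state decides.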
Ilya Kreymer
4a46f894a2
backend: add 'lastCrawlStartTime' and 'lastStartedByName' fields to crawlconfigs apis (#753) 2023-04-17 08:34:29 -07:00
Tessa Walsh
59e49eacd5
Update collections backend API (#759)
* Re-implement collections, storing crawlIds in collection

* Return collections for crawl endpoints and filter on coll name

* Remove crawl from all collections when deleted

* Revert get_collection_crawls to flat array of resources

* Fix tests
2023-04-14 12:17:18 -04:00
Henry Wilkinson
a62a452c07
Merge pull request #758 from webrecorder/docs-fonts&icons 2023-04-13 22:05:48 -04:00
Tessa Walsh
1ad82a63e6
Add crawl timeout nightly test (#762) 2023-04-11 19:36:18 -07:00
Ilya Kreymer
85b6a05419
Upgrade to mongo 6 and use sortArray for workflow crawls (#764) (#765)
fixes from 1.4.1:
* Upgrade to mongo 6 and use sortArray for workflow crawls

* update readiness probe with timeouts doubled, and failure threshold increased for slower 'mongosh' readiness check

update versions to 1.5.0-beta.0 in backend and frontend

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-04-11 18:22:07 -07:00
Henry Wilkinson
d50fab67a9 Link accessibility improvements
- Nav bar text is now 20% higher opacity, hover state also differentiated with weight
- In-body links are now underlined
- Lightened BG colour and darkened link colour — now achieves an APCA score of 84!
2023-04-11 19:51:48 -04:00
Sara Tavares
07fb7317fe
Delete proofread-action.yaml (#760)
Resulting in a lot of false positives (to revisit later)
2023-04-11 15:49:27 -07:00
Tessa Walsh
f261967de8 Bump version to 1.5.0-beta.0 2023-04-11 11:51:17 -04:00
Tessa Walsh
fb80a04f18 Add crawl /log API endpoint
If a crawl is completed, the endpoint streams the logs from the log
files in all of the created WACZ files, sorted by timestamp.

The API endpoint supports filtering by log_level and context whether
the crawl is still running or not.

This is not yet proper streaming because the entire log file is read
into memory before being streamed to the client. We will want to
switch to proper streaming eventually, but are currently blocked by
an aiobotocore bug - see:

https://github.com/aio-libs/aiobotocore/issues/991?#issuecomment-1490737762
2023-04-11 11:51:17 -04:00
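One way to picture the merge-and-filter behavior described in the /log commit above is the sketch below. It assumes each log file holds JSON lines already sorted by timestamp; the key names `timestamp`, `logLevel`, and `context`, and the function `merged_log_entries`, are assumptions made for illustration and are not taken from the backend.

```python
import heapq
import json
from typing import Iterable, Iterator, List, Optional, Set


def merged_log_entries(
    log_files: Iterable[List[str]],
    log_levels: Optional[Set[str]] = None,
    contexts: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield JSON log entries from several files in timestamp order."""
    # Parse every file up front; as the commit notes, this is not true streaming.
    parsed = [
        [json.loads(line) for line in lines if line.strip()]
        for lines in log_files
    ]
    # heapq.merge assumes each input list is already sorted by the key.
    for entry in heapq.merge(*parsed, key=lambda e: e["timestamp"]):
        if log_levels and entry.get("logLevel") not in log_levels:
            continue
        if contexts and entry.get("context") not in contexts:
            continue
        yield entry
```

ISO 8601 timestamps in a single timezone sort correctly as plain strings, which is why the sketch compares them without parsing.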
Henry Wilkinson
128aa89d33 Adds the specific icons currently required
- Updates writing docs page regarding adding icons
2023-04-10 18:58:24 -04:00
Henry Wilkinson
ec324799c9 removes icons 2023-04-10 03:05:32 -04:00
Henry Wilkinson
8e8f59ec13 Updates main & code block background colors 2023-04-07 00:06:26 -04:00
Henry Wilkinson
f90a85aa66 Merge branch 'main' into docs-fonts&icons 2023-04-06 23:40:49 -04:00
Henry Wilkinson
4852259f1c Adds the bootstrap icon library to the docs dir 2023-04-06 23:33:07 -04:00
Henry Wilkinson
8d60984760 Typography updates
- Sets Recursive as the main typeface for code and text!
- Adjusts variable axes and sets stylistic alternates accordingly.
- Self hosts the font
2023-04-06 23:28:23 -04:00
Henry Wilkinson
ab8088aec4 merge main into update 2023-04-06 18:39:23 -04:00
Henry Wilkinson
25800b924b update admonition icons 2023-04-06 17:49:29 -04:00
Henry Wilkinson
883da0bc89 Adds footer license & links
- Updates license section in readme clarifying docs licensing
2023-04-06 17:20:50 -04:00
Henry Wilkinson
63bbe4c1ae Adds bootstrap icons to the docs repo 2023-04-06 17:20:13 -04:00
Ilya Kreymer
631c84e488 version: bump to 1.4.0! 2023-04-06 10:12:43 -07:00
Henry Wilkinson
ba3daf326d
Adds inputmode attributes to workflow config fields (#755)
- Now the appropriate virtual keyboards are shown! :)
- Also adjusts type weight for workflow config headers to match mockups
2023-04-06 09:16:48 -07:00
Henry Wilkinson
c6aec84af4
Changes the autoscroll setting to true by default (#756)
As per my note on #745, currently all our other checkboxes turn features on when enabled. For consistency I have reversed the states of the autoscroll checkbox so the page autoscrolls when it is checked and does not run the behavior when it is unchecked. Checked is also now the default state.

- Updates help text accordingly
- Renames `disableAutoscrollBehavior` → `autoscrollBehavior`
2023-04-06 09:06:55 -07:00
Ilya Kreymer
3ab62547a9 version: bump to 1.4.0-beta.2 2023-04-06 02:45:20 -07:00
Henry Wilkinson
0a1f5eff8e
Docs: adds mkdocs features, adds theming (#728)
* Add stylesheet & mkdocs features

- Adds a custom stylesheet & brand colours
- Adds Recursive as the code font
- Adds repo info to the nav bar
- Adds auto tracking ID links for deep linking to sections as users scroll the page
- Index pages are now a part of their section as determined by their H1
- Removes mkdocs info from future footer

* Reorganize content

- Renames "Dev" to "Develop" for improved navigation labels
- Adds links to tools the first time they're mentioned
- Rewords part of the homepage
- Hides section navigation on the homepage (now we don't have a blank section nav bar!)
- Adds some syntax highlighting
- Removes some manual word wrapping — this was done very rarely / inconsistently

* Rename "Developer Docs" index page

- Better title for sidebar

* Update docs.md

- Adds links to tools
- Adds future docs style guide section
- Updates name and makes it an H1

- Replaces hyphens on the homepage with em dashes

* deployment index page: changed title, removed non-k8s section, cleaned up intro
* develop index page: changed title
fixed typo on main page

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-04-06 02:44:19 -07:00
Tessa Walsh
11ca3e678a
Configure crawler disk utilization threshold via helm chart (#748) 2023-04-05 21:51:53 -07:00
Tessa Walsh
f6f3b7abba
Add btrix CLI dev helper (#732)
* Add btrix CLI dev helper

* Fix indentation

* Use bash syntax for ifs
2023-04-05 21:51:22 -07:00
sua yoo
80bc4a3eb9
Fix additional URLs (#752) 2023-04-05 20:11:09 -07:00
sua yoo
91c2c1ad62
Allow users to set additional page time limits (#744) 2023-04-05 20:06:46 -07:00
sua yoo
72967a0381
Frontend Docker build improvements (#749) 2023-04-05 20:05:45 -07:00
sua yoo
c60dc5d086
Crawls list backend pagination (#735) 2023-04-05 10:55:42 -07:00
Ilya Kreymer
63be81d835 ci: make playwright integration tests run only on PRs involving frontend 2023-04-05 09:57:34 -07:00
Ilya Kreymer
7f757d396a
config: add 'pageLoadTimeout' and 'pageExtraDelay' options to backend… (#742)
* config: add 'pageLoadTimeout' and 'pageExtraDelay' options to backend config
- add 'default_page_load_timeout_seconds' to values.yaml, defaulting to 120, for pageLoadTimeout
- add 'defaultPageLoadTimeSeconds' to /api/settings, update tests for /api/settings
addresses issue in #636
2023-04-04 19:52:23 -07:00
Ilya Kreymer
67172ca1e2
fix: only include finished crawls in crawlCount value for /api/crawlconfigs (#746) 2023-04-04 19:50:14 -07:00
Ilya Kreymer
88497d2a64
text: rename workflowuration -> workflow (#741) 2023-04-04 08:48:06 -07:00
sua yoo
370b8cbd4d
Set max pages to API default (#739) 2023-04-04 08:47:37 -07:00
Ilya Kreymer
2b0d5ff8b3
misc frontend build fixes: playwright version + chunking (#740)
* misc frontend build fixes:
- fix playwright version to be consistent to fix playwright test
- chunking: set max number of chunks generated

* lock playwright version

* remove intl polyfill

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-04-03 21:27:44 -07:00
Ilya Kreymer
1c47a648a9
Max page limit override (#737)
* more page limit: update to #717, instead of setting --limit in each crawlconfig,
apply override --maxPageLimit setting, implemented in crawler, to override individually configured page limit

* update tests, no longer returning 'crawl_page_limit_exceeds_allowed'
2023-04-03 14:01:32 -07:00
Tessa Walsh
3b99bdf26a
Update nightly test fixtures to use Seed objects (#734) 2023-04-03 16:21:25 -04:00
Tessa Walsh
e9b61c632d
Add pageSize to pagination format (#736) 2023-04-03 15:57:47 -04:00
Henry Wilkinson
68ec47cb7f Moves deployment docs back to the root docs directory
- Replaces hyphens on the homepage with em dashes
2023-03-31 00:06:45 -04:00
Ilya Kreymer
887cb16146
Allow configurable max pages per crawl in deployment settings (#717)
* backend: max pages per crawl limit, part of fix for #716:
- set 'max_pages_crawl_limit' in values.yaml, default to 100,000
- if set/non-0, automatically set limit if none provided
- if set/non-0, return 400 if adding config with limit exceeding max limit
- return limit as 'maxPagesPerCrawl' in /api/settings
- api: /all/crawls - add runningOnly=0 to show all crawls, default to 1/true (for more reliable testing)

tests: add test for 'max_pages_per_crawl' setting
- ensure 'limit' can not be set higher than max_pages_per_crawl
- ensure pages crawled is at the limit
- set test limit to max 2 pages
- add settings test
- check for pages.jsonl and extraPages.jsonl when crawling 2 pages
2023-03-28 16:26:29 -07:00
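The limit rules listed in the commit above (#717) could look roughly like this. It is a sketch under stated assumptions: `resolve_page_limit` and `LimitExceeded` are hypothetical names, and the real backend may structure the check differently (a later commit, #737, replaces the rejection with a crawler-side override).

```python
from typing import Optional


class LimitExceeded(ValueError):
    """Hypothetical error that an API layer could map to a 400 response."""


def resolve_page_limit(requested: Optional[int], max_pages: int) -> Optional[int]:
    """Apply a deployment-wide maximum to a per-config page limit."""
    if not max_pages:
        # Maximum unset or 0: no deployment-wide cap, keep the config as given.
        return requested
    if requested is None:
        # No limit provided: fall back to the deployment maximum.
        return max_pages
    if requested > max_pages:
        # Limit above the maximum: reject the config.
        raise LimitExceeded(f"limit {requested} exceeds maximum {max_pages}")
    return requested
```

For example, with the default maximum of 100,000 from values.yaml, a config without a limit would resolve to 100,000, while a config asking for more would be rejected.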
Sara Tavares
948cce3d30
Add README.md on running playwright tests locally (#722) 2023-03-28 16:08:28 -07:00
Tessa Walsh
4724754efc
Filter and sort crawl and workflow list API endpoints in backend (#724)
* Re-implement pagination and paginate crawlconfig revs

First step toward simplifying pagination to set us up for sorting
and filtering of list endpoints. This commit removes fastapi-pagination
as a dependency.

* Migrate all HttpUrl seeds to Seeds

This commit also updates the frontend to always use Seeds and to
fix display issues resulting from the change.

* Filter and sort crawls and workflows

Crawls:
- Filter by createdBy (via userid param)
- Filter by state (comma-separated string for multiple values)
- Filter by first_seed, name, description
- Sort by started, finished, fileSize, firstSeed
- Sort descending by default to match frontend

Workflows:
- Filter by createdBy (formerly userid) and modifiedBy
- Filter by first_seed, name, description
- Sort by created, modified, firstSeed, lastCrawlTime

* Add crawlconfigs search-values API endpoint and test
2023-03-28 17:55:40 -04:00
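The filtering, sorting, and pagination described in the commit above (#724) can be illustrated with an in-memory sketch. The function `list_crawls`, its parameter names, and the response keys are assumptions for illustration; the actual endpoints query mongodb rather than a Python list.

```python
from typing import Any, Dict, List, Optional


def list_crawls(
    crawls: List[Dict[str, Any]],
    state: Optional[str] = None,
    sort_by: str = "started",
    descending: bool = True,
    page: int = 1,
    page_size: int = 10,
) -> Dict[str, Any]:
    """Filter by state, sort, and return one page of crawls."""
    if state:
        # Multiple states arrive as one comma-separated string.
        wanted = set(state.split(","))
        crawls = [c for c in crawls if c["state"] in wanted]
    # Descending order is the default, matching the frontend.
    ordered = sorted(crawls, key=lambda c: c[sort_by], reverse=descending)
    start = (page - 1) * page_size
    return {
        "items": ordered[start:start + page_size],
        "total": len(ordered),
        "page": page,
        "pageSize": page_size,
    }
```

Returning the total alongside the current page lets a client render page controls without a second request.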
Sara Tavares
36cfb2591f
ci: fix version related to @playwright/test (#729)
* fix version, add resolutions to have fixed playwright version
2023-03-28 14:30:36 -07:00
sua yoo
25e4da2522
fix: enable semibold variable 2023-03-28 12:17:34 -07:00