browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	86311ab4ea	merge 1.9.5 fixes (#1637 ) retry loading profile if initial load fails, follow-up to #1604 - Add missing setTimeout to retry profile loading bump RWP to 1.8.15	2024-03-27 21:49:19 -07:00
Tessa Walsh	00ced6dd6b	Add single page QA GET endpoint (#1635 ) Fixes #1634 Also make sure other get page endpoint without qa uses PageOut model	2024-03-27 14:57:59 -07:00
Henry Wilkinson	275f69493f	Frontend: `icon-button` Cleanup (#1628 ) Closes #1591 ### Changes - Converts one instance of a button with an icon in it to an `icon-button` - Makes all the trashcan icon buttons have a red hover state - Adds localization function & placeholder to upload dialog "Name" field - Adds localization functions to some missing icon-button label instances - Adds a few missing icon button labels Co-authored-by: sua yoo <sua@suayoo.com> Co-authored-by: sua yoo <sua@webrecorder.org>	2024-03-27 14:57:32 -04:00
Ilya Kreymer	412eb2ef32	MetaController update (#1630 ) Bump metacontroller to latest (4.11)	2024-03-27 08:49:56 -07:00
Tessa Walsh	66b4532321	Give test_crawl_timeout 10 mins to finish (#1627 ) Related to https://github.com/webrecorder/browsertrix-cloud/issues/1620 Follow-up to https://github.com/webrecorder/browsertrix-cloud/pull/1621, which didn't seem to fix the problem. I'm giving it much more time here in the hopes that it solves it (since it's a nightly test, time shouldn't be such a pressing issue).	2024-03-26 18:33:30 -07:00
Tessa Walsh	e9895e78a2	Add additional filters to page list endpoints (#1622 ) Fixes #1617 Filters added: - reviewed: filter by page has approval or at least one note (true) or neither (false) - approved: filter by approval value (accepts list of strings, comma-separated, each of which are coerced into True, False, or None, or ignored if they are invalid values) - hasNotes: filter by has at least one note (true) or not (false) Tests have also been added to ensure that results are as expected.	2024-03-21 21:33:07 -07:00
Tessa Walsh	b3b1e0d7d8	Fix intermittent crawl timeout test failure (#1621 ) Fixes #1620 This increases the total timeout from 60 seconds to 120 seconds for crawl to complete, which should be sufficient given how intermittently the failure has been happening. Can increase it further if needed.	2024-03-21 17:18:27 -07:00
Ilya Kreymer	4f676e4e82	QA Runs Initial Backend Implementation (#1586 ) Supports running QA Runs via the QA API! Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498 Also requires the latest Browsertrix Crawler 1.1.0+ (from webrecorder/browsertrix-crawler#469 branch) Notable changes: - QARun objects contain info about QA runs, which are crawls performed on data loaded from existing crawls. - Various crawl db operations can be performed on either the crawl or `qa.` object, and core crawl fields have been moved to CoreCrawlable. - While running,`QARun` data stored in a single `qa` object, while finished qa runs are added to `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first. - Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific get_upload(), get_basecrawl(), get_crawl() getters for internal use and get_crawl_out() for API - Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-20 22:42:16 -07:00
sua yoo	05e03e0b90	Disable Prettier check in CI (#1619 ) Disables `prettier:check` until discrepancies are handled in https://github.com/webrecorder/browsertrix-cloud/issues/1618 so that formatting issues don't fail CI runs.	2024-03-20 15:01:51 -07:00
Emma Segal-Grossman	d2862ff797	Emit more modern code for browsers (#1614 ) Adds a `browserlist` field to `package.json`, which Webpack picks up so it doesn't convert nullish coalescing operators into obfuscated messes like `key !== null && key !== void 0 ? key : null`. This improves output size a little & improves the debugging experience as well. Tested in Chrome, FF, & Safari locally and didn't encounter any issues.	2024-03-19 17:22:41 -04:00
Emma Segal-Grossman	41d6e79cb3	Clean up ESLint warnings in main (#1616 ) See title. The only place this changes behaviour is in the placeholder page list, which will be replaced by the real one shortly, so I'm going to just merge this.	2024-03-19 17:22:27 -04:00
Tessa Walsh	21ae38362e	Add endpoints to read pages from older crawl WACZs into database (#1562 ) Fixes #1597 New endpoints (replacing old migration) to re-add crawl pages to db from WACZs. After a few implementation attempts, we settled on using [remotezip](https://github.com/gtsystem/python-remotezip) to handle parsing of the zip files and streaming their contents line-by-line for pages. I've also modified the sync log streaming to use remotezip as well, which allows us to remove our own zip module and let remotezip handle the complexity of parsing zip files. Database inserts for pages from WACZs are batched 100 at a time to help speed up the endpoint, and the task is kicked off using asyncio.create_task so as not to block before giving a response. StorageOps now contains a method for streaming the bytes of any file in a remote WACZ, requiring only the presigned URL for the WACZ and the name of the file to stream.	2024-03-19 14:14:21 -07:00
Emma Segal-Grossman	2c44011b5b	Update node version mentioned in docs (#1615 ) Follow-up to #1612 cc @SuaYoo	2024-03-19 16:40:53 -04:00
sua yoo	dcd2efcd3b	Fix asset imports in tests (#1611 ) Addresses failing test in https://github.com/webrecorder/browsertrix-cloud/pull/1592 by fixing asset imports in unit tests. Unit tests now import an empty string for all assets--note: if we want to test actual asset content, will need to update this config.	2024-03-19 13:06:07 -07:00
sua yoo	26820cbaba	Upgrade Node 16 > 18 (#1612 )	2024-03-19 13:02:08 -07:00
sua yoo	b43f550ff3	Fix missing page component imports (#1610 ) Missed bug introduced in https://github.com/webrecorder/browsertrix-cloud/pull/1608, adds back imports and disables `import-x` rule.	2024-03-18 20:55:35 -07:00
Emma Segal-Grossman	91df222cdf	Fix mismatch in prettier import order config (#1609 ) Follow-up to #1608 — quick fix for an issue I encountered after merging main into #1497 Just going to directly merge once this completes (cc @SuaYoo for visibility)	2024-03-18 22:14:13 -04:00
sua yoo	c9c57fafee	fix: hide wip qa tab	2024-03-18 18:59:24 -07:00
Emma Segal-Grossman	b1e2f1b325	Add ESLint rules for import ordering (#1608 ) Follow-up from https://github.com/webrecorder/browsertrix-cloud/pull/1546#discussion_r1529001599 (cc @SuaYoo) - Adds `eslint-plugin-import-x` and `@ianvs/prettier-plugin-sort-imports` and configures rules for them both so imports get sorted on format & on lint. - Runs both on everything!	2024-03-18 21:50:02 -04:00
Ilya Kreymer	5a4902b6d4	kubernetes api: avoid overriding content-type header in kubernetes-asyncio, pass in via arg instead (main) (#1605 ) - instead of overriding the content-type header globally, pass 'application/merge-patch+json' to self.custom_api.patch_namespaced_custom_object() directly - bump kubernetes-asyncio to 29.0.0 - fixes potential issues with global override of the header in kubernetes-asyncio - copy of #1602 for main	2024-03-18 11:17:54 -07:00
sua yoo	6e9c14aea6	test: fix frontend auth unit test	2024-03-18 11:00:13 -07:00
Henry Wilkinson	1093aa959f	Adds favicons! (#1584 ) Closes #328 ## Changes The app has favicons now! Added: - SVG - Changes to slightly brighter colours in dark mode for better contrast! - Fallback ICO - `apple-touch-icon` (some browsers also use this, not just iOS) - Web manifest with app description - Two web manifest icon sizes should users add the app to their local launcher (Windows' Start or macOS' Dock / Launchpad - Lighting & render by @emma-sg, thanks! The manifest and icons are copied to the root directory at build time by webpack. All of the dedicated ways of doing this seemed more complicated than this? --------- Co-authored-by: emma <hi@emma.cafe>	2024-03-16 15:11:31 -07:00
Henry Wilkinson	fa194c3d0d	Docs: Update docs theme (#1594 ) Partially addresses #1241 ### Changes - Adds Browsertrix logo to readme - It detects if you're in light or dark mode and adjusts the text color accordingly! _The future is now!_ - Minor readme updates - Updates icon and adds favicon SVGs to the docs - This does not yet use Konsole for the docs site title. Will have to sort this out later along with private hosting for that font. - Updates docs theme to use new brand colours — picked the green for this one, will probably be consistent across all of Webrecorder's MKDocs sites.	2024-03-16 15:09:31 -07:00
Ilya Kreymer	e7af081af1	profile browser fixes: better resource usage + load retry (main) (#1604 ) - Backend: Use separate resource constraints for profiles: default profile browser resources to either 'profile_browser_cpu' / 'profile_browser_memory' or single browser 'crawler_memory_base' / 'crawler_cpu_base', instead of scaled to the number of browser workers - Frontend: check that profile html page is loading, keep retrying if still getting nginx error instead of loading an iframe with the error. Fixes #1598 (Copy of #1599 from 1.9.4)	2024-03-16 15:07:04 -07:00
sua yoo	960f54bf4e	Update issue reporting templates (#1596 ) Changes: - Edits templates for succinctness and precision - Separate section for screenshots and OS/browser for bugs - Removes requirements and TODO section of features to simplify interface for external-facing requests	2024-03-16 07:27:19 -04:00
wvengen	6278157f40	Make storage deletion work on more S3 providers, don't use access URL for deletion (#1600 ) I came across [this problem](https://forum.webrecorder.net/t/deleting-crawl-failure/512) and noticed that the access URL is used when deleting files, causing my file deletions to fail on OpenStack SWIFT S3 (relates to #1090). This trivial change makes it work there.	2024-03-16 04:17:23 -04:00
sua yoo	eb7036bf87	Add QA tab to archived item detail (#1590 ) Adds tab with placeholders as a starting point to work off of. The badge and button is not currently linked up to any data or actions.	2024-03-12 14:05:16 -07:00
Henry Wilkinson	16e8b761c0	Frontend: Various icon updates (#1569 ) Closes #1568 ## Changes - Status icons are now filled! - Uses Bootstrap Icons' new `copy` icon for all actions involving copying to clipboard! - Finally! A real copy icon! 🎉 - Removes `copy-code.svg` as it is no longer used - Actions involving duplicating objects still use `files`... Which is good! Now they have distinct symbols! - Adds orange to the tailwind colour palette --------- Co-authored-by: sua yoo <sua@webrecorder.org>	2024-03-12 15:18:10 -04:00
sua yoo	9f312c075e	Manually approve pages in QA review (#1576 ) - Automatically update view to first page if page ID isn't specified - Show current page URL in location bar (resolves https://github.com/webrecorder/browsertrix-cloud/issues/1495) - Approve, reject, or leave notes on a page - Display temporary list of links to pages in the sidebar	2024-03-12 10:08:51 -07:00
Henry Wilkinson	8ba29ca776	Browsertrix Cloud → Browsertrix text rename (#1466 ) Part of #1241 ### Changes - Renames all instances of "Browsertrix Cloud" to "Browsertrix" on the front end, emails, and documentation --------- Co-authored-by: emma <hi@emma.cafe>	2024-03-12 11:30:05 -04:00
Ilya Kreymer	08f6847194	Configurable Max Scale for frontend (#1557 ) Allow maximum scale option to be fully configurable via `max_crawl_scale`. Already configurable on the backend, and now exposed to the frontend via API `/api/settings` `maxCrawlScale` value. The workflow editor and workflow details are updated to allow selecting the scale up to the maxCrawlScale setting (which defaults to 3 if not set).	2024-03-11 16:21:20 -07:00
Emma Segal-Grossman	8462c08206	Fix a couple linting issues (#1565 )	2024-03-11 16:20:37 -07:00
sua yoo	548261e663	Fix shoelace icon loading (#1587 ) Loads `sl-icon` synchronously to get correct base path when running webpack-dev-server.	2024-03-11 13:38:58 -07:00
dependabot[bot]	a5521c6866	Bump cryptography from 41.0.1 to 42.0.4 in /ansible (#1574 ) Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.1 to 42.0.4. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-06 16:24:36 -08:00
Ilya Kreymer	ea494fa6e6	Merge V1.9.3 changes into main (#1583 ) - Fix execution time checking by keeping lastUpdatedTime in db by @ikreymer in https://github.com/webrecorder/browsertrix-cloud/pull/1573 - disable postcss-lit for var css - Prevent closing tooltips from closing collection share dialog by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1579 - Fix pending exclusion pagination by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1578 - Fix regex escape in exclusion editor text match by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1577 --------- Co-authored-by: emma <hi@emma.cafe> Co-authored-by: sua yoo <sua@webrecorder.org>	2024-03-06 15:38:22 -08:00
Tessa Walsh	c20e754269	Add updatable QA reviewStatus field to crawls (#1575 ) Fixes #1539 Adds `reviewStatus` field to `BaseCrawl` model, updatable via the crawl update API endpoint. Acceptable values are "good", "acceptable" or "failure", enforced by an Enum. Added to `BaseCrawl` so that we can extend support to uploads more easily later on, but for now we'll only display this for crawls in the frontend.	2024-03-05 16:49:23 -08:00
Emma Segal-Grossman	780dd09321	Create `ArchivedItemPage` and `ArchivedItemPageComment` types (#1567 ) Based on #1534 Figured this should be in place so we can work on other front-end things with these, rather than dealing with refactoring later <!-- Fixes #issue_number --> ### Changes - Adds `ArchivedItemPage` and `ArchivedItemPageComment` types from #1534 (thank you @SuaYoo!) - Adds typedefs for match and resource count properties - sets properties optional in the db schema to optional in the type as well ### Manual testing 1. ### Screenshots \| Page \| Image/video \| \| ---- \| ----------- \| \| \| \| <!-- ### Follow-ups -->	2024-03-04 18:52:09 -05:00
Tessa Walsh	ec0db1c323	Temporarily remove pages migration (#1572 ) Removing until we have a better tested solution, including to avoid testing of QA runs for new crawls in beta.	2024-03-04 10:30:04 -08:00
Tessa Walsh	144000c7a3	Add guide for customizing Helm chart values (#1556 ) Fixes #1555 This is a first pass at some of the configuration options within the Helm chart that might be most applicable to users. Emphasis is placed on configuration that's particular to our application, such as storage and crawler channels. --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-03-04 12:03:11 -05:00
Ilya Kreymer	09a0d51843	pages: set page status to 200 if unset and loadState != 0 (#1563 ) Follow up to #1516, ensure page status is set to 200 if no status is provided, if loadState is not 0	2024-02-29 15:15:17 -08:00
Ilya Kreymer	2ac6584942	Refactor operator class into module (#1564 ) The operator class has gotten fairly large, this is a first pass in refactoring operator.py into a submodule instead, with multiple operator instances which handle different types of objects. - The main k8s interface has been split into K8sOpApi which extends K8sApi and is shared across all operators. - Each operator extends BaseOperator which also has an instance of K8sOpApi - The CrawlOperator is still the bulk of the functionality, but will likely be further refactored to support QA jobs --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-02-29 14:40:12 -08:00
Tessa Walsh	da19691184	Add crawl errors incrementally during crawl (#1561 ) Fixes #1558 - Adds crawl errors to database incrementally during crawl rather than after crawl completes - Simplifies crawl /errors API endpoint to always return errors from database	2024-02-29 09:16:34 -08:00
Ilya Kreymer	804f755787	Increase startup probe time to account for long-running migrations (#1560 ) - increases the failureThreshold for startupProbe for the api backend container to account for long running migrations, upto 300 seconds - add `/healthzStartup` which checks if db is ready - bump - keeps `/healthz` to always return 200 when running - increases livenessProbe failureThreshold to be higher than readiness probe, following recommended best practice of liveness probe > readiness probe - fixes #1559	2024-02-28 14:22:33 -08:00
Tessa Walsh	14189b7cfb	Add crawl pages and related API endpoints (#1516 ) Fixes #1502 - Adds pages to database as they get added to Redis during crawl - Adds migration to add pages to database for older crawls from pages.jsonl and extraPages.jsonl files in WACZ - Adds GET, list GET, and PATCH update endpoints for pages - Adds POST (add), PATCH, and POST (delete) endpoints for page notes, each with their own id, timestamp, and user info in addition to text - Adds page_ops methods for 1. adding resources/urls to page, and 2. adding automated heuristics and supplemental info (mime, type, etc.) to page (for use in crawl QA job) - Modifies `Migration` class to accept kwargs so that we can pass in ops classes as needed for migrations - Deletes WACZ files and pages from database for failed crawls during crawl_finished process - Deletes crawl pages when a crawl is deleted Note: Requires a crawler version 1.0.0 beta3 or later, with support for `--writePagesToRedis` to populate pages at crawl completion. Beta 4 is configured in the test chart, which should be upgraded to stable 1.0.0 when it's released. Connected to https://github.com/webrecorder/browsertrix-crawler/pull/464 --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-02-28 12:11:35 -05:00
sua yoo	974b919eef	docs: remove reference to prod	2024-02-26 13:26:47 -08:00
sua yoo	86a816662e	add api reference section	2024-02-26 12:58:21 -08:00
Emma Segal-Grossman	f6e82d9335	Archived item nav button quickfix (#1543 ) Navigation buttons weren't being laid out properly and were overflowing in unintentional ways, this fixes that, and then also updates navigation buttons & puts them into use everywhere elements service the purpose of navigation buttons were used instead! <img width="452" alt="Screenshot 2024-02-24 at 10 37 41 PM" src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/a77ed1be-3f95-4e03-a4d8-e3740229621e"> <img width="519" alt="Screenshot 2024-02-24 at 10 38 06 PM" src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/684bc9a4-bec2-4258-b264-662dc441e75f"> <img width="273" alt="Screenshot 2024-02-24 at 10 38 20 PM" src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/863d9d9a-121e-4682-8c12-eaf94ae69c7c"> <img width="410" alt="Screenshot 2024-02-24 at 10 38 25 PM" src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/b321375c-d063-4c00-b876-36a592c85a35"> <img width="200" alt="Screenshot 2024-02-24 at 10 38 37 PM" src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/62bbb5d1-d4f3-4ba3-8cd5-035242424f3a">	2024-02-25 02:04:53 -05:00
Ilya Kreymer	ae59617e02	ci fix: deploy-dev.yaml fix, install poetry earlier, add decrypt values to sparse checkout	2024-02-23 18:40:36 -08:00
Ilya Kreymer	5e003f36a0	ci: also publish helm chart for *-release branches	2024-02-22 23:54:23 -08:00
Tessa Walsh	fa35d8994f	Disable useSitemap by default in new workflows (#1541 )	2024-02-22 23:54:23 -08:00

1 2 3 4 5 ...

1070 Commits