Make HTTP cache persistent across sessions (#936)

* use pytest-cache to cache HTTP requests, so that it survives across multiple runs
* add an option to clear the HTTP cache
* add some docs about pytest
2026-03-10 10:00:08 -04:00 · 2022-11-10 15:01:27 +01:00
parent 06a5a54103
commit 4c8443fd00
4 changed files with 267 additions and 9 deletions
--- a/docs/development/developing.md
+++ b/docs/development/developing.md
@@ -52,3 +52,152 @@ make tests
 Sometimes you might be asked to rebase main into your branch. Please refer to this [section on git rebase from GitHub docs](https://docs.github.com/en/get-started/using-git/about-git-rebase).

 If you need help with anything, feel free to reach out and ask for help!
+
+
+## pytest quick guide
+
+We make heavy use of `pytest`. Here is a quick guide and a collection of
+useful options:
+
+- To run all tests in the current directory and subdirectories: `pytest`
+
+- To run tests in a specific directory or file: `pytest path/to/dir/test_foo.py`
+
+- `-s`: disables output capturing
+
+- `--pdb`: in case of exception, enter a `(Pdb)` prompt so that you can
+  inspect what went wrong.
+
+- `-v`: verbose mode
+
+- `-x`: stop the execution as soon as one test fails
+
+- `-k foo`: run only the tests whose full name contains `foo`
+
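+- `-k 'foo and bar'`: run only the tests whose name contains both `foo` and `bar`
+
+- `-k 'foo and not bar'`: run only the tests whose name contains `foo` but not `bar`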
+
+
+## Running integration tests under pytest
+
+`make test` is useful for running all the tests, but during development it is
+useful to have more control over how tests are run. The following guide assumes
+that you are in the directory `pyscriptjs/tests/integration/`.
+
+#### To run all the integration tests, single- or multi-core
+
+```
+$ pytest -xv
+...
+
+test_00_support.py::TestSupport::test_basic[chromium] PASSED                                              [  0%]
+test_00_support.py::TestSupport::test_console[chromium] PASSED                                            [  1%]
+test_00_support.py::TestSupport::test_check_js_errors_simple[chromium] PASSED                             [  2%]
+test_00_support.py::TestSupport::test_check_js_errors_expected[chromium] PASSED                           [  3%]
+test_00_support.py::TestSupport::test_check_js_errors_expected_but_didnt_raise[chromium] PASSED           [  4%]
+test_00_support.py::TestSupport::test_check_js_errors_multiple[chromium] PASSED                           [  5%]
+...
+```
+
+`-x` means "stop at the first failure". `-v` means "verbose", so that you can
+see all the test names one by one. We try to keep tests in a reasonable order,
+from most basic to most complex. This way, if you introduce a bug in something
+very basic, you will notice immediately.
+
+If you have the `pytest-xdist` plugin installed, you can run all the
+integration tests on 4 cores in parallel:
+```
+$ pytest -n 4
+```
+
+#### To run a single test, headless
+```
+$ pytest test_01_basic.py -k test_pyscript_hello -s
+...
+[  0.00 page.goto       ] pyscript_hello.html
+[  0.01 request         ] 200 - fake_server - http://fake_server/pyscript_hello.html
+...
+[  0.17 console.info    ] [py-loader] Downloading pyodide-0.21.3...
+[  0.18 request         ] 200 - CACHED - https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide.js
+...
+[  3.59 console.info    ] [pyscript/main] PyScript page fully initialized
+[  3.60 console.log     ] hello pyscript
+```
+
+`-k` selects tests by pattern matching as described above. `-s` instructs
+`pytest` to show the output in the terminal instead of capturing it. In the
+output you can see various useful things, including network requests and JS
+console messages.
+
+#### To run a single test, headed
+```
+$ pytest test_01_basic.py -k test_pyscript_hello -s --headed
+...
+```
+
+Same as above, but with `--headed` the browser is shown in a window, and you
+can interact with it. The browser uses a fake server, which means that HTTP
+requests are cached.
+
+Unfortunately, in this mode source maps do not seem to work, and you cannot
+debug the original TypeScript source code. This seems to be a bug in
+Playwright, for which we have a workaround:
+
+```
+$ pytest test_01_basic.py -k test_pyscript_hello -s --headed --no-fake-server
+...
+```
+
+As the name implies, `--no-fake-server` disables the fake server: HTTP requests
+are not cached, but source-level debugging works.
+
+Finally:
+
+```
+$ pytest test_01_basic.py -k test_pyscript_hello -s --dev
+...
+```
+
+`--dev` implies `--headed --no-fake-server`. In addition, it automatically
+opens the Chrome DevTools panel.
+
+
+#### Fake server, HTTP cache
+
+By default, our test machinery uses a Playwright router which intercepts and
+caches HTTP requests, so that, for example, you don't have to download Pyodide
+again and again. This also makes it possible to run tests in parallel on
+multiple cores.
+
+The cache is stored using the `pytest-cache` plugin (nowadays built into
+`pytest` and exposed as `config.cache`), which means that it survives across sessions.
+
+If you want to temporarily disable the cache, the easiest thing is to use
+`--no-fake-server`, which bypasses it completely.
+
+If you want to clear the cache, you can use the special option
+`--clear-http-cache`:
+
+```
+$ pytest --clear-http-cache
+...
+-------------------- SmartRouter HTTP cache --------------------
+Requests found in the cache:
+     https://raw.githubusercontent.com/pyscript/pyscript/main/README.md
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/repodata.json
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide.asm.js
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/micropip-0.1-py3-none-any.whl
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide.asm.data
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide.js
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide.asm.wasm
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyodide_py.tar
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/pyparsing-3.0.9-py3-none-any.whl
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/distutils.tar
+     https://cdn.jsdelivr.net/pyodide/v0.21.3/full/packaging-21.3-py3-none-any.whl
+Cache cleared
+```
+
+**NOTE**: this works only if you are inside `tests/integration`, or if you
+explicitly specify `tests/integration` on the command line. This is due to
+how `pytest` decides to search for and load the various `conftest.py` files.
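The persistence described above comes from storing each cached response as a file on disk rather than in memory. Below is a minimal sketch of that idea, assuming a POSIX filesystem. `DirCache` is a hypothetical toy, not pytest's implementation: it mimics only the `get(key, default)` / `set(key, value)` calls that SmartRouter makes on `config.cache`, plus the `v/<key>` directory layout that `conftest.py` inspects.

```python
import json
import tempfile
from pathlib import Path


class DirCache:
    """Hypothetical toy cache: one JSON file per key under ``root/v``.

    Mimics only the get/set calls SmartRouter makes on pytest's
    ``config.cache``. Values live on disk, so a new instance pointed at
    the same directory (a "new session") still sees them.
    """

    def __init__(self, root):
        self.values_dir = Path(root) / "v"

    def _path(self, key):
        # pathlib collapses "//" into "/", so the key
        # "pyscript/https://host/x" is stored as "pyscript/https:/host/x"
        return self.values_dir / key

    def get(self, key, default):
        try:
            return json.loads(self._path(key).read_text())
        except (OSError, ValueError):
            # missing or unreadable entry: behave like a cache miss
            return default

    def set(self, key, value):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(value))


with tempfile.TemporaryDirectory() as tmp:
    key = "pyscript/https://cdn.example.org/pyodide.js"
    DirCache(tmp).set(key, {"status": 200})  # first "session"
    print(DirCache(tmp).get(key, None))  # second "session": {'status': 200}
    files = [p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()]
    print(files)  # ['v/pyscript/https:/cdn.example.org/pyodide.js']
```

Note how the `//` of the URL collapses into a single `/` on disk; that is why the `--clear-http-cache` listing has to turn `https:/...` back into `https://...`.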
--- a/pyscriptjs/tests/integration/conftest.py
+++ b/pyscriptjs/tests/integration/conftest.py
@@ -1,3 +1,4 @@
+import shutil
 import threading
 from http.server import HTTPServer as SuperHTTPServer
 from http.server import SimpleHTTPRequestHandler
@@ -7,6 +8,44 @@ import pytest
 from .support import Logger


+def pytest_cmdline_main(config):
+    """
+    If we pass --clear-http-cache, we don't enter the main pytest logic, but
+    use our custom main instead
+    """
+
+    def mymain(config, session):
+        print()
+        print("-" * 20, "SmartRouter HTTP cache", "-" * 20)
+        # unfortunately pytest-cache doesn't offer a public API to selectively
+        # clear the cache, so we need to peek at its internals. The good news
+        # is that pytest-cache is very old, stable and robust, so this is
+        # unlikely to break anytime soon.
+        cache = config.cache
+        base = cache._cachedir.joinpath(cache._CACHE_PREFIX_VALUES, "pyscript")
+        if not base.exists():
+            print("No cache found, nothing to do")
+            return 0
+        #
+        print("Requests found in the cache:")
+        for f in base.rglob("*"):
+            if f.is_file():
+                # requests are saved in dirs named pyscript/http:/foo/bar, let's turn
+                # them into a proper url
+                url = str(f.relative_to(base))
+                url = url.replace(":/", "://")
+                print("    ", url)
+        shutil.rmtree(base)
+        print("Cache cleared")
+        return 0
+
+    if config.option.clear_http_cache:
+        from _pytest.main import wrap_session
+
+        return wrap_session(config, mymain)
+    return None
+
+
 def pytest_configure(config):
    """
    THIS IS A WORKAROUND FOR A pytest QUIRK!
@@ -61,6 +100,11 @@ def pytest_addoption(parser):
        action="store_true",
        help="Automatically open a devtools panel. Implies --headed and --no-fake-server",
    )
+    parser.addoption(
+        "--clear-http-cache",
+        action="store_true",
+        help="Clear the cache of HTTP requests for SmartRouter",
+    )


@pytest.fixture(scope="session")
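The listing step of `--clear-http-cache` can be exercised on its own, without pytest. `list_and_clear` below is a hypothetical helper that mirrors the loop in `mymain`, with two small deviations: it sorts the paths for deterministic output, and it replaces only the first `:/`. It assumes a POSIX filesystem, since `:` cannot appear in Windows file names.

```python
import shutil
import tempfile
from pathlib import Path


def list_and_clear(base):
    """Hypothetical helper mirroring the loop in ``mymain``.

    Returns the URLs cached under ``base`` (sorted) and deletes ``base``.
    Returns an empty list if ``base`` does not exist.
    """
    base = Path(base)
    if not base.exists():
        return []
    urls = []
    for f in sorted(base.rglob("*")):
        if f.is_file():
            # entries live at base/https:/host/path because pathlib collapsed
            # the "//" when they were written; restore it
            rel = f.relative_to(base).as_posix()
            urls.append(rel.replace(":/", "://", 1))
    shutil.rmtree(base)
    return urls


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "v" / "pyscript"
    for rel in ["https:/cdn.example.org/b.wasm", "https:/cdn.example.org/a.js"]:
        entry = base / rel
        entry.parent.mkdir(parents=True, exist_ok=True)
        entry.write_text("null")
    print(list_and_clear(base))
    # ['https://cdn.example.org/a.js', 'https://cdn.example.org/b.wasm']
    print(base.exists())  # False
```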
--- a/pyscriptjs/tests/integration/support.py
+++ b/pyscriptjs/tests/integration/support.py
@@ -1,3 +1,4 @@
+import dataclasses
 import pdb
 import re
 import sys
@@ -75,11 +76,15 @@ class PyScriptTest:
            # use a real HTTP server. Note that as soon as we request the
            # fixture, the server automatically starts in its own thread.
            self.http_server = request.getfixturevalue("http_server")
+            self.router = None
        else:
            # use the internal playwright routing
            self.http_server = "http://fake_server"
            self.router = SmartRouter(
-                "fake_server", logger=logger, usepdb=request.config.option.usepdb
+                "fake_server",
+                cache=request.config.cache,
+                logger=logger,
+                usepdb=request.config.option.usepdb,
            )
            self.router.install(page)
        #
@@ -520,19 +525,23 @@ class SmartRouter:
        headers: dict
        body: str

-    # NOTE: this is a class attribute, which means that the cache is
-    # automatically shared between all instances of Fake_Server (and thus all
-    # tests of the pytest session)
-    _cache = {}
+        def asdict(self):
+            return dataclasses.asdict(self)

-    def __init__(self, fake_server, *, logger, usepdb=False):
+        @classmethod
+        def fromdict(cls, d):
+            return cls(**d)
+
+    def __init__(self, fake_server, *, cache, logger, usepdb=False):
        """
        fake_server: the domain name of the fake server
        """
        self.fake_server = fake_server
+        self.cache = cache  # this is pytest-cache, it survives across sessions
        self.logger = logger
        self.usepdb = usepdb
        self.page = None
+        self.requests = []  # (status, kind, url)

    def install(self, page):
        """
@@ -569,6 +578,7 @@ class SmartRouter:
            route.abort()

    def log_request(self, status, kind, url):
+        self.requests.append((status, kind, url))
        color = "blue" if status == 200 else "red"
        self.logger.log("request", f"{status} - {kind} - {url}", color=color)

@@ -587,17 +597,38 @@ class SmartRouter:
            return

        # network requests might be cached
-        if full_url in self._cache:
+        resp = self.fetch_from_cache(full_url)
+        if resp is not None:
            kind = "CACHED"
-            resp = self._cache[full_url]
        else:
            kind = "NETWORK"
            resp = self.fetch_from_network(route.request)
-            self._cache[full_url] = resp
+            self.save_resp_to_cache(full_url, resp)

        self.log_request(resp.status, kind, full_url)
        route.fulfill(status=resp.status, headers=resp.headers, body=resp.body)

+    def clear_cache(self, url):
+        key = "pyscript/" + url
+        self.cache.set(key, None)
+
+    def save_resp_to_cache(self, url, resp):
+        key = "pyscript/" + url
+        data = resp.asdict()
+        # cache.set encodes it as JSON, and "bytes" are not supported: let's
+        # encode them as latin-1
+        data["body"] = data["body"].decode("latin-1")
+        self.cache.set(key, data)
+
+    def fetch_from_cache(self, url):
+        key = "pyscript/" + url
+        data = self.cache.get(key, None)
+        if data is None:
+            return None
+        # see the corresponding comment in save_resp_to_cache
+        data["body"] = data["body"].encode("latin-1")
+        return self.CachedResponse.fromdict(data)
+
    def fetch_from_network(self, request):
        # sometimes the network is flaky and if the first request doesn't
        # work, a subsequent one works. Instead of giving up immediately,
--- a/pyscriptjs/tests/integration/test_00_support.py
+++ b/pyscriptjs/tests/integration/test_00_support.py
@@ -355,3 +355,37 @@ class TestSupport(PyScriptTest):
        assert divs.count() == 3
        texts = [el.inner_text() for el in self.iter_locator(divs)]
        assert texts == ["foo", "bar", "baz"]
+
+    def test_smartrouter_cache(self):
+        if self.router is None:
+            pytest.skip("Cannot test SmartRouter with --no-fake-server (or --dev)")
+
+        # this is not an image but who cares, I just want the browser to make
+        # an HTTP request
+        URL = "https://raw.githubusercontent.com/pyscript/pyscript/main/README.md"
+        doc = f"""
+        <html>
+          <body>
+              <img src="{URL}">
+          </body>
+        </html>
+        """
+        self.writefile("mytest.html", doc)
+        #
+        self.router.clear_cache(URL)
+        self.goto("mytest.html")
+        assert self.router.requests == [
+            (200, "fake_server", "http://fake_server/mytest.html"),
+            (200, "NETWORK", URL),
+        ]
+        #
+        # let's visit the page again, now it should be cached
+        self.goto("mytest.html")
+        assert self.router.requests == [
+            # 1st visit
+            (200, "fake_server", "http://fake_server/mytest.html"),
+            (200, "NETWORK", URL),
+            # 2nd visit
+            (200, "fake_server", "http://fake_server/mytest.html"),
+            (200, "CACHED", URL),
+        ]
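`test_smartrouter_cache` checks the cache-aside flow (look up, fall back to the network, store) through the `(status, kind, url)` log. Below is a minimal in-memory sketch of that flow that runs without a browser or network access. `TinyRouter` and `fake_fetch` are hypothetical names, and a plain `dict` stands in for the persistent pytest cache.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TinyRouter:
    """Hypothetical in-memory reduction of SmartRouter's caching logic."""

    fetch: Callable[[str], tuple]  # stands in for fetch_from_network
    cache: dict = field(default_factory=dict)  # stands in for pytest's cache
    requests: list = field(default_factory=list)  # (status, kind, url)

    def handle(self, url):
        resp = self.cache.get(url)
        if resp is not None:
            kind = "CACHED"
        else:
            kind = "NETWORK"
            resp = self.fetch(url)
            self.cache[url] = resp
        status, body = resp
        self.requests.append((status, kind, url))
        return body

    def clear_cache(self, url):
        # like SmartRouter.clear_cache: overwrite the entry with None
        self.cache[url] = None


calls = []


def fake_fetch(url):
    calls.append(url)  # count real "network" hits
    return 200, b"# README"


router = TinyRouter(fetch=fake_fetch)
url = "https://example.org/README.md"
router.clear_cache(url)
router.handle(url)  # miss: goes to fake_fetch
router.handle(url)  # hit: served from the cache
print([kind for _, kind, _ in router.requests])  # ['NETWORK', 'CACHED']
print(len(calls))  # 1
```

Logging tuples rather than just counting hits lets a test assert on the exact order of requests, which is what `test_smartrouter_cache` does with `self.router.requests`.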