Query Execution Model ##################### Introduction ============ The first datasource which was used with re:dash was Redshift. Because we had billions of records in Redshift, and some queries were costly to re-run, from the get go there was the idea of caching query results in re:dash. This was to relieve stress from the Redshift cluster and also to improve user experience. How queries get executed and cached in re:dash? =============================================== Server ------ To make sure each query is executed only once at any giving time, we translate the query to a ``query hash``, using the following code: .. code:: python COMMENTS_REGEX = re.compile("/\*.*?\*/") def gen_query_hash(sql): sql = COMMENTS_REGEX.sub("", sql) sql = "".join(sql.split()).lower() return hashlib.md5(sql.encode('utf-8')).hexdigest() When query execution is done, the result gets stored to ``query_results`` table. Also we check for all queries in the ``queries`` table that have the same query hash and update their reference to the query result we just saved (`code `__). Client ------ The client (UI) will execute queries in two scenarios: 1. (automatically) When opening a query page of a query that doesn't have a result yet. 2. (manually) When the user clicks on "Execute". In each case the client does a POST request to ``/api/query_results`` with the following parameters: ``query`` (the query text), ``data_source_id`` (data source to execute the query with) and ``ttl``. When loading a cached result, ``ttl`` will be the one set to the query (if it was set). This is a relic from previous versions, and I'm not sure if it's really used anymore, as usually we will fetch query result using its id. When loading a non cached result, ``ttl`` will be 0 which will "force" the server to execute the query. As a response to ``/api/query_results`` the server will send either the query results (in case of a cached query) or job id of the currently executing query. When job id received the client will start polling on this id, until a query result received (this is encapsulated in ``Query`` and ``QueryResult`` services). Ideas on how to implement query parameters ========================================== Client side only implementation ------------------------------- (This was actually implemented in. See pull request `#363 `__ for details.) The basic idea of how to implement parametized queries is to treat the query as a template and merge it with parameters taken from query string or UI (or both). When the caching facility isn't required (with queries that return in a reasonable time frame) the implementation can be completly client side and the backend can be "blind" to the parameters - it just receives the final query to execute and returns result. As one improvement over this, we can let the UI/user specify the TTL value when making the request to ``/api/query_results``, in which case caching will be availble too, while not having to make the server aware of the parameters. Hybrid ------ Another option, will be to store the list of possible parameters for a query, with their default/optional values. In such case, the server can prefetch all the options and cache them to provide faster results to the client.