Chantal, Aurélien Pierre's AI assistant

Aurélien Pierre's assistant, aggregating manually-curated resources on photography, color science, cameras, film stocks, Ansel or Darktable, in French and in English. (Sources)

Your request has been added to the queue and is processing. Requests are allowed 60 s to complete and will be stopped after this time whether or not they succeeded.

Search query

You can use keywords in French, in English, or even a mix of both. This is not a chatbot, so you don't need to form full sentences. The more keywords you add, the more accurate the context you create. Keywords must be separated with spaces.

The AI strips suffixes from words and keeps only their stems, which allows generalizing French and English into a made-up language: Dumbrish. Don't be surprised if the suggested keywords and the topics look weird.
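To give a feel for what suffix stripping does, here is a toy sketch. It is not the stemmer Chantal uses (that one is not described on this page): the function names and the suffix list are made up for illustration.

```python
# Toy suffix stripper: NOT Chantal's actual stemmer, only an illustration.
# The suffix list is an assumption covering a few French/English endings.
SUFFIXES = ("ations", "ation", "ments", "ment", "ness", "ing", "es", "s", "e")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_query(query: str) -> list[str]:
    """Split a space-separated query and stem each keyword."""
    return [naive_stem(token) for token in query.split()]


print(stem_query("saturation saturations lightness"))
```

With this toy rule set, "saturation" and "saturations" collapse to the same stem, which is why suggested keywords can look truncated.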

Search by

You can combine 2 methods of searching:

  1. fuzzy keywords: the most general and beginner-friendly, best used if you don't know exactly what to look for. It relies on an artificial intelligence aware of context, synonyms and words commonly used together (more details).
  2. patterns & exact keywords: optionally filter the results from the AI with a grep-like search supporting regular expressions (Python-flavoured) and specific keywords.
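As an illustration of the second method, the sketch below filters a list of results with a Python regular expression. The result structure, the URLs and the `grep_filter` helper are hypothetical; only the `re` module calls are standard Python.

```python
import re


def grep_filter(results, pattern, ignore_case=True):
    """Keep only the results whose text matches the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [r for r in results if regex.search(r["text"])]


# Hypothetical results, shaped as a fuzzy search might return them.
results = [
    {"url": "https://example.com/a", "text": "Filmic RGB tone mapping"},
    {"url": "https://example.com/b", "text": "White balance with a color chart"},
    {"url": "https://example.com/c", "text": "filmic v6 highlights reconstruction"},
]

# Match "filmic" followed by "rgb" or by a version tag such as "v6".
filtered = grep_filter(results, r"filmic\s+(rgb|v\d+)")
print([r["url"] for r in filtered])
```

The fuzzy search casts a wide net, and the pattern then narrows it down to pages containing the exact wording you care about.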

Options

[1] You can optionally include developer resources regarding Darktable, Ansel and Rawspeed, from Github: issues (bug reports, feature requests), pull requests, commit messages and wikis. For non-developers, this will be useful to look for known issues and caveats.

[2] You can optionally include user forums (discuss.pixls.us, forums.darktable.fr). I have answered many questions on both between 2016 and 2022. They tend to be polluted with unsupported claims and wrong statements coming from color-blind IT guys with too much free time, so you need to exercise caution with the information you find there.

Features

You can click on associated keywords to add them directly to the search.

PDF documents are indexed by section if they contain an outline: in that case, links to PDF documents already point to the relevant page (compatible with Chrome/Chromium PDF reader).

This web application has taken 10 weeks of work (on top of previous work), around 420 man-hours and dozens of hours of burning CPU cycles. You are welcome to donate.

More information

This is Chantal v1.0, it indexes 35682 pages and knows 99183 words.

Why Chantal?

Chantal solves the problem of scattered information and the lack of identified, authoritative resources in open-source software projects that rely on real-world phenomena and on the scientific theories describing them. This problem triggers recurring questions from users, which come back in waves on forums and in my emails. Repeating the same information again and again wastes many man-hours, as well as server storage and bandwidth, because new posts duplicate content that was already posted but never found again.

As image processing is a craft at the crossroads of art, technology and science, it relies on theories and concepts that have precise names, defined by scientific authorities who should be given more credit than random search-engine-optimized blogs. Understanding the background theory makes using image processing tools a lot less frustrating and a lot more predictable. But users who lack knowledge of the jargon have no entry point into information systems using keyword-based retrieval methods, which is what blogs, websites and forums use internally for performance.

On the other hand, photography (and thus Google's index) is polluted by lots of blogs and video channels from people who think they understand color theory because they have been using HSL color wheels in some software for some amount of time. This contributes to spreading false knowledge among users and creates a self-feeding loop of time wasting. Things become actively damaging when false knowledge spreads to software developers who use it to support their application design, which has been seen time and time again, especially in the open-source world.

What makes Chantal AI "intelligent"?

  1. The underlying language model knows synonyms and related words: searching for brightness will yield results that are not too far from searching for lightness. This is useful when diving into technical fields without knowing the exact technical slang and to expand your research to similar topics past the anecdotal keywords of your query.
  2. The language model knows physical units and computer science: searching for "/usr/lib64/file.so missing" will yield results for _PATH_ _BINARYFILE_ missing, searching for "50mm f/1.8" will yield results for _DISTANCE_ _APERTURE_. This helps generalize queries and increases the probability of finding relevant pages, even where the exact keywords don't match. Of course, when you know exactly what you are looking for but don't know where to find it, this smart-ass second-guessing can really get in your way, which is why you also have an additional grep-like search.
  3. The language model uses context to infer topics, so the more keywords you give it, the more likely it is to identify the right topic of your query.
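The placeholder substitution described in point 2 can be sketched with a few regular expressions. The rules below are hypothetical and far cruder than whatever Chantal actually does; they only reproduce the two examples given above.

```python
import re

# Hypothetical normalization rules, applied in order. Chantal's real
# patterns are not documented here; these only cover the two examples.
RULES = [
    # A path ending in a binary file, e.g. /usr/lib64/file.so
    (re.compile(r"(?:/[\w.-]+)*/[\w.-]+\.(?:so|dll|exe)\b"), "_PATH_ _BINARYFILE_"),
    # An aperture, e.g. f/1.8
    (re.compile(r"\bf/\d+(?:\.\d+)?\b"), "_APERTURE_"),
    # A distance in millimetres, e.g. 50mm or 50 mm
    (re.compile(r"\b\d+(?:\.\d+)?\s?mm\b"), "_DISTANCE_"),
]


def generalize(query: str) -> str:
    """Replace specific values in a query with generic placeholder tokens."""
    for regex, placeholder in RULES:
        query = regex.sub(placeholder, query)
    return query


print(generalize("/usr/lib64/file.so missing"))
print(generalize("50mm f/1.8"))
```

Once specific values are replaced by generic tokens, a query about one missing library can match pages discussing any other missing library.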

What makes Chantal AI "eco-friendly"?

The underlying AI language model is a 400×10000 matrix that weighs about 150 MB and doesn't need more because it is specialized on image-processing topics. It uses an energy-efficient algorithm (Word2Vec, 2013) that turns a search into a matrix dot product. It is trained on around 300k documents containing around 82 million words, for a total disk space of 800 MB (compressed) and a peak RAM use of 28 GB during training. Training takes around 6 h on an Intel Xeon laptop. At runtime, searching the whole index of 78k pages needs 3.2 GB of RAM and takes about 0.3 to 0.5 s on a desktop computer.
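To illustrate how a search can boil down to dot products, here is a minimal sketch in pure Python. The 3-dimensional word vectors, the page names and the averaging scheme are all made up for the example; the actual model and scoring used by Chantal may differ.

```python
import math

# Made-up 3-dimensional word vectors (the real model is said to use 400).
WORD_VECTORS = {
    "brightness": [0.9, 0.1, 0.0],
    "lightness": [0.8, 0.2, 0.1],
    "lens": [0.0, 0.1, 0.9],
    "aperture": [0.1, 0.0, 0.8],
}


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def embed(words):
    """Average the vectors of the known words, then scale to unit length."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    mean = [sum(column) / len(known) for column in zip(*known)]
    return normalize(mean)


def rank(query, pages):
    """Score pages by the dot product of unit vectors (cosine similarity)."""
    q = embed(query.split())
    scores = {
        name: sum(a * b for a, b in zip(q, embed(text.split())))
        for name, text in pages.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pages, reduced to a few keywords each.
pages = {"tone-page": "lightness", "optics-page": "lens aperture"}
print(rank("brightness", pages))
```

The query "brightness" ranks the page about "lightness" first although the two words differ, because their vectors point in nearly the same direction.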

While this is not the yesteryear state-of-the-art in language processing, the accuracy vs. computational power ratio is very good and lets the search engine run on low-end hardware with reasonable runtimes. The current web service runs on a shared hosting environment (CPanel/Linux/Apache) typically in 1 to 3 s, along with a multi-site WordPress installation and a forum CMS. No GPU is involved.

The current fashion in AI is to run very heavy algorithms (transformers) on GPU farms, and to provide the general public with nice web interfaces allowing them to abuse the service for their own amusement. For information retrieval tasks, these methods are typically 3-6 % more accurate than the one used here, but are at the (very) least 500 times more computationally expensive at runtime, while needing much more training data.

Users should be made aware of the carbon footprint of the cloud-based services they use. Technology is worthless if it solves a problem by creating another one, and increasing electricity consumption is not going to make the world any better.

What makes Chantal better than Google?

Chantal indexes only manually-curated sources regarding color science and image processing: color theory authorities (CIE, ICC, ACES), researchers (Hunt, Fairchild, Kirk, Poynton), scientific journals (IPOL, Études photographiques), manufacturers (Kodak, Ilford, Nikon), software documentation (Darktable, Ansel, ArgyllCMS) and Github repositories (for bugs and design decisions). This reduces noise in results and maximizes your probability of finding reliable information, especially since there will be no confusion about the meaning of words (technical jargon often reuses common language with a different meaning, and Google is easily confused there).

Chantal also allows you to grep your way through all the indexed documents (including PDF), even with regex. Google still doesn't allow that in 2023. Granted, bad regex can bring servers down, and we can afford it here only because the size of the index is rather small.

How is Chantal made?

Chantal is the server-side implementation and demo of Virtual Secretary, a Python framework developed by Aurélien Pierre to help cope with information overload coming from the sheer amount of emails, notifications and things to check that are the daily life of your typical office in the 2020s. You can use Virtual Secretary to develop your own bots and automated workflows involving emails, contacts and knowledge corpora (HTML and PDF).

Your fingerprint is f412129be42a00b0df22de0b04ad1db4a098db05d15411d4a49e76250ba9b50e

This cryptographic hash is your unique identifier and is used to throttle users trying to process large batches of queries. You are allowed no more than one request every 3 s.

Your fingerprint is all we store about you, and it will be deleted 3 s after your last request.
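A throttle of this kind could look like the sketch below. Everything in it is an assumption: the page does not say which hash function or which inputs produce the fingerprint (SHA-256 over an IP address and a user agent is only a guess consistent with the 64 hexadecimal digits shown above), and the `Throttle` class is hypothetical.

```python
import hashlib
import time

MIN_INTERVAL = 3.0  # seconds between two requests, as stated above


class Throttle:
    """Hypothetical in-memory throttle keyed by a hashed fingerprint."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_seen = {}  # fingerprint -> time of last accepted request

    @staticmethod
    def fingerprint(ip: str, user_agent: str) -> str:
        # Assumption: neither the hash function nor its inputs are documented.
        data = f"{ip}|{user_agent}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def purge(self):
        """Forget every fingerprint whose last request is 3 s old or more."""
        now = self.clock()
        self.last_seen = {
            fp: t for fp, t in self.last_seen.items() if now - t < MIN_INTERVAL
        }

    def allow(self, fp: str) -> bool:
        """Accept a request only if the fingerprint is no longer stored."""
        self.purge()
        if fp in self.last_seen:
            return False
        self.last_seen[fp] = self.clock()
        return True


throttle = Throttle()
fp = Throttle.fingerprint("203.0.113.7", "ExampleBrowser/1.0")
print(throttle.allow(fp))  # first request is accepted
print(throttle.allow(fp))  # an immediate second request is refused
```

In this sketch, storing only the hash and purging it after 3 s means the same table serves both purposes: rate limiting and forgetting the visitor.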