Chantal, Aurélien Pierre's virtual assistant

Unified knowledge base on photography, color theory, image processing, cameras, film stocks and open-source imaging software, manually curated, indexed by AI.

Fuzzy keywords & topics

Restrict to language

Restrict to category

Restrict to content containing exactly…

Restrict to URLs containing exactly…

From date

Up to date

Include undated documents

AI ranking weight: 50%

Include Github developer info (Bugs reports, Pull requests, Wikis) ^[1]

Include user forums (information may be unreliable) ^[2]

This is Chantal v2.2. The AI language model knows 99790 words, in French and in English, and understands synonyms and some translations. The index includes 413822 manually-curated pages (sources). Details and advanced search options below.

Search query

You can use keywords in French, in English, or even a mix of both. You don't need to form full sentences, but the more keywords you add, the more accurate the context you create. Keywords must be separated with spaces.

The circled "+" icon left of the search entry lets you define additional content filters. There, the AI ranking weight defines how much influence the semantic AI search gets in the final page ranking.

0% disables AI entirely, so the search degrades into a simple keyword-based statistical ranking (BM25+ algorithm).
50% is an equal mix of semantic AI and keyword-based statistical ranking.
100% disables the keyword-based statistical ranking entirely, making it AI-only.

The semantic AI search behaves like a topic recognition feature based on the context created by your keywords. As you give it more weight, the global ranker behaves more like a recommendation algorithm based on content similar to your query, regardless of whether it contains your exact keywords, which is how queries are generalized to synonyms.
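The effect of the AI ranking weight can be pictured as a linear blend of two per-page scores. The sketch below is only an illustration of that idea, not Chantal's actual code: the function name `blend_scores` and the min-max normalization step are assumptions made here so that the two rankers produce comparable numbers.

```python
def blend_scores(keyword_scores, semantic_scores, ai_weight):
    """Mix keyword-based and semantic scores with a weight in [0, 1].

    ai_weight = 0.0 keeps only the keyword ranking (e.g. BM25+),
    ai_weight = 1.0 keeps only the semantic ranking.
    """
    if not 0.0 <= ai_weight <= 1.0:
        raise ValueError("ai_weight must be between 0 and 1")

    def normalize(scores):
        # Assumption: rescale each ranker to [0, 1] before mixing.
        low, high = min(scores), max(scores)
        if high == low:
            return [0.0 for _ in scores]
        return [(s - low) / (high - low) for s in scores]

    keyword = normalize(keyword_scores)
    semantic = normalize(semantic_scores)
    return [(1.0 - ai_weight) * k + ai_weight * s
            for k, s in zip(keyword, semantic)]


# Three pages scored by both rankers, mixed at the default 50 %.
mixed = blend_scores([1, 2, 3], [3, 2, 1], 0.5)
```

With these toy scores the two rankers disagree completely, so an equal mix gives every page the same final score, while 0 % or 100 % reproduces one ranker's order.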

Search by

You can combine 3 methods of searching:

fuzzy keywords and topics: mandatory, treats keywords as fuzzy topics and will target their synonyms too. It relies on a context-aware artificial intelligence (more details) and uses meta-tokens.
restrict to content containing exactly…: optional, narrows down the results from the fuzzy search to page content containing the exact string. Regex mode can be enabled by starting requests with \r\ and uses Python-flavoured regex.
restrict to URLs containing exactly…: optional, narrows down the results from the fuzzy search to URLs containing the exact string. Regex mode can be enabled by starting requests with \r\ and uses Python-flavoured regex.

In Regex mode, characters should be escaped according to Python regex syntax. For example, github.com restricts URLs to the Github domain by string search, while \r\github\.com\/.*\/.*\/issues restricts URLs to Github issues for all account/repository pairs by regex search.
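The two filter modes can be reproduced locally to test a request before submitting it. This is a minimal sketch, assuming the filter does a substring test in plain mode and an unanchored `re.search` in regex mode; the function name `url_matches` and the sample URLs are illustrative, not taken from Chantal.

```python
import re

# The literal three characters \r\ that switch on regex mode.
REGEX_PREFIX = "\\r\\"


def url_matches(url, request):
    """Return True if the URL passes the filter request."""
    if request.startswith(REGEX_PREFIX):
        # Regex mode: everything after the prefix is a Python pattern.
        pattern = request[len(REGEX_PREFIX):]
        return re.search(pattern, url) is not None
    # Plain mode: exact substring search.
    return request in url


issue_url = "https://github.com/darktable-org/darktable/issues/1234"
pull_url = "https://github.com/darktable-org/darktable/pull/1234"
issues_only = r"\r\github\.com\/.*\/.*\/issues"

print(url_matches(issue_url, "github.com"))  # plain string search
print(url_matches(issue_url, issues_only))   # regex search
print(url_matches(pull_url, issues_only))    # regex search, no match
```

The unescaped dot in a plain request is harmless because no pattern is compiled; in regex mode it must be written \. to match a literal dot.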

Options

^[1] You can optionally include developer resources regarding Darktable, Ansel, Rawspeed, Lensfun and Exiv2 from Github: issues (bug reports, feature requests), pull requests and wikis. For non-developers, this will be useful to look for known issues and caveats.

^[2] You can optionally include user forums (discuss.pixls.us, forums.darktable.fr). I answered many questions on both between 2016 and 2022. They tend to be polluted with unsupported claims and wrong statements coming from color-blind IT guys with too much free time, so you need to exercise caution with the information you find there.

Why Chantal?

Chantal solves the problem of scattered information and the lack of identified, authoritative resources in open-source software projects that rely on real-world phenomena and on the scientific theories describing them. This problem triggers recurring questions from users, which come back in waves on forums and in my emails. Answering them wastes many man-hours repeating the same information again and again, as well as server storage and bandwidth, because new posts duplicate content that was already posted but never found again.

As image processing is a craft at the crossroads of art, technology and science, it relies on theories and concepts that have precise names, defined by scientific authorities who should be given more credit than random search-engine-optimized blogs. Understanding the background theory makes using image processing tools a lot less frustrating and a lot more predictable. But users who don't know the jargon have no entry point into information systems that use keyword-based retrieval, which is what blogs, websites and forums use internally for performance.

On the other hand, photography (and thus the Google index) is polluted by lots of blogs and video channels from people who think they understand color theory because they have been using HSL color wheels in some software for some amount of time. This spreads false knowledge among users and creates a self-feeding loop of wasted time. Things become actively damaging when false knowledge spreads to software developers who use it to support their application design, which has been seen time and time again, especially in the open-source world.

What makes Chantal AI "intelligent"?

The underlying language model knows synonyms and related words: searching for brightness will yield results that are not too far from searching for lightness. This is useful when diving into technical fields without knowing the exact technical slang and to expand your research to similar topics past the anecdotal keywords of your query.
The language model knows physical units and computer science: searching for "/usr/lib64/file.so missing" will yield results for _PATH_ _BINARYFILE_ missing, and searching for "50mm f/1.8" will yield results for _DISTANCE_ _APERTURE_. This helps generalize queries and increases the probability of finding relevant pages, even where the exact keywords don't match. Of course, when you know exactly what you are looking for but don't know where to find it, this smart-ass second-guessing can really get in your way, which is why you also have an additional grep-like search.
The language model uses context to infer topics, so the more keywords you give it, the more likely it is to identify the right topic of your query.
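The meta-token substitution described above can be approximated with a few regular expressions. The sketch below is a guess at the general idea only: the patterns, the list of binary file extensions and the function name `to_meta_tokens` are all hypothetical, and Chantal's real tokenizer certainly covers many more cases.

```python
import re

# Hypothetical patterns for three kinds of meta-tokens.
PATH = re.compile(r"(/[\w.+-]+)+/?")
BINARY_FILE = re.compile(r"[\w.+-]+\.(so|dll|dylib)")
DISTANCE = re.compile(r"\d+(\.\d+)?(mm|cm|m)")
APERTURE = re.compile(r"f/\d+(\.\d+)?")


def to_meta_tokens(query):
    """Replace paths, distances and apertures with generic meta-tokens."""
    tokens = []
    for word in query.split():
        if PATH.fullmatch(word):
            # Split the directory part from the file name, if any.
            _, _, filename = word.rpartition("/")
            tokens.append("_PATH_")
            if BINARY_FILE.fullmatch(filename):
                tokens.append("_BINARYFILE_")
            elif filename:
                tokens.append(filename.lower())
        elif DISTANCE.fullmatch(word):
            tokens.append("_DISTANCE_")
        elif APERTURE.fullmatch(word):
            tokens.append("_APERTURE_")
        else:
            tokens.append(word.lower())
    return tokens


print(to_meta_tokens("/usr/lib64/file.so missing"))
print(to_meta_tokens("50mm f/1.8"))
```

Because every 50 mm, 85 mm or 35 mm lens collapses to the same token, pages about any focal length become candidates, which is the generalization (and the second-guessing) mentioned above.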

What makes Chantal AI "eco-friendly"?

The underlying AI language model weighs about 83 MiB on disk and doesn't need more because it is specialized in image-processing topics. It uses an energy-efficient algorithm (Word2Vec, 2013) that turns a search into a matrix dot product. Training against 300k documents takes around 20 minutes on an 8-core Intel Xeon laptop CPU. Preparing the training corpus actually takes longer (2 hours), and crawling all the indexed websites takes 2 weeks. At runtime, searching the whole index of 123k pages needs around 80 MiB of RAM and takes about 160 ms on a desktop computer.
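To show why a search reduces to a matrix dot product, here is a toy sketch in pure Python. It assumes each page and each query is represented by the normalized average of its word vectors, which is one common way to use Word2Vec for retrieval but is not confirmed to be what Chantal does; the three-dimensional vectors and the page contents are made up for the example.

```python
import math

# Made-up word vectors; a trained Word2Vec model would supply real ones.
WORD_VECTORS = {
    "brightness": [0.9, 0.1, 0.0],
    "lightness": [0.8, 0.2, 0.0],
    "contrast": [0.6, 0.3, 0.1],
    "regex": [0.0, 0.1, 0.9],
    "url": [0.1, 0.0, 0.9],
}


def unit(vector):
    """Scale a vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def embed(tokens, dim=3):
    """Average the vectors of the known tokens, then normalize."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * dim
    return unit([sum(column) / len(known) for column in zip(*known)])


def rank_pages(query_vector, page_matrix):
    """Score all pages with one matrix-vector product, best first."""
    scores = [sum(p * q for p, q in zip(row, query_vector))
              for row in page_matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# One row per indexed page, computed once at indexing time.
PAGE_MATRIX = [embed(["lightness", "contrast"]), embed(["regex", "url"])]
order, scores = rank_pages(embed(["brightness"]), PAGE_MATRIX)
```

The query "brightness" ranks the first page highest even though that page never contains the word, which is the synonym behaviour described earlier. With NumPy the scoring loop becomes a single `page_matrix @ query_vector` call.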

While this is no longer the state of the art in language processing, the ratio of accuracy to computational power is very good and lets the search engine run on low-end hardware with reasonable runtimes. The current web service runs on a shared hosting environment (CPanel/Linux/Apache), typically in 1 s, alongside a multi-site WordPress installation and a forum CMS. No GPU is involved.

The current fashion in AI is to run very heavy algorithms (transformers) on GPU farms and to give the general public nice web interfaces that let them abuse the service for their own amusement. For information retrieval tasks, these methods are typically 3-6 % more accurate than the one used here, but are at (very) least 500 times more computationally expensive at runtime, and need much more training data.

Users should be made aware of the carbon footprint of the cloud-based services they use. Technology is worthless if it solves a problem by creating another one, and increasing electricity consumption is not going to make the world any better.

What makes Chantal better than Google?

Chantal indexes only manually-curated sources regarding color science and image processing: color theory authorities (CIE, ICC, ACES), researchers (Hunt, Fairchild, Kirk, Poynton), scientific journals (IPOL, Études photographiques), manufacturers (Kodak, Ilford, Nikon), software documentation (Darktable, Ansel, ArgyllCMS) and Github repositories (for bugs and design decisions). This reduces noise in the results and maximizes your probability of finding reliable information, especially since there is no confusion about the meaning of words: technical jargon often reuses everyday words with a different meaning, which easily confuses Google.

Chantal also lets you grep your way through all the indexed documents (including PDFs), even with regex. Google still doesn't allow that in 2023. Granted, a bad regex can bring servers down, but we can afford the risk here because the index is rather small.

Your fingerprint is ab71acd4fcbf91ac15ea2a2534d524dc16dc7d7e11abd47373ff41983cd67bc4

This cryptographic hash is your unique identifier and is used to throttle users trying to process large batches of queries. You are allowed no more than one request every 3 s.

Your fingerprint is all we store about you, and it is deleted 3 s after your last request.
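A throttle of this kind needs very little state. The sketch below assumes the fingerprint is a SHA-256 digest of some client information (the 64 hexadecimal characters shown above are consistent with that, but what exactly gets hashed is not stated) and that rejected requests do not restart the timer; the class name `Throttle` and its methods are hypothetical.

```python
import hashlib
import time


class Throttle:
    """Allow one request per fingerprint every `min_interval` seconds."""

    def __init__(self, min_interval=3.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_seen = {}  # fingerprint -> time of last accepted request

    @staticmethod
    def fingerprint(client_info):
        # Assumption: any stable string describing the client is hashed.
        return hashlib.sha256(client_info.encode("utf-8")).hexdigest()

    def allow(self, client_info):
        now = self.clock()
        # Forget every fingerprint older than the interval, so nothing
        # about a user outlives the throttling window.
        self.last_seen = {key: seen for key, seen in self.last_seen.items()
                          if now - seen < self.min_interval}
        key = self.fingerprint(client_info)
        if key in self.last_seen:
            return False
        self.last_seen[key] = now
        return True


throttle = Throttle()
first = throttle.allow("203.0.113.7 Mozilla/5.0")   # accepted
second = throttle.allow("203.0.113.7 Mozilla/5.0")  # too soon, rejected
```

Storing only the digest and purging it on the next call keeps the memory footprint proportional to the number of users active in the last 3 s.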
