
Chantal, Aurélien Pierre's assistant


Unified knowledge base on photography, color theory, image processing, cameras, film stocks and open-source imaging software, manually curated, indexed by AI.
This is Chantal v1.1. The AI language model knows 99481 words in French and in English, and understands synonyms and some translations. The index includes 49216 manually-curated pages (sources). Details and advanced search options are below.

Your request has been added to the queue and is processing. Requests are allowed 60 s to complete and will be stopped after this time whether or not they succeeded.

This web application took 12 weeks of work (on top of previous work), around 490 man-hours, and dozens of hours of burning CPU cycles. You are welcome to donate.
How is Chantal made?
Chantal is the server-side implementation and demo of Virtual Secretary, a Python framework developed by Aurélien Pierre to help cope with information overload (technical report). You can use Virtual Secretary to develop your own bots and automated workflows involving emails, contacts and knowledge corpora (HTML and PDF).
Didn't find what you wanted?
Add more keywords to create a better context, use meta-tokens, or restrict your search using the exact-keyword and URL filters described below. As a last resort, ask humans.

Search query

You can use keywords in French, in English, or even a mix of both. It is not a chatbot, so you don't need to form full sentences. The more keywords you add, the more accurate the context you create. Keywords must be separated with spaces.

The AI strips words of their suffixes and keeps only their stems, which lets it generalize French and English into a made-up language: Dumbrish. Don't be surprised if the suggested keywords and the topics look weird.
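
To give an idea of what suffix stripping does to a query, here is a minimal sketch. The suffix list and the `crude_stem` helper are hypothetical and much cruder than a real stemmer; they are not the rules Chantal actually applies, only an illustration of why stems can look weird.

```python
# Hypothetical suffix list mixing a few English and French endings.
# Longest suffixes are tried first so that "ations" wins over "s".
SUFFIXES = sorted(
    ["ations", "ation", "ments", "ment", "ness", "ing", "es", "s", "e"],
    key=len,
    reverse=True,
)


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest known suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_query(query: str) -> list:
    """Split a query on whitespace and stem each keyword."""
    return [crude_stem(token) for token in query.split()]


print(stem_query("saturation saturations image images"))
# prints ['satur', 'satur', 'imag', 'imag']
```

Singular and plural forms collapse onto the same stem, which is the point, even though `satur` and `imag` are not words in either language.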

Search by

You can combine 3 methods of searching:

  1. fuzzy keywords: best used if you don't know exactly what to look for, treats keywords as topics and will target their synonyms too. It relies on a context-aware artificial intelligence (more details).
  2. exact keywords & patterns: optionally narrow down the results from the AI with a grep-like search supporting regular expressions (Python-flavoured) and specific keywords.
  3. URL patterns: optionally filter URLs by pattern. By default, results are restricted to URLs that contain the full query as a substring. To enable regex mode, wrap your regex pattern in r"..." and escape characters according to Python regex syntax. Ex: github.com will restrict URLs to the GitHub domain by string search, r"github\.com\/.*\/.*\/issues" will restrict URLs to GitHub issues for all account/repository pairs by regex search.
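
The two URL filtering modes described above could be sketched as follows. This is an assumption about the behaviour, not Chantal's actual code: `url_matcher` is a hypothetical helper and the URLs are made-up placeholders.

```python
import re


def url_matcher(pattern: str):
    """Return a predicate telling whether a URL passes the filter.

    A pattern wrapped in r"..." is compiled as a Python regex;
    anything else is treated as a plain substring.
    """
    if len(pattern) >= 3 and pattern.startswith('r"') and pattern.endswith('"'):
        regex = re.compile(pattern[2:-1])
        return lambda url: regex.search(url) is not None
    return lambda url: pattern in url


# Placeholder URLs, for illustration only.
urls = [
    "https://github.com/some-account/some-repo/issues/1",
    "https://github.com/some-account/some-repo/wiki",
    "https://example.org/some-page",
]

by_string = url_matcher("github.com")
by_regex = url_matcher(r'r"github\.com\/.*\/.*\/issues"')

print([u for u in urls if by_string(u)])  # the two github.com URLs
print([u for u in urls if by_regex(u)])   # only the issues URL
```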

Options

[1] You can optionally include developer resources regarding Darktable, Ansel, Rawspeed, Lensfun and Exiv2 from GitHub: issues (bug reports, feature requests), pull requests and wikis. For non-developers, this will be useful to look for known issues and caveats.

[2] You can optionally include user forums (discuss.pixls.us, forums.darktable.fr). I have answered many questions on both between 2016 and 2022. They tend to be polluted with unsupported claims and wrong statements coming from color-blind IT guys with too much free time, so you need to exercise caution with the information you find there.

Why Chantal?

Chantal solves the problem of scattered information and the lack of identified, authoritative resources, in the context of open-source software projects that rely on real-world phenomena and on the scientific theories describing them. This problem triggers recurring questions from users, which come back in waves on forums and in my emails. Many man-hours are lost repeating the same information again and again, along with server storage and bandwidth, because new posts duplicate content that was already posted but never found again.

As image processing is a craft at the crossroads of art, technology and science, it relies on theories and concepts that have precise names, defined by scientific authorities who should be given more credit than random search-engine-optimized blogs. Understanding the background theory makes image-processing tools a lot less frustrating and a lot more predictable to use. But users who lack knowledge of the jargon have no entry point into information systems built on keyword-based retrieval, which is what blogs, websites and forums use internally for performance.

On the other hand, photography (and thus Google's index) is polluted by lots of blogs and video channels from people who think they understand color theory because they have been using HSL color wheels in some software for some amount of time. This spreads false knowledge amongst users and creates a self-feeding loop of wasted time. Things become actively damaging when false knowledge spreads to software developers who use it to support their application design, which has been seen time and time again, especially in the open-source world.

What makes Chantal AI "intelligent"?

  1. The underlying language model knows synonyms and related words: searching for brightness will yield results that are not too far from searching for lightness. This is useful when diving into technical fields without knowing the exact technical slang and to expand your research to similar topics past the anecdotal keywords of your query.
  2. The language model knows physical units and computer science: searching for "/usr/lib64/file.so missing" will yield results for _PATH_ _BINARYFILE_ missing, searching for "50mm f/1.8" will yield results for _DISTANCE_ _APERTURE_. This helps generalize queries and increases the probability of finding relevant pages, even where the exact keywords don't match. Of course, when you know exactly what you are looking for but don't know where to find it, this smart-ass second-guessing can really get in your way, which is why you also have an additional grep-like search.
  3. The language model uses context to infer topics, so the more keywords you give it, the more likely it is to identify the right topic of your query.
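
The meta-token substitution from point 2 can be pictured with a few regular expressions. The rules below are hypothetical and only cover the two examples given above; the real set of meta-tokens and the patterns behind them are not documented here.

```python
import re

# Hypothetical rules, applied in order. Each one replaces a concrete value
# with the meta-token(s) that stand for its category.
META_RULES = [
    # Unix-like path ending in a binary file extension
    (re.compile(r"(?:/[\w.+-]+)+/[\w+-]+\.(?:so|dll|exe|bin)\b"), "_PATH_ _BINARYFILE_"),
    # Length in millimetres, e.g. a focal length
    (re.compile(r"\b\d+(?:\.\d+)?\s?mm\b"), "_DISTANCE_"),
    # Aperture written as an f-number
    (re.compile(r"\bf/\d+(?:\.\d+)?"), "_APERTURE_"),
]


def generalize(query: str) -> str:
    """Replace recognizable values in the query with meta-tokens."""
    for regex, token in META_RULES:
        query = regex.sub(token, query)
    return query


print(generalize("/usr/lib64/file.so missing"))  # prints _PATH_ _BINARYFILE_ missing
print(generalize("50mm f/1.8"))                  # prints _DISTANCE_ _APERTURE_
```

Once both the query and the indexed pages have been generalized this way, a page about a different lens or a different library file can still match on the category.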

What makes Chantal AI "eco-friendly"?

The underlying AI language model is a 512×100000 matrix (512 dimensions for each of its roughly 100,000 words) that weighs about 195 MB and doesn't need more because it is specialized on image-processing topics. It uses an energy-efficient algorithm (Word2Vec, 2013) that turns a search into a matrix dot product. It is trained with around 450k documents containing around 134 million words, for a total disk space of 560 MB (compressed). Training needs around 6 h on an Intel Xeon laptop. At runtime, searching the whole index of 78k pages needs 3.2 GB of RAM and takes about 0.3 to 0.5 s on a desktop computer.
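
Here is a rough sketch of what "turning a search into a dot product" means, in plain Python with toy data. The 3-dimensional vectors, the page names and the averaging of word vectors are all assumptions made for illustration; they show the general word-embedding approach, not Chantal's actual pipeline.

```python
import math

# Toy 3-dimensional embeddings with made-up values.
# The real model uses 512 dimensions per word.
EMBEDDINGS = {
    "brightness": [0.9, 0.1, 0.0],
    "lightness": [0.8, 0.2, 0.1],
    "lens": [0.0, 0.9, 0.3],
    "aperture": [0.1, 0.8, 0.4],
}


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def embed(words):
    """Average the vectors of the known words, then normalize."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        return [0.0, 0.0, 0.0]
    return normalize([sum(column) / len(known) for column in zip(*known)])


def search(query, pages):
    """Rank pages by the dot product between query and page vectors."""
    q = embed(query.split())
    scored = []
    for name, text in pages.items():
        p = embed(text.split())
        scored.append((sum(a * b for a, b in zip(q, p)), name))
    return sorted(scored, reverse=True)


def matrix_size_mib(n_words, dims=512, bytes_per_value=4):
    """Memory footprint of the embedding matrix; 4 bytes (float32) is assumed."""
    return n_words * dims * bytes_per_value / 2**20


PAGES = {"page-a": "lightness", "page-b": "lens aperture"}
print(search("brightness", PAGES)[0][1])  # prints page-a
print(round(matrix_size_mib(99481), 1))   # prints 194.3
```

The query "brightness" ranks the page about "lightness" first although the two share no keyword. The last function is plain arithmetic: with the 99481-word vocabulary quoted at the top of the page, 512 values per word and 4 bytes per value, the matrix comes to roughly 194 MiB, in line with the size quoted above.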

While this is no longer the state of the art in language processing, the ratio of accuracy to computational cost is very good and lets the search engine run on low-end hardware with reasonable runtimes. The current web service runs on a shared hosting environment (CPanel/Linux/Apache), typically in 1 s, alongside a multi-site WordPress installation and a forum CMS. No GPU is involved.

The current fashion in AI is to run very heavy algorithms (transformers) on GPU farms, and to give the general public nice web interfaces that let them abuse the service for their own amusement. For information retrieval tasks, these methods are typically 3-6 % more accurate than the one used here, but are at (very) least 500 times more computationally expensive at runtime, and need much more training data.

Users should be made aware of the carbon footprint of the cloud-based services they use. Technology is worthless if it solves a problem by creating another one, and increasing electricity consumption is not going to make the world any better.

What makes Chantal better than Google?

Chantal indexes only manually-curated sources regarding color science and image processing: color theory authorities (CIE, ICC, ACES), researchers (Hunt, Fairchild, Kirk, Poynton), scientific journals (IPOL, Études photographiques), manufacturers (Kodak, Ilford, Nikon), software documentation (Darktable, Ansel, ArgyllCMS) and GitHub repositories (for bugs and design decisions). This reduces noise in results and maximizes your probability of finding reliable information, especially since there will be no mix-up over the meaning of words (technical jargon often reuses everyday words with a different meaning, which easily confuses Google).

Chantal also lets you grep your way through all the indexed documents (including PDFs), even with regex. Google still doesn't allow that in 2023. Granted, a bad regex can bring a server down, but we can afford the risk here because the index is rather small.

Your fingerprint is 420cb9a76a8908b86954baacd055c3787c8b6b69ed49256a535512bb96da6382

This cryptographic hash is your unique identifier and is used to throttle users trying to process large batches of queries. You are allowed no more than one request every 3 s.

Your fingerprint is all we store about you, and it is deleted 3 s after your last request.
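
For the curious, this kind of throttling can be sketched in a few lines. The fingerprint shown above is 64 hexadecimal characters long, which is the length of a SHA-256 digest, but which client attributes are actually hashed is not stated: the IP address and user agent below are assumptions, and `Throttle` is a hypothetical class, not the code running on this server.

```python
import hashlib
import time

MIN_INTERVAL = 3.0  # seconds between two requests, as stated above


def fingerprint(ip: str, user_agent: str) -> str:
    """SHA-256 hex digest of assumed client attributes (64 hex characters)."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


class Throttle:
    """Allow at most one request per fingerprint every `min_interval` seconds."""

    def __init__(self, min_interval: float = MIN_INTERVAL):
        self.min_interval = min_interval
        self.last_seen = {}  # fingerprint -> time of the last accepted request

    def allow(self, fp: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget every fingerprint older than the interval, so that
        # nothing is kept longer than needed.
        self.last_seen = {
            key: seen
            for key, seen in self.last_seen.items()
            if now - seen < self.min_interval
        }
        if fp in self.last_seen:
            return False
        self.last_seen[fp] = now
        return True


throttle = Throttle()
fp = fingerprint("203.0.113.7", "example-agent")
print(throttle.allow(fp, now=0.0))  # prints True
print(throttle.allow(fp, now=1.0))  # prints False
print(throttle.allow(fp, now=3.5))  # prints True
```

Passing `now` explicitly only serves to make the example deterministic; a server would rely on the clock.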