What does this writing signal mean?

Scaleway published How to get started in AI without excessive cost, or emissions! - MindMatch guest post. This talking signal gives public context for research themes, product direction, policy, or launch framing. High-signal details: How to get started in AI without excessive cost, or emissions! - MindMatch guest post Build • Zofia Smoleń • 26/02/24 • 7 min read This a guest post by Zofia Smoleń,.... onlylabs links this event to 1 captured evidence page and 6 related writing signals.

Scaleway Writing: How to get started in AI without excessive cost, or emissions! - MindMatch guest post

Captured source

source ↗

scaleway.com/scaleway.com/en/blog

How to get started in AI without excessive cost, or emissions! - MindMatch guest post

Source ↗

published Feb 26, 2024seen 5dcaptured 3dhttp 200method plain

How to get started in AI without excessive cost, or emissions! - MindMatch guest post Build • Zofia Smoleń • 26/02/24 • 7 min read

This a guest post by Zofia Smoleń, Founder of Polish startup MindMatch , a member of Scaleway's Startup Program 🚀

One of the greatest developments of recent years was making computers speak our language. Scientists have been working on language models (which are basically models predicting next sequence of letters) for some time already, but only recently they came up with models that actually work - Large Language Models (LLMs). The biggest issue with them is that they are… Large.

LLMs have billions of parameters. In order to run them, you have to own quite a lot of computer power and use a significant amount of energy. For instance, OpenAI spends $700 000 daily on ChatGPT, and their model is highly optimized. For the rest of us, this kind of spending is neither good for your wallet, nor for the climate.

So in order to limit your spending and carbon footprint, you cannot just use whatever OpenAI or even Hugging Face provides. You have to dedicate some time and thought to come up with more frugal methods of getting the job done. That is exactly what [Scaleway Startup Program member] MindMatch has been doing lately.

MindMatch is providing a place where Polish patients can seek mental help from specialists. Using an open-source LLM from Hugging Face, MindMatch recognizes their patients’ precise needs based on a description of their feelings. With that knowledge, MindMatch can find the right therapy for their patients. It is a Polish-only website, but you can type in English (or any other language) and the chatbot ( here ) will understand you and give you its recommendation. In this article, we wrap their thoughts on dealing with speed and memory problems in production.

1. Define your needs

What do you need to do exactly? Do you need to reply to messages in a human-like manner? Or do you just need to classify your text? Is it only topic extraction?

Read your bibliography. Check how people approached your task. Obviously, start from the latest papers, because in AI (and especially Natural Language Processing), all the work becomes obsolete and outdated very quickly. But… taking a quick look at what people did before Transformers (the state-of-the-art model architecture behind ChatGPT) can do no harm. Moreover, you may find solutions that resolve your task almost as well as any modern model would (if your task is comparatively easy) and are simpler, faster and lighter.

You could start by simply looking at articles on Towards data science, but we also encourage you to browse through Google Scholar. A lot of work in data science is documented only in research papers so it actually makes sense to read them (as opposed to papers in social science).

Why does this matter? You don’t need a costly ChatGPT-like solution just to tell you whether your patient is talking about depression or anxiety. Defining your needs and scouring the internet in search of all solutions applied so far might give you a better view on your options, and help select those that make sense in terms of performance and model size.

2. Set up your directory so that you can easily switch between different models and architectures

This is probably the most obvious step for all developers, but make sure that you store all the models, classes and functions (and obviously constants - for example labels that you want to classify) in a way that allows you to quickly iterate, without needing to dig deep into code. This will make it easier for you, but also for all non-technical people that will want to understand and work on the model.

What worked well for MindMatch was even storing all the dictionaries in an external database that was modifiable via Content Management Systems. One of those dictionaries was a list of classes used by the model. This way non-technical people were able to test the model. Obviously, to reduce the database costs, MindMatch had to make sure that they only pull those classes when necessary.

Also, the right documentation will make it easier for you to use MLOps tools such as Mlflow. Even if it is just a prototype yet, it is better for you to prepare for the bright future of your product and further iterations.

There is a lot of information and guidance about how to set the directory so that it is neat and tidy. Browse Medium and other portals until you find enough inspiration for your purpose.

3. Choose the right deployment model

Now you’ve defined your needs, it’s time to choose the right solution. Since you want to use LLMs, you will most likely not even think about training your own model from scratch (unless you are a multi-billion company or a unicorn startup with high aspirations). So your options are limited to pre-trained models.

For the pre-trained models, there are basically two options. You can either call them through an API and get results generated on an external computer instance (what OpenAI offers), or you can install the model on your computer and run it there as well (that is what Hugging Face offers, for example).

The first option is usually more expensive, but that makes sense - you are using the computer power of another company, and it should come with a price. This way, you don’t have to worry about scalability. Usually, proprietary models like OpenAI’s work like that, so on top of that you also pay a fee for just using the model. But some companies producing open source models, like Mistral, also provide APIs.

The second option (installing the model on your computer) comes only with open source models. So you don’t pay for the model itself, but you have to run it on your computer. This option is often chosen by companies who don’t want to be dependent on proprietary models and prefer to have more control over their solution. It comes with a cost: that of storage and computing power. It is pretty rare for organizations to own physical instances with memory sufficient for running LLM models, so most companies (like MindMatch) choose to use cloud services for that purpose.

The choice between proprietary and open-source models depends on various factors, including the specific needs of the project, budget constraints, desired level of control and customization, and the importance of transparency and community support. In many cases it also depends on the level of domain knowledge within the organization. Proprietary models…

Excerpt shown — open the source for the full document.