News

DAG Focus

Agriculture x Data Science

We need to create efficient and reliable processes to produce enough food, of good enough quality, with minimal carbon emissions. Of course, meeting those constraints will also require a shift in the way we consume. This article elaborates on the existing solutions at our disposal.

DAG Review

DAG News

BloombergGPT is a newly developed language model that has undergone specific training on a broad range of financial data.📈 The primary goal of this model is to support various natural language processing (NLP) tasks within the financial sector, including named entity recognition, sentiment analysis, news classification, and question answering. The model is expected to enhance existing financial NLP tasks and create new opportunities for utilizing the abundant data on the Bloomberg Terminal to serve the firm's clients while unleashing the full potential of AI in the financial domain.

👩‍💻 Interested in pursuing a tech career in Europe? Check out @talent_io's European Salary Report for valuable insights into average salaries and salary statistics for Europe's biggest cities (excluding Switzerland). 🌍

This report covers topics such as salary distributions, job market trends, and compensation packages available in different cities. 📈 

GPT-4 is out!


You may have heard that OpenAI has released GPT-4, the latest addition to the GPT series of language models. GPT-4 has two versions available - one with a context window of 8192 tokens and another with a context window of 32768 tokens. These versions are a significant improvement over GPT-3.5 and GPT-3, which had context windows of only 4096 and 2049 tokens, respectively. This means that GPT-4 can process longer texts and produce more coherent outputs.
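As a toy illustration of what those window sizes mean in practice (the model labels and the prompt/reply split below are illustrative shorthand, not OpenAI's API), a sketch:

```python
# Context window sizes (in tokens) for the models mentioned above.
# The labels are illustrative names, not official API identifiers.
CONTEXT_WINDOWS = {
    "gpt-3": 2049,
    "gpt-3.5": 4096,
    "gpt-4-8k": 8192,
    "gpt-4-32k": 32768,
}

def models_that_fit(prompt_tokens: int, reply_tokens: int) -> list[str]:
    """Return the models whose context window can hold both the prompt
    and the expected reply, since they share the same window."""
    needed = prompt_tokens + reply_tokens
    return [name for name, size in CONTEXT_WINDOWS.items() if size >= needed]
```

For example, a 3000-token document plus a 2000-token answer already rules out the older models, and only the 32k variant of GPT-4 can hold a book-chapter-sized prompt.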


But GPT-4 isn't just limited to text processing. It is a multimodal model, capable of processing both pictures and text. This makes it a versatile tool for a wide range of applications. In fact, GPT-4 has already been integrated into several products, including Khan Academy's tutoring chatbot Khanmigo and Duolingo's "Roleplay" and "Explain My Answer" features.


Fortunately, GPT-4 is not capable of everything. According to OpenAI, it can solve only 3 out of 45 LeetCode hard problems, though that is a significant improvement over GPT-3.5, which solved none. So, heads up: you can still have a career in computer science.

But GPT-4 isn't just about solving problems. It is also about making life easier for people. The Be My Eyes app assists visually impaired individuals in identifying objects and navigating their environment. Recently, it introduced a "Virtual Volunteer" feature built on GPT-4's image recognition capabilities. The feature is currently in beta testing, making Be My Eyes the first app to use GPT-4's image recognition.


In conclusion, GPT-4 is a major step forward in the field of AI. Its ability to process both text and images, solve complex problems, and make life easier for people is truly impressive. We can't wait to see what the future holds for this incredible technology.

Have you heard about the ongoing AI war? 🤖

Large language models (LLMs) are currently at the forefront of an AI war, and many different models are being developed and tested for natural language processing (NLP) applications. Some use cases of LLMs include chatbot development, code generation, classification, copywriting, translation, response generation, personalized recommendations, grammar correction, and many more... 👀

Here's a list of some popular LLMs, if you want to check them out:

- LaMDA, the model behind Google's Bard, known for its free-flowing conversational style,

- GPT-3, an autoregressive language model that generates human-like text,

- Chinchilla AI, released by DeepMind in 2022, offers more accurate results than GPT-3 with fewer parameters,

- LLaMA, Meta AI Research's model for tasks such as question answering and document summarization.

Overall, LLMs have revolutionized the field of NLP and are being used in various industries. As technology continues to evolve, we can expect LLMs to become even more advanced and useful in the years to come. 📈

Update: OpenAI has recently released GPT-4, and Microsoft has confirmed that the new Bing is powered by it! 

Meta's LLM Galactica got shut down! 🪦

Meta's large language model Galactica, designed to assist scientists with scientific writing, was taken down just three days after launch due to its tendency to generate inaccurate, and even racist and offensive, content.

Furthermore, the scientific community criticized the tool, with some even calling it "dangerous", because it tended to generate grammatically correct text that felt real but was in fact erroneous or biased. To put it bluntly, the tool was producing pseudo-science based on the statistical properties of scientific writing.

This again puts emphasis on a common dilemma: can we leave it to users to use such tools responsibly, or should producers be cautious and prevent misuse?

Meta AI has developed CICERO, the first AI to reach human-level proficiency in the popular strategy game Diplomacy. Diplomacy has been a long-standing challenge in AI due to its requirement for players to understand others' motivations, make complex plans, adjust strategies, and use natural language to reach agreements and form partnerships. 

This breakthrough combines two areas of AI research, strategic reasoning and natural language processing, to create an agent that can effectively use language to negotiate and work with people. CICERO achieved a top-10% score and doubled the average score of human players on webDiplomacy.net. You can learn more about it from its open-sourced code and the related paper.

COP27 is currently taking place in Sharm El-Sheikh, Egypt. How can AI help?

António Guterres, the UN Secretary-General, spoke about an AI-backed tool for tracking greenhouse gas emissions. Climate TRACE combines satellite data and artificial intelligence to show the facility-level emissions of over 70,000 sites around the world. This will allow leaders to know the location and scope of emissions being released. 📍

Google also published a blog post explaining how they are investing in technologies that can help communities prepare for and respond to climate-related disasters and threats. 

They developed FloodHub to show where floods are happening and where they are forecast to happen, thanks to their AI models. Other projects range from using ML on satellite data to map wildfires to identifying areas that need help after hurricanes. Their technology was deployed in partnership with GiveDirectly to quickly allocate aid to those most affected. 🆘

You have likely been faced with the limits of CPUs when training ML models, processing large datasets and playing around with AI. ⌛️

IBM Research has unveiled plans to build a prototype chip specialized for artificial intelligence, named the Artificial Intelligence Unit (AIU).

CPUs are well suited for general-purpose software applications. However, this also comes at a disadvantage when it comes to training and running deep learning models, which require massively parallel AI operations. ➕

The AIU will be adapted to perform the calculations typically found in deep learning models more efficiently. Another novelty is how it streamlines AI workflows by sending data directly from one compute engine to the next. Ultimately, this chip should make it possible to deploy AI at an industrial scale. ↗️

Marine litter is an unsolved and growing issue that will only be exacerbated by climate change and extreme weather events, as they will contribute to transporting waste into the sea.

Previously, the detection of floating plastic was done with remote sensing, by analyzing the spectral response in images and creating indices such as the Floating Debris Index (FDI) or the Plastic Index (PI). The use of such methods was limited by the resolution of the images and often spectra were mixed with a background water signal. 🌊
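As a sketch of how such a spectral index works, here is one published formulation of the FDI (the one I believe was introduced by Biermann et al.): the NIR reflectance is compared against a baseline interpolated between the red-edge and SWIR bands. The default wavelengths below are approximate Sentinel-2 band centres; exact band choices and the scaling factor vary between studies.

```python
def floating_debris_index(r_nir, r_re2, r_swir1,
                          wl_red=665.0, wl_nir=833.0, wl_swir1=1614.0):
    """Floating Debris Index: how far the NIR reflectance rises above a
    baseline interpolated between the red-edge (RE2) and SWIR1 bands.

    Reflectances are unitless surface-reflectance values; wavelengths
    are in nm and default to approximate Sentinel-2 band centres.
    """
    baseline = r_re2 + (r_swir1 - r_re2) * (wl_nir - wl_red) / (wl_swir1 - wl_red) * 10
    return r_nir - baseline

# A pixel with a pronounced NIR peak (typical of floating matter)
# scores clearly positive:
fdi = floating_debris_index(r_nir=0.05, r_re2=0.02, r_swir1=0.01)
```

The limitation mentioned above is visible in the formula itself: the index is computed per pixel from mixed spectra, so a debris signal diluted by background water pulls the value back toward zero.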

By implementing a U-Net, the researchers hoped to rely on spatial rather than spectral features and propose a scalable waste detection method. ♻️ Using images hand-labeled in prior spectral-method work, they trained their model to classify pixels. They also included atmospheric noise and other floating objects in their data.
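The paper's exact architecture isn't reproduced here, but the core idea of per-pixel classification with a U-Net (an encoder-decoder with skip connections, taking several spectral bands in and producing a class logit per pixel) can be sketched minimally in PyTorch; layer sizes are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal two-level U-Net for per-pixel classification."""

    def __init__(self, in_ch=4, n_classes=2, base=16):
        super().__init__()
        self.enc1 = self._block(in_ch, base)
        self.enc2 = self._block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = self._block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = self._block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = self._block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)

    @staticmethod
    def _block(in_ch, out_ch):
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        e1 = self.enc1(x)                  # full-resolution features (skip 1)
        e2 = self.enc2(self.pool(e1))      # half-resolution features (skip 2)
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)               # per-pixel class logits
```

Feeding it a `(batch, bands, H, W)` tensor yields a `(batch, n_classes, H, W)` map, i.e. one debris/water score per pixel, which is what "classifying pixels" means here.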

They have made their dataset open-source to encourage further work; you can find it here. The original paper is here.

Due to the increasing interest in the impact of sleep on health, Google announced the integration of enhanced sleep sensing in its Nest Hub. 😴 Going beyond sleep schedule and duration tracking, they want to provide deeper insights by analysing disturbance events. ⚠️

By training a model with transfer learning techniques, Sleep Sensing predicts sleep stages and disambiguates the source of sleep disturbances from radar and microphone signals. 🔊

Google updated its algorithms to separate the different sources of noise in the room: snoring will correspond closely with inhalations and exhalations, while other noises might vary independently. Google also addresses privacy concerns by using on-device audio processing, with no raw audio data sent to Google's servers.

Finally, Sleep Sensing could also open up opportunities in detecting respiratory diseases, such as cystic fibrosis, a rare disease. By quantifying nighttime cough, treatment effectiveness and response could be monitored. 💊

I've stopped using box plots. Should you?

Controversial idea: box plots are not an intuitive visualization tool ❌ This is the statement that Nick Desbarats shares in his article for the Data Visualization Society.

Nick believes in more human-friendly ways to display distributions, such as strip plots or jitter plots.

Why? First, he states that box plots are straight-up not visually intuitive 🤯 If a chart requires explanations or a priori statistical knowledge to be read, then it might not be a good tool for communicating results to the general public or to decision-makers.

Second, humans interpret bigger lengths as bigger quantities, yet whatever its length, each whisker of a box plot covers the same share of the data, which can be counter-intuitive.

Finally, he reminds us that they conceal information. Two similar box plots could have very different underlying distributions.

So, will you be exploring other chart types from now on? 📉

How good is a landscape? At first glance, an answer to such a question seems to only be a matter of taste. 

However, researchers from the ECEO lab at EPFL and the Wageningen University in the Netherlands developed deep learning-based models of landscape scenicness. 🌄

Using Flickr images of Great Britain landscapes, they used two neural networks to respectively predict scene categories (e.g., rainforest, lagoon, gardens…) and scenicness (i.e., aesthetic quality). 

These models yielded accuracy comparable to traditional models, which tend to rely only on environmental indicators and ignore public interactions with the landscape. Integrating this component should help build more comprehensive ecosystem preservation policies. ✍️

The original paper, published in Nature, can be found here. For people in a hurry 🏃, a summary is available here.