Paul Simmering
About
Blog
Projects
Talks
Blog
Categories
All
(26)
Advice
(4)
Cloud
(2)
Data Engineering
(1)
Data Visualization
(1)
Economics
(2)
Entrepreneurship
(1)
Machine Learning
(15)
Marketing
(1)
Productivity
(4)
Python
(9)
R
(2)
Tidy Tuesday
(1)
From 5-7-5 to Thousand Lines: The Case for Longer Prompts
Machine Learning
Prompts are the key to guide LLMs for any task, from a chatbot to a text classifier. Longer prompts are usually better than shorter ones, as I’ll argue below. There is a…
Sep 1, 2024
Paul Simmering
OpenAI’s structured output vs. instructor and outlines
Machine Learning
Python
On August 6 OpenAI released structured outputs in their API. Is structured outputs a replacement for instructor, outlines and other libraries that provide structured outputs…
Aug 10, 2024
Paul Simmering
Levels of Abstraction in the LLM Stack
Machine Learning
Training and serving LLMs requires a tall software stack. You can engage with this stack at different levels of abstraction, from low-level frameworks like CUDA to…
Aug 8, 2024
Paul Simmering
Less stress, more focus: How to handle waiting times in development
Productivity
It’s unfortunate, but there are many waiting times in data science. Dealing with them well can make work more productive and enjoyable. Common waiting times include:
Jul 28, 2024
Paul Simmering
Text Tournament: Rank Marketing Copy with LLMs
Machine Learning
Python
Marketing
The launch of the review analysis project has me working on various marketing tasks. Naturally, I built a tool to let LLMs help with the creative process. It’s called Text…
Jul 23, 2024
Paul Simmering
Let Research Settle Before Consuming It
Advice
Machine Learning
The pace of publishing in machine learning is extremely high. There were 242,290 AI publications in 2022. That’s 663 per day, or one every two minutes. Based on comments on…
Jul 20, 2024
Paul Simmering
The World is Large and Very Detailed
Economics
Entrepreneurship
It’s easy to underestimate how vast and heterogeneous the world is. For entrepreneurs and developers this has two implications:
Jul 13, 2024
Paul Simmering
Rich Personal Wiki in Quarto
Productivity
Machine Learning
Machine learning is a deep and constantly evolving field. In an applied project, the details of models are typically compressed into a few lines of a configuration file.…
Jul 7, 2024
Paul Simmering
Fast and good
Productivity
The adage goes: fast, good, cheap. Pick two. As a developer, you probably don’t want to be cheap labor, so I suggest that you strive for fast and good. Not just good, and…
Jun 22, 2024
Paul Simmering
The best library for structured LLM output
Machine Learning
Python
By default, Large Language Models (LLMs) output free-form text. But many use cases such as text classification, named entity recognition, relation extraction and information…
May 11, 2024
Paul Simmering
Evaluating an LLM for your use case
Machine Learning
In the last two months we’ve seen releases of flagship LLMs like Llama 3, Mixtral 8x22B, and Claude 3. The title of Mistral’s announcement summarizes the dynamic well:
Cheape…
Apr 28, 2024
Paul Simmering
How to get gold standard data for NLP
Machine Learning
With the attention on new LLM releases, it’s easy to forget that correctly labeled examples are still a critical factor for accuracy in most NLP tasks. I think they’re the…
Mar 10, 2024
Paul Simmering
LLM Price Comparison
Machine Learning
Cloud
Economics
This article is about prices as of January 11, 2024. For current prices and more comprehensive analysis, check artificialanalysis.ai (not affiliated with me).
Jan 11, 2024
Paul Simmering
The Grug Brained Data Scientist
Advice
The Grug Brained Developer is a funny essay on advice for software developers. The lessons resonated with me. This is my own version, geared towards data professionals.
Dec 9, 2023
Paul Simmering
NLP escalation ladder: Use the simplest NLP model that does the job
Machine Learning
Advice
With all the hype and breathtaking demos, it’s tempting to see LLMs as the universal tool for every language problem. And yes, GPT-4 in particular will achieve decent to…
Nov 12, 2023
Paul Simmering
Large language models for aspect-based sentiment analysis
Machine Learning
A finetuned GPT-3.5 Turbo model achieves state-of-the-art performance in aspect-based sentiment analysis (ABSA). Zero-shot and few-shot settings with GPT-4 and GPT-3.5 reach…
Nov 1, 2023
Paul Simmering
One-stop NLP: Multi-task prompts for LLMs
Machine Learning
Python
In NLP, we often want to extract multiple pieces of information from a text. Each extraction task is typically done by one model. For example, we might want to classify the…
Oct 29, 2023
Paul Simmering
Dataset Size vs. Label Correctness: What is more important for training a model?
Machine Learning
Python
Supervised models are trained on labeled data. The more data, the better the model. But what if the labels are wrong? How much does the quality of the labels matter compared…
Oct 28, 2023
Paul Simmering
Future Directions for Large Language Models
Machine Learning
Large language models (LLMs) have taken the world by storm in the last year. It’s not even been one year since ChatGPT was released, and we have seen countless applications…
Oct 21, 2023
Paul Simmering
A Critical Evaluation of Github Copilot and GPT-4 in a Data Science Workflow
Productivity
AI assistants like Github Copilot and ChatGPT promise breakthrough productivity improvements for developers. In this article, I’ll explore how these tools can be used in a…
Apr 10, 2023
Paul Simmering
Twitter API data collector with Modal
Python
Cloud
Data Engineering
In this article, I’ll show how to build a Twitter data collector in just 100 lines of code. Twitter data has many applications, from social science research to marketing…
Jan 26, 2023
Paul Simmering
Investing in data science skills for the long run
Advice
Data science is a field that is constantly evolving and requires a lot of practice to master. Picking the right skills to focus on is critical for career development.
Jan 9, 2023
Paul Simmering
Tidy Tuesday: analyzing yarns with polars
Python
Tidy Tuesday
In this article, I’m taking the Python data frame library polars for a spin. Polars is a super fast alternative to pandas, implemented in Rust. It also has a leaner…
Oct 22, 2022
Paul Simmering
FANGMANT: Tech stock analysis with pandas
Python
The acronym FANGMANT stands for Facebook, Apple, Netflix, Google, Microsoft, Amazon, Nvidia and Tesla. Large, highly profitable US tech companies that dominate their…
Dec 27, 2021
Paul Simmering
Data frame wars: Choosing a Python dataframe library as a dplyr user
R
Python
I’m a long time R user and lately I’ve seen more and more signals that it’s worth investing into Python. I use it for NLP with spaCy and to build functions on AWS Lambda.…
Dec 20, 2021
Paul Simmering
Exploring echarts4r
R
Data Visualization
As web-oriented presentation in R Markdown and Shiny becomes more and more popular, there is increasing demand for interactive graphics with R. Whereas ggplot2 and its vast…
Feb 8, 2020
Paul Simmering
No matching items