Threads
A thread is a back-and-forth exchange between your generative AI pipeline and an external actor (typically an end user). A prime example is a conversation between ChatGPT and a user. A thread is typically composed of multiple AI generations.
Within our data model, a thread is represented as a singly-linked list of pipeline runs. You attach a run to a thread by referencing the prior pipeline run when generating the next one.
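Conceptually, the linked-list model looks like the sketch below. The PipelineRun shape and reconstructThread helper are illustrative only, not part of the SDK:

```typescript
// Illustrative sketch of the thread data model (not the actual Gentrace schema).
// Each run optionally points back at the run that preceded it in the thread.
interface PipelineRun {
  id: string;
  previousRunId?: string;
}

// Walk the singly-linked list backward from the latest run to recover
// the full thread in chronological order.
function reconstructThread(
  latest: PipelineRun,
  runsById: Map<string, PipelineRun>
): PipelineRun[] {
  const thread: PipelineRun[] = [];
  let current: PipelineRun | undefined = latest;
  while (current) {
    thread.unshift(current);
    current = current.previousRunId
      ? runsById.get(current.previousRunId)
      : undefined;
  }
  return thread;
}
```

Because each run stores at most one previous run ID, recovering a thread only requires following the chain backward from its most recent run.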
Example
Let's say you have a chat feature for your end user. The feature initially performs a chat completion request with our SDK.
// Omit initialization
const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
});

// This would normally be populated from an HTTP request
const endUserQuestion = "Hello OpenAI! How are you doing today?";

const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    {
      role: "user",
      content: endUserQuestion,
    },
  ],
  model: "gpt-3.5-turbo",
  pipelineSlug: "introduction",
});

const runId = chatCompletionResponse.pipelineRunId;

// Omit logic to store this run in your desired persistent storage
// Omit logic to return the chat response to the user
# Omit SDK initialization
gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)
gentrace.configure_openai()

openai.api_key = os.getenv("OPENAI_KEY")

# This would normally be populated from an HTTP request
end_user_question = "Hello OpenAI! How are you doing today?"

result = openai.ChatCompletion.create(
    pipeline_slug="introduction",
    messages=[
        {
            "role": "user",
            "content": end_user_question,
        }
    ],
    model="gpt-3.5-turbo",
)

run_id = result["pipelineRunId"]

# Omit logic to store this run in your desired persistent storage
# Omit logic to return the chat response to the user
Once the request completes and you receive a Gentrace run ID, store it in persistent storage (e.g., a database).
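Storage can be as simple as keying the latest run ID (and the message history) by your application's own conversation identifier. In the sketch below, an in-memory Map stands in for a real database; conversationStore, saveTurn, and loadTurn are hypothetical names, not SDK APIs:

```typescript
// Hypothetical persistence layer: a Map standing in for your database,
// keyed by your application's own conversation ID.
interface StoredConversation {
  latestRunId: string;
  messages: object[];
}

const conversationStore = new Map<string, StoredConversation>();

// Persist the latest Gentrace run ID and message history after each turn.
function saveTurn(
  conversationId: string,
  runId: string,
  messages: object[]
): void {
  conversationStore.set(conversationId, { latestRunId: runId, messages });
}

// Retrieve the prior run ID and messages when the user sends a follow-up.
function loadTurn(conversationId: string): StoredConversation | undefined {
  return conversationStore.get(conversationId);
}
```

Whatever storage you use, the key property is that the next turn can look up both the prior run ID (for Gentrace threading) and the prior messages (for the model's context).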
At a later point, your end user responds to the AI generation with a follow-up question.
// Omit initialization
const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
});

// This would normally be populated from an HTTP request
const endUserQuestion = "Great to hear! What's the capital of Maine?";

const previousRunId = ...; // TODO: pull prior run ID from DB
const priorMessages = ...; // TODO: pull prior messages from DB

const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    ...priorMessages,
    {
      role: "user",
      content: endUserQuestion,
    },
  ],
  model: "gpt-3.5-turbo",
  pipelineSlug: "introduction",
  gentrace: {
    // By specifying the previous run ID, we associate this generation with a thread in Gentrace
    previousRunId,
  },
});

const runId = chatCompletionResponse.pipelineRunId;

// Omit logic to store this next run in your desired persistent storage
// Omit logic to return the chat response to the user
# Omit SDK initialization
gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)
gentrace.configure_openai()

openai.api_key = os.getenv("OPENAI_KEY")

previous_run_id = ...  # TODO: pull prior run ID from DB
prior_messages = ...  # TODO: pull prior messages array from DB

# This would normally be populated from an HTTP request
end_user_question = "Great to hear! What's the capital of Maine?"

result = openai.ChatCompletion.create(
    pipeline_slug="introduction",
    messages=[
        *prior_messages,
        {
            "role": "user",
            "content": end_user_question,
        }
    ],
    model="gpt-3.5-turbo",
    gentrace={
        # By specifying the previous run ID, we associate this generation with a thread in Gentrace
        "previousRunId": previous_run_id,
    },
)

run_id = result["pipelineRunId"]

# Omit logic to store this run in your desired persistent storage
# Omit logic to return the chat response to the user
Once the generation completes, Gentrace will associate that generation with the previously-specified run.
UI
You can view threads in the Observe → Runs view. Runs that belong to the same thread are rolled up into a single row.

While in the detailed view for the thread, navigate through the individual runs with the left/right arrow keys.

You can also navigate through individual runs in the timeline view.
Limitations
- Threads are available for our observability features only.
- Threading supports only singly-linked runs. We currently do not support branching, where multiple generations reference the same run. If two runs are submitted with the same previous run ID (previousRunId), one run submission will be rejected with a 400 status code.
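The no-branching rule means each run ID can be consumed as a previous run ID at most once. If your application might submit concurrent follow-ups, you could guard against this on the client side. The sketch below is illustrative; usedAsPrevious and claimPreviousRun are hypothetical names, not SDK APIs:

```typescript
// Hypothetical client-side guard against branching (not an SDK API).
// Tracks which run IDs have already been consumed as a previousRunId.
const usedAsPrevious = new Set<string>();

// Returns true if this run ID has not yet been used as a previousRunId,
// claiming it in the process. A second submission with the same
// previousRunId would be rejected by Gentrace with a 400 status code.
function claimPreviousRun(previousRunId: string): boolean {
  if (usedAsPrevious.has(previousRunId)) {
    return false;
  }
  usedAsPrevious.add(previousRunId);
  return true;
}
```

In practice, the equivalent check usually lives in your database (e.g., a uniqueness constraint on the previous-run column), so that only one follow-up per run ever reaches Gentrace.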
Future work
- Thread evaluation
- Visual thread comparison
- Aggregate statistics (e.g. average run latency)
- Thread metadata
If you're interested in shaping this future work, please reach out over email.