Building Queries

Once you’ve connected your database, Infactory helps you create queries that answer specific questions about your data. These queries form the foundation of your AI-powered data experiences.

Understanding the Build Page

The Build page is your command center for creating and managing queries.

The Build page consists of these main components:

  • Query List: All your queries, both automatically generated and custom
  • Query Editor: Where you view and modify query code
  • Query Parameters: Configure slots (parameters) for your query
  • Results Panel: See the structured data results when you test your query
  • Assistant Panel: Get AI-powered help with query creation and editing

Auto-Generated Queries

After connecting a data source, Infactory automatically generates approximately 12 queries based on your data schema.

Types of Auto-Generated Queries

Depending on your data, Infactory may generate queries such as:

Aggregations

Calculations like averages, sums, or counts grouped by categories

Filtering

Finding records that match specific criteria or conditions

Rankings

Ordering data by specific metrics (highest, lowest, etc.)

Correlations

Examining relationships between different numeric fields

Time Series

Analyzing trends over time periods

Comparisons

Comparing values across different categories

Testing Auto-Generated Queries

To test an auto-generated query:

1

Select a query

Click on a query from the list to load it in the query editor.

2

Review the query code

Examine the query code to understand what it does. Don’t worry if you’re not familiar with Python – the code is well-commented.

3

Run the query

Click the Run button to execute the query against your data.

4

View the results

The structured data results will appear in the Results Panel.

Understanding Slots (Query Parameters)

“Slots” are what make Infactory queries so powerful. They are placeholders in queries that can be filled with different values, making a single query capable of answering many related questions.

How Slots Work

For example, a query with the question pattern “What is the average by ?” has two slots:

  • The metric slot can be filled with any numeric field (height, weight, sales, etc.)
  • The category slot can be filled with any categorical field (position, team, region, etc.)

This means one query template can answer hundreds of specific questions like:

  • “What is the average height by position?”
  • “What is the average sales by region?”
  • “What is the average order value by product category?”

Slot Types

Infactory supports various slot types:

Text Slots

For text/string values like names, categories, or IDs

Numeric Slots

For numbers like thresholds, counts, or measurements

Date Slots

For date/time values like ranges or specific points in time

Field Slots

For selecting specific database columns or fields

Boolean Slots

For true/false options

List Slots

For multiple values

Creating Custom Queries

While auto-generated queries cover many common scenarios, you may want to create custom queries for specific needs.

Using the Infactory Assistant

The easiest way to create a new query is with the Infactory Assistant:

1

Click New Query

Click the New Query button at the top of the query list.

2

Describe what you want

In the assistant panel, describe in natural language what you want the query to do.

Example: “Create a query that finds the correlation between player height and scoring average, grouped by position.”

3

Review the generated code

The assistant will generate query code based on your description. Review it to ensure it matches your intent.

4

Test the query

Click Run to test the query with your data.

5

Refine if needed

If the query doesn’t work as expected, you can:

  • Ask the assistant to modify it
  • Edit the code directly
  • Add or modify slots

Manual Query Creation

For advanced users comfortable with Python, you can create queries manually:

1

Click New Query

Click the New Query button at the top of the query list.

2

Write your query code

Write your Python code in the query editor. The code should follow this structure:

def run(parameters, context):
    # Step 1: Load data from your connected data source
    df = context.load_table("your_table_name")
    
    # Step 2: Process the data based on parameters
    result = df.groupby(parameters["category"]).mean(parameters["metric"])
    
    # Step 3: Return the results
    return result
3

Define slots (parameters)

In the Parameters section, define the slots your query will use.

4

Test your query

Click Run to test your query with sample parameter values.

5

Save your query

Give your query a descriptive name and save it.

Query Code Structure

Understanding the structure of query code helps you create and customize queries effectively.

Basic Structure

All Infactory queries follow this basic structure:

def run(parameters, context):
    # Step 1: Load data
    df = context.load_table("your_table_name")
    
    # Step 2: Process data
    result = process_data(df, parameters)
    
    # Step 3: Return results
    return result
  • parameters: Contains the values for your query slots
  • context: Provides access to your data sources and helper functions
  • return value: The structured data result of your query

Common Data Processing Patterns

Here are some common patterns used in query code:

# Filter data based on a condition
filtered_df = df[df["column_name"] > parameters["threshold"]]

Advanced Query Features

Infactory’s query system provides several advanced features for complex scenarios.

Working with Multiple Data Sources

Queries can combine data from multiple tables or collections:

def run(parameters, context):
    # Load data from different tables
    players = context.load_table("players")
    teams = context.load_table("teams")
    stats = context.load_table("statistics")
    
    # Join the tables
    players_teams = players.merge(teams, on="team_id")
    all_data = players_teams.merge(stats, on="player_id")
    
    # Process the joined data
    result = all_data.groupby("team_name").mean("points_per_game")
    
    return result

Creating Calculated Fields

You can create new fields based on calculations:

def run(parameters, context):
    players = context.load_table("players")
    
    # Create a new field
    players["bmi"] = players["weight"] / ((players["height"] / 100) ** 2)
    
    # Use the calculated field
    result = players.groupby("position").mean("bmi")
    
    return result

Advanced Filtering

Combine multiple filters for complex query conditions:

def run(parameters, context):
    players = context.load_table("players")
    
    # Multiple filter conditions
    filtered = players[
        (players["height"] > parameters["min_height"]) &
        (players["weight"] < parameters["max_weight"]) &
        (players["position"].isin(parameters["positions"]))
    ]
    
    return filtered

Result Formatting

You can format and structure your query results for optimal use:

def run(parameters, context):
    data = context.load_table("sales")
    
    # Process data
    results = data.groupby("region").sum("revenue")
    
    # Format results
    formatted_results = {
        "labels": results["region"].tolist(),
        "values": results["revenue"].tolist(),
        "total": results["revenue"].sum(),
        "average": results["revenue"].mean()
    }
    
    return formatted_results

Query Performance Optimization

For larger datasets, consider these optimization techniques:

Filter Early

Apply filters as early as possible to reduce the amount of data being processed:

# Good: Filter first, then process
df = context.load_table("large_table")
filtered = df[df["region"] == parameters["region"]]
result = filtered.groupby("category").mean("value")

# Less efficient: Process everything, then filter
df = context.load_table("large_table")
grouped = df.groupby("category").mean("value")
result = grouped[grouped.index == parameters["category"]]

Select Only Needed Columns

Only load the columns you need for your query:

# Efficient: Only load required columns
df = context.load_table("large_table", columns=["id", "category", "value"])

# Less efficient: Load all columns
df = context.load_table("large_table")

Use Efficient Operations

Prefer vectorized operations over loops:

# Good: Vectorized operation
df["doubled"] = df["value"] * 2

# Less efficient: Loop
for i in range(len(df)):
    df.at[i, "doubled"] = df.at[i, "value"] * 2

Debugging Queries

When your query doesn’t work as expected, these debugging strategies can help:

Best Practices

Start with Templates

Use auto-generated queries as templates for your custom queries whenever possible.

Descriptive Names

Give your queries clear names that describe what they do.

Add Comments

Document your query code with clear comments explaining what each section does.

Test Edge Cases

Test your queries with extreme or unusual parameter values.

Parameter Validation

Validate parameter values and handle invalid inputs gracefully.

Focus on Reusability

Design queries with slots that maximize their reusability.

Next Steps

After building your queries, the next step is to deploy them as API endpoints. Continue to Deploying APIs to learn how to make your queries available for applications to use.