Building Queries

Infactory’s query system allows you to generate queries that answer questions about your data without writing complex code. This guide explains how to work with the automatically generated queries and create new ones.

Understanding the Build Page

The Build page is where you can view, test, and manage your queries:

  • Query List: All your queries, both automatically generated and custom
  • Query Editor: Where you can view and modify query code
  • Results Panel: Displays the structured data results when you run a query

Auto-Generated Queries

After connecting a data source, Infactory automatically generates approximately 12 queries based on your data schema. These queries are designed to answer common questions about your data.

Types of Auto-Generated Queries

Depending on your data, Infactory may generate queries such as:

  • Averages by Category: Calculates averages of numeric fields grouped by categorical fields
  • Counts by Condition: Counts items that match specific criteria
  • Correlations: Examines relationships between different numeric fields
  • Rankings: Orders data by specific metrics
  • Filtering: Retrieves data matching specific conditions

Testing Auto-Generated Queries

To test an auto-generated query:

  1. Navigate to the Build tab
  2. Select a query from the list
  3. Click the Run button
  4. View the results in the Results Panel

Creating Custom Queries

While auto-generated queries cover many common scenarios, you may want to create custom queries for specific needs.

Using the Infactory Assistant

The easiest way to create a new query is with the Infactory Assistant:

  1. Click the New Query button
  2. Use natural language to describe what you want to query (e.g., “Find the correlation between player height and shot accuracy”)
  3. The assistant will generate query code based on your description
  4. Review, modify if needed, and run the query to test it

Understanding Query Code

Each query in Infactory consists of Python code that uses data manipulation libraries to process your data. The basic structure includes:

def run(parameters, context):
    # Step 1: Load data from your connected data source
    df = context.load_table("your_table_name")
    
    # Step 2: Process the data based on parameters
    result = df.groupby("category").mean("metric")
    
    # Step 3: Return the results
    return result

Query Parameters and Slots

“Slots” are placeholders in queries that can be filled with different values. This makes queries flexible and reusable:

  • A query with the question “What is the average height by position?” might have slots for:
    • The metric to average (e.g., “height”, “weight”, “speed”)
    • The category to group by (e.g., “position”, “team”, “year”)

When the query is called, these slots are filled with specific values, allowing a single query to answer many related questions.

Editing Queries

You can edit queries to customize them for your specific needs:

Manual Editing

For advanced users who are comfortable with Python:

  1. Select a query in the Build tab
  2. Modify the code directly in the query editor
  3. Click Run to test your changes

Using the Assistant for Modifications

To modify a query with the assistant:

  1. Select a query in the Build tab
  2. Click the Modify with Assistant button
  3. Describe the changes you want to make (e.g., “Add a filter for players over 6 feet tall”)
  4. Review the suggested changes
  5. Accept or further modify them

Advanced Query Features

Infactory’s query system provides several advanced features for complex scenarios:

Working with Multiple Data Sources

Queries can combine data from multiple tables or collections:

def run(parameters, context):
    # Load data from two different tables
    players = context.load_table("players")
    teams = context.load_table("teams")
    
    # Join the tables
    combined = players.merge(teams, on="team_id")
    
    # Process the joined data
    result = combined.groupby("team_name").mean("player_height")
    
    return result

Creating Calculated Fields

You can create new fields based on calculations:

def run(parameters, context):
    players = context.load_table("players")
    
    # Create a new field
    players["bmi"] = players["weight"] / (players["height"] ** 2)
    
    # Use the calculated field
    result = players.groupby("position").mean("bmi")
    
    return result

Custom Visualizations

While Infactory returns structured data by default, you can enhance your query results with visualization metadata:

def run(parameters, context):
    result = context.load_table("players").groupby("position").mean("height")
    
    # Add visualization metadata
    viz_metadata = {
        "type": "bar_chart",
        "x_axis": "position",
        "y_axis": "height",
        "title": "Average Height by Position"
    }
    
    return result, viz_metadata

Query Performance Optimization

For larger datasets, consider these optimization techniques:

  • Limit Data Loading: Only load the columns you need
  • Add Filters Early: Filter data as early as possible in the query
  • Use Efficient Operations: Prefer vectorized operations over loops
  • Optimize Joins: Be careful with joins on large tables

Best Practices

  • Start with Auto-Generated Queries: Use these as templates for your custom queries
  • Test with Sample Data: Always test queries with real data to ensure they work as expected
  • Use Descriptive Names: Give your queries clear names that describe what they do
  • Document Parameters: Add comments explaining what each parameter does
  • Keep Queries Focused: Each query should answer a specific type of question
  • Consider Reusability: Design queries with slots for maximum reusability

Next Steps

After building your queries, the next step is to deploy them as API endpoints. Continue to Deploying Queries to learn how to make your queries available for applications to use.

Was this page helpful?