This guide demonstrates how to programmatically create and execute HTTP connections using the Infactory API.

Overview

Creating an HTTP connection involves four steps:

  1. Testing the connection to verify the API endpoint works
  2. Creating a datasource to store connection information
  3. Setting up credentials for authentication
  4. Executing the request to establish the connection and load data

Behind the scenes, this process creates:

  • A datasource record in the database
  • A data object for the HTTP request configuration
  • A data object for the response data (in Parquet format)
  • Data lineage records connecting these objects

Prerequisites

  • Authentication token or session cookie
  • Project ID where you want to create the connection
  • API endpoint URL you want to connect to

Step 1: Test the HTTP Connection

First, test that your API endpoint is accessible and returns the expected data:

curl 'https://your-instance.infactory.ai/api/infactory/v1/http/test-connection' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
      "key": {
        "value": "your-api-key-value",
        "required": true
      }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value"
  }'

Request Parameters:

  • url: The API endpoint URL
  • method: HTTP method (GET, POST, PUT, etc.)
  • headers: HTTP headers to include with the request
  • parameters: Query parameters with values and required flags
  • parameterGroups: Groups of related parameters (optional)
  • authType: Authentication type (None, API Key, Bearer Token, Basic Auth)
  • auth: Authentication details based on the auth type
  • responsePathExtractor: JSON path to extract a specific key from the response (e.g., “value”)
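
The same request body can be assembled in Python before it is sent. This is a minimal sketch: the helper name build_http_config is hypothetical and not part of the Infactory API, and it assumes every parameter is marked required, as in the curl example above. The field names and default values are copied from that example.

```python
from typing import Any, Dict, Optional


def build_http_config(
    url: str,
    method: str = "GET",
    parameters: Optional[Dict[str, str]] = None,
    response_path_extractor: str = "",
) -> Dict[str, Any]:
    """Assemble a test-connection request body (hypothetical helper)."""
    return {
        "url": url,
        "method": method,
        "headers": {},
        # Each query parameter carries a value and a required flag.
        # Assumption: all parameters are required, as in the curl example.
        "parameters": {
            name: {"value": value, "required": True}
            for name, value in (parameters or {}).items()
        },
        "parameterGroups": [],
        "authType": "None",
        "auth": {},
        "responsePathExtractor": response_path_extractor,
    }


config = build_http_config(
    "https://api-endpoint.example.com/data",
    parameters={"key": "your-api-key-value"},
    response_path_extractor="value",
)
```

Building the body in one place makes it easier to reuse the identical configuration later, when the request is executed.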

Response:

{
  "success": true,
  "status": 200,
  "response_time": 123,
  "content_type": "application/json",
  "size": 1024,
  "data": {
    "example": "response data"
  }
}
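
Before moving on, a caller will usually want to check the success and status fields. A minimal sketch, assuming the response has the shape shown above; the function name check_test_result is hypothetical, and treating any 2xx status as acceptable is an assumption rather than documented behavior.

```python
from typing import Any, Dict


def check_test_result(result: Dict[str, Any]) -> bool:
    """Return True when a test-connection response reports success (hypothetical helper)."""
    status = result.get("status")
    # A missing "success" or "status" field counts as a failure instead of raising KeyError.
    return (
        bool(result.get("success"))
        and isinstance(status, int)
        and 200 <= status < 300
    )


# Sample data copied from the response shown above.
sample_response = {
    "success": True,
    "status": 200,
    "response_time": 123,
    "content_type": "application/json",
    "size": 1024,
    "data": {"example": "response data"},
}

connection_ok = check_test_result(sample_response)
```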

Step 2: Create a Datasource

Create a datasource to store your HTTP connection data:

curl 'https://your-instance.infactory.ai/api/infactory/v1/datasources' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "name": "My HTTP Connection",
    "project_id": "your-project-id",
    "type": "http-requests",
    "status": "transformation_started"
  }'

Request Parameters:

  • name: Name of the datasource
  • project_id: ID of the project to associate with
  • type: Must be “http-requests” for HTTP connections
  • status: Initial status (typically “transformation_started”)

Response:

{
  "id": "datasource-id",
  "name": "My HTTP Connection",
  "project_id": "your-project-id",
  "type": "http-requests",
  "status": "transformation_started",
  "created_at": "2023-09-21T15:30:00Z",
  "updated_at": "2023-09-21T15:30:00Z"
}

Save the returned id as your datasource_id for the next steps.
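
One way to capture the id defensively is sketched below. The function name extract_datasource_id is hypothetical; the only assumption about the API is the response shape shown above.

```python
from typing import Any, Dict


def extract_datasource_id(body: Dict[str, Any]) -> str:
    """Pull the datasource id out of a create-datasource response (hypothetical helper)."""
    datasource_id = body.get("id")
    # Fail early with a clear message rather than passing None to later steps.
    if not datasource_id:
        raise ValueError("datasource response did not include an id")
    return str(datasource_id)


# Sample data copied from the response shown above.
sample_response = {
    "id": "datasource-id",
    "name": "My HTTP Connection",
    "project_id": "your-project-id",
    "type": "http-requests",
    "status": "transformation_started",
}

datasource_id = extract_datasource_id(sample_response)
```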

Step 3: Create Credentials

Create credentials to store connection details and authentication information:

curl 'https://your-instance.infactory.ai/api/infactory/v1/credentials' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "name": "API Credentials",
    "type": "api",
    "description": "Credentials for API connection",
    "metadata": {
      "url": "https://api-endpoint.example.com/data",
      "method": "GET",
      "headers": {},
      "auth": {}
    },
    "datasource_id": "your-datasource-id",
    "team_id": "your-team-id",
    "organization_id": "your-org-id",
    "config": {
      "url": "https://api-endpoint.example.com/data",
      "method": "GET",
      "headers": {},
      "auth": {}
    }
  }'

Request Parameters:

  • name: Name for the credentials
  • type: “api” for API credentials
  • description: Description of what the credentials are for
  • metadata: Additional information about the API
  • datasource_id: ID of the datasource from step 2
  • team_id: Team that can access these credentials
  • organization_id: Organization that owns these credentials
  • config: Configuration details for the connection
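
In the example request, metadata and config carry the same four connection fields. The sketch below builds both from a single dictionary so they cannot drift apart. Whether the API requires them to match is not stated in this guide, and the helper name build_credentials_payload is hypothetical.

```python
import copy
from typing import Any, Dict


def build_credentials_payload(
    connection: Dict[str, Any],
    datasource_id: str,
    team_id: str,
    organization_id: str,
    name: str = "API Credentials",
    description: str = "Credentials for API connection",
) -> Dict[str, Any]:
    """Assemble a create-credentials request body (hypothetical helper)."""
    return {
        "name": name,
        "type": "api",
        "description": description,
        # Deep copies keep metadata and config independent of each other
        # and of the caller's dictionary.
        "metadata": copy.deepcopy(connection),
        "datasource_id": datasource_id,
        "team_id": team_id,
        "organization_id": organization_id,
        "config": copy.deepcopy(connection),
    }


connection = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "auth": {},
}

credentials_payload = build_credentials_payload(
    connection,
    datasource_id="your-datasource-id",
    team_id="your-team-id",
    organization_id="your-org-id",
)
```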

Step 4: Execute the HTTP Request

Finally, execute the HTTP request to establish the connection and load data:

curl 'https://your-instance.infactory.ai/api/infactory/v1/http/execute-request' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
      "key": {
        "value": "your-api-key-value",
        "required": true
      }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value",
    "project_id": "your-project-id",
    "datasource_id": "your-datasource-id",
    "connect_spec": {
      "name": "My HTTP Connection",
      "id": "http-requests",
      "config": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "parameters": {
          "key": {
            "value": "your-api-key-value",
            "required": true
          }
        },
        "parameterGroups": [],
        "authType": "None",
        "auth": {},
        "responsePathExtractor": "value"
      }
    }
  }'

Request Parameters:

  • All test-connection fields from Step 1 (url, method, headers, parameters, parameterGroups, authType, auth, responsePathExtractor)
  • project_id: ID of the project
  • datasource_id: ID of the datasource created in step 2
  • connect_spec: Connection specification object with:
    • name: Name of the connection
    • id: Type identifier (“http-requests”)
    • config: Full configuration matching the test-connection parameters
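
Because the execute-request body repeats the Step 1 configuration twice (once at the top level and once under connect_spec.config), it can be derived from a single dictionary. A minimal sketch; build_execute_payload is a hypothetical helper name, and the structure is copied from the curl example above.

```python
import copy
from typing import Any, Dict


def build_execute_payload(
    config: Dict[str, Any],
    project_id: str,
    datasource_id: str,
    name: str,
) -> Dict[str, Any]:
    """Assemble an execute-request body from a test-connection config (hypothetical helper)."""
    return {
        # Top level: the same fields that were sent to test-connection.
        **copy.deepcopy(config),
        "project_id": project_id,
        "datasource_id": datasource_id,
        "connect_spec": {
            "name": name,
            "id": "http-requests",
            # Nested copy of the full configuration.
            "config": copy.deepcopy(config),
        },
    }


config = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {"key": {"value": "your-api-key-value", "required": True}},
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value",
}

execute_payload = build_execute_payload(
    config,
    project_id="your-project-id",
    datasource_id="your-datasource-id",
    name="My HTTP Connection",
)
```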

Response:

{
  "jobs": [
    {
      "id": "job-id",
      "job_type": "build_dataline_from_connected_resource",
      "status": "queued",
      "created_at": "2023-09-21T15:35:00Z"
    }
  ],
  "data_object_id": "data-object-id"
}
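
The jobs list identifies the background work that was started. The sketch below collects the ids of jobs still in the queued state, assuming the response shape shown above; queued_job_ids is a hypothetical helper name.

```python
from typing import Any, Dict, List


def queued_job_ids(result: Dict[str, Any]) -> List[str]:
    """Return the ids of jobs whose status is "queued" (hypothetical helper)."""
    return [
        job["id"]
        for job in result.get("jobs", [])
        if job.get("status") == "queued"
    ]


# Sample data copied from the response shown above.
sample_response = {
    "jobs": [
        {
            "id": "job-id",
            "job_type": "build_dataline_from_connected_resource",
            "status": "queued",
            "created_at": "2023-09-21T15:35:00Z",
        }
    ],
    "data_object_id": "data-object-id",
}

pending = queued_job_ids(sample_response)
```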

What Happens Behind the Scenes

When you execute this flow:

  1. The system creates a data object to store your HTTP request configuration
  2. It executes the HTTP request and fetches data from the API
  3. The response is converted to a Parquet file and stored as another data object
  4. Data lineage is established between the request and response data objects
  5. Background jobs analyze the data structure and prepare it for querying
  6. The system automatically generates query programs based on the data structure
  7. These query programs are ready to use immediately without manual coding
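
Since the analysis runs in background jobs, a client may want to wait until they finish. This guide does not document a job-status endpoint, so the sketch below takes the status lookup as an injected callable and only illustrates the polling loop. The function name wait_for_status and the status values "running", "completed", and "failed" are assumptions; only "queued" appears in the response above.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    fetch_status: Callable[[], str],
    done_statuses: Iterable[str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll fetch_status until it returns one of done_statuses (hypothetical helper)."""
    done = set(done_statuses)
    status = ""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in done:
            return status
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"status still {status!r} after {max_attempts} attempts")


# Stand-in status source for illustration; a real client would query the API here.
fake_statuses = iter(["queued", "running", "completed"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    done_statuses={"completed", "failed"},
    interval_seconds=0,
)
```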

Automatic Query Generation

Once the data is loaded, the system automatically creates query programs (ready-to-use queries) based on the API response data. This means:

  • You don’t need to manually write queries to explore the API data
  • The system examines the structure and content of the API response
  • It generates relevant queries tailored to the specific data received
  • These queries are immediately available in your project for use
  • You can execute or modify these generated queries as needed

Because the queries are generated automatically, you can start working with the API data right after establishing the connection, without writing any queries first.

Code Example (Python)

Here’s a complete Python example showing all steps:

import requests
import json

# Configuration
BASE_URL = "https://your-instance.infactory.ai/api/infactory"
AUTH_TOKEN = "your-auth-token"
PROJECT_ID = "your-project-id"
TEAM_ID = "your-team-id"
ORG_ID = "your-org-id"

headers = {
    "Content-Type": "application/json",
    "Authorization": AUTH_TOKEN
}

# Step 1: Test the connection
test_payload = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
        "key": {
            "value": "your-api-key-value",
            "required": True
        }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value"
}

test_response = requests.post(
    f"{BASE_URL}/v1/http/test-connection",
    headers=headers,
    data=json.dumps(test_payload)
)

# Treat a non-200 status or a missing/false "success" flag as a failure
if test_response.status_code != 200 or not test_response.json().get("success"):
    print("Connection test failed:", test_response.text)
    raise SystemExit(1)

print("Connection test successful!")

# Step 2: Create a datasource
datasource_payload = {
    "name": "My HTTP Connection",
    "project_id": PROJECT_ID,
    "type": "http-requests",
    "status": "transformation_started"
}

datasource_response = requests.post(
    f"{BASE_URL}/v1/datasources",
    headers=headers,
    data=json.dumps(datasource_payload)
)

# Accept any non-error status; creation endpoints may return 201 rather than 200
if not datasource_response.ok:
    print("Failed to create datasource:", datasource_response.text)
    raise SystemExit(1)

datasource_id = datasource_response.json()["id"]
print(f"Created datasource with ID: {datasource_id}")

# Step 3: Create credentials
credentials_payload = {
    "name": "API Credentials",
    "type": "api",
    "description": "Credentials for API connection",
    "metadata": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "auth": {}
    },
    "datasource_id": datasource_id,
    "team_id": TEAM_ID,
    "organization_id": ORG_ID,
    "config": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "auth": {}
    }
}

credentials_response = requests.post(
    f"{BASE_URL}/v1/credentials",
    headers=headers,
    data=json.dumps(credentials_payload)
)

# Accept any non-error status; creation endpoints may return 201 rather than 200
if not credentials_response.ok:
    print("Failed to create credentials:", credentials_response.text)
    raise SystemExit(1)

print("Created credentials successfully")

# Step 4: Execute the HTTP request
# The top-level fields and connect_spec.config both repeat the Step 1 payload
execute_payload = {
    **test_payload,
    "project_id": PROJECT_ID,
    "datasource_id": datasource_id,
    "connect_spec": {
        "name": "My HTTP Connection",
        "id": "http-requests",
        "config": dict(test_payload)
    }
}

execute_response = requests.post(
    f"{BASE_URL}/v1/http/execute-request",
    headers=headers,
    data=json.dumps(execute_payload)
)

if not execute_response.ok:
    print("Failed to execute HTTP request:", execute_response.text)
    raise SystemExit(1)

result = execute_response.json()
print("Successfully executed HTTP request!")
print(f"Data object ID: {result['data_object_id']}")
print(f"Jobs: {len(result['jobs'])} jobs created")

Now the HTTP connection is established and the data is available in your project.