This guide demonstrates how to programmatically create and execute HTTP connections using the Infactory API.
Overview
When creating an HTTP connection, the process involves:
- Testing the connection to verify the API endpoint works
- Creating a datasource to store connection information
- Setting up credentials for authentication
- Executing the request to establish the connection and load data
Behind the scenes, this process creates:
- A datasource record in the database
- A data object for the HTTP request configuration
- A data object for the response data (in parquet format)
- Data lineage records connecting these objects
Prerequisites
- Authentication token or session cookie
- Project ID where you want to create the connection
- API endpoint URL you want to connect to
Step 1: Test the HTTP Connection
First, test that your API endpoint is accessible and returns the expected data:
curl 'https://your-instance.infactory.ai/api/infactory/v1/http/test-connection' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
      "key": {
        "value": "your-api-key-value",
        "required": true
      }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value"
  }'
Request Parameters:
- url: The API endpoint URL
- method: HTTP method (GET, POST, PUT, etc.)
- headers: HTTP headers to include with the request
- parameters: Query parameters, each with a value and a required flag
- parameterGroups: Groups of related parameters (optional)
- authType: Authentication type (None, API Key, Bearer Token, Basic Auth)
- auth: Authentication details matching the chosen auth type
- responsePathExtractor: JSON path to extract a specific key from the response (e.g., "value"); see the example after this list
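To make responsePathExtractor concrete, here is a minimal sketch. It assumes the extractor behaves as a simple top-level key lookup on the JSON response; the sample payload and field names below are illustrative only and are not part of the Infactory API.

# Illustrative only: a sample API response and the portion an extractor of "value" would keep.
api_response = {
    "value": [
        {"city": "Austin", "temp_f": 91},
        {"city": "Boston", "temp_f": 74}
    ],
    "meta": {"generated_at": "2023-09-21T15:30:00Z"}
}

extractor = "value"                  # responsePathExtractor from the request payload
extracted = api_response[extractor]  # the list under "value" is what would be ingested (assumption)
print(extracted)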
Response:
{
  "success": true,
  "status": 200,
  "response_time": 123,
  "content_type": "application/json",
  "size": 1024,
  "data": {
    "example": "response data"
  }
}
Step 2: Create a Datasource
Create a datasource to store your HTTP connection data:
curl 'https://your-instance.infactory.ai/api/infactory/v1/datasources' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "name": "My HTTP Connection",
    "project_id": "your-project-id",
    "type": "http-requests",
    "status": "transformation_started"
  }'
Request Parameters:
- name: Name of the datasource
- project_id: ID of the project to associate with
- type: Must be "http-requests" for HTTP connections
- status: Initial status (typically "transformation_started")
Response:
{
  "id": "datasource-id",
  "name": "My HTTP Connection",
  "project_id": "your-project-id",
  "type": "http-requests",
  "status": "transformation_started",
  "created_at": "2023-09-21T15:30:00Z",
  "updated_at": "2023-09-21T15:30:00Z"
}
Save the returned id as your datasource_id for the next steps.
Step 3: Create Credentials
Create credentials to store connection details and authentication information:
curl 'https://your-instance.infactory.ai/api/infactory/v1/credentials' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "name": "API Credentials",
    "type": "api",
    "description": "Credentials for API connection",
    "metadata": {
      "url": "https://api-endpoint.example.com/data",
      "method": "GET",
      "headers": {},
      "auth": {}
    },
    "datasource_id": "your-datasource-id",
    "team_id": "your-team-id",
    "organization_id": "your-org-id",
    "config": {
      "url": "https://api-endpoint.example.com/data",
      "method": "GET",
      "headers": {},
      "auth": {}
    }
  }'
Request Parameters:
- name: Name for the credentials
- type: "api" for API credentials
- description: Description of what the credentials are for
- metadata: Additional information about the API
- datasource_id: ID of the datasource from step 2
- team_id: Team that can access these credentials
- organization_id: Organization that owns these credentials
- config: Configuration details for the connection, including any auth settings (see the sketch after this list)
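The examples in this guide use authType "None" with an empty auth object. If your endpoint requires authentication, the auth block inside config (and the matching metadata) carries the details for the chosen authType. The sketch below shows how such a config might look; the field names (token, username, password) are assumptions for illustration, so confirm the exact schema against the API reference.

# Hypothetical shapes for the auth block; field names are assumptions, not confirmed API schema.
bearer_config = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "authType": "Bearer Token",
    "auth": {"token": "your-bearer-token"}              # assumed field name
}

basic_config = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "authType": "Basic Auth",
    "auth": {"username": "user", "password": "pass"}    # assumed field names
}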
Step 4: Execute the HTTP Request
Finally, execute the HTTP request to establish the connection and load data:
curl 'https://your-instance.infactory.ai/api/infactory/v1/http/execute-request' \
  -H 'content-type: application/json' \
  -H 'authorization: YOUR_AUTH_TOKEN' \
  --data-raw '{
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
      "key": {
        "value": "your-api-key-value",
        "required": true
      }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value",
    "project_id": "your-project-id",
    "datasource_id": "your-datasource-id",
    "connect_spec": {
      "name": "My HTTP Connection",
      "id": "http-requests",
      "config": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "parameters": {
          "key": {
            "value": "your-api-key-value",
            "required": true
          }
        },
        "parameterGroups": [],
        "authType": "None",
        "auth": {},
        "responsePathExtractor": "value"
      }
    }
  }'
Request Parameters:
- HTTP connection details (same as test-connection)
- project_id: ID of the project
- datasource_id: ID of the datasource created in step 2
- connect_spec: Connection specification object with:
  - name: Name of the connection
  - id: Type identifier ("http-requests")
  - config: Full configuration matching the test-connection parameters
Response:
{
  "jobs": [
    {
      "id": "job-id",
      "job_type": "build_dataline_from_connected_resource",
      "status": "queued",
      "created_at": "2023-09-21T15:35:00Z"
    }
  ],
  "data_object_id": "data-object-id"
}
What Happens Behind the Scenes
When you execute this flow:
- The system creates a data object to store your HTTP request configuration
- It executes the HTTP request and fetches data from the API
- The response is converted to a Parquet file and stored as another data object
- Data lineage is established between request and response data objects
- Background jobs analyze the data structure and prepare it for querying (a polling sketch follows this list)
- The system automatically generates query programs based on the data structure
- These query programs are ready to use immediately without manual coding
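The execute-request response returns the created jobs with a status of "queued". Below is a minimal polling sketch for waiting until those background jobs finish. It assumes a job-status endpoint such as GET /v1/jobs/{job_id} exists, which is not covered in this guide, so check the API reference for the actual route and response fields.

import time
import requests

BASE_URL = "https://your-instance.infactory.ai/api/infactory"
headers = {"Authorization": "your-auth-token"}

def wait_for_jobs(jobs, timeout_s=300, poll_s=5):
    """Poll each job until it leaves the queued/running states or the timeout expires.
    Assumes a GET /v1/jobs/{job_id} endpoint (hypothetical) returning JSON with a "status" field."""
    deadline = time.time() + timeout_s
    pending = {job["id"] for job in jobs}
    while pending and time.time() < deadline:
        for job_id in list(pending):
            resp = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", headers=headers)  # assumed route
            status = resp.json().get("status", "unknown")
            if status not in ("queued", "running"):
                print(f"Job {job_id} finished with status: {status}")
                pending.discard(job_id)
        if pending:
            time.sleep(poll_s)
    return not pending  # True if every job completed before the timeout

# Usage with the execute-request response from Step 4:
# result = execute_response.json()
# wait_for_jobs(result["jobs"])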
Automatic Query Generation
One of the powerful features of this process is that the system automatically creates query programs (ready-to-use queries) based on the API response data. This means:
- You don’t need to manually write queries to explore the API data
- The system examines the structure and content of the API response
- It generates relevant queries tailored to the specific data received
- These queries are immediately available in your project for use
- You can execute or modify these generated queries as needed
This automatic query generation significantly accelerates the time from connection to insight, allowing you to start working with the API data immediately after establishing the connection.
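If you want to inspect the generated queries programmatically, a sketch like the one below could work. The GET /v1/queryprograms route, its project_id filter, and the response field names are assumptions for illustration; verify the actual listing endpoint in the API reference.

import requests

BASE_URL = "https://your-instance.infactory.ai/api/infactory"
headers = {"Authorization": "your-auth-token"}

# Hypothetical listing call: fetch the query programs generated for the project.
resp = requests.get(
    f"{BASE_URL}/v1/queryprograms",            # assumed route
    headers=headers,
    params={"project_id": "your-project-id"},  # assumed filter parameter
)
resp.raise_for_status()

for qp in resp.json():
    # Field names are illustrative; inspect the real response to see what is returned.
    print(qp.get("id"), "-", qp.get("name"))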
Code Example (Python)
Here’s a complete Python example showing all steps:
import json
import sys

import requests

# Configuration
BASE_URL = "https://your-instance.infactory.ai/api/infactory"
AUTH_TOKEN = "your-auth-token"
PROJECT_ID = "your-project-id"
TEAM_ID = "your-team-id"
ORG_ID = "your-org-id"

headers = {
    "Content-Type": "application/json",
    "Authorization": AUTH_TOKEN
}

# Step 1: Test the connection
test_payload = {
    "url": "https://api-endpoint.example.com/data",
    "method": "GET",
    "headers": {},
    "parameters": {
        "key": {
            "value": "your-api-key-value",
            "required": True
        }
    },
    "parameterGroups": [],
    "authType": "None",
    "auth": {},
    "responsePathExtractor": "value"
}

test_response = requests.post(
    f"{BASE_URL}/v1/http/test-connection",
    headers=headers,
    data=json.dumps(test_payload)
)

if test_response.status_code != 200 or not test_response.json()["success"]:
    print("Connection test failed:", test_response.text)
    sys.exit(1)

print("Connection test successful!")

# Step 2: Create a datasource
datasource_payload = {
    "name": "My HTTP Connection",
    "project_id": PROJECT_ID,
    "type": "http-requests",
    "status": "transformation_started"
}

datasource_response = requests.post(
    f"{BASE_URL}/v1/datasources",
    headers=headers,
    data=json.dumps(datasource_payload)
)

if datasource_response.status_code != 200:
    print("Failed to create datasource:", datasource_response.text)
    sys.exit(1)

datasource_id = datasource_response.json()["id"]
print(f"Created datasource with ID: {datasource_id}")

# Step 3: Create credentials
credentials_payload = {
    "name": "API Credentials",
    "type": "api",
    "description": "Credentials for API connection",
    "metadata": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "auth": {}
    },
    "datasource_id": datasource_id,
    "team_id": TEAM_ID,
    "organization_id": ORG_ID,
    "config": {
        "url": "https://api-endpoint.example.com/data",
        "method": "GET",
        "headers": {},
        "auth": {}
    }
}

credentials_response = requests.post(
    f"{BASE_URL}/v1/credentials",
    headers=headers,
    data=json.dumps(credentials_payload)
)

if credentials_response.status_code != 200:
    print("Failed to create credentials:", credentials_response.text)
    sys.exit(1)

print("Created credentials successfully")

# Step 4: Execute the HTTP request
execute_payload = {
    **test_payload,
    "project_id": PROJECT_ID,
    "datasource_id": datasource_id,
    "connect_spec": {
        "name": "My HTTP Connection",
        "id": "http-requests",
        "config": {
            **test_payload,
            "responsePathExtractor": "value"
        }
    }
}

execute_response = requests.post(
    f"{BASE_URL}/v1/http/execute-request",
    headers=headers,
    data=json.dumps(execute_payload)
)

if execute_response.status_code != 200:
    print("Failed to execute HTTP request:", execute_response.text)
    sys.exit(1)

result = execute_response.json()
print("Successfully executed HTTP request!")
print(f"Data object ID: {result['data_object_id']}")
print(f"Jobs: {len(result['jobs'])} jobs created")
Now the HTTP connection is established and the data is available in your project.