How to Build n8n Data Pipelines (Sync, Transform, and Route Data Between Any App)
Build n8n data pipelines that extract from APIs, transform with Code nodes, and load into any target system: schedule-driven or real-time, no ETL tool needed.
Manual Data Syncing Breaks. Pipelines Don't.
Copy-pasting records between systems. Scheduled CSV exports that nobody checks. Spreadsheet formulas that break when a column moves. Every growing team has a version of this problem: data that lives in one system and needs to be somewhere else, cleaned, every day.
n8n lets you build proper data pipelines: extract from any API or database, transform with JavaScript, validate against business rules, and load into any target, on a schedule or triggered by an event. No external ETL service required.
This guide covers how to build reliable extract-transform-load workflows in n8n, from a simple API-to-Sheets sync to a paginated multi-source aggregation that combines data from several platforms into one unified record.
What You Can Automate With n8n Data Pipelines
Any recurring data movement task is a candidate:
- API-to-spreadsheet sync: pull records from Stripe, HubSpot, or your product API into Google Sheets every hour
- Database mirroring: replicate rows from a PostgreSQL table to a secondary database or analytics store
- Multi-source aggregation: combine Shopify orders, Stripe revenue, and GA4 sessions into one unified daily record
- Incremental sync: track the last synced timestamp and only fetch new or updated records on each run
- Data cleansing: normalize phone numbers, deduplicate contacts, and reject records missing required fields before they land in your CRM
- Cross-app data bridge: move records from one platform to another when there's no native integration (e.g., Jotform submissions to PostgreSQL)
The Data Pipeline
Every n8n data pipeline follows the same shape:
Schedule (or Webhook)
→ HTTP Request (Extract: fetch records from source API)
→ Code (Transform: normalize fields, compute derived values)
→ IF (Validate: reject records missing required fields)
→ Split in Batches (chunk into groups of 100)
→ HTTP Request / Sheets / DB (Load: write to target)
→ Slack (Report: send run summary with counts and errors)
This structure is repeatable and debuggable regardless of which source and target you're using.
1. Trigger: Define When the Pipeline Runs
Use a Schedule Trigger for batch pipelines:
- Hourly:
0 * * * * - Nightly at 2am:
0 2 * * * - Every 15 minutes:
*/15 * * * *
Use a Webhook Trigger when the pipeline should fire in response to an event: a new row added, a product backend event, or a manual trigger from Slack.
For real-time event-driven pipelines that need to respond within seconds, see the n8n webhook automation guide.
2. Extract: Pull Data From the Source
The HTTP Request node handles any REST API. Configure:
- Method:
GET - URL: your API endpoint (e.g.,
https://api.stripe.com/v1/charges) - Authentication: Header Auth with
Authorization: Bearer YOUR_API_KEY - Query parameters:
limit=100, date filters usingcreated[gte]
For paginated APIs, use a loop pattern: fetch a page, load it, fetch the next until all pages are consumed.
Every node exposes results at $json.fieldName. For paginated APIs, records are usually nested at $json.data or $json.items. Always inspect the HTTP Request node's output panel after a test run to find the exact field path before writing downstream Code node logic that depends on it.
3. Transform: Normalize and Reshape
A Code node transforms raw API output into the shape your target expects:
return $input.all().map(item => {
const src = item.json;
return {
json: {
id: src.id,
email: (src.email || "").toLowerCase().trim(),
amount_usd: (src.amount / 100).toFixed(2), // Stripe sends cents
created_date: new Date(src.created * 1000).toISOString().split("T")[0],
status: src.status,
name: [src.first_name, src.last_name].filter(Boolean).join(" "),
}
};
});
Keep all field mapping here, not scattered across downstream nodes. One Code node that produces clean, predictable output is far easier to debug than a chain of Set nodes doing partial transforms.
4. Validate: Filter Out Bad Records
An IF node checks each record against business rules before it reaches your target:
- Email must not be empty
- Amount must be greater than
0 - Status must be one of the expected values
Records that fail route to the false branch. Write them to an "Error Log" sheet tab or trigger a Slack alert with the record ID and reason. Records that pass continue to loading.
An IF node's false branch should write failed records somewhere visible. A Google Sheet "Error Log" tab with the record ID and failure reason makes debugging pipelines possible. Silent drops mean you discover bad data days later, not minutes later.
5. Load: Write to the Target
Depending on your target:
- Google Sheets node:
appendOrUpdateoperation to upsert by a key column - HTTP Request: POST the transformed record body to a target REST API
- PostgreSQL or MySQL node:
INSERT ... ON CONFLICT DO UPDATEto upsert by primary key - Airtable node: create or update records in a base table
Use Split in Batches (batch size: 50–200) before the load step when processing more than a few hundred records. This prevents memory pressure and makes partial failures easy to identify. You know exactly which batch failed.
6. Report: Send a Run Summary
After loading completes, a Slack or Gmail node sends a summary:
Pipeline run complete: 2026-05-22 02:00 UTC
Records processed: 1,247
Records loaded: 1,231
Records rejected: 16 (see Error Log tab)
Duration: 43s
This turns a background job into an auditable process. A failed run becomes visible immediately instead of surfacing as corrupted data days later.
Implementation Patterns
Pattern 1: Scheduled API-to-Sheets Sync
The simplest pattern: fetch from an API every hour and upsert rows into a Google Sheet.
Schedule (every hour)
→ HTTP Request (GET /api/events with date filter)
→ Code (normalize fields: remap, cast types, clean strings)
→ Google Sheets (appendOrUpdate, key column: id)
→ Slack (post count: N records synced)
Works for: Stripe charges, HubSpot contacts, product event logs, support tickets.
Use the Market Trend Analyzer template as a reference for pulling API data into a structured sheet on a schedule. It handles pagination and deduplication out of the box.
Explore the Market Trend Analyzer: an API-to-dashboard pipeline template →Pattern 2: Multi-Source Aggregation
Merge data from several APIs into one unified record per entity; for example, one row per customer combining billing, CRM, and product data.
Schedule (nightly)
→ HTTP Request (Stripe: customer list)
→ HTTP Request (HubSpot: matching contact by email)
→ HTTP Request (product DB: last login timestamp)
→ Code (merge all three into one object per customer)
→ Google Sheets (write unified customer row)
The Code node after the last fetch assembles the merged object:
const stripeData = $("Stripe").all();
const hubspotData = $("HubSpot").all();
const productData = $("Product DB").all();
return stripeData.map(s => {
const hs = hubspotData.find(h => h.json.email === s.json.email);
const pd = productData.find(p => p.json.email === s.json.email);
return {
json: {
customer_id: s.json.id,
email: s.json.email,
plan: s.json.plan,
hs_deal_stage: hs?.json.dealstage ?? "unknown",
last_login: pd?.json.last_login ?? null,
}
};
});
The Decision Makers Dashboard template uses this exact pattern: pulling signals from multiple sources and presenting them as one unified view.
Pattern 3: Incremental Sync With a Cursor
For large datasets, fetch only new or updated records since the last run by persisting a cursor in a config sheet.
Schedule (every 15 minutes)
→ Google Sheets (read cursor from config row)
→ HTTP Request (GET /records?updated_since=<cursor>)
→ Code (transform)
→ [load to target]
→ Google Sheets (write new cursor = max updated_at from this batch)
The cursor prevents double-loading already-processed records and keeps each pipeline run fast even as the total dataset grows.
n8n workflows have no persistent state between executions. Store the cursor (last synced timestamp, last processed ID) in a Google Sheet config row, a database table, or n8n's built-in static data via $workflow.staticData. Read it at the start of every run, write it at the end.
n8n Nodes You'll Use Most
| Node | Purpose |
|---|---|
| Schedule Trigger | Run the pipeline on a cron schedule |
| HTTP Request | Extract from any REST API or load to any REST target |
| Code | Transform, normalize, merge, and compute derived fields |
| IF | Validate records; route failures to an error branch |
| Split in Batches | Process large record sets in chunks to avoid memory pressure |
| Google Sheets | Append, upsert, or read cursor state from a spreadsheet |
| Merge | Combine outputs from multiple upstream extraction branches |
| PostgreSQL / MySQL | Read from or write to a relational database |
| Slack / Gmail | Send pipeline run summaries and error alerts |
| Loop Over Items | Paginate through multi-page API responses |
Getting Started
- Identify source and target: write down the API endpoint or database table you're extracting from, and exactly where the data needs to land.
- Test the extract step alone: add just the Schedule and HTTP Request nodes, run manually, and inspect the raw output. Know the exact field paths before writing any transform logic.
- Write the Code node transform: map source fields to target fields, normalize types, strip whitespace, compute derived values. Test against real output from step 2.
- Add the IF validation node: define what a valid record looks like and route failures to an error log sheet.
- Add Split in Batches before the load: even with 10 records today, the batch node makes scaling later painless.
- Add the load node: connect to Google Sheets, a database, or the target API. Test with a single record first, then a full batch.
- Add the Slack or Gmail summary node: log record counts and any errors. Activate the workflow only after the full pipeline passes a real test run.
For pipelines that need to classify or enrich records during transformation (scoring leads, detecting intent, categorizing support tickets), see the n8n AI-powered automation guide. For turning pipeline output into live dashboards and scheduled reports, see the n8n data reporting guide.
The App Analytics Dashboard template shows how to close the loop: pipeline output feeding directly into a structured reporting view.
Browse n8n data automation templates →Common questions
What's the difference between an n8n data pipeline and an n8n data report?
Can n8n replace a dedicated ETL tool like Fivetran or Airbyte?
How do I handle large datasets in n8n without hitting memory limits?
Get the workflow templates this guide is built on
Import-ready n8n JSON, step-by-step setup, and tested end-to-end. One-time payment, own it forever.
More automation guides

How to Build OpenAI Workflows with n8n (Classify, Summarize, and Generate at Scale)
n8n OpenAI workflows connect a self-hosted automation engine to capable text models, and the integration is more direct than most tutorials suggest. The pattern goes like this: something triggers the…

How to Automate HR Workflows with n8n (Onboarding, Compliance, and Wellness Monitoring)
HR Admin Runs on Repetition. n8n Handles Repetition. Every new hire triggers the same chain of tasks. Welcome email, Notion page, Trello cards, payroll enrollment reminder, badge request. Someone copy…

How to Automate HubSpot with n8n (Contacts, Deals, and Email Sequences)
HubSpot Automation That Goes Beyond the Built-In Workflow Builder HubSpot's native workflow builder handles simple branching inside HubSpot well. The moment you need to sync a closed deal to your acco…