Skip to main content
Lifetime license included with every purchase
n8n workflowsdata pipelinesETL automationdata sync

How to Build n8n Data Pipelines (Sync, Transform, and Route Data Between Any App)

Build n8n data pipelines that extract from APIs, transform with Code nodes, and load into any target system: schedule-driven or real-time, no ETL tool needed.

Nn8n Marketplace Team·May 22, 2026·9 min read

Manual Data Syncing Breaks. Pipelines Don't.

Copy-pasting records between systems. Scheduled CSV exports that nobody checks. Spreadsheet formulas that break when a column moves. Every growing team has a version of this problem: data that lives in one system and needs to be somewhere else, cleaned, every day.

n8n lets you build proper data pipelines: extract from any API or database, transform with JavaScript, validate against business rules, and load into any target, on a schedule or triggered by an event. No external ETL service required.

This guide covers how to build reliable extract-transform-load workflows in n8n, from a simple API-to-Sheets sync to a paginated multi-source aggregation that combines data from several platforms into one unified record.

What You Can Automate With n8n Data Pipelines

Any recurring data movement task is a candidate:

  • API-to-spreadsheet sync: pull records from Stripe, HubSpot, or your product API into Google Sheets every hour
  • Database mirroring: replicate rows from a PostgreSQL table to a secondary database or analytics store
  • Multi-source aggregation: combine Shopify orders, Stripe revenue, and GA4 sessions into one unified daily record
  • Incremental sync: track the last synced timestamp and only fetch new or updated records on each run
  • Data cleansing: normalize phone numbers, deduplicate contacts, and reject records missing required fields before they land in your CRM
  • Cross-app data bridge: move records from one platform to another when there's no native integration (e.g., Jotform submissions to PostgreSQL)
Get the Data Entry Hub template: a pre-built n8n data pipeline

The Data Pipeline

Every n8n data pipeline follows the same shape:

Schedule (or Webhook)
  → HTTP Request (Extract: fetch records from source API)
  → Code (Transform: normalize fields, compute derived values)
  → IF (Validate: reject records missing required fields)
  → Split in Batches (chunk into groups of 100)
    → HTTP Request / Sheets / DB (Load: write to target)
  → Slack (Report: send run summary with counts and errors)

This structure is repeatable and debuggable regardless of which source and target you're using.

1. Trigger: Define When the Pipeline Runs

Use a Schedule Trigger for batch pipelines:

  • Hourly: 0 * * * *
  • Nightly at 2am: 0 2 * * *
  • Every 15 minutes: */15 * * * *

Use a Webhook Trigger when the pipeline should fire in response to an event: a new row added, a product backend event, or a manual trigger from Slack.

For real-time event-driven pipelines that need to respond within seconds, see the n8n webhook automation guide.

2. Extract: Pull Data From the Source

The HTTP Request node handles any REST API. Configure:

  • Method: GET
  • URL: your API endpoint (e.g., https://api.stripe.com/v1/charges)
  • Authentication: Header Auth with Authorization: Bearer YOUR_API_KEY
  • Query parameters: limit=100, date filters using created[gte]

For paginated APIs, use a loop pattern: fetch a page, load it, fetch the next until all pages are consumed.

Check the exact output path before writing transform logic

Every node exposes results at $json.fieldName. For paginated APIs, records are usually nested at $json.data or $json.items. Always inspect the HTTP Request node's output panel after a test run to find the exact field path before writing downstream Code node logic that depends on it.

3. Transform: Normalize and Reshape

A Code node transforms raw API output into the shape your target expects:

return $input.all().map(item => {
  const src = item.json;
  return {
    json: {
      id:           src.id,
      email:        (src.email || "").toLowerCase().trim(),
      amount_usd:   (src.amount / 100).toFixed(2),   // Stripe sends cents
      created_date: new Date(src.created * 1000).toISOString().split("T")[0],
      status:       src.status,
      name:         [src.first_name, src.last_name].filter(Boolean).join(" "),
    }
  };
});

Keep all field mapping here, not scattered across downstream nodes. One Code node that produces clean, predictable output is far easier to debug than a chain of Set nodes doing partial transforms.

4. Validate: Filter Out Bad Records

An IF node checks each record against business rules before it reaches your target:

  • Email must not be empty
  • Amount must be greater than 0
  • Status must be one of the expected values

Records that fail route to the false branch. Write them to an "Error Log" sheet tab or trigger a Slack alert with the record ID and reason. Records that pass continue to loading.

Log rejected records, do not silently drop them

An IF node's false branch should write failed records somewhere visible. A Google Sheet "Error Log" tab with the record ID and failure reason makes debugging pipelines possible. Silent drops mean you discover bad data days later, not minutes later.

5. Load: Write to the Target

Depending on your target:

  • Google Sheets node: appendOrUpdate operation to upsert by a key column
  • HTTP Request: POST the transformed record body to a target REST API
  • PostgreSQL or MySQL node: INSERT ... ON CONFLICT DO UPDATE to upsert by primary key
  • Airtable node: create or update records in a base table

Use Split in Batches (batch size: 50–200) before the load step when processing more than a few hundred records. This prevents memory pressure and makes partial failures easy to identify. You know exactly which batch failed.

6. Report: Send a Run Summary

After loading completes, a Slack or Gmail node sends a summary:

Pipeline run complete: 2026-05-22 02:00 UTC
Records processed: 1,247
Records loaded:    1,231
Records rejected:  16 (see Error Log tab)
Duration: 43s

This turns a background job into an auditable process. A failed run becomes visible immediately instead of surfacing as corrupted data days later.

Implementation Patterns

Pattern 1: Scheduled API-to-Sheets Sync

The simplest pattern: fetch from an API every hour and upsert rows into a Google Sheet.

Schedule (every hour)
  → HTTP Request (GET /api/events with date filter)
  → Code (normalize fields: remap, cast types, clean strings)
  → Google Sheets (appendOrUpdate, key column: id)
  → Slack (post count: N records synced)

Works for: Stripe charges, HubSpot contacts, product event logs, support tickets.

Use the Market Trend Analyzer template as a reference for pulling API data into a structured sheet on a schedule. It handles pagination and deduplication out of the box.

Explore the Market Trend Analyzer: an API-to-dashboard pipeline template

Pattern 2: Multi-Source Aggregation

Merge data from several APIs into one unified record per entity; for example, one row per customer combining billing, CRM, and product data.

Schedule (nightly)
  → HTTP Request (Stripe: customer list)
  → HTTP Request (HubSpot: matching contact by email)
  → HTTP Request (product DB: last login timestamp)
  → Code (merge all three into one object per customer)
  → Google Sheets (write unified customer row)

The Code node after the last fetch assembles the merged object:

const stripeData  = $("Stripe").all();
const hubspotData = $("HubSpot").all();
const productData = $("Product DB").all();

return stripeData.map(s => {
  const hs = hubspotData.find(h => h.json.email === s.json.email);
  const pd = productData.find(p => p.json.email === s.json.email);
  return {
    json: {
      customer_id:   s.json.id,
      email:         s.json.email,
      plan:          s.json.plan,
      hs_deal_stage: hs?.json.dealstage ?? "unknown",
      last_login:    pd?.json.last_login ?? null,
    }
  };
});

The Decision Makers Dashboard template uses this exact pattern: pulling signals from multiple sources and presenting them as one unified view.

Pattern 3: Incremental Sync With a Cursor

For large datasets, fetch only new or updated records since the last run by persisting a cursor in a config sheet.

Schedule (every 15 minutes)
  → Google Sheets (read cursor from config row)
  → HTTP Request (GET /records?updated_since=<cursor>)
  → Code (transform)
  → [load to target]
  → Google Sheets (write new cursor = max updated_at from this batch)

The cursor prevents double-loading already-processed records and keeps each pipeline run fast even as the total dataset grows.

Store cursor state outside the workflow

n8n workflows have no persistent state between executions. Store the cursor (last synced timestamp, last processed ID) in a Google Sheet config row, a database table, or n8n's built-in static data via $workflow.staticData. Read it at the start of every run, write it at the end.

n8n Nodes You'll Use Most

NodePurpose
Schedule TriggerRun the pipeline on a cron schedule
HTTP RequestExtract from any REST API or load to any REST target
CodeTransform, normalize, merge, and compute derived fields
IFValidate records; route failures to an error branch
Split in BatchesProcess large record sets in chunks to avoid memory pressure
Google SheetsAppend, upsert, or read cursor state from a spreadsheet
MergeCombine outputs from multiple upstream extraction branches
PostgreSQL / MySQLRead from or write to a relational database
Slack / GmailSend pipeline run summaries and error alerts
Loop Over ItemsPaginate through multi-page API responses

Getting Started

  1. Identify source and target: write down the API endpoint or database table you're extracting from, and exactly where the data needs to land.
  2. Test the extract step alone: add just the Schedule and HTTP Request nodes, run manually, and inspect the raw output. Know the exact field paths before writing any transform logic.
  3. Write the Code node transform: map source fields to target fields, normalize types, strip whitespace, compute derived values. Test against real output from step 2.
  4. Add the IF validation node: define what a valid record looks like and route failures to an error log sheet.
  5. Add Split in Batches before the load: even with 10 records today, the batch node makes scaling later painless.
  6. Add the load node: connect to Google Sheets, a database, or the target API. Test with a single record first, then a full batch.
  7. Add the Slack or Gmail summary node: log record counts and any errors. Activate the workflow only after the full pipeline passes a real test run.

For pipelines that need to classify or enrich records during transformation (scoring leads, detecting intent, categorizing support tickets), see the n8n AI-powered automation guide. For turning pipeline output into live dashboards and scheduled reports, see the n8n data reporting guide.

The App Analytics Dashboard template shows how to close the loop: pipeline output feeding directly into a structured reporting view.

Browse n8n data automation templates
FAQ

Common questions

What's the difference between an n8n data pipeline and an n8n data report?
A data pipeline moves and transforms data: it extracts from a source, reshapes it, and loads it into a target. A data report reads already-stored data and presents it. In n8n, pipelines use nodes like HTTP Request, Code, and Split in Batches to process records; reports use nodes like Google Sheets and Gmail to surface summaries. You often build both: the pipeline feeds the warehouse, the report reads from it.
Can n8n replace a dedicated ETL tool like Fivetran or Airbyte?
For most small and mid-size data volumes (under a few million records per run), yes. n8n handles extraction via HTTP Request or database nodes, transformation via Code nodes, and loading via any integration. It lacks the pre-built connectors and schema management of enterprise ETL tools, but for custom pipelines between your own systems, it is faster to build and cheaper to run.
How do I handle large datasets in n8n without hitting memory limits?
Use the Split in Batches node to chunk large result sets into groups of 50–200 records and process them sequentially. For paginated APIs, loop over pages with a While or recursive subworkflow pattern, loading each page before fetching the next. Avoid accumulating millions of records in a single workflow execution. Process and discard as you go.
Stop reading. Start running.

Get the workflow templates this guide is built on

Import-ready n8n JSON, step-by-step setup, and tested end-to-end. One-time payment, own it forever.