in sql server on a 16 core 32 thread computer, how long does it take to insert 1 trillion rows that are 4 columns wide?
Inserting 1 trillion rows into a SQL Server database, even on a powerful 16-core / 32-thread machine, is a massive operation and depends on a wide range of factors. There's no universal answer, but I’ll break down the key factors and provide an estimated range to help you understand the scale.
Disk I/O Speed
Are you using SSDs, NVMe, or spinning disks?
Throughput and latency will greatly impact performance.
Row Size & Table Design
How wide are the columns? Are they fixed-size (e.g., INT, BIGINT) or variable (e.g., VARCHAR)?
Presence of constraints, indexes, triggers, etc.
Batching Strategy
Single row inserts are slow.
Bulk inserts using batches of 10,000–100,000 rows are much faster.
Transaction Management
Committing per row = slow.
Committing per batch or in a single large transaction = faster, but more memory-intensive.
Parallelism
Are you parallelizing inserts across all 32 threads?
Using SSIS, BULK INSERT, or SqlBulkCopy?
Logging & Recovery Model
Full recovery model = more logging = slower.
Minimal logging (with TABLOCK + simple/bulk-logged model) can improve speed drastically.
Target Table Structure
Heap vs. clustered index.
Indexes on the table slow down inserts.
Let’s assume:
4 columns, all integers (4 bytes each), so row size ~16 bytes.
Bulk insert with SqlBulkCopy, minimal logging, SSD/NVMe.
Batching and multi-threaded parallel insert optimized.
Minimal indexing.
1 trillion rows × 16 bytes = ~16 TB of raw data (not including overhead, transaction logs, indexes).
You can reach around 100,000 rows/second per thread under ideal conditions.
With 32 threads fully utilized:
100,000 × 32 = 3.2 million rows/sec
⏱️ 1,000,000,000,000 / 3,200,000 ≈ 312,500 seconds = ~87 hours (~3.6 days)
Let's say only 1 million rows/sec aggregate.
That’s 1 trillion / 1 million = 1 million seconds ≈ 11.6 days
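These back-of-envelope figures are easy to sanity-check. A minimal sketch in Python (the rows/sec rates are the assumptions stated above, not measurements):

```python
# Insert-time calculator for the scenarios above.
# The rows/sec figures are assumptions, not benchmarks.
TOTAL_ROWS = 1_000_000_000_000  # 1 trillion

def days_to_insert(rows_per_sec: float, total_rows: int = TOTAL_ROWS) -> float:
    """Wall-clock days needed at a sustained aggregate insert rate."""
    return total_rows / rows_per_sec / 86_400  # 86,400 seconds per day

print(f"Ideal (3.2M rows/s):  {days_to_insert(3_200_000):.1f} days")   # ~3.6 days
print(f"Moderate (1M rows/s): {days_to_insert(1_000_000):.1f} days")   # ~11.6 days
```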
| Scenario | Rows/sec | Time to Insert 1T Rows |
|---|---|---|
| Ideal (fully optimized) | 3.2M+ | ~3.5 days |
| Moderate optimization | 1M | ~11.6 days |
| Poor (row-by-row inserts, full logging) | 10K–100K | Weeks to months |
Use BULK INSERT or SqlBulkCopy
Switch to simple recovery model during load
Use minimal logging with TABLOCK
Drop all indexes and constraints before insert
Partition the data for parallel loading
Consider staging tables and post-processing
Would you like help building a script or testing benchmarks with your schema?
Great question! Let's refine the estimate, now explicitly assuming 4 columns. I’ll go over it again with specific attention to row width and performance implications.
Let’s define the row and data characteristics more concretely.
4 columns, each of type INT (4 bytes)
→ Total row size: 16 bytes
1 trillion rows
Inserted using efficient bulk method (e.g., BULK INSERT, SqlBulkCopy, or bcp utility)
No indexes or constraints
Minimal logging (simple or bulk-logged recovery mode)
High-speed NVMe/SSD storage
16-core / 32-thread machine
SQL Server Enterprise/Standard
1,000,000,000,000 rows × 16 bytes = 16,000,000,000,000 bytes = 16 TB of raw data
Add some log overhead + metadata: real insert size might be ~18–20 TB on disk
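The raw-volume arithmetic, as a quick check (the 16-byte row is the stated assumption; actual on-disk size is larger due to page and log overhead):

```python
# Raw data volume: 1 trillion rows of four 4-byte INTs.
ROWS = 1_000_000_000_000
ROW_BYTES = 4 * 4  # four INT columns, 4 bytes each

raw_tb = ROWS * ROW_BYTES / 1_000_000_000_000  # decimal terabytes
print(f"Raw data: {raw_tb:.0f} TB")  # 16 TB, before page headers and log overhead
```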
Use 32 threads, each inserting via SqlBulkCopy or BULK INSERT in parallel
Row rate per thread: ~100,000–150,000 rows/sec
Total row rate: 3.2M–4.8M rows/sec
1T rows ÷ 4M rows/sec = 250,000 seconds ≈ 69 hours ≈ 2.9 days
Moderate scenario: 1M–2M rows/sec total throughput
Could be bottlenecked by:
Disk write I/O
Transaction log throughput
Memory pressure / TempDB contention
1T rows ÷ 1.5M rows/sec ≈ 666,667 seconds ≈ 7.7 days
Poor scenario: row-by-row inserts via T-SQL INSERT INTO
Logging, indexing, and checkpoint pressure
Estimated throughput: 10K–100K rows/sec
At 100K rows/sec: 1T ÷ 100K = 10M seconds ≈ 115 days
At 10K rows/sec: 1T ÷ 10K = 100M seconds ≈ 3.2 years
| Scenario | Insert Method | Rows/sec Estimate | Time to Insert 1T Rows |
|---|---|---|---|
| 🚀 Fully optimized | Parallel bulk | 4M | ~2.9 days |
| ⚖️ Moderately optimized | Bulk, some limits | 1.5M | ~7.7 days |
| 🐢 Poorly optimized | Row-by-row insert | 10K–100K | Months to years |
Use SqlBulkCopy, BULK INSERT, or bcp
Partition the target table, insert into each partition in parallel
Minimal logging (Simple recovery mode + TABLOCK)
Batch sizes of 10K–100K rows per insert
Drop all indexes and constraints before loading
Load into staging table, then index or merge as needed
Want me to help you build a script that does this efficiently? Or simulate timing on your hardware profile?
To make SQL Server run as fast as possible when inserting massive volumes of data (like 1 trillion rows), you need to minimize overhead, maximize parallelism, and optimize for throughput over transactional integrity (if acceptable for your use case).
Here’s a practical checklist you can follow to get maximum performance for inserts:
Bulk methods (BULK INSERT, bcp, SqlBulkCopy) are 10–100× faster than row-by-row INSERT INTO statements.
Tips:
Use large batch sizes (e.g. 50,000–100,000 rows per batch)
Use TABLOCK to enable minimal logging
Load from flat file or stream
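The batch-size advice can be sketched generically. A minimal, hypothetical chunking helper (real loaders such as SqlBulkCopy expose their own batch-size setting; this just illustrates slicing a row stream into batches):

```python
from itertools import islice
from typing import Iterable, Iterator

def batches(rows: Iterable, batch_size: int = 50_000) -> Iterator[list]:
    """Yield lists of up to batch_size rows from any row source."""
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk

# Example: 120,000 fake rows -> batches of 50,000, 50,000, 20,000
sizes = [len(b) for b in batches(range(120_000))]
print(sizes)  # [50000, 50000, 20000]
```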
SQL Server logs every insert unless you tell it not to.
Set the recovery model to SIMPLE or BULK_LOGGED:

```sql
ALTER DATABASE YourDB SET RECOVERY SIMPLE;
```

Use the TABLOCK hint during bulk insert:

```sql
BULK INSERT your_table
FROM 'file_path'
WITH (TABLOCK, BATCHSIZE = 100000, ROWS_PER_BATCH = 100000, DATAFILETYPE = 'native');
```

Indexes slow down inserts by forcing SQL Server to maintain them as it writes. Before loading, drop:
Indexes
Foreign keys
Default constraints
Triggers
After inserting, recreate indexes using CREATE INDEX WITH (ONLINE = ON, SORT_IN_TEMPDB = ON)
Clustered indexes are slower to insert into due to page splits and order maintenance.
```sql
-- A table with no clustered index is a heap by default;
-- SQL Server has no WITH (HEAP) option on CREATE TABLE.
CREATE TABLE your_table (
    col1 INT,
    col2 INT,
    col3 INT,
    col4 INT
);
```

Use your 16-core / 32-thread CPU!
Break source data into N chunks and insert in parallel tasks
Use multiple BULK INSERT sessions concurrently
Partition your table (e.g., on a logical ID range)
Example (8 parallel processes):
```shell
start bulk_insert_part1.bat
start bulk_insert_part2.bat
...
```
Insert throughput is often I/O-bound.
Use NVMe SSDs or RAID 10 arrays
Place DATA, LOG, and TEMPDB on separate drives
Ensure TEMPDB has multiple data files (1 per 4 cores is a good rule)
```sql
-- Check tempdb usage
SELECT * FROM sys.dm_io_virtual_file_stats(DB_ID(N'tempdb'), NULL);
```

TEMPDB often becomes a bottleneck.
Pre-size TEMPDB to avoid autogrowth
Use multiple equally sized files (start with 8 for 16 cores)
```sql
-- Example: 8 tempdb files
ALTER DATABASE tempdb ADD FILE (
    NAME = tempdev2,
    FILENAME = 'D:\SQLData\tempdb2.ndf',
    SIZE = 512MB,
    FILEGROWTH = 512MB
);
-- Repeat for tempdev3...tempdev8
```

Autogrowth during a large load is a performance killer.
Pre-size your database and log files to hold at least 2x the expected data
Disable or control autogrowth using large increments (e.g., 1GB)
Insert into a staging table (heap, no indexes)
Once done, switch partition into your main table (very fast, metadata-only operation)
Use DMVs to check for bottlenecks:
```sql
-- Wait stats
SELECT * FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;
-- Active queries
SELECT * FROM sys.dm_exec_requests;
-- CPU and disk activity
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);
```

Use table partitioning if you're loading data by time, ID ranges, etc.
Consider Columnstore indexes after loading for massive read performance gains.
| Optimization | Estimated Speed Gain |
|---|---|
| Bulk insert (vs row-by-row) | 10–100× |
| Minimal logging (TABLOCK) | 2–10× |
| Dropping indexes | 3–10× |
| Parallel inserts (multi-thread) | 2–16× (depending on I/O) |
| SSD/NVMe vs HDD | 3–20× |
If you want, I can generate a parallel bulk insert script tailored to your system and data structure. Just let me know the data format (CSV, binary, etc.) and where it's coming from.
Great question — in SQL Server, you cannot do truly parallel inserts in the same SQL query, because T-SQL is not inherently parallel for DML operations (like INSERT) across rows. However, there are workarounds that achieve parallelism by:
Splitting the data set
Running multiple inserts concurrently
Or leveraging T-SQL tricks with SELECT INTO + UNION ALL
Why you can't INSERT in parallel in one query: SQL Server doesn't parallelize inserts automatically unless certain conditions are met. Even then:
Inserts into tables with indexes or triggers force single-threaded execution
Parallel plans are typically read-parallel, not write-parallel (INSERT ... SELECT can run write-parallel on SQL Server 2016+ only when targeting a heap with TABLOCK)
Use multiple clients, threads, or PowerShell jobs to insert data in parallel:
```powershell
# PowerShell example - 4 parallel inserts
1..4 | ForEach-Object -Parallel {
    Invoke-Sqlcmd -Query "BULK INSERT your_table FROM 'file_part$_.csv' WITH (...)" -ServerInstance "your_server"
}
```
Or from C#/Python:
```csharp
Parallel.For(0, 4, i => {
    // Each thread runs SqlBulkCopy or INSERT ... SELECT
});
```

Break your source into ID or row ranges.
```sql
-- Batch insert for IDs 1–250M
INSERT INTO your_table
SELECT * FROM source_table WHERE id BETWEEN 1 AND 250000000;
-- Do similar inserts in parallel (in app or SQL Agent jobs)
```

Using INSERT INTO ... SELECT + UNION ALL to simulate parallelism: SQL Server can sometimes parallelize the SELECT side, not the INSERT. You may get better performance with:
```sql
INSERT INTO your_table (col1, col2, col3, col4)
SELECT * FROM (
    SELECT col1, col2, col3, col4 FROM source WHERE id BETWEEN 1 AND 250000000
    UNION ALL
    SELECT col1, col2, col3, col4 FROM source WHERE id BETWEEN 250000001 AND 500000000
    UNION ALL
    SELECT col1, col2, col3, col4 FROM source WHERE id BETWEEN 500000001 AND 750000000
    UNION ALL
    SELECT col1, col2, col3, col4 FROM source WHERE id BETWEEN 750000001 AND 1000000000
) AS all_data;
```
➡️ SQL Server may parallelize the SELECT side, allowing better use of cores.
If you're inserting massive data, you can:
Create staging tables (1 per partition)
Insert into them concurrently
ALTER TABLE SWITCH them into the final table (very fast metadata operation)
Parallel BULK INSERT: if data is in files (CSV, native, etc.), split it into multiple files and run multiple BULK INSERT statements at the same time from your app or a script:
```sql
BULK INSERT your_table FROM 'data_part1.csv' WITH (TABLOCK, BATCHSIZE = 100000);
BULK INSERT your_table FROM 'data_part2.csv' WITH (TABLOCK, BATCHSIZE = 100000);
-- Run these in parallel processes
```

If SQL Server is refusing to parallelize your query, you can try:
```sql
OPTION (MAXDOP 8) -- Limit or encourage parallelism
```
But remember, INSERTs themselves aren't parallelized, only the read part might be.
| Method | Parallel? | Where to Parallelize | Speed Gain |
|---|---|---|---|
| App-level threading (e.g. C#/Python) | ✅ | Outside SQL Server | ⭐⭐⭐⭐⭐ |
| Multiple INSERT INTO ... SELECT | ✅* | Partially inside SQL | ⭐⭐⭐ |
| BULK INSERT from multiple files | ✅ | External script / app | ⭐⭐⭐⭐ |
| SELECT ... UNION ALL | ⚠️ Maybe | SELECT only (not insert) | ⭐⭐ |
| Table partitioning + SWITCH | ✅ | Parallel staging tables | ⭐⭐⭐⭐ |
Would you like a ready-made parallel insert script or Python/C# app that loads data into SQL Server efficiently?
Perfect 👍 — let’s build a parallel insert loader for SQL Server.
Below are two complete working options — one in PowerShell (very easy to run on Windows), and one in Python (great for scaling or automation).
Both will use all your CPU cores to insert into SQL Server in parallel, splitting the work into chunks.
Spawns multiple background jobs (one per CPU core or file).
Each job runs a BULK INSERT for a specific data range or file.
Uses SQL Server authentication or Windows Integrated login.
Save this as parallel_insert.ps1 and run it in PowerShell.
```powershell
# CONFIGURATION
$server = "localhost"
$database = "MyDatabase"
$table = "MyTargetTable"
$filePathTemplate = "C:\data\data_part{0}.csv"  # Files like data_part1.csv, data_part2.csv, etc.
$fileCount = 8       # Number of files / threads
$batchSize = 100000

# Function to bulk insert one file.
# Background jobs run in their own runspace, so everything the
# function needs is passed in as parameters rather than read
# from the parent session's variables.
function Start-BulkInsertJob($i, $server, $database, $table, $filePathTemplate, $batchSize) {
    $query = @"
BULK INSERT $table
FROM '$($filePathTemplate -f $i)'
WITH (
    FORMAT = 'CSV',
    FIRSTROW = 2,
    TABLOCK,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = $batchSize
);
"@
    Write-Host "Starting bulk insert for file $i..."
    Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $query
    Write-Host "Completed file $i"
}

# Run inserts in parallel background jobs
$jobs = @()
for ($i = 1; $i -le $fileCount; $i++) {
    $jobs += Start-Job -ScriptBlock ${function:Start-BulkInsertJob} `
        -ArgumentList $i, $server, $database, $table, $filePathTemplate, $batchSize
}

# Wait for all jobs to complete
Write-Host "Waiting for all inserts to complete..."
$jobs | Wait-Job | Receive-Job
Write-Host "✅ All parallel inserts completed."
```
PowerShell 5+ or PowerShell Core
SQL Server PowerShell module:
Install-Module SqlServer
Your login needs permission to run BULK INSERT (ADMINISTER BULK OPERATIONS / bulkadmin), and the target table must accept the TABLOCK hint.
Python (with ThreadPoolExecutor)
Uses 8 parallel threads.
Each thread executes a bulk INSERT INTO ... SELECT or BULK INSERT.
Highly tunable for your system.
Save this as parallel_insert.py:
```python
import pyodbc
from concurrent.futures import ThreadPoolExecutor, as_completed

# CONFIG
server = 'localhost'
database = 'MyDatabase'
table = 'MyTargetTable'
file_template = r'C:\data\data_part{}.csv'
threads = 8
batchsize = 100000

conn_str = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server};DATABASE={database};Trusted_Connection=yes;"
)

def bulk_insert(file_index):
    conn = pyodbc.connect(conn_str, autocommit=True)
    cursor = conn.cursor()
    sql = f"""
    BULK INSERT {table} FROM '{file_template.format(file_index)}'
    WITH (
        FORMAT = 'CSV',
        FIRSTROW = 2,
        TABLOCK,
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\\n',
        BATCHSIZE = {batchsize}
    );
    """
    print(f"Starting file {file_index}...")
    cursor.execute(sql)
    print(f"✅ Completed file {file_index}")
    conn.close()

# Run in parallel
with ThreadPoolExecutor(max_workers=threads) as executor:
    futures = [executor.submit(bulk_insert, i) for i in range(1, threads + 1)]
    for f in as_completed(futures):
        f.result()

print("🎉 All parallel inserts completed successfully!")
```
Python 3.8+
ODBC Driver 17 for SQL Server
Install packages:
pip install pyodbc
Split your source data into N smaller files (one per thread or CPU core).
Each file should contain about the same number of rows.
Use TABLOCK and SIMPLE recovery mode to enable minimal logging.
Place files on separate disks if possible to improve I/O parallelism.
You can adjust $fileCount or threads to match your available cores (e.g., 16–32).
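Splitting the source file into evenly sized parts can itself be scripted. A sketch, assuming a CSV with one header row (the function name and round-robin scheme are illustrative, not from the scripts above):

```python
import csv

def split_csv(src: str, parts: int, name_template: str = "data_part{}.csv") -> list:
    """Split src round-robin into `parts` files, repeating the header row in each."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        paths = [name_template.format(i) for i in range(1, parts + 1)]
        handles = [open(p, "w", newline="") for p in paths]
        writers = [csv.writer(h) for h in handles]
        for w in writers:
            w.writerow(header)
        for n, row in enumerate(reader):
            writers[n % parts].writerow(row)  # round-robin keeps parts balanced
        for h in handles:
            h.close()
    return paths
```

Round-robin keeps the parts within one row of each other in size, so each parallel loader thread gets roughly equal work.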
Would you like me to tailor this to your specific data format (e.g., CSV from a generator or an existing table in SQL Server)?
If you tell me:
Source type (table, file, or generated data)
Destination table name and schema
…I can generate the exact version of the script ready for your environment.
© 2026 myBlog™ v1.2 All rights reserved. We count views as reads, so let's not over think it.