SQL Fundamentals – INSERT

Creating Data with INSERT

Up to this point, you've been querying existing data – selecting, filtering, joining, and aggregating. But in test automation, you often need to create data: setting up test users, populating product catalogs, generating orders for specific scenarios.

The INSERT statement adds new rows to a table. At its simplest, you specify the table name, the columns you're providing values for, and the actual values:

INSERT INTO Users (Email, FirstName, LastName, CreatedDate, IsActive)
VALUES ('[email protected]', 'Test', 'User', CURRENT_TIMESTAMP, 1);

This creates one new user with the specified values. The database generates the UserId automatically (assuming it's an auto-increment primary key), and the row is immediately available for querying.

The basic syntax follows this pattern:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3);

Explicitly listing column names (rather than relying on column order) makes your code self-documenting and resilient to schema changes.

Column Order and Defaults

You don't need to provide values for every column in the table – only those that don't have defaults and aren't auto-generated. Our Users table might have these column definitions:

  • UserId – auto-increment primary key (database generates)
  • Email – required (no default)
  • FirstName – required (no default)
  • LastName – required (no default)
  • CreatedDate – defaults to CURRENT_TIMESTAMP
  • IsActive – defaults to 1
  • DeactivatedDate – nullable (defaults to NULL)

This means you can omit columns that have defaults or allow NULL:

INSERT INTO Users (Email, FirstName, LastName)
VALUES ('[email protected]', 'Jane', 'Doe');

The database fills in the missing values:

  • UserId gets the next available ID
  • CreatedDate uses CURRENT_TIMESTAMP
  • IsActive defaults to 1
  • DeactivatedDate remains NULL

You can also explicitly insert NULL into nullable columns:

INSERT INTO Orders (UserId, OrderDate, TotalAmount, Status, ShipmentDate)
VALUES (123, CURRENT_TIMESTAMP, 100.00, 'Pending', NULL);

Specifying NULL explicitly communicates intent – this order has no shipment date yet (as opposed to accidentally forgetting to provide one).

Best Practice: Always List Column Names

Some developers write INSERT INTO Users VALUES (...) without listing columns, relying on the table's column order. This is fragile – if someone adds a column to the table or reorders columns, your inserts break. Always explicitly list the columns you're inserting into for clarity and maintainability.

When creating test data, be deliberate about which values you specify and which you let default – it makes your test setup code clearer.

Bulk Inserts: Multiple Rows at Once

Test setup often requires creating multiple related records. You could execute separate INSERT statements for each row, but inserting multiple rows in a single statement is more efficient:

INSERT INTO Products (Name, Price, CategoryId, StockQuantity, IsActive)
VALUES
    ('Test Product 1', 10.99, 1, 100, 1),
    ('Test Product 2', 25.50, 2, 50, 1),
    ('Test Product 3', 99.99, 1, 25, 1),
    ('Test Product 4', 5.99, 3, 200, 1);

This single statement inserts four products. It's faster than four separate inserts because:

  • The database parses the SQL once instead of four times
  • Network round-trips are reduced (one request instead of four)
  • Transaction overhead is minimized (one commit instead of four)

For test data setup, bulk inserts dramatically improve execution speed. Creating 100 test products with 100 separate statements might take several seconds; doing it in one bulk insert completes in milliseconds.

Most databases support inserting hundreds or even thousands of rows in a single statement, though exact limits vary. For truly massive data generation (100,000+ rows), you might need to batch your inserts – say, 1,000 rows per statement.

When designing test fixtures or seed data scripts, default to bulk inserts unless you specifically need to capture the generated ID after each individual insert.

INSERT with SELECT: Copying Data

Sometimes you need to populate a table based on data from another table or query result. The INSERT INTO ... SELECT pattern accomplishes this:

INSERT INTO TestUsers (Email, FirstName, LastName, CreatedDate, IsActive)
SELECT Email, FirstName, LastName, CreatedDate, IsActive
FROM Users
WHERE CreatedDate >= '2024-01-01'
  AND IsActive = 1
LIMIT 100;

This copies 100 active users created in 2024 from the Users table into a TestUsers table. The SELECT query runs first, producing a result set, then those rows are inserted into the target table.

This technique is invaluable for:

  • Creating test environments: Copy a subset of production data to a test database
  • Archiving: Move old records to an archive table
  • Data transformation: Insert calculated or transformed data into a summary table

You can combine it with joins and complex queries:

INSERT INTO HighValueOrders (OrderId, UserId, UserEmail, TotalAmount)
SELECT o.OrderId,
       o.UserId,
       u.Email,
       o.TotalAmount
FROM Orders o
INNER JOIN Users u ON o.UserId = u.UserId
WHERE o.TotalAmount > 500
  AND o.Status = 'Completed';

This creates a denormalized reporting table containing high-value orders with user email addresses already joined – optimizing for read performance at the cost of some data duplication.

The number of columns in your SELECT must match the number of columns in your INSERT, and their data types must be compatible.

Getting Generated IDs After Insert

After inserting a row with an auto-increment primary key, you often need to know what ID was generated – especially when inserting related records. For example, after creating a user, you need their UserId to create orders for them.

Unfortunately, different database systems use different syntax for retrieving generated IDs. Here are the three most common approaches:

SQL Server: SCOPE_IDENTITY()

INSERT INTO Users (Email, FirstName, LastName, CreatedDate, IsActive)
VALUES ('[email protected]', 'New', 'User', CURRENT_TIMESTAMP, 1);

SELECT SCOPE_IDENTITY() AS NewUserId;

SCOPE_IDENTITY() returns the last identity value generated in the current scope (your session). It's safer than @@IDENTITY because it won't accidentally return an ID from a trigger or nested operation.

PostgreSQL: RETURNING Clause

INSERT INTO Users (Email, FirstName, LastName, CreatedDate, IsActive)
VALUES ('[email protected]', 'New', 'User', CURRENT_TIMESTAMP, 1)
RETURNING UserId;

PostgreSQL's RETURNING clause is elegant – it returns specified columns from the inserted row directly. You can even return multiple columns: RETURNING UserId, Email, CreatedDate.

MySQL: LAST_INSERT_ID()

INSERT INTO Users (Email, FirstName, LastName, CreatedDate, IsActive)
VALUES ('[email protected]', 'New', 'User', CURRENT_TIMESTAMP(), 1);

SELECT LAST_INSERT_ID() AS NewUserId;

MySQL's LAST_INSERT_ID() function returns the first auto-generated ID from the most recent INSERT. If you inserted multiple rows, it returns the first ID generated, and you can calculate subsequent IDs by adding 1, 2, 3, etc.

Database Abstraction in C# Code

In your C# test automation code, consider using an ORM like Dapper or Entity Framework that abstracts away these database-specific differences. For example, Dapper's QuerySingle<int> after an insert handles retrieving the generated ID regardless of which database you're using.

If writing raw SQL, consider creating helper methods that use the appropriate syntax based on your database type.

When inserting test data for related tables, capture the generated ID and use it immediately for dependent inserts.

Handling Constraint Violations

Several types of constraints can cause an INSERT to fail. Understanding these helps you write defensive test setup code.

UNIQUE Constraint Violations

If a column has a UNIQUE constraint (like Email in the Users table), attempting to insert a duplicate value throws an error:

-- This will fail if the email already exists
INSERT INTO Users (Email, FirstName, LastName, CreatedDate, IsActive)
VALUES ('[email protected]', 'Duplicate', 'User', CURRENT_TIMESTAMP, 1);

In test automation, you typically want to handle this gracefully. Options include:

  • Check if the record exists before inserting
  • Generate unique test data (append timestamps or GUIDs to email addresses)
  • Catch the exception in your C# code and handle it appropriately
  • Use database-specific "upsert" syntax (insert if not exists, update if exists)

FOREIGN KEY Constraint Violations

If you try to insert a row that references a non-existent parent record, the database rejects it:

-- This will fail if UserId 999999 doesn't exist
INSERT INTO Orders (UserId, OrderDate, TotalAmount, Status)
VALUES (999999, CURRENT_TIMESTAMP, 100.00, 'Pending');

Always create parent records before child records. In test setup, this means inserting users before orders, categories before products, and so on.

NOT NULL Constraint Violations

If you omit a required column (one without a default and marked NOT NULL), the insert fails:

-- This will fail if Email is required
INSERT INTO Users (FirstName, LastName)
VALUES ('Incomplete', 'User');

The error message typically tells you which column is missing, making these easy to diagnose and fix.