Bogus (Faker) runs on every request


I've been using C# Bogus (aka Faker) to generate data with EF Core, and it's been absolutely invaluable for development.

I have this set up in the OnModelCreating method of my context class. That seems necessary to get the data into the database, since seeding uses the entity's HasData method:

protected override void OnModelCreating(ModelBuilder builder)
{
    long accountId = 1;
    var accountData = new Faker<AccountModel>()
        .RuleFor(m => m.Id, f => accountId++)
        .RuleFor(m => m.Name, f => f.Company.CompanyName())
        .Generate(6);
    builder.Entity<AccountModel>().HasData(accountData);
}

However, the entire data generation script runs on each request made to the API.

Where can I put the Bogus scripts so that they still seed the database via dotnet ef migrations add init, but don't run on every request?

Full reproducible examples

If you're not familiar with it, here are some examples of the setup, including one from Bogus' own repo:

  1. https://github.com/bchavez/Bogus/blob/master/Examples/EFCoreSeedDb/Program.cs
  2. https://khalidabuhakmeh.com/seed-entity-framework-core-with-bogus
  3. https://medium.com/@ashishnimrot/seeding-databases-conditionally-with-faker-in-net-core-d2ff5c11fc71

Note that some examples show limiting this with a conditional for ASPNETCORE_ENVIRONMENT, which is handy if you're in production, but still makes this rough with a local setup spinning up large, complex data sets.


Update: I've been living with this every time I use Bogus in a project, as it gets removed after the early dev/test stages anyhow. However, it's annoying in larger projects to be creating possibly tens of thousands of rows of data in memory. It's become noticeable in response times with our current project.

There must be some sort of conditional to wrap this in, or an alternative place for it to live that doesn't impact each and every network request. Our current solution is "comment it out between database migrations".


As Guru Stron has pointed out, OnModelCreating shouldn't be running on every request, which makes this problem a symptom and not the disease.

The only references to the base context are in the Program.cs file:

builder.Services.AddDbContext<DbContext>(
    options =>
    {
        options.UseSqlServer(
                builder.Configuration.GetConnectionString("DbConnectionString"),
                providerOptions =>
                {
                    providerOptions.CommandTimeout((int)TimeSpan.FromMinutes(10).TotalSeconds);
                    providerOptions.EnableRetryOnFailure();
                    providerOptions.UseNetTopologySuite();
                }
            );
        options.EnableSensitiveDataLogging();
    }); 

And in our services, like so:

public AccountService(
    DbContext context,
    ...
)
{
    _context = context;
}

So one or both of these must be implemented incorrectly.

Answer by Guru Stron (best answer)

Not sure what is meant by "each request", and I was not able to reproduce it. There is, though, at least one problem with the setup shown in the example: it does not use a fixed seed, so every time you add a migration new data is generated, potentially leading to a lot of "garbage" in those migrations. I recommend using a fixed seed (if you are not already). Either set up a global one:

Randomizer.Seed = new Random(42);

Or set one per instance:

var accountData = new Faker<AccountModel>()
    .UseSeed(42)
    .RuleFor(m => m.Id, f => accountId++)
    .RuleFor(m => m.Name, f => f.Company.CompanyName())
    .Generate(6);
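To see why a fixed seed keeps the migrations stable, here is a minimal BCL-only sketch. Bogus's Randomizer.Seed is a System.Random, so a seeded Random is used directly here; the name pool and the GenerateAccounts function are hypothetical stand-ins for the Faker rules above, not Bogus APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Faker rule such as f.Company.CompanyName().
string[] pool = { "Acme", "Globex", "Initech", "Umbrella", "Hooli", "Stark" };

List<(long Id, string Name)> GenerateAccounts(Random random, int count)
{
    long accountId = 1; // reset per run, like the local counter in OnModelCreating
    var rows = new List<(long Id, string Name)>();
    for (int i = 0; i < count; i++)
    {
        // Each "rule" draws from the supplied Random, as Faker draws from its seed.
        rows.Add((accountId++, pool[random.Next(pool.Length)]));
    }
    return rows;
}

// Two "migration runs" with the same fixed seed produce identical rows,
// so the seed data in the model does not change between migrations.
var firstRun = GenerateAccounts(new Random(42), 6);
var secondRun = GenerateAccounts(new Random(42), 6);

Console.WriteLine(firstRun.SequenceEqual(secondRun)); // True
```

With new Random() instead of new Random(42), the names would typically differ on each run, and that difference is what shows up as churn in every new migration.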

As for the issue itself: again, I was not able to reproduce it (OnModelCreating is called only once per app instance lifetime). But if you have code that should only run during migrations, you can use the EF.IsDesignTime property (available since EF Core 7):

if (EF.IsDesignTime)
{
    long accountId = 1;
    var accountData = new Faker<AccountModel>()
        .UseSeed(42)
        .RuleFor(m => m.Id, f => accountId++)
        .RuleFor(m => m.Name, f => f.Company.CompanyName())
        .Generate(6);

    builder.Entity<AccountModel>().HasData(accountData);
}
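On the point that OnModelCreating runs only once per app instance: EF Core builds the model the first time a context type is used and caches it (by default keyed on the context type), so the per-request contexts created through AddDbContext reuse the cached model. The sketch below mimics that with Lazy<T>; it is an analogy rather than EF Core's implementation, and buildCount and requestCount are illustrative names. If the Faker calls are observed on every request, the model is evidently being rebuilt, which is the "symptom, not the disease" noted in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int buildCount = 0;   // how many times the model-building callback ran
int requestCount = 0; // how many simulated requests read the model

// Lazy<T> stands in for EF Core's model cache (analogy only, not EF's code).
// ExecutionAndPublication guarantees the factory runs once, even under concurrency.
var cachedModel = new Lazy<string>(() =>
{
    buildCount++; // in EF Core, OnModelCreating (and any Faker calls in it) would run here
    return "built model";
}, LazyThreadSafetyMode.ExecutionAndPublication);

// Simulate 1,000 concurrent requests, each creating a context that needs the model.
Parallel.For(0, 1000, i =>
{
    string model = cachedModel.Value; // served from the cache after the first build
    Interlocked.Increment(ref requestCount);
});

Console.WriteLine($"{requestCount} requests, model built {buildCount} time(s)");
// prints "1000 requests, model built 1 time(s)"
```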
Answer by Sarvesh Patil

Rather than seeding through the ModelBuilder in OnModelCreating, you can opt for a more flexible approach: seed the database once on app startup, and only if the relevant table is empty. This is a one-time activity that happens at startup, not on each API request.

Below is an ApplicationDbContextInitialiser class containing the database migrate and seed methods:

public class ApplicationDbContextInitialiser
{
    private readonly ApplicationDbContext _context;
    private readonly ILogger<ApplicationDbContextInitialiser> _logger;
    public ApplicationDbContextInitialiser(ApplicationDbContext context, ILogger<ApplicationDbContextInitialiser> logger)
    {
        _context = context;
        _logger = logger;
    }

    public async Task InitialiseAsync()
    {
        try
        {
            await _context.Database.MigrateAsync();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "An error occurred while applying migrations");
            throw;
        }
    }

    public async Task SeedAsync()
    {
        try
        {
            await TrySeedAsync();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "An error occurred while seeding the database");
            throw;
        }
    }

    public async Task TrySeedAsync()
    {
        // Only seed if the Employees table is empty; otherwise skip the Faker generation entirely
        if (!await _context.Employees.AnyAsync())
        {
            var listOfFakeEmployees = new Faker<Employee>()
                .Ignore(e=>e.Id)
                .RuleFor(e => e.Name, f => f.Person.FullName)            
                .RuleFor(e => e.Designation, f => f.Name.JobTitle())
                .RuleFor(e => e.Department, f => f.Commerce.Department())
                .Generate(100);
            
            _context.Employees.AddRange(listOfFakeEmployees);
            await _context.SaveChangesAsync();
        }
    }
}
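The check-then-seed logic in TrySeedAsync is what makes this safe to run on every start. Stripped of EF Core, it amounts to the following sketch, in which an in-memory List<string> stands in for the Employees table and TrySeed and generatorCalls are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory list standing in for the Employees table (no EF Core involved).
var employees = new List<string>();
int generatorCalls = 0;

// Mirrors TrySeedAsync: generate and insert data only when the table is empty.
void TrySeed(List<string> table)
{
    if (table.Any()) return; // already seeded: skip the expensive generation

    generatorCalls++;
    table.AddRange(Enumerable.Range(1, 100).Select(i => $"Employee {i}"));
}

TrySeed(employees); // first application start: seeds 100 rows
TrySeed(employees); // later start: table is not empty, nothing happens

Console.WriteLine($"{employees.Count} rows, generator ran {generatorCalls} time(s)");
// prints "100 rows, generator ran 1 time(s)"
```

Unlike HasData, the generated rows never become part of a migration, so the Faker code only costs time at startup, and only when the table is empty.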

You can write an extension method on WebApplication, as below, that calls the migrate and seed methods of the class above:

public static class InitialiserExtensions
{
    public static async Task InitialiseDatabaseAsync(this WebApplication app)
    {
        // Create a short-lived scope so the scoped DbContext can be resolved at startup
        using var scope = app.Services.CreateScope();

        var initialiser = scope.ServiceProvider.GetRequiredService<ApplicationDbContextInitialiser>();

        await initialiser.InitialiseAsync();

        await initialiser.SeedAsync();
    }
}

Then create a DI class containing an extension method on IServiceCollection, as below:

public static class DependancyInjection
{
    public static IServiceCollection AddInfrastructureServices(this IServiceCollection services,IConfiguration configuration)
    {
        var connectionString = configuration.GetConnectionString("sqlConnection");
        services.AddDbContext<ApplicationDbContext>((serviceProvider, options) =>
        {
            options.UseSqlServer(connectionString);
        });

        services.AddScoped<IApplicationDbContext>(provider => provider.GetRequiredService<ApplicationDbContext>());

        services.AddScoped<ApplicationDbContextInitialiser>();
        services.AddSingleton(TimeProvider.System);
        return services;
    }

}

Then call it on the builder object in Program.cs:

builder.Services.AddInfrastructureServices(builder.Configuration);

Right after the WebApplication object is built, check for the development environment and call the seed method:

var app = builder.Build();

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
    await app.InitialiseDatabaseAsync();
}

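Note: InitialiseDatabaseAsync is awaited, so if Program.cs uses an explicit Main method, that method must be async, like this (with top-level statements, await can be used directly):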

public static async Task Main(string[] args)

I used jasontaylordev's (Jason Taylor's) Clean Architecture template for a CQRS Web API as the foundation for this solution. While I've modified it to suit this question, credit goes to the original template for providing a solid starting point.

Answer by Fahmi Noor Fiqri

I agree with Guru Stron (https://stackoverflow.com/users/2501279/guru-stron) that a static random seed should be used so that Bogus does not produce a different dataset for each database migration; this is the most plausible reason the dataset appears to be regenerated multiple times. Sarvesh Patil (https://stackoverflow.com/users/17718198/sarvesh-patil) also pointed out a better data-seeding method, and Microsoft's EF Core documentation itself states that HasData has many limitations.

From the Microsoft EF Core 8.0 documentation:

  • Migrations only considers model changes when determining what operation should be performed to get the seed data into the desired state. Thus any changes to the data performed outside of migrations might be lost or cause an error.
  • The primary key value needs to be specified even if it's usually generated by the database. It will be used to detect data changes between migrations.
  • Previously seeded data will be removed if the primary key is changed in any way.
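The key-based behaviour in the last two points can be illustrated with a simplified diff that matches seed rows by primary key only. This is a hypothetical model of the comparison, not EF Core's actual migrations differ, and DiffByKey is an illustrative name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model: seed rows are matched between snapshots by primary key only.
(List<long> Inserts, List<long> Updates, List<long> Deletes) DiffByKey(
    Dictionary<long, string> oldRows, Dictionary<long, string> newRows)
{
    var inserts = newRows.Keys.Where(k => !oldRows.ContainsKey(k)).OrderBy(k => k).ToList();
    var deletes = oldRows.Keys.Where(k => !newRows.ContainsKey(k)).OrderBy(k => k).ToList();
    // Same key, different values: the row is updated in place.
    var updates = newRows.Keys
        .Where(k => oldRows.TryGetValue(k, out var oldName) && oldName != newRows[k])
        .OrderBy(k => k)
        .ToList();
    return (inserts, updates, deletes);
}

// Seed data captured by the last migration vs. the seed data in the current model.
var previous = new Dictionary<long, string> { [1] = "Acme", [2] = "Globex" };
var current = new Dictionary<long, string> { [1] = "Acme Corp", [3] = "Globex" };

var diff = DiffByKey(previous, current);
Console.WriteLine(
    $"insert: {string.Join(",", diff.Inserts)}; update: {string.Join(",", diff.Updates)}; delete: {string.Join(",", diff.Deletes)}");
// prints "insert: 3; update: 1; delete: 2"
```

Changing the name at key 1 becomes an update, while moving "Globex" from key 2 to key 3 becomes a delete plus an insert. Without a fixed seed, the values of every seeded row change between runs, so each new migration would carry updates for all of them.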

It is also possible that, if you have a CI pipeline, you added the database migration command there and it runs multiple times, giving a false impression that the data is being seeded multiple times (OnModelCreating is only called once per app lifetime). Again, without a minimal reproducible example from your codebase, I can't tell what's happening.

References: