How to use Roslyn C# scripts in batch with multiple scripts?

I am writing a multi-threaded solution that will be used to transfer data from different sources to a central database. The solution essentially has two parts:

  • A single-threaded import engine
  • A multi-threaded client that invokes the import engine on multiple threads.

I use Roslyn scripts to minimize custom development. This feature is enabled through the NuGet Package Manager in the import engine project. Each import is defined as a transformation of an input table (containing a set of input fields) into a destination table (again, with a set of target fields).
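For context, an import definition can be pictured roughly like this (a hypothetical sketch for illustration only, not the actual classes):

//Hypothetical shape of an import definition
public class FieldMapping
{
    public string SourceField;
    public string TargetField;
    public string Script;       //custom conversion script text, fetched from the DB
}

public class ImportDefinition
{
    public string SourceTable;
    public string TargetTable;
    public List<FieldMapping> Mappings;  //the I/O pairs described below
}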

The scripting engine is used here to allow custom conversion between input and output. For each I/O pair, there is a text box holding a custom script. Here is the simplified code used to initialize the script:

//Instance of the class passed to the script engine as globals
_ScriptHost = new ScriptHost_Import();

if (Script != "") //Script text fetched from the DB
{
  //Create the script object ...
  ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
  //... and compile it up front to save time, since it may be invoked many times.
  //Note: Compile() reports problems via its returned diagnostics, not via exceptions.
  var diagnostics = ScriptObject.Compile();
  IsScriptCompiled = !diagnostics.Any(d => d.Severity == DiagnosticSeverity.Error);
}


We will later invoke this script with:

async Task<string> RunScript()
{
    //RunAsync executes the pre-compiled script against the shared globals instance
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue;
}


So, after initializing an import definition, which may contain a number of I/O pair declarations along with their script objects, the memory footprint grows by about 50 MB per pair that has a script defined. A similar pattern is used to validate target values before storing them in the database (each field can have multiple scripts that validate the data).

Overall, a typical memory footprint for a modest conversion/validation scenario is about 200 MB per thread. With multiple threads, memory usage becomes very high, and 99% of it is used by scripting. When I enabled the import mechanism in the WCF-based middle tier, we quickly ran into an "Out of memory" exception.

The obvious solution would be to have a single script instance that somehow dispatches execution to a specific function within the script, depending on the need (I/O conversion, validation, or whatever). That is, instead of storing script text for each field, we would store a SCRIPT_ID that is passed to the script as a global parameter; somewhere in the script, a switch would select the specific piece of code to execute and return the appropriate value.
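For illustration, a minimal sketch of that idea could look like the following (assuming, hypothetically, that ScriptHost_Import gains a ScriptId property; the branch contents are made up, not the actual project code):

//Globals type shared by every invocation; ScriptId selects the branch to run
public class ScriptHost_Import
{
    public int ScriptId;        //which conversion/validation to execute
    public string InputValue;   //current input field value
}

//One script text, stored once and compiled once, instead of one script per field
string combinedScript = @"
switch (ScriptId)
{
    case 1:  return InputValue.Trim();              //conversion for pair 1
    case 2:  return InputValue.ToUpperInvariant();  //conversion for pair 2
    default: return InputValue;                     //pass-through
}";

var combined = CSharpScript.Create<string>(combinedScript, globalsType: typeof(ScriptHost_Import));
combined.Compile();

//Per call, only the globals change:
//string result = (await combined.RunAsync(new ScriptHost_Import { ScriptId = 1, InputValue = raw })).ReturnValue;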

The advantage of this solution should be significantly lower memory usage. The downside is that script maintenance moves away from the specific point where the script is actually used.

Before implementing this change, I would like to hear opinions on this solution and suggestions for a different approach.



2 answers


As it seems, scripting can be wasteful overkill for this task: you stack up many application layers and the memory fills up.

Other solutions:

  • How do you interact with the database? You may be able to shape the query itself to your needs instead of writing an entire script to do it.
  • What about generics, with enough type parameters to suit your needs (see the sketch after this list)?

    public class ImportEngine<T1,T2,T3,T4,T5>

  • Use tuples (which is very similar to using generics).
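A minimal sketch of the generics idea (the ImportEngine shape and the converter delegate are hypothetical, just to show strongly typed conversion without any scripting):

//Hypothetical strongly typed engine: the conversion is a compiled delegate, no script involved
public class ImportEngine<TIn, TOut>
{
    private readonly Func<TIn, TOut> _convert;
    public ImportEngine(Func<TIn, TOut> convert) { _convert = convert; }
    public TOut Import(TIn value) { return _convert(value); }
}

//Usage: each field conversion is plain C#, compiled once with the application
var engine = new ImportEngine<string, int>(s => int.Parse(s.Trim()));
int converted = engine.Import(" 42 ");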

But if you still think scripts are the right tool, I found that the memory usage of scripts can be reduced by running the script logic inside your application rather than inside RunAsync: have RunAsync return the logic itself, and reuse that object, instead of doing the work inside the heavy and memory-hungry RunAsync call. Here's an example:

Instead of just this (a one-line script):

DoSomeWork();


You can do this (IHaveWork is an interface defined in your application with a single method, Work):

public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();


This way, you call the heavy RunAsync only once, briefly, and it returns a worker that you can reuse inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on).

This approach also breaks the isolation between your application and the script, so you can easily pass data to the script and retrieve data from it.
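On the host side, this could look roughly like the following (a sketch under the assumptions above; the IHaveWork interface and the script text are illustrative, not a fixed API):

//Interface known to both the host application and the script;
//assumed visible to the script via the assembly reference (add WithImports for its namespace if it has one)
public interface IHaveWork
{
    void Work();
}

//Compile and run the script ONCE; it hands back a reusable worker object
string scriptText = @"
public class ScriptWork : IHaveWork
{
    public void Work() { System.Console.WriteLine(""Working...""); }
}
return new ScriptWork();";

IHaveWork worker = await CSharpScript.EvaluateAsync<IHaveWork>(
    scriptText,
    ScriptOptions.Default.WithReferences(typeof(IHaveWork).Assembly));

//Reuse the compiled logic without touching the scripting engine again
for (int i = 0; i < 1000; i++)
    worker.Work();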



EDIT

Quick test:

This code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Windows.Forms;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\");";
    List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair play

    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}


This consumes about 600 MB in my environment (the System.Windows.Forms reference in the ScriptOptions is there only to inflate the script size). The Script<object> instances are reusable: a second call to RunAsync does not consume additional memory.

But we can do better:

static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "return () => { System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\"); };";

    List<Action> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly)))
        .Select(t => t.Result).ToList();

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}


In this snippet, I simplified the solution I suggested by returning an Action instead of a custom interface; I think the performance impact of that is small (but in a real implementation, I really think you should use your own interface to keep it flexible).

While the scripts are running, you can see a steep rise in memory to ~240 MB, but after the garbage collector is invoked (for demo purposes; I did the same in the previous code), memory usage drops to ~30 MB. It is also faster.



I'm not sure whether this existed at the time the question was asked, but there is something very similar and, let's say, official: a way to run a script multiple times without increasing the program's memory usage. You need the CreateDelegate method, which does exactly what you would expect.

I'll post it here just for convenience:



public class Globals
{
    public int X;
    public int Y;
}

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();

for (int i = 0; i < 10; i++)
{
  Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}

      

Some memory is still required up front, but you can keep the runners in some kind of global list and invoke them cheaply afterwards.
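Applied to the question's scenario, a cache of runners could be sketched like this (the ScriptRunnerCache class and keying by SCRIPT_ID are assumptions for illustration, not part of the Roslyn API):

using System.Collections.Concurrent;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

//Compile each distinct script once and cache its delegate by ID
public static class ScriptRunnerCache
{
    private static readonly ConcurrentDictionary<int, ScriptRunner<string>> _runners =
        new ConcurrentDictionary<int, ScriptRunner<string>>();

    public static ScriptRunner<string> Get(int scriptId, string scriptText)
    {
        return _runners.GetOrAdd(scriptId, _ =>
            CSharpScript.Create<string>(scriptText, globalsType: typeof(ScriptHost_Import))
                        .CreateDelegate());
    }
}

//Later, from any thread, invoking the cached delegate is cheap:
//string value = await ScriptRunnerCache.Get(scriptId, scriptText)(_ScriptHost);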
