How to use Roslyn C# scripts in batch with multiple scripts?

I am writing a multi-threaded solution that will be used to transfer data from different sources to a central database. The solution essentially has two parts:

  • A single-threaded import engine
  • A multi-threaded client that invokes the import engine on multiple threads.

I use Roslyn scripts to minimize custom development. This feature is enabled through the NuGet Package Manager in the import engine project. Each import is defined as a transformation of an input table (containing a set of input fields) into a destination table (again, with a set of target fields).
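For context, an import definition can be pictured roughly like this (a hypothetical sketch for illustration only, not the actual classes):

//Hypothetical shape of an import definition
public class FieldMapping
{
    public string SourceField;
    public string TargetField;
    public string Script;       //custom conversion script text, fetched from the DB
}

public class ImportDefinition
{
    public string SourceTable;
    public string TargetTable;
    public List<FieldMapping> Mappings;  //the I/O pairs described below
}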

The scripting engine is used here to allow custom conversion between input and output. For each I/O pair, there is a text box holding a custom script. Here is the simplified code used to initialize the script:

//Instance of the class passed to the script engine as globals
_ScriptHost = new ScriptHost_Import();

if (Script != "") //Script text fetched from the DB
{
  //Create the script object ...
  ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
  //... and compile it up front to save time, since it may be invoked many times.
  //Note: Compile() reports problems via its returned diagnostics, not via exceptions.
  var diagnostics = ScriptObject.Compile();
  IsScriptCompiled = !diagnostics.Any(d => d.Severity == DiagnosticSeverity.Error);
}


We will later invoke this script with:

async Task<string> RunScript()
{
    //RunAsync executes the pre-compiled script against the shared globals instance
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue;
}


So, after initializing an import definition, which may contain a number of I/O pair declarations along with their script objects, the memory footprint grows by about 50 MB per pair that has a script defined. A similar pattern is used to validate target values before storing them in the database (each field can have multiple scripts that validate the data).

Overall, a typical memory footprint for a modest conversion/validation scenario is about 200 MB per thread. With multiple threads, memory usage becomes very high, and 99% of it is used by scripting. When I enabled the import mechanism in the WCF-based middle tier, we quickly ran into an "Out of memory" exception.

The obvious solution would be to have a single script instance that somehow dispatches execution to a specific function within the script, depending on the need (I/O conversion, validation, or whatever). That is, instead of storing script text for each field, we would store a SCRIPT_ID that is passed to the script as a global parameter; somewhere in the script, a switch would select the specific piece of code to execute and return the appropriate value.
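For illustration, a minimal sketch of that idea could look like the following (assuming, hypothetically, that ScriptHost_Import gains a ScriptId property; the branch contents are made up, not the actual project code):

//Globals type shared by every invocation; ScriptId selects the branch to run
public class ScriptHost_Import
{
    public int ScriptId;        //which conversion/validation to execute
    public string InputValue;   //current input field value
}

//One script text, stored once and compiled once, instead of one script per field
string combinedScript = @"
switch (ScriptId)
{
    case 1:  return InputValue.Trim();              //conversion for pair 1
    case 2:  return InputValue.ToUpperInvariant();  //conversion for pair 2
    default: return InputValue;                     //pass-through
}";

var combined = CSharpScript.Create<string>(combinedScript, globalsType: typeof(ScriptHost_Import));
combined.Compile();

//Per call, only the globals change:
//string result = (await combined.RunAsync(new ScriptHost_Import { ScriptId = 1, InputValue = raw })).ReturnValue;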

The advantage of this solution should be significantly lower memory usage. The downside is that script maintenance moves away from the specific point where the script is actually used.

Before implementing this change, I would like to hear opinions on this solution and suggestions for a different approach.



2 answers


As it seems, scripting can be wasteful overkill for this task: you stack up many application layers and the memory fills up.

Other solutions:

  • How do you interact with the database? You may be able to shape the query itself to your needs instead of writing an entire script to do it.
  • What about generics, with enough type parameters to suit your needs (see the sketch after this list)?

    public class ImportEngine<T1,T2,T3,T4,T5>

  • Use tuples (which is very similar to using generics).
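A minimal sketch of the generics idea (the ImportEngine shape and the converter delegate are hypothetical, just to show strongly typed conversion without any scripting):

//Hypothetical strongly typed engine: the conversion is a compiled delegate, no script involved
public class ImportEngine<TIn, TOut>
{
    private readonly Func<TIn, TOut> _convert;
    public ImportEngine(Func<TIn, TOut> convert) { _convert = convert; }
    public TOut Import(TIn value) { return _convert(value); }
}

//Usage: each field conversion is plain C#, compiled once with the application
var engine = new ImportEngine<string, int>(s => int.Parse(s.Trim()));
int converted = engine.Import(" 42 ");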

But if you still think scripts are the right tool, I found that the memory usage of scripts can be reduced by running the script logic inside your application rather than inside RunAsync: have RunAsync return the logic itself, and reuse that object, instead of doing the work inside the heavy and memory-hungry RunAsync call. Here's an example:

Instead of just this (a one-line script):

DoSomeWork();


You can do this (IHaveWork is an interface defined in your application with a single method, Work):

public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();


This way, you call the heavy RunAsync only once, briefly, and it returns a worker that you can reuse inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on).

This approach also breaks the isolation between your application and the script, so you can easily pass data to the script and retrieve data from it.
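On the host side, this could look roughly like the following (a sketch under the assumptions above; the IHaveWork interface and the script text are illustrative, not a fixed API):

//Interface known to both the host application and the script;
//assumed visible to the script via the assembly reference (add WithImports for its namespace if it has one)
public interface IHaveWork
{
    void Work();
}

//Compile and run the script ONCE; it hands back a reusable worker object
string scriptText = @"
public class ScriptWork : IHaveWork
{
    public void Work() { System.Console.WriteLine(""Working...""); }
}
return new ScriptWork();";

IHaveWork worker = await CSharpScript.EvaluateAsync<IHaveWork>(
    scriptText,
    ScriptOptions.Default.WithReferences(typeof(IHaveWork).Assembly));

//Reuse the compiled logic without touching the scripting engine again
for (int i = 0; i < 1000; i++)
    worker.Work();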



EDIT

Quick test:

This code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Windows.Forms;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\");";
    List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair play

    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}


This consumes about 600 MB in my environment (the System.Windows.Forms reference in the ScriptOptions is there only to inflate the script size). The Script<object> instances are reusable: a second call to RunAsync does not consume additional memory.

But we can do better:

static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "return () => { System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\"); };";

    List<Action> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly)))
        .Select(t => t.Result).ToList();

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}


In this snippet, I simplified the solution I suggested by returning an Action instead of a custom interface; I think the performance impact of that is small (but in a real implementation, I really think you should use your own interface to keep it flexible).

While the scripts are running, you can see a steep rise in memory to ~240 MB, but after the garbage collector is invoked (for demo purposes; I did the same in the previous code), memory usage drops to ~30 MB. It is also faster.



I'm not sure whether this existed at the time the question was asked, but there is something very similar and, let's say, official: a way to run a script multiple times without increasing the program's memory usage. You need the CreateDelegate method, which does exactly what you would expect.

I'll post it here just for convenience:



public class Globals
{
    public int X;
    public int Y;
}

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();

for (int i = 0; i < 10; i++)
{
  Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}

      

Some memory is still required up front, but you can keep the runners in some kind of global list and invoke them cheaply afterwards.
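Applied to the question's scenario, a cache of runners could be sketched like this (the ScriptRunnerCache class and keying by SCRIPT_ID are assumptions for illustration, not part of the Roslyn API):

using System.Collections.Concurrent;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

//Compile each distinct script once and cache its delegate by ID
public static class ScriptRunnerCache
{
    private static readonly ConcurrentDictionary<int, ScriptRunner<string>> _runners =
        new ConcurrentDictionary<int, ScriptRunner<string>>();

    public static ScriptRunner<string> Get(int scriptId, string scriptText)
    {
        return _runners.GetOrAdd(scriptId, _ =>
            CSharpScript.Create<string>(scriptText, globalsType: typeof(ScriptHost_Import))
                        .CreateDelegate());
    }
}

//Later, from any thread, invoking the cached delegate is cheap:
//string value = await ScriptRunnerCache.Get(scriptId, scriptText)(_ScriptHost);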
