Parser for signed overpunch values?

I am working with some old data imports and have come across a lot of data from an external source that reports financial numbers with a signed overpunch. I've seen a lot, but this is before my time. Before I start building a function to parse these strangers, I wanted to check whether there is a standard way to handle them.

I guess my question is: does the .NET Framework provide a standard facility for converting signed overpunch strings? If not .NET, are there any third-party tools I can use so I'm not reinventing the wheel?

+3




7 replies


Over-punched numeric (Zoned-Decimal in Cobol) comes from the old punched cards, where the sign was over-punched on the last digit of a number. The format is commonly used in Cobol.

Since there are both ASCII and EBCDIC Cobol compilers, there are both ASCII and EBCDIC versions of zoned decimal. To make it even more complicated, the +0 and -0 characters ({ and } in US-EBCDIC (IBM037)) are different in, for example, German EBCDIC (IBM273, where they are ä and ü), and different again in other EBCDIC language versions.

To work successfully, you need to know:

  • Whether the data originated on an EBCDIC or an ASCII system.
  • If EBCDIC, which language variant it is (US, German, etc.).

If the data is in the original character set, you can calculate the sign from the zone (high-order) nibble of the last byte.

For EBCDIC, the numeric hex codes are:

Digit          0     1     2   ..    9

Unsigned:   x'F0' x'F1' x'F2'  .. x'F9'     012 .. 9
Negative:   x'D0' x'D1' x'D2'  .. x'D9'     }JK .. R
Positive:   x'C0' x'C1' x'C2'  .. x'C9'     {AB .. I

      

For US-EBCDIC zoned decimal, this Java code converts the string:

int positiveDiff = 'A' - '1';
int negativeDiff = 'J' - '1';
String sign = "";

// ret holds the zoned-decimal text, e.g. "0012C"
char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

switch (lastChar) {
    case '}':
        sign = "-";
        // fall through: '}' (-0) and '{' (+0) both carry the digit 0
    case '{':
        lastChar = '0';
        break;
    case 'A':
    case 'B':
    case 'C':
    case 'D':
    case 'E':
    case 'F':
    case 'G':
    case 'H':
    case 'I':
        // positive 1..9
        lastChar = (char) (lastChar - positiveDiff);
        break;
    case 'J':
    case 'K':
    case 'L':
    case 'M':
    case 'N':
    case 'O':
    case 'P':
    case 'Q':
    case 'R':
        // negative 1..9
        sign = "-";
        lastChar = (char) (lastChar - negativeDiff);
        break;
    default:
        // plain digit (unsigned): leave it as it is
}
ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

      

For German EBCDIC, { and } become ä and ü; for other EBCDIC languages you will need to look up the appropriate code page.
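Since only the two zero characters move between EBCDIC code pages (the letters A-I and J-R stay at the same code points), one way to cope with several variants is to pass the +0 and -0 characters in as parameters. A minimal Java sketch: the class and method names are made up for illustration, and it assumes a non-empty string that was already converted to Unicode with the matching code page.

```java
class ZonedText {
    /**
     * Decodes a zoned-decimal string whose last character carries the sign.
     * positiveZero / negativeZero are the characters that x'C0' / x'D0' map to
     * in the source code page ('{' and '}' for IBM037, a-umlaut and u-umlaut
     * for IBM273).
     */
    static String decode(String text, char positiveZero, char negativeZero) {
        char last = text.charAt(text.length() - 1);
        String head = text.substring(0, text.length() - 1);
        if (last == positiveZero) return head + '0';
        if (last == negativeZero) return "-" + head + '0';
        if (last >= 'A' && last <= 'I') return head + (char) (last - 'A' + '1');
        if (last >= 'J' && last <= 'R') return "-" + head + (char) (last - 'J' + '1');
        if (last >= '0' && last <= '9') return text; // unsigned digit
        throw new NumberFormatException("Not a zoned-decimal sign character: " + last);
    }
}
```

For US data you would call `ZonedText.decode("1230}", '{', '}')`, which gives "-12300"; for German data you would pass '\u00E4' and '\u00FC' instead.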

For ASCII zoned decimal, this is the Java code:



    int positiveFjDiff = '@' - '0';
    int negativeFjDiff = 'P' - '0';
    String sign = "";

    // ret holds the zoned-decimal text
    char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

    switch (lastChar) {
        case '@':
        case 'A':
        case 'B':
        case 'C':
        case 'D':
        case 'E':
        case 'F':
        case 'G':
        case 'H':
        case 'I':
            // positive 0..9
            lastChar = (char) (lastChar - positiveFjDiff);
            break;
        case 'P':
        case 'Q':
        case 'R':
        case 'S':
        case 'T':
        case 'U':
        case 'V':
        case 'W':
        case 'X':
        case 'Y':
            // negative 0..9
            sign = "-";
            lastChar = (char) (lastChar - negativeFjDiff);
            break;
        default:
            // plain digit (unsigned): leave it as it is
    }
    ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

      

Finally, if you are working with the raw EBCDIC bytes, you can calculate it as

sign = '+'
if ((last_digit & x'F0') == x'D0') {
   sign = '-'
}
last_digit = last_digit | x'F0'

      


One final problem is that decimal points are not stored in a zoned decimal. You need to look at the Cobol copybook to find out where the assumed decimal point is.

e.g. if the Cobol copybook is

    03 fld                 pic s99999.

123 is stored as 0012C (EBCDIC source),

but if the copybook is (v stands for assumed decimal point)

    03 fld                 pic s999v99.

then 123 is stored as 1230{
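Once the sign has been decoded, the assumed decimal point can be applied with an exact decimal type. A small Java sketch: `ZonedScale` and `applyScale` are names invented for this example, and the scale has to come from the copybook (the number of 9s after the V).

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ZonedScale {
    /**
     * Applies the assumed decimal point from the copybook.
     * digits: the already sign-decoded digit string, e.g. "12300" or "-12300".
     * scale:  number of digits after the V in the PIC clause (2 for s999v99).
     */
    static BigDecimal applyScale(String digits, int scale) {
        return new BigDecimal(new BigInteger(digits), scale);
    }
}
```

With `pic s999v99`, `ZonedScale.applyScale("12300", 2)` gives 123.00; with `pic s99999`, `ZonedScale.applyScale("00123", 0)` gives 123.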

      


The best way is to do the translation in Cobol, or to use a Cobol translation package.

There are several commercial packages available for processing Cobol data, and these are generally expensive. There are also some open-source Java packages that can handle mainframe Cobol data.

+6




Presumably the specification of the file or your program tells you how to deal with this? No?

As Bruce Martin said, the real overpunch goes back to the days of punched cards. You punched the final digit of a number, then re-punched (overpunched) the same position on the card.

The Wiki link you included in your question is fine for this. But I'm pretty sure the source of your data is not punch cards.

While part of this answer assumes that you are using a mainframe, the suggested solution is machine independent.

Is a mainframe the source of your data? We do not know, although this is important information. For now, let's say that is the case.

Unless it is very old data that is unchanged, it has been processed on the mainframe in the last 20 years. Unless the compiler used (assuming it came from a COBOL program) is very, very old, you need to know the setting of the compiler option NUMPROC. Here's why: http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/igy3pg50/2.4.36?DT=20090820210412

Default: NUMPROC(NOPFD)

Abbreviations: None

The compiler accepts any valid sign configuration: X'A', X'B', X'C', X'D', X'E', or X'F'. NUMPROC(NOPFD) is the recommended option in most cases.

NUMPROC(PFD) improves the performance of processing numeric internal decimal and zoned decimal data. Use this option only if your program data agrees exactly with the following IBM system standards:

Zoned decimal, unsigned: The high-order 4 bits of the sign byte contain X'F'.

Zoned decimal, signed overpunch: The high-order 4 bits of the sign byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Zoned decimal, separate sign: The separate sign contains the character '+' if the number is positive or 0, and '-' if it is not.

Internal decimal, unsigned: The low-order 4 bits of the low-order byte contain X'F'.

Internal decimal, signed: The low-order 4 bits of the low-order byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Data produced by COBOL arithmetic statements conforms to the above IBM system standards. However, using REDEFINES and group moves could change data so that it no longer conforms. If you use NUMPROC(PFD), use the INITIALIZE statement to initialize data fields, rather than using group moves.

Using NUMPROC(PFD) can affect class tests for numeric data. You should use NUMPROC(NOPFD) or NUMPROC(MIG) if a COBOL program calls programs written in PL/I or FORTRAN.

Sign representation is affected not only by the NUMPROC option, but also by the installation-time option NUMCLS.

Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs:

Preferred signs are created only on the output of MOVE statements and arithmetic operations.

No explicit sign repair is done on input.

Some implicit sign repair might occur during conversion.

Numeric comparisons are performed by a decimal comparison, not a logical comparison.

      

What does this mean for you? If NUMPROC(NOPFD) is being used, you may see X'A' through X'F' in the high-order nybble of the final byte of the field. If NUMPROC(PFD) is being used, you shouldn't see anything other than X'C' or X'D' in that position.
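If you do have to read the raw bytes, a check along these lines would accept all six sign nybbles. This is only a sketch: the class and method names are invented here, and the split into positive (X'A', X'C', X'E', X'F') and negative (X'B', X'D') is the usual IBM decimal sign convention rather than something stated in the quote above.

```java
class SignNibble {
    /**
     * Returns true if the zone (high-order) nibble of the last byte of an
     * EBCDIC zoned-decimal field marks the value as negative.
     * X'B' and X'D' are negative; X'A', X'C', X'E' and X'F' are positive.
     */
    static boolean isNegative(byte lastByte) {
        int zone = (lastByte & 0xF0) >>> 4;
        switch (zone) {
            case 0xB:
            case 0xD:
                return true;
            case 0xA:
            case 0xC:
            case 0xE:
            case 0xF:
                return false;
            default:
                throw new IllegalArgumentException(
                        String.format("Invalid sign nibble X'%X'", zone));
        }
    }

    /** True only for the preferred signs X'C' and X'D'. */
    static boolean isPreferredSign(byte lastByte) {
        int zone = (lastByte & 0xF0) >>> 4;
        return zone == 0xC || zone == 0xD;
    }
}
```

Logging every field where `isPreferredSign` is false would tell you fairly quickly which kind of data you are actually being sent.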

Please note that if the file you receive was generated by the installed MFT Mainframe product you will have the same potential problem.

"May" and "shouldn't" are not good things to see in a spec.

Is your data remotely critical to a business in a financial environment? Then you almost certainly have audit and compliance issues. It works like this:

Auditor, "What do you do with the data when you receive it?"
You, "The first thing I do is change it"
Auditor, "Really? How do you verify the data once you have changed it?"
You, "Errr..."

      

You might get lucky and never have an auditor look at it.



All these non-deterministic words are not very good for programming.

So how do you get around this?

The data you receive should have no fields with embedded signs. There should be no numeric fields that are not represented as character data (no binary, packed or floating point). If a field is signed, the sign should be presented separately. If a field has decimal places, an actual . or , (country-specific) should be provided, or alternatively a separate field with a scaling factor.

Is this difficult for your mainframe people to do? Not remotely. Insist on it. If they won't do it, document that, so that the problems that arise are not yours, but theirs.

If all the numeric data presented to you is simple character data (plus, minus, comma, numbers 0 through 9), then you will have absolutely no problem understanding the data, be it any variant of EBCDIC or any variant of ASCII.

Note that any decimal fields coming from COBOL are exact decimal amounts. Do not store or use them in anything other than fields in your language that can handle exact decimal amounts.

You do not provide any sample data. So here's an example:

123456{

      

It should be presented to you as:

+1234560

      

If it has two decimal places:

+12345.60
or
+12345602 (where the trailing 2 is a scaling-factor, which you validate)

      

If numeric data is to be transferred from external systems, this must always be done in character format. This will make things much easier to code, understand, maintain, and audit.
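To show how little code the character format needs on the receiving side, here is a rough Java sketch. The class and method names are hypothetical, it assumes a period as the decimal separator, and it assumes the scaling factor is a single trailing digit as in the example above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class CharacterAmounts {
    /** Parses a value with an explicit sign and decimal point, e.g. "+12345.60". */
    static BigDecimal parseExplicit(String text) {
        return new BigDecimal(text);
    }

    /**
     * Parses a value whose last character is a one-digit scaling factor,
     * e.g. "+12345602" means +1234560 with 2 decimal places.
     * The scaling factor is validated against the expected one.
     */
    static BigDecimal parseWithScale(String text, int expectedScale) {
        int scale = Character.digit(text.charAt(text.length() - 1), 10);
        if (scale != expectedScale) {
            throw new NumberFormatException("Unexpected scaling factor in " + text);
        }
        BigInteger unscaled = new BigInteger(text.substring(0, text.length() - 1));
        return new BigDecimal(unscaled, scale);
    }
}
```

Both `parseExplicit("+12345.60")` and `parseWithScale("+12345602", 2)` give the exact decimal 12345.60, with no character-set knowledge needed.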

+4




Converting a zoned decimal value is easy and does not require char manipulation.

private int ConvertOverpunch(byte[] number)
{
    // The low nibble of each byte holds the digit. This is true for raw
    // EBCDIC zoned-decimal bytes and for plain ASCII digits.
    int rtnVal = 0;
    for (int i = 0; i < number.length; i++)
    {
        int digit = 0x0F & number[i];
        rtnVal = (rtnVal * 10) + digit;
    }

    // Extract the sign from the high nibble of the last byte.
    // X'D' means negative in EBCDIC.
    // Need to find out what your sign is in ASCII.
    if ((number[number.length - 1] & 0xF0) == 0xD0)
    {
        rtnVal = -rtnVal;
    }

    return rtnVal;
}

      

+2




There are two other approaches, so you have more alternatives to choose from:

public static int Overpunch2Int_v1(string number)
{
    number = number.ToLower();
    char last = number.Last();   // needs "using System.Linq;"
    number = number.Substring(0, number.Length - 1);
    if (last == '}' || (last >= 'j' && last <= 'r'))
    {
        // negative: '}' is -0, 'j'..'r' are -1..-9
        number = "-" + number;
        if (last == '}')
            number += "0";
        else
            number += (char)(last - 'j' + '1');
    }
    else if (last == '{' || (last >= 'a' && last <= 'i'))
    {
        // positive: '{' is +0, 'a'..'i' are +1..+9
        if (last == '{')
            number += "0";
        else
            number += (char)(last - 'a' + '1');
    }
    else
    {
        // plain digit (unsigned): put it back unchanged
        number += last;
    }

    return Int32.Parse(number);
}

public static int Overpunch2Int_v2(string number)
{
    number = number.ToLower();
    char last = number.Last();   // needs "using System.Linq;"
    number = number.Substring(0, number.Length - 1);

    if (last == '{' || last == '}')
        number = (last == '}' ? "-" : "") + number + "0";
    else if (last >= 'a' && last <= 'r')
    {
        bool isNegative = last >= 'j';
        char baseChar = isNegative ? 'j' : 'a';
        number = (isNegative ? "-" : "") + number + (char)(last - baseChar + '1');
    }
    else
        number += last;   // plain digit (unsigned): put it back unchanged

    return Int32.Parse(number);
}

      

Note that neither method validates the string; both expect a valid number.
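If you want a validity check up front, a regular expression is probably enough. A sketch in Java, under the assumption that the input uses the US-EBCDIC overpunch characters in either case; `OverpunchCheck` is a made-up name.

```java
import java.util.regex.Pattern;

class OverpunchCheck {
    // Zero or more plain digits, followed by exactly one final character that
    // is either a digit or a US-EBCDIC overpunch character ({, }, A-R, a-r).
    private static final Pattern VALID = Pattern.compile("[0-9]*[0-9{}A-Ra-r]");

    static boolean isValid(String text) {
        return text != null && VALID.matcher(text).matches();
    }
}
```

Calling `OverpunchCheck.isValid("123456{")` returns true, while "12S" and the empty string are rejected.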

+1




Not that you don't already have options, but here is another one using an extension method. You could likely make it better using some of the ideas in the other posts.

/// <summary>
/// Extension method to get overpunch value
/// </summary>
/// <param name="number">the text to convert</param>
/// <returns>int</returns>
public static int OverpunchValue(this String number)
{
    int returnValue;

    var ovpValue = OverPunchValues.Instance.OverPunchValueCollection.First(o => o.OverpunchCharacter ==
        Convert.ToChar(number.Substring(number.Length - 1)));

    returnValue = Convert.ToInt32(number.Substring(0, number.Length - 1) + ovpValue.NumericalValue.ToString());

    return ovpValue.IsNegative ? returnValue * -1 : returnValue;
}

/*singleton to store values */
public class OverPunchValues
{
    public List<OverPunchValue> OverPunchValueCollection { get; set; }

    private OverPunchValues()
    {
        OverPunchValueCollection = new List<OverPunchValue>();
        // '{' is positive zero ('}' is the negative zero)
        OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = '{', IsNegative = false, NumericalValue = 0 });
        OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = 'J', IsNegative = true, NumericalValue = 1 });
        //add the rest of the values here...
    }

    static readonly OverPunchValues _instance = new OverPunchValues();

    public static OverPunchValues Instance
    {
        get { return _instance; }
    }
}

public class OverPunchValue
{
    public char OverpunchCharacter { get; set; }
    public bool IsNegative { get; set; }
    public int NumericalValue { get; set; }

    public OverPunchValue()
    {

    }            
}

      

And then you can call it like this:

string str = "00345{";

int temp = str.OverpunchValue();

      

+1




private int ConvertOverpunch(string number)
    {
        number = number.ToLower();
        Regex r = new Regex("}|j|k|l|m|n|o|p|q|r");
        if(r.IsMatch(number))
        {
            number = "-" + number;
        }
        number = number.Replace('}', '0');
        number = number.Replace('j', '1');
        number = number.Replace('k', '2');
        number = number.Replace('l', '3');
        number = number.Replace('m', '4');
        number = number.Replace('n', '5');
        number = number.Replace('o', '6');
        number = number.Replace('p', '7');
        number = number.Replace('q', '8');
        number = number.Replace('r', '9');

        number = number.Replace('{', '0');
        number = number.Replace('a', '1');
        number = number.Replace('b', '2');
        number = number.Replace('c', '3');
        number = number.Replace('d', '4');
        number = number.Replace('e', '5');
        number = number.Replace('f', '6');
        number = number.Replace('g', '7');
        number = number.Replace('h', '8');
        number = number.Replace('i', '9');

        try
        {
            int intNumber = Convert.ToInt32(number);
            return intNumber;
        }
        catch 
        {
            return 0;
        }
    }

      

This was written off the top of my head; it has not been tested.

0




I just wanted to chime in here, as I wrote a class to handle these. I wrote it before I knew the name "Signed Overpunch", so I called it "packed sign". The advantage of my approach is that it is actually a Java NumberFormatter, so it's easy to use with any framework that uses java.lang.Number or java.text.NumberFormat. Anyone with more experience with these signed overpunch numbers should feel free to open a pull request to make my implementation more compatible with different encodings, variants, etc. GitHub Repo

0








