PeopleCode | Using the Split function to separate a string into multiple parts
A. Split – Basic Usage
The ‘Split’ PeopleCode function is used for splitting a string into smaller parts, divided by whatever delimiter character you choose. The default delimiter is a space character.
So for example, consider the following line of code:
Local array of string &words = Split("This is a sentence.");
This would return an array of four elements:
&words [1] : This &words [2] : is &words [3] : a &words [4] : sentence.
B. Split – CSVs
‘Split’ is especially handy when you want to quickly process a CSV file. It provides a simple way to divide a comma-separated line into its individual fields.
Local array of string &words = Split("Field1,Field2," | "This is a sentence,Field4", ",");
This would return an array of four elements:
&words [1] : Field1 &words [2] : Field2 &words [3] : This is a sentence &words [4] : Field4
However, a typical CSV file often contains longer pieces of text that have a comma embedded as part of the text. Consider this example:
Local array of string &words = Split("Field1,Field2," | """This is a sentence, with comma added"",Field4", ",");
This would produce the following result:
&words [1] : Field1 &words [2] : Field2 &words [3] : "This is a sentence &words [4] : with comma added" &words [5] : Field4
This is not correct. The longer text for Field 3 has been split into two parts, when it should have been kept together as one field. This is because the Split has no concept of a CSV file and is simply splitting at whatever point it finds a comma.
To get around this problem, we need to write some PeopleCode in advance of the ‘Split’ that checks for ‘embedded’ commas within a field. If a comma is found, the code then substitutes the comma with a special series of characters (unlikely to be found in the text). After the Split takes place, the special series is then changed back to a comma. In the example below, the comma is substituted for three semi-colons in a row (;;;).
&test_string = "Field1,Field2," | """This is a sentence, with comma added"",Field4"; &in_field = False; /* Loop through each character of the string, one by one */ For &i = 1 To Len(&test_string) /* Get the next character */ &next_char = Substring(&test_string, &i, 1); Evaluate &next_char When = """" /* This is a double quote character */ If Not &in_field Then /* We are currently not in a field, so start a new field */ &in_field = True; Else /* We have previously found a double quote character, so the field has now ended */ &in_field = False; End-If; Break; When = "," /* This is a comma */ If &in_field Then /* We are currently in a field, so replace the comma with the special sequence */ &new_string = &new_string | ";;;"; Else /* We are not in a field, so include the comma as normal */ &new_string = &new_string | ","; End-If; Break; When-Other /* All other characters - use as normal */ &new_string = &new_string | &next_char; Break; End-Evaluate; End-For; /* Perform a split on the new string */ Local array of string &words = Split(&new_string, ","); /* Replace all instances of the special sequence with the comma */ For &i = 1 To &words.Len &words [&i] = Substitute(&words [&i], ";;;", ","); End-For;
We can confirm that the correct result is produced:
&words [1] : Field1 &words [2] : Field2 &words [3] : This is a sentence, with comma added &words [4] : Field4
See Also:
Tip 008: Create Simple CSV Export
Tip 036: Introduction to File Layouts