• Skip to main content
  • Skip to primary sidebar
  • Home
  • Connect
  • Contact
  • About Us

BareFoot PeopleSoft

Oracle PeopleSoft Tips and Tricks

Tip 056: Data Scrambling

May 21, 2018

SQL | Using the SQL ‘Random’ function to scramble sensitive data

<< Previous | Next >>

A. What is Data Scrambling?

Data scrambling is one of those requirements that only comes up occasionally, but when it does come up, it is considered a ‘big deal’. The system owners have requested that the more confidential data in the system should not be visible in non-Production environments. Therefore, a request is made to ‘scramble’ the data of any key entities, such as employees, students or customers. The scramble routine needs to update each row with a randomly selected piece of data from another row. As a result, a developer or a systems tester can no longer identify real people from the scrambled data.

The solution below will perform a data scramble of employee last names. The routine has been written in such a way that other key fields in the table will not matter. In this case, the PS_NAMES table is keyed on ‘Employee ID’, ‘Name Type’ and ‘Effective Date’. However for the script, only the one primary key matters. The script ensures that a single EMPLID will end up with the one name across all name types and effective dates.

Although we are focusing on last name only, the same scrambling routine could be used for any confidential information in the system, such as tax numbers, national identity numbers and birthdates.

 

B. The SQL ‘Random’ Function

By taking advantage of the Oracle SQL ‘random’ function, data scrambling can be a relatively straightforward task. Depending on the size of your database, the entire process need not take any longer than ten minutes per field.

The main driver of the scrambling routine is the Oracle random value function: dbms_random.value(). This returns a random number between 0 and 1 with an incredible 38 decimal places. This can be easily tested with the following statement:

select dbms_random.value()
from dual;

This SQL produces a result as follows:

To use this number to select a random row of data from a table, you can multiple the random number with the count of records in the table:

select round(dbms_random.value() *
            (select count(*) from ps_names))
from dual;

Note the use of the ‘round’ function to provide an integer value:

Finally, we will take advantage of the Oracle ‘rownum’ construct. When selecting data from a table, ‘rownum’ gives you the row number of the current row as part of the results data-set. This provides a quick way to generate a unique ID for each row of data.

 

C. The Data Scrambling Script

Tying all the above together, we can now create a SQL script which will set a field to a random value across the entire table. This examples scrambles the ‘LAST_NAME’ field in the ‘PS_NAMES’ table:

-- create temp table to hold all existing Last Name values
create table temp_scramble as
select rownum as row_seq, last_name
from ps_names;

-- add index on row sequence number
create index temp_scramble1
on temp_scramble(row_seq);

-- create temp table to hold all existing EMPLID values
create table temp_emplid
(emplid varchar(11),
random_num integer,
last_name varchar(30));

-- add index on EMPLID
create index temp_emplid1
on temp_emplid(emplid);

-- insert all current EMPLIDs into the temp table
insert into temp_emplid
select distinct emplid,
0,
' '
from ps_names;

-- update the temp table with a random number
update temp_emplid
set random_num = round(dbms_random.value() * 
                 (select count(*) from temp_scramble));

-- update the Last Name with the matching record
update temp_emplid a
set last_name = (select last_name
                   from temp_scramble
                  where row_seq = a.random_num);

-- update the real table with the randomly-selected Last Name
update ps_names a
set last_name = (select last_name 
                   from temp_emplid t
                  where a.emplid = t.emplid);

-- finally drop the temp tables
drop table temp_scramble;
drop table temp_emplid;

I went ahead and ran this script on a database with approximately 165,000 records in the PS_NAMES table. Execution time for the entire script was about 15 seconds.

See Also:

<< Previous | Next >>

Filed Under: SQL Tagged With: Data Scrambling, Random, Security

Reader Interactions

Primary Sidebar

Categories

  • Administration
  • Application Engine
  • BI Publisher
  • COBOL
  • Data Mover
  • PeopleCode
  • PeopleTools
  • PS Query
  • Security
  • SQL
  • Utilities

Copyright © 2021 Elimbah Consulting Ltd