Monday, November 12, 2012

Big Data and Privacy

I am writing an editorial (which is unusual for me). This was caused by a conversation I had with my Dad about how the world is changing with big data, and how much retiree's (like him) should know.  So here goes.

Dear Dad,

  I know you are involved in college education for Older folks that have a passion to learn.  An interesting topic would be "Big Data".  For someone outside the IT field, I would tell you that Big Data describes the plethora of new data that is generated in today's society.
Where does it come from ??


  • Logs from webservers 
  • Cell phones (including location data).
  • Search data
  • Medical data
  • Machine generated data (like from your computer in your car)
  • Sales data
All this can be tied together in new ways, that most people didn't think were possible years ago.  My favorite example is... You are walking in the mall past past a store, and you get a text on your phone.  The store's computer system has texted you a 30% offer for a new sweater, good for 2 hours.  The store knows ...
  • Your location from phone
  • That you have a ski trip planned for the next week, from your search history, and purchase history
  • That you are looking for a new sweater from your facebook, and twitter posts.
Amazing huh ?

All these things can open up miraculous possibilities. The day may come when your doctor calls you to schedule a preventive test.  Medical history gathered from a large group of people has shown that others with a similar medical history as yours have had a issue that can be tested for and prevented. Wow. really amazing things.

They can predict something that may go wrong on your car given all the information the computer has gathered.

Target (the department store), has even predicted when a woman is pregnant based on purchase history.. 

All these things are amazing possiblities, but they are also scary.

I find this topic very exciting, but I'm sure for a lot of people these ideas are very scary. Where do you cross the line and enter privacy issues.

This is going to be a interesting battle. Who ones all this data ? what data is public, and what data is private ?

But the most interesting thing is what can be done with the data....

Me, I'm an optimist so I can all the good that can come out of all this new data.

At the very least, the students would at least have a better idea what is going on in the world (behind the scenes), and they would understand why Target things "grandma" is pregnant.

Saturday, November 3, 2012

ODI Monitoring scripts

I have included some usefule ODI monitoring scripts (if you want the abridged version of this blog post).

I haven't been blogging in a while (it's been crazy), but I wanted to share some information on ODI.

I have been working on trying to monitor ODI (Oracle Data Integrator).  ODI is a somewhat recent Oracle purchase, and it has a client GUI that is used by the developers.

Me (like many of you), are DBA's, and we want to go into the database to see what is happening.  We eithor don't have access to the GUI, or we don't want access. 

ODI is a great tool used for transforming data.  It can be used to build sql statments that are executed directly in the database.  This makes it a bit different from a tool like Datastage that runs sql remotely.

Here is the first sql I was able to come up with.  It will tell you information about the load plans that have been run. You need to qualify the tables with the owner of the ODI repository..

SELECT SLI.I_LP_INST AS "Load Plan Instance #"
     , SLR.NB_RUN AS "Load Plan Run #"
     , SLI.LOAD_PLAN_NAME AS "Load Plan Name"
     , SLR.CONTEXT_CODE AS "Source System"
     , SLR.STATUS AS "Load Plan Status"
     , SLR.RETURN_CODE AS "error code"
     , CASE WHEN SLR.END_DATE IS NULL 
            THEN TRUNC(ROUND((NVL(SLR.END_DATE , SYSDATE) - SLR.START_DATE)*86400) / 3600) || ':' || 
                 LPAD(TRUNC(MOD(ROUND((NVL(SLR.END_DATE , SYSDATE) - SLR.START_DATE)*86400), 3600) / 60), 2, 0) || ':' || 
                 LPAD(MOD(ROUND((NVL(SLR.END_DATE , SYSDATE) - SLR.START_DATE)*86400), 60), 2, 0)
            ELSE TRUNC(SLR.DURATION / 3600) || ':' || LPAD(TRUNC(MOD(SLR.DURATION, 3600) / 60), 2, 0) || ':' || LPAD(MOD(SLR.DURATION, 60), 2, 0) 
       END AS "Load Time"
     , SLR.START_DATE
     , SLR.END_DATE
     , substr(to_char(SLR.START_DATE,'mm/dd/yy:hh24'),1,11) start_date_hour
  FROM SNP_LP_INST SLI
  JOIN SNP_LPI_RUN SLR ON SLI.I_LP_INST = SLR.I_LP_INST
where 'JRNL_LOAD'=sli.load_plan_name

I was able to include this in an apex report, and it display some of the history.

The next SQL igives you detail of the scenarios, for the load plan. The input is the Load plan



SELECT SLI.Load_plan_name as "Load Plan Name",
      SUBSTR(SLR.CONTEXT_CODE, 9, 5) AS "Source System",
      SLS.LP_STEP_NAME AS "Target Table",
      SLS.scen_name as "scenario name",
     TRUNC(SUM(SSTL.TASK_DUR) / 3600) || ':' ||
             LPAD(TRUNC(MOD(SUM(SSTL.TASK_DUR), 3600) / 60), 2, 0) || 
            ':' || LPAD(MOD(SUM(SSTL.TASK_DUR), 60), 2, 0) AS "Load Time"
     , SST.SESS_NO AS "Session Number"
     , SLSL.start_date as "Start Time"
     , SLSL.End_date as "End Time"
     , sum(sstl.nb_ins) as "Rows Inserted"
     , sum(sstl.nb_upd) as "Rows Updated"
     , sum(sstl.nb_del) as "Rows Deleted"
     , sum(sstl.nb_err) as "Rows Errors"
     , case 
        when (sum(sstl.nb_ins) + sum(sstl.nb_upd)) > 0 then trunc(sum(sstl.task_dur)/(sum(sstl.nb_ins) + sum(sstl.nb_upd)) ,4)
        else 0
       end as "Throughput"
  FROM SNP_LP_INST SLI
  JOIN SNP_LPI_STEP SLS 
                  ON SLI.I_LP_INST = SLS.I_LP_INST
  JOIN SNP_LPI_STEP_LOG SLSL
        ON SLS.I_LP_STEP = SLSL.I_LP_STEP
      AND SLS.I_LP_INST = SLSL.I_LP_INST
  JOIN SNP_SESS_TASK SST
            ON SST.SESS_NO = SLSL.SESS_NO
  JOIN SNP_SESS_TASK_LOG SSTL
            ON SSTL.SCEN_TASK_NO = SST.SCEN_TASK_NO 
           AND SST.SESS_NO = SSTL.SESS_NO
  JOIN SNP_LPI_RUN  SLR 
            on SLI.I_LP_INST = SLR.I_LP_INST
 WHERE (1=1)
  AND SLSL.I_LP_INST = :P6_SCENARIO
   AND SLS.LP_STEP_TYPE = 'RS'
--   AND SLSL.STATUS IN ('M','D')
 GROUP BY SUBSTR(SLR.CONTEXT_CODE, 9, 5),
           SLSL.start_date,SLSL.end_date,SLI.load_plan_name,
          SLS.scen_name,SLS.LP_STEP_NAME, SST.SESS_NO

Finally, this is the last query. This query takes the task number as an input, and will display the detail for all the tasks contained in a scenario.
SELECT SST.TASK_NAME2 AS "Session Name"
     , SST.TASK_NAME3 AS "Task Name"
     , CASE WHEN SSTL.TASK_END IS NULL 
            THEN TRUNC(ROUND((NVL(SSTL.TASK_END , SYSDATE) - SSTL.TASK_BEG)*86400) / 3600) || ':' || 
                 LPAD(TRUNC(MOD(ROUND((NVL(SSTL.TASK_END , SYSDATE) - SSTL.TASK_BEG)*86400), 3600) / 60), 2, 0) || ':' || 
                 LPAD(MOD(ROUND((NVL(SSTL.TASK_END , SYSDATE) - SSTL.TASK_BEG)*86400), 60), 2, 0)
            ELSE TRUNC(TASK_DUR / 3600) || ':' || LPAD(TRUNC(MOD(TASK_DUR, 3600) / 60), 2, 0) || ':' || LPAD(MOD(TASK_DUR, 60), 2, 0) 
       END AS "Load Time"
     , substr(sst.def_context_code,9,5) "Context"
     , SSTL.TASK_BEG AS "Start Time"
     , SSTL.TASK_END AS "End Time"
     , SSTL.NB_DEL AS "Rows Deleted"
     , SSTL.NB_UPD AS "Rows Updated"
     , SSTL.NB_INS AS "Rows Inserted"
     , SSTL.NB_ERR AS "# Of Errors"
     , SST.SESS_NO 
     , sst.scen_task_no
/* UNCOMMENT TO GET SQL EXECUTED FOR THIS STEP */
  FROM SNP_SESS_TASK SST,
      SNP_SESS_TASK_LOGv SSTL 
 WHERE (1=1)
   AND SST.SESS_NO =:P7_TASK   
   AND SSTL.TASK_STATUS IN ('D','M','R')
   AND SSTL.SCEN_TASK_NO = SST.SCEN_TASK_NO 
   AND SST.SESS_NO = SSTL.SESS_NO